Jay Hickey

Technology, life, and fascinating web encounters.

Markdownify Instagram

I've been thinking for a while now about how nice it would be to automatically create a Markdown post from the Instagram photos I take. I looked around at Instagram's APIs and, while I could probably set up a way to pull directly from the API, I wanted to throw something together quickly. I'm on my way to Maui for vacation, so I spent an evening making this handy little Python script that'll enable me to easily post my Instagram photos to jayhickey.com. It works with an IFTTT recipe that saves the details of each new Instagram photo to a text file, which the script then turns into a Markdown draft.

import os
import sys
import re
from glob import glob
import urllib  # Python 2 layout; Python 3 moved URLopener to urllib.request (deprecated there)
from time import localtime, strftime

def read_file(filePath):
    # Read the Instagram text file created by IFTTT
    with open(filePath, mode="r") as f:
        fileLines = f.readlines()
    fileDict = {}

    # Create a dictionary with the Instagram info
    for line in fileLines:
        x = re.search(r'([\w\@\.]+)\s*:\s*(.*)', line)
        if x is not None:
            fileDict[x.group(1)] = x.group(2)
    return fileDict

def create_draft(fileDict, draftLoc, imgLink):
    # Replace whitespace and punctuation with dashes for the filename
    c = re.sub(r'[\t !"#$:;%&\'()*\-/<=>?@\[\\\]^_`{|},.]+', "-", fileDict['Caption'])
    while c.endswith('-'):
        c = c[:-1]
    c = c.lower()

    # Show which image URL is being embedded
    print(imgLink)

    # Embed the photo with Markdown; "with" makes sure the file gets closed
    with open(os.path.join(draftLoc, "%s.md" % c), mode="w") as draft:
        draft.write(fileDict['Caption'] + '\n')
        draft.write("Link: %s\n" % fileDict['URL'])
        draft.write("![%s](%s)\n\n" % (fileDict['Caption'], imgLink))
        draft.write("(Via [Instagram](http://instagram.com))")

if __name__ == '__main__':
    # These might not be used, so make them empty
    Local_Image_URL_Path = ''
    Website = ''

    # Read input arguments
    IFTTT_Read_Path = sys.argv[1]
    Draft_Write_Path = sys.argv[2]

    # These parameters are optional
    if len(sys.argv) >= 4:
        Local_Image_URL_Path = sys.argv[3]
    if len(sys.argv) >= 5:
        Website = sys.argv[4]

    # Make sure the file is a text file from Instagram
    fileList = glob(IFTTT_Read_Path + '*instagr.am*.txt')

    for txtFile in fileList:

        # Read the Instagram data
        fileDict = read_file(txtFile)

        if Local_Image_URL_Path != '' or Website != '':
            # Make a local copy of the image and date it
            image = urllib.URLopener()
            eventTime = strftime("%Y-%m-%d_%H%M%S", localtime())
            fileExtension = os.path.splitext(fileDict['Source'])[1]
            imgLinkPath = IFTTT_Read_Path + eventTime + fileExtension
            image.retrieve(fileDict['Source'], imgLinkPath)
            imgURL = Website + Local_Image_URL_Path + eventTime + fileExtension
        else:
            # Use the image hosted by Instagram
            imgURL = fileDict['Source']

        # Create a Markdown draft
        create_draft(fileDict, Draft_Write_Path, imgURL)

        # Delete the Instagram text file from IFTTT
        os.remove(txtFile)


Here's the input to run it (the last two parameters are optional):

python MarkdownifyInstagram.py {{IFTTT_Read_Path}} {{Draft_Write_Path}} {{Local_Image_URL_Path}} {{Website}}

For example:

python MarkdownifyInstagram.py /home/blog/secondcrack/www/media/instagram/ /home/blog/Dropbox/Blog/drafts/ /media/instagram/ http://jayhickey.com
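The script reads these values positionally from sys.argv. One alternative, which is not what Markdownify Instagram itself does, is to declare the same four arguments with Python 3's standard argparse module, which adds usage messages and a --help flag. This is a sketch: parse_args is a hypothetical function, and the two optional positionals default to empty strings just like the script's variables.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical argparse version of the script's argument handling."""
    parser = argparse.ArgumentParser(
        description="Turn IFTTT Instagram text files into Markdown drafts.")
    parser.add_argument("IFTTT_Read_Path",
                        help="directory where IFTTT saves the Instagram text files")
    parser.add_argument("Draft_Write_Path",
                        help="directory where Markdown drafts are written")
    # nargs="?" makes a positional optional; "" mirrors the script's defaults
    parser.add_argument("Local_Image_URL_Path", nargs="?", default="",
                        help="URL path of the local image folder, e.g. /media/instagram/")
    parser.add_argument("Website", nargs="?", default="",
                        help="site root, e.g. http://jayhickey.com")
    return parser.parse_args(argv)

# Two required arguments only: the optional ones fall back to ""
minimal = parse_args(["/in/", "/drafts/"])
print(minimal.Local_Image_URL_Path == "")  # True

# All four, matching the example command above
full = parse_args(["/home/blog/secondcrack/www/media/instagram/",
                   "/home/blog/Dropbox/Blog/drafts/",
                   "/media/instagram/", "http://jayhickey.com"])
print(full.Website)  # http://jayhickey.com
```

Calling parse_args() with no argument makes argparse read sys.argv[1:], which is what a real command-line entry point would do.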

The downside to using IFTTT, according to their hilariously titled /wtf page, is that it only polls for new data every 15 minutes. So this won't happen instantly.²

GitHub is where you can find all the instructions for setting up the IFTTT recipe, running Markdownify Instagram, and even using a shell script and inotify to automate the process. I won't go into any of that here. However, I do think it's interesting to dive a little deeper and see how the script works.


Here's a look at a few of the design decisions I made while writing Markdownify Instagram.

Reading the text file from IFTTT

When triggered, the IFTTT recipe will create a plain text file with a name like http-instagr.ampoefb-ihvv0.txt and these contents:

URL: http://instagram.com/p/JF1n/
Source: http://distillery.s3.amazonaws.com/media/2010/11/03/217c074328864f76b5d730837403f371_7.jpg
Caption: Deadly.

Python's built-in glob module is first used to create a list of all the Instagram text files in your {{IFTTT_Read_Path}}:

fileList = glob.glob(IFTTT_Read_Path + '*instagr.am*.txt')

The * wildcard matches zero or more characters, so any file whose name contains instagr.am and ends in .txt is added to fileList. This is the first time I've actually used the glob module. Although less powerful than regular expressions, it's perfect for finding specific files in directories.
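To see which names the pattern picks up, here's a small Python 3 check that creates a few empty files in a temporary directory and globs them with the same '*instagr.am*.txt' pattern. find_instagram_files is a hypothetical helper, and apart from the example name quoted above, the filenames are made up.

```python
import os
import tempfile
from glob import glob

def find_instagram_files(read_path):
    # Same pattern as the script: anything, "instagr.am", anything, ".txt"
    return sorted(glob(os.path.join(read_path, '*instagr.am*.txt')))

with tempfile.TemporaryDirectory() as tmp:
    for name in ["http-instagr.ampoefb-ihvv0.txt",  # example name from the post
                 "http-instagr.amabc123.txt",       # made-up match
                 "notes.txt",                       # no "instagr.am"
                 "http-instagr.amabc123.jpg"]:      # wrong extension
        open(os.path.join(tmp, name), "w").close()
    matches = [os.path.basename(p) for p in find_instagram_files(tmp)]

print(matches)  # ['http-instagr.amabc123.txt', 'http-instagr.ampoefb-ihvv0.txt']
```

Only the two .txt files containing instagr.am come back; the unrelated text file and the .jpg are skipped.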

Creating a dictionary from the data

After getting a list of files, read_file uses readlines() to split each file's text into individual lines. The regex pattern r'([\w\@\.]+)\s*:\s*(.*)' creates match objects from those lines, and finally the dictionary fileDict is built from the matches:

for line in fileLines:
    x = re.search(r'([\w\@\.]+)\s*:\s*(.*)', line)
    if x is not None:
        fileDict[x.group(1)] = x.group(2)

Regex is hard to look at, but it's not too bad once you grasp the syntax. Google's regular expressions tutorial is a good place to start, and you can learn a whole lot more just by searching the web. Each set of parentheses defines a capturing group, and the match object x records what each one matched. x.group(0) always holds the entire match (here, the whole line minus its trailing newline), x.group(1) holds the text before the : (the first set of parentheses in the regex), and x.group(2) holds everything after it (the second set). Like this:

  • x.group(0) = 'URL: http://instagram.com/p/JF1n/'
  • x.group(1) = 'URL'
  • x.group(2) = 'http://instagram.com/p/JF1n/'
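Running the pattern against the sample IFTTT file from earlier shows the groups in action. This is a Python 3 sketch: parse_lines is a hypothetical stand-in for the loop inside read_file, working on a string instead of a file so it runs on its own. Note that groups() returns only the two captured groups; group(0), the whole match, is separate.

```python
import re

# Contents of the sample IFTTT text file shown above
sample = """URL: http://instagram.com/p/JF1n/
Source: http://distillery.s3.amazonaws.com/media/2010/11/03/217c074328864f76b5d730837403f371_7.jpg
Caption: Deadly.
"""

PATTERN = re.compile(r'([\w\@\.]+)\s*:\s*(.*)')

def parse_lines(lines):
    """Build the key/value dict the same way read_file's loop does."""
    fileDict = {}
    for line in lines:
        x = PATTERN.search(line)
        if x is not None:
            fileDict[x.group(1)] = x.group(2)
    return fileDict

lines = sample.splitlines(True)  # keep the newlines, like readlines()
first = PATTERN.search(lines[0])
print(first.group(0))  # URL: http://instagram.com/p/JF1n/
print(first.groups())  # ('URL', 'http://instagram.com/p/JF1n/')

fileDict = parse_lines(lines)
print(fileDict['Caption'])  # Deadly.
```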

After the for loop completes, fileDict will look something like this:

fileDict = {'URL':'http://instagram.com/p/JF1n/', 'Source':'http://distillery.s3.amazonaws.com/media/2010/11/03/217c074328864f76b5d730837403f371_7.jpg', 'Caption':'Deadly.'}

So printing the URL is as simple as typing print(fileDict['URL']).

Saving the local image file

If either of the last two parameters is given, your photo will be saved locally, so you aren't relying on Instagram's S3 hosting. If Facebook ever decides to shut down Instagram, you won't have a broken embedded image. To accomplish this, I used Python 2's urllib module to save an image from a URL by setting image = urllib.URLopener(), then

image.retrieve(fileDict['Source'], imgLinkPath)

where fileDict['Source'] is the link to the Instagram image and imgLinkPath is where the image will be saved, e.g., /home/blog/secondcrack/www/media/instagram/2012-11-30_062759.jpg.
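In Python 3, urllib.URLopener no longer exists at that location (it moved to urllib.request and is deprecated), so a sketch of the same step would use urllib.request.urlretrieve instead. save_local_copy is a hypothetical helper, not part of the script; it names the copy the way the script does (timestamp plus the original file extension) and returns the local path. The usage below "downloads" from a file:// URL so it runs without a network connection.

```python
import os
import pathlib
import tempfile
from time import localtime, strftime, strptime
from urllib.request import urlretrieve

def save_local_copy(source_url, read_path, when=None):
    """Hypothetical helper: save source_url into read_path as
    <timestamp><extension>, e.g. 2012-11-30_062759.jpg."""
    if when is None:
        when = localtime()
    eventTime = strftime("%Y-%m-%d_%H%M%S", when)
    fileExtension = os.path.splitext(source_url)[1]  # e.g. ".jpg"
    imgLinkPath = os.path.join(read_path, eventTime + fileExtension)
    urlretrieve(source_url, imgLinkPath)  # fetch the URL, write it to imgLinkPath
    return imgLinkPath

# Usage with a fake image served from a file:// URL
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    fake = pathlib.Path(src, "photo_7.jpg")
    fake.write_bytes(b"not really a jpeg")
    saved = save_local_copy(fake.as_uri(), dst,
                            when=strptime("2012-11-30 06:27:59", "%Y-%m-%d %H:%M:%S"))
    saved_name = os.path.basename(saved)
    saved_bytes = pathlib.Path(saved).read_bytes()

print(saved_name)  # 2012-11-30_062759.jpg
```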

The local image is named using strftime("%Y-%m-%d_%H%M%S", localtime()). This creates a string formatted something like 2012-11-30_062759, where:

  • %Y is the year with century as a decimal number (use %y to omit the century, giving 12 instead of 2012)
  • %m is the month from [01-12]
  • %d is the day of the month from [01-31]
  • %H is the hour on a 24-hour clock (use %I for a 12-hour clock)
  • %M is the minute from [00-59]
  • %S is the second from [00-61]¹
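Here's a quick Python 3 check of those directives. The timestamps are made up and fixed so the output is predictable rather than depending on when you run it.

```python
from time import strftime, strptime

# Hypothetical timestamps, parsed into struct_time values
morning = strptime("2012-11-30 06:27:59", "%Y-%m-%d %H:%M:%S")
evening = strptime("2012-11-30 18:05:00", "%Y-%m-%d %H:%M:%S")

stamp = strftime("%Y-%m-%d_%H%M%S", morning)
print(stamp)                          # 2012-11-30_062759
print(strftime("%y", morning))        # 12 (year without century)
print(strftime("%H vs %I", evening))  # 18 vs 06 (24-hour vs 12-hour clock)
```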

You can see the other strftime directives in the Python documentation for the time module.

I've been using this convention ever since Gabe over at MacDrifter recommended it. Names in this format sort chronologically and are easy to read at a glance. I love it so much I even set it up as a TextExpander shortcut for naming files and photos on my Mac.
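The sorting claim is easy to check: every field is zero-padded and ordered from the largest unit to the smallest, so a plain alphabetical sort gives chronological order. The filenames below are made up for illustration, and taken_at is a hypothetical helper.

```python
from datetime import datetime

# Made-up filenames using the %Y-%m-%d_%H%M%S convention
names = ["2012-11-30_062759.jpg", "2012-02-03_235901.jpg",
         "2011-12-31_120000.jpg", "2012-11-04_101500.jpg"]

def taken_at(name):
    # Parse the 17-character timestamp prefix back into a datetime
    return datetime.strptime(name[:17], "%Y-%m-%d_%H%M%S")

by_name = sorted(names)
by_time = sorted(names, key=taken_at)
print(by_name == by_time)  # True: alphabetical order is chronological order

# A month-first format does not have this property
us_style = ["11-30-2012", "02-03-2013"]
print(sorted(us_style))    # ['02-03-2013', '11-30-2012'], so 2013 sorts before 2012
```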


Creating the Markdown draft

The final function, create_draft, is pretty straightforward and will create a Markdown file that looks like this:

type: Link

(Via [Instagram](http://instagram.com))
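To see exactly what the script listed above writes, here's a Python 3 sketch that builds the same text as a string instead of a file. render_draft is a hypothetical helper that concatenates create_draft's four write() calls; the sample data is the fileDict from earlier, and the image URL assumes the example command line (a local copy under /media/instagram/ on http://jayhickey.com). Drafts from a customized create_draft will differ.

```python
def render_draft(fileDict, imgLink):
    """Hypothetical helper: return the text create_draft writes to the .md file."""
    return (fileDict['Caption'] + '\n'
            + "Link: %s\n" % fileDict['URL']
            + "![%s](%s)\n\n" % (fileDict['Caption'], imgLink)
            + "(Via [Instagram](http://instagram.com))")

fileDict = {'URL': 'http://instagram.com/p/JF1n/',
            'Source': 'http://distillery.s3.amazonaws.com/media/2010/11/03/217c074328864f76b5d730837403f371_7.jpg',
            'Caption': 'Deadly.'}

# Website + Local_Image_URL_Path + timestamp + extension, per the example command
imgURL = 'http://jayhickey.com' + '/media/instagram/' + '2012-11-30_062759' + '.jpg'
draft_text = render_draft(fileDict, imgURL)
print(draft_text)
```

For the sample photo this prints the caption, the Link: line, an image embed pointing at http://jayhickey.com/media/instagram/2012-11-30_062759.jpg, a blank line, and the Via credit.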


Overall, I'm pretty happy with the way Markdownify Instagram turned out. It produces a nice, simple post that's easy to modify.³ There are also tons of possibilities for new features and expansions. I find it extremely useful, so prepare for an influx of pictures posted here. Aloha!

Update: I've made a small addition to the script's create_draft function. Spaces and punctuation in the caption are now replaced with dashes in the filename, trailing dashes are trimmed, and the result is lowercased to form the .md slug.

I used a little more regex magic to make the substitution. Here's what it looks like:

# Replace non-alphanumerics with dashes for filename
c = re.sub(r'[\t !"#$:;%&\'()*\-/<=>?@\[\\\]^_`{|},.]+', "-", fileDict['Caption'])
# Trim trailing dashes, then lowercase the slug
while c.endswith('-'):
    c = c[:-1]
c = c.lower()
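Here's the same substitution wrapped in a function so it can be tried on a few captions. slugify_caption is a hypothetical name and the captions are made up; the regex and the trailing-dash loop are copied from the snippet above. The last example shows one quirk: only trailing dashes are trimmed, so a caption that starts with punctuation keeps a leading dash (c.strip('-') would trim both ends, if you prefer that).

```python
import re

def slugify_caption(caption):
    """Hypothetical wrapper around create_draft's filename logic."""
    c = re.sub(r'[\t !"#$:;%&\'()*\-/<=>?@\[\\\]^_`{|},.]+', "-", caption)
    while c.endswith('-'):  # drop trailing dashes only
        c = c[:-1]
    return c.lower()

print(slugify_caption("Deadly."))                # deadly
print(slugify_caption("Sunset at Ka'anapali!"))  # sunset-at-ka-anapali
print(slugify_caption('"Aloha" from Maui'))      # -aloha-from-maui
```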

Make sure to grab the latest commit off GitHub so you have this feature.

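  1. Why [00-61] instead of [00-59]? The value 60 accounts for leap seconds. Older Python documentation attributed 61 to "double leap seconds," but those have never actually occurred; current documentation says 61 is supported for historical reasons.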

  2. Although it would be nice if it were instant, I don't think this is a very big deal. 

  3. If you wanted, you could even tweak the create_draft function to include some additional HTML tags before and after the image for easy CSS tweaking. Make it look like a Polaroid or something.