retrieve github pull requests in JSON

The following python function returns a map associating each pull request number to its JSON description for the given repo. The OAuth token is needed so github will allow more requests to be processed during a given time frame. The result is cached in a file and refreshed every 24 hours.

import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    pulls_file = "/tmp/pulls.json"
    if ( not os.access(pulls_file, 0) or
         time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60 ):
        pulls = {}
        url = ("" + repo +
               "/pulls?state=all&access_token=" + token )
        while url:
            github = urllib2.Request(url=url)
            f = urllib2.urlopen(github)
            for pull in json.loads(
                pulls[pull['number']] = pull
            url = None
            for link in['Link'].split(','):
                if 'rel="next"' in link:
                    m ='<(.*)>', link)
                    if m:
                        url =
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
        with open(pulls_file, 'r') as f:
            pulls = json.load(f)
    return pulls

For instance

pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')

The same pagination logic applies to all API calls (see Web Linking RFC 5988 for more information) and parsing could use the LinkHeader module instead of rudimentary regexp parsing.

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>