Accessing the API of Plos One with Python requests

Mon, 03/21/2016 - 22:19 -- michelangelo.puliga
In this post we show how to use the PLOS ONE article-level metrics (ALM) API (http://article-level-metrics.plos.org/) to get the subject areas of an article, using basic Python methods and libraries (requests, json, re).

To use the article metrics API you need to register an account on the PLOS ONE website (http://journals.plos.org/plosone/user/secure/login?page=%2Fplosone%2F); a confirmation email will be sent to the address used for the registration. Then go to http://alm.plos.org/ and log in with the credentials you registered on the main plos.org website to get your alphanumeric API key.

The basis of each request to the ALM API is the id of a paper, e.g.

0001543

points to the article: http://journals.plos.org/plosone/article?id=info%3Adoi%2F10.1371%2Fjournal.pone.0001543

A request to the API must contain the DOI built from this article id plus the api_key parameter:

http://alm.plos.org/api/v5/articles?api_key=YOUR_API_KEY&ids=10.1371%2Fjournal.pone.0001543&info=detail

Here the parameters to pass through the requests module are (shown URL-encoded, as they appear in the query string):

api_key=YOUR_API_KEY
info=detail
ids=10.1371%2Fjournal.pone.0001543
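
As a quick check of how these parameters end up in the URL, here is a minimal sketch in Python 3 that uses only the standard library. The helper name build_alm_url is hypothetical and not part of the original script. urlencode percent-encodes the "/" of the DOI as %2F, which is also what requests does with a params dictionary, so the DOI should be supplied with a plain slash.

```python
from urllib.parse import urlencode

def build_alm_url(article_id, api_key):
    # Hypothetical helper: assemble the ALM v5 request URL for one article id.
    base = "http://alm.plos.org/api/v5/articles"
    params = {
        "api_key": api_key,
        "ids": "10.1371/journal.pone.%s" % article_id,  # plain "/" here
        "info": "detail",
    }
    # urlencode turns "/" into %2F and joins the pairs with "&"
    return base + "?" + urlencode(params)

url = build_alm_url("0001543", "YOUR_API_KEY")
print(url)
```

The printed URL should be identical to the example request shown above.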

Here is the code for the function:

#!/usr/bin/python

import requests
import sys
import json
import pickle
import re
import time
import numpy as np

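def getPage(ids,API_KEY):
    baseurl = "http://alm.plos.org/api/v5/articles"
    ## pass the DOI with a plain "/": requests percent-encodes it to %2F itself,
    ## while a literal "%2F" here would be encoded a second time (%252F)
    reqparam = { 'ids' : '10.1371/journal.pone.%s' % ids, 'info' : 'detail', 'api_key' : API_KEY }
    r = requests.get(baseurl, params = reqparam)
    return r.text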

The code returns a JSON body that can be easily parsed with the json module. However, as the number of internal keys is huge and the structure is deeply nested, we can flatten it with a recursive function:

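def flatten_json(y):
    """
    Flatten a nested JSON object into a single-level dict.
    Each key is the path to a leaf joined with "_"; values are
    converted to unicode (Python 2).
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + unicode(i) + '_')
        else:
            ## leaf value: drop the trailing "_" from the key
            out[str(name[:-1])] = unicode(x)

    flatten(y)
    return out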

Notice the recursion in the flatten calls. Another point is the use of the Python 2 "unicode" built-in to force the type of the entries in the JSON (UTF-8 characters appear often in the names of the authors).
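
For readers on Python 3, where the unicode built-in no longer exists and str is already Unicode, the same idea can be sketched as follows. The name flatten_json_py3 and the sample dictionary are made up for illustration; the sample is not real API output.

```python
def flatten_json_py3(y):
    """Flatten nested dicts/lists into one dict with "_"-joined path keys."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + str(key) + "_")
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + "_")
        else:
            # leaf value: drop the trailing "_" and store it as text
            out[name[:-1]] = str(x)

    flatten(y)
    return out

# Hypothetical, hand-written sample (NOT real API output)
sample = {"data": [{"doi": "10.1371/journal.pone.0001543",
                    "authors": ["Müller", "Rossi"]}]}
flat = flatten_json_py3(sample)
for key in sorted(flat):
    print(key, "=", flat[key])
```

List positions become part of the key (data_0_authors_1 and so on), so every leaf of the nested structure gets a unique flat name.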

To extract the entries we are interested in we can use regular expressions:

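def getSubjects(jsout):
    ## jsout is the raw JSON text returned by getPage; parse it here
    ## instead of relying on a global variable
    subjects = []
    flat = flatten_json(json.loads(jsout))
    for k,v in flat.items():
        ## keep only the flattened keys that end in "subject_area"
        if re.match(".*subject_area$",k):
            subjects.append(v)
    return subjects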

 

Here, for instance, we are interested in the subject_area entries of the response. The keys of the flattened structure have the form "X_Y_Z..." (parent keys and list indices joined by underscores).
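
The key selection can be tried on its own with a small, hand-written flattened dictionary. The keys and values below are hypothetical and only mimic the underscore-joined shape; they are not copied from a real response. The helper select_by_suffix is also an assumption of this sketch, and it is slightly stricter than the pattern above because it requires the suffix to start at a key boundary.

```python
import re

def select_by_suffix(flat, suffix):
    # Match keys whose last path component(s) equal `suffix`:
    # either the whole key, or preceded by the "_" separator.
    pattern = re.compile(r"(^|_)" + re.escape(suffix) + r"$")
    return [flat[k] for k in sorted(flat) if pattern.search(k)]

# Hypothetical flattened entries (NOT real API output)
flat = {
    "data_0_title": "Some title",
    "data_0_sources_0_events_0_subject_area": "/Biology and life sciences",
    "data_0_sources_0_events_1_subject_area": "/Medicine and health sciences",
    "data_0_other_subject_area_id": "7",
}
subjects = select_by_suffix(flat, "subject_area")
print(subjects)
```

The last key is skipped because subject_area is not at its end, which is the effect of the "$" anchor in the regular expression.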

Finally, the main part of the Python script is:

## main

API_KEY = "YOUR_API_KEY" ## obtain one by registering on the PLOS website
idss = ['0001543', '0060899','0090705'] ## a list of article ids

op = open("PaperCategories.csv","wb")
for ids in idss:
    jsout = getPage(ids,API_KEY) ### raw JSON text of the response
    js = json.loads(jsout)
    subjects = getSubjects(jsout)
    ## one row per article: "id";"subject1,subject2,..."
    stout = '"%s";"%s"\n' % (ids, ",".join(subjects))
    op.write(stout)
    print (ids, subjects)
    ## random pause of 1-2 seconds between two requests
    rndtime = np.random.uniform(1.,2.)
    time.sleep(rndtime)
op.close() ## flush and close the output file

Notice that in the for loop we sleep for a random time between requests to avoid overloading the PLOS ONE servers.
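
If numpy is not needed elsewhere, the random pause can be drawn with the standard library alone. The sketch below is one possible variant in Python 3, not the code used in the script; the function name polite_pause and its sleep argument (which lets the function be tried without actually waiting) are assumptions of this example.

```python
import random
import time

def polite_pause(low=1.0, high=2.0, sleep=time.sleep):
    # Draw a delay uniformly between `low` and `high` seconds and wait.
    delay = random.uniform(low, high)
    sleep(delay)
    return delay

# Dry run: record the delay instead of sleeping
recorded = []
delay = polite_pause(1.0, 2.0, sleep=recorded.append)
print("would sleep for %.2f seconds" % delay)
```

Inside the loop a plain polite_pause() call would replace the two lines that use np.random.uniform and time.sleep.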