To use the article-level metrics (ALM) APIs, you need to register an account on the PLOS ONE website (http://journals.plos.org/plosone/user/secure/login?page=%2Fplosone%2F). A confirmation email will be sent to the address you used for the registration. Then log in to http://alm.plos.org/ with the same credentials you registered on the main plos.org website to get your alphanumeric API_KEY.
Each ALM request is built around the id of a paper. For example, the id 0001543 points to the article http://journals.plos.org/plosone/article?id=info%3Adoi%2F10.1371%2Fjournal.pone.0001543, whose DOI is 10.1371/journal.pone.0001543.
A request to the API must contain this article id plus the API key. The parameters to pass to the requests module are:
api_key=YOUR_API_KEY
info=detail
ids=10.1371%2Fjournal.pone.0001543
(%2F is the URL-encoded form of the "/" in the DOI.)
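To see what the final request URL looks like, here is a small sketch using only the standard library. It assumes that requests encodes a params dict in essentially the same way as urllib.parse.urlencode, which turns "/" into %2F on its own. The helper name build_alm_url is hypothetical and not part of any API.

```python
from urllib.parse import urlencode

BASE_URL = "http://alm.plos.org/api/v5/articles"

def build_alm_url(article_id, api_key):
    """Hypothetical helper: full ALM request URL for one PLOS ONE article id."""
    params = {
        "api_key": api_key,
        "info": "detail",
        # pass the DOI with a literal '/'; urlencode turns it into %2F
        "ids": "10.1371/journal.pone.%s" % article_id,
    }
    return BASE_URL + "?" + urlencode(params)

print(build_alm_url("0001543", "YOUR_API_KEY"))
# http://alm.plos.org/api/v5/articles?api_key=YOUR_API_KEY&info=detail&ids=10.1371%2Fjournal.pone.0001543

# a DOI that is already encoded gets encoded a second time ('%' becomes '%25')
print(urlencode({"ids": "10.1371%2Fjournal.pone.0001543"}))
# ids=10.1371%252Fjournal.pone.0001543
```

The second call shows why it is safer to hand the DOI to the encoder in its plain form rather than pre-encoding the slash yourself.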
Here is the code for the function, together with the imports used by the whole script:
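#!/usr/bin/env python3
import json
import re
import time

import numpy as np
import requests

def getPage(ids, API_KEY):
    """Fetch the ALM record of one PLOS ONE article and return the raw JSON text."""
    baseurl = "http://alm.plos.org/api/v5/articles"
    # the DOI is 10.1371/journal.pone.<ids>; requests URL-encodes "/" as %2F itself,
    # so the DOI is passed unencoded to avoid encoding it twice
    reqparam = {'ids': '10.1371/journal.pone.%s' % ids, 'info': 'detail', 'api_key': API_KEY}
    r = requests.get(baseurl, params=reqparam)
    r.raise_for_status()  # stop on HTTP errors, e.g. a rejected API key
    return r.text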
The function returns a JSON body that can easily be parsed with the json module. However, the response has a huge number of keys and a deeply nested structure, so we flatten it into a single-level dictionary with the recursive function:
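try:
    unicode            # Python 2: the built-in exists
except NameError:
    unicode = str      # Python 3: str is already Unicode

def flatten_json(y):
    """Flatten nested dicts/lists into one dict whose keys are joined with '_'.
    Hacked to accept unicode values."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + unicode(i) + '_')
        else:
            out[str(name[:-1])] = unicode(x)  # drop the trailing '_'

    flatten(y)
    return out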
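Notice the recursion: the inner flatten function calls itself on every dict value and list element, extending the key name at each level. Another detail is the use of the unicode built-in to convert every leaf value to text (UTF-8 characters often appear in the authors' names). In Python 3, str is already Unicode, so unicode can simply be aliased to str.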
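To make the shape of the flattened keys concrete, here is a Python 3-only sketch of the same recursive flattening, applied to a small hypothetical record. The record is made up for illustration and is not a real ALM response.

```python
def flatten_json(y):
    """Flatten nested dicts/lists into one level; keys are joined with '_'."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + "_")
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + "_")   # list index becomes part of the key
        else:
            out[name[:-1]] = str(x)                  # drop trailing '_'; str is Unicode

    flatten(y)
    return out

# hypothetical record, only to show the key names that come out
record = {"data": [{"title": "A title",
                    "authors": ["Jos\u00e9 P\u00e9rez", "Ann Lee"],
                    "cited": 3}]}
print(flatten_json(record))
# {'data_0_title': 'A title', 'data_0_authors_0': 'José Pérez', 'data_0_authors_1': 'Ann Lee', 'data_0_cited': '3'}
```

Every leaf becomes a string, and list positions show up as numeric segments in the key.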
To extract the final entries we are interested in we can use the regular expressions:
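def getSubjects(jsout):
    """Return the values whose flattened key ends in 'subject_area'.
    Accepts either the raw JSON text from getPage or an already parsed object."""
    js = jsout if isinstance(jsout, (dict, list)) else json.loads(jsout)
    subjects = []
    flat = flatten_json(js)
    for k, v in flat.items():
        if re.match(".*subject_area$", k):
            subjects.append(v)
    return subjects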
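Here, for instance, we are interested in the subject_area entries of the response. In the flattened structure the nested keys are joined with underscores (for example "X_Y_0_Z", where 0 is a list index), so the pattern .*subject_area$ selects every key that ends in subject_area.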
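Because of the trailing $, the pattern only catches keys that end exactly in subject_area. The sketch below uses hypothetical flattened keys (the real ALM field names may differ) to show that a list-valued entry would get an index suffix such as _0 and be skipped, and how an optional (_\d+)? group picks it up as well. The helper matching_values is hypothetical.

```python
import re

# hypothetical flattened keys; the real ALM field names may differ
flat = {
    "data_0_subject_area": "Biology",       # scalar entry
    "data_0_subject_area_0": "Ecology",     # list element: index appended to the key
    "data_0_subject_area_count": "2",       # similar prefix, should not match
    "data_0_title": "A title",
}

def matching_values(flat, pattern):
    """Hypothetical helper: sorted values whose key matches the regex from its start."""
    return sorted(v for k, v in flat.items() if re.match(pattern, k))

print(matching_values(flat, r".*subject_area$"))          # ['Biology']
print(matching_values(flat, r".*subject_area(_\d+)?$"))   # ['Biology', 'Ecology']
```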
Finally, the main part of the Python script is:
## main
idss = ['0001543', '0060899', '0090705']  ## a list of article ids
API_KEY = "YOUR_API_KEY"  ## get one by registering on the PLOS website

with open("PaperCategories.csv", "w", encoding="utf-8") as op:
    for ids in idss:
        jsout = getPage(ids, API_KEY)  ### raw JSON text from the API
        js = json.loads(jsout)  # parsed response (dicts and lists)
        subjects = getSubjects(js)
        # one row per article: "id";"subject1,subject2,..."
        stout = '"%s";"%s"\n' % (ids, ",".join(subjects))
        op.write(stout)
        print(ids, subjects)
        # random 1-2 s pause between requests
        rndtime = np.random.uniform(1., 2.)
        time.sleep(rndtime)
Notice that the for loop sleeps for a random 1 to 2 seconds after each request, to avoid overloading the PLOS ONE servers.
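As an alternative sketch of the same loop, the standard csv module can produce the "id";"subjects" rows (and would also escape any quote characters inside a subject), while random.uniform replaces numpy for the pause. The function write_subjects and its parameters are hypothetical; get_subjects stands for any callable that maps an article id to a list of subjects, for example getPage followed by getSubjects. Passing sleep in as a parameter lets the loop be tried offline, as in the demo with made-up data.

```python
import csv
import io
import random
import time

def write_subjects(article_ids, get_subjects, out, sleep=time.sleep,
                   min_wait=1.0, max_wait=2.0):
    """Hypothetical helper: write one fully quoted, ';'-separated row per article."""
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    for i, article_id in enumerate(article_ids):
        if i > 0:
            # random pause between consecutive requests, as in the loop above
            sleep(random.uniform(min_wait, max_wait))
        subjects = get_subjects(article_id)
        writer.writerow([article_id, ",".join(subjects)])

# offline demo: a made-up lookup table instead of the live API
fake = {"0001543": ["Biology", "Medicine"], "0060899": ["Ecology"]}
buf = io.StringIO()
waits = []
write_subjects(["0001543", "0060899"], fake.get, buf, sleep=waits.append)
print(buf.getvalue())
# "0001543";"Biology,Medicine"
# "0060899";"Ecology"
```

Unlike the original loop, this version only pauses between requests, not after the last one.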