An application to catalog and measure NIH IRP publications.
Assuming the user is interested in updating these measures in subsequent years, follow these steps:

- Check the authors in `data/raw/au_ids.txt` to ensure all PIs are represented. [Note 1: we have discovered that some of the EIDs in this file returned data from investigators not in the IRP.]
- Obtain a listing of the publications previously used in this analysis (henceforth: old publications; see the complete files at `/data/complete`).
- Use the `scopus_search.py` script to obtain all the papers listed in Scopus since a given date (henceforth: new publications). This script takes as input a list of investigators and a cutoff date. An example output from this script is located at `/data/interim/output_file.csv`. [Note 2: we have found that this script still returns some papers that precede the cutoff date.]
- Clean up the new publications with the `01.0-TAR-irp_pubcount.ipynb` notebook (described further below), and distribute them to the investigators so they can mark which of their papers used the scanners. A version of this spreadsheet from 2019 is available for reference.
- Use the `update_citation_counts.py` script to update the number of citations for the old publications. This script takes as input the file produced in the previous year, pulls all the unique EIDs (paper identifiers), and updates the citation counts for those articles. We have found that EIDs change somewhat year to year (on the order of one or two percent), so this script will print (and write to disk) the EIDs that are missed. It is up to the user to go through and manually correct these by, for instance, creating a new CSV file with a column labeled `EID` and using it as input for a second run of `update_citation_counts.py`. The 2020 update from the 2019 papers is found at `data/interim/2019_complete_update_2020.csv`.
- Download the completed tabulation of which new publications used the scanners from the Google doc. The CSV from 2020 is found at `/data/interim/New Publications from FMRIF Investigators Aug 2020 - 2020_new_papers`.
- The notebook `01.0-TAR-irp_pubcount.ipynb` aggregates the data from the previous steps and generates a paragraph describing the "productivity" of the research group as a whole. It also writes out a complete listing of papers for use next year. This notebook relies on a PI <-> IC linking file (`/data/raw/investigator_ics.csv`).
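Because the search step can return papers that precede the cutoff date (Note 2 above), one option is to screen the output CSV by date before distributing it. This is a minimal sketch, assuming the output contains an ISO-formatted date column; the column name `coverDate` here is an assumption, so check it against the actual header of your output file.

```python
import csv
import io
from datetime import date

def drop_early_rows(reader, cutoff, date_col="coverDate"):
    """Yield only rows whose date is on or after the cutoff.

    ``date_col`` is an assumed column name; verify it against your CSV header.
    Dates are assumed to be ISO-formatted (YYYY-MM-DD).
    """
    for row in reader:
        if date.fromisoformat(row[date_col]) >= cutoff:
            yield row

# Demo on an in-memory CSV; for real use, pass csv.DictReader(open("output.csv")).
sample = io.StringIO("EID,coverDate\n2-s2.0-1,2015-01-01\n2-s2.0-2,2017-06-30\n")
kept = list(drop_early_rows(csv.DictReader(sample), date(2016, 8, 7)))
```

The same generator can be chained into a `csv.DictWriter` to produce a cleaned file.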
There have been uneven historical attempts to remove editorials, reviews, errata, and the like; the steps above do not prescribe how to do so.
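If such cleanup is wanted, one possible approach is to filter on the Scopus document-type codes. This is only a sketch: it assumes the export carries a `subtype` column with the usual Scopus codes (`ar` article, `re` review, `ed` editorial, `er` erratum), and which codes to drop remains a policy choice.

```python
import csv
import io

# Assumed Scopus "subtype" codes to drop: re=review, ed=editorial, er=erratum.
DROP_SUBTYPES = {"re", "ed", "er"}

def keep_research_articles(rows, subtype_col="subtype"):
    """Yield only rows whose document subtype is not in DROP_SUBTYPES."""
    return (row for row in rows if row.get(subtype_col) not in DROP_SUBTYPES)

# Demo on an in-memory CSV; the column name is an assumption to verify.
sample = io.StringIO("EID,subtype\n2-s2.0-1,ar\n2-s2.0-2,ed\n2-s2.0-3,re\n")
articles = list(keep_research_articles(csv.DictReader(sample)))
```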
Search Scopus for publications given a list of AU-IDs.

```
usage: scopus_search.py [-h] --ids idfile --out output --start startdate

Query Scopus Search API and output to csv file.

optional arguments:
  -h, --help         show this help message and exit
  --ids idfile       Scopus search file
  --out output       Output CSV filename
  --start startdate  Start date for query (YYYYMMDD).
```
- Export your `SCOPUS_API_KEY` environment variable:

  ```
  export SCOPUS_API_KEY=1234567890abcdef1234567890abcdef
  ```

- Prepare a list of AU-IDs to query. These should go in a text file, one AU-ID per line. Comments are optional. Example `au-ids.txt`:

  ```
  # Alice
  0123456789
  # Bob
  1234567890
  ```

- Run `scopus_search.py`, passing the AU-ID file to `--ids`, the output CSV filename to `--out`, and the start date to `--start`:

  ```
  python scopus_search.py --ids au-ids.txt --out output.csv --start 20160807
  ```
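The AU-ID file format above (one ID per line, `#` comment lines, blanks ignored) is simple enough to parse with a few lines of Python. This is a hedged sketch of such a parser, not necessarily how `scopus_search.py` itself reads the file:

```python
def read_au_ids(lines):
    """Parse AU-ID lines: one ID per line; '#' lines are comments; blanks ignored."""
    return [s for s in (line.strip() for line in lines)
            if s and not s.startswith("#")]

# Works on any iterable of lines, e.g. an open file handle: read_au_ids(open("au-ids.txt")).
ids = read_au_ids(["# Alice", "0123456789", "", "# Bob", "1234567890"])
```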
Given a CSV file with an EID for each publication, write to a CSV file with updated citation counts for each publication.
```
usage: update_citation_counts.py [-h] --input csvfile --output outfile

Take CSV file with EID column and output CSV file with updated citation
counts.

optional arguments:
  -h, --help        show this help message and exit
  --input csvfile   Citation count csv filename
  --output outfile  Output CSV filename
```
- Export your `SCOPUS_API_KEY` environment variable:

  ```
  export SCOPUS_API_KEY=1234567890abcdef1234567890abcdef
  ```

- Run `update_citation_counts.py`:

  ```
  python update_citation_counts.py --input old_publist.csv --output updated_publist.csv
  ```
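When the script reports missed EIDs, the manual-correction step described earlier amounts to writing a one-column CSV with an `EID` header and feeding it back as `--input` on a second run. A minimal sketch (the EID values shown are hypothetical placeholders, standing in for corrected IDs you would look up in Scopus):

```python
import csv

def write_missed_eids(missed_eids, out_path):
    """Write a one-column CSV (header 'EID') suitable as --input for a
    second run of update_citation_counts.py."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["EID"])
        for eid in missed_eids:
            writer.writerow([eid])

# Hypothetical corrected EIDs, e.g. transcribed from the script's
# missed-EID output after checking them against Scopus.
write_missed_eids(["2-s2.0-85000000001", "2-s2.0-85000000002"], "missed_eids.csv")
```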