pubmed-eponyms

This is a collection of Python scripts for searching PubMed with Biopython and for working with eponymous terms.

This project archives the code used in the manuscript cited below. Beyond its interest to those studying medical eponyms, it should be of general use to anyone developing software that searches PubMed / MEDLINE automatically. The script pubmed_search_to_csv.py is a good example of how to use Biopython's Entrez.esearch and Entrez.efetch to search PubMed and retrieve results even when they exceed the built-in limits of NCBI's Entrez E-utilities. The E-utilities API can be used directly, but Biopython greatly simplifies the tedious aspects of making HTTP requests (such as throttling, retrying, and error handling) and is highly recommended for this task.
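As a rough illustration of the esearch/efetch pattern described above, the sketch below pages through a result set in batches using NCBI's history server. It is a minimal example under stated assumptions, not a copy of pubmed_search_to_csv.py: the function names `batch_starts` and `fetch_all_records`, the batch size, and the choice of MEDLINE text output are all hypothetical, and NCBI may still cap how many records a single query can return.

```python
def batch_starts(total, batch_size):
    """Return the retstart offsets needed to page through `total` records."""
    return list(range(0, total, batch_size))


def fetch_all_records(term, email, api_key=None, batch_size=500):
    """Fetch MEDLINE records for `term` in batches (hypothetical helper)."""
    # Imported here so batch_starts() can be used without Biopython installed.
    from Bio import Entrez, Medline

    Entrez.email = email          # NCBI asks for a contact address
    if api_key:
        Entrez.api_key = api_key  # optional; allows a higher request rate

    # usehistory="y" keeps the result set on NCBI's history server so that
    # efetch can page through it without resending the ID list.
    handle = Entrez.esearch(db="pubmed", term=term, usehistory="y", retmax=0)
    search = Entrez.read(handle)
    handle.close()

    total = int(search["Count"])
    records = []
    for start in batch_starts(total, batch_size):
        handle = Entrez.efetch(
            db="pubmed",
            rettype="medline",
            retmode="text",
            retstart=start,
            retmax=batch_size,
            webenv=search["WebEnv"],
            query_key=search["QueryKey"],
        )
        # Medline.parse yields one dict per record; consume before closing.
        records.extend(Medline.parse(handle))
        handle.close()
    return records
```

Calling `fetch_all_records("some search term", "you@example.org")` would need network access and Biopython; the paging arithmetic in `batch_starts` can be checked on its own.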

Citing

In addition to citing this GitHub repository (https://github.com/cornish/pubmed-eponyms), please cite the following paper:

Toby C. Cornish, Larry J. Kricka, and Jason Y. Park. A Biopython-based method for comprehensively searching for eponyms in Pubmed. MethodsX. 2021; vol 8. doi: 10.1016/j.mex.2021.101264.

License

GNU General Public License v3; see the LICENSE file in the project for the full text.

Dependencies:

Python 3.6 and up
BioPython

Script files

rebase_terms.py
permute_terms.py
pubmed_search_to_csv.py
remove_pmid_dupes.py
pubmed_journals_by_year.py

A diagrammatic representation of data flow indicating the scripts used in the process. Please see individual scripts for usage and details.

An example of the term permutations created by permute_terms.py for terms with zero, one, and two separate names.
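The general idea of permuting a term can be sketched in a few lines. This is only an illustration: the specific rules here (three joining styles, a possessive form, and swapping the order of multiple names) are assumptions, the function name `permute_term` is hypothetical, and the example inputs are illustrative rather than taken from the data files. See permute_terms.py for the rules actually used.

```python
from itertools import permutations


def permute_term(names, term):
    """Return candidate search strings for an eponym (illustrative rules).

    names: list of surnames, possibly empty
    term:  the descriptive part, e.g. "tear"
    """
    if not names:
        # Terms with no name have nothing to permute.
        return [term]

    joiners = ["-", " ", " and "]  # assumed ways of joining multiple names
    results = []
    # permutations() yields every ordering; a single name yields one ordering.
    for order in permutations(names):
        for joiner in (joiners if len(order) > 1 else [""]):
            joined = joiner.join(order)
            results.append(f"{joined} {term}")    # plain form
            results.append(f"{joined}'s {term}")  # possessive form
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(results))
```

Under these assumed rules, a term with no names is returned unchanged, a single-name term yields a plain and a possessive form, and a two-name term yields both forms for each joining style and each name order.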

Configuration file:

config.ini
- This is an INI-style configuration file in which the scripts look for Entrez-related credentials, including your email address and API key
- An API key for the E-utilities was not required when this was written, but may be in the future; currently, a key permits more requests per second to Entrez
- See NCBI's E-utilities documentation for more information about API keys
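Reading such a file with Python's standard library might look like the sketch below. The section name `entrez` and the keys `email` and `api_key` are assumptions made for illustration; check config.ini in the repository for the names the scripts actually expect.

```python
import configparser

# Hypothetical layout; the real config.ini may use different names.
SAMPLE_CONFIG = """
[entrez]
email = you@example.org
api_key =
"""


def load_entrez_credentials(text):
    """Parse INI text and return (email, api_key); api_key may be None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["entrez"]
    email = section.get("email", "").strip()
    # Treat a blank key as "no key", since the key is optional.
    api_key = section.get("api_key", "").strip() or None
    if not email:
        raise ValueError("an email address is needed for Entrez requests")
    return email, api_key


email, api_key = load_entrez_credentials(SAMPLE_CONFIG)
```

To read from disk instead, `parser.read("config.ini")` can replace `parser.read_string(text)`.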

Data files:

gastrointestinal eponyms.txt
- This is the original list of terms collected from review articles:
  - Kanne JP, Rohrmann CA, Lichtenstein JE. Eponyms in radiology of the digestive tract: historical perspectives and imaging appearances. Part I. Pharynx, esophagus, stomach, and intestine. Radiographics. 26(1) (2006) 129-42.
  - Kanne JP, Rohrmann CA, Lichtenstein JE. Eponyms in radiology of the digestive tract: historical perspectives and imaging appearances. Part 2. Liver, biliary system, pancreas, peritoneum, and systemic disease. Radiographics. 26(2) (2006) 465-80.
gi_eponyms_split.csv
- This is the original list with terms split into Name(s) and Term fields; multiple-name eponyms should be separated by hyphens to distinguish them from last names with internal spaces (e.g. "Van Slyke")
- Input to rebase_terms.py
data/terms_re-base.csv
- Output of the rebase_terms.py script
- Input to permute_terms.py
- Standardized version of base names including removal of possessives and use of hyphens for multiple names
- Version of the data from the paper
data/terms_permuted.csv
- Output of the permute_terms.py script
- Input to pubmed_search_to_csv.py
- Permutations of terms to include possessives, various forms of joining multiple names, and inversions
- See examples above
- Version of the data from the paper
data/term_results.csv
- Output of the pubmed_search_to_csv.py script
- Summarizes PubMed search results for all terms (including terms with no results)
- One row per term permutation
- Version of the data from the paper
data/pmid_results.csv
- Output of the pubmed_search_to_csv.py script
- Input to remove_pmid_dupes.py
- Input to pubmed_journals_by_year.py
- PubMed search results for all terms with hits
- One row per PMID
- Version of the data from the paper
data/pmid_results - dupes removed.csv
- Output of the remove_pmid_dupes.py script
- Duplicate PMIDs removed within base terms
- Version of the data from the paper
data/journal_counts.csv
- Output of the pubmed_journals_by_year.py script
- A matrix of total publications per year for all journals for which we have hits
- Ranges from the earliest year with hits to the latest year with hits
- Version of the data from the paper
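The "duplicates removed within base terms" step can be pictured as follows. Several permutations of one base term may each return the same PMID, and only the first such row is kept. This is a sketch under assumptions, not the code in remove_pmid_dupes.py: the column names `base_term` and `pmid` and the function name `dedupe_within_base_terms` are hypothetical.

```python
def dedupe_within_base_terms(rows, base_key="base_term", pmid_key="pmid"):
    """Keep the first row for each (base term, PMID) pair.

    rows: iterable of dicts, e.g. from csv.DictReader.
    The same PMID is still kept once under each different base term.
    """
    seen = set()
    kept = []
    for row in rows:
        key = (row[base_key], row[pmid_key])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows; the real CSV columns may differ.
rows = [
    {"base_term": "A-B syndrome", "search_term": "A-B syndrome", "pmid": "111"},
    {"base_term": "A-B syndrome", "search_term": "A B syndrome", "pmid": "111"},
    {"base_term": "C sign", "search_term": "C's sign", "pmid": "111"},
]
unique_rows = dedupe_within_base_terms(rows)
```

Here the second row is dropped because it repeats PMID 111 for the same base term, while the third row survives because it belongs to a different base term.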
