
Get rid of laborious manual file management when collecting scientific literature: enter a DOI, receive the paper details and PDF!


dietschleo/Literature-Review-Assistant


Populate Excel with DOI Information

Here to assist with your next literature review! This script takes a DOI, retrieves the article's information from Google Scholar, and records it in an Excel file. It also saves the article's PDF when one is available: no more laborious manual file management!

Features

  • Validate the DOI and fetch the article's metadata from Google Scholar.
  • Save metadata into an Excel file (papers.xlsx).
  • Avoid duplicate entries in the Excel file.
  • Download PDFs of articles into a pdfs folder, with filenames formatted as year_authors_title.pdf.
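The README does not show how the script validates a DOI. As a minimal sketch, assuming a syntax check is enough before querying Google Scholar, the pattern below is the one Crossref suggests for modern DOIs ("10.", a 4 to 9 digit registrant code, a slash, then a suffix). It does not match every legacy DOI, so a non-match is better treated as "suspicious" than "invalid". The names DOI_PATTERN and looks_like_doi are hypothetical.

```python
import re

# Crossref's suggested pattern for modern DOIs (case-insensitive).
# Some older DOIs use characters outside this set and will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def looks_like_doi(text: str) -> bool:
    """Return True if the trimmed input has the shape of a modern DOI."""
    return bool(DOI_PATTERN.match(text.strip()))


print(looks_like_doi("10.1257/jel.41.3.788"))           # True
print(looks_like_doi("10.1016/j.jfineco.2020.04.003"))  # True
print(looks_like_doi("not a doi"))                      # False
```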

Prerequisites

The script requires the following Python packages:

  • scholarly
  • requests
  • pandas
  • openpyxl

If any of these are missing, you can run the dependency-check script under Installation and Dependencies to install them.

How to Use

  1. Clone the repository or copy the script into your Python environment.
  2. Run the script in your terminal or IDE.
  3. When prompted, enter a DOI, or type escape (or press Enter on an empty line) to exit.

Example Input/Output

  • Enter DOI: 10.1257/jel.41.3.788
  • The script will retrieve the article information, save it in papers.xlsx, and download the PDF (if available).
  • Type escape or press Enter on an empty input to stop the program.
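The prompt behaviour described above can be sketched as a small loop. This is an illustration, not the script's actual code: handle_doi is a hypothetical callback standing in for the fetch, save and download step, and read defaults to Python's built-in input but can be swapped out, which makes the loop easy to test without a keyboard.

```python
from typing import Callable


def run_prompt_loop(handle_doi: Callable[[str], None],
                    read: Callable[[str], str] = input) -> int:
    """Prompt for DOIs until the user types 'escape' or submits an empty line.

    handle_doi is a hypothetical stand-in for the script's processing step.
    Returns the number of DOIs that were passed to handle_doi.
    """
    processed = 0
    while True:
        doi = read("Enter DOI (or 'escape' to exit): ").strip()
        # An empty line or the word 'escape' (any case) ends the session.
        if doi == "" or doi.lower() == "escape":
            break
        handle_doi(doi)
        processed += 1
    return processed
```

For example, run_prompt_loop(print) echoes each DOI you enter until you type escape.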

Installation and Dependencies

To ensure you have all required packages installed, you can use the following script:

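import subprocess
import sys

def check_and_install(package):
    """Import the package if possible; otherwise install it with pip."""
    try:
        __import__(package)
        print(f"{package} is already installed.")
    except ImportError:
        print(f"{package} is not installed. Installing...")
        # Use this interpreter's pip so the package lands in the same environment
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# For these four packages, the pip name and the import name are the same
required_packages = ["scholarly", "requests", "pandas", "openpyxl"]

for package in required_packages:
    check_and_install(package)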

Run the above script before using the main program to ensure all dependencies are available.
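A possible variation, sketched here as an alternative rather than the project's own script: importlib.util.find_spec can tell whether a module is available without actually importing it (importing pandas just to check is slow), and keeping a separate mapping from pip name to import name covers packages where the two differ. The names REQUIRED, missing_packages and install are hypothetical.

```python
import importlib.util
import subprocess
import sys
from typing import Dict, List

# pip distribution name -> module name used in `import`.
# For this project's four packages the two names happen to be identical.
REQUIRED: Dict[str, str] = {
    "scholarly": "scholarly",
    "requests": "requests",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose module cannot be located (nothing is imported)."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def install(packages: List[str]) -> None:
    """Install all missing packages in one pip call for the current interpreter."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```

Calling install(missing_packages(REQUIRED)) then installs only what is absent, in a single pip invocation.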

Code Example

The script can be run as is: press Enter on an empty line or type escape to leave the prompt and exit the program. It can also easily be adapted to process a list of DOIs, for example one read from a CSV file, as in the snippet below.

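# populate_excel_from_dois() is defined in the repository's main script:
# run this snippet inside that script (or import the function from it).
from scholarly import scholarly
import re
import os
import requests
import pandas as pd

# Example DOI list; it could also be read from a CSV file
dois = [
    "10.1257/jel.41.3.788",
    "10.1016/j.jfineco.2020.04.003",
]

populate_excel_from_dois(dois)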
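To read the DOI list from a CSV file instead of hard-coding it, a minimal sketch with Python's standard csv module could look like this. The file name dois.csv, the column name doi and the helper read_dois_from_csv are all assumptions for illustration; the repository does not define them.

```python
import csv
from typing import List


def read_dois_from_csv(path: str, column: str = "doi") -> List[str]:
    """Return the non-empty values of one CSV column, stripped of whitespace.

    The default column name "doi" is an assumption; change it to match your file.
    """
    dois: List[str] = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or blank cells are skipped rather than sent to the lookup.
            value = (row.get(column) or "").strip()
            if value:
                dois.append(value)
    return dois
```

The result can then be passed on as before, e.g. populate_excel_from_dois(read_dois_from_csv("dois.csv")).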

Directory Structure

The script will create and manage the following files and folders:

  • papers.xlsx: The Excel file where metadata is stored.
  • pdfs/: A folder where downloaded PDFs are stored.
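The README does not show how duplicates in papers.xlsx are detected. One reasonable approach, sketched here under that assumption, is to compare normalized DOIs: DOI names are case-insensitive, and the same DOI is often written with a resolver prefix such as https://doi.org/. The helpers normalize_doi and new_dois are hypothetical.

```python
from typing import Iterable, List

DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and drop a leading resolver URL or 'doi:' prefix."""
    value = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def new_dois(candidates: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Keep candidates not already in existing (or repeated earlier in candidates)."""
    seen = {normalize_doi(d) for d in existing}
    fresh: List[str] = []
    for doi in candidates:
        key = normalize_doi(doi)
        if key not in seen:
            seen.add(key)
            fresh.append(doi)
    return fresh
```

The existing DOIs could come from pd.read_excel("papers.xlsx"), assuming the sheet has a column holding the DOI (its exact name is an assumption here).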

Notes

  • The script ensures no duplicate DOIs are added to the Excel file.
  • PDF filenames are sanitized to prevent issues with invalid characters.
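A sketch of how a year_authors_title.pdf name might be built and sanitized is shown below. The exact rules the script uses are not documented, so several choices here are assumptions: taking the last word of each author name as the surname, keeping at most two surnames plus "etal", the "nodate"/"unknown"/"untitled" fallbacks, and the 80-character title cap. The character set removed is the one Windows forbids in file names, which also covers "/" on macOS and Linux.

```python
import re
from typing import List, Optional

MAX_TITLE_CHARS = 80  # arbitrary cap to keep paths short (assumption)


def sanitize(text: str) -> str:
    """Drop characters invalid in file names and turn whitespace runs into '_'."""
    text = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", text)
    text = re.sub(r"\s+", "_", text.strip())
    return text.strip("._")


def pdf_filename(year: Optional[int], authors: List[str], title: str) -> str:
    """Build a hypothetical year_authors_title.pdf name from article metadata."""
    # Assumption: the last word of each author string is the surname.
    surnames = [a.split()[-1] for a in authors if a.strip()]
    author_part = "-".join(surnames[:2]) + ("-etal" if len(surnames) > 2 else "")
    title_part = sanitize(title)[:MAX_TITLE_CHARS].rstrip("._")
    parts = [
        str(year) if year else "nodate",
        sanitize(author_part) or "unknown",
        title_part or "untitled",
    ]
    return "_".join(parts) + ".pdf"


print(pdf_filename(2020, ["Jane Doe", "John Q. Public", "Ann Other"],
                   'Risk: "A" Survey / Review?'))
# 2020_Doe-Public-etal_Risk_A_Survey_Review.pdf
```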

License

This project is licensed under the MIT License.
