WARNING

This scraper gets detected by the server, and you will get blocked.

DO NOT run it on public networks, to avoid getting the network's IP blocked.

Running it on Google Colab is recommended. Feel free to make a copy of the notebook available at this link.

Limitations

The server detects scraping activity. Scraping continuously, or scraping data for an author with more than a couple of hundred publications, throws errors. As a temporary workaround, partial results are saved to a json file and loaded later to scrape the remaining data. The following approaches did not work:

Using sleep between requests.
Using proxies (due to connection challenges).
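The save-and-resume workaround described above can be sketched as follows. This is not the notebook's code: the function names and the json layout (a `publications` dict keyed by a publication ID) are hypothetical. Writing to a temporary file and then calling `os.replace` is one way to avoid leaving a half-written json file behind if a run is interrupted.

```python
import json
import os
import tempfile


def save_partial(path, data):
    """Write `data` to `path` atomically (temp file + rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    # os.replace overwrites the target in a single step.
    os.replace(tmp_path, path)


def load_partial(path):
    """Return previously saved results, or an empty record on the first run."""
    if not os.path.exists(path):
        # Hypothetical layout: publication ID -> extracted details.
        return {"publications": {}}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def remaining(all_ids, saved):
    """IDs of publications whose details have not been saved yet."""
    return [pid for pid in all_ids if pid not in saved["publications"]]
```

On each run you would load the partial results, scrape only the IDs returned by `remaining`, and call `save_partial` after each successful publication. If the server starts throwing errors, the next run picks up where the last one stopped.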

The returned HTML sometimes does not contain some of the details (marked with #TODO comments in the code). As of October 2022, these details include the link to the publication, the paper description, and the link to the author's photo.
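One way to keep missing details visible is to store them as `None` and not drop the record. The sketch below assumes a hypothetical `Publication` record. It is not the notebook's data model. An author's missing photo link could be handled the same way.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Publication:
    title: str
    # Details the returned HTML sometimes omits; None means "not retrieved".
    link: Optional[str] = None
    description: Optional[str] = None


def missing_fields(pub):
    """Names of fields that could not be extracted for this publication."""
    return [name for name, value in asdict(pub).items() if value is None]
```

Serialized with `json.dumps(asdict(pub))`, the missing fields appear as `null`. A later pass can then tell "not retrieved" apart from "empty".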

Google Scholar Scraper

If on Google Colab, mount your Google Drive.
Define your DATA_PATH variable.
Set the AUTHORID variable.
Create your author object using:

author_obj = create_author(AUTHORID)
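The first three setup steps (mounting Drive, `DATA_PATH`, `AUTHORID`) might look like the sketch below, run before `create_author`. `google.colab.drive.mount` is Colab's own helper. The folder name is a hypothetical choice. It is also an assumption, not something this README states, that `AUTHORID` is the `user=` value in the author's Google Scholar profile URL.

```python
import os
from urllib.parse import parse_qs, urlparse

try:
    # google.colab only exists inside a Colab runtime.
    from google.colab import drive
    drive.mount("/content/drive")
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Hypothetical folder name; any writable directory should work.
DATA_PATH = "/content/drive/MyDrive/google_scholar_data" if IN_COLAB else "google_scholar_data"
os.makedirs(DATA_PATH, exist_ok=True)


def author_id_from_url(profile_url):
    """Return the `user` query parameter of a Google Scholar profile URL."""
    return parse_qs(urlparse(profile_url).query)["user"][0]


# Hypothetical profile URL; replace it with the author you want to scrape.
AUTHORID = author_id_from_url("https://scholar.google.com/citations?user=EXAMPLE_ID&hl=en")
```

Keeping `DATA_PATH` on Drive means the partial json files survive a Colab runtime reset.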

Scrape the data about the author using:

author_obj.scrape()

To check whether all publication details were retrieved, inspect:

author_obj.all_publications_extracted

If this returns False, run the following again:

author_obj.scrape()
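The check-and-rerun steps can be wrapped in a bounded loop, as in the sketch below. `scrape_until_complete` is a hypothetical helper. It relies only on the `scrape()` method and the `all_publications_extracted` attribute described above. It also assumes `scrape()` returns normally after saving partial results; if it raises instead, you would need to catch the error. The optional pause applies between whole runs only. Per the limitations above, sleeping between requests did not prevent detection, so the pause may not help either.

```python
import time


def scrape_until_complete(author_obj, max_attempts=5, pause_seconds=0.0):
    """Re-run author_obj.scrape() until every publication is extracted.

    Returns True on success, False if max_attempts runs were not enough.
    """
    for attempt in range(1, max_attempts + 1):
        author_obj.scrape()
        if author_obj.all_publications_extracted:
            return True
        if attempt < max_attempts:
            # Optional pause between whole runs (not between requests).
            time.sleep(pause_seconds)
    return False
```

Capping the attempts keeps a blocked session from looping forever. If it returns False, it is probably time to wait or switch networks.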

The data will be saved as json files in your DATA_PATH. To re-scrape an author from scratch, delete the json file named AUTHORID.json before creating the author object again.
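Deleting the cached file can also be done from the notebook. The sketch below uses a hypothetical helper, `reset_author_cache`. It assumes the file sits directly in `DATA_PATH` under the name `<AUTHORID>.json`, as described above.

```python
import os


def reset_author_cache(data_path, author_id):
    """Delete <data_path>/<author_id>.json so the next scrape starts fresh.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    path = os.path.join(data_path, f"{author_id}.json")
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

For example, call `reset_author_cache(DATA_PATH, AUTHORID)` and then `author_obj = create_author(AUTHORID)`.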

mahtab-nejati/google-scholar-scraper

Folders and files

Latest commit

History

Repository files navigation

WARNING

Limitations

Google Scholar Scraper

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages