This repository contains a Goodreads scraper and a Jupyter notebook that performs EDA and modeling on the scraped dataset. BeautifulSoup4 and Selenium handle the scraping, and scikit-learn is used to build and experiment with linear regression models. Because these models are highly interpretable, the variance of each feature's coefficient is analyzed and plotted to identify which user interactions on Goodreads are the strongest predictors of interest in a book. This code accompanies my blog post, *What are the best predictors for interest in a book on Goodreads?*
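The snippet below is a minimal sketch (not the notebook's actual code) of the coefficient-variance idea: fit a linear regression on repeated train/test splits and plot the spread of each standardized coefficient. The file name, feature columns, and target column are hypothetical placeholders.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("goodreads_books.csv")                           # hypothetical scraped dataset
features = ["ratings_count", "text_reviews_count", "num_pages"]   # placeholder feature names
X = StandardScaler().fit_transform(df[features])
y = df["to_read_count"]                                           # placeholder "interest" target

# Refit the model on many random splits and record the coefficients each time.
coef_rows = []
for train_idx, _ in ShuffleSplit(n_splits=50, test_size=0.2, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y.iloc[train_idx])
    coef_rows.append(dict(zip(features, model.coef_)))

# Box plot of the coefficient spread per feature: tight, large-magnitude
# coefficients suggest the strongest and most stable predictors.
coefs = pd.DataFrame(coef_rows)
sns.boxplot(data=coefs)
plt.ylabel("standardized coefficient")
plt.show()
```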
Run the .py scraper script (with your own Goodreads API key) to collect the data, then open the Jupyter notebook for example analysis and modeling.
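For orientation, here is a rough sketch of the kind of API-key request the scraper might make and how the response could be parsed with BeautifulSoup. The endpoint, book id, and field names are illustrative assumptions, not the script's actual code.

```python
import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_GOODREADS_API_KEY"   # supply your own key
book_id = 11870085                   # example Goodreads book id

# Fetch one book's XML record (assumed book.show-style endpoint).
resp = requests.get(
    f"https://www.goodreads.com/book/show/{book_id}.xml",
    params={"key": API_KEY},
)
resp.raise_for_status()

# Parse a couple of illustrative fields out of the XML.
soup = BeautifulSoup(resp.text, "xml")
title = soup.find("title").text
ratings_count = int(soup.find("ratings_count").text)
print(title, ratings_count)
```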
Install all of the libraries listed under the Built With section. Part of the scrape code also requires logging in to Goodreads in order to gather book statistics.
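The login step is handled with Selenium. The sketch below shows one way such a login can look; the page URL, element ids, and credential handling are assumptions for illustration and may not match the repo's code or Goodreads' current sign-in form.

```python
import os
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.goodreads.com/user/sign_in")

# Fill in email and password from environment variables (assumed field ids;
# Goodreads' sign-in form has changed over time, so adjust selectors as needed).
driver.find_element(By.ID, "user_email").send_keys(os.environ["GOODREADS_EMAIL"])
driver.find_element(By.ID, "user_password").send_keys(os.environ["GOODREADS_PASSWORD"])
driver.find_element(By.NAME, "commit").click()
```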
Follow the Jupyter notebook to reproduce the original processing pipeline.
- Python 3.7
- BeautifulSoup4
- Selenium
- scikit-learn
- Seaborn
Version 1.0
- Some of the scrape code still needs debugging, and additional features could be added.
- Dan Roth - Initial work
- Thanks to my teachers and classmates at Metis.