a collection of personal web crawler projects
- settings.py and middlewares.py have been extended to enable rotating proxies and random user agents (see the sketch below)
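The repo's actual middleware isn't reproduced here, but a minimal sketch of the idea, assuming a custom downloader middleware, looks like the following; the class name, the "myproject" module path, the proxy URLs, and the user agent strings are placeholders, and the settings wiring is shown as comments.

```python
# middlewares.py -- minimal sketch of a downloader middleware that picks a
# random proxy and User-Agent for each request; proxies, UA strings, and the
# "myproject" module path below are placeholders, not the repo's real values
import random


class RandomProxyUserAgentMiddleware:
    def __init__(self, proxies, user_agents):
        self.proxies = proxies
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST / USER_AGENT_LIST are assumed custom settings
        return cls(
            crawler.settings.getlist("PROXY_LIST"),
            crawler.settings.getlist("USER_AGENT_LIST"),
        )

    def process_request(self, request, spider):
        # choose a fresh proxy and User-Agent for every outgoing request
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)
        if self.user_agents:
            request.headers["User-Agent"] = random.choice(self.user_agents)


# settings.py -- wiring it up (priority 543 is an arbitrary middle value)
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RandomProxyUserAgentMiddleware": 543,
# }
# PROXY_LIST = ["http://1.2.3.4:8080"]
# USER_AGENT_LIST = ["Mozilla/5.0 (X11; Linux x86_64) ..."]
```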
scrapes the HomeFinder, Realtor, and Homes sites for real estate listings by city and state search terms
- aggregates the data into one master list, joined on full address while preserving each listing's source (see the merge sketch after the file list)
- accesses each site's structured JSON response instead of parsing HTML (see the spider sketch after the file list)
- see homefinder_spider.py for spider code
- see homefinder_data.json for its sample output
- see realtor_spider.py for spider code
- see realtor_data.json for its sample output
- see homes_spider.py for spider code
- see homes_data.json for its sample output
- see merge_data.py for data aggregator code
- see master_list.csv for its sample output
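The JSON-first approach used by the listing spiders can be sketched roughly as below, assuming a hypothetical search endpoint; the URL pattern and the field names ("results", "full_address", "price") are illustrative guesses, not the actual HomeFinder/Realtor/Homes schemas.

```python
# rough sketch of a listing spider that reads a site's JSON search response
# directly instead of scraping HTML; endpoint and field names are assumed
import json

import scrapy


class ListingSpiderSketch(scrapy.Spider):
    name = "listing_sketch"

    def __init__(self, city="Chicago", state="IL", **kwargs):
        super().__init__(**kwargs)
        # hypothetical search endpoint that returns structured JSON
        self.start_urls = [
            f"https://example-listings.com/api/search?city={city}&state={state}"
        ]

    def parse(self, response):
        data = json.loads(response.text)
        for listing in data.get("results", []):  # assumed response key
            yield {
                "address": listing.get("full_address"),
                "price": listing.get("price"),
                "source": "example-listings",
            }
```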
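And a rough sketch of the aggregation idea behind merge_data.py, assuming pandas and the column names used above; the repo's actual join logic and field names may differ.

```python
# rough sketch of merging the three spiders' JSON outputs into one master
# list keyed on address, keeping every contributing source
import pandas as pd

SOURCES = {
    "homefinder": "homefinder_data.json",
    "realtor": "realtor_data.json",
    "homes": "homes_data.json",
}

frames = []
for source, path in SOURCES.items():
    df = pd.read_json(path)
    df["source"] = source
    # assumed column name; normalize so the join key matches across sites
    df["address"] = df["address"].str.strip().str.lower()
    frames.append(df)

# stack all sources, then collapse to one row per address, preserving the
# contributing sources as a comma-separated list
master = (
    pd.concat(frames, ignore_index=True)
      .groupby("address", as_index=False)
      .agg({"price": "first", "source": ", ".join})
)
master.to_csv("master_list.csv", index=False)
```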
scrapes the Steam Top Sellers list and emails the user a curated set of deals under $10 (see the sketch after the file list)
- see prices_spider.py for spider code
- see scrape_send.py for emailer code with AWS SES
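A rough sketch of the filter-and-email step, assuming the spider's output lands in a JSON file and that SES is called through boto3; the file name, field names, region, and email addresses are all placeholders.

```python
# sketch of curating sub-$10 deals from the scraped prices and emailing
# them via AWS SES; "prices_data.json" and its fields are assumptions
import json

import boto3


def build_deals_email(path="prices_data.json", limit=10.00):
    with open(path) as f:
        games = json.load(f)
    deals = []
    for game in games:
        # assumed fields; strip a leading "$" if the price was scraped as text
        price = float(str(game.get("price", "inf")).lstrip("$"))
        if price < limit:
            deals.append(f"{game.get('title')}: ${price:.2f}")
    return "\n".join(deals) or "No qualifying deals today."


def send_email(body, sender="me@example.com", recipient="me@example.com"):
    ses = boto3.client("ses", region_name="us-east-1")
    ses.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "Steam deals under $10"},
            "Body": {"Text": {"Data": body}},
        },
    )


if __name__ == "__main__":
    send_email(build_deals_email())
```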
scrapes HackerNews article titles, source links, and upvote points
- uses pagination to access subsequent article pages (see the pagination sketch after the file list)
- see hackernews_spider.py for spider code
- see news_data.json for its sample output
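The pagination idea can be sketched as follows; the CSS/XPath selectors reflect news.ycombinator.com's current markup but should be treated as assumptions that may drift over time.

```python
# sketch of parsing one page of stories, then following the "More" link
import scrapy


class HackerNewsSketchSpider(scrapy.Spider):
    name = "hackernews_sketch"
    start_urls = ["https://news.ycombinator.com/news"]

    def parse(self, response):
        for row in response.css("tr.athing"):
            yield {
                "title": row.css("span.titleline > a::text").get(),
                "link": row.css("span.titleline > a::attr(href)").get(),
                # the points live in the row that follows each title row
                "points": row.xpath(
                    "following-sibling::tr[1]//span[@class='score']/text()"
                ).get(),
            }
        # pagination: keep following "More" until the site stops providing it
        next_page = response.css("a.morelink::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```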
scrapes Amazon search results by search term
- the user can pass category=<some-search-term> on the command line to scrape that term's results (see the sketch after the file list)
- see amazon_spider.py for spider code
- see amazon_data.json for its sample output
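A rough sketch of how the search term can drive the spider, assuming Scrapy's standard -a spider-argument mechanism; the results-page URL and selectors are assumptions, and Amazon blocks scrapers aggressively, which is where the proxy/user-agent middleware above comes in.

```python
# sketch of accepting the search term as a spider argument, e.g.
#   scrapy crawl amazon_sketch -a category="wireless mouse"
# selectors and the results-page URL pattern are assumptions
import scrapy


class AmazonSketchSpider(scrapy.Spider):
    name = "amazon_sketch"

    def __init__(self, category="laptop", **kwargs):
        super().__init__(**kwargs)
        # build the search-results URL from the command-line term
        self.start_urls = [
            "https://www.amazon.com/s?k=" + category.replace(" ", "+")
        ]

    def parse(self, response):
        for item in response.css("div[data-component-type='s-search-result']"):
            yield {
                "title": item.css("h2 a span::text").get(),
                "price": item.css("span.a-offscreen::text").get(),
                "link": response.urljoin(item.css("h2 a::attr(href)").get() or ""),
            }
```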