spider-projects

a collection of my personal web crawler projects

Automated Data Mining and Related Impacts on the Real Estate Industry

scrapes the HomeFinder, Realtor, and Homes sites for real estate listings matching City and State search terms
aggregates the data into one master list, joined on full address, while preserving the source of each listing
reads each website's structured JSON response instead of parsing the HTML
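The aggregation step described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `normalize_address` and `merge_listings`, the `address` field name, and the sample records are all assumptions made for the example.

```python
import re


def normalize_address(address):
    """Lowercase, strip punctuation, and collapse whitespace (hypothetical helper)."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    return " ".join(cleaned.split())


def merge_listings(listings_by_source):
    """Join listings from several sites on full address, recording every source.

    listings_by_source maps a site name to a list of dicts that each carry
    an "address" key (an assumed field name).
    """
    master = {}
    for source, listings in listings_by_source.items():
        for listing in listings:
            key = normalize_address(listing["address"])
            entry = master.setdefault(
                key, {"address": listing["address"], "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
            # Keep the first non-empty value seen for every other field.
            for field, value in listing.items():
                if field != "address" and value is not None:
                    entry.setdefault(field, value)
    return list(master.values())


# Made-up sample data, for illustration only.
sample = {
    "HomeFinder": [{"address": "12 Oak St., Austin, TX", "price": 350000}],
    "Realtor": [
        {"address": "12 oak st austin tx", "beds": 3},
        {"address": "9 Elm Ave, Austin, TX", "price": 410000},
    ],
}
merged = merge_listings(sample)
```

Normalizing the address before using it as the join key is one way to keep small formatting differences between sites from producing duplicate rows; a real pipeline would likely need stricter address handling.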

Steam spider

scrapes from the Steam Top Sellers list and outputs curated deals (under $10) in an email to the user
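The "under $10" curation step might look something like the sketch below. The function names `curate_deals` and `format_email_body`, the `title` and `price` field names, and the sample games are assumptions for illustration; sending the email itself is left out.

```python
def curate_deals(games, max_price=10.00):
    """Keep games priced strictly under max_price, cheapest first."""
    deals = [game for game in games if game["price"] < max_price]
    return sorted(deals, key=lambda game: game["price"])


def format_email_body(deals):
    """Render one 'title: $price' line per deal as a plain-text email body."""
    return "\n".join(f"{game['title']}: ${game['price']:.2f}" for game in deals)


# Made-up sample data, for illustration only.
games = [
    {"title": "Game A", "price": 4.99},
    {"title": "Game B", "price": 59.99},
    {"title": "Game C", "price": 10.00},
    {"title": "Game D", "price": 0.99},
]
body = format_email_body(curate_deals(games))
```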

sample email output:

HackerNews spider

scrapes HackerNews article titles, source links, and upvote points
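As a rough sketch of pulling titles, links, and points out of a page, the example below uses only the standard library's `html.parser`. The markup in `SAMPLE_HTML` and its `story` and `score` class names are simplified assumptions for illustration and do not claim to match the live HackerNews page or the selectors this spider uses.

```python
from html.parser import HTMLParser

# Simplified, made-up markup; the real page structure may differ.
SAMPLE_HTML = """
<a class="story" href="https://example.com/a">First post</a><span class="score">120 points</span>
<a class="story" href="https://example.com/b">Second post</a><span class="score">1 point</span>
"""


class StoryParser(HTMLParser):
    """Collects title, link, and points from markup shaped like SAMPLE_HTML."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "story":
            self.stories.append(
                {"title": "", "link": attrs.get("href"), "points": 0}
            )
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "score" and self.stories:
            self._field = "points"

    def handle_data(self, data):
        if self._field == "title":
            self.stories[-1]["title"] += data
        elif self._field == "points":
            # "120 points" -> 120
            self.stories[-1]["points"] = int(data.split()[0])

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


parser = StoryParser()
parser.feed(SAMPLE_HTML)
stories = parser.stories
```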

sample news output:

Amazon spider

scrapes Amazon market results by search term
the user can pass category=<some-search-term> on the command line to scrape that term's results
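One way a `category=` argument could be turned into a search request is sketched below. Scrapy spiders normally receive such values through `-a name=value` on the `scrapy crawl` command line, but this example avoids Scrapy and uses only the standard library. The helper names `parse_category` and `build_search_url`, and the search URL with its `k` query parameter, are assumptions for illustration rather than details taken from this project.

```python
from urllib.parse import urlencode


def parse_category(argv):
    """Return the value of the first category=<term> argument, or None."""
    for arg in argv:
        key, sep, value = arg.partition("=")
        if key == "category" and sep and value.strip():
            return value.strip()
    return None


def build_search_url(term, base="https://www.amazon.com/s"):
    """Build a search URL for the term (base URL and 'k' parameter are assumed)."""
    return f"{base}?{urlencode({'k': term})}"


# Example arguments, as they might arrive from the command line.
term = parse_category(["category=mechanical keyboard"])
url = build_search_url(term)
```

URL-encoding the term keeps spaces and special characters in the search term from breaking the request.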

sample search results output:

Credits

  1. scrapy-rotating-proxies
  2. scrapy-user-agents
  3. Scrapy
  4. AWS SES
  5. Postman