zoocasa_real_estate_web_scraper

I am working on this web scraper for zoocasa.com and need some help getting it to run over multiple pages. There are two scripts here: one to scrape the links for the homes/condos (Scrape Toronto Housing Data Links) and one to gather the data from those links and clean it (Scrape Data From Links). Currently I am generating and rotating random proxies and changing user agents on each page; however, I am only getting the first page of data back. I am using random crawl delays between 3 and 5 seconds, as suggested in the site's robots.txt. I am not being blocked by the site, but I always end up with duplicates in my dataframe. A rough sketch of the kind of pagination loop involved is below. Any suggestions here are highly welcome!
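For context, here is a minimal sketch of the paginated request loop, not the exact code in the scripts: the search URL, page query parameter, listing-link selector, and proxy list below are illustrative assumptions. It also tracks links already seen, so a page that returns only duplicates is detected instead of silently appended to the dataframe.

```python
import random
import time

import requests
from bs4 import BeautifulSoup

# Illustrative values -- the real search URL, proxy list, and CSS
# selector depend on the site and your setup.
BASE_URL = "https://www.zoocasa.com/toronto-on-real-estate"
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]
PROXIES = []  # e.g. ["http://1.2.3.4:8080", ...]

seen_links = set()

for page in range(1, 6):
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    proxy = {"http": random.choice(PROXIES)} if PROXIES else None

    # The page number must actually change in the request; a common
    # first-page-only bug is reusing the same URL on every iteration.
    resp = requests.get(BASE_URL, params={"page": page},
                        headers=headers, proxies=proxy, timeout=10)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    links = {a["href"] for a in soup.select("a[href*='/listing/']")}

    # If a page returns only links we've already seen, the server is
    # likely sending the same (first) page regardless of the page
    # parameter -- a sign the listings are rendered client-side.
    new_links = links - seen_links
    if not new_links:
        print(f"page {page}: no new links; pagination may be JS-driven")
        break
    seen_links |= new_links

    # Random crawl delay of 3-5 seconds, per the robots.txt suggestion.
    time.sleep(random.uniform(3, 5))

print(f"collected {len(seen_links)} unique listing links")
```

If every page yields the same set of links, the listings are probably loaded client-side by JavaScript; in that case, requesting the underlying JSON endpoint (visible in the browser's network tab) or driving a headless browser tends to work better than fetching the HTML directly.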
