GitHub - faezetta/VMM-Scraper: Crawler for Craigslist vehicle ads in all U.S. states

Python crawler for Craigslist vehicle ads in all U.S. states using Scrapy

Prerequisities

To modify the code for your own purpose:

items.py includes the items you want to parse in each page. Current version includes the ad URL, posting title, date, location, images and price of the vehicle.
runVMMR.py defines the initial URL, how to navigate the pages or follow links and extract and parse the fields defined above for the scraper. In the current setting, only the first 5 pages showing up in search results are visited.
settings.py defines the directory where you want to save the images and parsed results. You can modify the download delay based on your project. It makes use of different middleware available online such as this and this.
urls.txt includes the list craigslist URLS for different US. Cities that the scraper goes through.

To run the scraper, you just need to run the spider:

$ scrapy crawl craigstlistDemo

The output is a csv file with the posts found including the target fields.

Beware of IP ban from craigslist.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
craigslist_VMMR		craigslist_VMMR
README.md		README.md
scrapy.cfg		scrapy.cfg
urls.txt		urls.txt