A Python script that reads Craigslist listings for a provided search URL. If a new search result is found, it sends an email with the posting's image, price, and link.
Two versions of the script are available: one for running directly on Unix and one for running in Docker.
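Detecting a "new search result" amounts to comparing the postings found on this run against those already seen. The sketch below is a minimal illustration, not the project's actual code. It assumes each posting is uniquely identified by its URL. The `Listing` fields and the `find_new_listings` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical record for one Craigslist posting."""
    title: str
    price: str
    url: str        # assumed to uniquely identify a posting
    image_url: str


def find_new_listings(scraped: list[Listing], seen_urls: set[str]) -> list[Listing]:
    """Return postings whose URL has not been seen before, in scrape order.

    seen_urls is updated in place, so a posting that appears twice in the
    same scrape is only returned once.
    """
    new: list[Listing] = []
    for listing in scraped:
        if listing.url not in seen_urls:
            new.append(listing)
            seen_urls.add(listing.url)
    return new
```

Only the returned listings would trigger an email, and an empty list means nothing new was found. For this to work across cron runs, the set of seen URLs has to persist between runs (presumably in the database). That persistence is an assumption of this sketch.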
- Docker Installation
- Manual Installation
- Query Configuration
- Dev Environment
- Project Structure
- Further Improvements
Clone the repo:
git clone https://github.com/michaelpappas/craigslist-scraper-emailer
cd craigslist-scraper-emailer
Copy the sample-docker-compose.yml file to docker-compose.yml and fill in the environment variables.
cp sample-docker-compose.yml docker-compose.yml
# open docker-compose.yml and modify the environment variables
Run
docker compose build
Note: this will take a long time. Installing Firefox alone takes about 10 minutes on my 2019 MacBook Pro.
Once the build finishes, run
docker compose up
Confirm that the containers are running, then navigate to localhost:8000 to open the main page. There you can add, remove, activate, and deactivate search queries.
By default the script runs every 2 minutes. You can change this by editing the cronjob file in the dockerscraper directory before building the containers, or by editing the crontab inside the container. You can open the container's crontab with:
docker exec -it craigslist_scraper crontab -e
Clone the repo:
git clone https://github.com/michaelpappas/craigslist-scraper-emailer
cd craigslist-scraper-emailer
Set the environment variables:
touch .env
# open .env and modify the environment variables
or
cp .env.example .env
# open .env and modify the environment variables
SECRET_KEY - Choose any string
DATABASE_URL - Replace {postgres username} and {postgres password} with your own PostgreSQL username and password.
More info on configuring PostgreSQL on a Raspberry Pi can be found here.
The email environment variables assume a Gmail account as the sending address, authenticated with an app password. More info on configuring Gmail to work with an app password can be found here.
You will also need the sender email address, the sender's app password, and a recipient email address.
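Before starting the app, it can help to confirm that every required variable in .env has a value. This is a hypothetical helper, not part of the project. SECRET_KEY and DATABASE_URL come from this README. The three email variable names are placeholders, so check .env.example for the names the project actually uses. The parser is deliberately simple: KEY=VALUE lines only, no `export` prefix or multi-line values.

```python
from __future__ import annotations

# SECRET_KEY and DATABASE_URL are named in this README; the email
# variable names below are hypothetical placeholders.
REQUIRED_VARS = (
    "SECRET_KEY",
    "DATABASE_URL",
    "EMAIL_SENDER",        # hypothetical name
    "EMAIL_APP_PASSWORD",  # hypothetical name
    "EMAIL_RECIPIENT",     # hypothetical name
)


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]
```

For example, once every value is filled in, `missing_vars(parse_env_file(open(".env").read()))` should return an empty list.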
To create a search query, search for something on Craigslist, switch the view to "list", and copy the URL. When correctly configured, the URL should end with 1~list~0~0.
Paste this URL into the search URL field in the Flask app and give the search query a unique name.
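A quick check like the one below can catch a URL that was copied from the gallery or map view. This is a hypothetical helper, not part of the project. It only tests the scheme, the craigslist.org host, and the 1~list~0~0 suffix described above. It does not verify that the search itself is valid.

```python
from urllib.parse import urlparse

# Suffix this README says a list-view search URL ends with.
LIST_VIEW_SUFFIX = "1~list~0~0"


def is_list_view_search_url(url: str) -> bool:
    """Heuristically check that url is a Craigslist search URL in list view."""
    url = url.strip()
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Accept craigslist.org and its subdomains (e.g. a regional site),
    # but not look-alike hosts such as "evilcraigslist.org".
    on_craigslist = host == "craigslist.org" or host.endswith(".craigslist.org")
    return (
        parsed.scheme in ("http", "https")
        and on_craigslist
        and url.endswith(LIST_VIEW_SUFFIX)
    )
```

With an illustrative URL, `is_list_view_search_url("https://sfbay.craigslist.org/search/bia#search=1~list~0~0")` returns `True`. The same URL ending in a gallery-view suffix returns `False`.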
Once the search query has been successfully added you can toggle the query active/inactive.
The scraper will only search active search queries.
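The active/inactive behavior can be pictured as a simple filter over the saved queries. This is a minimal sketch, not the project's model. `SearchQuery`, `toggle`, and `queries_to_scrape` are hypothetical stand-ins for whatever the Flask app and database actually use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchQuery:
    """Hypothetical stand-in for a saved search query."""
    name: str            # unique name entered in the Flask app
    url: str             # Craigslist list-view search URL
    active: bool = True


def toggle(query: SearchQuery) -> None:
    """Flip a query between active and inactive."""
    query.active = not query.active


def queries_to_scrape(queries: list[SearchQuery]) -> list[SearchQuery]:
    """Keep only active queries; inactive ones are skipped by the scraper."""
    return [q for q in queries if q.active]
```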
In the cloned directory create a virtual environment
python3 -m venv venv
Activate that venv
source venv/bin/activate
Install the requirements
pip3 install -r requirements.txt
chromium-chromedriver
sudo apt-get install chromium-chromedriver
xvfb
sudo apt-get install xvfb
Once all of the dependencies and packages are installed, seed the database with the two tables:
python3 seed.py
If this returns no errors then the database has been correctly seeded.
Start the Flask server with:
flask run -p 5000
Navigate to localhost:5000, where you can start configuring search queries to scrape.
To test the script you can run:
python3 scraperPi.py
# or
python3 scraper.py
If new search results are found, you should receive an email once the script finishes running.
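The email step can be sketched with the standard library's `email` and `smtplib` modules. This is a minimal sketch, not the project's implementation. The function names and listing dict keys are hypothetical, and the image is linked by URL rather than attached. It assumes Gmail's SMTP server at smtp.gmail.com on port 465 (SSL), logged into with the app password described above.

```python
from __future__ import annotations

import smtplib
from email.message import EmailMessage
from html import escape

GMAIL_SMTP_HOST = "smtp.gmail.com"
GMAIL_SMTP_SSL_PORT = 465


def build_listing_email(sender: str, recipient: str, query_name: str,
                        listings: list[dict[str, str]]) -> EmailMessage:
    """Build an HTML email with one block (image, price, link) per listing.

    Each listing dict is assumed to have 'title', 'price', 'url' and
    'image_url' keys.
    """
    msg = EmailMessage()
    msg["Subject"] = f"{len(listings)} new result(s) for {query_name}"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text fallback for mail clients that do not render HTML.
    msg.set_content("\n".join(
        f"{item['price']} {item['title']}: {item['url']}" for item in listings))
    blocks = [
        f'<p><img src="{escape(item["image_url"])}" width="300"><br>'
        f'{escape(item["price"])} '
        f'<a href="{escape(item["url"])}">{escape(item["title"])}</a></p>'
        for item in listings
    ]
    # Turns the message into multipart/alternative (plain + HTML).
    msg.add_alternative("<html><body>" + "".join(blocks) + "</body></html>",
                        subtype="html")
    return msg


def send_via_gmail(msg: EmailMessage, sender: str, app_password: str) -> None:
    """Send msg through Gmail's SMTP server using an app password."""
    with smtplib.SMTP_SSL(GMAIL_SMTP_HOST, GMAIL_SMTP_SSL_PORT) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it possible to check the email contents without network access or real credentials.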
Caution! Craigslist is likely to soft ban your IP if you run this too frequently. It is recommended that you route your traffic through a VPN. I used OpenVPN with Surfshark and configured it to start on boot on the Raspberry Pi. Info on how to configure OpenVPN to start on boot with systemctl can be found here. Look for the answer dated March 30, 2017.
To run the script automatically, you will need to create a cron job.
Example cron job:
*/2 * * * * [path_to_venv]/bin/python3 [path_to_directory]/scraperPi.py
# this script will run every 2 minutes forever
You will also need to give the file execute permission:
sudo chmod +x [filepath]
More info about setting up cron jobs can be found here.
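In the example cron job, `*/2` in the minute field means "every minute that is a multiple of 2", which is 30 runs per hour. The hypothetical helper below expands a single cron field so you can see which minutes a schedule matches before editing the crontab. It covers only the common `*`, `*/N`, `N`, `A-B` and comma forms, not the full crontab syntax.

```python
def expand_cron_field(field: str, low: int = 0, high: int = 59) -> list[int]:
    """Expand one cron field (minutes by default) into the values it matches.

    Supports '*', '*/N', 'N', 'A-B', 'A-B/N' and comma-separated lists.
    Names (e.g. 'mon') and '@reboot'-style strings are not handled.
    """
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_s, end_s = base.split("-", 1)
            start, end = int(start_s), int(end_s)
        elif step_str:
            # 'N/step' is not portable across cron implementations.
            raise ValueError(f"unsupported cron field part: {part!r}")
        else:
            start = end = int(base)
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"invalid cron field part: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For example, `expand_cron_field("*/15")` returns `[0, 15, 30, 45]`, so `*/15 * * * *` would run the scraper four times an hour. A less frequent schedule may also reduce the soft-ban risk mentioned above.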
\ # project directory
|--.env.example # example environment variables
|--craigslist_scraper.py # main script for OSX
|--craigslist_scraperPi.py # main script for Raspberry Pi
- Write tests for scraper logic and for Flask app
- Add a page that lists the search results for a given query.