A simple CLI tool to get everything you need from Craigslist

gavink97/cl-search

cl search

Why

My interest in web scraping began in 2018, when I was desperate to buy a Modcan Dual Delay for my Eurorack collection and stumbled across WiggleHunt. Since then, I've been convinced of how useful it is to gather used items for sale from many different websites in one place. I built CL Search to do that for Craigslist.

Features

  • Webdriver Agnostic: Supports Chrome, Chromium, Edge, Firefox, & Safari Webdriver

  • Supports all Craigslist locations + categories

  • Supports a variety of formats to export data

  • Supports headless mode in all browsers

  • Full SQLite 3 support

  • Download images

Installation

File tree

CWD
├── [cl_search]
├── images
│   ├── no_image.png
│   └── cl_images
├── foo_bar.json
├── location_search_query.csv
└── craigslist.db

Building from source

I recommend building from source with uv

gh repo clone gavink97/cl-search
uv venv
source .venv/bin/activate
uv pip install -e cl-search

Getting Started

Using the CLI

The CL Search CLI is available as the cl command.

Here is an example of how you might search for iPhones in Austin, Texas, using a headless browser and exporting the results to a SQLite database.

The resulting database will be saved in your current working directory.

 cl -s iphone -L austin --headless -o sql
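For scripted runs, the same invocation can be assembled from Python. This is a minimal sketch, assuming cl is on your PATH after installation. build_cl_command and run_cl are hypothetical helpers, not part of CL Search, and they use only the flags documented in this README.

```python
import shlex
import subprocess


def build_cl_command(location, search=None, output=None, headless=False):
    """Assemble a cl invocation from the flags documented in this README.

    Hypothetical helper for illustration; not part of CL Search.
    """
    cmd = ["cl", "-L", location]  # location is the only required flag
    if search is not None:
        cmd += ["-s", search]
    if headless:
        cmd.append("--headless")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_cl(cmd):
    """Run the assembled command; requires cl to be installed."""
    return subprocess.run(cmd, check=True)


cmd = build_cl_command("austin", search="iphone", output="sql", headless=True)
print(shlex.join(cmd))  # cl -L austin -s iphone --headless -o sql
```

Passing the arguments as a list avoids shell quoting problems with multi-word values such as 'New York'.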

Changing your timezone

Set your preferred timezone in preferences.py by changing the default value:

tz = os.environ.get("TZ", "US/Central")
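Because the value is read from the TZ environment variable first, exporting TZ before running cl should also override the default without editing the file. The lookup behaves roughly like the sketch below; resolve_timezone is a hypothetical name used only for illustration.

```python
import os

DEFAULT_TZ = "US/Central"  # the default shown in preferences.py


def resolve_timezone(environ=os.environ):
    """Return the TZ environment variable if it is set, else the default.

    Hypothetical helper; it mirrors os.environ.get("TZ", "US/Central").
    """
    return environ.get("TZ", DEFAULT_TZ)


print(resolve_timezone({}))                       # US/Central
print(resolve_timezone({"TZ": "Europe/Berlin"}))  # Europe/Berlin
```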

Commands

Location

Location is a required flag

Supports URLs, City Names, States, Provinces, Countries, Continents, or Craigslist

-L or --location foo

Examples:

cl -L 'New York'

💡
Use Lower 48 to search through the Contiguous US 🦅

Output

ℹ️
Default Output is CSV

Currently supporting a few different formats:

  • csv

  • json

  • excel[1]

  • sqlite 3

-o or --output foo

Examples:

Simply type in the name of the format

cl -L foo -o json

or just use the extension for ease of use!

cl -L bar -o xlsx

Browser

ℹ️
Defaults to Firefox

Supports the following browsers:

  • Chrome

  • Chromium

  • Edge

  • Firefox

  • Safari

-b or --browser foo

Headless mode

ℹ️
False by Default

Supports Headless mode in all major browsers!

--headless

Search

ℹ️
No Default / Not Required

Search for a specific query, or omit the flag to collect every listing.

-s foo

-s or --search 'foo bar'

Image

ℹ️
False by Default

Downloads images from the listings.

-i or --image

Image defaults can be set in class_cl_item.py by subclass.

if image_url_src.strip() == "":
    image_url = "No image"
    image_path = f'{path}/images/no_image.png'
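The fallback above can be read as one small function. This is a sketch only: resolve_image is a hypothetical name, and the non-empty branch (saving under images/cl_images, as in the file tree) is an assumption, since the README shows only the empty case.

```python
def resolve_image(image_url_src, path, placeholder="no_image.png"):
    """Return (image_url, image_path) for a listing.

    Hypothetical helper mirroring the snippet from class_cl_item.py.
    """
    if image_url_src.strip() == "":
        # No image on the listing: point at the bundled placeholder.
        return "No image", f"{path}/images/{placeholder}"
    # Assumed layout: downloaded images are stored under images/cl_images.
    filename = image_url_src.rsplit("/", 1)[-1]
    return image_url_src, f"{path}/images/cl_images/{filename}"


print(resolve_image("   ", "/app"))
# ('No image', '/app/images/no_image.png')
```

To use a different placeholder, swap out no_image.png or change the path in the fallback branch.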

Category

ℹ️
Defaults to All for sale

Select the category or subcategory you wish to search in.

-C or --category 'foo bar'

All categories are listed in categories.py

You can customize these categories by appending to the end of the dict.

Delete

ℹ️
False by Default

Deletes old listings from SQL tables

-D or --delete

You can modify the timedelta in database.py to adjust when listings are deleted

time_to_stale = current_time - timedelta(weeks=1)
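To illustrate how a cutoff like this can drive a cleanup, here is a self-contained sketch against an in-memory SQLite database. The listings table, its last_seen column, and the delete_stale function are assumptions made for the example; the actual schema in database.py may differ.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_stale(conn, current_time, max_age=timedelta(weeks=1)):
    """Delete rows last seen before current_time - max_age; return the count."""
    time_to_stale = current_time - max_age
    cur = conn.execute(
        "DELETE FROM listings WHERE last_seen < ?",
        (time_to_stale.isoformat(),),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema: timestamps stored as ISO 8601 text, which sorts correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (title TEXT, last_seen TEXT)")
now = datetime(2024, 1, 15, 12, 0, 0)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [
        ("fresh", (now - timedelta(days=2)).isoformat()),
        ("stale", (now - timedelta(days=10)).isoformat()),
    ],
)
print(delete_stale(conn, now))  # 1
remaining = [row[0] for row in conn.execute("SELECT title FROM listings")]
print(remaining)  # ['fresh']
```

Changing weeks=1 to, say, days=3 makes the cleanup more aggressive.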

Path

ℹ️
Defaults to Current Working Directory

Pass an absolute or relative path as a positional argument to choose where sheets & images are saved.

cl -L austin -s iphone /app

Contributing

Contributions to this project are welcome.

Take advantage of pre-commit to lint and test your PRs before submission.

Road Map

Planned additions

  • ✓ Add SQL support

  • ✓ Output Path Argument

  • ❏ Views: Add support for Preview View (detailed view)

  • ❏ Improve CLI Experience

  • ❏ Make a simple Python API

  • ❏ Multiprocessing

Feature ideas

  • ❏ Filter Search

  • ❏ Spam filters


1. experimental / incomplete feature
