GitHub - scrapinghub/scrapyrt: HTTP API for Scrapy spiders

ScrapyRT (Scrapy realtime)

Add an HTTP API to your Scrapy project in minutes.

You send ScrapyRT a request with a spider name and a URL, and in response you get the items collected by that spider visiting the URL.

All Scrapy project components (e.g. middlewares, pipelines, extensions) are supported.

You run ScrapyRT in your Scrapy project directory. It starts an HTTP server that lets you schedule spiders and get their output as JSON.
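
The JSON output can be consumed from any HTTP client. The following minimal Python sketch assumes the response body carries a `status` field (`"ok"` on success) and an `items` list, which is how the ScrapyRT documentation describes a successful reply; the helper name `extract_items` and the `message` field read on errors are assumptions for illustration, not part of ScrapyRT.

```python
import json


def extract_items(body: str) -> list:
    """Return the scraped items from a ScrapyRT JSON response body.

    Hypothetical helper: assumes a "status" field ("ok" on success)
    and an "items" list in the decoded response.
    """
    data = json.loads(body)
    if data.get("status") != "ok":
        # Assumption: error replies describe the failure in a "message" field.
        raise RuntimeError(f"ScrapyRT error: {data.get('message', data)}")
    # A successful reply with no items yields an empty list.
    return data.get("items", [])


if __name__ == "__main__":
    sample = '{"status": "ok", "spider_name": "toscrape-css", "items": [{"text": "A quote"}]}'
    print(extract_items(sample))
```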

Quickstart

1. install

> pip install scrapyrt

2. switch to your Scrapy project directory (e.g. the quotesbot project)

> cd my/project_path/is/quotesbot

3. launch ScrapyRT

> scrapyrt

4. run your spiders

> curl "localhost:9080/crawl.json?spider_name=toscrape-css&url=http://quotes.toscrape.com/"

5. run a more complex query, e.g. specify a callback for the Scrapy request and a zipcode argument for the spider

> curl --data '{"request": {"url": "http://quotes.toscrape.com/page/2/", "callback":"some_callback"}, "spider_name": "toscrape-css", "crawl_args": {"zipcode":"14000"}}' http://localhost:9080/crawl.json -v
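
The two curl calls in steps 4 and 5 can also be issued from Python. This is a minimal sketch using only the standard library, assuming ScrapyRT listens on `localhost:9080` as in the quickstart; the helper names `build_get_url`, `build_post_request` and `send` are hypothetical, and the 60-second client timeout is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: ScrapyRT runs locally on the port used in the quickstart.
SCRAPYRT_ENDPOINT = "http://localhost:9080/crawl.json"


def build_get_url(spider_name: str, url: str) -> str:
    """Step 4: pass the spider name and target URL as query parameters."""
    query = urlencode({"spider_name": spider_name, "url": url})
    return f"{SCRAPYRT_ENDPOINT}?{query}"


def build_post_request(spider_name: str, url: str, callback: str, crawl_args: dict) -> Request:
    """Step 5: send request options and spider arguments as a JSON body."""
    payload = {
        "request": {"url": url, "callback": callback},
        "spider_name": spider_name,
        "crawl_args": crawl_args,
    }
    return Request(
        SCRAPYRT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request_or_url) -> dict:
    """Send the request (needs a running ScrapyRT server) and decode the JSON reply."""
    # Assumption: 60 seconds is enough for a short, single-page crawl.
    with urlopen(request_or_url, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_get_url("toscrape-css", "http://quotes.toscrape.com/"))
    post = build_post_request(
        "toscrape-css",
        "http://quotes.toscrape.com/page/2/",
        "some_callback",
        {"zipcode": "14000"},
    )
    print(post.get_method(), post.full_url)
    # Uncomment to call a running ScrapyRT instance:
    # print(send(post))
```

Building each request separately from sending it keeps the payload easy to inspect or test without a running server. Unlike the curl command in step 4, `urlencode` percent-encodes the target URL in the query string.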

ScrapyRT looks for a scrapy.cfg file to determine your project settings and raises an error if it cannot find one. Note that all of your project's requirements must be installed.
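
A minimal sketch of that kind of lookup, modeled on how Scrapy itself searches the current directory and its parents for `scrapy.cfg` and reads the settings module from its `[settings]` section. It is not ScrapyRT's code, and the function names `find_scrapy_cfg` and `project_settings_module` are hypothetical.

```python
from configparser import ConfigParser
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Walk from `start` up through its parents and return the first scrapy.cfg found."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


def project_settings_module(start: Path, project: str = "default") -> str:
    """Return the settings module named for `project` in the [settings] section."""
    cfg_path = find_scrapy_cfg(start)
    if cfg_path is None:
        # Mirrors the documented behaviour: fail when no scrapy.cfg exists.
        raise RuntimeError("Cannot find scrapy.cfg file")
    config = ConfigParser()
    config.read(cfg_path)
    # e.g. "default = quotesbot.settings" yields "quotesbot.settings"
    return config.get("settings", project)


if __name__ == "__main__":
    try:
        print(project_settings_module(Path.cwd()))
    except RuntimeError as exc:
        print(exc)
```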

Note

The project is not a replacement for Scrapyd, Scrapy Cloud, or other infrastructure for running long-running crawls.
It is not suitable for long-running spiders; it works best for spiders that fetch one response from a website and return items quickly.

Documentation

Documentation is available on readthedocs.

Support

Open source support is provided here on GitHub. Please create a question issue (i.e. an issue with the "question" label).

Commercial support is also available from Zyte.

License

ScrapyRT is offered under the BSD 3-Clause license.

Development

Development takes place on GitHub.