Recursive Async crawler for news.ycombinator.com

About

Crawls the top N news stories from the site:

Downloads the page each news item points to
Traverses the item's comments and downloads all links found in them
Caches results, so the crawler does not visit the same page twice
Repeats the whole cycle every interval seconds (set with -i)
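Below is a minimal sketch of the crawl-and-cache idea above. It uses only asyncio and is not the code in ycrawler.py. The Crawler class, the fetch callback and the max_depth limit are hypothetical. A real fetcher would download the page over HTTP and extract its links; this demo reads them from the in-memory FAKE_WEB dictionary instead. The visited set plays the role of the cache: a URL is recorded before its download is awaited, so concurrent tasks never fetch it twice.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Set

# Hypothetical callback: download a page and return the URLs it links to.
Fetcher = Callable[[str], Awaitable[Iterable[str]]]


class Crawler:
    """Recursive async crawler that never visits the same URL twice (sketch)."""

    def __init__(self, fetch: Fetcher, max_depth: int = 1) -> None:
        self.fetch = fetch
        self.max_depth = max_depth
        # Cache of URLs already visited; kept between runs so repeats are skipped.
        self.visited: Set[str] = set()

    async def crawl(self, url: str, depth: int = 0) -> None:
        if url in self.visited or depth > self.max_depth:
            return
        # Mark before awaiting so concurrently running tasks skip this URL.
        self.visited.add(url)
        links = await self.fetch(url)
        # Follow every outgoing link concurrently, one level deeper.
        await asyncio.gather(*(self.crawl(link, depth + 1) for link in links))


# Demo only: an in-memory stand-in for real pages and their links.
FAKE_WEB: Dict[str, List[str]] = {
    "story": ["a", "b"],
    "a": ["b", "story"],
    "b": [],
}


async def fake_fetch(url: str) -> List[str]:
    await asyncio.sleep(0)  # yield to the event loop like a real download would
    return FAKE_WEB.get(url, [])


async def main() -> Set[str]:
    crawler = Crawler(fake_fetch, max_depth=2)
    await crawler.crawl("story")
    return crawler.visited
```

asyncio.run(main()) returns {"story", "a", "b"}. The link from "a" back to "story" and the second link to "b" are skipped because those URLs are already in the cache.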

How to use

Install requirements

pip install -r requirements.txt

Run

python ycrawler.py -n 30 -i 60

The script accepts the following arguments:

-n number of top news stories to download
-i interval between fetches of the top N news, in seconds
-d enable debug logging
-l path to the log file
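The flags above and the repeat-every-interval loop could be wired up roughly as follows. This is an assumed sketch using argparse, not the actual ycrawler.py. The default values, the help texts and the helper names parse_args, setup_logging and run_every are illustrative. The job passed to run_every stands in for one crawl of the top N stories.

```python
import argparse
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -n/-i/-d/-l flags (the defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Async crawler for news.ycombinator.com")
    parser.add_argument("-n", type=int, default=30, help="number of top news stories to download")
    parser.add_argument("-i", type=float, default=60.0, help="seconds between fetches of the top N news")
    parser.add_argument("-d", action="store_true", help="enable debug logging")
    parser.add_argument("-l", default=None, help="path to the log file (stderr if omitted)")
    return parser.parse_args(argv)


def setup_logging(args: argparse.Namespace) -> None:
    """Log at DEBUG with -d, INFO otherwise; write to the -l file if given."""
    logging.basicConfig(
        filename=args.l,  # None means log to stderr
        level=logging.DEBUG if args.d else logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )


async def run_every(
    interval: float,
    job: Callable[[int], Awaitable[None]],
    n: int,
    iterations: Optional[int] = None,
) -> None:
    """Await job(n), then sleep `interval` seconds; repeat forever or `iterations` times."""
    done = 0
    while iterations is None or done < iterations:
        await job(n)
        done += 1
        if iterations is None or done < iterations:
            await asyncio.sleep(interval)
```

In normal use run_every would be called without iterations, for example asyncio.run(run_every(args.i, crawl_top_news, args.n)), where crawl_top_news is a hypothetical coroutine that downloads the top args.n stories. The iterations argument exists only so the loop can be stopped in a test.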

Links

https://realpython.com/async-io-python/#other-features-async-for-and-async-generators-comprehensions
https://medium.com/python-pandemonium/asyncio-coroutine-patterns-beyond-await-a6121486656f
https://medium.com/@yeraydiazdiaz/asyncio-coroutine-patterns-errors-and-cancellation-3bb422e961ff
https://github.com/yeraydiazdiaz/asyncio-coroutine-patterns/blob/master/05_cancelling_coroutines/04_cancelling_coroutines.py

TODO

Add PDF support

nihilSup/ycrawler

Folders and files

Latest commit

History

Repository files navigation

Recursive Async crawler for news.ycombinator.com

About

How to use

Install requirements

Run

Links

TODO

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages