If you need an industrial-strength Python scraper, check out Scrapy: https://github.com/scrapy/scrapy

Current features:

  • Is also secretly a web spider
  • Download files by extension
  • Site and extensions as command-line parameters
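
As a rough illustration of the features above, here is a minimal sketch of taking a site and a list of extensions from the command line and picking out the links on a page that match. It is not this project's actual code: the names `LinkCollector`, `matching_files`, and `parse_args` are hypothetical, and it only uses the Python standard library.

```python
import argparse
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def matching_files(html, base_url, extensions):
    """Return absolute URLs of links whose path ends in one of the extensions."""
    # Normalise "pdf" and ".PDF" to the same form: ".pdf"
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.links:
        url = urljoin(base_url, href)  # resolve relative links
        suffix = posixpath.splitext(urlparse(url).path)[1].lower()
        if suffix in wanted:
            found.append(url)
    return found


def parse_args(argv=None):
    """Site and extensions as command-line parameters (assumed layout)."""
    parser = argparse.ArgumentParser(description="Download files by extension")
    parser.add_argument("site", help="URL to start from")
    parser.add_argument("extensions", nargs="+", help="e.g. pdf zip jpg")
    return parser.parse_args(argv)
```

With this layout, arguments such as `http://example.com pdf zip` would yield `site` and `extensions`, and the URLs returned by `matching_files` could then be fetched with something like `urllib.request.urlretrieve`.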

Future features:

  • Testing
  • Code cleanup, better exception handling
  • Keywords as command-line parameters
  • Multithreading
  • Follow links by keyword
  • Download files by keyword and/or extension
  • Ignore same-page fragment links (e.g., #top, #comments)
  • Various performance improvements
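
One of the planned items, skipping links such as `#top` that only point within the current page, could look roughly like the sketch below. This is an assumption about how it might be done, not existing code: `crawlable_links` is a hypothetical name, and the approach is to strip the fragment with `urllib.parse.urldefrag` and drop anything that resolves back to the page itself.

```python
from urllib.parse import urldefrag, urljoin


def crawlable_links(page_url, hrefs):
    """Resolve hrefs against page_url, dropping same-page and duplicate targets."""
    page, _ = urldefrag(page_url)
    seen = set()
    result = []
    for href in hrefs:
        # Resolve the link, then discard its "#fragment" part
        target, _ = urldefrag(urljoin(page_url, href))
        if target == page or target in seen:
            continue  # points back at this page, or already queued
        seen.add(target)
        result.append(target)
    return result
```

Because fragments are stripped before comparison, `/about` and `/about#team` count as the same page and are only visited once.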