If you need an industrial-strength Python scraper, check out Scrapy: https://github.com/scrapy/scrapy
Current features:
- Is also secretly a web spider
- Download files by extension
- Site and Extensions as command-line params
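As a rough illustration of the features above, here is a minimal sketch of picking out links by extension, with the site and extensions taken from the command line. It is not this project's actual code: the names `LinkCollector`, `links_with_extensions`, and `parse_args` are made up for the example, and it only uses the Python standard library.

```python
import argparse
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_extensions(html, base_url, extensions):
    """Return absolute URLs found in html whose path ends with a wanted extension."""
    collector = LinkCollector()
    collector.feed(html)
    # Accept both "pdf" and ".pdf" and compare case-insensitively.
    wanted = tuple("." + ext.lower().lstrip(".") for ext in extensions)
    absolute = (urljoin(base_url, href) for href in collector.links)
    # Check only the URL path so query strings do not hide the extension.
    return [url for url in absolute if urlparse(url).path.lower().endswith(wanted)]


def parse_args(argv=None):
    """Site and extensions as command-line params (assumed argument layout)."""
    parser = argparse.ArgumentParser(description="Find files by extension (sketch).")
    parser.add_argument("site", help="URL of the page to scan")
    parser.add_argument("extensions", nargs="+", help="file extensions, e.g. pdf zip")
    return parser.parse_args(argv)
```

With something like this, `parse_args(["http://example.com", "pdf", "zip"])` yields the site and the extension list, and each URL returned by `links_with_extensions` could then be fetched with `urllib.request.urlretrieve`.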
Future features:
- Testing
- Code cleanup, better exceptions
- Keywords as command-line params
- Multithreaded
- Follow links by keyword
- Download files by keyword and/or extension
- Ignore same-page anchor links (e.g., #top, #comments)
- Various performance improvements
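Two of the planned features, skipping same-page anchor links such as #top and following links by keyword, could look roughly like the sketch below. None of this exists in the project yet: `is_same_page_anchor` and `links_to_follow` are hypothetical names, and matching the keyword against both the URL and the link text is just one possible choice.

```python
from urllib.parse import urldefrag, urljoin


def is_same_page_anchor(href, page_url):
    """True if href only points at a location inside page_url (e.g. #top)."""
    target, _ = urldefrag(urljoin(page_url, href))
    current, _ = urldefrag(page_url)
    return target == current


def links_to_follow(links, page_url, keywords=()):
    """Pick the links worth crawling.

    links is a list of (href, link_text) pairs. Same-page anchors are
    dropped, fragments are stripped, duplicates are removed, and if any
    keywords are given, a link is kept only when a keyword appears in
    its URL or its text (case-insensitive).
    """
    keywords = [k.lower() for k in keywords]
    selected = []
    seen = set()
    for href, text in links:
        if is_same_page_anchor(href, page_url):
            continue
        url, _ = urldefrag(urljoin(page_url, href))
        if url in seen:
            continue
        haystack = (url + " " + text).lower()
        if keywords and not any(k in haystack for k in keywords):
            continue
        seen.add(url)
        selected.append(url)
    return selected
```

Stripping the fragment before de-duplicating means /about and /about#team count as the same page, so the spider would visit it only once.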