Seen is a lightweight web crawling framework for everyone.
Written with asyncio
,aiohttp/requests
.
It is useful for writing a web crawling quickly and get FULL JavaScript Support.
- Python 3.5+
- aiohttp or requests
- pyquery
pip install seen
Get JavaScript support!
pip install pyppeteer
- Write spider.py
from seen import Spider, Parser, Item, Css
class Post(Item):
title = Css('title')
img = Css('img', 'src')
def save(self):
print(self.result['title'])
print(self.result['img'])
class MySpider(Spider):
roots = 'https://www.v2ex.com'
url_limit = ('www.v2ex.com')
concurrency = 1
# if you want to load JavaScript, set use_browser = True
# by default is False.
use_browser = False
parsers = [Parser(Post)]
if __name__ == '__main__':
spider = MySpider()
spider.start()
- Run
python spider.py
. - Check result.
- Pull request.
- Open an issue.