CRAWL-STACKOVERFLOW

learning

  • crawl data with Scrapy and save it to a database with SQLAlchemy

  • copy XPath expressions from Chrome and test them

  • crawl paginated data

  • crawl data from an API
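
The pagination item above comes down to one loop: read a page, collect its items, then follow the "next" link until there is none. Below is a minimal sketch of that loop in plain Python, deliberately without Scrapy so it runs anywhere. The `PAGES` dict, the `fetch` function, and the URL paths are hypothetical stand-ins for real HTTP responses, not part of this project.

```python
from typing import Callable, Iterator, Optional

# Hypothetical in-memory "site": each page holds items and an optional next link.
PAGES = {
    "/questions?page=1": {"items": ["q1", "q2"], "next": "/questions?page=2"},
    "/questions?page=2": {"items": ["q3"], "next": "/questions?page=3"},
    "/questions?page=3": {"items": ["q4", "q5"], "next": None},
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP request; a real crawler would download and parse the page."""
    return PAGES[url]


def crawl_paginated(
    start_url: str,
    fetch_page: Callable[[str], dict],
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield items page by page, following 'next' links until none remain."""
    url: Optional[str] = start_url
    seen = set()  # visited URLs, guards against pagination loops
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        yield from page["items"]
        url = page.get("next")


if __name__ == "__main__":
    print(list(crawl_paginated("/questions?page=1", fetch)))
```

In a Scrapy spider the same shape usually lives in the `parse` callback: yield the items found on the page, then yield `response.follow(next_url, callback=self.parse)` when a next-page link exists.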

command

  • init project:

scrapy startproject stack

  • test XPath from Chrome console:

$x("//img")

  • run the spider (the second form also exports the scraped items to items.json):

scrapy crawl stack

scrapy crawl stack -o items.json -t json

  • generate spider:

scrapy genspider stack_crawler stackoverflow.com -t crawl
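
The XPath tested in the Chrome console above (`$x("//img")`) can also be checked from Python before it goes into a spider. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so the expression is written as `.//img`. The `SAMPLE` markup and the `image_sources` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, well-formed sample markup (ElementTree cannot parse sloppy HTML).
SAMPLE = """
<html>
  <body>
    <div class="post">
      <img src="/a.png" alt="first"/>
      <p>text <img src="/b.png" alt="second"/></p>
    </div>
  </body>
</html>
"""


def image_sources(markup: str) -> List[str]:
    """Return the src attribute of every <img> element, in document order."""
    root = ET.fromstring(markup.strip())
    # ".//img" matches <img> at any depth below the root, like "//img" in Chrome.
    return [img.get("src") for img in root.findall(".//img")]


if __name__ == "__main__":
    print(image_sources(SAMPLE))
```

Inside `scrapy shell` or a spider callback, the full expression works unchanged, for example `response.xpath("//img/@src").getall()`.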