- The overall framework is split into four parts (one master controller and three spiders) that can be allocated flexibly
- All spiders are deployed in a distributed fashion, removing bandwidth and performance bottlenecks
- proxy_pool is used to get around IP bans (see the sketch after this list)
- Cookies are disabled so the site cannot track the crawler across requests
- MySQL serves as the underlying data store
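
The cookie and proxy points above map onto ordinary Scrapy configuration plus a small downloader middleware. Below is a minimal sketch, assuming a proxy_pool that returns one `ip:port` per HTTP GET; the endpoint URL, response format, and dotted middleware path are placeholders, not this project's actual interface.

```python
# Illustrative proxy middleware in the spirit of the project's ProxyMiddleware.
# The proxy_pool endpoint and its plain-text "ip:port" response are assumptions.
import requests


class RandomProxyMiddleware(object):
    PROXY_API = 'http://127.0.0.1:5010/get'  # hypothetical proxy_pool endpoint

    def process_request(self, request, spider):
        proxy = requests.get(self.PROXY_API, timeout=3).text.strip()
        if proxy:
            # Route the request through the borrowed IP to avoid per-IP bans
            request.meta['proxy'] = 'http://' + proxy


# In settings.py the same ideas appear as plain options:
# COOKIES_ENABLED = False                      # no cookies, so sessions are not remembered
# DOWNLOADER_MIDDLEWARES = {
#     'JDCommentSpider.middlewares.ProxyMiddleware': 543,   # dotted path is illustrative
# }
```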
- Python 3.6
- Scrapy 1.4
- pymysql
- json
- redis
git clone https://github.com/Dengqlbq/JDSpider.git
Override the following configuration (typical values are sketched after this list)
- ProjectStart/Test.py (redis configuration, keywords, page_count)
- JDUrlsSpider/settings.py (redis configuration)
- JDDetailSpider/settings.py (redis configuration, mysql configuration, DOWNLOAD_DELAY)
- JDCommentSpider/settings.py (redis configuration, mysql configuration, DOWNLOAD_DELAY)
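
A minimal sketch of what those overrides typically look like. The Redis keys follow common scrapy-redis conventions and the MySQL key names are placeholders; copy the exact names already present in each spider's settings.py.

```python
# Hypothetical values only -- adjust hosts, ports and credentials to your setup,
# and keep the key names that the project's settings.py actually uses.

# Redis connection shared by the distributed spiders
REDIS_HOST = '127.0.0.1'
REDIS_PORT = 6379

# MySQL storage for product details and comments (key names are placeholders)
MYSQL_HOST = '127.0.0.1'
MYSQL_PORT = 3306
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'your_password'
MYSQL_DB = 'jd'

# Throttle requests so a single node does not hammer the site
DOWNLOAD_DELAY = 1
```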
cd ProjectStart
python Test.py
cd JDUrlsSpider
scrapy crawl JDUrlsSpider
cd JDDetailSpider
scrapy crawl JDDetailSpider
(This is a distributed crawler, so you can run more than one JDDetailSpider instance)
cd JDCommentSpider
scrapy crawl JDCommentSpider
(This is a distributed crawler, so you can run more than one JDCommentSpider instance)
Note:
- Before you run the project, make sure you have created the tables that match the requirements.
- If you have not built a proxy_pool, disable the "ProxyMiddleware" in JDCommentSpider/settings.py (see the sketch below)
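
In Scrapy, a downloader middleware is disabled by mapping its dotted path to `None` (or removing the entry) in `DOWNLOADER_MIDDLEWARES`. The path below is illustrative; use the one already listed in the project's settings.py.

```python
# In JDCommentSpider/settings.py -- dotted path is illustrative, keep the one
# that already appears in DOWNLOADER_MIDDLEWARES.
DOWNLOADER_MIDDLEWARES = {
    'JDCommentSpider.middlewares.ProxyMiddleware': None,  # None disables the middleware
}
```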
Product detail and comment summary
Some comments
Full comment