Ten-million-scale image and video crawler [open-source edition]
OpenYspider is a simple crawler written in Java. Its main technology stack is:
- spring-boot-starter-web
- spring-boot-starter-test
- mybatis-plus-boot-starter
- springfox-boot-starter
- lombok
- jsoup
- mockito + jacoco
Currently supported (LTS) sites:
- tujidao.com
Deprecated sites (see the commit history):
- tangyun365.com
- yalayi.com
- rosmm88.com
- mzsock.com
- meinvla.net
- leetcode-cn.com
Development environment: Windows 11 + JDK 17 + MySQL 8.x
$ java --version
openjdk 17.0.1 2021-10-19
OpenJDK Runtime Environment (build 17.0.1+12-39)
OpenJDK 64-Bit Server VM (build 17.0.1+12-39, mixed mode, sharing)
After running the main class OpenYspiderApplication, open http://localhost:23333/swagger-ui/index.html#/ in a browser.
Database scripts: sql_scripts
Statistics as of 2022-02-12:
- Target site: https://www.tujidao.com/
- Characteristic: image paths can be traversed
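Because the image paths can be traversed, candidate URLs can be generated from sequential IDs instead of being discovered by following links. The following is a minimal sketch of that idea in plain Java. The URL template `https://img.example.com/a/{albumId}/{index}.jpg`, the class `PathTraversalSketch`, and the fixed `imagesPerAlbum` count are all hypothetical; they are not OpenYspider's code or the target site's real path layout. The ID range uses the same `album_id > x and album_id <= y` convention as the statistics queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: enumerates candidate image URLs for a range of album IDs.
// ASSUMPTION: TEMPLATE is a hypothetical path layout, not the target site's real one.
class PathTraversalSketch {

    // Hypothetical URL template: first %d is the album ID, second %d the image index.
    static final String TEMPLATE = "https://img.example.com/a/%d/%d.jpg";

    // Returns URLs for album IDs in (fromExclusive, toInclusive] and image indexes 1..imagesPerAlbum.
    static List<String> candidateUrls(int fromExclusive, int toInclusive, int imagesPerAlbum) {
        List<String> urls = new ArrayList<>();
        for (int albumId = fromExclusive + 1; albumId <= toInclusive; albumId++) {
            for (int index = 1; index <= imagesPerAlbum; index++) {
                // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
                urls.add(String.format(Locale.ROOT, TEMPLATE, albumId, index));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Albums 1 and 2 with three images each: prints 6 URLs.
        candidateUrls(0, 2, 3).forEach(System.out::println);
    }
}
```

A real crawler would still have to request each generated URL and skip the ones that do not exist; the sketch only shows that no link-following is needed to enumerate them.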
select count(*) from oys_tujidao_album_t where album_id > 0 and album_id <= 10000; -- 9995 ok
select count(*) from oys_tujidao_album_t where album_id > 10000 and album_id <= 20000; -- 10000
select count(*) from oys_tujidao_album_t where album_id > 20000 and album_id <= 30000; -- 9999 [23001]
select count(*) from oys_tujidao_album_t where album_id > 30000 and album_id <= 40000; -- 10000
select count(*) from oys_tujidao_album_t where album_id > 40000 and album_id <= 50000; -- 8925 [46018]
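Each query above counts the albums stored in one block of 10,000 IDs, so a count below 10,000 means some IDs in that block are absent from the table. As a hedged sketch, the same bookkeeping can be done in Java, assuming the positive `album_id` values have already been loaded into memory; the class `AlbumCoverageSketch` and its methods are hypothetical and not part of OpenYspider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch only: per-bucket counts like the COUNT(*) queries, plus a gap finder.
// ASSUMPTION: ids are positive album_id values already read from oys_tujidao_album_t.
class AlbumCoverageSketch {

    static final int BUCKET = 10_000;

    // Maps each bucket's upper bound (10000, 20000, ...) to the number of IDs in (upper - BUCKET, upper].
    static Map<Integer, Integer> countPerBucket(SortedSet<Integer> ids) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int id : ids) {
            int upper = ((id - 1) / BUCKET + 1) * BUCKET; // id 10000 -> 10000, id 10001 -> 20000
            counts.merge(upper, 1, Integer::sum);
        }
        return counts;
    }

    // Lists the IDs in (fromExclusive, toInclusive] that are not present in ids.
    static List<Integer> missingIds(SortedSet<Integer> ids, int fromExclusive, int toInclusive) {
        List<Integer> missing = new ArrayList<>();
        for (int id = fromExclusive + 1; id <= toInclusive; id++) {
            if (!ids.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (int id = 1; id <= 20_000; id++) {
            ids.add(id);
        }
        ids.remove(5); // simulate one album that was never stored
        System.out.println(countPerBucket(ids)); // {10000=9999, 20000=10000}
        System.out.println(missingIds(ids, 0, 10_000)); // [5]
    }
}
```

Loading the IDs into memory is practical at this table's size (tens of thousands of rows); for plain counts, running the `count(*)` queries in MySQL avoids transferring the IDs at all.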