
4 Distributed Crawler

邹嵩 edited this page Apr 4, 2020 · 1 revision

1. Install Docker

https://docs.docker.com/install/

2. Install Docker Compose

https://github.com/docker/compose/releases

3. Install MySQL

$ docker run --name mysql -d -p 3306:3306 --restart always -e MYSQL_ROOT_PASSWORD=1qazZAQ! mysql:latest

4. Docker remote API for Mac (if you work on a Mac)

$ docker run -d  --restart always --name socat -v /var/run/docker.sock:/var/run/docker.sock -p 2376:2375 bobrik/socat TCP4-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
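To confirm that the socat proxy really exposes the Docker API, a request to the Engine API's `/_ping` endpoint can serve as a quick check. The sketch below assumes `curl` is installed and that the proxy is reachable on port 2376 as mapped above. The function name `docker_api_ping` is made up for illustration and is not part of DotnetSpider or Docker.

```shell
# Hypothetical helper: ping the Docker Engine API through the socat proxy.
# $1 is the base URL of the remote API; the default assumes the 2376 mapping above.
docker_api_ping() {
  base=${1:-http://localhost:2376}
  # /_ping is a lightweight Engine API endpoint; -f makes curl fail on HTTP errors.
  curl -fsS "$base/_ping"
}
```

Calling `docker_api_ping http://192.168.124.200:2376` should print `OK` when the API is reachable from the machine that will run the portal.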

5. RabbitMQ

version: '3'
services:
  rabbitmq:
    image: 'rabbitmq:3-management'
    restart: always
    container_name: 'rabbitmq'
    ports:
      - 4369:4369
      - 5671:5671
      - 5672:5672
      - 25672:25672
      - 15671:15671
      - 15672:15672
    environment:
      - TZ=Asia/Shanghai
      - RABBITMQ_DEFAULT_USER=user
      - RABBITMQ_DEFAULT_PASS=password

Deploy the RabbitMQ instance with docker-compose.
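One way to bring the compose file up is sketched below. It assumes the YAML above was saved as `rabbitmq.yml` (the page does not name the file), and the `compose_up` function and `COMPOSE` variable are illustrative conveniences rather than anything DotnetSpider provides.

```shell
# Hypothetical helper: start every compose file passed as an argument.
# COMPOSE may be set to "docker compose" for the v2 plugin (an assumption,
# not something the original page specifies).
compose_up() {
  for file in "$@"; do
    # Intentionally unquoted so that "docker compose" splits into two words.
    ${COMPOSE:-docker-compose} -f "$file" up -d || return 1
  done
}
```

For example, `compose_up rabbitmq.yml` starts the broker in the background. The same helper would work for the portal and agent files described in the later steps.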

6. Portal

portal.yml

version: '3'

services:
  dotnetspider.portal:
    image: 'dotnetspider/portal:latest'
    restart: always
    container_name: dotnetspider.portal
    ports:
      - '7896:7896'
    volumes:
      - {your config path}:/portal/appsettings.json

appsettings.json

{
  "ConnectionString": "Database='dotnetspider2';Data Source=192.168.124.200;password=1qazZAQ!;User ID=root;Port=3306;",
  "DatabaseType": "MySql",
  "RabbitMQ": {
    "Exchange": "DOTNET_SPIDER",
    "Host": "192.168.124.200",
    "UserName": "user",
    "Password": "password"
  },
  "Database": "dotnetspider2",
  "Docker": "http://192.168.124.200:2376",
  "DockerVolumes": ""
}

After the instance has started, check that it is reachable at http://localhost:7896
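The portal container may take a few seconds before it answers requests, so a small polling loop can be handy in scripts. This is a minimal sketch assuming `curl` is available and that the portal answers on its root URL without an HTTP error status. `wait_for_http` is a hypothetical name.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL to check, $2 = maximum number of attempts (default 30, one per second).
wait_for_http() {
  url=$1
  tries=${2:-30}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f turns HTTP status codes >= 400 into a non-zero exit code.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_for_http http://localhost:7896 && echo "portal is up"`.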

7. Agent

agent.yml

version: '3'

services:
  dotnetspider.agent:
    image: 'dotnetspider/agent:latest'
    restart: always
    container_name: dotnetspider.agent
    volumes:
      - {your config path}:/agent/appsettings.json    

appsettings.json

{
  "RabbitMQ": {
    "Exchange": "DOTNET_SPIDER",
    "Host": "192.168.124.200",
    "UserName": "user",
    "Password": "password"
  },
  "AgentId": "AGENT_001",
  "AgentName": "AGENT_001",
  "ADSLAccount": "",
  "ADSLPassword": "",
  "ADSLInterface": "",
  "SupportPuppeteer": false
}
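When several agents are deployed, each one needs its own `appsettings.json`. The sketch below generates such a file from the template above. It assumes that every agent should have a distinct `AgentId` and `AgentName`, which this page implies but does not state, and `write_agent_config` is a hypothetical helper name.

```shell
# Hypothetical helper: print an agent appsettings.json for a given agent id.
# $1 = agent id (also used as the agent name), $2 = RabbitMQ host.
write_agent_config() {
  agent_id=$1
  rabbit_host=${2:-192.168.124.200}
  cat <<EOF
{
  "RabbitMQ": {
    "Exchange": "DOTNET_SPIDER",
    "Host": "$rabbit_host",
    "UserName": "user",
    "Password": "password"
  },
  "AgentId": "$agent_id",
  "AgentName": "$agent_id",
  "ADSLAccount": "",
  "ADSLPassword": "",
  "ADSLInterface": "",
  "SupportPuppeteer": false
}
EOF
}
```

Usage: `write_agent_config AGENT_002 > appsettings.json`, then mount the resulting file into the agent container as shown in agent.yml.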

8. Try running the sample spider

docker run --rm \
 -e DOTNET_SPIDER_TYPE=DotnetSpider.Spiders.EntitySpider \
 -e DOTNET_SPIDER_ID=xxxx \
 -e DOTNET_SPIDER_NAME=cnblogs \
 dotnetspider/spiders:latest 
