1. Required Python version:

Python 3.5.2 or later

2. Required libraries (all are part of the Python standard library except requests, which must be installed separately):

import re
import requests
import sys
import queue
import threading
import getopt
import sqlite3
import random
import urllib.parse
import time

3. Required database:

SQLite 3.11.0 or later

4. Other requirements:

Create a folder named "url".
Create a text file named "url.txt" that lists the sites to crawl. The file must not contain any empty lines.
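
As a rough illustration of the two requirements above, the following sketch creates the "url" folder and checks that a site file has no empty lines. It is not taken from Spider.py; the function names prepare_workspace and load_sites are hypothetical, and the one-site-per-line layout is an assumption.

```python
import os


def prepare_workspace(folder="url"):
    """Create the output folder if it does not exist yet (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)
    return folder


def load_sites(path="url.txt"):
    """Return the sites listed in `path`, assuming one site per line.

    Raises ValueError if any line is empty, since the site file
    must not contain empty lines.
    """
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            raise ValueError("{}: line {} is empty".format(path, number))
    return [line.strip() for line in lines]
```

Calling prepare_workspace() and then load_sites("url.txt") before starting a crawl would surface a malformed site file early, with the offending line number in the error message.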

5. Example:

Spider for url  -version-1.0.3 --Written by WJY
	
	-h   Print the help information
	-f   Specify the file that lists the sites to crawl
	-k   Specify keywords that must appear in a URL. Separate multiple keywords with ','
	-n   Specify keywords that must not appear in a URL. Separate multiple keywords with ','
	-t   Specify the number of threads
	-l   Specify the crawl depth (level)
	-r   Specify the crawl frequency (rate)
	-m   Specify the storage mode: 'text' or 'db'
	
	Example: python3 Spider.py -f url.txt
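
The option list above could be parsed with the standard getopt module roughly as follows. This is a minimal sketch, not Spider.py's actual code: the function name parse_args and the dictionary keys are hypothetical, the defaults are the ones listed in section 6, and options with no documented default are left as None or empty.

```python
import getopt

# Defaults as documented in this README; "mode" has no documented default.
DEFAULTS = {
    "threads": 1,
    "include": [],          # keywords that must appear in a URL (-k)
    "exclude": [],          # keywords that must not appear in a URL (-n)
    "site_file": "./url.txt",
    "level": 2,
    "rate": 100,
    "mode": None,
    "help": False,
}


def split_keywords(value):
    """Split a ','-separated keyword string, dropping empty entries."""
    return [word for word in value.split(",") if word]


def parse_args(argv):
    """Parse command-line arguments (without the program name) into a dict."""
    options = dict(DEFAULTS)
    # A trailing ':' marks an option that takes a value.
    opts, _ = getopt.getopt(argv, "hf:k:n:t:l:r:m:")
    for flag, value in opts:
        if flag == "-h":
            options["help"] = True
        elif flag == "-f":
            options["site_file"] = value
        elif flag == "-k":
            options["include"] = split_keywords(value)
        elif flag == "-n":
            options["exclude"] = split_keywords(value)
        elif flag == "-t":
            options["threads"] = int(value)
        elif flag == "-l":
            options["level"] = int(value)
        elif flag == "-r":
            options["rate"] = int(value)
        elif flag == "-m":
            if value not in ("text", "db"):
                raise ValueError("mode must be 'text' or 'db'")
            options["mode"] = value
    return options
```

In a script this would typically be called as parse_args(sys.argv[1:]); an unknown option makes getopt raise getopt.GetoptError, which the caller can catch to print the help text.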

6. Default values:

threads		=>	"1"
keyword		=>	""
site file	=>	"./url.txt"
level		=>	"2"
rate		=>	"100"