If you know of other sites that offer free proxies, mention them here and I'll add them to the project #71
Comments
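@CokkyWoo Both of those sites are already included. |
@jhao104 Hello. I see that someone here has posted several proxy sites hosted outside the firewall, and they all look decent. Could you find time to add them? (I couldn't get them working myself: some have anti-scraping measures, and others open in a browser but request can't connect to them.) The project currently has only 3 proxy sources from outside the firewall, so it captures too few proxies. |
@jhao104 Do you have time to add these sites? |
You can work on the ones outside the firewall yourself first. |
Both of these sites have anti-scraping measures; I can't get past them. |
The content is simply generated dynamically by JS. Extract that piece of JS and run it with pyv8 or pyexecjs, and you'll get the data. |
The main problem is that I don't know JS, only a little Python. I tried pyexecjs for a while back then and couldn't get it working 😂
I'll try again when I have time. Thanks!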
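J_hao104 <notifications@github.com> wrote on Monday, November 25, 2019 at 12:29 PM:
… @jhao104 <https://github.com/jhao104> Do you have time to add these sites?
If you don't, I'll work it out on my own.
You can work on the ones outside the firewall yourself first.
Both of these sites have anti-scraping measures; I can't get past them.
When you have a moment, could you crack one of them so I can learn from it?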
http://free-proxy.cz/zh/proxylist/country/US/https/ping/all
http://proxydb.net/?protocol=https&anonlvl=4
@jhao104 <https://github.com/jhao104>
The content is simply generated dynamically by JS. Extract that piece of JS and run it with pyv8 or pyexecjs, and you'll get the data.
|
import re

# WebRequest is the project's own HTTP helper class; it is assumed to be in scope here.

@staticmethod
def proxyDBNet():
    """Yield ip:port strings scraped from proxydb.net result pages."""
    urls = [
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=CN',
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=',
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=SG',
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=US',
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=CZ',
        'http://proxydb.net/?protocol=https&anonlvl=4&min_uptime=75&max_response_time=5&country=AR',
    ]
    request = WebRequest()
    for url in urls:
        r = request.get(url, timeout=20)
        # Pull every ip:port pair out of the raw response body.
        proxies = re.findall(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+)', r.text)
        for proxy in proxies:
            yield proxy
|
Find this element. The content of its script tag defines a function; run that with pyv8. |
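The steps described here (locate the element, lift the function out of its script, evaluate it) might look roughly like the sketch below. It rests on assumptions: the sample HTML and the `getPort` function name are invented for illustration and are not taken from either site, and `call_js_function` needs the third-party PyExecJS package plus a JavaScript runtime such as Node.js.

```python
from html.parser import HTMLParser


class InlineScriptCollector(HTMLParser):
    """Collect the text of every inline <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def extract_inline_scripts(html):
    """Return the non-empty inline script bodies found in the HTML."""
    collector = InlineScriptCollector()
    collector.feed(html)
    collector.close()
    return [s.strip() for s in collector.scripts if s.strip()]


def call_js_function(js_source, func_name, *args):
    """Evaluate js_source and call one of its functions via PyExecJS."""
    import execjs  # third-party; imported lazily so extraction works without it

    return execjs.compile(js_source).call(func_name, *args)


# Hypothetical page fragment: the port is computed by a small JS function.
SAMPLE_HTML = """
<td><script>function getPort() { return 8000 + 80; }</script></td>
"""

scripts = extract_inline_scripts(SAMPLE_HTML)
```

With PyExecJS and a runtime installed, `call_js_function(scripts[0], "getPort")` should hand back the value the page would have computed in the browser.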
There is also https://github.com/scrapinghub/splash: hand the page over to it and you're done. I'm not sure whether it would hurt crawler throughput, though. |
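For context, Splash is driven over HTTP: a running instance exposes a `render.html` endpoint that returns the page after its JavaScript has executed. Below is a minimal sketch of building that request. The instance address `http://localhost:8050` and the `wait` value are assumptions, and `splash_render_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=2.0):
    """Build a Splash render.html URL that renders target_url with JS executed."""
    # `url` is the page to render; `wait` is how long Splash lets scripts run.
    query = urlencode({"url": target_url, "wait": wait})
    return "{}/render.html?{}".format(splash_base.rstrip("/"), query)


render_url = splash_render_url("http://proxydb.net/?protocol=https&anonlvl=4")
```

Fetching `render_url` with the project's usual HTTP helper would return rendered HTML that the existing regular expressions can be run against. Each page then costs an extra round trip plus the wait time, which is probably where the throughput concern comes from.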
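I see a lot of people asking for proxies from this site, and I happened to scrape it recently, so here is the code. It requires the scrapy package, mainly because I'm used to scrapy; any other package that can do XPath parsing would work too.
import base64
import re

import scrapy

# WebRequest is the project's own HTTP helper class; it is assumed to be in scope here.

@staticmethod
def freeProxy21():
    url = 'http://free-proxy.cz/en/proxylist'
    request = WebRequest()
    r = request.get(url, timeout=10)
    sel = scrapy.Selector(text=r.text)
    # The largest numeric link in the paginator is the page count.
    max_page = max([int(v) for v in sel.xpath('//div[@class="paginator"]/a/text()').extract() if v.isdigit()])
    for page in range(1, max_page + 1):
        r = request.get(url + '/main/{}'.format(page), timeout=10)
        sel = scrapy.Selector(text=r.text)
        # Each IP is written by an inline script as a Base64 string passed to decode("...").
        proxies = sel.xpath('//table[@id="proxy_list"]/tbody/tr/td/script[contains(text(),"decode")]/text()').extract()
        ports = sel.xpath('//table[@id="proxy_list"]/tbody/tr/td/span/text()').extract()
        for index, value in enumerate(proxies):
            try:
                proxy_ip = re.search(r'.*decode\("(.*)"\)', value).group(1)
                if proxy_ip:
                    proxy = '{}:{}'.format(base64.b64decode(proxy_ip).decode('utf-8'), ports[index])
                    yield proxy
            except Exception:
                # Skip rows that do not match the pattern or fail to decode.
                pass
|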
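The decoding step in the snippet above can be tried in isolation with only the standard library. This is a sketch under assumptions: the sample script text is made up to resemble a `decode("...")` call, and `decode_obfuscated_ip` is a hypothetical helper, not part of the project.

```python
import base64
import re

# Matches decode("...") and captures the quoted Base64 payload.
DECODE_CALL = re.compile(r'decode\("([^"]*)"\)')


def decode_obfuscated_ip(script_text):
    """Return the IP hidden in a decode("...") call, or None if absent or malformed."""
    match = DECODE_CALL.search(script_text)
    if match is None:
        return None
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(match.group(1), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


# Hypothetical script body; "MS4yLjMuNA==" is the Base64 encoding of "1.2.3.4".
sample = 'document.write(Base64.decode("MS4yLjMuNA=="))'
ip = decode_obfuscated_ip(sample)
```

Returning `None` for rows that don't match keeps the failure case explicit instead of relying on a broad `except`.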
@hanjackcyw Scrapy is a heavy dependency to pull in. If you only need Selector, you can use the library that Scrapy builds on underneath.
|
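This proxy source also looks decent: https://proxy.mimvp.com/freeopen |
Take a look at this one, the free list from Qingting (蜻蜓). Also, a question for the maintainer: when deploying with docker on a cloud server, should the redis in the command be changed to point to my own instance? |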
A new proxy list: http://pzzqz.com/ |
Added. |
Its free offering is pretty poor; the entries were last updated a long time ago.
https://openproxylist.xyz/http.txt How would I go about adding a source in this format? |
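A plain-text source like this one needs no HTML parsing, only line-by-line validation. The sketch below assumes the file holds one `ip:port` entry per line; `parse_proxy_list` is a hypothetical helper, and the sample text is invented rather than fetched from the site.

```python
import ipaddress
import re

# One IPv4-looking host, a colon, and a port of up to five ASCII digits.
ENTRY = re.compile(r"([0-9.]+):([0-9]{1,5})")


def parse_proxy_list(text):
    """Yield valid ip:port entries from a newline-separated plain-text list."""
    for line in text.splitlines():
        match = ENTRY.fullmatch(line.strip())
        if match is None:
            continue  # blank lines, comments, anything not shaped like ip:port
        host, port = match.group(1), int(match.group(2))
        try:
            ipaddress.IPv4Address(host)  # rejects octets above 255, wrong counts
        except ValueError:
            continue
        if 1 <= port <= 65535:
            yield "{}:{}".format(host, port)


# Hypothetical response body mixing valid and invalid lines.
sample_text = "1.2.3.4:8080\n\n# comment\n999.1.1.1:80\n5.6.7.8:70000\n 9.9.9.9:3128 \n"
parsed = list(parse_proxy_list(sample_text))
```

Inside the project, a new fetcher method could presumably download the file with `WebRequest` in the same way as the other snippets in this thread and yield from `parse_proxy_list(r.text)`.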
|
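There aren't many proxy sites at the moment, so the pool of usable proxy IPs is small. I've also tried scanning for proxies, but that approach is fairly inefficient.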