jenifly/ProxyPool

A concurrent proxy pool built on asyncio, aiohttp, and uvloop. It is designed to be concise, efficient, and easy to extend.

How to use

  
import re

from proxyPool.poxyPool import PoxyPool
from proxyPool.freeProxyGetter import FreeProxyGetter


class MyFreeProxyGetter(FreeProxyGetter):
    async def crawl_kuaidaili(self):
        return await self.get_proxies(
            [f'https://www.kuaidaili.com/free/inha/{page}/' for page in range(1, 2)],
            re.compile(r'<td data-title="IP">([\d\.]+?)</td>\s*<td data-title="PORT">(\w+)</td>'))

    async def crawl_xicidaili(self):
        return await self.get_proxies(
            [f'http://www.xicidaili.com/wt/{page}/' for page in range(1, 3)],
            re.compile(r'<td>([\d\.]+?)</td>\s*<td>(\d+?)</td>'))

    async def crawl_66ip(self):
        return await self.get_proxies(
            [f'http://www.66ip.cn/{page}.html' for page in range(1, 5)],
            re.compile(r'<td>([\d\.]+?)</td>\s*<td>(\d+?)</td>'))

    async def crawl_kxdaili(self):
        return await self.get_proxies(
            [f'http://www.kxdaili.com/dailiip/1/{page}.html' for page in range(1, 4)],
            re.compile(r'<td>([\d\.]+?)</td>\s*<td>(\d+?)</td>'))


if __name__ == "__main__":
    PoxyPool.start(MyFreeProxyGetter)

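Subclass FreeProxyGetter and implement async def crawl_xxx() methods; each method name must start with the crawl_ prefix. self.get_proxies() takes two arguments: the first is a sequence of URLs of proxy-listing pages, and the second is a regular expression.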
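As a rough illustration of the regular-expression argument, the sketch below runs the pattern from crawl_kuaidaili against a hand-written HTML fragment. It assumes that get_proxies() treats the two capture groups as IP and port, which this README does not state explicitly. The sample markup and the extract_proxies helper are hypothetical and are not part of ProxyPool.

```python
import re

# Pattern copied from crawl_kuaidaili: group 1 is the IP, group 2 is the port.
PATTERN = re.compile(
    r'<td data-title="IP">([\d\.]+?)</td>\s*<td data-title="PORT">(\w+)</td>')

# Hand-written sample markup (hypothetical, not fetched from the real site).
SAMPLE_HTML = """
<tr><td data-title="IP">1.2.3.4</td>
    <td data-title="PORT">8080</td></tr>
<tr><td data-title="IP">10.0.0.7</td>
    <td data-title="PORT">3128</td></tr>
"""


def extract_proxies(html, pattern):
    """Return 'ip:port' strings for every (ip, port) pair the pattern finds."""
    return [f"{ip}:{port}" for ip, port in pattern.findall(html)]


proxies = extract_proxies(SAMPLE_HTML, PATTERN)
```

Testing a new pattern this way against a saved copy of the target page is a cheap check before wiring it into a crawl_ method.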

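In the asyncio version, all methods run in a single event loop. If the proxy pool's upper limit is set fairly high, some systems (such as Windows) may run out of select resources, raising an exception that aborts the program. To avoid this, limit concurrency with asyncio.Semaphore.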
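A minimal sketch of capping concurrency with asyncio.Semaphore follows. The check_proxy coroutine, the MAX_CONCURRENCY value, and the stats dictionary are illustrative assumptions; they do not come from ProxyPool, and the sleep stands in for a real network request.

```python
import asyncio

MAX_CONCURRENCY = 50  # hypothetical cap; tune it for your platform


async def check_proxy(proxy, semaphore, stats):
    """Stand-in for a real proxy check; only the semaphore usage matters."""
    async with semaphore:  # at most `limit` coroutines run this body at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # placeholder for the real network request
        stats["active"] -= 1
        return proxy


async def check_all(proxies, limit=MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(check_proxy(p, semaphore, stats) for p in proxies))
    return results, stats["peak"]


# 20 simulated proxies, but never more than 5 checks in flight at a time.
results, peak = asyncio.run(
    check_all([f"10.0.0.{i}:80" for i in range(20)], limit=5))
```

All tasks are still scheduled up front; the semaphore only bounds how many are inside the guarded section, which is what keeps the number of open sockets down.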

Visit http://127.0.0.1:2345/get in a browser to get an available proxy.
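The same endpoint can also be queried from a script. The sketch below uses only the standard library and assumes the pool is already running locally; fetch_proxy is a hypothetical helper, and the exact format of the response body is not specified in this README, so it is returned as plain text.

```python
from urllib.request import urlopen

POOL_URL = "http://127.0.0.1:2345/get"  # address given in this README


def fetch_proxy(url=POOL_URL, timeout=5.0):
    """Return the body of the pool's /get endpoint as stripped text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


# Usage, once the pool is running:
#     proxy = fetch_proxy()
```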

You can also customize the HTTP endpoints by overriding the web method of the PoxyPool class in poxyPool.py. Pseudocode:

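from aiohttp import web

# Place this inside your PoxyPool subclass. `conn` is assumed to be the
# pool's storage object; it is not defined in this snippet.
@staticmethod
def web():
    app = web.Application()

    # aiohttp request handlers should be coroutines.
    async def get_proxy(request):
        return web.Response(text=str(conn.pop()))

    async def get_counts(request):
        return web.Response(text=str(conn.queue_len))

    app.add_routes([
        web.get('/get', get_proxy),
        web.get('/count', get_counts),
    ])
    return app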
