What you will find here: unlimited collecting, validating, and caching of free proxies. Collect from any endpoint, including text APIs, JSON APIs, and web pages, simply by adding URLs to the proxy_sources.txt file. Collecting (scraping), validation, and caching are handled automatically, and all of it runs very fast ✨
We support validating HTTP and HTTPS proxies (Socks4 and Socks5 are coming soon).
- ✨ Unique IP!: Ensure only proxies with unique IP addresses are returned.
- ⚡ Asynchronous Power: Scrape URLs and validate proxies asynchronously and at the same time, which keeps the processing time very short 🚀.
- 🧹 Scraping & Collecting: Extract proxies from the URLs listed in proxy_sources.txt using regular expressions for web pages, JSON, and text content.
- ✅ Validating: Validate proxies concurrently. We don't wait for all URLs to finish; validation starts as soon as each proxy is ready 💪.
- 💾 Caching: Optionally cache valid proxies and set a duration for automatic revalidation.
- 🐞 Monitoring: Track runtime details, including valid/invalid proxies, scraping status, source-specific proxy counts, and errors.
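To illustrate the scraping and unique-IP features listed above, here is a minimal sketch of regex-based extraction. It is not the project's actual code: the pattern `PROXY_RE` and the helper `extract_proxies` are hypothetical names chosen for this example, and the real module may use a different expression.

```python
import re

# Hypothetical pattern: an IPv4 address, a colon, then a 1-5 digit port.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def extract_proxies(text: str) -> list[str]:
    """Return ip:port strings found in text, keeping one proxy per IP."""
    seen_ips = set()
    proxies = []
    for ip, port in PROXY_RE.findall(text):
        # Skip matches that cannot be real addresses or ports.
        if any(int(octet) > 255 for octet in ip.split(".")):
            continue
        if not 0 < int(port) <= 65535:
            continue
        # Same IP seen before (possibly on another port): keep the first only.
        if ip in seen_ips:
            continue
        seen_ips.add(ip)
        proxies.append(f"{ip}:{port}")
    return proxies


# Works on HTML fragments, plain text, and URL-style entries alike.
sample = "<td>10.0.0.1:8080</td> 10.0.0.1:3128 http://192.168.1.5:80 999.1.1.1:80"
print(extract_proxies(sample))
```

Because the same expression runs over any text body, one extractor can serve web pages and text APIs; JSON sources with separate IP and port fields need the extra configuration described further down.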
Here's a basic example without any options or configuration:
import asyncio
from get_proxy import ProxyFetcher  # import the module

async def main():
    async with ProxyFetcher() as proxy_fetcher:
        valid_proxies = await proxy_fetcher.get_valid_proxies()
        # process proxies as you want
        print(valid_proxies)

asyncio.run(main())
Let's enable proxy caching and set the cache duration to 5 minutes.
Cached proxies are reused as long as the cache is valid; once it expires, they are revalidated.
import asyncio
from get_proxy import ProxyFetcher, ProxyConfig

async def main():
    config = ProxyConfig(
        cache_enabled=True,
        enforce_unique_ip=False,
        cache_duration_minutes=5,
    )
    proxy_fetcher = ProxyFetcher(config)
    proxies = await proxy_fetcher.get_valid_proxies()
    print(proxies)
    # close the fetcher when you are done
    await proxy_fetcher.close()

if __name__ == "__main__":
    asyncio.run(main())
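For readers curious how a time-based cache like this could behave, here is a rough sketch. It assumes freshness is judged by the cache file's modification time, which may not match what the module does internally; `cache_is_fresh` and `load_cached_proxies` are hypothetical helpers, not part of get_proxy.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def cache_is_fresh(cache_file: Path, duration_minutes: float,
                   now: Optional[float] = None) -> bool:
    """True if the file exists and is younger than duration_minutes."""
    if not cache_file.exists():
        return False
    now = time.time() if now is None else now
    age_seconds = now - cache_file.stat().st_mtime
    return age_seconds < duration_minutes * 60


def load_cached_proxies(cache_file: Path, duration_minutes: float,
                        now: Optional[float] = None) -> Optional[list[str]]:
    """Return cached proxies, or None when the cache is missing or stale."""
    if not cache_is_fresh(cache_file, duration_minutes, now):
        return None  # the caller would scrape and revalidate here
    lines = cache_file.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "proxy_cache.txt"
    cache.write_text("http://10.0.0.1:8080\nhttp://10.0.0.2:3128\n")
    fresh = load_cached_proxies(cache, duration_minutes=5)
    # Pretend ten minutes have passed: the 5-minute cache is now stale.
    stale = load_cached_proxies(cache, duration_minutes=5, now=time.time() + 600)
    print(fresh, stale)
```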
We handle various types of content: Webpages, JSON APIs, and Text APIs.
- https://api.proxyscrape.com/v3/free-proxy-list/get?request=displayproxies&protocol=http&proxy_format=protocolipport&format=text&timeout=20000
- https://spys.me/proxy.txt
JSON sources might provide IP and port numbers in different fields. Here’s how to configure them:
- Add a URL to your proxy sources file.
- Add the following after the URL:
  json=true&ip=<ip_field>&port=<port_field>
- Replace <ip_field> with the key for the IP address.
- Replace <port_field> with the key for the port number.
- Make sure there is a space between the URL and the parameters.
Example:
If your JSON response looks like this:
[
  {
    "IP": "314.235.43.2",
    "PORT": "80",
    "foo": "bar"
  },
  {"..."},
]
And your URL is http://example.com/api/free-proxy?format=json, you should write:
http://example.com/api/free-proxy?format=json json=true&ip=IP&port=PORT
INFO: Ensure there is a space between the URL and the parameters.
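The source-line format above can be read with the standard library alone. The sketch below is an illustration under assumptions, not the module's own parser: `parse_source_line` and `proxies_from_json` are hypothetical names, and the sample response body is made up for the demo.

```python
import json
from urllib.parse import parse_qs


def parse_source_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'URL [params]' on the first space and decode the params."""
    url, _, raw_params = line.strip().partition(" ")
    decoded = parse_qs(raw_params.strip())
    params = {key: values[0] for key, values in decoded.items()}
    return url, params


def proxies_from_json(body: str, ip_key: str, port_key: str) -> list[str]:
    """Build ip:port strings from a JSON array of objects."""
    proxies = []
    for item in json.loads(body):
        # Ignore entries that lack either configured field.
        if isinstance(item, dict) and ip_key in item and port_key in item:
            proxies.append(f"{item[ip_key]}:{item[port_key]}")
    return proxies


line = "http://example.com/api/free-proxy?format=json json=true&ip=IP&port=PORT"
url, params = parse_source_line(line)

# Made-up response body with the same field names as the example above.
body = '[{"IP": "203.0.113.7", "PORT": "80", "foo": "bar"}, {"foo": "baz"}]'
if params.get("json") == "true":
    print(url, proxies_from_json(body, params["ip"], params["port"]))
```

Splitting on the first space is what makes the space between the URL and the parameters mandatory: without it, the parameters would be treated as part of the URL.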
Requirements📋:
- aiohttp

- Clone the repo and navigate to the working directory:
    git clone https://github.com/abdelrahman-mh/get-proxy
    cd get-proxy
- Set up the working directory:
    # create a Python venv (optional) and activate it
    python3 -m venv .venv && source .venv/bin/activate
    # install requirements
    pip install -r requirements.txt
- Try it:
    python3 get_proxy.py
ProxyFetcher(config: ProxyConfig = ProxyConfig())
Options
- config: a ProxyConfig instance (default: ProxyConfig()).
Methods
- ProxyFetcher.get_valid_proxies() -> list[str]: returns a list of valid proxies, ready to use.
  - Asynchronous; must be called with the await keyword.
ProxyConfig(
    prefix: str = "http://",
    user_agent: str = "Mozil...",
    ip_check_api: str = "http://httpbin.org/ip",
    request_timeout: int = 15,
    retry: int = 0,
    concurrency_limit: int = 500,
    proxy_sources_file: str = "proxy_sources.txt",
    proxy_cache_file: str = "proxy_cache.txt",
    cache_enabled: bool = False,
    cache_duration_minutes: int = 20,
    enforce_unique_ip: bool = True,
    strict_x_forwarded_for: bool = False
)
Options
- prefix: Proxy URL prefix (default: "http://").
- user_agent: User-agent string (default: "Mozil...").
- ip_check_api: API for the public IP check and proxy validation (default: "http://httpbin.org/ip").
- request_timeout: Timeout for proxy validity checks (default: 15 seconds).
- retry: Number of retries for failed proxy requests (default: 0).
- concurrency_limit: Maximum number of concurrent proxy validation requests (default: 500).
- proxy_sources_file: File containing proxy source URLs (default: "proxy_sources.txt").
- proxy_cache_file: File for storing cached proxies (default: "proxy_cache.txt").
- cache_enabled: Whether to enable caching (default: False).
- cache_duration_minutes: Duration for caching proxies (default: 20 minutes).
- enforce_unique_ip: Ensure each proxy has a unique IP (default: True).
- strict_x_forwarded_for: Enforce strict handling of X-Forwarded-For headers, since some proxies do not really hide your IP (default: False).
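As a rough picture of what an option like concurrency_limit controls, the sketch below caps in-flight checks with asyncio.Semaphore. This is one common approach and an assumption about the technique, not a description of the module's internals; `fake_check` stands in for a real network check and `validate_all` is a hypothetical helper.

```python
import asyncio


async def fake_check(proxy: str) -> bool:
    """Stand-in for a real network check: 'valid' if the port is even."""
    await asyncio.sleep(0.01)
    return int(proxy.rsplit(":", 1)[1]) % 2 == 0


async def validate_all(proxies: list[str], concurrency_limit: int):
    """Check every proxy, never running more than concurrency_limit at once."""
    semaphore = asyncio.Semaphore(concurrency_limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous checks observed

    async def guarded(proxy: str):
        nonlocal in_flight, peak
        async with semaphore:  # waits here when the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                ok = await fake_check(proxy)
            finally:
                in_flight -= 1
        return proxy if ok else None

    results = await asyncio.gather(*(guarded(p) for p in proxies))
    return [p for p in results if p], peak


proxies = [f"10.0.0.{i}:{8000 + i}" for i in range(20)]
valid, peak = asyncio.run(validate_all(proxies, concurrency_limit=5))
print(len(valid), "valid; peak concurrency:", peak)
```

A lower limit is gentler on the network and on the IP check API, while a higher one finishes large lists sooner.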
PRs are welcome!
- Add an option to limit the number of working proxies returned.
- Design Patterns:
- Use caching to store configurations during initialization, avoiding repeated checks at runtime.
- Consider patterns like Strategy or Factory to manage varying behaviors based on configuration.
- Implement a method for handling proxy limits and use asyncio.as_completed() for processing results as they finish, instead of asyncio.gather().
- Apply these patterns to improve configuration handling for options like enforce_unique_ip and cache_enabled.
- Socks 4 & 5: Add support for Socks4 and Socks5 proxies.
- Separate proxy scraping and validating
- Add type annotations and hints to the code.
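The proxy-limit idea from the list above could look roughly like the following. It is only a sketch of the proposed direction, with `fake_check` simulating validation and `first_n_valid` as a hypothetical helper: results are consumed with asyncio.as_completed() as they finish, and the remaining checks are cancelled once enough valid proxies have been collected.

```python
import asyncio


async def fake_check(proxy: str, delay: float, ok: bool):
    """Simulated validation: returns the proxy if valid, otherwise None."""
    await asyncio.sleep(delay)
    return proxy if ok else None


async def first_n_valid(checks, limit: int) -> list[str]:
    """Collect up to `limit` valid proxies in completion order."""
    tasks = [asyncio.ensure_future(check) for check in checks]
    valid = []
    try:
        for finished in asyncio.as_completed(tasks):
            proxy = await finished
            if proxy is not None:
                valid.append(proxy)
                if len(valid) >= limit:
                    break  # enough proxies: stop waiting for the rest
    finally:
        # Cancel whatever is still running and let the cancellations settle.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return valid


async def demo() -> list[str]:
    checks = [
        fake_check("10.0.0.1:80", 0.1, True),
        fake_check("10.0.0.2:80", 0.01, False),
        fake_check("10.0.0.3:80", 0.05, True),
        fake_check("10.0.0.4:80", 5.0, True),  # slow; cancelled before it ends
    ]
    return await first_n_valid(checks, limit=2)


result = asyncio.run(demo())
print(result)
```

With asyncio.gather() the caller would wait for the slowest check before seeing any result; consuming results as they complete lets the function return as soon as the limit is met.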