Proxy Configuration
Proxies are essential for professional web automation, enabling you to bypass rate limits, access geo-restricted content, and maintain anonymity. Pydoll provides native proxy support with automatic authentication handling.
Related Documentation
- Browser Options - Command-line proxy arguments
- Request Interception - How proxy authentication works internally
- Stealth Automation - Combine proxies with anti-detection
- Proxy Architecture Deep Dive - Network fundamentals, protocols, security, and building your own proxy
Why Use Proxies?
Proxies provide critical capabilities for automation:
| Benefit | Description | Use Case |
|---|---|---|
| IP Rotation | Distribute requests across multiple IPs | Avoid rate limits, scrape at scale |
| Geographic Access | Access region-locked content | Test geo-targeted features, bypass restrictions |
| Anonymity | Hide your real IP address | Privacy-focused automation, competitor analysis |
| Load Distribution | Spread traffic across multiple endpoints | High-volume scraping, stress testing |
| Ban Avoidance | Prevent permanent IP bans | Long-running automation, aggressive scraping |
When to Use Proxies
Always use proxies for:
- Production web scraping (>100 requests/hour)
- Accessing geo-restricted content
- Bypassing rate limits or IP-based blocks
- Testing from different regions
- Maintaining anonymity
You may skip proxies for:
- Local development and testing
- Internal/corporate automation
- Low-volume automation (<50 requests/day)
- When scraping your own infrastructure
Proxy Types
Different proxy protocols serve different purposes:
| Type | Port | Authentication | Speed | Security | Use Case |
|---|---|---|---|---|---|
| HTTP | 80, 8080 | Optional | Fast | Low | Basic web scraping, non-sensitive data |
| HTTPS | 443, 8443 | Optional | Fast | Medium | Secure web scraping, encrypted traffic |
| SOCKS5 | 1080, 1081 | Optional | Medium | High | Full TCP/UDP support, advanced use cases |
HTTP/HTTPS Proxies
Standard web proxies, ideal for most automation tasks:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def http_proxy_example():
    options = ChromiumOptions()

    # HTTP proxy (unencrypted)
    options.add_argument('--proxy-server=http://proxy.example.com:8080')

    # Or HTTPS proxy (encrypted)
    # options.add_argument('--proxy-server=https://proxy.example.com:8443')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # All traffic goes through the proxy
        await tab.go_to('https://httpbin.org/ip')

        # Verify the proxy IP
        ip = await tab.execute_script('return document.body.textContent')
        print(f"Current IP: {ip}")

asyncio.run(http_proxy_example())
```
Pros:
- Fast and efficient
- Wide support across services
- Easy to configure
Cons:
- HTTP: No encryption (traffic visible to proxy)
- Can be detected more easily than SOCKS5
SOCKS5 Proxies
Advanced proxies with full TCP/UDP support:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def socks5_proxy_example():
    options = ChromiumOptions()

    # SOCKS5 proxy
    options.add_argument('--proxy-server=socks5://proxy.example.com:1080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://httpbin.org/ip')

asyncio.run(socks5_proxy_example())
```
Pros:
- Protocol-agnostic (works with any TCP/UDP traffic)
- Better for advanced use cases (WebSockets, WebRTC)
- More stealthy (harder to detect)
Cons:
- Slightly slower than HTTP/HTTPS
- Less common in free/cheap proxy services
SOCKS4 vs SOCKS5
SOCKS5 is recommended over SOCKS4 because it:
- Supports authentication (username/password)
- Handles UDP traffic (for WebRTC, DNS, etc.)
- Provides better error handling
Use `socks5://` unless you specifically need SOCKS4 (`socks4://`).
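For reference, here is how the two schemes look as `--proxy-server` arguments (hosts and credentials are placeholders):

```python
from pydoll.browser.options import ChromiumOptions

options = ChromiumOptions()

# SOCKS5: credentials can be embedded directly in the URL
options.add_argument('--proxy-server=socks5://user:pass@proxy.example.com:1080')

# SOCKS4: host and port only; the protocol has no credential support
# options.add_argument('--proxy-server=socks4://proxy.example.com:1080')
```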
Authenticated Proxies
Pydoll automatically handles proxy authentication without manual intervention.
How Authentication Works
When you provide credentials in the proxy URL, Pydoll:
- Intercepts the authentication challenge using the Fetch domain
- Automatically responds with credentials
- Continues navigation seamlessly

This happens transparently; you don't need to handle authentication manually!
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def authenticated_proxy_example():
    options = ChromiumOptions()

    # Proxy with authentication (username:password)
    options.add_argument('--proxy-server=http://user:pass@proxy.example.com:8080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Authentication handled automatically!
        await tab.go_to('https://example.com')
        print("Connected through authenticated proxy")

asyncio.run(authenticated_proxy_example())
```
Credential Format
Include credentials directly in the proxy URL:
- HTTP: `http://username:password@host:port`
- HTTPS: `https://username:password@host:port`
- SOCKS5: `socks5://username:password@host:port`
Pydoll automatically extracts and uses these credentials.
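Conceptually, the extraction is plain URL parsing. A minimal sketch with Python's standard `urllib.parse` (illustrative only; this is not Pydoll's internal code):

```python
from urllib.parse import urlparse

proxy_url = 'socks5://username:password@proxy.example.com:1080'
parsed = urlparse(proxy_url)

print(parsed.scheme)    # 'socks5'
print(parsed.username)  # 'username'
print(parsed.password)  # 'password'
print(parsed.hostname)  # 'proxy.example.com'
print(parsed.port)      # 1080
```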
Authentication Implementation Details
Pydoll uses Chrome's Fetch domain at the browser level to intercept and handle authentication challenges:
```python
# This is handled internally by Pydoll.
# You don't need to write this code!
async def _handle_proxy_auth(event):
    """Pydoll's internal proxy authentication handler."""
    if event['params']['authChallenge']['source'] == 'Proxy':
        await browser.continue_request_with_auth(
            request_id=event['params']['requestId'],
            username='user',
            password='pass'
        )
```
Under the Hood
For technical details on how Pydoll intercepts and handles proxy authentication, see:
- Request Interception - Fetch domain and request handling
- Event System - Event-driven authentication
Fetch Domain Conflicts
When using authenticated proxies + tab-level request interception, be aware:
- Pydoll enables Fetch at the Browser level for proxy auth
- If you enable Fetch at the Tab level, they share the same domain
- Solution: Call `tab.go_to()` once before enabling tab-level interception
```python
async with Chrome(options=options) as browser:
    tab = await browser.start()

    # 1. First navigation triggers proxy auth (Browser-level Fetch)
    await tab.go_to('https://example.com')

    # 2. Then enable tab-level interception safely
    await tab.enable_fetch_events()
    await tab.on('Fetch.requestPaused', my_interceptor)

    # 3. Continue with your automation
    await tab.go_to('https://example.com/page2')
```
See Request Interception - Proxy + Interception for details.
Proxy Bypass List
Exclude specific domains from using the proxy:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def proxy_bypass_example():
    options = ChromiumOptions()

    # Use the proxy for most traffic
    options.add_argument('--proxy-server=http://proxy.example.com:8080')

    # But bypass the proxy for these domains
    options.add_argument('--proxy-bypass-list=localhost,127.0.0.1,*.local,internal.company.com')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Uses the proxy
        await tab.go_to('https://external-site.com')

        # Bypasses the proxy (direct connection)
        await tab.go_to('http://localhost:8000')
        await tab.go_to('http://internal.company.com')

asyncio.run(proxy_bypass_example())
```
Bypass list patterns:
| Pattern | Matches | Example |
|---|---|---|
| `localhost` | Localhost only | `http://localhost` |
| `127.0.0.1` | Loopback IP | `http://127.0.0.1` |
| `*.local` | All `.local` domains | `http://server.local` |
| `internal.company.com` | Specific domain | `http://internal.company.com` |
| `192.168.1.*` | IP range | `http://192.168.1.100` |
When to Use Bypass List
Bypass proxy for:
- Local development servers (`localhost`, `127.0.0.1`)
- Internal company resources (VPN, intranet)
- Testing environments (`.local`, `.test` domains)
- High-bandwidth resources (when the proxy is slow)
PAC (Proxy Auto-Config)
Use a PAC file for complex proxy routing rules:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def pac_proxy_example():
    options = ChromiumOptions()

    # Load the PAC file from a URL
    options.add_argument('--proxy-pac-url=http://proxy.example.com/proxy.pac')

    # Or use a local PAC file
    # options.add_argument('--proxy-pac-url=file:///path/to/proxy.pac')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://example.com')

asyncio.run(pac_proxy_example())
```
Example PAC file:
```javascript
function FindProxyForURL(url, host) {
    // Direct connection for local addresses
    if (isInNet(host, "192.168.0.0", "255.255.0.0") ||
        isInNet(host, "127.0.0.0", "255.0.0.0")) {
        return "DIRECT";
    }

    // Use a specific proxy for certain domains
    if (dnsDomainIs(host, ".example.com")) {
        return "PROXY proxy1.example.com:8080";
    }

    // Default proxy for everything else
    return "PROXY proxy2.example.com:8080";
}
```
PAC File Use Cases
PAC files are useful for:
- Complex routing rules (domain-based, IP-based)
- Proxy failover (try multiple proxies in order; see the sketch after this list)
- Load balancing (distribute across proxy pool)
- Enterprise environments (centralized proxy management)
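Failover comes for free in PAC: a semicolon-separated return value is tried in order. If you want to experiment without hosting the file, one option is to write it to a temporary file and load it via a `file://` URL. A sketch (proxy hosts are placeholders):

```python
import asyncio
import tempfile
from pathlib import Path

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

# Semicolon-separated alternatives are tried in order (failover)
PAC_CONTENT = """
function FindProxyForURL(url, host) {
    return "PROXY proxy1.example.com:8080; PROXY proxy2.example.com:8080; DIRECT";
}
"""

async def local_pac_example():
    # Write the PAC file somewhere Chrome can read it
    pac_path = Path(tempfile.gettempdir()) / 'proxy.pac'
    pac_path.write_text(PAC_CONTENT)

    options = ChromiumOptions()
    options.add_argument(f'--proxy-pac-url={pac_path.as_uri()}')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://example.com')

asyncio.run(local_pac_example())
```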
Rotating Proxies
Rotate through multiple proxies for better distribution:
```python
import asyncio
from itertools import cycle
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def rotating_proxy_example():
    # List of proxies
    proxies = [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
        'http://user:pass@proxy3.example.com:8080',
    ]

    # Cycle through the proxies
    proxy_pool = cycle(proxies)

    # Scrape multiple URLs with different proxies
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3',
    ]

    for url in urls:
        # Get the next proxy
        proxy = next(proxy_pool)

        # Configure options with this proxy
        options = ChromiumOptions()
        options.add_argument(f'--proxy-server={proxy}')

        # Use the proxy for this browser instance
        async with Chrome(options=options) as browser:
            tab = await browser.start()
            await tab.go_to(url)
            title = await tab.execute_script('return document.title')
            print(f"[{proxy.split('@')[1]}] {url}: {title}")

asyncio.run(rotating_proxy_example())
```
Proxy Rotation Strategies
Per-browser rotation (above):
- Each browser instance uses a different proxy
- Best for isolation and avoiding session conflicts; a retry/failover variant is sketched below
Per-request rotation:
- More complex, requires request interception
- See Request Interception for implementation
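A common refinement of per-browser rotation is to combine it with retries: if one proxy fails, the same URL is retried through the next proxy in the pool. A minimal sketch reusing only the APIs shown above (proxy URLs are placeholders):

```python
import asyncio
from itertools import cycle

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

PROXY_POOL = cycle([
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
])

async def fetch_with_failover(url: str, max_attempts: int = 3) -> str:
    """Try a URL through successive proxies until one succeeds."""
    last_error = None
    for _ in range(max_attempts):
        proxy = next(PROXY_POOL)
        options = ChromiumOptions()
        options.add_argument(f'--proxy-server={proxy}')
        try:
            async with Chrome(options=options) as browser:
                tab = await browser.start()
                await tab.go_to(url, timeout=15)
                return await tab.execute_script('return document.title')
        except Exception as error:  # retry on any proxy failure
            last_error = error
    raise RuntimeError(f'All {max_attempts} attempts failed for {url}') from last_error

print(asyncio.run(fetch_with_failover('https://example.com')))
```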
Residential vs Datacenter Proxies
Understanding proxy types helps you choose the right service:
| Feature | Residential | Datacenter |
|---|---|---|
| IP Source | Real residential ISPs | Data centers |
| Legitimacy | High (real users) | Low (known ranges) |
| Detection Risk | Very low | High |
| Speed | Medium (150-500ms) | Very fast (<50ms) |
| Cost | Expensive ($5-15/GB) | Cheap ($0.10-1/GB) |
| Best For | Anti-bot sites, e-commerce | APIs, internal tools |
Residential Proxies
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def residential_proxy_example():
    """Use a residential proxy for anti-bot sites."""
    options = ChromiumOptions()

    # Residential proxy with a high trust score
    options.add_argument('--proxy-server=http://user:pass@residential.proxy.com:8080')

    # Combine with stealth options
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Access a protected site
        await tab.go_to('https://protected-site.com')
        print("Successfully accessed through residential proxy")

asyncio.run(residential_proxy_example())
```
When to use Residential:
- Sites with strong anti-bot protection (Cloudflare, DataDome)
- E-commerce scraping (Amazon, eBay, etc.)
- Social media automation
- Financial services
- Any site that actively blocks datacenter IPs
Datacenter Proxies
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def datacenter_proxy_example():
    """Use a fast datacenter proxy for APIs and unprotected sites."""
    options = ChromiumOptions()

    # Fast datacenter proxy
    options.add_argument('--proxy-server=http://user:pass@datacenter.proxy.com:8080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Fast API scraping
        await tab.go_to('https://api.example.com/data')

asyncio.run(datacenter_proxy_example())
```
When to use Datacenter:
- Public APIs without rate limits
- Internal/corporate automation
- Sites without anti-bot measures
- High-volume, speed-critical scraping
- Development and testing
Proxy Quality Matters
Bad proxies cause more problems than they solve:
- Slow response times (timeouts)
- Connection failures (error rates)
- Blacklisted IPs (immediate bans)
- Leaked real IP (privacy breach)
Invest in quality proxies from reputable providers. Free proxies are almost never worth it.
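One quick way to catch the "leaked real IP" failure mode is to compare the IP reported with and without the proxy, using the same httpbin endpoint as the earlier examples. A sketch (the proxy URL is a placeholder):

```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def get_ip(options: ChromiumOptions) -> str:
    """Return the IP address httpbin sees for this browser configuration."""
    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://httpbin.org/ip')
        return await tab.execute_script('return document.body.textContent')

async def check_for_ip_leak():
    # Baseline: direct connection, no proxy
    direct_ip = await get_ip(ChromiumOptions())

    # Same check through the proxy
    proxy_options = ChromiumOptions()
    proxy_options.add_argument('--proxy-server=http://user:pass@proxy.example.com:8080')
    proxy_ip = await get_ip(proxy_options)

    if direct_ip == proxy_ip:
        print('[WARNING] The proxy is not masking your IP!')
    else:
        print(f'[OK] Direct IP differs from proxy IP:\n{direct_ip}\n{proxy_ip}')

asyncio.run(check_for_ip_leak())
```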
Testing Your Proxy
Verify proxy configuration before running production automation:
```python
import asyncio
import time

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def test_proxy():
    """Test proxy connection and configuration."""
    proxy_url = 'http://user:pass@proxy.example.com:8080'

    options = ChromiumOptions()
    options.add_argument(f'--proxy-server={proxy_url}')

    try:
        async with Chrome(options=options) as browser:
            tab = await browser.start()

            # Test 1: Connection
            print("Testing proxy connection...")
            await tab.go_to('https://httpbin.org/ip', timeout=10)

            # Test 2: IP verification
            print("Verifying proxy IP...")
            ip_response = await tab.execute_script('return document.body.textContent')
            print(f"[OK] Proxy IP: {ip_response}")

            # Test 3: Geographic location (if available)
            await tab.go_to('https://ipapi.co/json/')
            geo_data = await tab.execute_script('return document.body.textContent')
            print(f"[OK] Geographic data: {geo_data}")

            # Test 4: Speed test
            start = time.time()
            await tab.go_to('https://example.com')
            load_time = time.time() - start
            print(f"[OK] Load time: {load_time:.2f}s")

            if load_time > 5:
                print("[WARNING] Slow proxy response time")

            print("\n[SUCCESS] All proxy tests passed!")

    except asyncio.TimeoutError:
        print("[ERROR] Proxy connection timeout")
    except Exception as e:
        print(f"[ERROR] Proxy test failed: {e}")

asyncio.run(test_proxy())
```
Further Reading
- Proxy Architecture Deep Dive - Network fundamentals, TCP/UDP, HTTP/2/3, SOCKS5 internals, security analysis, and building your own proxy server
- Browser Options - Command-line arguments and configuration
- Request Interception - How proxy authentication works
- Browser Preferences - Stealth and fingerprinting
- Contexts - Using different proxies per context
Start Simple
Begin with a simple proxy setup, test thoroughly, then add complexity (rotation, retry logic, monitoring) as needed. Quality proxies are more important than complex rotation strategies.
For those interested in understanding proxies at a deeper level, the Proxy Architecture Deep Dive provides comprehensive coverage of network protocols, security considerations, and even guides you through building your own proxy server.