Proxy Configuration
Proxies are essential for professional web automation, enabling you to bypass rate limits, access geo-restricted content, and maintain anonymity. Pydoll provides native proxy support with automatic authentication handling.
Related Documentation
- Browser Options - Command-line proxy arguments
- Request Interception - How proxy authentication works internally
- Stealth Automation - Combine proxies with anti-detection
- Proxy Architecture Deep Dive - Network fundamentals, protocols, security, and building your own proxy
Why Use Proxies?
Proxies provide critical capabilities for automation:
| Benefit | Description | Use Case |
|---|---|---|
| IP Rotation | Distribute requests across multiple IPs | Avoid rate limits, scrape at scale |
| Geographic Access | Access region-locked content | Test geo-targeted features, bypass restrictions |
| Anonymity | Hide your real IP address | Privacy-focused automation, competitor analysis |
| Load Distribution | Spread traffic across multiple endpoints | High-volume scraping, stress testing |
| Ban Avoidance | Prevent permanent IP bans | Long-running automation, aggressive scraping |
When to Use Proxies
Always use proxies for:
- Production web scraping (>100 requests/hour)
- Accessing geo-restricted content
- Bypassing rate limits or IP-based blocks
- Testing from different regions
- Maintaining anonymity
You may skip proxies for:
- Local development and testing
- Internal/corporate automation
- Low-volume automation (<50 requests/day)
- When scraping your own infrastructure
Proxy Types
Different proxy protocols serve different purposes:
| Type | Port | Authentication | Speed | Security | Use Case |
|---|---|---|---|---|---|
| HTTP | 80, 8080 | Optional | Fast | Low | Basic web scraping, non-sensitive data |
| HTTPS | 443, 8443 | Optional | Fast | Medium | Secure web scraping, encrypted traffic |
| SOCKS5 | 1080, 1081 | Optional | Medium | High | Full TCP/UDP support, advanced use cases |
HTTP/HTTPS Proxies
Standard web proxies, ideal for most automation tasks:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def http_proxy_example():
    options = ChromiumOptions()

    # HTTP proxy (unencrypted)
    options.add_argument('--proxy-server=http://proxy.example.com:8080')

    # Or HTTPS proxy (encrypted)
    # options.add_argument('--proxy-server=https://proxy.example.com:8443')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # All traffic goes through the proxy
        await tab.go_to('https://httpbin.org/ip')

        # Verify the proxy IP
        ip = await tab.execute_script('return document.body.textContent')
        print(f"Current IP: {ip}")

asyncio.run(http_proxy_example())
```
Pros:
- Fast and efficient
- Wide support across services
- Easy to configure
Cons:
- HTTP: No encryption (traffic visible to proxy)
- Can be detected more easily than SOCKS5
SOCKS5 Proxies
Advanced proxies with full TCP/UDP support:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def socks5_proxy_example():
    options = ChromiumOptions()

    # SOCKS5 proxy
    options.add_argument('--proxy-server=socks5://proxy.example.com:1080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://httpbin.org/ip')

asyncio.run(socks5_proxy_example())
```
Pros:
- Protocol-agnostic (works with any TCP/UDP traffic)
- Better for advanced use cases (WebSockets, WebRTC)
- More stealthy (harder to detect)
Cons:
- Slightly slower than HTTP/HTTPS
- Less common in free/cheap proxy services
SOCKS4 vs SOCKS5
SOCKS5 is recommended over SOCKS4 because it:
- Supports authentication (username/password)
- Handles UDP traffic (for WebRTC, DNS, etc.)
- Provides better error handling
Use `socks5://` unless you specifically need SOCKS4 (`socks4://`).
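For reference, here is how the two schemes look as `--proxy-server` arguments (hosts and credentials are placeholders):

```python
from pydoll.browser.options import ChromiumOptions

options = ChromiumOptions()

# SOCKS5: credentials can be embedded directly in the URL
options.add_argument('--proxy-server=socks5://user:pass@proxy.example.com:1080')

# SOCKS4: host and port only; the protocol has no credential support
# options.add_argument('--proxy-server=socks4://proxy.example.com:1080')
```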
Authenticated Proxies
Pydoll automatically handles proxy authentication without manual intervention.
How Authentication Works
When you provide credentials in the proxy URL, Pydoll:
- Intercepts the authentication challenge using the Fetch domain
- Automatically responds with credentials
- Continues navigation seamlessly

This happens transparently; you don't need to handle authentication manually!
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def authenticated_proxy_example():
    options = ChromiumOptions()

    # Proxy with authentication (username:password)
    options.add_argument('--proxy-server=http://user:pass@proxy.example.com:8080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Authentication handled automatically!
        await tab.go_to('https://example.com')
        print("Connected through authenticated proxy")

asyncio.run(authenticated_proxy_example())
```
Credential Format
Include credentials directly in the proxy URL:
- HTTP: `http://username:password@host:port`
- HTTPS: `https://username:password@host:port`
- SOCKS5: `socks5://username:password@host:port`
Pydoll automatically extracts and uses these credentials.
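Conceptually, the extraction is plain URL parsing. A minimal sketch with Python's standard `urllib.parse` (illustrative only; this is not Pydoll's internal code):

```python
from urllib.parse import urlparse

proxy_url = 'socks5://username:password@proxy.example.com:1080'
parsed = urlparse(proxy_url)

print(parsed.scheme)    # 'socks5'
print(parsed.username)  # 'username'
print(parsed.password)  # 'password'
print(parsed.hostname)  # 'proxy.example.com'
print(parsed.port)      # 1080
```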
Authentication Implementation Details
Pydoll uses Chrome's Fetch domain at the browser level to intercept and handle authentication challenges:
```python
# This is handled internally by Pydoll.
# You don't need to write this code!
async def _handle_proxy_auth(event):
    """Pydoll's internal proxy authentication handler."""
    if event['params']['authChallenge']['source'] == 'Proxy':
        await browser.continue_request_with_auth(
            request_id=event['params']['requestId'],
            username='user',
            password='pass'
        )
```
Under the Hood
For technical details on how Pydoll intercepts and handles proxy authentication, see:
- Request Interception - Fetch domain and request handling
- Event System - Event-driven authentication
Fetch Domain Conflicts
When using authenticated proxies + tab-level request interception, be aware:
- Pydoll enables Fetch at the Browser level for proxy auth
- If you enable Fetch at the Tab level, they share the same domain
- Solution: Call `tab.go_to()` once before enabling tab-level interception
```python
async with Chrome(options=options) as browser:
    tab = await browser.start()

    # 1. First navigation triggers proxy auth (Browser-level Fetch)
    await tab.go_to('https://example.com')

    # 2. Then enable tab-level interception safely
    await tab.enable_fetch_events()
    await tab.on('Fetch.requestPaused', my_interceptor)

    # 3. Continue with your automation
    await tab.go_to('https://example.com/page2')
```
See Request Interception - Proxy + Interception for details.
Proxy Bypass List
Exclude specific domains from using the proxy:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def proxy_bypass_example():
    options = ChromiumOptions()

    # Use the proxy for most traffic
    options.add_argument('--proxy-server=http://proxy.example.com:8080')

    # But bypass the proxy for these domains
    options.add_argument('--proxy-bypass-list=localhost,127.0.0.1,*.local,internal.company.com')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Uses the proxy
        await tab.go_to('https://external-site.com')

        # Bypasses the proxy (direct connection)
        await tab.go_to('http://localhost:8000')
        await tab.go_to('http://internal.company.com')

asyncio.run(proxy_bypass_example())
```
Bypass list patterns:
| Pattern | Matches | Example |
|---|---|---|
| `localhost` | Localhost only | `http://localhost` |
| `127.0.0.1` | Loopback IP | `http://127.0.0.1` |
| `*.local` | All `.local` domains | `http://server.local` |
| `internal.company.com` | Specific domain | `http://internal.company.com` |
| `192.168.1.*` | IP range | `http://192.168.1.100` |
When to Use Bypass List
Bypass proxy for:
- Local development servers (`localhost`, `127.0.0.1`)
- Internal company resources (VPN, intranet)
- Testing environments (`.local`, `.test` domains)
- High-bandwidth resources (when the proxy is slow)
PAC (Proxy Auto-Config)
Use a PAC file for complex proxy routing rules:
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def pac_proxy_example():
    options = ChromiumOptions()

    # Load the PAC file from a URL
    options.add_argument('--proxy-pac-url=http://proxy.example.com/proxy.pac')

    # Or use a local PAC file
    # options.add_argument('--proxy-pac-url=file:///path/to/proxy.pac')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://example.com')

asyncio.run(pac_proxy_example())
```
Example PAC file:
```javascript
function FindProxyForURL(url, host) {
    // Direct connection for local addresses
    if (isInNet(host, "192.168.0.0", "255.255.0.0") ||
        isInNet(host, "127.0.0.0", "255.0.0.0")) {
        return "DIRECT";
    }

    // Use a specific proxy for certain domains
    if (dnsDomainIs(host, ".example.com")) {
        return "PROXY proxy1.example.com:8080";
    }

    // Default proxy for everything else
    return "PROXY proxy2.example.com:8080";
}
```
PAC File Use Cases
PAC files are useful for:
- Complex routing rules (domain-based, IP-based)
- Proxy failover (try multiple proxies in order; see the sketch after this list)
- Load balancing (distribute across proxy pool)
- Enterprise environments (centralized proxy management)
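Failover comes for free in PAC: a semicolon-separated return value is tried in order. If you want to experiment without hosting the file, one option is to write it to a temporary file and load it via a `file://` URL. A sketch (proxy hosts are placeholders):

```python
import asyncio
import tempfile
from pathlib import Path

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

# Semicolon-separated alternatives are tried in order (failover)
PAC_CONTENT = """
function FindProxyForURL(url, host) {
    return "PROXY proxy1.example.com:8080; PROXY proxy2.example.com:8080; DIRECT";
}
"""

async def local_pac_example():
    # Write the PAC file somewhere Chrome can read it
    pac_path = Path(tempfile.gettempdir()) / 'proxy.pac'
    pac_path.write_text(PAC_CONTENT)

    options = ChromiumOptions()
    options.add_argument(f'--proxy-pac-url={pac_path.as_uri()}')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://example.com')

asyncio.run(local_pac_example())
```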
Rotating Proxies
Rotate through multiple proxies for better distribution:
```python
import asyncio
from itertools import cycle
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def rotating_proxy_example():
    # List of proxies
    proxies = [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
        'http://user:pass@proxy3.example.com:8080',
    ]

    # Cycle through the proxies
    proxy_pool = cycle(proxies)

    # Scrape multiple URLs with different proxies
    urls = [
        'https://example.com/page1',
        'https://example.com/page2',
        'https://example.com/page3',
    ]

    for url in urls:
        # Get the next proxy
        proxy = next(proxy_pool)

        # Configure options with this proxy
        options = ChromiumOptions()
        options.add_argument(f'--proxy-server={proxy}')

        # Use the proxy for this browser instance
        async with Chrome(options=options) as browser:
            tab = await browser.start()
            await tab.go_to(url)
            title = await tab.execute_script('return document.title')
            print(f"[{proxy.split('@')[1]}] {url}: {title}")

asyncio.run(rotating_proxy_example())
```
Proxy Rotation Strategies
Per-browser rotation (above):
- Each browser instance uses a different proxy
- Best for isolation and avoiding session conflicts; a retry/failover variant is sketched below
Per-request rotation:
- More complex, requires request interception
- See Request Interception for implementation
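A common refinement of per-browser rotation is to combine it with retries: if one proxy fails, the same URL is retried through the next proxy in the pool. A minimal sketch reusing only the APIs shown above (proxy URLs are placeholders):

```python
import asyncio
from itertools import cycle

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

PROXY_POOL = cycle([
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
])

async def fetch_with_failover(url: str, max_attempts: int = 3) -> str:
    """Try a URL through successive proxies until one succeeds."""
    last_error = None
    for _ in range(max_attempts):
        proxy = next(PROXY_POOL)
        options = ChromiumOptions()
        options.add_argument(f'--proxy-server={proxy}')
        try:
            async with Chrome(options=options) as browser:
                tab = await browser.start()
                await tab.go_to(url, timeout=15)
                return await tab.execute_script('return document.title')
        except Exception as error:  # retry on any proxy failure
            last_error = error
    raise RuntimeError(f'All {max_attempts} attempts failed for {url}') from last_error

print(asyncio.run(fetch_with_failover('https://example.com')))
```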
Residential vs Datacenter Proxies
Understanding proxy types helps you choose the right service:
| Feature | Residential | Datacenter |
|---|---|---|
| IP Source | Real residential ISPs | Data centers |
| Legitimacy | High (real users) | Low (known ranges) |
| Detection Risk | Very low | High |
| Speed | Medium (150-500ms) | Very fast (<50ms) |
| Cost | Expensive ($5-15/GB) | Cheap ($0.10-1/GB) |
| Best For | Anti-bot sites, e-commerce | APIs, internal tools |
Residential Proxies
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def residential_proxy_example():
    """Use a residential proxy for anti-bot sites."""
    options = ChromiumOptions()

    # Residential proxy with a high trust score
    options.add_argument('--proxy-server=http://user:pass@residential.proxy.com:8080')

    # Combine with stealth options
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Access a protected site
        await tab.go_to('https://protected-site.com')
        print("Successfully accessed through residential proxy")

asyncio.run(residential_proxy_example())
```
When to use Residential:
- Sites with strong anti-bot protection (Cloudflare, DataDome)
- E-commerce scraping (Amazon, eBay, etc.)
- Social media automation
- Financial services
- Any site that actively blocks datacenter IPs
Datacenter Proxies
```python
import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def datacenter_proxy_example():
    """Use a fast datacenter proxy for APIs and unprotected sites."""
    options = ChromiumOptions()

    # Fast datacenter proxy
    options.add_argument('--proxy-server=http://user:pass@datacenter.proxy.com:8080')

    async with Chrome(options=options) as browser:
        tab = await browser.start()

        # Fast API scraping
        await tab.go_to('https://api.example.com/data')

asyncio.run(datacenter_proxy_example())
```
When to use Datacenter:
- Public APIs without rate limits
- Internal/corporate automation
- Sites without anti-bot measures
- High-volume, speed-critical scraping
- Development and testing
Proxy Quality Matters
Bad proxies cause more problems than they solve:
- Slow response times (timeouts)
- Connection failures (error rates)
- Blacklisted IPs (immediate bans)
- Leaked real IP (privacy breach)
Invest in quality proxies from reputable providers. Free proxies are almost never worth it.
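One quick way to catch the "leaked real IP" failure mode is to compare the IP reported with and without the proxy, using the same httpbin endpoint as the earlier examples. A sketch (the proxy URL is a placeholder):

```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def get_ip(options: ChromiumOptions) -> str:
    """Return the IP address httpbin sees for this browser configuration."""
    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('https://httpbin.org/ip')
        return await tab.execute_script('return document.body.textContent')

async def check_for_ip_leak():
    # Baseline: direct connection, no proxy
    direct_ip = await get_ip(ChromiumOptions())

    # Same check through the proxy
    proxy_options = ChromiumOptions()
    proxy_options.add_argument('--proxy-server=http://user:pass@proxy.example.com:8080')
    proxy_ip = await get_ip(proxy_options)

    if direct_ip == proxy_ip:
        print('[WARNING] The proxy is not masking your IP!')
    else:
        print(f'[OK] Direct IP differs from proxy IP:\n{direct_ip}\n{proxy_ip}')

asyncio.run(check_for_ip_leak())
```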
Testing Your Proxy
Verify proxy configuration before running production automation:
```python
import asyncio
import time

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

async def test_proxy():
    """Test proxy connection and configuration."""
    proxy_url = 'http://user:pass@proxy.example.com:8080'

    options = ChromiumOptions()
    options.add_argument(f'--proxy-server={proxy_url}')

    try:
        async with Chrome(options=options) as browser:
            tab = await browser.start()

            # Test 1: Connection
            print("Testing proxy connection...")
            await tab.go_to('https://httpbin.org/ip', timeout=10)

            # Test 2: IP verification
            print("Verifying proxy IP...")
            ip_response = await tab.execute_script('return document.body.textContent')
            print(f"[OK] Proxy IP: {ip_response}")

            # Test 3: Geographic location (if available)
            await tab.go_to('https://ipapi.co/json/')
            geo_data = await tab.execute_script('return document.body.textContent')
            print(f"[OK] Geographic data: {geo_data}")

            # Test 4: Speed test
            start = time.time()
            await tab.go_to('https://example.com')
            load_time = time.time() - start
            print(f"[OK] Load time: {load_time:.2f}s")

            if load_time > 5:
                print("[WARNING] Slow proxy response time")

            print("\n[SUCCESS] All proxy tests passed!")

    except asyncio.TimeoutError:
        print("[ERROR] Proxy connection timeout")
    except Exception as e:
        print(f"[ERROR] Proxy test failed: {e}")

asyncio.run(test_proxy())
```
Further Reading
- Proxy Architecture Deep Dive - Network fundamentals, TCP/UDP, HTTP/2/3, SOCKS5 internals, security analysis, and building your own proxy server
- Browser Options - Command-line arguments and configuration
- Request Interception - How proxy authentication works
- Browser Preferences - Stealth and fingerprinting
- Contexts - Using different proxies per context
Start Simple
Begin with a simple proxy setup, test thoroughly, then add complexity (rotation, retry logic, monitoring) as needed. Quality proxies are more important than complex rotation strategies.
For those interested in understanding proxies at a deeper level, the Proxy Architecture Deep Dive provides comprehensive coverage of network protocols, security considerations, and even guides you through building your own proxy server.