HTTP/HTTPS Proxy Architecture
This document explores HTTP and HTTPS proxies in depth. They are the most common proxy type, but also the most limited: despite their ubiquity (nearly every corporate network uses them), HTTP proxies have fundamental security and architectural limitations that make them unsuitable for many privacy-critical use cases.
Understanding how they work at the protocol level, their security implications, header manipulation, caching semantics, and modern protocol variations (HTTP/2, HTTP/3) is crucial for informed proxy selection and effective browser automation.
Module Navigation
- ← Network Fundamentals - TCP/IP, UDP, OSI model
- ← Network & Security Overview - Module introduction
- → SOCKS Proxies - More secure alternative
- → Proxy Detection - How to avoid detection
For practical configuration, see Proxy Configuration.
Security Limitations
HTTP proxies operate at Layer 7 (Application Layer in the OSI model). This positioning gives them:
- Full visibility into unencrypted HTTP traffic (URLs, headers, bodies)
- Modification capabilities (can alter requests/responses in-flight)
- Intelligent caching (understands HTTP semantics)
But also means they:
- Can read, log, and modify all unencrypted HTTP traffic
- Cannot proxy non-HTTP protocols (FTP, SSH, SMTP, custom protocols)
- Must terminate TLS to inspect HTTPS (breaking end-to-end encryption)
For true privacy, use SOCKS5 or ensure end-to-end TLS encryption remains intact.
Introduction: Layer 7 Proxying
HTTP proxies are application-layer proxies, operating at Layer 7 of the OSI model. Unlike SOCKS proxies (Layer 5) which blindly forward bytes, HTTP proxies understand HTTP protocol semantics. They parse requests, interpret headers, apply caching logic, and can modify traffic based on HTTP rules.
Historical Context:
HTTP proxies emerged in the mid-1990s as corporate networks grew and organizations needed:
- Content filtering - Block inappropriate websites
- Bandwidth optimization - Cache frequently accessed resources
- Access control - Enforce usage policies
- Security - Inspect traffic for malware
The HTTP/1.0 specification (RFC 1945, 1996) formalized proxy behavior, and HTTP/1.1 (RFC 2616, 1999, updated by RFC 7230-7237) refined it significantly. These RFCs define how proxies should handle caching, persistent connections, and forwarding semantics.
Why Application-Layer Positioning Matters:
Operating at Layer 7 means HTTP proxies:
- See full HTTP requests - Method, URL, headers, body (if unencrypted)
- Can make intelligent decisions - Cache based on Cache-Control, rewrite URLs, compress responses
- Leave fingerprints - Add headers like Via and X-Forwarded-For, revealing proxy usage
- Are protocol-specific - Can only proxy HTTP/HTTPS, not other protocols
This deep integration with HTTP is both their strength (rich functionality) and weakness (limited scope, privacy concerns).
HTTP Proxy Operation: Request Forwarding
An HTTP proxy intercepts HTTP requests from clients, processes them, and forwards them to target servers. The proxy acts as both a server (to the client) and a client (to the target server), maintaining two separate TCP connections.
The HTTP Proxy Request Flow
sequenceDiagram
participant Client as Client Browser
participant Proxy as HTTP Proxy
participant Server as Target Server
Note over Client: Client configured to use proxy<br/>(explicit proxy setting)
Client->>Proxy: GET http://example.com/page HTTP/1.1<br/>Host: example.com<br/>User-Agent: Mozilla/5.0
Note over Client,Proxy: TCP connection #1:<br/>Client ⟷ Proxy
Note over Proxy: Proxy receives FULL HTTP request<br/>Parses method, URL, headers, body
Note over Proxy: Proxy processing:<br/>1. Validate authentication<br/>2. Check cache (if GET)<br/>3. Apply access control<br/>4. Modify headers
Proxy->>Proxy: Add proxy headers:<br/>Via: 1.1 proxy.example.com<br/>X-Forwarded-For: 192.168.1.100<br/>X-Real-IP: 192.168.1.100
Proxy->>Server: GET /page HTTP/1.1<br/>Host: example.com<br/>Via: 1.1 proxy.example.com<br/>X-Forwarded-For: 192.168.1.100
Note over Proxy,Server: TCP connection #2:<br/>Proxy ⟷ Server
Server->>Proxy: HTTP/1.1 200 OK<br/>Content-Type: text/html<br/>Cache-Control: max-age=3600<br/>[response body]
Note over Proxy: Proxy processing:<br/>1. Cache response (if cacheable)<br/>2. Filter content (malware, ads)<br/>3. Compress (if Accept-Encoding)<br/>4. Log transaction
Proxy->>Client: HTTP/1.1 200 OK<br/>Content-Type: text/html<br/>Via: 1.1 proxy.example.com<br/>X-Cache: HIT<br/>[possibly modified body]
Note over Client: Client sees response<br/>May include proxy headers
Detailed Breakdown:
1. Request Initiation (Client → Proxy)
The client sends a full HTTP request to the proxy, including the absolute URI (not just path):
GET http://example.com/page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept: text/html,application/xhtml+xml
Accept-Encoding: gzip, deflate
Connection: keep-alive
Key difference from direct requests: The GET line includes the full URL (http://example.com/page), not just the path (/page). This tells the proxy where to forward the request.
Absolute vs Relative URIs
Direct to server: GET /page HTTP/1.1
Through proxy: GET http://example.com/page HTTP/1.1
This absolute-form request line is how the proxy knows the destination. RFC 7230 (Section 5.3.2) requires clients to use the absolute form when sending requests to a proxy, which also keeps the behavior unambiguous when a single proxy forwards traffic to many different domains.
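To see this on the wire, the sketch below opens a raw TCP connection to a proxy and sends an absolute-form request using only the standard library (the proxy address 127.0.0.1:8080 is a placeholder for illustration):

import socket

# Placeholder proxy address: replace with an HTTP proxy you control.
PROXY_HOST, PROXY_PORT = "127.0.0.1", 8080

# In proxy mode the request line carries the absolute URI, not just the path,
# so the proxy knows where to forward the request.
request = (
    "GET http://example.com/page HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection((PROXY_HOST, PROXY_PORT)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

print(response.split(b"\r\n")[0])  # e.g. b'HTTP/1.1 200 OK'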
2. Proxy Processing - Request Analysis
The proxy parses the entire HTTP request and makes decisions:
# Simplified proxy processing logic
def process_request(request):
    # 1. Authentication check
    if not authenticate_user(request.headers.get('Proxy-Authorization')):
        return HTTP_407_PROXY_AUTH_REQUIRED

    # 2. Access control
    if is_blocked(request.url):
        return HTTP_403_FORBIDDEN

    # 3. Cache check (for GET requests)
    if request.method == 'GET':
        cached = check_cache(request.url)
        if cached and not_expired(cached):
            return cached  # Cache hit - no server request needed

    # 4. Header modification
    request.headers['Via'] = '1.1 proxy.example.com'
    request.headers['X-Forwarded-For'] = client_ip
    request.headers['X-Real-IP'] = client_ip

    # 5. Forward to server
    return forward_to_server(request)
3. Request Forwarding (Proxy → Server)
The proxy establishes a separate TCP connection to the target server and forwards the request. The proxy may modify headers:
Headers added by proxy:
- Via: 1.1 proxy.example.com - Identifies the proxy in the chain (RFC 7230)
- X-Forwarded-For: 192.168.1.100 - Original client IP (de facto standard)
- X-Real-IP: 192.168.1.100 - Original client IP (alternative header)
- X-Forwarded-Proto: http - Original protocol (http vs https)
- X-Forwarded-Host: example.com - Original Host header
Privacy Leakage via Proxy Headers
These headers reveal you're using a proxy! Detection systems look for:
- Via header present → Proxy confirmed
- X-Forwarded-For with multiple IPs → Proxy chain
- X-Real-IP doesn't match connecting IP → Proxy confirmed
Sophisticated proxies can strip these headers, but many don't by default.
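One way to check what a particular proxy adds is to request an echo endpoint through it and inspect which headers arrived at the origin. A minimal sketch using the requests library and httpbin.org/headers (the proxy URL and credentials are placeholders):

import requests

# Placeholder proxy under test; replace with your own.
proxies = {"http": "http://user:pass@proxy.example.com:8080"}

# httpbin.org/headers echoes back the request headers the origin server saw.
resp = requests.get("http://httpbin.org/headers", proxies=proxies, timeout=10)
received = {k.lower(): v for k, v in resp.json()["headers"].items()}

for header in ("via", "x-forwarded-for", "x-real-ip", "x-forwarded-proto"):
    if header in received:
        print(f"Proxy leak: {header}: {received[header]}")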
4. Server Response (Server → Proxy)
The server responds to the proxy:
HTTP/1.1 200 OK
Date: Mon, 01 Jan 2024 12:00:00 GMT
Server: nginx/1.18.0
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
Cache-Control: public, max-age=3600
ETag: "abc123"
<!DOCTYPE html>
<html>...
5. Proxy Processing - Response Handling
The proxy can perform various operations on the response:
def process_response(response, request):
    # 1. Caching decision
    if should_cache(response):
        cache_entry = {
            'url': request.url,
            'response': response,
            'expires': now() + parse_cache_control(response.headers['Cache-Control'])
        }
        save_to_cache(cache_entry)

    # 2. Content filtering
    if contains_malware(response.body):
        return HTTP_403_FORBIDDEN

    # 3. Compression (if client supports)
    if 'gzip' in request.headers.get('Accept-Encoding', ''):
        response.body = gzip_compress(response.body)
        response.headers['Content-Encoding'] = 'gzip'

    # 4. Header modification
    response.headers['Via'] = '1.1 proxy.example.com'
    response.headers['X-Cache'] = 'HIT' if from_cache else 'MISS'

    # 5. Logging
    log_transaction(request, response)
    return response
6. Response Delivery (Proxy → Client)
The proxy sends the (possibly modified) response back to the client:
HTTP/1.1 200 OK
Date: Mon, 01 Jan 2024 12:00:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
Via: 1.1 proxy.example.com
X-Cache: HIT
Age: 120
[compressed response body]
Key Capabilities and Limitations
What HTTP Proxies CAN Do:
- Read entire HTTP request/response (if unencrypted)
- Modify headers (add Via, X-Forwarded-For; remove sensitive headers)
- Cache responses based on HTTP semantics (Cache-Control, ETag)
- Compress/decompress content (gzip, deflate, br)
- Filter content (block URLs, scan for malware, remove ads)
- Authenticate users (via the Proxy-Authorization header)
- Log all traffic (URLs visited, data transferred, timing)
- Rewrite URLs (redirect, canonicalize)
- Inject content (ads, tracking scripts, warnings)
What HTTP Proxies CANNOT Do:
- Proxy non-HTTP protocols (FTP, SSH, SMTP, WebSocket upgrade, custom protocols)
- Inspect HTTPS content without TLS termination (breaking end-to-end encryption)
- Hide that you're using a proxy (unless headers are carefully stripped)
- Proxy UDP traffic (WebRTC, DNS, QUIC)
- Preserve end-to-end TLS while inspecting content (fundamental incompatibility)
The Fundamental HTTPS Dilemma
HTTP proxies face an impossible choice with HTTPS:
Option A: Blind tunnel (CONNECT method)
- Proxy cannot read/cache/filter HTTPS content
- End-to-end encryption preserved
- Proxy only sees destination IP:port, not URLs
Option B: TLS termination (MITM)
- Proxy decrypts HTTPS, inspects content, re-encrypts
- Breaks end-to-end encryption
- Requires installing proxy's CA certificate on client
- Detectable by certificate pinning, CT logs
Most corporate proxies choose Option B (TLS termination) for content filtering. Most privacy-focused proxies choose Option A (blind tunnel).
The HTTP CONNECT Method: HTTPS Tunneling
The CONNECT method (defined in RFC 7231 Section 4.3.6) solves the HTTPS problem: how can an HTTP proxy forward encrypted traffic it cannot read? Answer: become a blind TCP tunnel.
When a client wants to access an HTTPS site through a proxy, it uses CONNECT to ask the proxy to establish a raw TCP tunnel to the destination. After the tunnel is established, the proxy simply forwards bytes in both directions without interpreting them. It becomes Layer 4 (transport) instead of Layer 7 (application).
sequenceDiagram
participant Client
participant Proxy
participant Server
Note over Client: Client wants https://example.com
Client->>Proxy: CONNECT example.com:443 HTTP/1.1<br/>Host: example.com:443<br/>Proxy-Authorization: Basic dXNlcjpwYXNz
Note over Client,Proxy: HTTP request (unencrypted)
Note over Proxy: Proxy validates credentials<br/>Checks access control for example.com:443
Proxy->>Server: TCP SYN (three-way handshake)
Server->>Proxy: TCP SYN-ACK
Proxy->>Server: TCP ACK
Note over Proxy,Server: TCP connection established
Proxy->>Client: HTTP/1.1 200 Connection Established<br/><br/>(blank line = tunnel ready)
Note right of Proxy: From this point, proxy is a<br/>transparent TCP relay (Layer 4)
Client->>Server: TLS ClientHello (encrypted handshake)
Note over Client,Server: TLS 1.3 handshake
Server->>Client: TLS ServerHello, Certificate, etc.
Note over Proxy: Proxy sees encrypted bytes,<br/>cannot decrypt or inspect
Client->>Server: HTTP/2 GET /page<br/>(encrypted inside TLS)
Server->>Client: HTTP/2 200 OK<br/>(encrypted inside TLS)
Note over Proxy: Proxy blindly forwards<br/>all encrypted data
CONNECT Request Format
CONNECT example.com:443 HTTP/1.1
Host: example.com:443
Proxy-Authorization: Basic dXNlcjpwYXNz
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Key characteristics:
- Method: CONNECT (not GET/POST)
- Request-URI: host:port (not a path like /page)
- Port required: Usually 443 for HTTPS, but any port is valid
- Authentication: Proxy-Authorization if the proxy requires it
- No request body: CONNECT requests have no body
CONNECT Response Format
Success (tunnel established):
HTTP/1.1 200 Connection Established

That's it! Just the status line and a blank line. After the blank line, the HTTP conversation ends and the proxy becomes a transparent TCP tunnel.
Failure responses:
HTTP/1.1 407 Proxy Authentication Required
Proxy-Authenticate: Basic realm="proxy"
HTTP/1.1 403 Forbidden
Content-Type: text/plain
Access to example.com:443 is blocked by policy.
HTTP/1.1 502 Bad Gateway
Content-Type: text/plain
Cannot establish connection to example.com:443
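To make the exchange concrete, here is a minimal client-side sketch that issues CONNECT, checks for the 200 response, and then performs TLS directly with the target through the tunnel, using only the standard library (the proxy address is a placeholder):

import socket
import ssl

PROXY_HOST, PROXY_PORT = "proxy.example.com", 8080    # placeholder proxy
TARGET_HOST, TARGET_PORT = "example.com", 443

# 1. Plain HTTP CONNECT request to the proxy
sock = socket.create_connection((PROXY_HOST, PROXY_PORT))
connect_req = (
    f"CONNECT {TARGET_HOST}:{TARGET_PORT} HTTP/1.1\r\n"
    f"Host: {TARGET_HOST}:{TARGET_PORT}\r\n"
    "\r\n"
)
sock.sendall(connect_req.encode("ascii"))

# 2. Read the proxy's reply up to the blank line
reply = b""
while b"\r\n\r\n" not in reply:
    chunk = sock.recv(4096)
    if not chunk:
        raise ConnectionError("Proxy closed the connection before the tunnel was established")
    reply += chunk
status_line = reply.split(b"\r\n", 1)[0]
assert b" 200 " in status_line, f"Tunnel refused: {status_line!r}"

# 3. From here the proxy is a blind relay: TLS is negotiated end-to-end with the target
context = ssl.create_default_context()
tls_sock = context.wrap_socket(sock, server_hostname=TARGET_HOST)
print("Negotiated:", tls_sock.version())                  # e.g. TLSv1.3
print("Server cert subject:", tls_sock.getpeercert()["subject"])
tls_sock.close()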
What Happens After CONNECT Succeeds
Once the proxy sends 200 Connection Established, it stops being an HTTP proxy and becomes a Layer 4 TCP relay. The client and server establish TLS directly, with the proxy blindly forwarding bytes:
# Simplified proxy tunnel implementation
import select
import socket

def handle_connect(client_socket, target_host, target_port):
    # 1. Establish TCP connection to target
    server_socket = socket.create_connection((target_host, target_port))

    # 2. Send 200 to client
    client_socket.send(b'HTTP/1.1 200 Connection Established\r\n\r\n')

    # 3. Become transparent relay (bidirectional forwarding)
    while True:
        # Wait for data from either side
        readable, _, _ = select.select([client_socket, server_socket], [], [])
        for sock in readable:
            data = sock.recv(8192)
            if not data:
                # Connection closed
                return
            # Forward data to other side
            if sock is client_socket:
                server_socket.send(data)   # Client → Server
            else:
                client_socket.send(data)   # Server → Client
What the proxy can see:
- Destination hostname and port - From CONNECT request
- Connection timing - When established, how long open
- Data volume - Total bytes transferred in each direction
- Connection close - When either side terminates
What the proxy CANNOT see:
- TLS handshake details - Encrypted, but observable patterns exist
- HTTP method/URL - Encrypted inside TLS
- Request/response headers - Encrypted inside TLS
- Response content - Encrypted inside TLS
- Cookies, session tokens - Encrypted inside TLS
TLS Handshake Fingerprinting
While the proxy can't decrypt the TLS handshake, it can observe its structure. The first few packets after CONNECT reveal:
- TLS version (1.2 vs 1.3)
- ClientHello size and timing
- Cipher suite order (via packet sizes)
- Extensions used (via packet patterns)
This enables passive TLS fingerprinting (JA3) even through CONNECT tunnels. See Network Fingerprinting for details.
CONNECT vs Direct HTTP Proxying
| Aspect | HTTP (no CONNECT) | HTTPS (CONNECT tunnel) |
|---|---|---|
| Proxy visibility | Full HTTP request/response | Only destination host:port |
| Encryption | No (unless TLS termination) | End-to-end TLS (client ⟷ server) |
| Caching | Yes, based on HTTP semantics | No (encrypted content) |
| Content filtering | Yes (can inspect/block content) | No (only hostname-based blocking) |
| Header modification | Yes (can add/remove headers) | No (encrypted headers) |
| URL visibility | Full URL visible | Only hostname visible |
| Protocol | HTTP-only | Any protocol over TCP (HTTPS, SSH, FTP-over-TLS) |
Security Implications of CONNECT
For Privacy (Good):
- End-to-end TLS preserved - Client verifies server certificate directly
- No MITM possible (unless client trusts proxy's CA)
- Certificate pinning works - Client sees real server certificate
For Privacy (Bad):
- Hostname leaked - Proxy knows you accessed
example.com:443 - Timing analysis possible - Traffic patterns observable
- TLS fingerprinting - Passive analysis of ClientHello
For Corporate Security (Bad):
- No content inspection - Cannot scan for malware, data loss
- Blind to threats - Encrypted C2 traffic, exfiltration invisible
- Policy bypass - Users can tunnel arbitrary protocols
This is why corporate proxies often use TLS termination (MITM) instead of CONNECT tunnels.
CONNECT for Non-HTTPS Protocols
While CONNECT is primarily used for HTTPS, it can tunnel any TCP-based protocol:
CONNECT mail.example.com:993 HTTP/1.1
Host: mail.example.com:993
[Tunnel established, IMAPS traffic flows]
CONNECT ssh.example.com:22 HTTP/1.1
Host: ssh.example.com:22
[Tunnel established, SSH traffic flows]
This makes HTTP proxies with CONNECT support surprisingly versatile. They can proxy SSH, FTP-over-TLS, IMAPS, SMTPS, and other encrypted protocols, not just HTTPS.
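As a rough illustration, the sketch below opens a CONNECT tunnel to port 22 and reads the SSH server's banner, which the server sends as soon as the tunnel is up (proxy and SSH host names are placeholders; as noted in the next section, many proxies will refuse non-443 ports):

import socket

PROXY = ("proxy.example.com", 8080)    # placeholder proxy
TARGET = ("ssh.example.com", 22)       # placeholder SSH host

sock = socket.create_connection(PROXY)
sock.sendall(
    f"CONNECT {TARGET[0]}:{TARGET[1]} HTTP/1.1\r\n"
    f"Host: {TARGET[0]}:{TARGET[1]}\r\n\r\n".encode("ascii")
)

reply = sock.recv(4096)
if b" 200 " in reply.split(b"\r\n", 1)[0]:
    # SSH servers speak first: the banner arrives through the tunnel unmodified.
    print(sock.recv(256))   # e.g. b'SSH-2.0-OpenSSH_9.6\r\n'
else:
    print("Proxy refused the tunnel:", reply.split(b"\r\n", 1)[0])
sock.close()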
CONNECT Abuse and Restrictions
Because CONNECT enables tunneling arbitrary TCP connections, many proxies:
- Restrict allowed ports (often only 443 for HTTPS)
- Block suspicious hosts (known malware C2, Tor nodes)
- Log all CONNECT attempts (for security auditing)
- Require authentication (to track users)
If you try CONNECT example.com:22 (SSH), many corporate proxies will return 403 Forbidden.
HTTP Proxy Authentication: Access Control
Unlike SOCKS proxies (which support authentication as part of the protocol handshake), HTTP proxies use HTTP authentication headers to control access. This means authentication happens at the application layer using standard HTTP status codes and headers.
The authentication flow follows RFC 7235 (HTTP Authentication), using the 407 Proxy Authentication Required status code (analogous to 401 Unauthorized for servers) and the Proxy-Authorization request header (analogous to Authorization).
Authentication Flow
sequenceDiagram
participant Client
participant Proxy
participant Server
Note over Client: Client attempts request<br/>(no credentials yet)
Client->>Proxy: GET http://example.com/ HTTP/1.1<br/>Host: example.com
Note over Proxy: Proxy requires authentication
Proxy->>Client: HTTP/1.1 407 Proxy Authentication Required<br/>Proxy-Authenticate: Basic realm="Corporate Proxy"<br/>Proxy-Authenticate: Digest realm="proxy", nonce="abc123"
Note over Client: Client selects auth method<br/>Encodes credentials<br/>(browser may prompt user)
Client->>Proxy: GET http://example.com/ HTTP/1.1<br/>Host: example.com<br/>Proxy-Authorization: Basic dXNlcjpwYXNz
Note over Proxy: Validates credentials<br/>(checks username/password)
alt Authentication Success
Proxy->>Server: GET / HTTP/1.1<br/>Host: example.com<br/>(original request forwarded)
Server->>Proxy: HTTP/1.1 200 OK<br/>[response body]
Proxy->>Client: HTTP/1.1 200 OK<br/>[response body]
else Authentication Failure
Proxy->>Client: HTTP/1.1 407 Proxy Authentication Required<br/>(or 403 Forbidden if credentials invalid)
end
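Most HTTP clients perform this exchange automatically when credentials are embedded in the proxy URL. A hedged sketch with the requests library (proxy address and credentials are placeholders; requests/urllib3 send Basic credentials in the Proxy-Authorization header, and on the CONNECT request for HTTPS targets):

import requests

# Placeholder proxy URL with embedded Basic credentials.
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}

# For plain HTTP the credentials are attached to the forwarded request;
# for HTTPS they are sent with the CONNECT request that opens the tunnel.
resp = requests.get("https://example.com/", proxies=proxies, timeout=10)
print(resp.status_code)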
Authentication Schemes: Detailed Comparison
HTTP supports multiple authentication schemes, each with different security characteristics:
| Scheme | RFC | Security Level | Mechanism | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Basic | RFC 7617 | Low | Base64-encoded username:password | Simple, universal support | Cleartext (trivial to decode), no replay protection |
| Digest | RFC 7616 | Medium | Challenge-response with MD5/SHA-256 | Prevents cleartext transmission, replay protection | Vulnerable to rainbow tables, rarely implemented |
| NTLM | Proprietary (Microsoft) | Medium | Challenge-response (NT hash) | Windows integration, SSO | Proprietary, complex, known vulnerabilities |
| Negotiate | RFC 4559 | High | Kerberos/SPNEGO | Strong cryptography, SSO, mutual auth | Complex setup, Active Directory dependency |
| Bearer | RFC 6750 | High (if token secure) | OAuth 2.0 token | API-friendly, revocable tokens | Token theft = full access, requires token infrastructure |
Basic Authentication: The Simplest (and Weakest)
Format: Proxy-Authorization: Basic <base64(username:password)>
Example:
import base64

# Username: "user", Password: "pass"
credentials = "user:pass"
encoded = base64.b64encode(credentials.encode()).decode()
# Result: "dXNlcjpwYXNz"

# In the HTTP header:
# Proxy-Authorization: Basic dXNlcjpwYXNz
Security Issues:
- Base64 is NOT encryption - it's trivial to decode (see the sketch below)
- Visible in logs - Proxy logs often include headers, leaking credentials
- Replayable - Anyone who intercepts the header can reuse it indefinitely
- No integrity protection - Can be modified in transit
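A one-line decode illustrates why Base64 provides no secrecy:

import base64

# Anyone who observes the header can recover the credentials instantly.
header_value = "dXNlcjpwYXNz"                    # from Proxy-Authorization: Basic ...
print(base64.b64decode(header_value).decode())   # user:pass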
Basic Auth Over Unencrypted Connections
Sending Proxy-Authorization: Basic over HTTP (not HTTPS) is catastrophically insecure:
- Credentials transmitted in cleartext (base64 is not encryption!)
- Visible to any network observer (ISP, WiFi snoopers, MITM attackers)
- Logged by intermediate proxies and servers
Always use Basic Auth over TLS/HTTPS, or use a more secure scheme like Digest or Negotiate.
When to use Basic:
- Proxy connection is over TLS (HTTPS to proxy)
- Quick testing/development (never production!)
- Legacy systems with no alternative
Digest Authentication: Challenge-Response
Format:
# Server challenge
Proxy-Authenticate: Digest realm="proxy",
qop="auth",
nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
opaque="5ccc069c403ebaf9f0171e9517f40e41"
# Client response
Proxy-Authorization: Digest username="user",
realm="proxy",
nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
uri="http://example.com/",
qop=auth,
nc=00000001,
cnonce="0a4f113b",
response="6629fae49393a05397450978507c4ef1",
opaque="5ccc069c403ebaf9f0171e9517f40e41"
How it works:
- Proxy sends a challenge with a random nonce (number used once)
- Client computes a response hash from its credentials, the nonce, and the request method/URI (see the sketch below)
- Proxy verifies by computing the same hash and comparing
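The response value is a nest of MD5 hashes (shown here for qop=auth; the credentials are placeholders and the challenge values are taken from the example above):

import hashlib

def md5_hex(data: str) -> str:
    return hashlib.md5(data.encode("utf-8")).hexdigest()

# Values from the challenge plus the client's own credentials and counters.
username, password, realm = "user", "pass", "proxy"        # placeholder credentials
method, uri = "GET", "http://example.com/"
nonce = "dcd98b7102dd2f0e8b11d0f600bfb0c093"
cnonce, nc, qop = "0a4f113b", "00000001", "auth"

ha1 = md5_hex(f"{username}:{realm}:{password}")            # password itself is never sent
ha2 = md5_hex(f"{method}:{uri}")
response = md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
print(response)   # goes into the response="..." field of Proxy-Authorization: Digest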
Security Improvements over Basic:
- Password never transmitted (only hash)
- Replay protection (nonce changes each challenge)
- Integrity protection (hash includes method and URI)
Remaining Issues:
- MD5 is cryptographically weak (rainbow table attacks)
- No encryption (content still visible if not using TLS)
- Complex to implement correctly
Digest with SHA-256
RFC 7616 (2015) updated Digest to support SHA-256 instead of MD5, addressing the cryptographic weakness. However, support is still limited. Many proxies only implement MD5.
NTLM: Windows-Integrated Authentication
NTLM (NT LAN Manager) is Microsoft's proprietary challenge-response protocol, commonly used in Windows enterprise environments.
Authentication Flow:
- Type 1 (Negotiation): Client announces capabilities
- Type 2 (Challenge): Server sends 8-byte challenge
- Type 3 (Authentication): Client sends NT hash response
Example (simplified):
# Step 1: Negotiation
Proxy-Authorization: NTLM TlRMTVNTUAABAAAAB4IIogAAAAAAAAAAAAAAAAAAAAAFASgKAAAADw==
# Step 2: Challenge
Proxy-Authenticate: NTLM TlRMTVNTUAACAAAADAAMADgAAAAFgooCBqqVKFrKPCMAAAAAAAAAAAAAAAAAAP8=
# Step 3: Authentication
Proxy-Authorization: NTLM TlRMTVNTUAADAAAAGAAYAEgAAAAYABgAYAAAAAwADAB4AAAACAAIAIQAAAAAAAAAAAAAABVCSUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgA==
Advantages:
- Seamless Windows integration (SSO with Active Directory)
- No password transmitted
- Supports domain authentication
Disadvantages:
- Proprietary (reverse-engineered, not standardized)
- Known cryptographic weaknesses (NTLMv1 broken, NTLMv2 vulnerable)
- Complex protocol (multiple round-trips)
- Connection-bound (breaks with HTTP/2 multiplexing)
NTLM Security Concerns
NTLM has known vulnerabilities:
- Pass-the-hash attacks: A stolen hash can authenticate without knowing the password
- Relay attacks: An attacker relays authentication to another server
- Weak cryptography: NTLMv1 uses DES (broken), NTLMv2 uses MD5 (weak)
Microsoft recommends Kerberos (via Negotiate) over NTLM for new deployments.
Negotiate (Kerberos): Enterprise SSO
Negotiate (RFC 4559) uses SPNEGO (Simple and Protected GSSAPI Negotiation) to select between Kerberos and NTLM, preferring Kerberos.
Kerberos Flow:
- Client requests ticket-granting ticket (TGT) from Key Distribution Center (KDC)
- Client requests service ticket for proxy service
- Client presents service ticket to proxy
- Proxy validates ticket with KDC
Example:
Proxy-Authorization: Negotiate YIIFyQYGKwYBBQUCoIIFvTCCBbmgMDAuBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICHgYKKwYBBAGCNwICCqKCBYMEggV/...
Advantages:
- Strongest security: AES encryption, mutual authentication
- True SSO: No password prompts for domain users
- Ticket expiration: Time-limited authentication
- Auditing: Centralized KDC logging
Disadvantages:
- Complex setup: Requires Active Directory infrastructure
- Limited cross-platform: Best on Windows, limited macOS/Linux support
- Time synchronization: Requires accurate clocks (Kerberos tickets time-sensitive)
Kerberos in Browser Automation
Pydoll (and most headless browsers) have limited Kerberos support because:
- Requires OS-level integration (ticket cache, keytab)
- Needs domain-joined machine
- Complex to configure programmatically
For automated testing in Kerberos environments, consider using a service account with Basic/Digest auth instead.
Practical Pydoll Configuration
Basic Authentication:
import asyncio
from pydoll import Chrome, ChromiumOptions

async def main():
    options = ChromiumOptions()
    options.add_argument('--proxy-server=http://user:pass@ip:port')

    async with Chrome(options=options) as browser:
        tab = await browser.start()
        await tab.go_to('http://example.com')

asyncio.run(main())
Authentication via Fetch Domain (Advanced):
Pydoll uses Chrome's Fetch domain to automatically handle proxy authentication for any scheme the browser supports (Basic, Digest, NTLM, Negotiate):
# Internal Pydoll implementation (simplified)
async def handle_auth_required(self, event):
    # Browser detected 407 Proxy Authentication Required
    auth_challenge_id = event['requestId']

    # Respond with credentials
    await self._execute_command('Fetch.continueWithAuth', {
        'requestId': auth_challenge_id,
        'authChallengeResponse': {
            'response': 'ProvideCredentials',
            'username': self.proxy_username,
            'password': self.proxy_password
        }
    })
This approach works for all authentication schemes without Pydoll needing to implement protocol-specific logic. Chrome handles Digest/NTLM/Negotiate internally.
Authentication Best Practices
Use TLS-encrypted proxy connections (HTTPS proxy or SSH tunnel)
Prefer Bearer tokens for API proxies (revocable, time-limited)
Use Digest over Basic if TLS isn't available
Rotate credentials regularly
Monitor auth failures (may indicate credential theft)
Never use Basic auth over HTTP (unencrypted connection)
Don't hardcode credentials (use environment variables)
Don't reuse credentials across proxies
Don't log Proxy-Authorization headers
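One simple pattern for keeping credentials out of source code is to read them from the environment at runtime; the variable names below are arbitrary examples:

import os

# Hypothetical environment variable names; set them outside the codebase.
proxy_user = os.environ["PROXY_USERNAME"]
proxy_pass = os.environ["PROXY_PASSWORD"]
proxy_host = os.environ.get("PROXY_HOST", "proxy.example.com:8080")

proxy_url = f"http://{proxy_user}:{proxy_pass}@{proxy_host}"
# Pass proxy_url to your HTTP client or browser options instead of a literal string.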
Modern HTTP Protocols and Proxying
Traditional HTTP/1.1 proxies are well-understood, but modern protocols introduce new considerations.
HTTP/2 Proxies
HTTP/2 introduced multiplexing, which fundamentally changes how proxies handle connections:
Key differences:
| Feature | HTTP/1.1 | HTTP/2 |
|---|---|---|
| Connections | One request per connection (or sequential) | Multiple requests over single connection |
| Multiplexing | No (head-of-line blocking) | Yes (concurrent streams) |
| Header Compression | None | HPACK compression |
| Server Push | Not supported | Server can push resources |
| Proxy Complexity | Simple request/response forwarding | Must manage stream prioritization |
Implications for proxying:
# HTTP/1.1 Proxy: Simple one-to-one mapping
client_conn_1 → proxy → server_conn_1 # Request A
client_conn_2 → proxy → server_conn_2 # Request B
# HTTP/2 Proxy: Complex stream management
client_conn (streams 1,3,5) → proxy → server_conn (streams 2,4,6)
# ↓ Must maintain stream IDs and prioritization
Performance impact:
- Positive: Reduced connection overhead, better bandwidth utilization
- Negative: Proxy must parse binary framing, manage stream states
- Leakage risk: Stream IDs and priorities can fingerprint client behavior
HTTP/2 Proxy Detection
HTTP/2 multiplexing makes it harder to correlate requests with clients when multiple users share a proxy, but stream metadata (window sizes, priority settings) can still fingerprint individual clients.
HTTP/3 (QUIC) Proxies
HTTP/3 runs over QUIC (UDP-based) instead of TCP, introducing new challenges:
QUIC characteristics:
| Aspect | TCP + TLS | QUIC (UDP) |
|---|---|---|
| Transport | TCP (connection-oriented) | UDP (connectionless) |
| Handshake | Separate TCP + TLS (2 RTT) | Combined (0-1 RTT) |
| Head-of-line blocking | Yes (TCP level) | No (stream-level only) |
| Connection migration | Not supported | Supported (survives IP changes) |
| Proxy compatibility | Excellent | Limited (UDP support required) |
Implications for proxying:
- UDP requirement: Traditional HTTP proxies (TCP-only) cannot handle HTTP/3
- Connection migration: QUIC connections can survive IP changes, complicating proxy session management
- Encrypted transport: QUIC encrypts almost everything, including connection metadata
- CONNECT-UDP: New method required (RFC 9298) for proxying QUIC
# Traditional proxy chain
Client --TCP--> HTTP Proxy --TCP--> Server
# HTTP/3 proxy chain (requires CONNECT-UDP)
Client --UDP/QUIC--> Proxy --UDP/QUIC--> Server
(or)
Client --TCP/CONNECT-UDP--> Proxy --UDP/QUIC--> Server
HTTP/3 Proxy Support
Most traditional proxies (including many commercial services) don't support HTTP/3. Browsers typically fall back to HTTP/2 or HTTP/1.1 when proxies don't support QUIC.
This fallback can be a privacy issue: If you expect HTTP/3's encrypted metadata but fall back to HTTP/1.1, more information may leak through the proxy.
Protocol Negotiation and Downgrade Attacks
sequenceDiagram
participant Client
participant Proxy
participant Server
Client->>Proxy: HTTP/3 request over QUIC
Note over Proxy: Proxy doesn't support UDP/QUIC
Proxy->>Client: Connection failed
Client->>Proxy: Fallback to HTTP/2 over TCP
Note over Proxy: Proxy supports HTTP/2
Proxy->>Server: Forward as HTTP/1.1
Note over Proxy: Request/response metadata<br/>now visible to proxy
Server->>Proxy: HTTP/1.1 response
Proxy->>Client: Response delivered over HTTP/2
Security considerations:
- Downgrade attacks: Attacker forces fallback to less secure protocol
- Metadata leakage: HTTP/1.1 exposes headers that HTTP/3 would encrypt
- Performance degradation: Multiplexing benefits lost in downgrade
Modern Protocol Best Practices
- Test your proxy with HTTP/2 and HTTP/3 to understand fallback behavior (see the sketch after this list)
- Monitor for unexpected protocol downgrades (may indicate MITM)
- Consider QUIC-aware proxies if working with modern web apps
- Be aware that HTTP/3 adoption varies by region and CDN
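One way to observe fallback behavior is to check which protocol version was actually negotiated through the proxy. A hedged sketch using the third-party httpx library (requires the http2 extra; the proxy URL is a placeholder, and older httpx versions spell the keyword proxies= instead of proxy=). Note this only distinguishes HTTP/2 from HTTP/1.1; testing HTTP/3 requires a QUIC-capable client:

import httpx

# Placeholder proxy; httpx tunnels HTTPS targets through it with CONNECT.
client = httpx.Client(http2=True, proxy="http://proxy.example.com:8080")

resp = client.get("https://www.google.com/")
# If HTTP/2 was negotiated end-to-end inside the CONNECT tunnel this prints "HTTP/2";
# a downgrade shows up as "HTTP/1.1".
print(resp.http_version)
client.close()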
Summary and Key Takeaways
HTTP and HTTPS proxies are the most common but also the most limited type of proxy. Understanding their architecture, capabilities, and fundamental security limitations is essential for making informed decisions in browser automation.
Core Concepts Covered
1. Layer 7 Operation:
- HTTP proxies operate at the Application Layer, giving them full visibility into HTTP traffic
- Can read/modify URLs, headers, cookies, request bodies (for unencrypted HTTP)
- Protocol-specific: only works with HTTP/HTTPS, not FTP/SSH/SMTP/custom protocols
2. The CONNECT Method:
- HTTPS tunneling via CONNECT transforms the proxy into a blind TCP relay
- After 200 Connection Established, the proxy becomes Layer 4 (cannot inspect encrypted traffic)
- Preserves end-to-end TLS between client and server
- Leaks hostname and port to proxy, but not URLs or content
3. Request/Response Proxying:
- HTTP proxies use absolute URIs in requests (GET http://example.com/page)
- Maintain two TCP connections: client↔proxy and proxy↔server
- Can add headers (Via, X-Forwarded-For, X-Real-IP) revealing proxy usage
- Support caching based on HTTP semantics (Cache-Control, ETag)
4. Authentication:
- Uses HTTP status 407 and the Proxy-Authorization header
- Basic: Simple but insecure (base64-encoded, not encrypted)
- Digest: Challenge-response with hashing (better, but MD5 weak)
- NTLM: Windows integration (complex, proprietary)
- Negotiate/Kerberos: Strongest (enterprise SSO, Active Directory)
5. Modern Protocols:
- HTTP/2: Multiplexing, binary framing, HPACK compression
- HTTP/3/QUIC: UDP-based, 0-RTT, connection migration
- Most proxies don't support HTTP/3 (requires UDP relay support)
- Browsers fall back to HTTP/2 or HTTP/1.1 if proxy lacks QUIC support
Security Implications
Strengths:
- Mature protocol with widespread support
- Intelligent caching reduces bandwidth and latency
- Content filtering enables malware scanning, ad blocking
- CONNECT tunneling preserves end-to-end TLS for HTTPS
Weaknesses:
- Can read all unencrypted HTTP traffic (URLs, headers, bodies)
- Adds identifying headers (Via, X-Forwarded-For) revealing proxy usage
- Cannot proxy non-HTTP protocols (FTP, SSH, custom)
- No UDP support (WebRTC leaks real IP)
- TLS termination for inspection breaks end-to-end encryption
When to Use HTTP Proxies
Good Use Cases:
- Corporate networks requiring content filtering and monitoring
- Caching proxies for bandwidth optimization
- Simple web scraping where stealth isn't critical
- Legacy systems only supporting HTTP proxies
- URL-based access control (blocking specific domains)
Bad Use Cases:
- Privacy-critical automation (use SOCKS5 instead)
- Non-HTTP protocols (FTP, SSH, custom → use SOCKS5)
- WebRTC applications (UDP not supported → use SOCKS5 or VPN)
- Certificate pinning environments (CONNECT breaks MITM)
- Stealth automation (headers leak proxy usage)
HTTP Proxy vs SOCKS5: Quick Decision Matrix
| Need | HTTP Proxy | SOCKS5 |
|---|---|---|
| Content filtering | Yes | No |
| URL-based blocking | Yes | No (only IP:port) |
| Caching | Yes | No |
| UDP support | No | Yes (UDP ASSOCIATE) |
| Protocol flexibility | HTTP only | Any TCP/UDP |
| Privacy | Low (sees HTTP) | High (blind forwarding) |
| Stealth | Low (headers leak) | High (transparent) |
| DNS privacy | Client resolves | Remote resolution |
| Complexity | Simple | Moderate |
General Recommendation:
- Corporate/Enterprise: HTTP proxy (content control, caching)
- Privacy/Automation: SOCKS5 (stealth, protocol flexibility)
- Maximum Security: SOCKS5 over SSH tunnel or VPN
Further Reading and Next Steps
Related Documentation
Within This Module:
- SOCKS Proxies - Protocol-agnostic, more secure alternative to HTTP proxies
- Network Fundamentals - TCP/IP, UDP, WebRTC understanding
- Proxy Detection - How proxies are detected and how to avoid it
- Building Proxy Servers - Implement HTTP and SOCKS5 proxies from scratch
Practical Usage:
- Proxy Configuration (Features) - How to configure proxies in Pydoll
- Browser Options - Relevant browser flags for proxy usage
Deep Dives:
- Network Fingerprinting - How TCP/IP characteristics leak through proxies
- Browser Fingerprinting - Application-level detection despite proxies
External References
RFCs (Official Specifications):
- RFC 7230-7237 - HTTP/1.1 specification suite (2014)
- RFC 7230: Message Syntax and Routing (defines proxy behavior)
- RFC 7231: Semantics and Content (defines CONNECT method)
- RFC 7235: Authentication (defines 407 and Proxy-Authorization)
- RFC 7617 - Basic Authentication (2015)
- RFC 7616 - Digest Authentication (2015)
- RFC 4559 - Negotiate Authentication (2006)
- RFC 7540 - HTTP/2 (2015)
- RFC 9000 - QUIC Transport Protocol (2021)
- RFC 9114 - HTTP/3 (2022)
- RFC 9298 - Proxying UDP in HTTP (CONNECT-UDP, 2022)
Standards Bodies:
- IETF (Internet Engineering Task Force): https://www.ietf.org/
- W3C (World Wide Web Consortium): https://www.w3.org/
Technical Resources:
- MDN Web Docs - Proxy servers and tunneling: https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneling
- Chrome DevTools Protocol - Network domain: https://chromedevtools.github.io/devtools-protocol/tot/Network/
- Chrome DevTools Protocol - Fetch domain: https://chromedevtools.github.io/devtools-protocol/tot/Fetch/
Security Research:
- HTTP/2 Rapid Reset Attack (CVE-2023-44487): Example of HTTP/2 multiplexing vulnerability
- NTLM Relay Attacks: Microsoft security advisory on NTLM vulnerabilities
- TLS Interception Research: Studies on corporate proxy MITM practices
Practical Testing Tools
Proxy Testing:
- curl: Command-line HTTP client with proxy support
- Burp Suite: Intercepting HTTP proxy for security testing
- mitmproxy: Interactive HTTP/HTTPS proxy for analysis
Network Analysis:
- Wireshark: Packet analyzer to observe HTTP proxy traffic
- tcpdump: Command-line packet capture
- Chrome DevTools: Network tab shows proxy headers (Via, X-Forwarded-For)
Proxy Detection Testing:
- https://browserleaks.com/ip: Shows your IP and proxy headers
- https://whoer.net/: Comprehensive proxy detection test
- https://ipleak.net/: Tests for DNS leaks, WebRTC leaks
Advanced Topics (Beyond This Document)
Proxy Chaining:
- Using multiple proxies in sequence for additional anonymity
- Performance and latency implications
- Tor network as extreme example
Transparent Proxies:
- OS-level proxy configuration (no application awareness)
- WPAD (Web Proxy Auto-Discovery Protocol)
- PAC (Proxy Auto-Configuration) files
Reverse Proxies:
- Proxies acting on behalf of servers (not clients)
- Load balancing, CDNs, caching
- Nginx, HAProxy, Cloudflare as examples
TLS Interception:
- Corporate proxy MITM with custom CA certificates
- Certificate transparency logs detecting interception
- Certificate pinning as countermeasure
Final Thoughts
HTTP proxies are a double-edged sword: powerful for content control and caching, but fundamentally incompatible with strong privacy due to their application-layer positioning.
For browser automation requiring stealth and flexibility, SOCKS5 is almost always the better choice. HTTP proxies should be used when:
- You control the proxy (corporate environment)
- You need HTTP-specific features (caching, URL filtering)
- SOCKS5 isn't available
Understanding HTTP proxy architecture (its capabilities, limitations, and security model) enables you to make informed decisions rather than blindly copying proxy configurations.
Next steps:
- Read SOCKS Proxies to understand the superior alternative
- Learn Proxy Detection techniques to avoid leaking proxy usage
- Configure proxies in Pydoll using Proxy Configuration