# Network Monitoring

Network monitoring in Pydoll allows you to observe and analyze HTTP requests, responses, and other network activity during browser automation. This is essential for debugging, performance analysis, API testing, and understanding how web applications communicate with servers.
> **Network vs Fetch Domain:** The Network domain is for passive monitoring (observing traffic); the Fetch domain is for active interception (modifying requests and responses). This guide focuses on monitoring. For request interception, see the advanced documentation.
## Enabling Network Events

Before you can monitor network activity, you must enable the Network domain:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def main():
    async with Chrome() as browser:
        tab = await browser.start()

        # Enable network monitoring
        await tab.enable_network_events()

        # Now navigate
        await tab.go_to('https://api.github.com')

        # Don't forget to disable when done (optional but recommended)
        await tab.disable_network_events()

asyncio.run(main())
```
> **Enable Before Navigation:** Always enable network events *before* navigating so you capture all requests. Requests made before enabling won't be captured.
## Getting Network Logs

Pydoll automatically stores network logs while network events are enabled. You can retrieve them using `get_network_logs()`:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def analyze_requests():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        # Navigate to a page
        await tab.go_to('https://httpbin.org/json')

        # Wait for the page to fully load
        await asyncio.sleep(2)

        # Get all network logs
        logs = await tab.get_network_logs()
        print(f"Total requests captured: {len(logs)}")

        for log in logs:
            request = log['params']['request']
            print(f"→ {request['method']} {request['url']}")

asyncio.run(analyze_requests())
```
> **Production-Ready Waiting:** The examples above use `asyncio.sleep(2)` for simplicity. In production code, consider more explicit waiting strategies:
>
> - Wait for specific elements to appear
> - Use the Event System to detect when all resources have loaded
> - Implement network idle detection (see the Real-Time Network Monitoring section)
>
> This ensures your automation waits exactly as long as needed, no more and no less.
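The network idle strategy mentioned above can be sketched as a small watcher that counts in-flight requests. The class below is a hypothetical helper, not part of Pydoll's API: you would wire `on_request` to a `REQUEST_WILL_BE_SENT` callback and `on_finished` to loading-finished/failed callbacks, then `await` idleness after navigation.

```python
import asyncio
import time


class NetworkIdleWatcher:
    """Counts in-flight requests and reports when the network goes quiet.

    Hypothetical helper for illustration: feed it requestIds from your
    network event callbacks.
    """

    def __init__(self, idle_time: float = 0.5):
        self.idle_time = idle_time      # how long the network must stay quiet
        self.pending: set[str] = set()  # requestIds currently in flight

    def on_request(self, request_id: str) -> None:
        self.pending.add(request_id)

    def on_finished(self, request_id: str) -> None:
        self.pending.discard(request_id)

    async def wait_for_idle(self, timeout: float = 10.0) -> bool:
        """Return True once no request has been in flight for idle_time seconds."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if not self.pending:
                await asyncio.sleep(self.idle_time)
                if not self.pending:  # still quiet after the grace period
                    return True
            else:
                await asyncio.sleep(0.05)
        return False
```

In a Pydoll script you would call `watcher.on_request(event['params']['requestId'])` inside your request callback, the matching `on_finished` in the completion callbacks, and `await watcher.wait_for_idle()` after `go_to()` instead of a fixed sleep.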
## Filtering Network Logs

You can filter logs by URL pattern:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def filter_logs_example():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://example.com')
        await asyncio.sleep(2)

        # Get all logs
        all_logs = await tab.get_network_logs()

        # Get logs for a specific domain
        api_logs = await tab.get_network_logs(filter='api.example.com')

        # Get logs for a specific endpoint
        user_logs = await tab.get_network_logs(filter='/api/users')

asyncio.run(filter_logs_example())
```
## Understanding Network Event Structure

Network logs contain detailed information about each request. Here's the structure:

### RequestWillBeSentEvent

This event is fired when a request is about to be sent:

```python
{
    'method': 'Network.requestWillBeSent',
    'params': {
        'requestId': 'unique-request-id',
        'loaderId': 'loader-id',
        'documentURL': 'https://example.com',
        'request': {
            'url': 'https://api.example.com/data',
            'method': 'GET',  # or 'POST', 'PUT', 'DELETE', etc.
            'headers': {
                'User-Agent': 'Chrome/...',
                'Accept': 'application/json',
                ...
            },
            'postData': '...',  # Only present for POST/PUT requests
            'initialPriority': 'High',
            'referrerPolicy': 'strict-origin-when-cross-origin'
        },
        'timestamp': 1234567890.123,
        'wallTime': 1234567890.123,
        'initiator': {
            'type': 'script',  # or 'parser', 'other'
            'stack': {...}     # Call stack if initiated from a script
        },
        'type': 'XHR',  # Resource type: Document, Script, Image, XHR, etc.
        'frameId': 'frame-id',
        'hasUserGesture': False
    }
}
```
### Key Fields Reference

| Field | Location | Type | Description |
|---|---|---|---|
| `requestId` | `params.requestId` | `str` | Unique identifier for this request |
| `url` | `params.request.url` | `str` | Complete request URL |
| `method` | `params.request.method` | `str` | HTTP method (GET, POST, etc.) |
| `headers` | `params.request.headers` | `dict` | Request headers |
| `postData` | `params.request.postData` | `str` | Request body (POST/PUT) |
| `timestamp` | `params.timestamp` | `float` | Monotonic time when the request started |
| `type` | `params.type` | `str` | Resource type (Document, XHR, Image, etc.) |
| `initiator` | `params.initiator` | `dict` | What triggered this request |
## Getting Response Bodies

To get the actual response content, use `get_network_response_body()`:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def fetch_api_response():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        # Navigate to an API endpoint
        await tab.go_to('https://httpbin.org/json')
        await asyncio.sleep(2)

        # Get all requests
        logs = await tab.get_network_logs()

        for log in logs:
            request_id = log['params']['requestId']
            url = log['params']['request']['url']

            # Only fetch the response for the JSON endpoint
            if 'httpbin.org/json' in url:
                try:
                    # Get the response body
                    response_body = await tab.get_network_response_body(request_id)
                    print(f"Response from {url}:")
                    print(response_body)
                except Exception as e:
                    print(f"Could not get response body: {e}")

asyncio.run(fetch_api_response())
```
> **Response Body Availability:** Response bodies are only available for requests that have completed. Also, some response types (such as images or redirects) may not have accessible bodies.
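One way to avoid fetching bodies too early is to track which requests have finished loading. The sketch below assumes a loading-finished event exists (CDP calls it `Network.loadingFinished`; check your Pydoll version for the exact `NetworkEvent` constant) and only marks a body as fetchable once that event fires:

```python
import asyncio

# requestIds whose response bodies have fully loaded. LOADING_FINISHED
# mirrors CDP's Network.loadingFinished; verify the constant name in
# your Pydoll version before relying on it.
finished_ids: set[str] = set()


async def mark_finished(event: dict) -> None:
    """Callback: remember that this request's body is complete."""
    finished_ids.add(event['params']['requestId'])


def body_available(request_id: str) -> bool:
    """Only call get_network_response_body() for completed requests."""
    return request_id in finished_ids
```

You would register `mark_finished` with something like `await tab.on(NetworkEvent.LOADING_FINISHED, mark_finished)` and guard each `get_network_response_body()` call with `body_available(request_id)`.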
## Practical Use Cases

### 1. API Testing and Validation

Monitor API calls to verify the correct requests are being made:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def validate_api_calls():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        # Navigate to your app
        await tab.go_to('https://your-app.com')

        # Trigger an action that makes API calls
        button = await tab.find(id='load-data-button')
        await button.click()
        await asyncio.sleep(2)

        # Get API logs
        api_logs = await tab.get_network_logs(filter='/api/')

        print(f"\n📊 API Calls Summary:")
        print(f"Total API calls: {len(api_logs)}")

        for log in api_logs:
            request = log['params']['request']
            method = request['method']
            url = request['url']

            # Check whether the correct auth header is present
            headers = request.get('headers', {})
            has_auth = 'Authorization' in headers or 'authorization' in headers

            print(f"\n{method} {url}")
            print(f"  ✓ Has Authorization: {has_auth}")

            # Validate POST data if applicable
            if method == 'POST' and 'postData' in request:
                print(f"  📤 Body: {request['postData'][:100]}...")

asyncio.run(validate_api_calls())
```
### 2. Performance Analysis

Analyze request timing and identify slow resources:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def analyze_performance():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://example.com')
        await asyncio.sleep(5)

        logs = await tab.get_network_logs()

        # Collect timing data
        timings = []
        for log in logs:
            params = log['params']
            url = params['request']['url']
            resource_type = params.get('type', 'Other')

            timings.append({
                'url': url,
                'type': resource_type,
                'timestamp': params['timestamp']
            })

        # Sort by timestamp
        timings.sort(key=lambda x: x['timestamp'])

        print("\n⏱️ Request Timeline:")
        start_time = timings[0]['timestamp'] if timings else 0

        for timing in timings[:20]:  # Show the first 20
            elapsed = (timing['timestamp'] - start_time) * 1000  # Convert to ms
            print(f"{elapsed:7.0f}ms | {timing['type']:12} | {timing['url'][:80]}")

asyncio.run(analyze_performance())
```
### 3. Detecting External Resources

Find all external domains your page connects to:

```python
import asyncio
from collections import Counter
from urllib.parse import urlparse

from pydoll.browser.chromium import Chrome


async def analyze_domains():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://news.ycombinator.com')
        await asyncio.sleep(5)

        logs = await tab.get_network_logs()

        # Count requests per domain
        domains = Counter()
        for log in logs:
            url = log['params']['request']['url']
            try:
                domain = urlparse(url).netloc
                if domain:
                    domains[domain] += 1
            except ValueError:
                pass  # Skip malformed URLs

        print("\n🌐 External Domains:")
        for domain, count in domains.most_common(10):
            print(f"  {count:3} requests | {domain}")

asyncio.run(analyze_domains())
```
### 4. Monitoring Specific Resource Types

Track specific types of resources, such as images or scripts:

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def track_resource_types():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://example.com')
        await asyncio.sleep(3)

        logs = await tab.get_network_logs()

        # Group by resource type
        by_type = {}
        for log in logs:
            params = log['params']
            resource_type = params.get('type', 'Other')
            url = params['request']['url']
            by_type.setdefault(resource_type, []).append(url)

        print("\n📦 Resources by Type:")
        for rtype in sorted(by_type):
            urls = by_type[rtype]
            print(f"\n{rtype}: {len(urls)} resource(s)")
            for url in urls[:3]:  # Show the first 3
                print(f"  • {url}")
            if len(urls) > 3:
                print(f"  ... and {len(urls) - 3} more")

asyncio.run(track_resource_types())
```
## Real-Time Network Monitoring

For real-time monitoring, use event callbacks instead of polling `get_network_logs()`:

> **Understanding Events:** Real-time monitoring uses Pydoll's event system to react to network activity as it happens. For a deep dive into how events work, see the Event System documentation.
```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.protocol.network.events import (
    NetworkEvent,
    RequestWillBeSentEvent,
    ResponseReceivedEvent,
    LoadingFailedEvent,
)


async def real_time_monitoring():
    async with Chrome() as browser:
        tab = await browser.start()

        # Statistics
        stats = {
            'requests': 0,
            'responses': 0,
            'failed': 0,
        }

        # Request callback
        async def on_request(event: RequestWillBeSentEvent):
            stats['requests'] += 1
            url = event['params']['request']['url']
            method = event['params']['request']['method']
            print(f"→ {method:6} | {url}")

        # Response callback
        async def on_response(event: ResponseReceivedEvent):
            stats['responses'] += 1
            response = event['params']['response']
            status = response['status']
            url = response['url']

            # Color-code by status
            if 200 <= status < 300:
                color = '\033[92m'  # Green
            elif 300 <= status < 400:
                color = '\033[93m'  # Yellow
            else:
                color = '\033[91m'  # Red
            reset = '\033[0m'

            print(f"← {color}{status}{reset} | {url}")

        # Failure callback
        async def on_failed(event: LoadingFailedEvent):
            stats['failed'] += 1
            error = event['params']['errorText']
            print(f"✗ FAILED: {error}")

        # Enable events and register the callbacks
        await tab.enable_network_events()
        await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, on_request)
        await tab.on(NetworkEvent.RESPONSE_RECEIVED, on_response)
        await tab.on(NetworkEvent.LOADING_FAILED, on_failed)

        # Navigate
        await tab.go_to('https://example.com')
        await asyncio.sleep(5)

        print(f"\n📊 Summary:")
        print(f"  Requests:  {stats['requests']}")
        print(f"  Responses: {stats['responses']}")
        print(f"  Failed:    {stats['failed']}")

asyncio.run(real_time_monitoring())
```
## Resource Types Reference

Pydoll captures the following resource types:

| Type | Description | Examples |
|---|---|---|
| `Document` | Main HTML documents | Page loads, iframe sources |
| `Stylesheet` | CSS files | External `.css`, inline styles |
| `Image` | Image resources | `.jpg`, `.png`, `.gif`, `.webp`, `.svg` |
| `Media` | Audio/video files | `.mp4`, `.webm`, `.mp3`, `.ogg` |
| `Font` | Web fonts | `.woff`, `.woff2`, `.ttf`, `.otf` |
| `Script` | JavaScript files | `.js` files, inline scripts |
| `TextTrack` | Subtitle files | `.vtt`, `.srt` |
| `XHR` | XMLHttpRequest | AJAX requests, legacy API calls |
| `Fetch` | Fetch API requests | Modern API calls |
| `EventSource` | Server-Sent Events | Real-time streams |
| `WebSocket` | WebSocket connections | Bidirectional communication |
| `Manifest` | Web app manifests | PWA configuration |
| `Other` | Other resource types | Miscellaneous |
## Advanced: Extracting Response Timing

Network events include detailed timing information:

```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.protocol.network.events import NetworkEvent, ResponseReceivedEvent


async def analyze_timing():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        # Custom callback to capture timing
        timing_data = []

        async def on_response(event: ResponseReceivedEvent):
            response = event['params']['response']
            timing = response.get('timing')

            if timing:
                # Calculate the duration of each phase
                dns_time = timing.get('dnsEnd', 0) - timing.get('dnsStart', 0)
                connect_time = timing.get('connectEnd', 0) - timing.get('connectStart', 0)
                ssl_time = timing.get('sslEnd', 0) - timing.get('sslStart', 0)
                send_time = timing.get('sendEnd', 0) - timing.get('sendStart', 0)
                wait_time = timing.get('receiveHeadersStart', 0) - timing.get('sendEnd', 0)
                receive_time = timing.get('receiveHeadersEnd', 0) - timing.get('receiveHeadersStart', 0)

                timing_data.append({
                    'url': response['url'][:50],
                    'dns': max(dns_time, 0),
                    'connect': max(connect_time, 0),
                    'ssl': max(ssl_time, 0),
                    'send': send_time,
                    'wait': wait_time,
                    'receive': receive_time,
                    'total': send_time + wait_time + receive_time,
                })

        await tab.on(NetworkEvent.RESPONSE_RECEIVED, on_response)

        await tab.go_to('https://github.com')
        await asyncio.sleep(5)

        # Print the timing breakdown, slowest first
        print("\n⏱️ Request Timing Breakdown (ms):")
        print(f"{'URL':<50} | {'DNS':>6} | {'Connect':>8} | {'SSL':>6} | "
              f"{'Send':>6} | {'Wait':>6} | {'Receive':>8} | {'Total':>7}")
        print("-" * 120)

        for data in sorted(timing_data, key=lambda x: x['total'], reverse=True)[:10]:
            print(f"{data['url']:<50} | {data['dns']:6.1f} | {data['connect']:8.1f} | {data['ssl']:6.1f} | "
                  f"{data['send']:6.1f} | {data['wait']:6.1f} | {data['receive']:8.1f} | {data['total']:7.1f}")

asyncio.run(analyze_timing())
```
### Timing Fields Explanation

| Phase | Fields | Description |
|---|---|---|
| DNS | `dnsStart` → `dnsEnd` | DNS lookup time |
| Connect | `connectStart` → `connectEnd` | TCP connection establishment |
| SSL | `sslStart` → `sslEnd` | SSL/TLS handshake |
| Send | `sendStart` → `sendEnd` | Time to send the request |
| Wait | `sendEnd` → `receiveHeadersStart` | Waiting for the server response (TTFB) |
| Receive | `receiveHeadersStart` → `receiveHeadersEnd` | Time to receive response headers |
> **Time to First Byte (TTFB):** TTFB is the "Wait" phase, the time between sending the request and receiving the first byte of the response. It is crucial for performance analysis.
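Using the fields from the table above, TTFB can be computed directly from a response's `timing` dict. A minimal sketch follows; the sample values are made up for illustration:

```python
def ttfb_ms(timing: dict) -> float:
    """Time to First Byte: the gap between finishing the request send and
    receiving the first response-header byte. CDP timing values are
    millisecond offsets from requestTime; -1 means the phase did not occur."""
    send_end = timing.get('sendEnd', -1)
    headers_start = timing.get('receiveHeadersStart', -1)
    if send_end < 0 or headers_start < 0:
        return 0.0  # phase missing; no meaningful TTFB
    return headers_start - send_end


# Hypothetical timing values for illustration
sample = {'sendEnd': 12.5, 'receiveHeadersStart': 87.3}
print(f"TTFB: {ttfb_ms(sample):.1f} ms")
```

You could call `ttfb_ms(response.get('timing', {}))` inside the `on_response` callback from the timing example above to flag slow endpoints.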
## Best Practices

### 1. Enable Network Events Only When Needed

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def best_practice_enable():
    async with Chrome() as browser:
        tab = await browser.start()

        # ✅ Good: enable before navigation, disable after
        await tab.enable_network_events()
        await tab.go_to('https://example.com')
        await asyncio.sleep(2)
        logs = await tab.get_network_logs()
        await tab.disable_network_events()

        # ❌ Bad: leaving it enabled throughout the entire session
        # await tab.enable_network_events()
        # ... long automation session ...
```
### 2. Filter Logs to Reduce Memory Usage

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def best_practice_filter():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://example.com')
        await asyncio.sleep(2)

        # ✅ Good: filter for the specific requests you need
        api_logs = await tab.get_network_logs(filter='/api/')

        # ❌ Bad: getting all logs when you only need specific ones
        all_logs = await tab.get_network_logs()
        filtered = [log for log in all_logs if '/api/' in log['params']['request']['url']]
```
### 3. Handle Missing Fields Safely

```python
import asyncio

from pydoll.browser.chromium import Chrome


async def best_practice_safe_access():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.enable_network_events()

        await tab.go_to('https://example.com')
        await asyncio.sleep(2)

        logs = await tab.get_network_logs()

        # ✅ Good: safe access with .get()
        for log in logs:
            params = log.get('params', {})
            request = params.get('request', {})
            url = request.get('url', 'Unknown')
            post_data = request.get('postData')  # May be None

            if post_data:
                print(f"POST data: {post_data}")

        # ❌ Bad: direct access can raise KeyError
        # url = log['params']['request']['url']
        # post_data = log['params']['request']['postData']  # May not exist!
```
### 4. Use Event Callbacks for Real-Time Needs

```python
import asyncio

from pydoll.protocol.network.events import NetworkEvent, RequestWillBeSentEvent

# ✅ Good: real-time monitoring with callbacks
async def on_request(event: RequestWillBeSentEvent):
    print(f"New request: {event['params']['request']['url']}")

await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, on_request)

# ❌ Bad: polling logs repeatedly (inefficient)
while True:
    logs = await tab.get_network_logs()
    # Process logs...
    await asyncio.sleep(0.5)  # Wasteful!
```
## See Also

- **CDP Network Domain** - deep dive into network capabilities
- **Event System** - understanding Pydoll's event architecture
- **Request Interception** - modifying requests and responses