User Guide

This guide covers all the features of iops-profiler and how to use them effectively.

Loading the Extension

Before using any iops-profiler magic commands, you must load the extension in your notebook:

%load_ext iops_profiler

You only need to do this once per notebook session; the extension stays loaded until the kernel restarts.

Basic Usage

Line Magic (`%iops`)

Use %iops to profile a single line of code:

%iops open('test.txt', 'w').write('Hello World' * 1000)

This is perfect for quick measurements of one-line operations.

Cell Magic (`%%iops`)

Use %%iops to profile an entire cell of code:

%%iops
# Your code here
with open('test.txt', 'w') as f:
    for i in range(1000):
        f.write(f'Line {i}\n')

This is ideal for profiling code blocks, loops, and complex operations.

Understanding the Results

When you run a profiled cell, you’ll see a results table with these metrics:

Basic Metrics

Time (seconds): Total execution time of your code
Read Ops: Number of read operations performed
Write Ops: Number of write operations performed
Bytes Read: Total bytes read from disk
Bytes Written: Total bytes written to disk

Performance Metrics

Read IOPS: Read operations per second (Read Ops / Time)
Write IOPS: Write operations per second (Write Ops / Time)
Read Throughput (bytes/sec): Bytes read per second (Bytes Read / Time)
Write Throughput (bytes/sec): Bytes written per second (Bytes Written / Time)
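The formulas above can be reproduced by hand from the basic metrics. The following is a minimal sketch, not part of iops-profiler: the function name derive_rates and the dictionary keys are hypothetical and chosen only for illustration.

```python
def derive_rates(read_ops, write_ops, bytes_read, bytes_written, seconds):
    """Derive per-second rates from raw counters (counter / elapsed time)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        "read_iops": read_ops / seconds,
        "write_iops": write_ops / seconds,
        "read_throughput": bytes_read / seconds,       # bytes/sec
        "write_throughput": bytes_written / seconds,   # bytes/sec
    }


# Example: 1000 write operations totalling 8192 bytes in half a second.
rates = derive_rates(read_ops=0, write_ops=1000,
                     bytes_read=0, bytes_written=8192, seconds=0.5)
print(rates["write_iops"])        # 2000.0
print(rates["write_throughput"])  # 16384.0
```

Because every rate divides by the measured time, a very short run makes all four values sensitive to small timing errors.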

Advanced Features

Histogram Visualization

Use the --histogram flag to visualize I/O operation distributions:

%%iops --histogram
import tempfile
import os

# Create files with different sizes
test_dir = tempfile.mkdtemp()
for i in range(5):
    with open(os.path.join(test_dir, f'file_{i}.txt'), 'w') as f:
        f.write('x' * (1024 * (i + 1)))  # Varying sizes

This generates two histogram charts:

Operation Count Distribution: Shows how many I/O operations fall into each size bucket
Total Bytes Distribution: Shows the total bytes transferred in each size bucket

Both charts use a logarithmic scale for the x-axis and display separate lines for reads, writes, and combined operations.
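To make the two distributions concrete, here is a rough sketch of how operation sizes could be grouped into buckets. It assumes power-of-two buckets, which suit a logarithmic x-axis; the actual binning used by iops-profiler may differ, and the names size_bucket and build_histograms are hypothetical.

```python
from collections import Counter


def size_bucket(nbytes):
    """Return the lower bound of the power-of-two bucket containing nbytes."""
    if nbytes <= 0:
        return 0
    return 1 << (nbytes.bit_length() - 1)


def build_histograms(op_sizes):
    """Return (operation count per bucket, total bytes per bucket)."""
    counts = Counter()
    totals = Counter()
    for nbytes in op_sizes:
        bucket = size_bucket(nbytes)
        counts[bucket] += 1
        totals[bucket] += nbytes
    return counts, totals


# Assumed sizes: one operation per file of 1 KiB to 5 KiB, as in the cell above.
sizes = [1024, 2048, 3072, 4096, 5120]
counts, totals = build_histograms(sizes)
for bucket in sorted(counts):
    print(bucket, counts[bucket], totals[bucket])
```

The first mapping corresponds to the Operation Count Distribution and the second to the Total Bytes Distribution.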

When to Use Histograms

Histograms are useful when:

You want to understand the distribution of I/O operation sizes
Your code performs many operations of varying sizes
You’re optimizing buffer sizes or chunk sizes
You’re comparing different I/O strategies

Note: Histogram mode is available when using strace on Linux and fs_usage on macOS. These tools provide operation-level detail needed for histogram generation. If strace is not available on Linux, the extension falls back to psutil (without histogram support). Histogram collection adds some overhead due to detailed tracking.
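Before relying on histogram mode, you can check whether the relevant tracing tool is on your PATH. This is a sketch under stated assumptions: histogram_tool is a hypothetical helper, not an iops-profiler function, and the extension's own detection logic may work differently.

```python
import platform
import shutil


def histogram_tool(system=None, which=shutil.which):
    """Guess which tracing tool, if any, could supply per-operation detail."""
    system = system or platform.system()
    # Mapping taken from the note above: strace on Linux, fs_usage on macOS.
    tool = {"Linux": "strace", "Darwin": "fs_usage"}.get(system)
    if tool and which(tool):
        return tool
    return None  # no suitable tool found, so no histogram support is expected


print(histogram_tool())
```

A return value of None suggests that the --histogram flag is unlikely to produce charts on this machine.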

Practical Examples

Example 1: Comparing Write Strategies

Run each strategy in its own cell, because %%iops must be the first line of a cell:

%%iops
# Strategy 1: Many small writes
with open('test1.txt', 'w') as f:
    for i in range(10000):
        f.write('a')

%%iops
# Strategy 2: A single large write
with open('test2.txt', 'w') as f:
    data = 'a' * 10000
    f.write(data)

Compare the IOPS and throughput to see which is more efficient.
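When interpreting the comparison, keep in mind that Python buffers file writes, so the number of write operations reaching the operating system can be far lower than the number of f.write() calls. The sketch below illustrates this with the standard io module only; CountingRaw and raw_writes are hypothetical names, and this is not how iops-profiler itself counts operations.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory sink that counts the write calls reaching the raw layer."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.nbytes = 0

    def writable(self):
        return True

    def write(self, data):
        size = memoryview(data).nbytes
        self.calls += 1
        self.nbytes += size
        return size  # report that everything was accepted


def raw_writes(chunks, buffer_size=8192):
    """Push chunks through a BufferedWriter; return (raw calls, raw bytes)."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for chunk in chunks:
        buffered.write(chunk)
    buffered.flush()
    return raw.calls, raw.nbytes


small = raw_writes([b"a"] * 10000)   # 10000 one-byte writes
large = raw_writes([b"a" * 10000])   # one 10000-byte write
print("many small writes:", small)
print("one large write:  ", large)
```

Both strategies deliver the same 10000 bytes, and the buffered layer coalesces the small writes into far fewer raw calls, which is why the two cells may report closer Write Ops than the loop count suggests.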

Example 2: Buffer Size Optimization

%%iops --histogram
# Write 1 MB in a single call; repeat with other chunk sizes to compare
with open('large_file.bin', 'wb') as f:
    data = b'x' * 1024 * 1024  # 1 MB
    f.write(data)

Use the histogram to see how the system batches your writes.
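To compare several chunk sizes, the same amount of data can be written with a different chunk size per run. The following sketch uses plain Python timing rather than the extension, and write_in_chunks is a hypothetical helper; in a notebook you would put each chunk size in its own %%iops --histogram cell instead.

```python
import os
import tempfile
import time


def write_in_chunks(path, total_bytes, chunk_size):
    """Write total_bytes to path in unbuffered chunks; return (calls, seconds)."""
    chunk = b"x" * chunk_size
    calls = 0
    remaining = total_bytes
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering, so each write() is one call
    with open(path, "wb", buffering=0) as f:
        while remaining > 0:
            written = f.write(chunk[:remaining])
            remaining -= written
            calls += 1
    return calls, time.perf_counter() - start


total = 1024 * 1024  # 1 MiB
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for chunk_size in (4096, 65536, total):
        path = os.path.join(tmp, f"chunk_{chunk_size}.bin")
        calls, seconds = write_in_chunks(path, total, chunk_size)
        results[chunk_size] = (calls, os.path.getsize(path), seconds)
        print(f"{chunk_size:>8} B chunks: {calls} write calls, {seconds:.6f} s")
```

Smaller chunks produce more write calls for the same number of bytes, which should show up as a shift toward smaller size buckets in the histogram.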

Example 3: Read vs Write Performance

%%iops
# Write test data
with open('data.txt', 'w') as f:
    f.write('data' * 100000)

# Read it back
with open('data.txt', 'r') as f:
    content = f.read()

Compare read and write IOPS for your specific use case.

Best Practices

Warm Up: Run your code once before profiling to account for caching and initialization
Multiple Runs: Profile the same operation multiple times to account for variability
Clean Environment: Clear caches and close files between runs when testing
Realistic Data: Use data sizes similar to your production workload
Avoid Timing Noise: Keep print statements and other unrelated I/O out of the cell you are profiling
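The warm-up and multiple-run advice can be sketched in plain Python as follows. This is an illustration using only the standard library; time_runs and workload are hypothetical names, and the extension does not necessarily repeat runs for you.

```python
import statistics
import tempfile
import time


def time_runs(func, runs=5, warmups=1):
    """Call func a few times untimed, then time it; return (median, samples)."""
    for _ in range(warmups):
        func()  # warm-up run, result discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples


call_log = []


def workload():
    """Example operation: write 64 KiB to a temporary file."""
    call_log.append(1)
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 65536)


median, samples = time_runs(workload)
print(f"median of {len(samples)} runs: {median:.6f} s")
```

Reporting the median rather than a single run reduces the influence of one unusually slow or fast measurement.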

Limitations and Caveats

Measurement Accuracy

Very fast operations (< 1 millisecond) may not be measured accurately
Operating system caching can affect results significantly
Network file systems (NFS, SMBFS) may report inaccurate I/O counts

Platform Differences

On macOS, fs_usage needs elevated privileges, so you are prompted for your password
Windows may not track all I/O operations as precisely as Linux
Different platforms may report operations differently (e.g., buffering behavior)

Overhead

The profiling itself adds some overhead (typically 1-5%)
Histogram mode adds additional overhead due to detailed tracking
Very high frequency operations may be impacted more

Next Steps

See notebooks/index for detailed examples
Read Platform-Specific Notes for details on each operating system
Check Troubleshooting if you encounter issues