Histogram Visualization

This notebook demonstrates how to use the histogram feature to visualize I/O operation distributions.

Note: Histogram mode is available on Linux (with strace) and macOS (with fs_usage), but not on Windows.
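Before running the cells below, it can help to check whether the tracing tool is installed at all. The sketch below looks it up on the PATH. It is based only on the note above: the helper name `required_tracer` is hypothetical, and the extension may apply further checks (permissions, for example) that are not modeled here.

```python
import platform
import shutil


def required_tracer(system=None):
    """Return the tracing tool named in the note above for a platform, or None."""
    system = system or platform.system()
    return {"Linux": "strace", "Darwin": "fs_usage"}.get(system)


tool = required_tracer()
if tool is None:
    print("No tracing tool listed for this platform; histograms are unavailable.")
elif shutil.which(tool) is None:
    print(f"{tool} was not found on PATH; expect a fallback without histograms.")
else:
    print(f"{tool} found at {shutil.which(tool)}")
```

If the tool is missing, the cells still run, but the profiler reports totals only, as the outputs in this notebook show.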

Setup

Load the extension and prepare our test environment.

[1]:

%load_ext iops_profiler

[2]:

import tempfile
import os
import shutil

# Create a temporary directory
test_dir = tempfile.mkdtemp()
print(f"Working directory: {test_dir}")

Working directory: /tmp/tmpp_lfh4kn

Basic Histogram Example

Let’s start with a simple example that creates files of different sizes. The --histogram flag enables visualization.

[3]:

%%iops --histogram
# Create files with varying sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    # Size grows tenfold per file: roughly 1KB, 10KB, 100KB, 1MB, 10MB
    size = 1024 * (10 ** i)
    with open(filename, 'w') as f:
        f.write('x' * size)

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0182 seconds
Read Operations	2
Write Operations	5
Total Operations	7
Bytes Read	0.00 B (0 bytes)
Bytes Written	10.86 MB (11,382,784 bytes)
Total Bytes	10.86 MB (11,382,784 bytes)
IOPS	384.70 operations/second
Throughput	596.59 MB/second

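When a tracing backend is available, the histogram shows two charts (in the run above strace was missing, so the profiler fell back to psutil and drew no histogram):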

Operation Count Distribution: How many operations fall into each size bucket
Total Bytes Distribution: Total bytes transferred in each size bucket

Both use logarithmic scale on the x-axis to show the wide range of operation sizes.
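To make the two charts concrete, here is a minimal sketch of how per-operation sizes could be grouped into logarithmic buckets, giving one count and one byte total per bucket. The power-of-two boundaries and the function name `bucket_by_power_of_two` are assumptions for illustration; the bucket edges the extension actually uses are not stated in this notebook.

```python
from collections import defaultdict


def bucket_by_power_of_two(sizes):
    """Group operation sizes (in bytes) into buckets [2**k, 2**(k+1))."""
    counts = defaultdict(int)
    total_bytes = defaultdict(int)
    for size in sizes:
        if size <= 0:
            continue  # zero-byte operations have no place on a log axis
        lower = 1 << (size.bit_length() - 1)  # largest power of two <= size
        counts[lower] += 1
        total_bytes[lower] += size
    return dict(counts), dict(total_bytes)


# Made-up sizes: a few tiny writes, some page-sized ones, one large one
sizes = [100, 100, 100, 4096, 8192, 8192, 1_048_576]
counts, total_bytes = bucket_by_power_of_two(sizes)
for lower in sorted(counts):
    print(f"[{lower:>8}, {lower * 2:>8})  count={counts[lower]}  bytes={total_bytes[lower]}")
```

The first chart corresponds to `counts` and the second to `total_bytes`. In this sample the smallest bucket has the most operations while the largest bucket carries almost all of the bytes.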

Read Operations Histogram

Now let’s read the files back and see the read operation distribution.

[4]:

%%iops --histogram
# Read files of different sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    with open(filename, 'r') as f:
        content = f.read()

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0104 seconds
Read Operations	12
Write Operations	0
Total Operations	12
Bytes Read	0.00 B (0 bytes)
Bytes Written	0.00 B (0 bytes)
Total Bytes	0.00 B (0 bytes)
IOPS	1153.60 operations/second
Throughput	0.00 B/second

Notice how the distribution might differ from writes:

The operating system may cache recently written data
Read buffering strategies may differ from write buffering
Some reads might be satisfied from the memory cache, which is consistent with the 0 bytes read reported above despite 12 read operations

Mixed Read/Write Operations

Let’s see what happens when we mix read and write operations.

[5]:

%%iops --histogram
# Write small files
for i in range(10):
    small_file = os.path.join(test_dir, f'small_{i}.txt')
    with open(small_file, 'w') as f:
        f.write('data' * 256)  # ~1KB each

# Write medium files
for i in range(5):
    medium_file = os.path.join(test_dir, f'medium_{i}.txt')
    with open(medium_file, 'w') as f:
        f.write('data' * 2560)  # ~10KB each

# Write large file
large_file = os.path.join(test_dir, 'large.txt')
with open(large_file, 'w') as f:
    f.write('data' * 256000)  # ~1MB

# Now read some files back
for i in range(5):
    with open(os.path.join(test_dir, f'small_{i}.txt'), 'r') as f:
        _ = f.read()

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0032 seconds
Read Operations	12
Write Operations	16
Total Operations	28
Bytes Read	0.00 B (0 bytes)
Bytes Written	1.07 MB (1,126,400 bytes)
Total Bytes	1.07 MB (1,126,400 bytes)
IOPS	8760.29 operations/second
Throughput	336.09 MB/second

When a tracing backend is available, the histogram shows separate lines for:

Reads (one color)
Writes (another color)
All operations combined (third color)

This makes it easy to see how read and write patterns differ.
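Separate read and write series require a size and a direction for every individual operation, which is why a tracing backend is needed and process-wide counters are not enough. The sketch below shows one way such data could be extracted, assuming simplified strace-style lines. The regular expression and the function `sizes_by_kind` are hypothetical; the extension's real parsing is not shown in this notebook.

```python
import re

# Matches simplified strace lines such as: write(3, "xxxx"..., 8192) = 8192
# Failed calls (return value -1) do not match because of the leading minus sign.
SYSCALL_RE = re.compile(r"^(read|write)\(.*\)\s+=\s+(\d+)")


def sizes_by_kind(trace_lines):
    """Collect per-operation byte counts for reads, writes, and both combined."""
    series = {"read": [], "write": [], "all": []}
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if match is None:
            continue  # not a read/write call, or the call failed
        kind, nbytes = match.group(1), int(match.group(2))
        if nbytes == 0:
            continue  # zero-byte results (e.g. end of file) cannot sit on a log axis
        series[kind].append(nbytes)
        series["all"].append(nbytes)
    return series


sample = [
    'openat(AT_FDCWD, "/tmp/example.txt", O_RDONLY) = 4',
    'read(4, "data"..., 4096) = 1024',
    'read(4, "", 4096) = 0',
    'write(3, "xxxx"..., 8192) = 8192',
    'write(3, "x", 1) = -1 EAGAIN (Resource temporarily unavailable)',
]
print(sizes_by_kind(sample))
```

Each of the three lists can then be bucketed and drawn as its own line.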

Analyzing Buffer Size Impact

One practical use of histograms is to analyze how buffer sizes affect I/O patterns.

[6]:

%%iops --histogram
# Default buffering (Python picks the buffer size, typically 4096 or 8192 bytes)
test_file = os.path.join(test_dir, 'buffer_test.txt')
with open(test_file, 'w') as f:
    for i in range(1000):
        f.write('x' * 100)

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0008 seconds
Read Operations	2
Write Operations	13
Total Operations	15
Bytes Read	0.00 B (0 bytes)
Bytes Written	100.00 KB (102,400 bytes)
Total Bytes	100.00 KB (102,400 bytes)
IOPS	19710.08 operations/second
Throughput	128.32 MB/second

[7]:

%%iops --histogram
# Explicit 8192-byte buffer (often the same size as the default)
test_file_buffered = os.path.join(test_dir, 'buffer_test_large.txt')
with open(test_file_buffered, 'w', buffering=8192) as f:
    for i in range(1000):
        f.write('x' * 100)

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0008 seconds
Read Operations	2
Write Operations	13
Total Operations	15
Bytes Read	0.00 B (0 bytes)
Bytes Written	100.00 KB (102,400 bytes)
Total Bytes	100.00 KB (102,400 bytes)
IOPS	18921.67 operations/second
Throughput	123.19 MB/second

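Compare the two histograms:

A larger buffer generally results in fewer, larger operations
This can improve throughput, but data waits in memory longer before it reaches the operating system
When the operation sizes differ, the histogram makes the difference visually clear
In the runs above both cells report 13 write operations, most likely because buffering=8192 matches the buffer size Python commonly picks by default; a larger value such as 65536 is needed to see a change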
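The effect of the buffer size can also be observed without any profiler by counting the writes that reach the underlying stream. The sketch below wraps a stand-in raw stream (`CountingRaw`, a hypothetical helper that only records sizes) in `io.BufferedWriter`. It models the binary buffering layer only; a text-mode file adds its own encoding layer on top, so the numbers are illustrative rather than a prediction of what the profiler reports.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file that records the size of each raw write."""

    def __init__(self):
        super().__init__()
        self.sizes = []

    def writable(self):
        return True

    def write(self, data):
        self.sizes.append(len(data))
        return len(data)  # report that every byte was accepted


def raw_write_sizes(buffer_size, chunk=100, count=1000):
    """Write `count` chunks through a buffer and return the raw write sizes."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for _ in range(count):
        buffered.write(b"x" * chunk)
    buffered.flush()  # push out whatever is still in the buffer
    return raw.sizes


for buffer_size in (8192, 65536):
    sizes = raw_write_sizes(buffer_size)
    print(f"buffer={buffer_size}: {len(sizes)} raw writes, largest={max(sizes)} bytes")
```

The same 100,000 bytes are written in both cases; only the number and size of the operations that leave the buffer change, which is the shift a histogram would show.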

Real-World Example: CSV Writing

Let’s look at a more realistic scenario: writing CSV data.

[8]:

%%iops --histogram
import csv

csv_file = os.path.join(test_dir, 'data.csv')
with open(csv_file, 'w', newline='') as f:
    writer = csv.writer(f)
    # Write header
    writer.writerow(['id', 'name', 'value', 'description'])
    # Write data rows
    for i in range(1000):
        writer.writerow([i, f'item_{i}', i * 1.5, f'Description for item {i}'])

⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.

IOPS Profile Results (psutil (per-process))
Execution Time	0.0021 seconds
Read Operations	2
Write Operations	6
Total Operations	8
Bytes Read	0.00 B (0 bytes)
Bytes Written	44.00 KB (45,056 bytes)
Total Bytes	44.00 KB (45,056 bytes)
IOPS	3800.48 operations/second
Throughput	20.41 MB/second

The histogram reveals:

How the CSV writer batches operations
Whether writes are uniform or variable in size
Opportunities for optimization (e.g., adjusting buffer size)

Understanding the Histogram

X-axis: Bytes per Operation (log scale)

Shows the size of individual I/O operations. The logarithmic scale allows you to see both tiny (< 1KB) and large (> 1MB) operations on the same chart.

Y-axis (Top chart): Operation Count

How many operations fall into each size bucket. Helps identify the most common operation sizes.

Y-axis (Bottom chart): Total Bytes

Total bytes transferred in each size bucket. Shows which operation sizes contribute most to overall data transfer.

Interpretation Tips

Many small operations: May indicate inefficient buffering
Few large operations: Usually more efficient for throughput
Bimodal distribution: Suggests different types of operations (e.g., metadata vs. data)
Read vs. Write differences: May reveal caching or buffering strategies
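The first two tips can be turned into a rough numeric check. The sketch below compares the share of operations that are small with the share of bytes they carry. The 4096-byte cutoff and the function name `summarize_sizes` are assumptions chosen for illustration, not values taken from the tool.

```python
def summarize_sizes(sizes, small_threshold=4096):
    """Report how much of a workload consists of small operations.

    small_threshold is a hypothetical cutoff in bytes, not a value from the tool.
    """
    sizes = [s for s in sizes if s > 0]
    if not sizes:
        return {"operations": 0, "small_op_share": 0.0, "small_byte_share": 0.0}
    small = [s for s in sizes if s < small_threshold]
    return {
        "operations": len(sizes),
        "small_op_share": len(small) / len(sizes),
        "small_byte_share": sum(small) / sum(sizes),
    }


# Made-up workload: 1000 writes of 100 bytes plus one write of 1,000,000 bytes
report = summarize_sizes([100] * 1000 + [1_000_000])
print(report)
```

A high `small_op_share` combined with a low `small_byte_share` is the "many small operations" pattern: most of the calls move very little of the data, so larger buffers or batched writes are worth trying.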

Cleanup

[9]:

shutil.rmtree(test_dir)
print("Cleanup complete!")

Cleanup complete!

Summary

In this notebook, we learned how to:

Enable histogram visualization with --histogram
Interpret operation count and bytes distribution charts
Analyze read vs. write patterns
Use histograms to optimize buffer sizes
Apply histogram analysis to real-world scenarios

Histograms are particularly useful for:

Understanding I/O patterns in complex code
Identifying inefficiencies (many small operations)
Optimizing buffer and chunk sizes
Comparing different implementation strategies

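Remember: Histogram mode is only available on Linux (with strace) and macOS (with fs_usage), not on Windows. If the tracing tool is missing, as in the runs above, the profiler falls back to psutil and no histogram is drawn.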