Histogram Visualization

This notebook demonstrates how to use the histogram feature to visualize I/O operation distributions.

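Note: Histogram mode is available on Linux (with strace) and macOS (with fs_usage), but not on Windows. If the tracing tool is not installed, the profiler falls back to psutil per-process measurement and draws no histogram, as in the outputs recorded in this notebook.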
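Before running the cells, it can help to check whether the required tracing tool is on the PATH. The following is a minimal sketch using only the standard library; the helper name histogram_tool is hypothetical and not part of iops_profiler, and the extension may perform its own detection differently.

```python
import platform
import shutil


def histogram_tool():
    """Return the tracing tool needed on this platform, or None if unavailable.

    Hypothetical helper: maps the platform to the tool named in the note above
    and checks whether that tool can be found on the PATH.
    """
    tool = {"Linux": "strace", "Darwin": "fs_usage"}.get(platform.system())
    if tool is None:
        return None  # e.g. Windows, where histogram mode is not supported
    return tool if shutil.which(tool) else None


print(f"Tracing tool found: {histogram_tool()}")
```

If this prints None, expect the psutil fallback and no histogram.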

Setup

Load the extension and prepare our test environment.

[1]:
%load_ext iops_profiler
[2]:
import tempfile
import os
import shutil

# Create a temporary directory
test_dir = tempfile.mkdtemp()
print(f"Working directory: {test_dir}")
Working directory: /tmp/tmpp_lfh4kn

Basic Histogram Example

Let’s start with a simple example that creates files of different sizes. The --histogram flag enables visualization.

[3]:
%%iops --histogram
# Create files with varying sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    # Size grows tenfold each iteration: about 1 KB, 10 KB, 100 KB, 1 MB, 10 MB
    size = 1024 * (10 ** i)
    with open(filename, 'w') as f:
        f.write('x' * size)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0182 seconds
Read Operations 2
Write Operations 5
Total Operations 7
Bytes Read 0.00 B (0 bytes)
Bytes Written 10.86 MB (11,382,784 bytes)
Total Bytes 10.86 MB (11,382,784 bytes)
IOPS 384.70 operations/second
Throughput 596.59 MB/second

When histogram mode is active, the output includes two charts:

  1. Operation Count Distribution: How many operations fall into each size bucket

  2. Total Bytes Distribution: Total bytes transferred in each size bucket

Both use a logarithmic x-axis to cover the wide range of operation sizes.
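To make the two charts concrete, here is a minimal sketch of how operation sizes could be grouped into size buckets, producing a count and a byte total per bucket. The function name bucket_operations and the power-of-ten bucket edges are assumptions for illustration; iops_profiler may choose its bin edges differently.

```python
from collections import defaultdict


def bucket_operations(sizes):
    """Group operation sizes (in bytes) into power-of-ten buckets.

    Returns two dicts keyed by the bucket's lower bound:
    the operation count and the total bytes in that bucket.
    """
    counts = defaultdict(int)
    total_bytes = defaultdict(int)
    for size in sizes:
        if size <= 0:
            continue  # zero-byte operations have no place on a log axis
        # The digit count gives floor(log10(size)) without float rounding.
        lower_bound = 10 ** (len(str(size)) - 1)
        counts[lower_bound] += 1
        total_bytes[lower_bound] += size
    return dict(counts), dict(total_bytes)


counts, total_bytes = bucket_operations([100, 512, 4096, 8192, 1_000_000])
print(counts)
print(total_bytes)
```

A bucket with many operations but few total bytes points at small, frequent I/O; a bucket with few operations but most of the bytes points at bulk transfers.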

Read Operations Histogram

Now let’s read the files back and see the read operation distribution.

[4]:
%%iops --histogram
# Read files of different sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    with open(filename, 'r') as f:
        content = f.read()
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0104 seconds
Read Operations 12
Write Operations 0
Total Operations 12
Bytes Read 0.00 B (0 bytes)
Bytes Written 0.00 B (0 bytes)
Total Bytes 0.00 B (0 bytes)
IOPS 1153.60 operations/second
Throughput 0.00 B/second

Notice how the distribution might differ from writes:

  • The operating system may serve recently written data from its memory cache instead of the disk, which is consistent with the 0 bytes read reported above

  • Read buffering strategies may differ from write buffering

Mixed Read/Write Operations

Let’s see what happens when we mix read and write operations.

[5]:
%%iops --histogram
# Write small files
for i in range(10):
    small_file = os.path.join(test_dir, f'small_{i}.txt')
    with open(small_file, 'w') as f:
        f.write('data' * 256)  # ~1KB each

# Write medium files
for i in range(5):
    medium_file = os.path.join(test_dir, f'medium_{i}.txt')
    with open(medium_file, 'w') as f:
        f.write('data' * 2560)  # ~10KB each

# Write large file
large_file = os.path.join(test_dir, 'large.txt')
with open(large_file, 'w') as f:
    f.write('data' * 256000)  # ~1MB

# Now read some files back
for i in range(5):
    with open(os.path.join(test_dir, f'small_{i}.txt'), 'r') as f:
        _ = f.read()
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0032 seconds
Read Operations 12
Write Operations 16
Total Operations 28
Bytes Read 0.00 B (0 bytes)
Bytes Written 1.07 MB (1,126,400 bytes)
Total Bytes 1.07 MB (1,126,400 bytes)
IOPS 8760.29 operations/second
Throughput 336.09 MB/second

The histogram now shows separate lines for:

  • Reads (one color)

  • Writes (another color)

  • All operations combined (a third color)

This makes it easy to see how read and write patterns differ.
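Separating reads from writes requires per-call data, which is why a tracing tool is needed for histograms. The sketch below splits trace lines by call type, assuming the common strace line layout of `name(args) = return_value`. The helper name split_by_kind and the regular expression are assumptions for illustration, not the extension's actual parser.

```python
import re

# Matches successful read/write calls such as: write(3, "abc"..., 8192) = 8192
# An optional "[pid N]" prefix appears when strace follows child processes.
CALL_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+)?(read|write)\(\d+, .*\)\s+=\s+(\d+)')


def split_by_kind(lines):
    """Return {"read": [...], "write": [...]} with the bytes moved per call."""
    sizes = {"read": [], "write": []}
    for line in lines:
        match = CALL_RE.match(line)
        if match is None:
            continue  # other syscalls and failed calls (negative return) are skipped
        kind, returned = match.group(1), int(match.group(2))
        if returned > 0:  # a return value of 0 moved no data (e.g. end of file)
            sizes[kind].append(returned)
    return sizes
```

The combined series is then simply the concatenation of the two lists.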

Analyzing Buffer Size Impact

One practical use of histograms is to analyze how buffer sizes affect I/O patterns.

[6]:
%%iops --histogram
# Default buffer size (no buffering argument)
test_file = os.path.join(test_dir, 'buffer_test.txt')
with open(test_file, 'w') as f:
    for i in range(1000):
        f.write('x' * 100)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0008 seconds
Read Operations 2
Write Operations 13
Total Operations 15
Bytes Read 0.00 B (0 bytes)
Bytes Written 100.00 KB (102,400 bytes)
Total Bytes 100.00 KB (102,400 bytes)
IOPS 19710.08 operations/second
Throughput 128.32 MB/second
[7]:
%%iops --histogram
# Explicit 8192-byte buffer, which on many systems equals the default
test_file_buffered = os.path.join(test_dir, 'buffer_test_large.txt')
with open(test_file_buffered, 'w', buffering=8192) as f:
    for i in range(1000):
        f.write('x' * 100)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0008 seconds
Read Operations 2
Write Operations 13
Total Operations 15
Bytes Read 0.00 B (0 bytes)
Bytes Written 100.00 KB (102,400 bytes)
Total Bytes 100.00 KB (102,400 bytes)
IOPS 18921.67 operations/second
Throughput 123.19 MB/second

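Compare the two histograms:

  • A larger buffer generally results in fewer, larger write operations

  • Fewer system calls usually improve throughput, but data sits in memory longer before it reaches the file

  • In the run recorded above, both cells report 13 write operations, which suggests the default buffer was already 8192 bytes; pass a larger value (for example 65536) to see a difference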
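The effect of buffer size can also be observed without any tracing tool. The following sketch wraps an in-memory raw stream that records the size of every write it receives, then pushes the same 1000 writes of 100 bytes through buffers of different sizes. The class RecordingRaw and the function flush_sizes are hypothetical names for illustration, and the sketch uses binary mode, so the exact sizes can differ from the text-mode cells above.

```python
import io


class RecordingRaw(io.RawIOBase):
    """In-memory raw stream that records the size of each write it receives."""

    def __init__(self):
        super().__init__()
        self.write_sizes = []

    def writable(self):
        return True

    def write(self, data):
        size = len(data)
        self.write_sizes.append(size)
        return size  # report that every byte was accepted


def flush_sizes(buffer_size, chunk=b"x" * 100, repeats=1000):
    """Return the sizes of the writes that reach the raw stream."""
    raw = RecordingRaw()
    with io.BufferedWriter(raw, buffer_size=buffer_size) as buffered:
        for _ in range(repeats):
            buffered.write(chunk)
    return raw.write_sizes


for size in (8192, 65536):
    sizes = flush_sizes(size)
    print(f"buffer={size}: {len(sizes)} raw writes, largest {max(sizes)} bytes")
```

Each entry in the returned list corresponds to one operation that would appear in the histogram, so a larger buffer shifts the distribution toward fewer, larger operations.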

Real-World Example: CSV Writing

Let’s look at a more realistic scenario: writing CSV data.

[8]:
%%iops --histogram
import csv

csv_file = os.path.join(test_dir, 'data.csv')
with open(csv_file, 'w', newline='') as f:
    writer = csv.writer(f)
    # Write header
    writer.writerow(['id', 'name', 'value', 'description'])
    # Write data rows
    for i in range(1000):
        writer.writerow([i, f'item_{i}', i * 1.5, f'Description for item {i}'])
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.

⚠️ Histograms not available for psutil measurement mode.
IOPS Profile Results (psutil (per-process))
Execution Time 0.0021 seconds
Read Operations 2
Write Operations 6
Total Operations 8
Bytes Read 0.00 B (0 bytes)
Bytes Written 44.00 KB (45,056 bytes)
Total Bytes 44.00 KB (45,056 bytes)
IOPS 3800.48 operations/second
Throughput 20.41 MB/second

The histogram reveals:

  • How the CSV writer batches operations

  • Whether writes are uniform or variable in size

  • Opportunities for optimization (e.g., adjusting buffer size)
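As one possible optimization, the file can be opened with a larger buffer so that rows accumulate in memory before being written out. This is a minimal sketch; the helper name write_csv and the 1 MiB default are assumptions for illustration, and the best value depends on the workload.

```python
import csv


def write_csv(path, rows, buffer_size=1024 * 1024):
    """Write rows to a CSV file through a larger in-memory buffer."""
    # buffering sets the size of the underlying binary buffer in bytes.
    with open(path, "w", newline="", buffering=buffer_size) as f:
        writer = csv.writer(f)
        writer.writerows(rows)  # hand all rows to the writer in one call
```

Running this inside an `%%iops --histogram` cell and comparing against the cell above shows whether the change actually reduces the number of write operations.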

Understanding the Histogram

X-axis: Bytes per Operation (log scale)

Shows the size of individual I/O operations. The logarithmic scale allows you to see both tiny (< 1 KB) and large (> 1 MB) operations on the same chart.

Y-axis (Top chart): Operation Count

How many operations fall into each size bucket. Helps identify the most common operation sizes.

Y-axis (Bottom chart): Total Bytes

Total bytes transferred in each size bucket. Shows which operation sizes contribute most to overall data transfer.

Interpretation Tips

  • Many small operations: May indicate inefficient buffering

  • Few large operations: Usually more efficient for throughput

  • Bimodal distribution: Suggests different types of operations (e.g., metadata vs. data)

  • Read vs. Write differences: May reveal caching or buffering strategies
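The first two tips can be turned into a quick numeric check. The sketch below reports what share of operations, and what share of bytes, fall below a size threshold. The function name small_operation_share and the 4096-byte threshold are arbitrary choices for illustration, not values used by iops_profiler.

```python
def small_operation_share(sizes, threshold=4096):
    """Return (share of operations, share of bytes) below threshold."""
    total = sum(sizes)
    if not sizes or total == 0:
        return 0.0, 0.0
    small = [size for size in sizes if size < threshold]
    return len(small) / len(sizes), sum(small) / total


# Nine 100-byte operations and one large operation
ops_share, bytes_share = small_operation_share([100] * 9 + [1_000_000])
print(f"{ops_share:.0%} of operations carry {bytes_share:.2%} of the bytes")
```

A high operation share combined with a low byte share is the pattern the "many small operations" tip describes.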

Cleanup

[9]:
shutil.rmtree(test_dir)
print("Cleanup complete!")
Cleanup complete!

Summary

In this notebook, we learned:

  1. How to enable histogram visualization with --histogram

  2. How to interpret the operation count and total bytes charts

  3. How to compare read and write patterns

  4. How to use histograms to evaluate buffer sizes

  5. How to apply histogram analysis to real-world scenarios

Histograms are particularly useful for:

  • Understanding I/O patterns in complex code

  • Identifying inefficiencies (many small operations)

  • Optimizing buffer and chunk sizes

  • Comparing different implementation strategies

Remember: Histogram mode is only available on Linux and macOS, not on Windows.