Histogram Visualization
This notebook demonstrates how to use the histogram feature to visualize I/O operation distributions.
Note: Histogram mode is available on Linux (with strace) and macOS (with fs_usage), but not on Windows.
Setup
Load the extension and prepare our test environment.
[1]:
%load_ext iops_profiler
[2]:
import tempfile
import os
import shutil
# Create a temporary directory
test_dir = tempfile.mkdtemp()
print(f"Working directory: {test_dir}")
Working directory: /tmp/tmpp_lfh4kn
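The outputs later in this notebook show the extension falling back to psutil when strace is missing, so it can help to check for the tracing tool before running the histogram cells. The sketch below is not part of iops_profiler: the helper name `histogram_tool_available` is hypothetical. It only checks that the tool named in the note above is on the PATH, and does not check permissions or anything else the extension may require.

```python
import platform
import shutil

# Tool named in the note above for each platform (Windows has none).
TRACING_TOOLS = {"Linux": "strace", "Darwin": "fs_usage"}

def histogram_tool_available(system=None, which=shutil.which):
    """Return True if the tracing tool for this platform is on the PATH.

    Hypothetical helper for illustration; `system` and `which` can be
    overridden to test other platforms.
    """
    system = system or platform.system()
    tool = TRACING_TOOLS.get(system)
    return tool is not None and which(tool) is not None

print(f"Tracing tool found: {histogram_tool_available()}")
```

If this prints False, expect the psutil fallback and no histogram.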
Basic Histogram Example
Let’s start with a simple example that creates files of different sizes. The --histogram flag enables visualization.
[3]:
%%iops --histogram
# Create files with varying sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    # Size increases exponentially: 1KB, 10KB, 100KB, 1MB, 10MB
    size = 1024 * (10 ** i)
    with open(filename, 'w') as f:
        f.write('x' * size)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0182 seconds |
| Read Operations | 2 |
| Write Operations | 5 |
| Total Operations | 7 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 10.86 MB (11,382,784 bytes) |
| Total Bytes | 10.86 MB (11,382,784 bytes) |
| IOPS | 384.70 operations/second |
| Throughput | 596.59 MB/second |
When histogram mode is available, the output includes two charts (none were rendered in the run above, which fell back to psutil):
Operation Count Distribution: how many operations fall into each size bucket
Total Bytes Distribution: total bytes transferred in each size bucket
Both use a logarithmic x-axis to cover the wide range of operation sizes.
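To make the two charts concrete, here is a minimal sketch of how operation sizes could be grouped into logarithmic buckets, tracking both the count and the total bytes per bucket. The power-of-two bucket boundaries and the function name `bucket_by_size` are assumptions for illustration, not the extension's actual binning.

```python
from collections import defaultdict

def bucket_by_size(sizes):
    """Group operation sizes (in bytes) into power-of-two buckets.

    Returns {bucket_lower_bound: (operation_count, total_bytes)}.
    """
    buckets = defaultdict(lambda: [0, 0])
    for size in sizes:
        if size <= 0:
            continue  # zero-byte operations cannot be placed on a log axis
        lower = 1 << (size.bit_length() - 1)  # largest power of two <= size
        buckets[lower][0] += 1
        buckets[lower][1] += size
    return {lower: tuple(stats) for lower, stats in sorted(buckets.items())}

example = bucket_by_size([100, 100, 4096, 8192, 8192, 1_000_000])
for lower, (count, total) in example.items():
    print(f">= {lower:>7} B: {count} ops, {total} bytes")
```

The single 1,000,000-byte operation dominates the bytes chart even though the 100-byte and 8192-byte buckets hold more operations, which is why the two charts are shown side by side.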
Read Operations Histogram
Now let’s read the files back and see the read operation distribution.
[4]:
%%iops --histogram
# Read files of different sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    with open(filename, 'r') as f:
        content = f.read()
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0104 seconds |
| Read Operations | 12 |
| Write Operations | 0 |
| Total Operations | 12 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 0.00 B (0 bytes) |
| Total Bytes | 0.00 B (0 bytes) |
| IOPS | 1153.60 operations/second |
| Throughput | 0.00 B/second |
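The read distribution may differ from the write distribution for several reasons:
The operating system may cache recently written data
Read buffering strategies may differ from write buffering strategies
Some reads may be served from the page cache rather than from disk, which is likely why the psutil fallback above reports 0 bytes read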
Mixed Read/Write Operations
Let’s see what happens when we mix read and write operations.
[5]:
%%iops --histogram
# Write small files
for i in range(10):
    small_file = os.path.join(test_dir, f'small_{i}.txt')
    with open(small_file, 'w') as f:
        f.write('data' * 256)  # ~1KB each
# Write medium files
for i in range(5):
    medium_file = os.path.join(test_dir, f'medium_{i}.txt')
    with open(medium_file, 'w') as f:
        f.write('data' * 2560)  # ~10KB each
# Write large file
large_file = os.path.join(test_dir, 'large.txt')
with open(large_file, 'w') as f:
    f.write('data' * 256000)  # ~1MB
# Now read some files back
for i in range(5):
    with open(os.path.join(test_dir, f'small_{i}.txt'), 'r') as f:
        _ = f.read()
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0032 seconds |
| Read Operations | 12 |
| Write Operations | 16 |
| Total Operations | 28 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 1.07 MB (1,126,400 bytes) |
| Total Bytes | 1.07 MB (1,126,400 bytes) |
| IOPS | 8760.29 operations/second |
| Throughput | 336.09 MB/second |
The histogram now shows separate lines for:
Reads (one color)
Writes (another color)
All operations combined (third color)
This makes it easy to see how read and write patterns differ.
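The three lines described above can be thought of as three counts over the same size buckets. The following is a rough sketch under the assumption that each traced operation is available as a `(kind, size)` pair; that record shape, the power-of-two buckets, and the name `series_by_kind` are all hypothetical rather than taken from iops_profiler.

```python
from collections import Counter

def size_bucket(size):
    """Lower bound of the power-of-two bucket containing `size` bytes."""
    return 1 << (size.bit_length() - 1)

def series_by_kind(ops):
    """Count operations per size bucket for reads, writes, and both combined."""
    series = {"read": Counter(), "write": Counter(), "all": Counter()}
    for kind, size in ops:
        if kind not in ("read", "write") or size <= 0:
            continue  # ignore anything that is not a sized read or write
        bucket = size_bucket(size)
        series[kind][bucket] += 1
        series["all"][bucket] += 1
    return series

ops = [("read", 1024), ("read", 1024), ("write", 1024), ("write", 8192)]
series = series_by_kind(ops)
print(dict(series["read"]), dict(series["write"]), dict(series["all"]))
```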
Analyzing Buffer Size Impact
One practical use of histograms is to analyze how buffer sizes affect I/O patterns.
[6]:
%%iops --histogram
# Small buffer size (default)
test_file = os.path.join(test_dir, 'buffer_test.txt')
with open(test_file, 'w') as f:
    for i in range(1000):
        f.write('x' * 100)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0008 seconds |
| Read Operations | 2 |
| Write Operations | 13 |
| Total Operations | 15 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 100.00 KB (102,400 bytes) |
| Total Bytes | 100.00 KB (102,400 bytes) |
| IOPS | 19710.08 operations/second |
| Throughput | 128.32 MB/second |
[7]:
%%iops --histogram
# Larger buffer size
test_file_buffered = os.path.join(test_dir, 'buffer_test_large.txt')
with open(test_file_buffered, 'w', buffering=8192) as f:
    for i in range(1000):
        f.write('x' * 100)
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0008 seconds |
| Read Operations | 2 |
| Write Operations | 13 |
| Total Operations | 15 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 100.00 KB (102,400 bytes) |
| Total Bytes | 100.00 KB (102,400 bytes) |
| IOPS | 18921.67 operations/second |
| Throughput | 123.19 MB/second |
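Compare the two histograms:
A larger buffer may result in fewer, larger operations
This can improve throughput, but data stays in the buffer longer before it reaches the operating system
When the histogram is available, it makes the difference visually clear
In the run above, both cells report the same 13 write operations, which suggests the default buffer was already about 8192 bytes. A noticeably larger value (for example, buffering=65536) is more likely to change the pattern.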
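The effect of buffer size can also be observed without any tracing tool by counting the writes that reach the raw stream. The sketch below uses only the standard `io` module; the `CountingRaw` class and `raw_write_sizes` function are hypothetical helpers, and the in-memory raw stream stands in for a real file, so exact counts on disk may differ.

```python
import io

class CountingRaw(io.RawIOBase):
    """Raw stream that discards data but records the size of each write."""

    def __init__(self):
        super().__init__()
        self.sizes = []

    def writable(self):
        return True

    def write(self, b):
        n = memoryview(b).nbytes
        self.sizes.append(n)
        return n  # report that every byte was accepted

def raw_write_sizes(buffer_size, chunk=b"x" * 100, repeats=1000):
    """Write `chunk` repeatedly through a BufferedWriter; return raw write sizes."""
    raw = CountingRaw()
    with io.BufferedWriter(raw, buffer_size=buffer_size) as f:
        for _ in range(repeats):
            f.write(chunk)
    return raw.sizes

small_buffer = raw_write_sizes(8192)
large_buffer = raw_write_sizes(65536)
print(f"8192-byte buffer:  {len(small_buffer)} raw writes")
print(f"65536-byte buffer: {len(large_buffer)} raw writes")
```

Both runs move the same 100,000 bytes; only the number and size of the underlying writes change, which is what the histogram would show.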
Real-World Example: CSV Writing
Let’s look at a more realistic scenario: writing CSV data.
[8]:
%%iops --histogram
import csv
csv_file = os.path.join(test_dir, 'data.csv')
with open(csv_file, 'w', newline='') as f:
    writer = csv.writer(f)
    # Write header
    writer.writerow(['id', 'name', 'value', 'description'])
    # Write data rows
    for i in range(1000):
        writer.writerow([i, f'item_{i}', i * 1.5, f'Description for item {i}'])
⚠️ Could not use strace: [Errno 2] No such file or directory: 'strace'
Falling back to psutil per-process measurement.
⚠️ Histograms not available for psutil measurement mode.
| IOPS Profile Results (psutil (per-process)) | |
| Execution Time | 0.0021 seconds |
| Read Operations | 2 |
| Write Operations | 6 |
| Total Operations | 8 |
| Bytes Read | 0.00 B (0 bytes) |
| Bytes Written | 44.00 KB (45,056 bytes) |
| Total Bytes | 44.00 KB (45,056 bytes) |
| IOPS | 3800.48 operations/second |
| Throughput | 20.41 MB/second |
The histogram reveals:
How the CSV writer batches operations
Whether writes are uniform or variable in size
Opportunities for optimization (e.g., adjusting buffer size)
Understanding the Histogram
X-axis: Bytes per Operation (log scale)
Shows the size of individual I/O operations. The logarithmic scale allows you to see both tiny (< 1KB) and large (> 1MB) operations on the same chart.
Y-axis (Top chart): Operation Count
How many operations fall into each size bucket. Helps identify the most common operation sizes.
Y-axis (Bottom chart): Total Bytes
Total bytes transferred in each size bucket. Shows which operation sizes contribute most to overall data transfer.
Interpretation Tips
Many small operations: May indicate inefficient buffering
Few large operations: Usually more efficient for throughput
Bimodal distribution: Suggests different types of operations (e.g., metadata vs. data)
Read vs. Write differences: May reveal caching or buffering strategies
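As a rough numeric companion to the first two tips, the share of operations and the share of bytes below some size threshold can be computed directly from a list of operation sizes. The 4096-byte threshold and the function name `small_operation_share` are assumptions chosen for illustration.

```python
def small_operation_share(sizes, threshold=4096):
    """Return (share of operations, share of bytes) below `threshold` bytes."""
    total_ops = len(sizes)
    total_bytes = sum(sizes)
    if total_ops == 0 or total_bytes == 0:
        return 0.0, 0.0
    small = [s for s in sizes if s < threshold]
    return len(small) / total_ops, sum(small) / total_bytes

# Ninety 100-byte operations alongside ten 1,000,000-byte operations.
ops_share, bytes_share = small_operation_share([100] * 90 + [1_000_000] * 10)
print(f"{ops_share:.0%} of operations carry {bytes_share:.2%} of the bytes")
```

A high operation share paired with a low byte share is the "many small operations" pattern that may point to inefficient buffering.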
Cleanup
[9]:
shutil.rmtree(test_dir)
print("Cleanup complete!")
Cleanup complete!
Summary
In this notebook, we learned:
How to enable histogram visualization with --histogram
Interpreting operation count and bytes distribution charts
Analyzing read vs. write patterns
Using histograms to optimize buffer sizes
Applying histogram analysis to real-world scenarios
Histograms are particularly useful for:
Understanding I/O patterns in complex code
Identifying inefficiencies (many small operations)
Optimizing buffer and chunk sizes
Comparing different implementation strategies
Remember: Histogram mode is only available on Linux and macOS, not on Windows.