Performance Benchmarking
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%" — Donald Knuth
Performance Tuning Cycle
The performance tuning cycle is an iterative process that helps identify and resolve performance bottlenecks in software applications. It consists of the following steps:
- Identify Performance Goals: Define clear and measurable performance objectives based on user requirements and business needs.
- Measure Baseline Performance: Use profiling tools to gather data on the current performance of the application, establishing a baseline for comparison.
- Analyze Performance Data: Examine the collected data to identify bottlenecks and areas for improvement.
- Implement Optimizations: Apply targeted optimizations to address identified bottlenecks, which may include code refactoring, algorithm improvements, or resource management enhancements.
- Re-measure Performance: After implementing optimizations, re-measure the application's performance against the baseline to confirm the improvements, and repeat the cycle until the goals from the first step are met.
What to Measure
When benchmarking performance, consider measuring the following metrics:
- Memory usage (heap, stack)
- CPU usage and hot paths
- I/O bottlenecks (disk, network)
- Throughput (e.g., requests per second)
- Latency (e.g., response time, tail latency)
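For a quick first look at several of these metrics, GNU time's verbose mode reports wall-clock time, CPU time, and peak memory in a single run (a minimal sketch; my_app is a placeholder for your binary, and on macOS the bundled time uses -l instead of -v):
# Wall-clock time, user/system CPU time, and peak memory (maximum resident set size)
/usr/bin/time -v ./my_app
# Disk I/O needs dedicated tooling, e.g. iostat from the sysstat package
iostat -x 1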
Tools
Hyperfine
Hyperfine is a command-line benchmarking tool that allows you to measure the execution time of commands and scripts. It provides statistical analysis of the results, including mean, median, standard deviation, and more. Hyperfine is useful for comparing the performance of different implementations or configurations.
# Install
cargo install hyperfine
# or
brew install hyperfine
Comparing
You can compare the execution time of two simple commands.
hyperfine 'sleep 0.1' 'sleep 0.2'
This will run each command multiple times and show you a statistical summary of the results.
A common use case is comparing different tools that do the same job, for example find and fd (a modern alternative to find).
hyperfine 'find . -name "*.md"' 'fd .md'
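When commands finish in just a few milliseconds, the startup cost of the intermediate shell can skew results. Recent hyperfine versions can run the commands without a shell, at the cost of shell features like pipes and redirection (a sketch using the -N flag):
# -N (--shell=none) executes the commands directly instead of via /bin/sh
hyperfine -N 'fd .md' 'find . -name "*.md"'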
Benchmarking with Parameters
You can set up parameters for your benchmarks to see how performance changes with different inputs.
hyperfine --parameter-scan num 1 10 'my-script --size {num}'
This will run my-script with the --size parameter varying from 1 to 10.
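As a more concrete sketch of a parameter scan (the dd invocation is illustrative and assumes GNU dd, which accepts the M size suffix):
# Measure copy throughput as the block size grows from 1 MiB to 8 MiB
hyperfine --parameter-scan bs 1 8 'dd if=/dev/zero of=/dev/null bs={bs}M count=100'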
Warmup Runs
For I/O-heavy commands, it's often useful to perform some warmup runs to make sure the disk cache is populated.
hyperfine --warmup 3 'cat my_large_file.txt > /dev/null'
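Conversely, to measure cold-cache performance, the --prepare option runs a setup command before every timing run. A sketch for Linux, where dropping the page cache requires root:
# Flush the page cache before each run to benchmark a cold read
hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'cat my_large_file.txt > /dev/null'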
Exporting Results
You can export the benchmark results to various formats like CSV, JSON, or Markdown.
hyperfine --export-markdown benchmark.md 'sleep 0.1' 'sleep 0.2'
This is great for sharing your results or for further analysis.
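The JSON export is especially handy for scripted analysis. Hyperfine stores one entry per command under a top-level results array, so with jq installed you can pull out the key statistics like this (a sketch; times are reported in seconds):
hyperfine --export-json results.json 'sleep 0.1' 'sleep 0.2'
# Print command, mean, and standard deviation for each benchmark
jq '.results[] | {command, mean, stddev}' results.json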
Sample Output
hyperfine --warmup 3 'find . -name "*.md"' 'fd .md'
Benchmark 1: find . -name "*.md"
  Time (mean ± σ):      30.0 ms ±  1.2 ms    [User: 3.8 ms, System: 26.0 ms]
  Range (min … max):    27.8 ms … 33.7 ms    97 runs

Benchmark 2: fd .md
  Time (mean ± σ):      10.8 ms ±  1.6 ms    [User: 19.4 ms, System: 50.6 ms]
  Range (min … max):     8.6 ms … 30.3 ms    218 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  fd .md ran
    2.77 ± 0.42 times faster than find . -name "*.md"
Flamegraph
Cargo Flamegraph is a tool for generating flame graphs from profiling data. Flame graphs are a visualization of hierarchical data, often used to represent CPU or memory usage in software applications. They help identify performance bottlenecks by showing which functions consume the most resources.
cargo install flamegraph
For the flame graph to show meaningful function names, debug symbols need to be enabled in release builds by adding the following to Cargo.toml:
[profile.release]
debug = true
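If you would rather not ship release binaries with debug info, one option is a dedicated profiling profile (a sketch assuming Cargo 1.57+ custom profiles and a cargo flamegraph version that supports --profile):
[profile.profiling]
inherits = "release"
debug = true
Then profile with cargo flamegraph --profile profiling.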
Basic Usage
# For a Rust project (profiles the default target)
cargo flamegraph -o flamegraph.svg
# For a specific binary target
cargo flamegraph --bin my_binary -o flamegraph.svg
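Arguments for the profiled binary go after a -- separator; the --input flag below is a hypothetical example:
# Everything after -- is passed to the binary being profiled
cargo flamegraph --bin my_binary -- --input data.txt
# On Linux, cargo flamegraph drives perf; if it fails with a permissions error,
# you may need to lower the kernel's perf_event_paranoid setting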
Dhat