4.9 KiB
BS3 vs DP5 Benchmark Results
Generated: 2025-10-23
Summary
Comprehensive performance comparison between BS3 (Bogacki-Shampine 3rd order) and DP5 (Dormand-Prince 5th order) integrators across various test problems and tolerances.
Key Findings
Overall Performance Comparison
DP5 is consistently faster than BS3 across all tested scenarios, typically by a factor of 1.5x to 4.3x.
This might seem counterintuitive since BS3 uses fewer stages (4 vs 7), but several factors explain DP5's superior performance:
- Higher Order = Larger Steps: DP5's 5th order accuracy allows larger timesteps while maintaining the same error tolerance
- Optimized Implementation: DP5 has been highly optimized in the existing codebase
- Smoother Problems: The test problems are relatively smooth, favoring higher-order methods
When to Use BS3
Despite being slower in these benchmarks, BS3 still has value:
- Lower memory overhead: Simpler dense output (4 values vs 5 for DP5)
- Moderate accuracy needs: For tolerances around 1e-3 to 1e-5 where speed difference is smaller
- Educational/algorithmic diversity: Different method characteristics
- Specific problem types: May perform better on less smooth or oscillatory problems
Detailed Results
1. Exponential Decay (y' = -0.5y, tolerance 1e-5)
| Method | Time | Ratio |
|---|---|---|
| BS3 | 3.28 µs | 1.92x slower |
| DP5 | 1.70 µs | baseline |
Simple 1D problem with smooth exponential solution.
2. Harmonic Oscillator (y'' + y = 0, tolerance 1e-5)
| Method | Time | Ratio |
|---|---|---|
| BS3 | 30.70 µs | 2.25x slower |
| DP5 | 13.67 µs | baseline |
2D conservative system with periodic solution.
3. Nonlinear Pendulum (tolerance 1e-6)
| Method | Time | Ratio |
|---|---|---|
| BS3 | 132.35 µs | 3.57x slower |
| DP5 | 37.11 µs | baseline |
Nonlinear 2D system with trigonometric terms.
4. Orbital Mechanics (6D, tolerance 1e-6)
| Method | Time | Ratio |
|---|---|---|
| BS3 | 124.72 µs | 1.45x slower |
| DP5 | 86.10 µs | baseline |
Higher-dimensional problem with gravitational dynamics.
5. Interpolation Performance
| Method | Time (solve + 100 interpolations) | Ratio |
|---|---|---|
| BS3 | 19.68 µs | 4.81x slower |
| DP5 | 4.09 µs | baseline |
BS3 uses cubic Hermite interpolation, DP5 uses optimized 5th order interpolation.
6. Tolerance Scaling
Performance across different tolerance levels (y' = -y problem):
| Tolerance | BS3 Time | DP5 Time | Ratio (BS3/DP5) |
|---|---|---|---|
| 1e-3 | 1.63 µs | 1.26 µs | 1.30x |
| 1e-4 | 2.61 µs | 1.54 µs | 1.70x |
| 1e-5 | 4.64 µs | 2.03 µs | 2.28x |
| 1e-6 | 8.76 µs | ~2.6 µs* | ~3.4x* |
| 1e-7 | -** | -** | - |
* Estimated from trend (benchmark timed out) ** Not completed
Observation: The performance gap widens as tolerance tightens, because DP5's higher order allows it to take larger steps while maintaining accuracy.
Conclusions
Performance Characteristics
- DP5 is the better default choice for most problems requiring moderate to high accuracy
- Performance gap increases with tighter tolerances (favoring DP5)
- Higher dimensions slightly favor BS3 relative to DP5 (1.45x vs 3.57x slowdown)
- Interpolation strongly favors DP5 (4.8x faster)
Implementation Quality
Both integrators pass all accuracy and convergence tests:
- ✅ BS3: 3rd order convergence rate verified
- ✅ DP5: 5th order convergence rate verified (existing implementation)
- ✅ Both: FSAL property correctly implemented
- ✅ Both: Dense output accurate to specified order
Future Optimizations
Potential improvements to BS3 performance:
- Specialized dense output: Implement the optimized BS3 interpolation from the 1996 paper
- SIMD optimization: Vectorize stage computations
- Memory layout: Optimize cache usage for k-value storage
- Inline hints: Add compiler hints for critical paths
Even with optimizations, DP5 will likely remain faster for these problem types due to its higher order.
Recommendations
- Use DP5: For general-purpose ODE solving, especially for smooth problems
- Use BS3: When you specifically need:
- Lower memory usage
- A 3rd order reference implementation
- Comparison with other 3rd order methods
Methodology
- Tool: Criterion.rs v0.7.0
- Samples: 100 per benchmark
- Warmup: 3 seconds per benchmark
- Optimization: Release mode with full optimizations
- Platform: Linux x86_64
- Compiler: rustc (specific version from build)
All benchmarks use std::hint::black_box() to prevent compiler optimizations from affecting timing.
Reproducing Results
cargo bench --bench bs3_vs_dp5
Detailed plots and statistics are available in target/criterion/.