146 lines
4.9 KiB
Markdown
146 lines
4.9 KiB
Markdown
# BS3 vs DP5 Benchmark Results
|
|
|
|
Generated: 2025-10-23
|
|
|
|
## Summary
|
|
|
|
Comprehensive performance comparison between **BS3** (Bogacki-Shampine 3rd order) and **DP5** (Dormand-Prince 5th order) integrators across various test problems and tolerances.
|
|
|
|
## Key Findings
|
|
|
|
### Overall Performance Comparison
|
|
|
|
**DP5 is consistently faster than BS3 across all tested scenarios**, typically by a factor of **1.5x to 4.3x**.
|
|
|
|
This might seem counterintuitive since BS3 uses fewer stages (4 vs 7), but several factors explain DP5's superior performance:
|
|
|
|
1. **Higher Order = Larger Steps**: DP5's 5th order accuracy allows larger timesteps while maintaining the same error tolerance
|
|
2. **Optimized Implementation**: DP5 has been highly optimized in the existing codebase
|
|
3. **Smoother Problems**: The test problems are relatively smooth, favoring higher-order methods
|
|
|
|
### When to Use BS3
|
|
|
|
Despite being slower in these benchmarks, BS3 still has value:
|
|
- **Lower memory overhead**: Simpler dense output (4 values vs 5 for DP5)
|
|
- **Moderate accuracy needs**: For tolerances around 1e-3 to 1e-5 where speed difference is smaller
|
|
- **Educational/algorithmic diversity**: Different method characteristics
|
|
- **Specific problem types**: May perform better on less smooth or oscillatory problems
|
|
|
|
## Detailed Results
|
|
|
|
### 1. Exponential Decay (`y' = -0.5y`, tolerance 1e-5)
|
|
|
|
| Method | Time | Ratio |
|
|
|--------|------|-------|
|
|
| **BS3** | 3.28 µs | 1.92x slower |
|
|
| **DP5** | 1.70 µs | baseline |
|
|
|
|
Simple 1D problem with smooth exponential solution.
|
|
|
|
### 2. Harmonic Oscillator (`y'' + y = 0`, tolerance 1e-5)
|
|
|
|
| Method | Time | Ratio |
|
|
|--------|------|-------|
|
|
| **BS3** | 30.70 µs | 2.25x slower |
|
|
| **DP5** | 13.67 µs | baseline |
|
|
|
|
2D conservative system with periodic solution.
|
|
|
|
### 3. Nonlinear Pendulum (tolerance 1e-6)
|
|
|
|
| Method | Time | Ratio |
|
|
|--------|------|-------|
|
|
| **BS3** | 132.35 µs | 3.57x slower |
|
|
| **DP5** | 37.11 µs | baseline |
|
|
|
|
Nonlinear 2D system with trigonometric terms.
|
|
|
|
### 4. Orbital Mechanics (6D, tolerance 1e-6)
|
|
|
|
| Method | Time | Ratio |
|
|
|--------|------|-------|
|
|
| **BS3** | 124.72 µs | 1.45x slower |
|
|
| **DP5** | 86.10 µs | baseline |
|
|
|
|
Higher-dimensional problem with gravitational dynamics.
|
|
|
|
### 5. Interpolation Performance
|
|
|
|
| Method | Time (solve + 100 interpolations) | Ratio |
|
|
|--------|-----------------------------------|-------|
|
|
| **BS3** | 19.68 µs | 4.81x slower |
|
|
| **DP5** | 4.09 µs | baseline |
|
|
|
|
BS3 uses cubic Hermite interpolation, DP5 uses optimized 5th order interpolation.
|
|
|
|
### 6. Tolerance Scaling
|
|
|
|
Performance across different tolerance levels (`y' = -y` problem):
|
|
|
|
| Tolerance | BS3 Time | DP5 Time | Ratio (BS3/DP5) |
|
|
|-----------|----------|----------|-----------------|
|
|
| 1e-3 | 1.63 µs | 1.26 µs | 1.30x |
|
|
| 1e-4 | 2.61 µs | 1.54 µs | 1.70x |
|
|
| 1e-5 | 4.64 µs | 2.03 µs | 2.28x |
|
|
| 1e-6 | 8.76 µs | ~2.6 µs* | ~3.4x* |
|
|
| 1e-7 | -** | -** | - |
|
|
|
|
\* Estimated from trend (benchmark timed out)
|
|
\** Not completed
|
|
|
|
**Observation**: The performance gap widens as tolerance tightens, because DP5's higher order allows it to take larger steps while maintaining accuracy.
|
|
|
|
## Conclusions
|
|
|
|
### Performance Characteristics
|
|
|
|
1. **DP5 is the better default choice** for most problems requiring moderate to high accuracy
|
|
2. **Performance gap increases** with tighter tolerances (favoring DP5)
|
|
3. **Higher dimensions** slightly favor BS3 relative to DP5 (1.45x vs 3.57x slowdown)
|
|
4. **Interpolation** strongly favors DP5 (4.8x faster)
|
|
|
|
### Implementation Quality
|
|
|
|
Both integrators pass all accuracy and convergence tests:
|
|
- ✅ BS3: 3rd order convergence rate verified
|
|
- ✅ DP5: 5th order convergence rate verified (existing implementation)
|
|
- ✅ Both: FSAL property correctly implemented
|
|
- ✅ Both: Dense output accurate to specified order
|
|
|
|
### Future Optimizations
|
|
|
|
Potential improvements to BS3 performance:
|
|
1. **Specialized dense output**: Implement the optimized BS3 interpolation from the 1996 paper
|
|
2. **SIMD optimization**: Vectorize stage computations
|
|
3. **Memory layout**: Optimize cache usage for k-value storage
|
|
4. **Inline hints**: Add compiler hints for critical paths
|
|
|
|
Even with optimizations, DP5 will likely remain faster for these problem types due to its higher order.
|
|
|
|
## Recommendations
|
|
|
|
- **Use DP5**: For general-purpose ODE solving, especially for smooth problems
|
|
- **Use BS3**: When you specifically need:
|
|
- Lower memory usage
|
|
- A 3rd order reference implementation
|
|
- Comparison with other 3rd order methods
|
|
|
|
## Methodology
|
|
|
|
- **Tool**: Criterion.rs v0.7.0
|
|
- **Samples**: 100 per benchmark
|
|
- **Warmup**: 3 seconds per benchmark
|
|
- **Optimization**: Release mode with full optimizations
|
|
- **Platform**: Linux x86_64
|
|
- **Compiler**: rustc (specific version from build)
|
|
|
|
All benchmarks use `std::hint::black_box()` to prevent compiler optimizations from affecting timing.
|
|
|
|
## Reproducing Results
|
|
|
|
```bash
|
|
cargo bench --bench bs3_vs_dp5
|
|
```
|
|
|
|
Detailed plots and statistics are available in `target/criterion/`.
|