Files
differential-equations/benches/BS3_VS_DP5_RESULTS.md
2025-10-24 10:32:32 -04:00

4.9 KiB

BS3 vs DP5 Benchmark Results

Generated: 2025-10-23

Summary

Comprehensive performance comparison between BS3 (Bogacki-Shampine 3rd order) and DP5 (Dormand-Prince 5th order) integrators across various test problems and tolerances.

Key Findings

Overall Performance Comparison

DP5 is consistently faster than BS3 across all tested scenarios, typically by a factor of 1.5x to 4.3x.

This might seem counterintuitive since BS3 uses fewer stages (4 vs 7), but several factors explain DP5's superior performance:

  1. Higher Order = Larger Steps: DP5's 5th order accuracy allows larger timesteps while maintaining the same error tolerance
  2. Optimized Implementation: DP5 has been highly optimized in the existing codebase
  3. Smoother Problems: The test problems are relatively smooth, favoring higher-order methods

When to Use BS3

Despite being slower in these benchmarks, BS3 still has value:

  • Lower memory overhead: Simpler dense output (4 values vs 5 for DP5)
  • Moderate accuracy needs: For tolerances around 1e-3 to 1e-5 where speed difference is smaller
  • Educational/algorithmic diversity: Different method characteristics
  • Specific problem types: May perform better on less smooth or oscillatory problems

Detailed Results

1. Exponential Decay (y' = -0.5y, tolerance 1e-5)

Method Time Ratio
BS3 3.28 µs 1.92x slower
DP5 1.70 µs baseline

Simple 1D problem with smooth exponential solution.

2. Harmonic Oscillator (y'' + y = 0, tolerance 1e-5)

Method Time Ratio
BS3 30.70 µs 2.25x slower
DP5 13.67 µs baseline

2D conservative system with periodic solution.

3. Nonlinear Pendulum (tolerance 1e-6)

Method Time Ratio
BS3 132.35 µs 3.57x slower
DP5 37.11 µs baseline

Nonlinear 2D system with trigonometric terms.

4. Orbital Mechanics (6D, tolerance 1e-6)

Method Time Ratio
BS3 124.72 µs 1.45x slower
DP5 86.10 µs baseline

Higher-dimensional problem with gravitational dynamics.

5. Interpolation Performance

Method Time (solve + 100 interpolations) Ratio
BS3 19.68 µs 4.81x slower
DP5 4.09 µs baseline

BS3 uses cubic Hermite interpolation, DP5 uses optimized 5th order interpolation.

6. Tolerance Scaling

Performance across different tolerance levels (y' = -y problem):

Tolerance BS3 Time DP5 Time Ratio (BS3/DP5)
1e-3 1.63 µs 1.26 µs 1.30x
1e-4 2.61 µs 1.54 µs 1.70x
1e-5 4.64 µs 2.03 µs 2.28x
1e-6 8.76 µs ~2.6 µs* ~3.4x*
1e-7 -** -** -

* Estimated from trend (benchmark timed out) ** Not completed

Observation: The performance gap widens as tolerance tightens, because DP5's higher order allows it to take larger steps while maintaining accuracy.

Conclusions

Performance Characteristics

  1. DP5 is the better default choice for most problems requiring moderate to high accuracy
  2. Performance gap increases with tighter tolerances (favoring DP5)
  3. Higher dimensions slightly favor BS3 relative to DP5 (1.45x vs 3.57x slowdown)
  4. Interpolation strongly favors DP5 (4.8x faster)

Implementation Quality

Both integrators pass all accuracy and convergence tests:

  • BS3: 3rd order convergence rate verified
  • DP5: 5th order convergence rate verified (existing implementation)
  • Both: FSAL property correctly implemented
  • Both: Dense output accurate to specified order

Future Optimizations

Potential improvements to BS3 performance:

  1. Specialized dense output: Implement the optimized BS3 interpolation from the 1996 paper
  2. SIMD optimization: Vectorize stage computations
  3. Memory layout: Optimize cache usage for k-value storage
  4. Inline hints: Add compiler hints for critical paths

Even with optimizations, DP5 will likely remain faster for these problem types due to its higher order.

Recommendations

  • Use DP5: For general-purpose ODE solving, especially for smooth problems
  • Use BS3: When you specifically need:
    • Lower memory usage
    • A 3rd order reference implementation
    • Comparison with other 3rd order methods

Methodology

  • Tool: Criterion.rs v0.7.0
  • Samples: 100 per benchmark
  • Warmup: 3 seconds per benchmark
  • Optimization: Release mode with full optimizations
  • Platform: Linux x86_64
  • Compiler: rustc (specific version from build)

All benchmarks use std::hint::black_box() to prevent compiler optimizations from affecting timing.

Reproducing Results

cargo bench --bench bs3_vs_dp5

Detailed plots and statistics are available in target/criterion/.