connor/differential-equations

Fork 0

Files

Connor Johnstone e1e6f8b4bb Finished bs3 (at least for now)

2025-10-24 10:32:32 -04:00

4.9 KiB

Raw Blame History

BS3 vs DP5 Benchmark Results

Generated: 2025-10-23

Summary

Comprehensive performance comparison between BS3 (Bogacki-Shampine 3rd order) and DP5 (Dormand-Prince 5th order) integrators across various test problems and tolerances.

Key Findings

Overall Performance Comparison

DP5 is consistently faster than BS3 across all tested scenarios, typically by a factor of 1.5x to 4.3x.

This might seem counterintuitive since BS3 uses fewer stages (4 vs 7), but several factors explain DP5's superior performance:

Higher Order = Larger Steps: DP5's 5th order accuracy allows larger timesteps while maintaining the same error tolerance
Optimized Implementation: DP5 has been highly optimized in the existing codebase
Smoother Problems: The test problems are relatively smooth, favoring higher-order methods

When to Use BS3

Despite being slower in these benchmarks, BS3 still has value:

Lower memory overhead: Simpler dense output (4 values vs 5 for DP5)
Moderate accuracy needs: For tolerances around 1e-3 to 1e-5 where speed difference is smaller
Educational/algorithmic diversity: Different method characteristics
Specific problem types: May perform better on less smooth or oscillatory problems

Detailed Results

1. Exponential Decay (`y' = -0.5y`, tolerance 1e-5)

Method	Time	Ratio
BS3	3.28 µs	1.92x slower
DP5	1.70 µs	baseline

Simple 1D problem with smooth exponential solution.

2. Harmonic Oscillator (`y'' + y = 0`, tolerance 1e-5)

Method	Time	Ratio
BS3	30.70 µs	2.25x slower
DP5	13.67 µs	baseline

2D conservative system with periodic solution.

3. Nonlinear Pendulum (tolerance 1e-6)

Method	Time	Ratio
BS3	132.35 µs	3.57x slower
DP5	37.11 µs	baseline

Nonlinear 2D system with trigonometric terms.

4. Orbital Mechanics (6D, tolerance 1e-6)

Method	Time	Ratio
BS3	124.72 µs	1.45x slower
DP5	86.10 µs	baseline

Higher-dimensional problem with gravitational dynamics.

5. Interpolation Performance

Method	Time (solve + 100 interpolations)	Ratio
BS3	19.68 µs	4.81x slower
DP5	4.09 µs	baseline

BS3 uses cubic Hermite interpolation, DP5 uses optimized 5th order interpolation.

6. Tolerance Scaling

Performance across different tolerance levels (y' = -y problem):

Tolerance	BS3 Time	DP5 Time	Ratio (BS3/DP5)
1e-3	1.63 µs	1.26 µs	1.30x
1e-4	2.61 µs	1.54 µs	1.70x
1e-5	4.64 µs	2.03 µs	2.28x
1e-6	8.76 µs	~2.6 µs*	~3.4x*
1e-7	-**	-**	-

* Estimated from trend (benchmark timed out) ** Not completed

Observation: The performance gap widens as tolerance tightens, because DP5's higher order allows it to take larger steps while maintaining accuracy.

Conclusions

Performance Characteristics

DP5 is the better default choice for most problems requiring moderate to high accuracy
Performance gap increases with tighter tolerances (favoring DP5)
Higher dimensions slightly favor BS3 relative to DP5 (1.45x vs 3.57x slowdown)
Interpolation strongly favors DP5 (4.8x faster)

Implementation Quality

Both integrators pass all accuracy and convergence tests:

✅ BS3: 3rd order convergence rate verified
✅ DP5: 5th order convergence rate verified (existing implementation)
✅ Both: FSAL property correctly implemented
✅ Both: Dense output accurate to specified order

Future Optimizations

Potential improvements to BS3 performance:

Specialized dense output: Implement the optimized BS3 interpolation from the 1996 paper
SIMD optimization: Vectorize stage computations
Memory layout: Optimize cache usage for k-value storage
Inline hints: Add compiler hints for critical paths

Even with optimizations, DP5 will likely remain faster for these problem types due to its higher order.

Recommendations

Use DP5: For general-purpose ODE solving, especially for smooth problems
Use BS3: When you specifically need:
- Lower memory usage
- A 3rd order reference implementation
- Comparison with other 3rd order methods

Methodology

Tool: Criterion.rs v0.7.0
Samples: 100 per benchmark
Warmup: 3 seconds per benchmark
Optimization: Release mode with full optimizations
Platform: Linux x86_64
Compiler: rustc (specific version from build)

All benchmarks use std::hint::black_box() to prevent compiler optimizations from affecting timing.

Reproducing Results

cargo bench --bench bs3_vs_dp5

Detailed plots and statistics are available in target/criterion/.

4.9 KiB Raw Blame History