Performance Notes¶
Precision Modes¶
Snapshot storage supports:
- storage_compression=False / "none": raw storage.
- storage_compression="bf16": bfloat16-compressed storage.
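To give a feel for what bf16 storage trades away, here is a minimal NumPy sketch. It is an illustration only, not this library's implementation: the helper names `to_bf16_bits` and `from_bf16_bits` are hypothetical, and plain truncation is assumed where a real implementation may round. bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of a float32 value, so storage halves and the relative error stays below 2^-7.

```python
import numpy as np

def to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Pack float32 values into 16-bit words by keeping the upper 16 bits.

    Hypothetical helper: truncation is assumed here for simplicity.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def from_bf16_bits(packed: np.ndarray) -> np.ndarray:
    """Expand 16-bit words back to float32 by zero-filling the low mantissa bits."""
    return (packed.astype(np.uint32) << 16).view(np.float32)

# Illustrative values, not real wavefield data.
snapshot = np.array([1.0, 3.14159, -2.5e-3, 1.0e5], dtype=np.float32)
packed = to_bf16_bits(snapshot)
restored = from_bf16_bits(packed)

# Relative error introduced by dropping 16 mantissa bits.
rel_err = np.abs(restored - snapshot) / np.abs(snapshot)
```

The packed array occupies half the bytes of the float32 original. Whether that precision loss is acceptable for stored snapshots depends on the workload, so it is worth checking on a small case first.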
CPU vs CUDA¶
- CPU path supports float32/float64 propagation and gradients.
- CUDA path is recommended for large ny x nx x nt workloads.
Batching¶
- Throughput scales with n_shots until memory bandwidth saturation.
- For gradient workloads, tune model_gradient_sampling_interval and storage_mode to control memory use.
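One way to find where throughput stops scaling is to time a few batch sizes and stop at the first one whose gain falls below a threshold. The sketch below assumes a user-supplied callable `run_batch(n_shots)` that runs one representative workload; that callable, the helper names, and the 5% threshold are all hypothetical and not part of this library.

```python
import time

def measure_throughput(run_batch, n_shots, repeats=3):
    """Return shots per second for one batch size, using the best of several runs.

    run_batch is a hypothetical user-supplied callable. On CUDA it should
    synchronize the device before returning so the timing is meaningful.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_batch(n_shots)
        best = min(best, time.perf_counter() - start)
    return n_shots / max(best, 1e-12)

def saturation_point(n_shots, throughput, min_gain=0.05):
    """Return the last batch size that still improved throughput by min_gain.

    n_shots and throughput are parallel sequences, ordered by increasing n_shots.
    """
    for i in range(1, len(n_shots)):
        gain = (throughput[i] - throughput[i - 1]) / throughput[i - 1]
        if gain < min_gain:
            return n_shots[i - 1]
    return n_shots[-1]

# Placeholder numbers for illustration only, not measurements.
candidates = [1, 2, 4, 8, 16]
example_throughput = [10.0, 19.0, 36.0, 40.0, 40.5]
chosen = saturation_point(candidates, example_throughput)
```

With these placeholder numbers the gain from 8 to 16 shots is about 1%, so the sketch settles on 8. In practice the throughput list would come from calling `measure_throughput` on a representative model rather than a tiny synthetic one.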
Storage Impact¶
- storage_mode="device" is fastest but VRAM-heavy.
- storage_mode="cpu" / "disk" reduce VRAM pressure at the cost of transfer overhead.
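A rough size estimate can help decide which storage mode is worth trying first. The sketch below assumes one ny x nx snapshot is kept per shot at every sampled time step, which may not match what the library actually stores. The function names and the selection rule are hypothetical and do not describe how any automatic mode is implemented.

```python
def snapshot_bytes(ny, nx, nt, n_shots, sampling_interval=1, bytes_per_value=4):
    """Estimate snapshot storage in bytes.

    Assumes one ny x nx snapshot per shot for every sampled time step.
    bytes_per_value is 4 for float32 and 2 for bf16-compressed storage.
    """
    n_snapshots = -(-nt // sampling_interval)  # ceiling division
    return ny * nx * n_snapshots * n_shots * bytes_per_value

def pick_storage_mode(required_bytes, device_limit_bytes, cpu_limit_bytes):
    """Pick the fastest mode whose byte limit fits the estimate (hypothetical rule)."""
    if required_bytes <= device_limit_bytes:
        return "device"
    if required_bytes <= cpu_limit_bytes:
        return "cpu"
    return "disk"

GIB = 1024 ** 3

# Example: 512 x 512 grid, 2000 steps sampled every 4th step, 8 shots.
raw = snapshot_bytes(512, 512, 2000, n_shots=8, sampling_interval=4)
bf16 = snapshot_bytes(512, 512, 2000, n_shots=8, sampling_interval=4, bytes_per_value=2)

mode_raw = pick_storage_mode(raw, device_limit_bytes=2 * GIB, cpu_limit_bytes=16 * GIB)
mode_bf16 = pick_storage_mode(bf16, device_limit_bytes=2 * GIB, cpu_limit_bytes=16 * GIB)
```

Under these assumptions the raw float32 snapshots need about 3.9 GiB and exceed a 2 GiB device budget, while the bf16 estimate of roughly 1.95 GiB fits. This shows how the sampling interval, compression, and storage mode interact when memory is tight.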
Practical Optimization Checklist¶
- Choose CUDA for medium/large models and batches.
- Increase n_shots until throughput saturates, then stop.
- Balance stencil order and grid spacing for target fidelity.
- Use storage_mode="auto" with byte limits on memory-constrained systems.
- Profile representative workloads rather than synthetic tiny benchmarks.
Read This Before Enabling Advanced Modes¶
Before enabling advanced runtime options broadly:
- Confirm correctness on a small case.
- Read guides/limitations.md.
- Run the checks in guides/verification.md.