mod estimator¶
- module estimator¶
Working-set size estimator and spill decision.
The estimator combines the observed marker count from the inbound Arrow IPC payload with a conservative overhead multiplier drawn from the RAD-seq literature (Beissinger 2013, TASSEL-GBS, ipyrad).
MarkerTableSource::from_arrow_ipc decodes the bytes once, asks the estimator whether the implied working set fits, and either keeps the batches in RAM (InMemory(ArrowMarkerSource)) or spills them to a Parquet temp file (Spilled(ParquetMarkerSource)).
Variables
- const BYTES_PER_CELL: u64¶
Bytes per depth cell. We always store u16 in the marker buffer regardless of the inbound Arrow type, so this is a fixed 2 bytes.
- const DEFAULT_OVERHEAD: f64¶
Default overhead multiplier covering Arrow validity buffers, group masks, per-marker accumulators, and intermediate Vecs. 6x is conservative for the largest commands (signif FDR, triage Bayesian, depth exact).
- const DEFAULT_SPILL_FRACTION: f64¶
Default fraction of available RAM we are willing to use before switching to the spill path.
Functions
- fn estimate_working_set_bytes(n_samples: usize, m_observed_or_predicted: usize, bytes_per_cell: u64, overhead_factor: f64, command_specific_multiplier: f64) -> SizeEstimate¶
Compute the predicted working-set size in bytes.
command_specific_multiplier lets the caller widen the prediction for the heavier commands (e.g. 2.0 for triage / signif with FDR, 1.3 for freq / depth, which mostly stream).
- fn spill_threshold_bytes() -> u64¶
Bytes above which we should spill rather than keep the source in RAM.
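The estimate and the spill decision described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the multiplicative formula, the `SizeEstimateSketch` struct and its fields, and the helpers `estimate_sketch`, `spill_threshold_sketch` and `should_spill` are all assumptions made for the example. The real `spill_threshold_bytes()` takes no arguments and presumably queries available RAM itself; here the available RAM and the spill fraction are passed in explicitly, and the value of `DEFAULT_SPILL_FRACTION` is deliberately not guessed.

```rust
/// Fixed cell width documented above: depths are stored as u16.
const BYTES_PER_CELL: u64 = 2;
/// Documented default overhead multiplier (6x).
const DEFAULT_OVERHEAD: f64 = 6.0;

/// Hypothetical stand-in for the crate's `SizeEstimate`;
/// the real type's fields are not documented on this page.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SizeEstimateSketch {
    /// samples x markers x bytes per cell, before any multiplier.
    raw_bytes: u64,
    /// raw_bytes scaled by the overhead and command multipliers.
    predicted_bytes: u64,
}

/// Assumed formula: raw matrix size scaled by both multipliers.
fn estimate_sketch(
    n_samples: usize,
    n_markers: usize,
    bytes_per_cell: u64,
    overhead_factor: f64,
    command_specific_multiplier: f64,
) -> SizeEstimateSketch {
    // Saturating arithmetic avoids overflow on absurdly large inputs.
    let raw_bytes = (n_samples as u64)
        .saturating_mul(n_markers as u64)
        .saturating_mul(bytes_per_cell);
    let predicted = raw_bytes as f64 * overhead_factor * command_specific_multiplier;
    SizeEstimateSketch {
        raw_bytes,
        // Round up so the prediction never under-reports.
        predicted_bytes: predicted.ceil() as u64,
    }
}

/// Hypothetical threshold: a fraction of the RAM the caller reports as available.
fn spill_threshold_sketch(available_ram_bytes: u64, spill_fraction: f64) -> u64 {
    let fraction = spill_fraction.clamp(0.0, 1.0);
    (available_ram_bytes as f64 * fraction) as u64
}

/// Spill only when the prediction is strictly above the threshold.
fn should_spill(predicted_bytes: u64, threshold_bytes: u64) -> bool {
    predicted_bytes > threshold_bytes
}
```

Under these assumptions, 96 samples by 1,000,000 markers gives a raw size of 192,000,000 bytes; with the 6x overhead and a 2.0 command multiplier the prediction is 2,304,000,000 bytes, which stays in memory against an 8 GiB threshold and spills against a 2 GiB one.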
Enums
- enum MarkerSourceError¶
Combined error for from_arrow_ipc.
- Arrow(ArrowSourceError)¶
- Parquet(ParquetSourceError)¶
Traits implemented
- impl std::fmt::Display for MarkerSourceError¶
- impl std::error::Error for MarkerSourceError¶
- enum MarkerTableSource¶
Resolved marker source: either in-memory Arrow or a Parquet spill.
Wraps the underlying source so the analysis commands can stay generic over MarkerStream without caring which physical backing they got.
- InMemory(ArrowMarkerSource)¶
- Spilled(ParquetMarkerSource)¶
Implementations
- impl MarkerTableSource¶
Functions
- fn from_arrow_ipc(bytes: &[u8], popmap: Option<&Popmap>, min_depth: u16, command_multiplier: f64) -> Result<Self, MarkerSourceError>¶
Decode the inbound IPC bytes, consult the estimator, and produce either an in-memory or spilled source. The popmap argument is optional in the signature, but the multi-group commands (distrib/signif/triage/depth) require it.
- fn is_spilled(&self) -> bool¶
Convenience: was this source materialised to disk?
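The two-variant wrapper and the `is_spilled` check can be illustrated with a small self-contained sketch. Everything here is a stand-in: `ArrowSourceSketch`, `ParquetSourceSketch`, `MarkerTableSourceSketch` and the `resolve` constructor are hypothetical names, no Arrow decoding or Parquet writing is performed, and the real `from_arrow_ipc` may order or structure its steps differently.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for `ArrowMarkerSource`: depths held in RAM.
#[allow(dead_code)]
struct ArrowSourceSketch {
    depths: Vec<u16>,
}

/// Hypothetical stand-in for `ParquetMarkerSource`: only the spill path is kept.
#[allow(dead_code)]
struct ParquetSourceSketch {
    path: PathBuf,
}

/// Mirrors the shape of `MarkerTableSource` with the stand-in types.
enum MarkerTableSourceSketch {
    InMemory(ArrowSourceSketch),
    Spilled(ParquetSourceSketch),
}

impl MarkerTableSourceSketch {
    /// Pick a variant from an already computed prediction and threshold.
    /// A real implementation would write the batches to `spill_path`
    /// before returning `Spilled`; this sketch skips the I/O.
    fn resolve(
        depths: Vec<u16>,
        predicted_bytes: u64,
        threshold_bytes: u64,
        spill_path: PathBuf,
    ) -> Self {
        if predicted_bytes > threshold_bytes {
            Self::Spilled(ParquetSourceSketch { path: spill_path })
        } else {
            Self::InMemory(ArrowSourceSketch { depths })
        }
    }

    /// True when the source is backed by the on-disk variant.
    fn is_spilled(&self) -> bool {
        matches!(self, Self::Spilled(_))
    }
}
```

Callers that only need to know which backing they received can branch on `is_spilled()`, while code that needs the data can match on the variants directly.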
Structs and Unions