mod markers_table¶
- module markers_table¶
Markers table parser: mmap + algorithmic optimizations.
Key optimizations: - mmap for zero-copy I/O - memchr-based line and field iteration - Specialized parser paths for presence-only, depth-only, and full marker rows - Bitset presence tracking via popcount - Optional parallel chunk processing behind the
parallelfeatureStructs and Unions
- struct MarkersTableStream¶
Streaming / bounded-memory view over a marker depth table (mmap or Arrow).
This is the central abstraction: commands consume rows without ever materializing the full n_markers × n_ind matrix in RAM. See the paper and
docs/orgmode/reference/architecture.orgfor the design.- groups: Vec<String>¶
Implementations
- impl MarkersTableStream¶
Functions
- fn count_markers(&self) -> std::io::Result<u64>¶
Count markers with n_individuals > 0 (for Bonferroni correction). Streaming: O(1) memory, just counts.
-
fn for_each<F>(&self, f: F) -> std::io::Result<()>¶
where
F: FnMut(&Marker)
¶ Process all markers. Uses fast path when sequence isn’t needed.
-
fn for_each_parallel<F>(&self, f: F) -> std::io::Result<()>¶
where
F: Fn(&Marker) + Send + Sync
¶ Process all markers with a callback that is compatible with parallel execution.
With the
parallelfeature enabled, callback order is not specified.
- fn open(path: &Path, popmap: Option<&Popmap>, config: ParserConfig) -> std::io::Result<Self>¶
-
fn par_filter_map_collect<T, F>(&self, filter_map: F) -> std::io::Result<Vec<T>>¶
where
T: Send,
F: Fn(&Marker) -> Option<T> + Send + Sync
¶ Collect mapped marker values in table order, using parallel parsing for large inputs.
The filter/mapping closure runs independently per marker. Results are buffered per line-aligned chunk, then concatenated in chunk order before returning.
-
fn par_fold_reduce<Acc, Fold, Reduce>(&self, init: Acc, fold: Fold, reduce: Reduce) -> std::io::Result<Acc>¶
where
Acc: Send + Sync + Clone,
Fold: Fn(&mut Acc, &Marker) + Send + Sync + Clone,
Reduce: Fn(Acc, Acc) -> Acc + Send + Sync
¶ Parallel fold + reduce for accumulation without per-marker locking.
This is the ergonomic high-level API for strong scaling.
Each rayon thread processes one or more chunks, maintaining a local
Accand callingfold(&mut local, &marker)for every marker. At the end the per-chunk accumulators are combined withreduce.This enables lock-free parallel accumulation for: - distrib 2D tables - per-individual depth/freq stats - FDR p-value collection (fold into Vec of (p, metadata))
-
fn par_for_each<F>(&self, f: F) -> std::io::Result<()>¶
where
F: Fn(&Marker) + Send + Sync
¶ Parallel for_each when the “parallel” feature is enabled. Splits the mmap into ~1 MiB line-aligned chunks and processes them concurrently with rayon. The closure must be
Send + Sync.Use this for strong scaling on large marker tables (100k+ rows) on multi-core machines for commands like distrib, signif (non-FDR), freq, depth.
For FDR in signif, the caller must use a thread-safe collector (e.g. DashMap or crossbeam channel + final sort by original order).
- struct ParserConfig¶
Configuration for the markers table parser (controls what is materialized from the on-disk / Arrow source during streaming).
Used by all commands to trade off speed vs. functionality (e.g. fasta output or per-marker sequences for mapping requires storing the sequence).
- store_sequence: bool¶
- store_depths: bool¶
- compute_groups: bool¶
- min_depth: u16¶
Traits implemented
- impl Default for ParserConfig¶