`mod markers_table`¶

module markers_table¶

Markers table parser: mmap + algorithmic optimizations.

Key optimizations: - mmap for zero-copy I/O - memchr-based line and field iteration - Specialized parser paths for presence-only, depth-only, and full marker rows - Bitset presence tracking via popcount - Optional parallel chunk processing behind the parallel feature

Structs and Unions

struct MarkersTableStream¶

Streaming / bounded-memory view over a marker depth table (mmap or Arrow).

This is the central abstraction: commands consume rows without ever materializing the full n_markers × n_ind matrix in RAM. See the paper and docs/orgmode/reference/architecture.org for the design.

header: crate::io::table_io::TableHeader¶

groups: Vec<String>¶

Implementations

impl MarkersTableStream¶

Functions

fn collect(&self) -> std::io::Result<Vec<Marker>>¶

fn count_markers(&self) -> std::io::Result<u64>¶: Count markers with n_individuals > 0 (for Bonferroni correction). Streaming: O(1) memory, just counts.

fn for_each<F>(&self, f: F) -> std::io::Result<()>¶ where F: FnMut(&Marker) ¶: Process all markers. Uses fast path when sequence isn’t needed.

fn for_each_parallel<F>(&self, f: F) -> std::io::Result<()>¶ where F: Fn(&Marker) + Send + Sync ¶

Process all markers with a callback that is compatible with parallel execution.

With the parallel feature enabled, callback order is not specified.

fn for_each_parallel<F>(&self, f: F) -> std::io::Result<()>¶ where F: Fn(&Marker) + Send + Sync ¶

fn iter(&self) -> impl Iterator<Item = Marker>¶: Convenience iterator over a successful collect. Panics on I/O error so failures are never turned into a silent empty success (the old unwrap_or_default behavior). Prefer Self::try_iter or Self::for_each in production code.

fn open(path: &Path, popmap: Option<&Popmap>, config: ParserConfig) -> std::io::Result<Self>¶

fn par_filter_map_collect<T, F>(&self, filter_map: F) -> std::io::Result<Vec<T>>¶ where T: Send, F: Fn(&Marker) -> Option<T> + Send + Sync ¶

Collect mapped marker values in table order, using parallel parsing for large inputs.

The filter/mapping closure runs independently per marker. Results are buffered per line-aligned chunk, then concatenated in chunk order before returning.

fn par_fold_reduce<Acc, Fold, Reduce>(&self, init: Acc, fold: Fold, reduce: Reduce) -> std::io::Result<Acc>¶ where Acc: Send + Sync + Clone, Fold: Fn(&mut Acc, &Marker) + Send + Sync + Clone, Reduce: Fn(Acc, Acc) -> Acc + Send + Sync ¶

Parallel fold + reduce for accumulation without per-marker locking.

This is the ergonomic high-level API for strong scaling.

Each rayon thread processes one or more chunks, maintaining a local Acc and calling fold(&mut local, &marker) for every marker. At the end the per-chunk accumulators are combined with reduce.

This enables lock-free parallel accumulation for: - distrib 2D tables - per-individual depth/freq stats - FDR p-value collection (fold into Vec of (p, metadata))

fn par_for_each<F>(&self, f: F) -> std::io::Result<()>¶ where F: Fn(&Marker) + Send + Sync ¶

Parallel for_each when the “parallel” feature is enabled. Splits the mmap into ~1 MiB line-aligned chunks and processes them concurrently with rayon. The closure must be Send + Sync.

Use this for strong scaling on large marker tables (100k+ rows) on multi-core machines for commands like distrib, signif (non-FDR), freq, depth.

For FDR in signif, the caller must use a thread-safe collector (e.g. DashMap or crossbeam channel + final sort by original order).

fn try_iter(&self) -> std::io::Result<std::vec::IntoIter<Marker>>¶: Materialize all markers for iterator use. Prefer Self::for_each for streaming; this path is fallible and does not swallow I/O errors.

struct ParserConfig¶

Configuration for the markers table parser (controls what is materialized from the on-disk / Arrow source during streaming).

Used by all commands to trade off speed vs. functionality (e.g. fasta output or per-marker sequences for mapping requires storing the sequence).

store_sequence: bool¶

store_depths: bool¶

compute_groups: bool¶

min_depth: u16¶

Traits implemented

impl Default for ParserConfig¶

mod markers_table¶

`mod markers_table`¶