module seq_reader

Sequence file reader wrapping needletail for FASTQ/FASTA (optionally gzipped).

Functions

fn count_sequences(path: &Path) -> Result<ahash::AHashMap<Vec<u8>, u16>, Box<dyn std::error::Error + Send + Sync>>

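Count occurrences of each unique sequence in a single file. Keys are 2-bit packed DNA sequences, which need about a quarter of the memory of ASCII keys. Returns a map from packed sequence to count; counts are stored as u16, so a single count can represent at most 65,535.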
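The counting step can be illustrated with a minimal sketch. It is not the crate's implementation: it uses std's HashMap instead of ahash, takes already-parsed sequences instead of reading a file through needletail, and keys on the raw bytes rather than packed ones. The name count_records_sketch is hypothetical, and saturating at u16::MAX is an assumption, since the overflow behavior of the real function is not documented here.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: count how often each sequence occurs.
/// Assumption: counts saturate at u16::MAX instead of wrapping or failing.
fn count_records_sketch<'a, I>(records: I) -> HashMap<Vec<u8>, u16>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut counts: HashMap<Vec<u8>, u16> = HashMap::new();
    for seq in records {
        // Insert a zero count on first sight, then increment.
        let entry = counts.entry(seq.to_vec()).or_insert(0);
        *entry = entry.saturating_add(1);
    }
    counts
}
```

In the documented function the key would be the packed form of each sequence, so identical reads still collapse onto one entry while each key occupies fewer bytes.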

fn get_input_files(dir: &Path) -> std::io::Result<Vec<InputFile>>

Scan a directory for supported sequence files and derive an individual name from each filename.
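A directory scan of this kind might look like the sketch below. The extension list (.fastq, .fq, .fasta, .fa, each optionally followed by .gz) and the naming rule (filename with those extensions stripped) are assumptions made for illustration; the page does not say which extensions count as supported or how the name is derived. All identifiers ending in _sketch or Sketch are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the documented InputFile struct.
#[derive(Debug)]
struct InputFileSketch {
    path: PathBuf,
    individual_name: String,
}

/// Assumed set of supported extensions; the real list is not documented here.
const ASSUMED_EXTENSIONS: [&str; 4] = [".fastq", ".fq", ".fasta", ".fa"];

/// Derive a name by stripping an optional ".gz" and then one assumed extension.
/// Returns None for unsupported or empty names.
fn individual_name_sketch(path: &Path) -> Option<String> {
    let file_name = path.file_name()?.to_str()?;
    let stem = file_name.strip_suffix(".gz").unwrap_or(file_name);
    ASSUMED_EXTENSIONS
        .iter()
        .find_map(|ext| stem.strip_suffix(*ext))
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
}

/// List regular files in `dir` whose names match the assumed extensions.
fn get_input_files_sketch(dir: &Path) -> std::io::Result<Vec<InputFileSketch>> {
    let mut files = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        if let Some(individual_name) = individual_name_sketch(&path) {
            files.push(InputFileSketch { path, individual_name });
        }
    }
    // read_dir order is platform-dependent, so sort for stable output.
    files.sort_by(|a, b| a.path.cmp(&b.path));
    Ok(files)
}
```

Keeping the name derivation in its own function makes the rule testable without touching the filesystem.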

fn pack_2bit(seq: &[u8]) -> Vec<u8>

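Pack a DNA sequence into a 2-bit encoding (A=00, C=01, G=10, T=11), four bases per byte. Bases are stored big-endian within each byte, so the first base of each group of four occupies the two most significant bits. Returns the packed bytes.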
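The encoding can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the name pack_2bit_sketch is hypothetical, "big-endian within each byte" is read as "first base in the highest two bits", unused trailing bits are left as zero, and bytes other than A, C, G, T are rejected with None because the page does not say how the real function treats them.

```rust
/// Hypothetical sketch of 2-bit packing: A=00, C=01, G=10, T=11.
/// Assumption: any byte other than uppercase A/C/G/T makes the result None.
fn pack_2bit_sketch(seq: &[u8]) -> Option<Vec<u8>> {
    // Four bases fit in one byte; round the byte count up.
    let mut packed = vec![0u8; (seq.len() + 3) / 4];
    for (i, &base) in seq.iter().enumerate() {
        let code: u8 = match base {
            b'A' => 0b00,
            b'C' => 0b01,
            b'G' => 0b10,
            b'T' => 0b11,
            _ => return None,
        };
        // The first base of each group of four lands in bits 7..6,
        // the second in bits 5..4, and so on.
        let shift = 6 - 2 * (i % 4);
        packed[i / 4] |= code << shift;
    }
    Some(packed)
}
```

Under these assumptions "ACGT" packs to the single byte 0b00011011, and a 7-base read such as "GATTACA" takes two bytes instead of seven.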

fn unpack_2bit(packed: &[u8]) -> Vec<u8>

Unpack a 2-bit encoded DNA sequence back to ASCII.
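Decoding reverses the bit layout. The sketch below takes the base count as an explicit len parameter, which differs from the documented signature: with four bases per byte the packed bytes alone do not reveal how many bases the last byte holds, and this page does not say how the real function recovers that. The name unpack_2bit_sketch and the len parameter are hypothetical, and the first base is assumed to sit in the two highest bits of each byte.

```rust
/// Hypothetical sketch of 2-bit unpacking back to ASCII.
/// Assumption: the caller supplies the number of bases in `len`.
/// Panics if `len` exceeds the capacity of `packed` (4 bases per byte).
fn unpack_2bit_sketch(packed: &[u8], len: usize) -> Vec<u8> {
    const BASES: [u8; 4] = *b"ACGT";
    (0..len)
        .map(|i| {
            // Mirror of the packing layout: base 0 in bits 7..6, base 1 in 5..4, ...
            let shift = 6 - 2 * (i % 4);
            let code = (packed[i / 4] >> shift) & 0b11;
            BASES[code as usize]
        })
        .collect()
}
```

For example, unpacking the byte 0b00011011 with len 4 gives back "ACGT" under this layout.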

Structs and Unions

struct InputFile

An input file with its individual name derived from the filename.

path: PathBuf
individual_name: String