Quick Start

Author: Rohit Goswami

1 Quick Start

1.1 Installation

1.1.2 From GitHub releases

Pre-built binaries are available for Linux (x86_64, aarch64) and macOS (x86_64, arm64) from the Releases page. On Windows, build from source without the map feature (minimap2 is not available there).

1.1.3 Via pixi

cd rsx-rs
pixi run build

1.2 Example workflow

Given demultiplexed RAD-seq reads in reads/ and a population map:

# popmap.tsv
ind1    M
ind2    M
ind3    F
ind4    F
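As a quick sanity check before running the pipeline, a population map like the one above can be parsed and its group sizes counted. This is a minimal sketch, not part of rsx: the helper name read_popmap is hypothetical, and it assumes whitespace-separated "individual group" lines with lines starting with # treated as comments.

```python
from collections import Counter


def read_popmap(lines):
    """Parse 'individual<whitespace>group' lines into a dict (hypothetical helper)."""
    popmap = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' lines carry no data
        individual, group = line.split()[:2]
        popmap[individual] = group
    return popmap


example = """\
# popmap.tsv
ind1\tM
ind2\tM
ind3\tF
ind4\tF
"""

popmap = read_popmap(example.splitlines())
group_sizes = Counter(popmap.values())
print(dict(group_sizes))
```

Unbalanced or misspelled group labels show up immediately in group_sizes, which is cheaper to catch here than after a full run.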

1.2.1 Step 1: Build markers table

rsx process -i reads/ -o markers.tsv -T 4 -d 5

1.2.2 Step 2: Check marker frequencies

rsx freq -t markers.tsv -o freq.tsv -d 5

1.2.3 Step 3: Compute sex-bias distribution

rsx distrib -t markers.tsv -p popmap.tsv -o distrib.tsv -d 5 -G M,F
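To make the idea of a sex-bias distribution concrete, the sketch below tallies markers by how many males and how many females carry them. It assumes a marker counts as present in an individual when its depth is at least the -d threshold; the marker names and depths are hypothetical, and this illustrates the concept rather than the rsx implementation.

```python
from collections import Counter

MIN_DEPTH = 5  # mirrors the -d 5 option used above

popmap = {"ind1": "M", "ind2": "M", "ind3": "F", "ind4": "F"}

# Hypothetical per-individual depths for three markers.
markers = {
    "marker_a": {"ind1": 12, "ind2": 9, "ind3": 0, "ind4": 1},
    "marker_b": {"ind1": 7, "ind2": 8, "ind3": 6, "ind4": 10},
    "marker_c": {"ind1": 0, "ind2": 3, "ind3": 15, "ind4": 11},
}


def presence_counts(depths, popmap, min_depth):
    """Return (males with marker, females with marker) for one marker."""
    counts = Counter()
    for individual, depth in depths.items():
        if depth >= min_depth:
            counts[popmap[individual]] += 1
    return counts["M"], counts["F"]


# Number of markers observed for each (n_males, n_females) combination.
distribution = Counter(
    presence_counts(depths, popmap, MIN_DEPTH) for depths in markers.values()
)
for (n_males, n_females), n_markers in sorted(distribution.items()):
    print(n_males, n_females, n_markers)
```

Only one tally per (males, females) combination is kept, which is why this kind of summary needs memory proportional to the number of individuals rather than the number of markers.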

1.2.4 Step 4: Extract significant markers

rsx signif -t markers.tsv -p popmap.tsv -o signif.tsv -d 5 -G M,F
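For intuition about what an association test on a single marker looks like, here is a small pure-Python two-sided Fisher exact test on presence/absence counts. Fisher's test is one of the options named in the Python bindings later in this guide; whether the CLI uses it by default is not stated here, so treat this as a sketch under that assumption. The group sizes (10 males, 10 females) are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (males, females); columns are marker present / absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x group-1 individuals carrying the marker.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Marker present in 9 of 10 males and 1 of 10 females.
p_biased = fisher_exact_two_sided(9, 1, 1, 9)
# Marker present in half of each group.
p_balanced = fisher_exact_two_sided(5, 5, 5, 5)
print(p_biased, p_balanced)
```

With many markers tested at once, raw p-values like these need a multiple-testing correction before any marker is called significant.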

1.2.5 Step 5: Map to reference genome

rsx map -t markers.tsv -p popmap.tsv -g genome.fa -o aligned.tsv -d 5 -G M,F

1.2.6 Step 6: Merge multiple tables

rsx merge -o combined.tsv pop1_markers.tsv pop2_markers.tsv pop3_markers.tsv

The merge uses a bounded-memory external sort (~500MB), so it can handle arbitrarily large datasets.
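The general technique behind a bounded-memory external sort can be sketched in a few lines: sort fixed-size chunks in memory, spill each chunk to a temporary run file, then stream a k-way merge over the runs. This is a generic illustration, not the rsx code; the function name external_sort is hypothetical, and the budget is counted in lines rather than bytes for simplicity.

```python
import heapq
import os
import tempfile


def external_sort(lines, max_lines_in_memory, out_path):
    """Sort newline-terminated strings while holding at most one chunk in memory."""
    run_paths = []
    buffer = []

    def flush():
        # Sort the current chunk and spill it to its own run file.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(buffer)
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) >= max_lines_in_memory:
            flush()
    if buffer:
        flush()

    run_files = [open(path) for path in run_paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge reads one line per run at a time.
            out.writelines(heapq.merge(*run_files))
    finally:
        for handle in run_files:
            handle.close()
        for path in run_paths:
            os.remove(path)


records = [f"seq{n:03d}\t{n}\n" for n in (5, 1, 4, 2, 3, 0)]
out_path = os.path.join(tempfile.mkdtemp(), "sorted.tsv")
external_sort(records, max_lines_in_memory=2, out_path=out_path)
with open(out_path) as handle:
    merged = handle.read().splitlines()
print(merged)
```

Peak memory depends on the chunk size and the number of runs, not on the total input size, which is the property the merge step relies on.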

1.2.7 Step 7: Streaming PCA

rsx pca -t combined.tsv -o pca_results/ -d 5 -r 10

Produces eigenvalues, loadings, and a summary in the output directory. PC1 typically separates males and females for sex-linked markers.
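One generic way to run PCA over individuals without holding the marker table in memory is to accumulate an individuals-by-individuals Gram matrix one marker at a time, then extract its leading eigenvector. The sketch below does this with plain power iteration on hypothetical depths for two males followed by two females. It is an assumption-laden illustration of why PC1 can separate the sexes, not a description of the algorithm rsx uses.

```python
import math


def streaming_gram(rows, n_ind):
    """Accumulate the n_ind x n_ind Gram matrix from one marker row at a time."""
    gram = [[0.0] * n_ind for _ in range(n_ind)]
    for depths in rows:
        mean = sum(depths) / n_ind
        centered = [d - mean for d in depths]  # remove the marker's mean depth
        for i in range(n_ind):
            for j in range(n_ind):
                gram[i][j] += centered[i] * centered[j]
    return gram


def leading_component(gram, iterations=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(gram)
    vec = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def marker_rows():
    """Hypothetical depths; column order is ind1 (M), ind2 (M), ind3 (F), ind4 (F)."""
    yield [10, 12, 0, 0]    # male-specific
    yield [8, 9, 0, 0]      # male-specific
    yield [11, 10, 0, 0]    # male-specific
    yield [10, 10, 10, 10]  # shared by everyone
    yield [7, 9, 8, 8]      # shared, with noise


gram = streaming_gram(marker_rows(), n_ind=4)
pc1 = leading_component(gram)
print([round(x, 3) for x in pc1])
```

In this toy data the two males share one sign on PC1 and the two females share the other; the only state kept between markers is the 4 x 4 matrix.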

1.3 Output format

All outputs are tab-separated with an optional #source: comment line. The format is identical to the original C++ RADSex tool, so existing R scripts work without modification.
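For readers working in Python rather than R, a table in this layout can be loaded with the standard library alone. The sketch below assumes the comment line, when present, starts with #source: and that the first remaining line is a header; the column names in the example are hypothetical and chosen only for illustration.

```python
import csv
import io


def read_rsx_table(handle):
    """Return (source, rows) from a tab-separated table with an optional '#source:' line."""
    source = None
    data_lines = []
    for line in handle:
        if line.startswith("#source:"):
            source = line[len("#source:"):].strip()
        elif line.strip():
            data_lines.append(line)
    # The first non-comment line is treated as the header row.
    rows = list(csv.DictReader(data_lines, delimiter="\t"))
    return source, rows


# Hypothetical table contents; real column names may differ.
example = io.StringIO(
    "#source:markers.tsv\n"
    "Males\tFemales\tMarkers\n"
    "2\t0\t1\n"
    "2\t2\t1\n"
)

source, rows = read_rsx_table(example)
print(source, rows)
```

All values come back as strings, so numeric columns need an explicit int() or float() conversion before use.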

1.4 Memory guarantees

All commands operate in bounded memory regardless of input size:

Command            Memory
distrib, freq      O(n_individuals)
signif, subset     O(n_individuals)
map                O(genome_index)
depth (small)      O(n_markers * n_ind)
depth (> 2GB)      O(buffer_size)
merge              O(buffer_size)
pca                O(n_individuals^2)

For 200 individuals and 75M markers, typical peak memory is < 500MB (except map which loads the minimap2 genome index).
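A back-of-the-envelope calculation shows why streaming matters at this scale. The element sizes below (2 bytes per depth value, 8 bytes per matrix entry) are assumptions made for the estimate, not figures taken from rsx.

```python
n_individuals = 200
n_markers = 75_000_000

BYTES_PER_DEPTH = 2  # assumption: 16-bit depth counts
BYTES_PER_FLOAT = 8  # assumption: 64-bit floats

# Holding the full marker-by-individual depth table at once.
dense_table_bytes = n_markers * n_individuals * BYTES_PER_DEPTH

# Holding only an individuals-by-individuals matrix, as in the pca row.
pca_matrix_bytes = n_individuals ** 2 * BYTES_PER_FLOAT

print(f"dense table: {dense_table_bytes / 1e9:.1f} GB")
print(f"pca matrix : {pca_matrix_bytes / 1e3:.0f} kB")
```

Under these assumptions the dense table would need about 30 GB, while the quadratic-in-individuals state is about 320 kB, far below the 500MB figure quoted above.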

1.5 Python bindings (high-level API)

Install with pip install pyrsx (or pixi run -e python build-python from source).

import pyrsx

# Process reads → marker depth table (Arrow-backed under the hood)
pyrsx.process("reads/", "markers.tsv", threads=8, min_depth=5)

# Distribution + significance with Bayesian evidence
pyrsx.distrib("markers.tsv", "popmap.tsv", "distrib.tsv", groups=["M", "F"])
pyrsx.signif("markers.tsv", "popmap.tsv", "signif.tsv",
             groups=["M", "F"], test="fisher", correction="fdr", bayes=True)

# High-level ergonomic objects + narwhals / plotting
tbl = pyrsx.MarkerTable.from_path("markers.tsv")
print("n_markers:", len(tbl))

# Streaming PCA (Tucker mode-2) for sex signal QC
pyrsx.pca("markers.tsv", "pca_out/", n_components=5)

# Merge multiple tables (bounded memory)
pyrsx.merge(["run1.tsv", "run2.tsv"], "merged.tsv")

See the Python README and the python-api-design reference for the full surface (including from_dataframe, to_arrow, custom triage, etc.).

1.6 Reproducing the paper

Every benchmark, figure, and biological result reported for rsx is reproducible from a single deposited archive. The Zenodo deposit doi:10.5281/zenodo.20531539 bundles the pinned workflow (v0.2.3), the downloaded literature inputs, the result tables, and a one-command pixi + Snakemake pipeline:

# from the extracted reproducibility archive
pixi install
pixi run bench        # regenerates results/ and results/figures/

The archive clones rsx at the pinned tag, builds it alongside the C++ RADSex v1.2.0 reference, and regenerates the four-panel literature benchmark (the 8.38x geometric-mean speedup across 56 paired timings), the Bayesian evidence grades, and the sex-linked marker calls. Timings scale with the host hardware; the biological results (marker counts and evidence grades) do not.
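For reference, a geometric-mean speedup over paired timings is computed by averaging the logarithms of the per-pair ratios. The timings below are hypothetical placeholders, not values from the benchmark, and the function name is an illustration only.

```python
import math


def geometric_mean_speedup(pairs):
    """Geometric mean of baseline/candidate ratios over paired timings."""
    ratios = [baseline / candidate for baseline, candidate in pairs]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (baseline seconds, candidate seconds) pairs.
timings = [(80.0, 10.0), (40.0, 10.0), (20.0, 10.0)]
speedup = geometric_mean_speedup(timings)
print(f"{speedup:.2f}x")
```

The geometric mean is the usual choice for ratios because a 2x slowdown and a 2x speedup cancel out, which an arithmetic mean would not reflect.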