mod stats

module stats

Statistical functions: chi-squared test with Yates correction, Bonferroni multiple testing correction, and group bias.

Functions

fn bayes_factor_2x2(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64

Bayes Factor for association in a 2x2 contingency table. BF > 1: evidence for association. BF > 10: strong evidence. Uses Beta-Binomial marginal likelihoods with uniform Beta(1,1) priors. H1: separate proportions per group. H0: shared proportion.

fn benjamini_hochberg(p_values: &[f64]) -> Vec<f64>

Apply Benjamini-Hochberg FDR correction to a vector of p-values. Returns adjusted p-values (q-values). Controls FDR at the given level.

fn bonferroni_correct(p: f64, n_markers: u64) -> f64

Apply Bonferroni correction to a p-value.

fn chi_squared_p(chi_sq: f64) -> f64

P-value for a chi-squared statistic with df=1.

Uses the exact identity: for df=1, the chi-squared CDF is

P(chi2) = erf(sqrt(chi2/2))

so the p-value is:

p = 1 - P(chi2) = erfc(sqrt(chi2/2))

This replaces the full regularized gamma function with a single libm erfc call. Derived via SymPy:

gamma(1/2, x) = sqrt(pi) * erf(sqrt(x))
Gamma(1/2) = sqrt(pi)
P(1/2, x) = erf(sqrt(x))
p = erfc(sqrt(chi2/2))
fn chi_squared_yates(n_group1: u32, n_group2: u32, total_group1: u32, total_group2: u32) -> f64

Chi-squared statistic with Yates continuity correction for a 2x2 table.

Implements the shortcut formula:

chi2 = N * (|ad - bc| - N/2)^2 / (a+b)(c+d)(a+c)(b+d)

where the contingency table is:

====== ============== =============
\      marker present marker absent
====== ============== =============
group1 n_group1       total1 - n1
group2 n_group2       total2 - n2
====== ============== =============
fn empirical_bayes_em(group_counts: &[(u32, u32)], total_g1: u32, total_g2: u32, p_sex: f64, max_iter: usize) -> (f64, Vec<f64>)

Empirical Bayes EM: estimate pi (fraction of sex-linked markers) from data. Returns (pi, posteriors) after convergence. group_counts: Vec of (n_g1, n_g2) for each marker.

fn fast_erfc_poly(t: f64) -> f64

Sollya-generated minimax polynomial for erfc(t) on [0, 6]. Single degree-40 polynomial – branchless, no exp(), GPU/SIMD ready. Max absolute error: 8.2e-17 (below f64 epsilon). See scripts/sollya/erfc_direct.sollya.

fn find_median(data: &mut [u16]) -> u16

Find median of a mutable slice (partially sorts in-place).

fn fisher_exact(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64

Fisher’s exact test for 2x2 table (one-sided, more extreme). Uses hypergeometric probability. Better than chi-squared for small n.

fn g_test(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64

G-test (log-likelihood ratio) for 2x2 table. Better asymptotic properties than chi-squared.

fn group_bias(n_group1: u32, total_group1: u32, n_group2: u32, total_group2: u32) -> f64

Group bias: difference in marker frequency between two groups. Ranges from -1.0 (only in group2) to +1.0 (only in group1).

fn logistic_regression(x: &[f64], y: &[f64], n: usize, p: usize, max_iter: usize) -> Vec<f64>

Logistic regression: fit y ~ X using IRLS (Newton-Raphson). X: n x p design matrix (row-major), y: n binary outcomes (0/1). Returns coefficient vector beta (length p).

fn p_association(n_group1: u32, n_group2: u32, total_group1: u32, total_group2: u32) -> f64

Compute p-value of association with group using chi-squared test with Yates correction. Matches C++ get_p_association exactly.

fn posterior_sex_linked(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32, pi: f64, p_sex: f64) -> f64

Posterior probability that a marker is sex-linked (empirical Bayes). Uses a Beta-Binomial conjugate model. pi: prior probability of sex-linkage (estimated from data or set to 0.01). p_sex: assumed frequency in the linked sex (e.g., 0.9).

Structs and Unions

struct Cg(f64)

Format a float like C++ operator<< default: %g with 6 significant digits. This matches the C++ radsex output format exactly.

Traits implemented

impl fmt::Display for Cg