mod stats¶
- module stats¶
Statistical functions: chi-squared test with Yates correction, Bonferroni multiple testing correction, and group bias.
Functions
- fn bayes_factor_2x2(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64¶
Bayes Factor for association in a 2x2 contingency table. BF > 1: evidence for association. BF > 10: strong evidence. Uses Beta-Binomial marginal likelihoods with uniform Beta(1,1) priors. H1: separate proportions per group. H0: shared proportion.
- fn benjamini_hochberg(p_values: &[f64]) -> Vec<f64>¶
Apply Benjamini-Hochberg FDR correction to a vector of p-values. Returns adjusted p-values (q-values). Controls FDR at the given level.
- fn bonferroni_correct(p: f64, n_markers: u64) -> f64¶
Apply Bonferroni correction to a p-value.
- fn chi_squared_p(chi_sq: f64) -> f64¶
P-value for a chi-squared statistic with df=1.
- Uses the exact identity: for df=1, the chi-squared CDF is
P(chi2) = erf(sqrt(chi2/2))
- so the p-value is:
p = 1 - P(chi2) = erfc(sqrt(chi2/2))
This replaces the full regularized gamma function with a single libm erfc call. Derived via SymPy:
gamma(1/2, x) = sqrt(pi) * erf(sqrt(x)) Gamma(1/2) = sqrt(pi) P(1/2, x) = erf(sqrt(x)) p = erfc(sqrt(chi2/2))
- fn chi_squared_yates(n_group1: u32, n_group2: u32, total_group1: u32, total_group2: u32) -> f64¶
Chi-squared statistic with Yates continuity correction for a 2x2 table.
Implements the shortcut formula:
chi2 = N * (|ad - bc| - N/2)^2 / (a+b)(c+d)(a+c)(b+d)where the contingency table is:
====== ============== ============= \ marker present marker absent ====== ============== ============= group1 n_group1 total1 - n1 group2 n_group2 total2 - n2 ====== ============== =============
- fn empirical_bayes_em(group_counts: &[(u32, u32)], total_g1: u32, total_g2: u32, p_sex: f64, max_iter: usize) -> (f64, Vec<f64>)¶
Empirical Bayes EM: estimate pi (fraction of sex-linked markers) from data. Returns (pi, posteriors) after convergence. group_counts: Vec of (n_g1, n_g2) for each marker.
- fn fast_erfc_poly(t: f64) -> f64¶
Sollya-generated minimax polynomial for erfc(t) on [0, 6]. Single degree-40 polynomial – branchless, no exp(), GPU/SIMD ready. Max absolute error: 8.2e-17 (below f64 epsilon). See
scripts/sollya/erfc_direct.sollya.
- fn find_median(data: &mut [u16]) -> u16¶
Find median of a mutable slice (partially sorts in-place).
- fn fisher_exact(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64¶
Fisher’s exact test for 2x2 table (one-sided, more extreme). Uses hypergeometric probability. Better than chi-squared for small n.
- fn g_test(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32) -> f64¶
G-test (log-likelihood ratio) for 2x2 table. Better asymptotic properties than chi-squared.
- fn group_bias(n_group1: u32, total_group1: u32, n_group2: u32, total_group2: u32) -> f64¶
Group bias: difference in marker frequency between two groups. Ranges from -1.0 (only in group2) to +1.0 (only in group1).
- fn logistic_regression(x: &[f64], y: &[f64], n: usize, p: usize, max_iter: usize) -> Vec<f64>¶
Logistic regression: fit y ~ X using IRLS (Newton-Raphson). X: n x p design matrix (row-major), y: n binary outcomes (0/1). Returns coefficient vector beta (length p).
- fn p_association(n_group1: u32, n_group2: u32, total_group1: u32, total_group2: u32) -> f64¶
Compute p-value of association with group using chi-squared test with Yates correction. Matches C++
get_p_associationexactly.
- fn posterior_sex_linked(n_g1: u32, n_g2: u32, total_g1: u32, total_g2: u32, pi: f64, p_sex: f64) -> f64¶
Posterior probability that a marker is sex-linked (empirical Bayes). Uses a Beta-Binomial conjugate model. pi: prior probability of sex-linkage (estimated from data or set to 0.01). p_sex: assumed frequency in the linked sex (e.g., 0.9).
Structs and Unions