Statistical Tests Guide
=======================

MARVELous supports various statistical tests for associating genetic exposures with phenotypic outcomes. This guide covers the available tests, when to use them, and how to define custom tests.

Available Tests
---------------

Tests for Continuous Outcomes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Name
     - Description
   * - OLS
     - Ordinary Least Squares
     - Linear regression with optional covariates.
   * - KW
     - Kruskal-Wallis
     - Non-parametric test for differences in distributions across groups.
   * - AOV
     - One-way ANOVA
     - Parametric test for differences in group means.
   * - MWU
     - Mann-Whitney U
     - Non-parametric rank-sum test comparing two groups.
   * - T
     - Independent-samples T-test
     - Parametric test comparing means of two groups.

Tests for Binary Outcomes
^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Name
     - Description
   * - GLM-Binom
     - Logistic Regression
     - Generalized linear model with binomial family and logit link. Supports optional covariates.
   * - CHISQ
     - Chi-square
     - Tests independence between a binary outcome and an exposure in a contingency table.
   * - FISHER
     - Fisher's Exact
     - Exact test for 2×2 contingency tables. Preferred over chi-square when expected cell counts are small.

Tests for Categorical Outcomes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Name
     - Description
   * - CHISQ
     - Chi-square
     - Tests independence between a multi-category outcome and an exposure in a contingency table.

Survival Analysis
^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 15 25 60

   * - Code
     - Name
     - Description
   * - Cox-PH
     - Cox Proportional Hazards
     - Semi-parametric survival model. Requires both an event indicator (0/1) and a time-to-event column. Supports covariates.

Covariate Models
----------------

Define multiple covariate models to compare adjusted and unadjusted results:

.. code-block:: ini

   [Covs]
   Unadjusted    None
   Model_1       age;sex
   Model_2       age;sex;bmi
   Full          age;sex;bmi;smoking;PC1;PC2;PC3;PC4

Each model is run separately for the regression tests (OLS, GLM-Binom, FIRTH).

.. note::

   Not all tests support covariates. Tests without covariate support are run only with the "Unadjusted" model.

Custom Tests
------------

The tests are based on the ``clean-data`` package, which includes the tests described above. The package is designed to work with statistical tests and regression callables from ``scipy``, ``statsmodels``, and ``lifelines``. Custom statistical tests can be defined by modifying ``marvel/association/tests.py``.

Test Structure
^^^^^^^^^^^^^^

Tests are defined as dictionaries with three keys:

.. code-block:: python

   {
       'Test method': callable,   # The test function
       'P-value': int or str,     # Index or attribute for p-value
       'kwargs': dict,            # Optional keyword arguments
   }

Adding a Custom Test
^^^^^^^^^^^^^^^^^^^^

1. Edit ``marvel/association/tests.py``.

2. Add your test to the ``STATS`` class:

   .. code-block:: python

      from scipy import stats

      class STATS:
          def __post_init__(self):
              self.__tests = {
                  # ... existing tests ...
                  'MY_TEST': {
                      BNames.TEST_METHOD: my_test_function,
                      BNames.TEST_PVALUE: 0,
                      BNames.TEST_KWARGS: {},
                  },
              }

          @property
          def my_test(self):
              '''My custom test.'''
              return self.__tests['MY_TEST']

3. Register the test in the ``AllTests`` class:

   .. code-block:: python

      class AllTests:
          def __post_init__(self):
              self.__tests = {
                  # ... existing tests ...
                  'MY_TEST': {
                      TEST_DICT_NAMES.NAME: 'My Custom Test',
                      TEST_DICT_NAMES.TEST: BTests().my_test,
                      TEST_DICT_NAMES.STAT: STATS().my_test,
                  },
              }

4. Use the test in the configuration:

   .. code-block:: ini

      [ConTests]
      outcome    MY_TEST

Environment Variable Override
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can specify a custom tests module via an environment variable:

.. code-block:: bash

   export MARVEL_TEST_DEFS=/path/to/custom_tests.py
   marvelous config.cnf -v

Output Interpretation
---------------------

Results File Columns
^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Column
     - Description
   * - Model
     - Covariate model name (from the ``[Covs]`` section)
   * - Model name
     - Full name of the statistical test
   * - Variable
     - Outcome variable name
   * - Exposure
     - Exposure/gene name
   * - N (Cases)
     - Number of cases (binary outcomes)
   * - N (Samples)
     - Total sample size for this test
   * - Exposed
     - Number of exposed (carrier) samples
   * - Non-exposed
     - Number of non-exposed (non-carrier) samples
   * - Estimate
     - Effect estimate (beta for OLS, log-OR for logistic)
   * - Std. Error
     - Standard error of the estimate
   * - Test statistic
     - Test statistic value (t, chi-square, etc.)
   * - P-value
     - P-value from the test
   * - Estimate (95% CI)
     - Formatted estimate with confidence interval
   * - OR (95% CI)
     - Odds ratio with CI (binary outcomes only)

See Also
--------

- :doc:`configuration` - Configuration reference
- :doc:`advanced` - Advanced features
- :doc:`api/association` - API documentation for test classes
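
Example: Reading a Test Definition
----------------------------------

The three-key dictionary described under "Test Structure" can be illustrated with a minimal sketch. This guide does not show how MARVELous dispatches tests internally, so ``toy_test``, ``extract_p_value``, and ``run_test`` below are hypothetical helpers written for illustration only. They show one plausible reading of the ``'P-value'`` key: an ``int`` is used as a positional index into the result, and a ``str`` is used as an attribute name.

.. code-block:: python

   from collections import namedtuple

   # Hypothetical result type standing in for the result objects that
   # scipy-style test functions return (indexable and attribute-accessible).
   Result = namedtuple("Result", ["statistic", "pvalue"])


   def toy_test(a, b, scale=1.0):
       """Toy stand-in for a real test. NOT a valid statistical test."""
       diff = abs(sum(a) / len(a) - sum(b) / len(b)) * scale
       return Result(statistic=diff, pvalue=1.0 / (1.0 + diff))


   def extract_p_value(result, locator):
       """Read the p-value by position (int) or by attribute name (str)."""
       if isinstance(locator, int):
           return result[locator]
       return getattr(result, locator)


   def run_test(test_def, *groups):
       """Call the test with its kwargs, then pull out the p-value."""
       result = test_def["Test method"](*groups, **test_def.get("kwargs", {}))
       return extract_p_value(result, test_def["P-value"])


   # The same toy test, with the p-value located by index and by attribute.
   by_index = {"Test method": toy_test, "P-value": 1, "kwargs": {"scale": 2.0}}
   by_name = {"Test method": toy_test, "P-value": "pvalue", "kwargs": {"scale": 2.0}}

   p_index = run_test(by_index, [1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
   p_name = run_test(by_name, [1.0, 2.0, 3.0], [2.0, 3.0, 4.0])

Both definitions yield the same p-value here. In a real definition, the index or attribute name would need to match whatever the chosen ``scipy``, ``statsmodels``, or ``lifelines`` callable actually returns.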