Statistical Tests Guide

MARVELous supports various statistical tests for associating genetic exposures with phenotypic outcomes. This guide covers available tests, when to use them, and how to define custom tests.

Available Tests

Tests for Continuous Outcomes

Code   Name                          Description
OLS    Ordinary Least Squares        Linear regression with optional covariates.
KW     Kruskal-Wallis                Non-parametric test for differences in distributions across groups.
AOV    One-way ANOVA                 Parametric test for differences in group means.
MWU    Mann-Whitney U                Non-parametric rank-sum test comparing two groups.
T      Independent-samples T-test    Parametric test comparing means of two groups.

Tests for Binary Outcomes

Code        Name                   Description
GLM-Binom   Logistic Regression    Generalized linear model with binomial family and logit link with optional covariates.
CHISQ       Chi-square             Tests independence between a binary outcome and an exposure in a contingency table.
FISHER      Fisher’s Exact         Exact test for 2×2 contingency tables. Preferred over chi-square when expected cell counts are small.
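The choice between CHISQ and FISHER depends on the expected cell counts, which can be checked before picking a test. The following is a minimal sketch, not part of MARVELous: the helper names `expected_counts` and `suggest_test` are hypothetical, and the threshold of 5 is the conventional rule of thumb rather than a documented MARVELous setting.

```python
def expected_counts(table):
    """Return expected cell counts for a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count = row total * column total / grand total
    return [[r * col / n for col in col_totals] for r in row_totals]


def suggest_test(table, min_expected=5.0):
    """Suggest a test code using the rule of thumb (assumed threshold: 5)."""
    smallest = min(min(row) for row in expected_counts(table))
    return "FISHER" if smallest < min_expected else "CHISQ"


# Rows: exposed / non-exposed; columns: cases / controls (illustrative counts)
sparse = [[3, 97], [1, 199]]
common = [[40, 60], [50, 150]]

print(suggest_test(sparse))
print(suggest_test(common))
```

With rare exposures the smallest expected count drops quickly, which is why the exact test is often the safer default for low-frequency carriers.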

Tests for Categorical Outcomes

Code    Name         Description
CHISQ   Chi-square   Tests independence between a multi-category outcome and an exposure in a contingency table.

Survival Analysis

Code     Name                       Description
Cox-PH   Cox Proportional Hazards   Semi-parametric survival model. Requires both an event indicator (0/1) and a time-to-event column. Supports covariates.

Covariate Models

Define multiple covariate models to compare adjusted and unadjusted results:

[Covs]
Unadjusted   None
Model_1      age;sex
Model_2      age;sex;bmi
Full         age;sex;bmi;smoking;PC1;PC2;PC3;PC4

Each covariate model is run separately with the regression tests (OLS, GLM-Binom, FIRTH).

Note

Not all tests support covariates. Tests without covariate support are run only with the “Unadjusted” model.
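To make the [Covs] layout concrete, here is a small sketch of how such a section could be read into model definitions. It is an illustration only, not MARVELous's own parser: the function name `parse_covs` is hypothetical, and it assumes the model name is separated from its covariates by whitespace, covariates are separated by semicolons, and `None` means no covariates.

```python
def parse_covs(text):
    """Parse [Covs]-style lines into {model_name: [covariate, ...]}."""
    models = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the section header
        parts = line.split(None, 1)  # model name, then the covariate spec
        name = parts[0]
        spec = parts[1].strip() if len(parts) > 1 else "None"
        if spec == "None":
            models[name] = []  # unadjusted model
        else:
            models[name] = [c.strip() for c in spec.split(";") if c.strip()]
    return models


covs_section = """
[Covs]
Unadjusted   None
Model_1      age;sex
Model_2      age;sex;bmi
Full         age;sex;bmi;smoking;PC1;PC2;PC3;PC4
"""

models = parse_covs(covs_section)
for name, covariates in models.items():
    print(name, covariates)
```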

Custom Tests

The tests are built on the clean-data package, which includes the tests described above. The package is designed to work with statistical tests and regression callables from scipy, statsmodels, and lifelines. Custom statistical tests can be defined by modifying marvel/association/tests.py.

Test Structure

Tests are defined as dictionaries with three keys:

{
    'Test method': callable,     # The test function
    'P-value': int or str,       # Index or attribute for p-value
    'kwargs': dict,              # Optional keyword arguments
}
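The sketch below shows one way a definition with these three keys could be consumed: call the test method with its keyword arguments, then pull the p-value out of the result by index (int) or by attribute name (str). How MARVELous itself invokes the callable is not shown in this guide, so the two-group call signature and the helpers `permutation_test`, `extract_p_value`, and `run_test` are all hypothetical.

```python
from collections import namedtuple
from itertools import combinations

Result = namedtuple("Result", ["statistic", "pvalue"])


def permutation_test(group_a, group_b, tol=1e-12):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a pooled values to group A
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        extreme += diff >= observed - tol
        n_splits += 1
    return Result(statistic=observed, pvalue=extreme / n_splits)


def extract_p_value(result, locator):
    """Locate the p-value by position (int) or by attribute name (str)."""
    if isinstance(locator, int):
        return result[locator]
    return getattr(result, locator)


def run_test(test_def, *data):
    """Call the test method with its kwargs and return only the p-value."""
    result = test_def["Test method"](*data, **test_def.get("kwargs", {}))
    return extract_p_value(result, test_def["P-value"])


by_index = {"Test method": permutation_test, "P-value": 1, "kwargs": {}}
by_attr = {"Test method": permutation_test, "P-value": "pvalue",
           "kwargs": {"tol": 1e-9}}

carriers = [1.0, 2.0, 3.0]
non_carriers = [4.0, 5.0, 6.0]
p_index = run_test(by_index, carriers, non_carriers)
p_attr = run_test(by_attr, carriers, non_carriers)
print(p_index, p_attr)
```

Supporting both locator types is what lets one dictionary shape cover tuple-like results and model objects that expose the p-value as an attribute.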

Adding a Custom Test

  1. Edit marvel/association/tests.py

  2. Add your test to the STATS class:

from scipy import stats

class STATS:
    def __post_init__(self):
        self.__tests = {
            # ... existing tests ...

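            'MY_TEST': {
                BNames.TEST_METHOD: my_test_function,  # callable that runs the test
                BNames.TEST_PVALUE: 0,                 # index (or attribute name) of the p-value in the result
                BNames.TEST_KWARGS: {},                # optional keyword arguments
            },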
        }

    @property
    def my_test(self):
        '''My custom test.'''
        return self.__tests['MY_TEST']

  3. Register in AllTests class:

class AllTests:
    def __post_init__(self):
        self.__tests = {
            # ... existing tests ...

            'MY_TEST': {
                TEST_DICT_NAMES.NAME: 'My Custom Test',
                TEST_DICT_NAMES.TEST: BTests().my_test,
                TEST_DICT_NAMES.STAT: STATS().my_test,
            },
        }

  4. Use in configuration:

[ConTests]
outcome      MY_TEST

Environment Variable Override

You can point MARVELous at a custom tests module with the MARVEL_TEST_DEFS environment variable:

export MARVEL_TEST_DEFS=/path/to/custom_tests.py
marvelous config.cnf -v
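For readers curious how a path-based override like this can work, the sketch below loads a Python module from the file named in an environment variable using the standard library. It is not MARVELous's actual loading code: the function `load_test_defs`, the module name `custom_tests`, and the `MARKER` attribute in the demo file are all hypothetical.

```python
import importlib.util
import os
import tempfile


def load_test_defs(env_var="MARVEL_TEST_DEFS"):
    """Load the module at the path stored in env_var, or return None if unset."""
    path = os.environ.get(env_var)
    if not path:
        return None  # no override requested; a caller would use the defaults
    spec = importlib.util.spec_from_file_location("custom_tests", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load test definitions from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo: write a tiny stand-in module, point the variable at it, and load it.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "custom_tests.py")
with open(demo_path, "w") as handle:
    handle.write("MARKER = 'custom'\n")

os.environ["MARVEL_TEST_DEFS"] = demo_path
loaded = load_test_defs()
print(loaded.MARKER)
```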

Output Interpretation

Results File Columns

Column               Description
Model                Covariate model name (from [Covs] section)
Model name           Full name of the statistical test
Variable             Outcome variable name
Exposure             Exposure/gene name
N (Cases)            Number of cases (binary outcomes)
N (Samples)          Total sample size for this test
Exposed              Number of exposed (carrier) samples
Non-exposed          Number of non-exposed (non-carrier) samples
Estimate             Effect estimate (beta for OLS, log-OR for logistic)
Std. Error           Standard error of estimate
Test statistic       Test statistic value (t, chi-square, etc.)
P-value              P-value from the test
Estimate (95% CI)    Formatted estimate with confidence interval
OR (95% CI)          Odds ratio with CI (binary outcomes only)
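Because the logistic Estimate is reported on the log-odds scale, the odds ratio and its interval follow by exponentiating. The sketch below assumes a Wald-type 95% interval (estimate ± 1.96 × standard error); whether MARVELous computes its interval this way, and the exact output formatting, are assumptions. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(log_or, std_error, z=Z_95):
    """Convert a log-odds-ratio and its standard error to (OR, lower, upper)."""
    odds_ratio = math.exp(log_or)
    lower = math.exp(log_or - z * std_error)
    upper = math.exp(log_or + z * std_error)
    return odds_ratio, lower, upper


def format_or(log_or, std_error, digits=2):
    """Format as 'OR (lower, upper)'; the layout here is illustrative only."""
    odds_ratio, lower, upper = odds_ratio_ci(log_or, std_error)
    return f"{odds_ratio:.{digits}f} ({lower:.{digits}f}, {upper:.{digits}f})"


# Illustrative values, not real results
estimate = math.log(2.0)
std_error = 0.25
print(format_or(estimate, std_error))
```

The interval is symmetric around the estimate on the log scale, so it is asymmetric around the odds ratio itself.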

See Also