marvel.pipeline module

The pipeline module contains the main orchestrator classes for running the MARVELous analysis pipeline.

Main Pipeline

class marvel.pipeline.MARVELousPipeline(config: PipelineConfig, verbose: bool = True)

Bases: object

Main pipeline orchestrator for MARVELous. It coordinates the variant extraction and association testing steps.

Example usage:

from marvel.pipeline import MARVELousPipeline

# Create from config file
pipeline = MARVELousPipeline.from_config_file("config.cnf")

# Run pipeline
results = pipeline.run()

# Get summary
print(pipeline.get_summary())

classmethod from_config_file(config_path: str, verbose: bool = True) → MARVELousPipeline

Create pipeline from configuration file

Parameters:
  • config_path (str) – Path to configuration file

  • verbose (bool) – Enable verbose output

Returns:

Initialized pipeline

Return type:

MARVELousPipeline

get_summary() → dict

Get pipeline execution summary

run() → dict

Run the complete pipeline

Returns:

Results from all pipeline steps

Return type:

dict

Pipeline Steps

Abstract Base Class

class marvel.pipeline.PipelineStep(config: PipelineConfig, verbose=False, logger=None)

Bases: ABC

Abstract base class that defines the interface for pipeline steps. Subclasses must implement execute().

abstract execute() → dict

Execute the pipeline step

Returns:

Results dictionary

Return type:

dict

log_error(message: str)

Log error message

log_info(message: str)

Log info message

log_warning(message: str)

Log warning message
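To make the interface concrete, the following is a minimal, self-contained sketch of how a step might subclass such a base class. It does not import marvel: SketchPipelineStep is a hypothetical stand-in that mirrors only the documented constructor arguments (config, verbose, logger) and the three logging helpers, and CountItemsStep with its dict-based config is invented for illustration. How the real base class routes log messages is an assumption here.

```python
import logging
from abc import ABC, abstractmethod


class SketchPipelineStep(ABC):
    """Hypothetical stand-in mirroring the documented PipelineStep interface."""

    def __init__(self, config, verbose=False, logger=None):
        self.config = config
        self.verbose = verbose
        # Assumption: fall back to a module-level logger when none is passed.
        self.logger = logger or logging.getLogger(__name__)

    @abstractmethod
    def execute(self) -> dict:
        """Run the step and return a results dictionary."""

    def log_info(self, message: str):
        # Assumption: info messages are only emitted in verbose mode.
        if self.verbose:
            self.logger.info(message)

    def log_warning(self, message: str):
        self.logger.warning(message)

    def log_error(self, message: str):
        self.logger.error(message)


class CountItemsStep(SketchPipelineStep):
    """Toy step (illustration only): counts the entries under config['items']."""

    def execute(self) -> dict:
        items = self.config.get("items", [])
        if not items:
            self.log_warning("No items configured")
        self.log_info(f"Counting {len(items)} items")
        return {"n_items": len(items)}


step = CountItemsStep({"items": ["a", "b", "c"]}, verbose=True)
results = step.execute()
```

Because execute() is abstract, instantiating the base class directly raises a TypeError; only subclasses that implement it can be created.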

Variant Extraction Step

class marvel.pipeline.VariantExtractionStep(config: PipelineConfig, verbose=False, logger=None)

Bases: PipelineStep

Pipeline step that extracts variants from genetic data files (VCF, BGEN, PLINK).

Example usage:

from marvel.pipeline import VariantExtractionStep
from marvel.utils.config_tools import PipelineConfig

config = PipelineConfig(
    extract_variants=True,
    geno_files={"chr22": "/data/chr22.vcf.gz"},
    var_files={"variants": "/data/variants.tsv"},
    output_path="./results"
)

step = VariantExtractionStep(config, verbose=True)
results = step.execute()

# Results contain:
# - output_files: List of created carrier files
# - summary: DataFrame with extraction summary
# - carriers: DataFrame with carrier matrix
# - results: Dict with successful/failed task info

execute() → dict

Execute variant extraction

Returns:

Dictionary with keys 'output_files', 'summary', and 'carriers'. The usage example above additionally lists a 'results' entry with successful/failed task info.

Return type:

dict
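Because the return value is a plain dictionary, a caller may want to confirm that the documented keys are present before using them. The helper below is only a sketch: missing_result_keys is a hypothetical name, not part of marvel, and the sample dictionary uses placeholder values rather than real MARVELous output (the usage example describes 'summary' and 'carriers' as DataFrames).

```python
# Keys documented for VariantExtractionStep.execute().
EXPECTED_KEYS = ("output_files", "summary", "carriers")


def missing_result_keys(results: dict, expected=EXPECTED_KEYS) -> list:
    """Return the expected keys that are absent from a results dictionary."""
    return [key for key in expected if key not in results]


# Placeholder values stand in for what a real run would return.
sample = {"output_files": ["carriers.tsv.gz"], "summary": None}
missing = missing_result_keys(sample)
if missing:
    print(f"Extraction results are missing: {missing}")
```

Checking for keys rather than truthiness keeps the helper usable with DataFrame values, whose truth value is ambiguous.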

Association Testing Step

class marvel.pipeline.AssociationTestingStep(config: PipelineConfig, variant_files: List[str], outcome_cache_size: int = 20, covariate_cache_size: int = 20, verbose=False, logger=None)

Bases: PipelineStep

Pipeline step that tests associations between genetic exposures and phenotypic outcomes.

Example usage:

from marvel.pipeline import AssociationTestingStep
from marvel.utils.config_tools import PipelineConfig

config = PipelineConfig(
    association_analysis=True,
    pheno_file="/data/phenotypes.tsv",
    cov_file="/data/covariates.tsv",
    id_column="IID",
    output_path="./results",
    outcomes={
        "blood_pressure": {
            "outcome_coltype": "continuous",
            "tests": ["OLS", "KW"]
        }
    },
    covariate_models={"Unadjusted": None}
)

step = AssociationTestingStep(
    config=config,
    variant_files=["carriers.tsv.gz"],
    verbose=True
)
results = step.execute()

# Results contain:
# - completed: List of successfully tested exposures
# - failed: List of (exposure, error) tuples
# - skipped: List of skipped exposures

execute() → dict

Execute association testing

Returns:

Dictionary with test results. Per the usage example above, it contains 'completed', 'failed', and 'skipped' entries.

Return type:

dict
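The usage example above describes the result as lists of completed exposures, (exposure, error) tuples for failures, and skipped exposures. Assuming that layout, a short report could be assembled as sketched below. summarize_association_results is a hypothetical helper, not part of marvel, and the sample data is made up.

```python
def summarize_association_results(results: dict) -> str:
    """Build a short text summary from 'completed', 'failed', and 'skipped'."""
    completed = results.get("completed", [])
    failed = results.get("failed", [])  # list of (exposure, error) tuples
    skipped = results.get("skipped", [])
    lines = [
        f"{len(completed)} completed, {len(failed)} failed, {len(skipped)} skipped"
    ]
    # List each failure with its error so problems are visible at a glance.
    for exposure, error in failed:
        lines.append(f"  {exposure}: {error}")
    return "\n".join(lines)


# Made-up sample data, not real MARVELous output.
sample = {
    "completed": ["variant_a", "variant_b"],
    "failed": [("variant_c", "no carriers found")],
    "skipped": [],
}
report = summarize_association_results(sample)
print(report)
```

Using results.get with an empty-list default means the helper still works if one of the three entries is absent.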

Entry Point

MARVELous Entry Point

Command-line entry point for MARVELous.

Example:

from marvel.marvelous_entry import marvelous

# Run programmatically
exit_code = marvelous(
    config_file="config.cnf",
    outpath="./custom_output",
    dry_run=False,
    verbose=True
)

marvel.marvelous_entry.main()

Main entry point for MARVELous

marvel.marvelous_entry.marvelous(config_file: str, outpath: str = '', dry_run: bool = False, verbose: bool = True)

Run the MARVELous pipeline from a configuration file. The example above uses the return value as an exit code.

Parameters:
  • config_file (str) – Path to configuration file

  • outpath (str) – Output directory; overrides the output path given in the configuration file

  • dry_run (bool) – Whether to perform a dry run

  • verbose (bool) – Enable verbose output