Usage Guide
===========

This page covers how to run MARVELous from the command line.

Command-Line Interface
----------------------

MARVELous provides a command-line interface through the ``marvelous`` command, which is installed with the package.

Basic Syntax
^^^^^^^^^^^^

.. code-block:: bash

   marvelous <config_file> [options]

Arguments
^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Argument
     - Description
   * - ``config_file``
     - Path to the configuration file (required)

Options
^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Option
     - Description
   * - ``-v``, ``--verbose``
     - Enable verbose output with detailed logging
   * - ``--outpath PATH``
     - Override output directory from config file
   * - ``--dry-run``
     - Validate configuration without running pipeline
   * - ``--log-file PATH``
     - Write log output to file (warnings will not print to console)
   * - ``--version``
     - Show version number and exit
   * - ``-h``, ``--help``
     - Show help message and exit

Examples
^^^^^^^^

**Run full pipeline:**

.. code-block:: bash

   marvelous /path/to/config.cnf -v

**Validate configuration (dry run):**

.. code-block:: bash

   marvelous /path/to/config.cnf --dry-run

**Override output directory:**

.. code-block:: bash

   marvelous /path/to/config.cnf --outpath /custom/output/dir

**Write warnings to file:**

.. code-block:: bash

   marvelous /path/to/config.cnf -v --log-file analysis.log

Programmatic Usage
------------------

MARVELous can also be used as a Python library for integration into scripts or notebooks. See the :doc:`api/pipeline` documentation for details.

Output Files
------------

Extraction Output
^^^^^^^^^^^^^^^^^

When variant extraction is enabled, MARVELous creates:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - File
     - Description
   * - ``{VarOutput}_carriers.tsv.gz``
     - Carrier matrix (samples × variants/genes)
   * - ``{VarOutput}_summary.tsv.gz``
     - Extraction summary with variant counts

**Carrier file format:**

.. code-block:: text

   id       GENE1  GENE2  variant_1  variant_2
   sample1  1      0      1          0
   sample2  0      1      0          1
   sample3  2      0      1          1

Values indicate the allele count (0, 1, or 2 for diploid samples). If variants are combined using the ``cat_column`` option, a value can exceed 2, because it is the sum over the combined variants. For more information on the values, see :doc:`advanced`.

Association Output
^^^^^^^^^^^^^^^^^^

When association testing is enabled, MARVELous creates the following files for each exposure:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - File
     - Description
   * - ``{exposure}_results.tsv.gz``
     - Association test results
   * - ``{exposure}_baseline.tsv``
     - Baseline characteristics table

**Results file columns:**

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Column
     - Description
   * - Model
     - Covariate model name
   * - Model name
     - Statistical test name
   * - Variable
     - Outcome variable name
   * - Exposure
     - Exposure variable name
   * - N (Cases)
     - Number of cases (binary outcomes)
   * - N (Samples)
     - Total sample size
   * - Exposed
     - Number of exposed samples
   * - Non-exposed
     - Number of non-exposed samples
   * - Estimate
     - Effect estimate (beta or OR)
   * - Std. Error
     - Standard error
   * - Test statistic
     - Test statistic value
   * - P-value
     - P-value
   * - Estimate (95% CI)
     - Formatted estimate with confidence interval
   * - OR (95% CI)
     - Odds ratio with confidence interval (binary outcomes)

Dry Run Mode
------------

Use ``--dry-run`` to validate your configuration without running the analysis:

.. code-block:: bash

   marvelous config.cnf --dry-run

This will:

1. Parse and validate the configuration file
2. Check that all input files exist
3. Verify column names in input files
4. Check that specified tests are defined
5. Print a configuration summary

Workflow Examples
-----------------

Full Analysis
^^^^^^^^^^^^^

A typical full analysis workflow:

.. code-block:: bash

   # 1. Validate configuration
   marvelous analysis.cnf --dry-run

   # 2. Run full pipeline
   marvelous analysis.cnf -v --log-file analysis.log

   # 3. Check results
   ls ./results/

The configuration file can be created manually or with a helper function included in the package. For more information, see :doc:`configuration`. For a complete example, see :doc:`examples/cli_example`.

Two-Stage Workflow
^^^^^^^^^^^^^^^^^^

For large analyses, or when more control is needed, run extraction and association separately.

**Stage 1: Extraction**

Create ``extraction.cnf``:

.. code-block:: ini

   [GenoInput]
   chr22 /data/chr22.vcf.gz

   [VarInput]
   variants /data/variants.tsv

   [Output]
   VarOutput /results/carriers

   [Options]
   extract_variants True
   association_analysis False

Run:

.. code-block:: bash

   marvelous extraction.cnf -v

**Stage 2: Association**

Create ``association.cnf``:

.. code-block:: ini

   [ExpInput]
   carriers /results/carriers_carriers.tsv.gz

   [PhenoInput]
   phenotypes /data/outcomes.tsv
   covariates /data/covariates.tsv

   [BinTests]
   disease GLM-Binom;FISHER

   [Covs]
   Adjusted age;sex

   [Options]
   extract_variants False
   association_analysis True
   output_path /results

Run:

.. code-block:: bash

   marvelous association.cnf -v

Error Handling
--------------

Common errors and solutions:

**Configuration file not found:**

.. code-block:: text

   FileNotFoundError: Configuration file not found: config.cnf

*Solution:* Check the path to your configuration file.

**Missing required headers:**

.. code-block:: text

   ConfigHeaderMissingError: The following headers are missing: ['PhenoInput']

*Solution:* Add the required section to your configuration file.

**Input file not found:**

.. code-block:: text

   FileNotFoundError: The following input files are missing: ['/path/to/file.vcf.gz']

*Solution:* Verify paths in your configuration file.

**Column not found:**

.. code-block:: text

   InputValidationError: The columns ['outcome1'] were not present in the input files

*Solution:* Check column names in your phenotype file.

**Unsupported genetic file type:**

.. code-block:: text

   TypeError: Unsupported genetic file extension

*Solution:* Use VCF (``.vcf``, ``.vcf.gz``), BGEN (``.bgen``), or PLINK (``.bed``) files.

See Also
--------

- :doc:`configuration` - Configuration file reference
- :doc:`getting_started` - Quick start guide
- :doc:`advanced` - Advanced features
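
Inspecting a Carrier File in Python
-----------------------------------

The carrier file format described under Output Files can be checked quickly with the Python standard library. The sketch below is not part of MARVELous: ``count_carriers`` is a hypothetical helper, and it assumes the file is tab-separated, that the first column is named ``id`` as in the example table, and that every cell holds an integer allele count with no missing values. It builds a small gzip-compressed table in memory so that it runs without any real data.

```python
import csv
import gzip
import io


def count_carriers(handle):
    """Count, per variant/gene column, the samples with a non-zero allele count.

    Hypothetical helper (not a MARVELous function). Assumes a tab-separated
    table whose first column is named "id" and whose cells are integers.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {name: 0 for name in reader.fieldnames if name != "id"}
    for row in reader:
        for name in counts:
            if int(row[name]) > 0:
                counts[name] += 1
    return counts


# In-memory stand-in for a ``*_carriers.tsv.gz`` file, mirroring the example table.
example = (
    "id\tGENE1\tGENE2\tvariant_1\tvariant_2\n"
    "sample1\t1\t0\t1\t0\n"
    "sample2\t0\t1\t0\t1\n"
    "sample3\t2\t0\t1\t1\n"
)
compressed = gzip.compress(example.encode("utf-8"))

# gzip.open accepts a path or an existing file object; "rt" decodes to text.
with gzip.open(io.BytesIO(compressed), mode="rt", newline="") as handle:
    carrier_counts = count_carriers(handle)

print(carrier_counts)
# {'GENE1': 2, 'GENE2': 1, 'variant_1': 2, 'variant_2': 2}
```

To apply the same check to real output, replace the in-memory buffer with the path to the carrier file, for example ``gzip.open("/results/carriers_carriers.tsv.gz", mode="rt", newline="")`` for the two-stage workflow shown earlier.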