Usage Guide
===========

This page covers how to run MARVELous from the command line, the output files
it produces, and common errors.

Command-Line Interface
----------------------

MARVELous provides a command-line interface through the ``marvelous`` command
(installed with the package).

Basic Syntax
^^^^^^^^^^^^

.. code-block:: bash

   marvelous <config_file> [options]

Arguments
^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Argument
     - Description
   * - ``config_file``
     - Path to the configuration file (required)

Options
^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Option
     - Description
   * - ``-v``, ``--verbose``
     - Enable verbose output with detailed logging
   * - ``--outpath PATH``
     - Override the output directory set in the configuration file
   * - ``--dry-run``
     - Validate the configuration without running the pipeline
   * - ``--log-file PATH``
     - Write log output to a file (warnings are then not printed to the console)
   * - ``--version``
     - Show version number and exit
   * - ``-h``, ``--help``
     - Show help message and exit


Examples
^^^^^^^^

**Run full pipeline:**

.. code-block:: bash

   marvelous /path/to/config.cnf -v

**Validate configuration (dry run):**

.. code-block:: bash

   marvelous /path/to/config.cnf --dry-run

**Override output directory:**

.. code-block:: bash

   marvelous /path/to/config.cnf --outpath /custom/output/dir

**Write log output (including warnings) to a file:**

.. code-block:: bash

   marvelous /path/to/config.cnf -v --log-file analysis.log


Programmatic Usage
------------------

MARVELous can also be used as a Python library for integration into scripts or
notebooks. See the :doc:`api/pipeline` documentation for details.


Output Files
------------

Extraction Output
^^^^^^^^^^^^^^^^^

When variant extraction is enabled, MARVELous creates:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - File
     - Description
   * - ``{VarOutput}_carriers.tsv.gz``
     - Carrier matrix (samples × variants/genes)
   * - ``{VarOutput}_summary.tsv.gz``
     - Extraction summary with variant counts

**Carrier file format:**

.. code-block:: text

   id      GENE1   GENE2   variant_1       variant_2
   sample1 1       0       1               0
   sample2 0       1       0               1
   sample3 2       0       1               1

Values indicate the allele count (0, 1, or 2 for diploid genotypes).
If variants are combined using the ``cat_column`` option, values can exceed 2
because each entry is the sum over the combined variants.
For more information on the values, see :doc:`advanced`.
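
The carrier matrix can be summarised with the Python standard library alone.
The sketch below is illustrative and not part of MARVELous: it assumes the
file is tab-separated and that the sample column is named ``id`` as in the
example above, and the helper names ``count_carriers`` and
``count_carriers_from_file`` are hypothetical.

.. code-block:: python

   import csv
   import gzip
   import io


   def count_carriers(lines):
       """Count samples with a non-zero value in each carrier column."""
       counts = {}
       for row in csv.DictReader(lines, delimiter="\t"):
           for column, value in row.items():
               if column == "id":  # assumed name of the sample column
                   continue
               counts[column] = counts.get(column, 0) + (int(value) > 0)
       return counts


   def count_carriers_from_file(path):
       """Same as count_carriers, reading a gzip-compressed carrier file."""
       with gzip.open(path, "rt", newline="") as handle:
           return count_carriers(handle)


   # Toy data mirroring the layout shown above
   example = (
       "id\tGENE1\tGENE2\tvariant_1\tvariant_2\n"
       "sample1\t1\t0\t1\t0\n"
       "sample2\t0\t1\t0\t1\n"
       "sample3\t2\t0\t1\t1\n"
   )
   carrier_counts = count_carriers(io.StringIO(example))
   print(carrier_counts)

For a real run, ``count_carriers_from_file`` would be pointed at the
``{VarOutput}_carriers.tsv.gz`` file. Missing or non-integer values are not
handled in this sketch.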


Association Output
^^^^^^^^^^^^^^^^^^

When association testing is enabled, MARVELous creates the following files for
each exposure:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - File
     - Description
   * - ``{exposure}_results.tsv.gz``
     - Association test results
   * - ``{exposure}_baseline.tsv``
     - Baseline characteristics table

**Results file columns:**

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Column
     - Description
   * - Model
     - Covariate model name
   * - Model name
     - Statistical test name
   * - Variable
     - Outcome variable name
   * - Exposure
     - Exposure variable name
   * - N (Cases)
     - Number of cases (binary outcomes)
   * - N (Samples)
     - Total sample size
   * - Exposed
     - Number of exposed samples
   * - Non-exposed
     - Number of non-exposed samples
   * - Estimate
     - Effect estimate (beta or OR)
   * - Std. Error
     - Standard error
   * - Test statistic
     - Test statistic value
   * - P-value
     - P-value
   * - Estimate (95% CI)
     - Formatted estimate with confidence interval
   * - OR (95% CI)
     - Odds ratio with confidence interval (binary outcomes)
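
A results file can be filtered in a few lines of Python. This is a minimal
sketch under stated assumptions: the file is tab-separated, the ``P-value``
column parses as a number, and the helper names are hypothetical. The rows in
the example are made up purely to illustrate the layout and use only a subset
of the columns listed above.

.. code-block:: python

   import csv
   import gzip
   import io


   def significant_rows(lines, alpha=0.05):
       """Return result rows whose ``P-value`` is below ``alpha``."""
       selected = []
       for row in csv.DictReader(lines, delimiter="\t"):
           try:
               p_value = float(row["P-value"])
           except (TypeError, ValueError):
               continue  # skip rows without a numeric P-value
           if p_value < alpha:
               selected.append(row)
       return selected


   def significant_rows_from_file(path, alpha=0.05):
       """Same as significant_rows, reading a gzip-compressed results file."""
       with gzip.open(path, "rt", newline="") as handle:
           return significant_rows(handle, alpha)


   # Made-up rows for illustration only
   example = (
       "Variable\tExposure\tEstimate\tP-value\n"
       "disease\tGENE1\t0.42\t0.003\n"
       "disease\tGENE2\t-0.05\t0.71\n"
   )
   hits = significant_rows(io.StringIO(example))
   print([(row["Variable"], row["Exposure"]) for row in hits])

The threshold of 0.05 is only a default for the sketch; an analysis with many
tests would normally apply a multiple-testing correction instead.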


Dry Run Mode
------------

Use ``--dry-run`` to validate your configuration without running the analysis:

.. code-block:: bash

   marvelous config.cnf --dry-run

This will:

1. Parse and validate the configuration file
2. Check that all input files exist
3. Verify column names in input files
4. Check that specified tests are defined
5. Print a configuration summary


Workflow Examples
-----------------

Full Analysis
^^^^^^^^^^^^^

A typical full analysis workflow:

.. code-block:: bash

   # 1. Validate configuration
   marvelous analysis.cnf --dry-run

   # 2. Run full pipeline
   marvelous analysis.cnf -v --log-file analysis.log

   # 3. Check results
   ls ./results/

The configuration file can be created manually, or using a helper function
included in the package. For more information see :doc:`configuration`.

For a complete worked example, see :doc:`examples/cli_example`.
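
The validate-then-run sequence can also be scripted. The sketch below builds
the two command lines from the options documented on this page and runs them
with ``subprocess``. It assumes that ``marvelous`` exits with a non-zero
status when validation fails, which is not confirmed here; the function names
are hypothetical.

.. code-block:: python

   import shlex
   import subprocess


   def build_commands(config_path, log_file=None):
       """Return the dry-run and full-run command lines as argument lists."""
       dry_run = ["marvelous", config_path, "--dry-run"]
       full_run = ["marvelous", config_path, "-v"]
       if log_file is not None:
           full_run += ["--log-file", log_file]
       return dry_run, full_run


   def validate_then_run(config_path, log_file=None):
       """Run the dry run first and start the pipeline only if it succeeds."""
       dry_run, full_run = build_commands(config_path, log_file)
       # check=True raises CalledProcessError on a non-zero exit status
       # (assumption: a failed validation is reported that way).
       subprocess.run(dry_run, check=True)
       subprocess.run(full_run, check=True)


   dry_run, full_run = build_commands("analysis.cnf", log_file="analysis.log")
   print(shlex.join(dry_run))
   print(shlex.join(full_run))

Calling ``validate_then_run("analysis.cnf", log_file="analysis.log")`` would
then execute both steps in order.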

Two-Stage Workflow
^^^^^^^^^^^^^^^^^^

For large analyses or more control, run extraction and association separately:

**Stage 1: Extraction**

Create ``extraction.cnf``:

.. code-block:: ini

   [GenoInput]
   chr22	/data/chr22.vcf.gz

   [VarInput]
   variants	/data/variants.tsv

   [Output]
   VarOutput	/results/carriers

   [Options]
   extract_variants	True
   association_analysis	False

Run:

.. code-block:: bash

   marvelous extraction.cnf -v
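
Before moving on to the second stage, it can help to confirm that the
extraction files exist. The sketch below derives the expected paths from the
``VarOutput`` prefix using the naming pattern described under Extraction
Output; the helper name ``extraction_outputs`` is hypothetical.

.. code-block:: python

   import os


   def extraction_outputs(var_output):
       """Map the VarOutput prefix to the two expected extraction files."""
       return {
           "carriers": f"{var_output}_carriers.tsv.gz",
           "summary": f"{var_output}_summary.tsv.gz",
       }


   outputs = extraction_outputs("/results/carriers")
   missing = [path for path in outputs.values() if not os.path.exists(path)]
   if missing:
       print("Missing extraction output:", ", ".join(missing))
   else:
       print("Extraction output found:", outputs["carriers"])

The ``carriers`` path is the one to list under ``[ExpInput]`` in the
association configuration.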

**Stage 2: Association**

Create ``association.cnf``:

.. code-block:: ini

   [ExpInput]
   carriers	/results/carriers_carriers.tsv.gz

   [PhenoInput]
   phenotypes	/data/outcomes.tsv
   covariates	/data/covariates.tsv

   [BinTests]
   disease	GLM-Binom;FISHER

   [Covs]
   Adjusted	age;sex

   [Options]
   extract_variants	False
   association_analysis	True
   output_path	/results

Run:

.. code-block:: bash

   marvelous association.cnf -v


Error Handling
--------------

Common errors and solutions:

**Configuration file not found:**

.. code-block:: text

   FileNotFoundError: Configuration file not found: config.cnf

*Solution:* Check the path to your configuration file.

**Missing required headers:**

.. code-block:: text

   ConfigHeaderMissingError: The following headers are missing: ['PhenoInput']

*Solution:* Add the required section to your configuration file.

**Input file not found:**

.. code-block:: text

   FileNotFoundError: The following input files are missing: ['/path/to/file.vcf.gz']

*Solution:* Verify paths in your configuration file.

**Column not found:**

.. code-block:: text

   InputValidationError: The columns ['outcome1'] were not present in the input files

*Solution:* Check column names in your phenotype file.

**Unsupported genetic file type:**

.. code-block:: text

   TypeError: Unsupported genetic file extension

*Solution:* Use VCF (.vcf, .vcf.gz), BGEN (.bgen), or PLINK (.bed) files.


See Also
--------

- :doc:`configuration` - Configuration file reference
- :doc:`getting_started` - Quick start guide
- :doc:`advanced` - Advanced features