Adding New Config Options to MARVEL

This guide explains how to add a new option to the MARVEL config file and wire it through the pipeline. Follow every step in order — skipping a step will typically result in a silent failure or a validation error.

Overview

A MARVEL config file is an INI-style .cnf file. Each option travels through four layers before it reaches the pipeline code:

.cnf file
  → ConfigParser()          # raw text → dict
  → PipelineConfig          # dict → typed dataclass
  → check_config_file()     # validates inputs
  → Pipeline step           # uses the value

The code for these layers lives in marvel/:

  • constants.py — section names, option keys, default values

  • utils/config_tools.py — parser, dataclass, validator

  • pipeline.py — extraction and association steps

Step 1 — Add the option key to constants.py

Open marvel/constants.py. There are two classes to update.

1a. Add the key name to CONFIG_LHS

CONFIG_LHS holds the left-hand side strings that appear in the config file. Add your key as a class attribute:

class CONFIG_LHS:
    # ... existing keys ...
    MyNewOption = 'my_new_option'   # <-- add this

The string value is exactly what the user writes in the .cnf file.

1b. Add a default value to OPTIONS

OPTIONS provides the value used when the user omits the option:

class OPTIONS:
    # ... existing defaults ...
    my_new_option: bool = False   # <-- add this

Use the same name as the CONFIG_LHS attribute value (snake_case). The type annotation is documentation only — Python does not enforce it here.

When to skip 1b: if the option is required (i.e. the pipeline must error when it is absent), do not add a default. Instead, add validation in Step 3.
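
Because the link between a CONFIG_LHS string and its OPTIONS attribute is a naming convention only, a small consistency check can catch typos early. The sketch below uses stand-in copies of the two classes (not the real contents of marvel/constants.py), and both public_values() and missing_defaults() are hypothetical helpers introduced for illustration. Required options are expected to show up in the result, since they deliberately have no default.

```python
class CONFIG_LHS:
    """Stand-in for marvel/constants.py: strings the user writes in the .cnf file."""
    ExtractVariants = 'extract_variants'
    MyNewOption = 'my_new_option'
    RequiredPath = 'required_path'   # hypothetical required option, so no default


class OPTIONS:
    """Stand-in defaults; attribute names mirror the CONFIG_LHS string values."""
    extract_variants: bool = True    # assumed default, for illustration only
    my_new_option: bool = False


def public_values(cls) -> list:
    """Return the values of all class attributes whose names do not start with '_'."""
    return [value for name, value in vars(cls).items() if not name.startswith('_')]


def missing_defaults(lhs=CONFIG_LHS, options=OPTIONS) -> list:
    """Hypothetical helper: keys declared in CONFIG_LHS with no matching OPTIONS default."""
    return sorted(key for key in public_values(lhs) if not hasattr(options, key))


print(missing_defaults())
```

With these stand-ins, only required_path is reported. A misspelled OPTIONS attribute would appear in the same list, which makes the check a cheap guard in a unit test.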

Step 2 — Add the field to PipelineConfig

Open marvel/utils/config_tools.py and find the PipelineConfig dataclass.

2a. Declare the field

Add a typed field with a default drawn from OPTIONS:

@dataclass
class PipelineConfig:
    # ... existing fields ...
    my_new_option: bool = field(default=OPTIONS.my_new_option)

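For a required field, use None as a placeholder default so that the validator in Step 3 can report a missing value:

my_new_option: Optional[str] = field(default=None)  # validated in Step 3; Optional comes from typing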

2b. Populate the field in from_config_dict()

Inside the from_config_dict class method, find the block where Options section values are read. Follow the existing pattern:

@classmethod
def from_config_dict(cls, config: dict) -> "PipelineConfig":
    options = config.get(CONFIG_HEADERS.Options, {})

    # ... existing reads ...
    my_new_option = options.get(
        CONFIG_LHS.MyNewOption, OPTIONS.my_new_option
    )

    return cls(
        # ... existing kwargs ...
        my_new_option=my_new_option,
    )

Note

If your option lives in a section other than [Options] (e.g. [GenoInput]), read it from config.get(CONFIG_HEADERS.GenoInput, {}) instead.
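
Steps 2a and 2b can be tried end to end with the minimal sketch below. It is not the real PipelineConfig: the constants classes are stand-ins, the section name strings are assumed, and geno_file is a hypothetical key used only to show a read from a second section.

```python
from dataclasses import dataclass, field
from typing import Optional


class CONFIG_HEADERS:
    """Stand-in section names (the string values are assumptions)."""
    Options = 'Options'
    GenoInput = 'GenoInput'


class CONFIG_LHS:
    MyNewOption = 'my_new_option'
    GenoFile = 'geno_file'           # hypothetical key living in [GenoInput]


class OPTIONS:
    my_new_option: bool = False


@dataclass
class PipelineConfig:
    my_new_option: bool = field(default=OPTIONS.my_new_option)
    geno_file: Optional[str] = field(default=None)   # required; left to the validator

    @classmethod
    def from_config_dict(cls, config: dict) -> "PipelineConfig":
        # Each section is read separately; a missing section behaves like an empty one.
        options = config.get(CONFIG_HEADERS.Options, {})
        geno_input = config.get(CONFIG_HEADERS.GenoInput, {})
        return cls(
            my_new_option=options.get(CONFIG_LHS.MyNewOption, OPTIONS.my_new_option),
            geno_file=geno_input.get(CONFIG_LHS.GenoFile),
        )


absent = PipelineConfig.from_config_dict({})
present = PipelineConfig.from_config_dict({
    CONFIG_HEADERS.Options: {CONFIG_LHS.MyNewOption: True},
    CONFIG_HEADERS.GenoInput: {CONFIG_LHS.GenoFile: 'cohort.vcf'},
})
print(absent)
print(present)
```

Reading with .get() and an explicit fallback keeps the default in one place (OPTIONS), so the dataclass default and the parsed default cannot drift apart.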

Step 3 — Add validation to check_config_file()

Open marvel/utils/config_tools.py and find check_config_file().

Add a check in the appropriate block. There are three common patterns:

Required field (must be present):

if CONFIG_LHS.MyNewOption not in options:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' is required."
    )

Conditional requirement (required only when another option is set):

if config.extract_variants and not config.my_new_option:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' must be set "
        "when extract_variants=True."
    )

File existence check:

if config.my_new_option and not Path(config.my_new_option).exists():
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' path does not "
        f"exist: {config.my_new_option}"
    )

Errors are collected into a list and raised together at the end of the function — follow the same pattern used for existing checks.
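
The collect-then-raise pattern described above might look like the sketch below. DemoConfig, collect_errors() and check_demo_config() are hypothetical stand-ins, and ValueError is an assumed exception type; the real check_config_file() may be structured differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class DemoConfig:
    """Stand-in for PipelineConfig with only the fields these checks read."""
    extract_variants: bool = True
    my_new_option: Optional[str] = None   # treated here as a file path
    min_ac: int = 1


def collect_errors(config: DemoConfig) -> List[str]:
    """Run every check and return all messages instead of stopping at the first."""
    errors: List[str] = []
    if config.extract_variants and not config.my_new_option:
        errors.append("[Options] 'my_new_option' must be set when extract_variants=True.")
    if config.my_new_option and not Path(config.my_new_option).exists():
        errors.append(f"[Options] 'my_new_option' path does not exist: {config.my_new_option}")
    if config.min_ac < 1:
        errors.append(f"[Options] 'min_ac' must be >= 1, got {config.min_ac}.")
    return errors


def check_demo_config(config: DemoConfig) -> None:
    """Raise once, listing every problem found."""
    errors = collect_errors(config)
    if errors:
        # ValueError is an assumption; match whatever check_config_file() raises.
        raise ValueError("Invalid config:\n" + "\n".join(f"  - {e}" for e in errors))


try:
    check_demo_config(DemoConfig(my_new_option=None, min_ac=0))
except ValueError as exc:
    print(exc)
```

Reporting all problems in a single exception lets a user fix the whole config file in one pass rather than rerunning after each error.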

Step 4 — Use the option in the pipeline

Open marvel/pipeline.py. The two pipeline steps are:

  • VariantExtractionStep — controls extraction from genetic files

  • AssociationTestingStep — controls statistical testing

Both receive the full PipelineConfig object (self._config).

Extraction step example:

class VariantExtractionStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # do something different during extraction
            ...

Association step example:

class AssociationTestingStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # alter testing behaviour
            ...

If the option needs to be passed down to the extraction utilities (for example through the col_config dictionary), add it to the dictionary that is already being built there:

col_config = {
    # ... existing keys ...
    'my_new_option': self._config.my_new_option,
}
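
As a rough, self-contained illustration of how a step might consume the option, consider the sketch below. The constructor signature, the build_col_config() helper and the string return value are assumptions made so that the example can run on its own; the real execute() returns None and does actual extraction work.

```python
from dataclasses import dataclass


@dataclass
class DemoConfig:
    """Stand-in for PipelineConfig."""
    my_new_option: bool = False


class VariantExtractionStep:
    def __init__(self, config: DemoConfig) -> None:
        # Assumed constructor: the step simply stores the config object.
        self._config = config

    def build_col_config(self) -> dict:
        """Hypothetical helper: gather the values the extraction utilities need."""
        return {
            'my_new_option': self._config.my_new_option,
        }

    def execute(self) -> str:
        col_config = self.build_col_config()
        if col_config['my_new_option']:
            return 'extraction with my_new_option'
        return 'default extraction'


print(VariantExtractionStep(DemoConfig()).execute())
print(VariantExtractionStep(DemoConfig(my_new_option=True)).execute())
```

Keeping the dictionary construction in one place means a new option is threaded through a single spot rather than at every call site.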

Step 5 — Update the example config file

Open resources/examples/example.cnf. Add a commented-out example of the new option in the appropriate section so users can discover it:

[Options]
extract_variants    True
# my_new_option     False    # Description of what this does

Step 6 — Update the constants docstring / reference table

If constants.py has a table or docstring listing available options, add a row for your new option. This keeps the reference material consistent with the code.

Step 7 — Write tests

All new options require tests. Create or extend test files under tests/ mirroring the package structure:

  • Unit tests: cover PipelineConfig.from_config_dict() with and without the option present.

  • Validation tests: cover check_config_file() raising the expected error when required conditions are not met.

  • Integration test: cover the pipeline step using the option correctly.

Minimal unit test template:

from marvel.constants import CONFIG_HEADERS, CONFIG_LHS, OPTIONS
from marvel.utils.config_tools import PipelineConfig


def test_my_new_option_default():
    """PipelineConfig uses the correct default when option is absent."""
    config_dict = {CONFIG_HEADERS.Options: {}}
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option == OPTIONS.my_new_option


def test_my_new_option_set():
    """PipelineConfig reads my_new_option from config dict."""
    config_dict = {
        CONFIG_HEADERS.Options: {CONFIG_LHS.MyNewOption: True}
    }
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option is True
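
A validation test can follow the same shape. The sketch below is an assumption-laden template: check_paths() is a hypothetical stand-in validator that returns its error list, so that the example runs without the marvel package. Real tests would call check_config_file() and assert on the error it raises. The tmp_path argument is pytest's built-in temporary-directory fixture.

```python
import tempfile
from pathlib import Path


def check_paths(options: dict) -> list:
    """Stand-in validator (hypothetical): returns error messages for one path option."""
    errors = []
    value = options.get('my_new_option')
    if value is None:
        errors.append("[Options] 'my_new_option' is required.")
    elif not Path(value).exists():
        errors.append(f"[Options] 'my_new_option' path does not exist: {value}")
    return errors


def test_missing_option_is_reported():
    assert check_paths({}) == ["[Options] 'my_new_option' is required."]


def test_nonexistent_path_is_reported(tmp_path):
    missing = tmp_path / 'no_such_file.txt'
    errors = check_paths({'my_new_option': str(missing)})
    assert len(errors) == 1 and 'does not exist' in errors[0]


def test_existing_path_passes(tmp_path):
    present = tmp_path / 'input.txt'
    present.write_text('placeholder')
    assert check_paths({'my_new_option': str(present)}) == []


if __name__ == '__main__':
    # Without pytest, a temporary directory stands in for the tmp_path fixture.
    with tempfile.TemporaryDirectory() as tmp:
        test_missing_option_is_reported()
        test_nonexistent_path_is_reported(Path(tmp))
        test_existing_path_passes(Path(tmp))
```

Each test targets exactly one validation branch, so a failure points directly at the check that regressed.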

Quick-reference checklist

Use this checklist when adding any new option:

#    Task                                     File
1a   Add key string to CONFIG_LHS             marvel/constants.py
1b   Add default value to OPTIONS             marvel/constants.py
2a   Declare typed field on PipelineConfig    marvel/utils/config_tools.py
2b   Read value in from_config_dict()         marvel/utils/config_tools.py
3    Add validation in check_config_file()    marvel/utils/config_tools.py
4    Use the value in the pipeline step(s)    marvel/pipeline.py
5    Add commented example to .cnf file       resources/examples/example.cnf
6    Update reference table / docstrings      marvel/constants.py
7    Write unit + integration tests           tests/

Config file format reference

[SectionName]
key_name    value

  • Delimiter between key and value: tab (\t)

  • Comments: lines starting with #

  • Boolean values: True / False (case-insensitive)

  • Null values: None / null (case-insensitive)

  • Multiple values on one line: separated by ;
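
To make these rules concrete, here is a minimal parser sketch that applies them. It is not MARVEL's actual parser: parse_cnf() and convert() are hypothetical names, the handling of malformed lines is an assumption, and the phenotypes and covariates keys in the sample are invented for illustration.

```python
from typing import Dict, List, Union

Value = Union[bool, None, str, List[str]]


def convert(raw: str) -> Value:
    """Apply the boolean, null and multi-value rules to one raw value."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    if lowered in ('none', 'null'):
        return None
    if ';' in text:
        return [part.strip() for part in text.split(';') if part.strip()]
    return text   # numbers stay as strings; callers convert them explicitly


def parse_cnf(text: str) -> Dict[str, Dict[str, Value]]:
    """Parse tab-delimited, sectioned config text into a nested dict."""
    config: Dict[str, Dict[str, Value]] = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue                      # blank line or comment
        if stripped.startswith('[') and stripped.endswith(']'):
            section = stripped[1:-1]
            config[section] = {}
            continue
        if section is None:
            raise ValueError(f"option outside any section: {line!r}")
        key, sep, value = stripped.partition('\t')
        if not sep:
            raise ValueError(f"missing tab delimiter: {line!r}")
        config[section][key.strip()] = convert(value)
    return config


sample = (
    "[Options]\n"
    "extract_variants\tTRUE\n"
    "# my_new_option\tFalse\n"
    "phenotypes\tbmi;height\n"
    "covariates\tnull\n"
)
print(parse_cnf(sample))
```

In this sketch numeric values are left as strings, which is consistent with a from_config_dict() that converts explicitly, for example with int(...).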

Worked example — adding a min_ac allele-count filter

This example adds an integer option min_ac (minimum allele count) that filters variants during extraction.

constants.py:

class CONFIG_LHS:
    MinAC = 'min_ac'            # <-- new

class OPTIONS:
    min_ac: int = 1             # <-- new (default: keep variants with allele count >= 1)

config_tools.py — PipelineConfig:

@dataclass
class PipelineConfig:
    min_ac: int = field(default=OPTIONS.min_ac)   # <-- new

    @classmethod
    def from_config_dict(cls, config):
        options = config.get(CONFIG_HEADERS.Options, {})
        min_ac = int(options.get(CONFIG_LHS.MinAC, OPTIONS.min_ac))  # <-- new
        return cls(
            # ... existing kwargs ...
            min_ac=min_ac,
        )

config_tools.py — check_config_file():

if config.min_ac < 1:
    errors.append(
        f"[Options] '{CONFIG_LHS.MinAC}' must be >= 1, "
        f"got {config.min_ac}."
    )

pipeline.py — VariantExtractionStep:

col_config = {
    ...
    'min_ac': self._config.min_ac,   # <-- new
}

example.cnf:

[Options]
extract_variants    True
# min_ac            1    # Minimum allele count — variants below this are dropped
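
On the consuming side, an extraction utility could apply the threshold roughly as sketched below. The variant record layout (dictionaries with 'id' and 'ac' keys) and the filter_by_min_ac() function are hypothetical; MARVEL's real extraction code may represent variants quite differently.

```python
from typing import Dict, List


def filter_by_min_ac(variants: List[Dict], col_config: Dict) -> List[Dict]:
    """Keep variants whose allele count is at least col_config['min_ac']."""
    min_ac = col_config.get('min_ac', 1)   # fall back to the documented default of 1
    return [variant for variant in variants if variant['ac'] >= min_ac]


# Hypothetical variant records: an identifier plus an allele count.
variants = [
    {'id': 'var_a', 'ac': 0},
    {'id': 'var_b', 'ac': 1},
    {'id': 'var_c', 'ac': 7},
]

kept_default = filter_by_min_ac(variants, {'min_ac': 1})
kept_strict = filter_by_min_ac(variants, {'min_ac': 5})
print([variant['id'] for variant in kept_default])
print([variant['id'] for variant in kept_strict])
```

With the default of 1 only variants that are never observed (allele count 0) are dropped, while a higher threshold removes rare variants as well.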