Adding New Config Options to MARVEL

This guide explains how to add a new option to the MARVEL config file and wire it through the pipeline. Follow every step in order — skipping a step will typically result in a silent failure or a validation error.

—

Overview 

A MARVEL config file is an INI-style .cnf file. Each option travels through four layers before it reaches the pipeline code:

.cnf file
  → ConfigParser()          # raw text → dict
  → PipelineConfig          # dict → typed dataclass
  → check_config_file()     # validates inputs
  → Pipeline step           # uses the value

The code for all four layers lives in three files under marvel/:

constants.py — section names, option keys, default values
utils/config_tools.py — parser, dataclass, validator
pipeline.py — extraction and association steps

—

Step 1 — Add the option key to `constants.py`

Open marvel/constants.py. There are two classes to update, CONFIG_LHS (1a) and OPTIONS (1b).

1a. Add the key name to CONFIG_LHS

CONFIG_LHS holds the left-hand side strings that appear in the config file. Add your key as a class attribute:

class CONFIG_LHS:
    # ... existing keys ...
    MyNewOption = 'my_new_option'   # <-- add this

The string value is exactly what the user writes in the .cnf file.

1b. Add a default value to OPTIONS

OPTIONS provides the value used when the user omits the option:

class OPTIONS:
    # ... existing defaults ...
    my_new_option: bool = False   # <-- add this

Name the OPTIONS attribute after the CONFIG_LHS string value (snake_case, here my_new_option), not after the CONFIG_LHS attribute name (MyNewOption). The type annotation is documentation only; Python does not enforce it here.

When to skip 1b: if the option is required (i.e. the pipeline must error when it is absent), do not add a default. Instead, add validation in Step 3.
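Because the link between an OPTIONS attribute and its CONFIG_LHS string is a naming convention only, a typo in either place goes unnoticed until the default is silently ignored. The sketch below shows one way to catch that early. It uses small stand-in versions of the two classes (not the real contents of marvel/constants.py), and defaults_without_key() is a hypothetical helper, not an existing MARVEL function.

```python
class CONFIG_LHS:
    """Stand-in: strings the user writes in the .cnf file."""
    ExtractVariants = 'extract_variants'
    MyNewOption = 'my_new_option'


class OPTIONS:
    """Stand-in: defaults used when an option is omitted."""
    extract_variants: bool = True
    my_new_option: bool = False


def public_attrs(cls) -> dict:
    """Return the non-underscore class attributes as a name -> value dict."""
    return {k: v for k, v in vars(cls).items() if not k.startswith('_')}


def defaults_without_key(lhs=CONFIG_LHS, options=OPTIONS) -> list:
    """Hypothetical helper: OPTIONS names that match no CONFIG_LHS string."""
    known_keys = set(public_attrs(lhs).values())
    return sorted(
        name for name in public_attrs(options) if name not in known_keys
    )
```

A check like this could run as a unit test; an empty list means every default has a matching key.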

—

Step 2 — Add the field to `PipelineConfig`

Open marvel/utils/config_tools.py and find the PipelineConfig dataclass.

2a. Declare the field

Add a typed field with a default drawn from OPTIONS:

@dataclass
class PipelineConfig:
    # ... existing fields ...
    my_new_option: bool = field(default=OPTIONS.my_new_option)

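For a required field there is no meaningful default. Use None as a placeholder so that the validation in Step 3 can report a missing value:

my_new_option: Optional[str] = field(default=None)  # validated later; requires `from typing import Optional`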

2b. Populate the field in from_config_dict()

Inside the from_config_dict class method, find the block where Options section values are read. Follow the existing pattern:

@classmethod
def from_config_dict(cls, config: dict) -> "PipelineConfig":
    options = config.get(CONFIG_HEADERS.Options, {})

    # ... existing reads ...
    my_new_option = options.get(
        CONFIG_LHS.MyNewOption, OPTIONS.my_new_option
    )

    return cls(
        # ... existing kwargs ...
        my_new_option=my_new_option,
    )

Note

If your option lives in a section other than [Options] (e.g. [GenoInput]), read it from config.get(CONFIG_HEADERS.GenoInput, {}) instead.
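To see 2a and 2b working together, here is a minimal, self-contained sketch. The three constants classes are stand-ins, the section strings 'Options' and 'GenoInput' are assumed to match the .cnf headers, and geno_option is a hypothetical key used only to show a read from a second section.

```python
from dataclasses import dataclass, field


class CONFIG_HEADERS:          # stand-in section names
    Options = 'Options'
    GenoInput = 'GenoInput'


class CONFIG_LHS:              # stand-in option keys
    MyNewOption = 'my_new_option'
    GenoOption = 'geno_option'     # hypothetical key in [GenoInput]


class OPTIONS:                 # stand-in defaults
    my_new_option: bool = False
    geno_option: str = 'auto'


@dataclass
class PipelineConfig:
    my_new_option: bool = field(default=OPTIONS.my_new_option)
    geno_option: str = field(default=OPTIONS.geno_option)

    @classmethod
    def from_config_dict(cls, config: dict) -> "PipelineConfig":
        # A missing section behaves like an empty one.
        options = config.get(CONFIG_HEADERS.Options, {})
        geno = config.get(CONFIG_HEADERS.GenoInput, {})
        return cls(
            my_new_option=options.get(
                CONFIG_LHS.MyNewOption, OPTIONS.my_new_option
            ),
            geno_option=geno.get(
                CONFIG_LHS.GenoOption, OPTIONS.geno_option
            ),
        )
```

Calling PipelineConfig.from_config_dict({}) falls back to the OPTIONS defaults for both fields, which is the behaviour the unit tests in Step 7 should pin down.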

—

Step 3 — Add validation to `check_config_file()`

Open marvel/utils/config_tools.py and find check_config_file().

Add a check in the appropriate block. There are three common patterns:

Required field (must be present):

if CONFIG_LHS.MyNewOption not in options:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' is required."
    )

Conditional requirement (required only when another option is set):

if config.extract_variants and not config.my_new_option:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' must be set "
        "when extract_variants=True."
    )

File existence check:

# Path is pathlib.Path; add `from pathlib import Path` if the module lacks it
if config.my_new_option and not Path(config.my_new_option).exists():
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' path does not "
        f"exist: {config.my_new_option}"
    )

Errors are collected into a list and raised together at the end of the function — follow the same pattern used for existing checks.
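The collect-then-raise pattern can be sketched in isolation as follows. This is an illustration under assumptions, not the real check_config_file(): the PipelineConfig here is a two-field stand-in, check_config() is a hypothetical name, and ValueError is a guess at the exception type.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PipelineConfig:                  # stand-in with two fields only
    extract_variants: bool = False
    my_new_option: Optional[str] = None


def check_config(config: PipelineConfig) -> None:
    """Collect every problem first, then raise one error listing them all."""
    errors = []

    # Conditional requirement.
    if config.extract_variants and not config.my_new_option:
        errors.append(
            "[Options] 'my_new_option' must be set "
            "when extract_variants=True."
        )

    # File existence check (skipped when the value is unset).
    if config.my_new_option and not Path(config.my_new_option).exists():
        errors.append(
            "[Options] 'my_new_option' path does not "
            f"exist: {config.my_new_option}"
        )

    if errors:
        # ValueError is an assumption; match what check_config_file() raises.
        raise ValueError("Invalid config:\n" + "\n".join(errors))
```

Reporting all problems at once saves the user from fixing the config file one error at a time.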

—

Step 4 — Use the option in the pipeline 

Open marvel/pipeline.py. The two pipeline steps are:

VariantExtractionStep — controls extraction from genetic files
AssociationTestingStep — controls statistical testing

Both receive the full PipelineConfig object (self._config).

Extraction step example:

class VariantExtractionStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # do something different during extraction
            ...

Association step example:

class AssociationTestingStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # alter testing behaviour
            ...

If the option needs to be passed down to extraction utilities, add it to the dictionary that is already being built for them (e.g. col_config):

col_config = {
    # ... existing keys ...
    'my_new_option': self._config.my_new_option,
}
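Put together, a step that branches on the option and forwards it might look like the sketch below. Everything here is a stand-in: the constructor signature, the build_col_config() method, and the string return value of execute() are assumptions made so the example runs on its own (the real execute() returns None).

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:              # stand-in
    my_new_option: bool = False


class VariantExtractionStep:       # stand-in; the real class does much more
    def __init__(self, config: PipelineConfig) -> None:
        self._config = config

    def build_col_config(self) -> dict:
        """Hypothetical method: values handed to the extraction utilities."""
        return {'my_new_option': self._config.my_new_option}

    def execute(self) -> str:
        col_config = self.build_col_config()
        # Branch on the option; the returned labels are placeholders.
        return 'alternate' if col_config['my_new_option'] else 'default'
```

Reading the value from self._config in one place and passing it down keeps the utilities free of any direct dependency on PipelineConfig.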

—

Step 5 — Update the example config file 

Open resources/examples/example.cnf. Add a commented-out example of the new option in the appropriate section so users can discover it:

[Options]
extract_variants    True
# Description of what this does
# my_new_option     False

—

Step 6 — Update the constants docstring / reference table 

If constants.py has a table or docstring listing available options, add a row for your new option. This keeps the reference material consistent with the code.

—

Step 7 — Write tests 

All new options require tests. Create or extend test files under tests/ mirroring the package structure:

Unit tests: cover PipelineConfig.from_config_dict() with and without the option present.
Validation tests: cover check_config_file() raising the expected error when required conditions are not met.
Integration test: cover the pipeline step using the option correctly.

Minimal unit test template:

# Import paths follow the file layout described in the overview.
from marvel.constants import CONFIG_HEADERS, CONFIG_LHS, OPTIONS
from marvel.utils.config_tools import PipelineConfig


def test_my_new_option_default():
    """PipelineConfig uses the correct default when option is absent."""
    config_dict = {CONFIG_HEADERS.Options: {}}
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option == OPTIONS.my_new_option


def test_my_new_option_set():
    """PipelineConfig reads my_new_option from config dict."""
    config_dict = {
        CONFIG_HEADERS.Options: {CONFIG_LHS.MyNewOption: True}
    }
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option is True
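A validation test follows the same shape. The sketch below is self-contained, so check_config_file() is a stand-in with an assumed signature (it takes the [Options] dict) and an assumed exception type (ValueError); adapt both to the real function. It uses plain try/except so that it runs without any test framework; under pytest, pytest.raises expresses the same check more compactly.

```python
def check_config_file(options: dict) -> None:
    """Stand-in validator: requires 'my_new_option' in the [Options] dict."""
    errors = []
    if 'my_new_option' not in options:
        errors.append("[Options] 'my_new_option' is required.")
    if errors:
        raise ValueError("\n".join(errors))


def test_missing_required_option_is_reported():
    """Validation fails with a message that names the missing option."""
    try:
        check_config_file({})
    except ValueError as exc:
        assert "'my_new_option' is required" in str(exc)
    else:
        raise AssertionError("expected a validation error")


def test_present_option_passes():
    """Validation succeeds (returns None) when the option is present."""
    assert check_config_file({'my_new_option': 'x'}) is None
```

Asserting on the message text, not only the exception type, confirms that the error points the user at the right option.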

—

Quick-reference checklist 

Use this checklist when adding any new option:

#	Task	File
1a	Add key string to `CONFIG_LHS`	`marvel/constants.py`
1b	Add default value to `OPTIONS`	`marvel/constants.py`
2a	Declare typed field on `PipelineConfig`	`marvel/utils/config_tools.py`
2b	Read value in `from_config_dict()`	`marvel/utils/config_tools.py`
3	Add validation in `check_config_file()`	`marvel/utils/config_tools.py`
4	Use the value in the pipeline step(s)	`marvel/pipeline.py`
5	Add commented example to `.cnf` file	`resources/examples/example.cnf`
6	Update reference table / docstrings	`marvel/constants.py`
7	Write unit, validation, and integration tests	`tests/`

—

Config file format reference 

[SectionName]
key_name    value

Delimiter between key and value: tab (\t)
Comments: lines starting with #
Boolean values: True / False (case-insensitive)
Null values: None / null (case-insensitive)
Multiple values on one line: separated by ;
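The rules above are enough to write a small parser. The following is a minimal sketch for experimenting with the format, not MARVEL's ConfigParser: parse_value() and parse_cnf_text() are hypothetical names, and details such as leaving numbers as strings are assumptions.

```python
def parse_value(raw: str):
    """Convert one raw value: booleans, nulls, and ';'-separated lists."""
    converted = []
    for part in (p.strip() for p in raw.split(';')):
        lowered = part.lower()
        if lowered in ('true', 'false'):
            converted.append(lowered == 'true')
        elif lowered in ('none', 'null'):
            converted.append(None)
        else:
            converted.append(part)      # numbers stay strings in this sketch
    return converted if len(converted) > 1 else converted[0]


def parse_cnf_text(text: str) -> dict:
    """Parse tab-delimited config text into {section: {key: value}}."""
    config = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue                    # blank line or comment
        if stripped.startswith('[') and stripped.endswith(']'):
            section = stripped[1:-1]
            config[section] = {}
            continue
        if section is None:
            raise ValueError(f"Option outside any section: {stripped!r}")
        key, _, raw = stripped.partition('\t')
        config[section][key.strip()] = parse_value(raw.strip())
    return config
```

Because this sketch returns numeric values as strings, a consumer has to convert them explicitly, as the worked example does with int().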

—

Worked example — adding a `min_ac` allele-count filter 

This example adds an integer option min_ac (minimum allele count) that filters variants during extraction.

constants.py:

class CONFIG_LHS:
    MinAC = 'min_ac'            # <-- new

class OPTIONS:
    min_ac: int = 1             # <-- new (default: keep variants with allele count >= 1)

config_tools.py — PipelineConfig:

@dataclass
class PipelineConfig:
    # ... existing fields ...
    min_ac: int = field(default=OPTIONS.min_ac)   # <-- new

    @classmethod
    def from_config_dict(cls, config):
        options = config.get(CONFIG_HEADERS.Options, {})
        # ... existing reads ...
        min_ac = int(options.get(CONFIG_LHS.MinAC, OPTIONS.min_ac))  # <-- new
        return cls(..., min_ac=min_ac)

config_tools.py — check_config_file():

if config.min_ac < 1:
    errors.append(
        f"[Options] '{CONFIG_LHS.MinAC}' must be >= 1, "
        f"got {config.min_ac}."
    )

pipeline.py — VariantExtractionStep:

col_config = {
    ...
    'min_ac': self._config.min_ac,   # <-- new
}

example.cnf:

[Options]
extract_variants    True
# Minimum allele count; variants below this are dropped
# min_ac            1
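The fragments above can be combined into one runnable sketch to check that the pieces fit. The classes are trimmed stand-ins for the real ones, check_config() and filter_by_min_ac() are hypothetical names, ValueError is an assumed exception type, and the variant IDs in the usage lines are made up.

```python
from dataclasses import dataclass, field


class CONFIG_HEADERS:
    Options = 'Options'


class CONFIG_LHS:
    MinAC = 'min_ac'


class OPTIONS:
    min_ac: int = 1


@dataclass
class PipelineConfig:
    min_ac: int = field(default=OPTIONS.min_ac)

    @classmethod
    def from_config_dict(cls, config: dict) -> "PipelineConfig":
        options = config.get(CONFIG_HEADERS.Options, {})
        # int() covers parsers that hand the value over as a string.
        return cls(min_ac=int(options.get(CONFIG_LHS.MinAC, OPTIONS.min_ac)))


def check_config(config: PipelineConfig) -> None:
    """Stand-in validator using the collect-then-raise pattern."""
    errors = []
    if config.min_ac < 1:
        errors.append(
            f"[Options] '{CONFIG_LHS.MinAC}' must be >= 1, "
            f"got {config.min_ac}."
        )
    if errors:
        raise ValueError("\n".join(errors))


def filter_by_min_ac(allele_counts: dict, min_ac: int) -> dict:
    """Hypothetical filter: keep variants whose allele count is >= min_ac."""
    return {vid: ac for vid, ac in allele_counts.items() if ac >= min_ac}


config = PipelineConfig.from_config_dict({'Options': {'min_ac': '3'}})
check_config(config)
kept = filter_by_min_ac({'rs1': 1, 'rs2': 3, 'rs3': 10}, config.min_ac)
```

With min_ac set to 3, only the variants with allele counts 3 and 10 remain in kept; a value of 0 would be rejected by check_config() before any filtering runs.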