Adding New Config Options to MARVEL
This guide explains how to add a new option to the MARVEL config file and wire it through the pipeline. Follow every step in order — skipping a step will typically result in a silent failure or a validation error.
—
Overview
A MARVEL config file is an INI-style .cnf file. Each option travels through
four layers before it reaches the pipeline code:
.cnf file
  → ConfigParser()        # raw text → dict
  → PipelineConfig        # dict → typed dataclass
  → check_config_file()   # validates inputs
  → Pipeline step         # uses the value
The code for these layers lives in three files under marvel/:
- constants.py: section names, option keys, default values
- utils/config_tools.py: parser, dataclass, validator
- pipeline.py: extraction and association steps
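The flow through the four layers can be sketched end to end with small stand-ins. Everything below is hypothetical and only mirrors the shape described above: parse_cnf, DemoConfig, check_config, and run_step are illustrative names, not MARVEL's actual classes, and the rule checked in check_config is invented for the demo.

```python
from dataclasses import dataclass


def parse_cnf(text: str) -> dict:
    """Layer 1 (stand-in parser): raw text -> {section: {key: raw string}}."""
    config: dict = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            config[section] = {}
        elif section is not None:
            key, _, value = line.partition("\t")  # tab-delimited key/value
            config[section][key.strip()] = value.strip()
    return config


@dataclass
class DemoConfig:
    """Layer 2 (stand-in for PipelineConfig): dict -> typed fields."""
    extract_variants: bool = True
    my_new_option: bool = False

    @classmethod
    def from_config_dict(cls, config: dict) -> "DemoConfig":
        options = config.get("Options", {})
        return cls(
            extract_variants=options.get("extract_variants", "True").lower() == "true",
            my_new_option=options.get("my_new_option", "False").lower() == "true",
        )


def check_config(config: DemoConfig) -> None:
    """Layer 3 (stand-in validator): collect errors, then raise once."""
    errors = []
    if config.my_new_option and not config.extract_variants:  # invented rule
        errors.append("'my_new_option' requires extract_variants=True.")
    if errors:
        raise ValueError("\n".join(errors))


def run_step(config: DemoConfig) -> str:
    """Layer 4 (stand-in pipeline step): branch on the typed value."""
    return "special extraction" if config.my_new_option else "default extraction"


cnf_text = "[Options]\nextract_variants\tTrue\nmy_new_option\tTrue\n"
demo = DemoConfig.from_config_dict(parse_cnf(cnf_text))
check_config(demo)
result = run_step(demo)
```

If any one layer is left out, the value never reaches run_step, which is why the steps that follow touch each layer in turn.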
—
Step 1 — Add the option key to constants.py
Open marvel/constants.py. There are two classes to update.
1a. Add the key name to CONFIG_LHS
CONFIG_LHS holds the left-hand side strings that appear in the config file.
Add your key as a class attribute:
class CONFIG_LHS:
    # ... existing keys ...
    MyNewOption = 'my_new_option'  # <-- add this
The string value is exactly what the user writes in the .cnf file.
1b. Add a default value to OPTIONS
OPTIONS provides the value used when the user omits the option:
class OPTIONS:
    # ... existing defaults ...
    my_new_option: bool = False  # <-- add this
Use the same name as the CONFIG_LHS attribute value (snake_case).
The type annotation is documentation only — Python does not enforce it here.
When to skip 1b: if the option is required (i.e. the pipeline must error when it is absent), do not add a default. Instead, add validation in Step 3.
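The naming rule between the two classes can be checked mechanically. The sketch below uses stand-in copies of CONFIG_LHS and OPTIONS (the real classes hold many more entries) plus a hypothetical required key, sample_file, to show how keys without a default stand out. The helper names public_values and public_names are illustrative.

```python
class CONFIG_LHS:
    """Stand-in: key strings exactly as they appear in the .cnf file."""
    ExtractVariants = "extract_variants"
    MyNewOption = "my_new_option"
    SampleFile = "sample_file"  # hypothetical required key (no default below)


class OPTIONS:
    """Stand-in: defaults, named after the CONFIG_LHS string values."""
    extract_variants: bool = True
    my_new_option: bool = False


def public_values(cls) -> set:
    """Values of all class attributes whose names do not start with '_'."""
    return {value for name, value in vars(cls).items() if not name.startswith("_")}


def public_names(cls) -> set:
    """Names of all class attributes that do not start with '_'."""
    return {name for name in vars(cls) if not name.startswith("_")}


# A default whose name matches no CONFIG_LHS string can never be set by a user.
orphan_defaults = public_names(OPTIONS) - public_values(CONFIG_LHS)

# Keys without a default are the required ones and need a Step 3 check.
required_keys = public_values(CONFIG_LHS) - public_names(OPTIONS)
```

A check like this could sit in a unit test so that a misspelled OPTIONS attribute is caught early rather than silently ignored.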
—
Step 2 — Add the field to PipelineConfig
Open marvel/utils/config_tools.py and find the PipelineConfig dataclass.
2a. Declare the field
Add a typed field with a default drawn from OPTIONS:
@dataclass
class PipelineConfig:
    # ... existing fields ...
    my_new_option: bool = field(default=OPTIONS.my_new_option)
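For a required field there is no entry in OPTIONS, so use None as a placeholder default and let the Step 3 check reject a missing value:
    my_new_option: Optional[str] = field(default=None)  # needs typing.Optional; validated later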
2b. Populate the field in from_config_dict()
Inside the from_config_dict class method, find the block where Options
section values are read. Follow the existing pattern:
@classmethod
def from_config_dict(cls, config: dict) -> "PipelineConfig":
    options = config.get(CONFIG_HEADERS.Options, {})
    # ... existing reads ...
    my_new_option = options.get(
        CONFIG_LHS.MyNewOption, OPTIONS.my_new_option
    )
    return cls(
        # ... existing kwargs ...
        my_new_option=my_new_option,
    )
Note: if your option lives in a section other than [Options]
(e.g. [GenoInput]), read it from
config.get(CONFIG_HEADERS.GenoInput, {}) instead.
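Reading from two sections can be sketched as follows. This is a self-contained illustration, not the real PipelineConfig: the section strings, the DemoPipelineConfig class, and the geno_prefix key are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


class CONFIG_HEADERS:
    Options = "Options"      # assumed string for the [Options] section
    GenoInput = "GenoInput"  # assumed string for the [GenoInput] section


class CONFIG_LHS:
    MyNewOption = "my_new_option"
    GenoPrefix = "geno_prefix"  # hypothetical key, for illustration only


class OPTIONS:
    my_new_option: bool = False


@dataclass
class DemoPipelineConfig:
    my_new_option: bool = field(default=OPTIONS.my_new_option)
    geno_prefix: Optional[str] = field(default=None)

    @classmethod
    def from_config_dict(cls, config: dict) -> "DemoPipelineConfig":
        # A missing section falls back to {} so the .get() calls still work.
        options = config.get(CONFIG_HEADERS.Options, {})
        geno_input = config.get(CONFIG_HEADERS.GenoInput, {})
        return cls(
            my_new_option=options.get(CONFIG_LHS.MyNewOption, OPTIONS.my_new_option),
            geno_prefix=geno_input.get(CONFIG_LHS.GenoPrefix),  # None when absent
        )


empty = DemoPipelineConfig.from_config_dict({})
filled = DemoPipelineConfig.from_config_dict(
    {"Options": {"my_new_option": True}, "GenoInput": {"geno_prefix": "cohort_a"}}
)
```

Reading each key from the section it belongs to matters because a key looked up in the wrong section simply falls back to its default, which is one of the silent failures mentioned at the top of this guide.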
—
Step 3 — Add validation to check_config_file()
Open marvel/utils/config_tools.py and find check_config_file().
Add a check in the appropriate block. There are three common patterns:
Required field (must be present):
if CONFIG_LHS.MyNewOption not in options:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' is required."
    )
Conditional requirement (required only when another option is set):
if config.extract_variants and not config.my_new_option:
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' must be set "
        "when extract_variants=True."
    )
File existence check:
# Path is pathlib.Path (from pathlib import Path)
if config.my_new_option and not Path(config.my_new_option).exists():
    errors.append(
        f"[Options] '{CONFIG_LHS.MyNewOption}' path does not "
        f"exist: {config.my_new_option}"
    )
Errors are collected into a list and raised together at the end of the function — follow the same pattern used for existing checks.
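The collect-then-raise pattern can be sketched in isolation. The function names, the use of SimpleNamespace as a stand-in config object, and the choice of ValueError are all assumptions; the real check_config_file() may differ in signature and exception type.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import List


def collect_errors(config, options: dict) -> List[str]:
    """Run every check and return all messages instead of stopping at the first."""
    errors: List[str] = []
    key = "my_new_option"  # stand-in for CONFIG_LHS.MyNewOption

    # Required field: the key must appear in the parsed section.
    if key not in options:
        errors.append(f"[Options] '{key}' is required.")
    # Conditional requirement: only needed when extraction is enabled.
    if config.extract_variants and not config.my_new_option:
        errors.append(f"[Options] '{key}' must be set when extract_variants=True.")
    # File existence: only checked when a value was supplied.
    if config.my_new_option and not Path(config.my_new_option).exists():
        errors.append(f"[Options] '{key}' path does not exist: {config.my_new_option}")
    return errors


def check_demo_config(config, options: dict) -> None:
    """Raise once, listing every problem found (ValueError is an assumption)."""
    errors = collect_errors(config, options)
    if errors:
        raise ValueError("Config validation failed:\n" + "\n".join(errors))


bad = SimpleNamespace(extract_variants=True, my_new_option=None)
problems = collect_errors(bad, options={})
```

Collecting first means a user with several mistakes sees them all in one run instead of fixing them one error at a time.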
—
Step 4 — Use the option in the pipeline
Open marvel/pipeline.py. The two pipeline steps are:
- VariantExtractionStep: controls extraction from genetic files
- AssociationTestingStep: controls statistical testing
Both receive the full PipelineConfig object (self._config).
Extraction step example:
class VariantExtractionStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # do something different during extraction
            ...
Association step example:
class AssociationTestingStep:
    def execute(self) -> None:
        if self._config.my_new_option:
            # alter testing behaviour
            ...
If the option needs to be passed down to the extraction utilities, add it
to the dictionary that is already being built for them (e.g. col_config):
col_config = {
    # ... existing keys ...
    'my_new_option': self._config.my_new_option,
}
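A step that both branches on the option and forwards it can be sketched with stand-ins. The constructor, the _build_col_config helper, and the dict returned by execute() are assumptions made so the example can run on its own; the real steps return None and may be constructed differently.

```python
from dataclasses import dataclass


@dataclass
class DemoConfig:
    """Stand-in for PipelineConfig with only the field under discussion."""
    my_new_option: bool = False


class DemoExtractionStep:
    """Stand-in for VariantExtractionStep; the constructor is an assumption."""

    def __init__(self, config: DemoConfig) -> None:
        self._config = config

    def _build_col_config(self) -> dict:
        # Forward the option to downstream utilities via the shared dict.
        return {"my_new_option": self._config.my_new_option}

    def execute(self) -> dict:
        col_config = self._build_col_config()
        mode = "alternate" if self._config.my_new_option else "default"
        # Returned only so the behaviour can be inspected in this sketch.
        return {"mode": mode, "col_config": col_config}


default_run = DemoExtractionStep(DemoConfig()).execute()
enabled_run = DemoExtractionStep(DemoConfig(my_new_option=True)).execute()
```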
—
Step 5 — Update the example config file
Open resources/examples/example.cnf. Add a commented-out example of the new
option in the appropriate section so users can discover it:
[Options]
extract_variants True
# my_new_option False # Description of what this does
—
Step 6 — Update the constants docstring / reference table
If constants.py has a table or docstring listing available options, add a
row for your new option. This keeps the reference material consistent with the
code.
—
Step 7 — Write tests
All new options require tests. Create or extend test files under tests/
mirroring the package structure:
- Unit tests: cover PipelineConfig.from_config_dict() with and without
the option present.
- Validation tests: cover check_config_file() raising the expected
error when required conditions are not met.
- Integration test: cover the pipeline step using the option correctly.
Minimal unit test template:
from marvel.constants import CONFIG_HEADERS, CONFIG_LHS, OPTIONS
from marvel.utils.config_tools import PipelineConfig


def test_my_new_option_default():
    """PipelineConfig uses the correct default when option is absent."""
    config_dict = {CONFIG_HEADERS.Options: {}}
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option == OPTIONS.my_new_option


def test_my_new_option_set():
    """PipelineConfig reads my_new_option from config dict."""
    config_dict = {
        CONFIG_HEADERS.Options: {CONFIG_LHS.MyNewOption: True}
    }
    pc = PipelineConfig.from_config_dict(config_dict)
    assert pc.my_new_option is True
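A validation test follows the same shape as the unit tests but asserts on the raised error. The sketch below is self-contained, so it uses a hypothetical check_demo_config and DemoConfig in place of check_config_file() and PipelineConfig, and it assumes the validator raises ValueError. The test functions are called directly at the bottom only so the sketch runs as a plain script.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoConfig:
    extract_variants: bool = True
    my_new_option: Optional[str] = None


def check_demo_config(config: DemoConfig) -> None:
    """Hypothetical stand-in for check_config_file()."""
    errors = []
    if config.extract_variants and not config.my_new_option:
        errors.append(
            "[Options] 'my_new_option' must be set when extract_variants=True."
        )
    if errors:
        raise ValueError("\n".join(errors))


def test_missing_option_is_rejected():
    """Validation fails, and the message names the offending key."""
    try:
        check_demo_config(DemoConfig(extract_variants=True, my_new_option=None))
    except ValueError as exc:
        assert "my_new_option" in str(exc)
    else:
        raise AssertionError("expected a ValueError")


def test_valid_option_passes():
    """Validation is silent when the condition is met."""
    check_demo_config(DemoConfig(extract_variants=True, my_new_option="value"))


test_missing_option_is_rejected()
test_valid_option_passes()
```

Under pytest, the try/except/else block is usually written as `with pytest.raises(ValueError, match="my_new_option"):` instead.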
—
Quick-reference checklist
Use this checklist when adding any new option:
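| # | Task | File |
|---|---|---|
| 1a | Add key string to CONFIG_LHS | marvel/constants.py |
| 1b | Add default value to OPTIONS | marvel/constants.py |
| 2a | Declare typed field on PipelineConfig | marvel/utils/config_tools.py |
| 2b | Read value in from_config_dict() | marvel/utils/config_tools.py |
| 3 | Add validation in check_config_file() | marvel/utils/config_tools.py |
| 4 | Use the value in the pipeline step(s) | marvel/pipeline.py |
| 5 | Add commented example to example.cnf | resources/examples/example.cnf |
| 6 | Update reference table / docstrings | marvel/constants.py |
| 7 | Write unit + integration tests | tests/ |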
—
Config file format reference
[SectionName]
key_name value
- Delimiter between key and value: tab (\t)
- Comments: lines starting with #
- Boolean values: True/False (case-insensitive)
- Null values: None/null (case-insensitive)
- Multiple values on one line: separated by ;
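The format rules above can be illustrated with a small line parser. This is a sketch of the rules as listed, not MARVEL's actual parser: coerce_scalar, coerce_value, and parse_line are hypothetical names, and leaving numbers as strings is an assumption (the worked example below converts with int() explicitly).

```python
from typing import List, Optional, Tuple, Union

Scalar = Union[bool, str, None]


def coerce_scalar(text: str) -> Scalar:
    """Map True/False and None/null (any case) to Python values."""
    token = text.strip()
    lowered = token.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("none", "null"):
        return None
    return token  # everything else stays a string


def coerce_value(text: str) -> Union[Scalar, List[Scalar]]:
    """Split on ';' when several values share one line."""
    if ";" in text:
        return [coerce_scalar(part) for part in text.split(";")]
    return coerce_scalar(text)


def parse_line(line: str) -> Optional[Tuple[str, Union[Scalar, List[Scalar]]]]:
    """Return (key, value) for an option line, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    key, sep, value = stripped.partition("\t")
    if not sep:
        raise ValueError(f"expected a tab between key and value: {line!r}")
    return key.strip(), coerce_value(value)


parsed_bool = parse_line("extract_variants\ttrue")
parsed_null = parse_line("my_new_option\tNULL")
parsed_list = parse_line("files\ta.vcf;b.vcf")
parsed_comment = parse_line("# my_new_option\tFalse")
```

Note that a line using spaces instead of a tab raises here, a common mistake when a config file is edited in an editor that expands tabs.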
—
Worked example — adding a min_ac allele-count filter
This example adds an integer option min_ac (minimum allele count)
that filters variants during extraction.
constants.py:
class CONFIG_LHS:
    MinAC = 'min_ac'  # <-- new

class OPTIONS:
    min_ac: int = 1  # <-- new (default 1: only variants with allele count 0 are dropped)
config_tools.py — PipelineConfig:
@dataclass
class PipelineConfig:
    min_ac: int = field(default=OPTIONS.min_ac)  # <-- new

    @classmethod
    def from_config_dict(cls, config):
        options = config.get(CONFIG_HEADERS.Options, {})
        min_ac = int(options.get(CONFIG_LHS.MinAC, OPTIONS.min_ac))  # <-- new
        return cls(
            # ... existing kwargs ...
            min_ac=min_ac,  # <-- new
        )
config_tools.py — check_config_file():
if config.min_ac < 1:
    errors.append(
        f"[Options] '{CONFIG_LHS.MinAC}' must be >= 1, "
        f"got {config.min_ac}."
    )
pipeline.py — VariantExtractionStep:
col_config = {
    # ... existing keys ...
    'min_ac': self._config.min_ac,  # <-- new
}
example.cnf:
[Options]
extract_variants True
# min_ac 1 # Minimum allele count — variants below this are dropped
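Putting the pieces of the worked example together, the behaviour of min_ac can be sketched in one runnable file. MinAcConfig, validate, filter_by_allele_count, and the variant IDs are hypothetical stand-ins; how the real extraction utilities apply the threshold is not shown in this guide.

```python
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_MIN_AC = 1  # stand-in for OPTIONS.min_ac


@dataclass
class MinAcConfig:
    """Stand-in for PipelineConfig, holding only min_ac."""
    min_ac: int = DEFAULT_MIN_AC

    @classmethod
    def from_config_dict(cls, config: dict) -> "MinAcConfig":
        options = config.get("Options", {})
        # The value may arrive as a string, so convert explicitly.
        return cls(min_ac=int(options.get("min_ac", DEFAULT_MIN_AC)))


def validate(config: MinAcConfig) -> List[str]:
    """Return validation messages; an empty list means the config is valid."""
    errors = []
    if config.min_ac < 1:
        errors.append(f"[Options] 'min_ac' must be >= 1, got {config.min_ac}.")
    return errors


def filter_by_allele_count(allele_counts: Dict[str, int], min_ac: int) -> Dict[str, int]:
    """Keep variants whose allele count is at least min_ac."""
    return {variant: ac for variant, ac in allele_counts.items() if ac >= min_ac}


config = MinAcConfig.from_config_dict({"Options": {"min_ac": "2"}})
errors = validate(config)
counts = {"var1": 0, "var2": 1, "var3": 5}  # made-up variant IDs and counts
kept = filter_by_allele_count(counts, config.min_ac)
```

With min_ac set to 2, only var3 survives; with the default of 1, var2 would be kept as well.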