.. _dev_guide_adding_config_options:

Adding New Config Options to MARVEL
=====================================

This guide explains how to add a new option to the MARVEL config file and wire
it through the pipeline. Follow every step in order; skipping a step typically
results in a silent failure or a validation error.

.. contents:: Steps at a glance
   :local:
   :depth: 1

---

Overview
--------

A MARVEL config file is an INI-style ``.cnf`` file. Each option travels through
four layers, the last of which is the pipeline code itself:

.. code-block:: text

   .cnf file
       → ConfigParser()          # raw text → dict
       → PipelineConfig          # dict → typed dataclass
       → check_config_file()     # validates inputs
       → Pipeline step           # uses the value

All four layers live in ``marvel/``:

- ``constants.py`` — section names, option keys, default values
- ``utils/config_tools.py`` — parser, dataclass, validator
- ``pipeline.py`` — extraction and association steps

---

Step 1 — Add the option key to ``constants.py``
------------------------------------------------

Open ``marvel/constants.py``. There are two classes to update.

**1a. Add the key name to** ``CONFIG_LHS``

``CONFIG_LHS`` holds the left-hand side strings that appear in the config file.
Add your key as a class attribute:

.. code-block:: python

   class CONFIG_LHS:
       # ... existing keys ...
       MyNewOption = 'my_new_option'   # <-- add this

The string value is exactly what the user writes in the ``.cnf`` file.

**1b. Add a default value to** ``OPTIONS``

``OPTIONS`` provides the value used when the user omits the option:

.. code-block:: python

   class OPTIONS:
       # ... existing defaults ...
       my_new_option: bool = False     # <-- add this

Use the same name as the ``CONFIG_LHS`` attribute value (snake_case). The type
annotation is documentation only; Python does not enforce it at runtime.

**When to skip 1b:** if the option is *required* (i.e. the pipeline must error
when it is absent), do not add a default. Instead, add validation in Step 3.

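
**Optional consistency check (illustrative):**

Because every ``OPTIONS`` default is meant to share its name with a
``CONFIG_LHS`` string, a small helper can flag a default that was added without
its key. The sketch below is a hypothetical aid, not part of MARVEL: the two
classes are trimmed stand-ins for the ones in ``marvel/constants.py``, and
``public_attrs`` and ``defaults_without_key`` are invented names.

.. code-block:: python

   class CONFIG_LHS:
       """Trimmed stand-in for the real class in marvel/constants.py."""
       MyNewOption = 'my_new_option'


   class OPTIONS:
       """Trimmed stand-in for the real class in marvel/constants.py."""
       my_new_option: bool = False


   def public_attrs(cls) -> dict:
       """Return the class attributes whose names do not start with an underscore."""
       return {name: value for name, value in vars(cls).items()
               if not name.startswith('_')}


   def defaults_without_key(lhs_cls, options_cls) -> set:
       """Hypothetical helper: names in OPTIONS that no CONFIG_LHS string matches."""
       key_strings = set(public_attrs(lhs_cls).values())
       return set(public_attrs(options_cls)) - key_strings


   # An empty set means every default has a matching config key.
   missing = defaults_without_key(CONFIG_LHS, OPTIONS)

In a real test you would import the two classes from ``marvel.constants``
rather than redefining them. Required options that deliberately have no default
(see "When to skip 1b") are unaffected, since the check only runs from
``OPTIONS`` towards ``CONFIG_LHS``.
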

---

Step 2 — Add the field to ``PipelineConfig``
--------------------------------------------

Open ``marvel/utils/config_tools.py`` and find the ``PipelineConfig`` dataclass.

**2a. Declare the field**

Add a typed field with a default drawn from ``OPTIONS``:

.. code-block:: python

   @dataclass
   class PipelineConfig:
       # ... existing fields ...
       my_new_option: bool = field(default=OPTIONS.my_new_option)

For a required field, there is no meaningful default. Use ``None`` as a
placeholder and reject it in Step 3:

.. code-block:: python

   # Requires: from typing import Optional
   my_new_option: Optional[str] = field(default=None)  # placeholder; validated in Step 3

**2b. Populate the field in** ``from_config_dict()``

Inside the ``from_config_dict`` class method, find the block where ``Options``
section values are read. Follow the existing pattern:

.. code-block:: python

   @classmethod
   def from_config_dict(cls, config: dict) -> "PipelineConfig":
       options = config.get(CONFIG_HEADERS.Options, {})
       # ... existing reads ...
       my_new_option = options.get(
           CONFIG_LHS.MyNewOption, OPTIONS.my_new_option
       )
       return cls(
           # ... existing kwargs ...
           my_new_option=my_new_option,
       )

.. note::

   If your option lives in a section *other than* ``[Options]`` (e.g.
   ``[GenoInput]``), read it from ``config.get(CONFIG_HEADERS.GenoInput, {})``
   instead.

---

Step 3 — Add validation to ``check_config_file()``
----------------------------------------------------

Open ``marvel/utils/config_tools.py`` and find ``check_config_file()``. Add a
check in the appropriate block. There are three common patterns. The snippets
use ``options`` for the raw ``[Options]`` dict, ``config`` for the
``PipelineConfig`` instance, and ``errors`` for the list of messages; match
them to the names actually in scope inside the function.

**Required field (must be present):**

.. code-block:: python

   if CONFIG_LHS.MyNewOption not in options:
       errors.append(
           f"[Options] '{CONFIG_LHS.MyNewOption}' is required."
       )

**Conditional requirement (required only when another option is set):**

.. code-block:: python

   if config.extract_variants and not config.my_new_option:
       errors.append(
           f"[Options] '{CONFIG_LHS.MyNewOption}' must be set "
           "when extract_variants=True."
       )

**File existence check:**

.. code-block:: python

   # Requires: from pathlib import Path
   if config.my_new_option and not Path(config.my_new_option).exists():
       errors.append(
           f"[Options] '{CONFIG_LHS.MyNewOption}' path does not "
           f"exist: {config.my_new_option}"
       )

Errors are collected into a list and raised together at the end of the
function; follow the same pattern used for existing checks.

---

Step 4 — Use the option in the pipeline
----------------------------------------

Open ``marvel/pipeline.py``. The two pipeline steps are:

- ``VariantExtractionStep`` — controls extraction from genetic files
- ``AssociationTestingStep`` — controls statistical testing

Both receive the full ``PipelineConfig`` object (``self._config``).

**Extraction step example:**

.. code-block:: python

   class VariantExtractionStep:
       def execute(self) -> None:
           if self._config.my_new_option:
               # do something different during extraction
               ...

**Association step example:**

.. code-block:: python

   class AssociationTestingStep:
       def execute(self) -> None:
           if self._config.my_new_option:
               # alter testing behaviour
               ...

If the option needs to be passed down to extraction utilities (e.g.
``col_config``), add it to the dictionary that is already being built there:

.. code-block:: python

   col_config = {
       # ... existing keys ...
       'my_new_option': self._config.my_new_option,
   }

---

Step 5 — Update the example config file
-----------------------------------------

Open ``resources/examples/example.cnf``. Add a commented-out example of the new
option in the appropriate section so users can discover it:

.. code-block:: ini

   [Options]
   extract_variants    True
   # my_new_option    False    # Description of what this does

---

Step 6 — Update the constants docstring / reference table
----------------------------------------------------------

If ``constants.py`` has a table or docstring listing available options, add a
row for your new option. This keeps the reference material consistent with the
code.

---

Step 7 — Write tests
---------------------

All new options require tests.
Create or extend test files under ``tests/`` mirroring the package structure:

- **Unit tests** — cover ``PipelineConfig.from_config_dict()`` with and without
  the option present.
- **Validation tests** — cover ``check_config_file()`` raising the expected
  error when required conditions are not met.
- **Integration test** — cover the pipeline step using the option correctly.

Minimal unit test template:

.. code-block:: python

   from marvel.constants import CONFIG_HEADERS, CONFIG_LHS, OPTIONS
   from marvel.utils.config_tools import PipelineConfig


   def test_my_new_option_default():
       """PipelineConfig uses the correct default when option is absent."""
       config_dict = {CONFIG_HEADERS.Options: {}}
       pc = PipelineConfig.from_config_dict(config_dict)
       assert pc.my_new_option == OPTIONS.my_new_option


   def test_my_new_option_set():
       """PipelineConfig reads my_new_option from config dict."""
       config_dict = {
           CONFIG_HEADERS.Options: {CONFIG_LHS.MyNewOption: True}
       }
       pc = PipelineConfig.from_config_dict(config_dict)
       assert pc.my_new_option is True

---

Quick-reference checklist
--------------------------

Use this checklist when adding any new option:

.. list-table::
   :header-rows: 1
   :widths: 5 50 30

   * - #
     - Task
     - File
   * - 1a
     - Add key string to ``CONFIG_LHS``
     - ``marvel/constants.py``
   * - 1b
     - Add default value to ``OPTIONS``
     - ``marvel/constants.py``
   * - 2a
     - Declare typed field on ``PipelineConfig``
     - ``marvel/utils/config_tools.py``
   * - 2b
     - Read value in ``from_config_dict()``
     - ``marvel/utils/config_tools.py``
   * - 3
     - Add validation in ``check_config_file()``
     - ``marvel/utils/config_tools.py``
   * - 4
     - Use the value in the pipeline step(s)
     - ``marvel/pipeline.py``
   * - 5
     - Add commented example to ``.cnf`` file
     - ``resources/examples/example.cnf``
   * - 6
     - Update reference table / docstrings
     - ``marvel/constants.py``
   * - 7
     - Write unit, validation, and integration tests
     - ``tests/``

---

Config file format reference
------------------------------

.. code-block:: ini

   [SectionName]
   key_name    value

- Delimiter between key and value: **tab** (``\t``)
- Comments: lines starting with ``#``
- Boolean values: ``True`` / ``False`` (case-insensitive)
- Null values: ``None`` / ``null`` (case-insensitive)
- Multiple values on one line: separated by ``;``

---

Worked example — adding a ``min_ac`` allele-count filter
----------------------------------------------------------

This example adds an integer option ``min_ac`` (minimum allele count) that
filters variants during extraction.

**constants.py:**

.. code-block:: python

   class CONFIG_LHS:
       # ... existing keys ...
       MinAC = 'min_ac'    # <-- new

   class OPTIONS:
       # ... existing defaults ...
       min_ac: int = 1     # <-- new (default 1: only variants with an allele count of 0 are dropped)

**config_tools.py — PipelineConfig:**

.. code-block:: python

   @dataclass
   class PipelineConfig:
       # ... existing fields ...
       min_ac: int = field(default=OPTIONS.min_ac)                       # <-- new

       @classmethod
       def from_config_dict(cls, config):
           options = config.get(CONFIG_HEADERS.Options, {})
           # int() guards against the value arriving as text from the parser
           min_ac = int(options.get(CONFIG_LHS.MinAC, OPTIONS.min_ac))   # <-- new
           return cls(
               # ... existing kwargs ...
               min_ac=min_ac,
           )

**config_tools.py — check_config_file():**

.. code-block:: python

   if config.min_ac < 1:
       errors.append(
           f"[Options] '{CONFIG_LHS.MinAC}' must be >= 1, "
           f"got {config.min_ac}."
       )

**pipeline.py — VariantExtractionStep:**

.. code-block:: python

   col_config = {
       # ... existing keys ...
       'min_ac': self._config.min_ac,   # <-- new
   }

**example.cnf:**

.. code-block:: ini

   [Options]
   extract_variants    True
   # min_ac    1    # Minimum allele count; variants below this are dropped
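
**End-to-end sketch (illustrative):**

To see the ``min_ac`` example flow through all the layers at once, the
following self-contained sketch may help. It is written under stated
assumptions and does not reproduce MARVEL's code: ``parse_cnf`` is a
hypothetical parser that follows only the rules in the format reference above
(it is not the real ``ConfigParser()``), ``MiniConfig`` is a stand-in for
``PipelineConfig`` carrying a single field, and ``check`` mimics the
error-collecting style of ``check_config_file()``.

.. code-block:: python

   from dataclasses import dataclass, field


   def _convert(value: str):
       """Apply the boolean, null and multi-value rules; leave other text as-is."""
       lowered = value.lower()
       if lowered in ('true', 'false'):
           return lowered == 'true'
       if lowered in ('none', 'null'):
           return None
       if ';' in value:
           return [part.strip() for part in value.split(';')]
       return value


   def parse_cnf(text: str) -> dict:
       """Hypothetical parser: tab-delimited keys grouped under [Section] headers."""
       config: dict = {}
       section = None
       for raw in text.splitlines():
           line = raw.strip()
           if not line or line.startswith('#'):
               continue  # skip blank lines and comments
           if line.startswith('[') and line.endswith(']'):
               section = line[1:-1]
               config[section] = {}
               continue
           if section is None:
               raise ValueError(f"option outside of a section: {line!r}")
           key, _, value = line.partition('\t')
           config[section][key.strip()] = _convert(value.strip())
       return config


   @dataclass
   class MiniConfig:
       """Stand-in for PipelineConfig that carries only the new field."""
       min_ac: int = field(default=1)

       @classmethod
       def from_config_dict(cls, config: dict) -> "MiniConfig":
           options = config.get('Options', {})
           return cls(min_ac=int(options.get('min_ac', 1)))


   def check(config: MiniConfig) -> list:
       """Collect validation errors instead of raising on the first one."""
       errors = []
       if config.min_ac < 1:
           errors.append(f"[Options] 'min_ac' must be >= 1, got {config.min_ac}.")
       return errors


   EXAMPLE = "[Options]\nextract_variants\tTrue\n# min_ac\t1\nmin_ac\t5\n"
   parsed = parse_cnf(EXAMPLE)
   cfg = MiniConfig.from_config_dict(parsed)
   problems = check(cfg)

The commented-out ``# min_ac`` line is ignored, so only the active
``min_ac`` entry reaches the dataclass. The sketch leaves numeric values as
text in the parsed dict and casts them in ``from_config_dict``, mirroring the
``int(...)`` call in the worked example; whether the real parser already
converts numbers is not something this guide states.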