Pydantic Configurations
Using pydantic, we define configurations for the package.
These configurations are part of the CLIs that the package provides,
but they also help with programmatically validating and constructing various objects.
Most importantly, the GraphConfig and ModelConfig may be
used to precisely and reproducibly define how the function construct_model()
should create lymphatic progression models.
- pydantic model lyscripts.configs.CrossValidationConfig[source]#
Bases: BaseModel
Configs for splitting a dataset into cross-validation folds.
Show JSON schema
{ "title": "CrossValidationConfig", "description": "Configs for splitting a dataset into cross-validation folds.", "type": "object", "properties": { "seed": { "default": 42, "description": "Seed for the random number generator.", "title": "Seed", "type": "integer" }, "folds": { "default": 5, "description": "Number of folds to split the dataset into.", "title": "Folds", "type": "integer" } } }
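To illustrate what the seed and folds settings control, here is a minimal sketch of a reproducible split into folds. It uses only the standard library; the helper names split_into_folds and train_test_pairs are hypothetical and not part of lyscripts, whose actual splitting logic may differ.

```python
import random


def split_into_folds(num_items: int, folds: int = 5, seed: int = 42) -> list[list[int]]:
    """Shuffle item indices reproducibly and deal them into `folds` groups."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    # Striding deals indices round-robin, so fold sizes differ by at most one.
    return [indices[i::folds] for i in range(folds)]


def train_test_pairs(fold_indices: list[list[int]]):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for k, test in enumerate(fold_indices):
        train = [i for j, fold in enumerate(fold_indices) if j != k for i in fold]
        yield train, test


fold_indices = split_into_folds(num_items=12)
for train, test in train_test_pairs(fold_indices):
    print(len(train), len(test))
```

Because the generator is seeded, calling split_into_folds twice with the same arguments yields the same folds, which is the point of exposing seed in the config.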
- pydantic model lyscripts.configs.DataConfig[source]#
Bases: BaseModel
Where to load lymphatic progression data from and how to feed it into a model.
Show JSON schema
{ "title": "DataConfig", "description": "Where to load lymphatic progression data from and how to feed it into a model.", "type": "object", "properties": { "source": { "anyOf": [ { "format": "file-path", "type": "string" }, { "$ref": "#/$defs/LyDataset" } ], "description": "Either a path to a CSV file or a config that specifies how and where to fetch the data from.", "title": "Source" }, "side": { "anyOf": [ { "enum": [ "ipsi", "contra" ], "type": "string" }, { "type": "null" } ], "default": null, "description": "Side of the neck to load data for. Only for Unilateral models.", "title": "Side" }, "mapping": { "additionalProperties": { "anyOf": [ { "type": "integer" }, { "type": "string" } ] }, "description": "Optional mapping of numeric T-stages to model T-stages.", "title": "Mapping", "type": "object" } }, "$defs": { "LyDataset": { "description": "Specification of a dataset.", "properties": { "year": { "description": "Release year of dataset.", "exclusiveMinimum": 0, "maximum": 2026, "title": "Year", "type": "integer" }, "institution": { "description": "Institution's short code. E.g., University Hospital Zurich: `usz`.", "minLength": 1, "title": "Institution", "type": "string" }, "subsite": { "description": "Tumor subsite(s) patients in this dataset were diagnosed with.", "minLength": 1, "title": "Subsite", "type": "string" }, "repo_name": { "anyOf": [ { "minLength": 1, "type": "string" }, { "type": "null" } ], "default": "lycosystem/lydata", "description": "GitHub `repository/owner`.", "title": "Repo Name" }, "ref": { "anyOf": [ { "minLength": 1, "type": "string" }, { "type": "null" } ], "default": "main", "description": "Branch/tag/commit of the repo.", "title": "Ref" }, "local_dataset_dir": { "anyOf": [ { "format": "directory-path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to directory containing all the dataset subdirectories. So, e.g. 
if `path_on_disk` is `~/datasets` and the dataset is `2023-clb-multisite`, then the CSV file is expected to be at `~/datasets/2023-clb-multisite/data.csv`.", "title": "Local Dataset Dir" } }, "required": [ "year", "institution", "subsite" ], "title": "LyDataset", "type": "object" } }, "required": [ "source" ] }
- field source: FilePath | LyDataset [Required]#
Either a path to a CSV file or a config that specifies how and where to fetch the data from.
- field side: Literal['ipsi', 'contra'] | None = None#
Side of the neck to load data for. Only for Unilateral models.
- lyscripts.configs.check_pattern(value: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]) -> Any[source]#
Check if the value can be converted to a boolean value.
- pydantic model lyscripts.configs.DiagnosisConfig[source]#
Bases: BaseModel
Defines an ipsi- and contralateral diagnosis pattern.
Show JSON schema
{ "title": "DiagnosisConfig", "description": "Defines an ipsi- and contralateral diagnosis pattern.", "type": "object", "properties": { "ipsi": { "additionalProperties": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "type": "object" }, "default": {}, "description": "Observed diagnoses by different modalities on the ipsi neck.", "examples": [ { "CT": { "II": true, "III": false } } ], "title": "Ipsi", "type": "object" }, "contra": { "additionalProperties": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "type": "object" }, "default": {}, "description": "Observed diagnoses by different modalities on the contra neck.", "title": "Contra", "type": "object" } } }
- field ipsi: dict[str, Annotated[PatternType, AfterValidator(check_pattern)]] = {}#
Observed diagnoses by different modalities on the ipsi neck.
- field contra: dict[str, Annotated[PatternType, AfterValidator(check_pattern)]] = {}#
Observed diagnoses by different modalities on the contra neck.
- to_involvement(modality: str) -> InvolvementConfig[source]#
Convert the diagnosis pattern to an involvement pattern for modality.
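The conversion can be pictured with plain dictionaries that mirror the ipsi and contra fields. This is only a sketch of one plausible reading, namely that the pattern of a single modality is selected per side; the function below is a stand-in, and its handling of a missing modality (an empty pattern) is an assumption rather than documented behavior.

```python
def to_involvement(diagnosis: dict, modality: str) -> dict:
    """Pick one modality's pattern per side; missing entries become empty patterns."""
    return {
        side: dict(diagnosis.get(side, {}).get(modality, {}))
        for side in ("ipsi", "contra")
    }


# Diagnosis pattern shaped like the `ipsi` example in the JSON schema.
diagnosis = {
    "ipsi": {"CT": {"II": True, "III": False}},
    "contra": {},
}

involvement = to_involvement(diagnosis, modality="CT")
print(involvement)
```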
- pydantic model lyscripts.configs.DistributionConfig[source]#
Bases: BaseModel
Configuration defining a distribution over diagnose times.
Show JSON schema
{ "title": "DistributionConfig", "description": "Configuration defining a distribution over diagnose times.", "type": "object", "properties": { "kind": { "default": "frozen", "description": "Parametric distributions may be updated.", "enum": [ "frozen", "parametric" ], "title": "Kind", "type": "string" }, "func": { "const": "binomial", "default": "binomial", "description": "Name of predefined function to use as distribution.", "title": "Func", "type": "string" }, "params": { "additionalProperties": { "anyOf": [ { "type": "integer" }, { "type": "number" } ] }, "default": {}, "description": "Parameters to pass to the predefined function.", "title": "Params", "type": "object" } } }
- field kind: Literal['frozen', 'parametric'] = 'frozen'#
Parametric distributions may be updated.
- field func: FuncNameType = 'binomial'#
Name of predefined function to use as distribution.
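As an illustration of what a binomial distribution over diagnose times looks like, the sketch below evaluates a binomial probability mass function over the time-steps 0 to max_time. The helper binomial_pmf and the parameter key "p" are assumptions made for this example; the source only states that params holds the parameters passed to the predefined function.

```python
from math import comb


def binomial_pmf(max_time: int, p: float) -> list[float]:
    """Probability of a diagnosis at each time-step 0..max_time."""
    return [
        comb(max_time, t) * p**t * (1 - p) ** (max_time - t)
        for t in range(max_time + 1)
    ]


# Plain-dict stand-in for a DistributionConfig; the "p" key is hypothetical.
config = {"kind": "frozen", "func": "binomial", "params": {"p": 0.3}}

# max_time=10 mirrors the default of ModelConfig.max_time.
time_dist = binomial_pmf(max_time=10, **config["params"])
print(len(time_dist), round(sum(time_dist), 6))
```

A frozen distribution would keep p fixed, whereas a parametric one would allow p to be updated, for example during sampling.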
- pydantic model lyscripts.configs.InvolvementConfig[source]#
Bases: BaseModel
Config that defines an ipsi- and contralateral involvement pattern.
Show JSON schema
{ "title": "InvolvementConfig", "description": "Config that defines an ipsi- and contralateral involvement pattern.", "type": "object", "properties": { "ipsi": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "default": {}, "description": "Involvement pattern for the ipsilateral side of the neck.", "examples": [ { "II": true, "III": false } ], "title": "Ipsi", "type": "object" }, "contra": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "default": {}, "description": "Involvement pattern for the contralateral side of the neck.", "title": "Contra", "type": "object" } } }
- field ipsi: Annotated[PatternType, AfterValidator(check_pattern)] = {}#
Involvement pattern for the ipsilateral side of the neck.
- field contra: Annotated[PatternType, AfterValidator(check_pattern)] = {}#
Involvement pattern for the contralateral side of the neck.
- lyscripts.configs.retrieve_graph_representation(model: Model) -> Representation[source]#
Retrieve the graph representation from a model.
- pydantic model lyscripts.configs.GraphConfig[source]#
Bases: BaseModel
Specifies how the tumor(s) and LNLs are connected in a DAG.
Show JSON schema
{ "title": "GraphConfig", "description": "Specifies how the tumor(s) and LNLs are connected in a DAG.", "type": "object", "properties": { "tumor": { "additionalProperties": { "items": { "type": "string" }, "type": "array" }, "description": "Define the name of the tumor(s) and which LNLs it/they drain to.", "title": "Tumor", "type": "object" }, "lnl": { "additionalProperties": { "items": { "type": "string" }, "type": "array" }, "description": "Define the name of the LNL(s) and which LNLs it/they drain to.", "title": "Lnl", "type": "object" } }, "required": [ "tumor", "lnl" ] }
- field tumor: dict[str, list[str]] [Required]#
Define the name of the tumor(s) and which LNLs it/they drain to.
- field lnl: dict[str, list[str]] [Required]#
Define the name of the LNL(s) and which LNLs it/they drain to.
- classmethod from_model(model: Model) -> GraphConfig[source]#
Create a GraphConfig from a Model.
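The tumor and lnl fields both map a node name to the list of LNLs it drains to. The following sketch shows such a graph as a plain dictionary and checks that it is a DAG using the standard library's graphlib. The node names T, II and III and the helper drainage_order are chosen for illustration; no claim is made that lyscripts validates acyclicity this way.

```python
from graphlib import CycleError, TopologicalSorter

# Plain-dict stand-in for a GraphConfig: tumor T drains to II and III, II drains to III.
graph = {
    "tumor": {"T": ["II", "III"]},
    "lnl": {"II": ["III"], "III": []},
}


def drainage_order(graph: dict) -> list[str]:
    """Return the nodes so that each one precedes the LNLs it drains to."""
    edges = {**graph["tumor"], **graph["lnl"]}
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors: dict[str, set[str]] = {node: set() for node in edges}
    for source, targets in edges.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)
    # static_order() raises CycleError if the graph is not acyclic.
    return list(TopologicalSorter(predecessors).static_order())


print(drainage_order(graph))
```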
- lyscripts.configs.has_model_symbol(path: Path) -> Path[source]#
Check if the Python file at path defines a symbol named model.
- lyscripts.configs.get_symmetry_kwargs(model: Model) -> dict[str, Any][source]#
Get the symmetry kwargs from a model.
- pydantic model lyscripts.configs.ModelConfig[source]#
Bases: BaseModel
Define which of the lymph models to use and how to set them up.
Show JSON schema
{ "title": "ModelConfig", "description": "Define which of the ``lymph`` models to use and how to set them up.", "type": "object", "properties": { "external_file": { "anyOf": [ { "format": "file-path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to a Python file that defines a model.", "title": "External File" }, "class_name": { "default": "Unilateral", "description": "Name of the model class to use.", "enum": [ "Unilateral", "Bilateral", "Midline" ], "title": "Class Name", "type": "string" }, "constructor": { "default": "binary", "description": "Trinary models differentiate btw. micro- and macroscopic disease.", "enum": [ "binary", "trinary" ], "title": "Constructor", "type": "string" }, "max_time": { "default": 10, "description": "Max. number of time-steps to evolve the model over.", "title": "Max Time", "type": "integer" }, "named_params": { "default": null, "description": "Subset of valid model parameters a sampler may provide in the form of a dictionary to the model instead of as an array. Or, after sampling, with this list, one may safely recover which parameter corresponds to which index in the sample.", "items": { "type": "string" }, "title": "Named Params", "type": "array" }, "kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments to pass to the model constructor.", "title": "Kwargs", "type": "object" } } }
- field external_file: Annotated[FilePath, AfterValidator(has_model_symbol)] | None = None#
Path to a Python file that defines a model.
- field class_name: Literal['Unilateral', 'Bilateral', 'Midline'] = 'Unilateral'#
Name of the model class to use.
- field constructor: Literal['binary', 'trinary'] = 'binary'#
Trinary models differentiate between micro- and macroscopic disease.
- field named_params: Sequence[str] = None#
Subset of valid model parameters a sampler may provide in the form of a dictionary to the model instead of as an array. Or, after sampling, with this list, one may safely recover which parameter corresponds to which index in the sample.
- classmethod from_model(model: Model) -> ModelConfig[source]#
Create a ModelConfig from a Model.
- lyscripts.configs.modalityconfig_from_model(model: Model, modality_name: str) -> ModalityConfig[source]#
Create a ModalityConfig from a Model.
- pydantic model lyscripts.configs.DeprecatedModelConfig[source]#
Bases: BaseModel
Model configuration prior to lyscripts major version 1.
This is implemented for backwards compatibility. Its sole job is to translate the outdated settings format into the new one. Note that the only things that need to be translated are the model configuration itself and the distributions for marginalization over diagnosis times. The GraphConfig is still compatible.
Show JSON schema
{ "title": "DeprecatedModelConfig", "description": "Model configuration prior to ``lyscripts`` major version 1.\n\nThis is implemented for backwards compatibility. Its sole job is to translate\nthe outdated settings format into the new one. Note that the only stuff that needs\nto be translated is the model configuration itself and the distributions for\nmarginalization over diagnosis times. The :py:class:`~GraphConfig` is still\ncompatible.", "type": "object", "properties": { "first_binom_prob": { "description": "Fixed parameter for first binomial dist over diagnosis times.", "maximum": 1.0, "minimum": 0.0, "title": "First Binom Prob", "type": "number" }, "max_t": { "description": "Max. number of time-steps to evolve the model over.", "exclusiveMinimum": 0, "title": "Max T", "type": "integer" }, "t_stages": { "description": "List of T-stages to marginalize over in the scenario. The old format assumed all T-stages except the first one to be parametric. Only binomial distributions are supported.", "items": { "anyOf": [ { "type": "integer" }, { "type": "string" } ] }, "title": "T Stages", "type": "array" }, "class": { "description": "Name of the model class. Only binary models are supported.", "enum": [ "Unilateral", "Bilateral", "Midline", "MidlineBilateral" ], "title": "Class", "type": "string" }, "kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments to pass to the model constructor.", "title": "Kwargs", "type": "object" } }, "required": [ "first_binom_prob", "max_t", "t_stages", "class" ] }
- field first_binom_prob: float [Required]#
Fixed parameter for first binomial dist over diagnosis times.
- field t_stages: list[int | str] [Required]#
List of T-stages to marginalize over in the scenario. The old format assumed all T-stages except the first one to be parametric. Only binomial distributions are supported.
- field class_: Literal['Unilateral', 'Bilateral', 'Midline', 'MidlineBilateral'] [Required] (alias 'class')#
Name of the model class. Only binary models are supported.
- translate() -> tuple[ModelConfig, dict[int | str, DistributionConfig]][source]#
Translate the deprecated model config to the new format.
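The translation can be sketched with plain dictionaries instead of the pydantic models. This follows the field descriptions: the first T-stage gets a fixed binomial distribution, all later T-stages are parametric, and only binary models are supported. The parameter key "p", the empty params of the parametric entries, and the unchanged pass-through of the class name are assumptions of this sketch, not the package's confirmed behavior.

```python
def translate(old: dict) -> tuple[dict, dict]:
    """Map an old-style config dict onto new-style model and distribution dicts."""
    model = {
        "class_name": old["class"],  # passed through unchanged in this sketch
        "constructor": "binary",     # the old format only supported binary models
        "max_time": old["max_t"],
        "kwargs": old.get("kwargs", {}),
    }
    first, *rest = old["t_stages"]
    distributions = {
        # The first T-stage had a fixed binomial parameter ("p" is a hypothetical key).
        first: {
            "kind": "frozen",
            "func": "binomial",
            "params": {"p": old["first_binom_prob"]},
        }
    }
    for t_stage in rest:
        # All remaining T-stages were assumed to be parametric.
        distributions[t_stage] = {"kind": "parametric", "func": "binomial", "params": {}}
    return model, distributions


old = {
    "first_binom_prob": 0.3,
    "max_t": 10,
    "t_stages": ["early", "late"],
    "class": "Unilateral",
}
model_cfg, dists = translate(old)
print(model_cfg)
print(dists)
```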
- pydantic model lyscripts.configs.SamplingConfig[source]#
Bases: BaseModel
Settings to configure the MCMC sampling.
Show JSON schema
{ "title": "SamplingConfig", "description": "Settings to configure the MCMC sampling.", "type": "object", "properties": { "storage_file": { "description": "Path to HDF5 file store results or load last state.", "format": "path", "title": "Storage File", "type": "string" }, "history_file": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "description": "Path to store the burn-in metrics (as CSV file).", "title": "History File" }, "dataset": { "default": "mcmc", "description": "Name of the dataset in the HDF5 file.", "title": "Dataset", "type": "string" }, "cores": { "anyOf": [ { "exclusiveMinimum": 0, "type": "integer" }, { "type": "null" } ], "default": 2, "description": "Number of cores to use for parallel sampling. If `None`, no parallel processing is used.", "title": "Cores" }, "seed": { "default": 42, "description": "Seed for the random number generator.", "title": "Seed", "type": "integer" }, "walkers_per_dim": { "default": 20, "description": "Number of walkers per parameter space dimension.", "title": "Walkers Per Dim", "type": "integer" }, "check_interval": { "default": 50, "description": "Check for convergence each time after this many steps.", "title": "Check Interval", "type": "integer" }, "trust_factor": { "default": 50.0, "description": "Trust the autocorrelation time only when it's smaller than this factor times the length of the chain.", "title": "Trust Factor", "type": "number" }, "relative_thresh": { "default": 0.05, "description": "Relative threshold for convergence.", "title": "Relative Thresh", "type": "number" }, "burnin_steps": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Number of burn-in steps to take. 
If None, burn-in runs until convergence.", "title": "Burnin Steps" }, "num_steps": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": 100, "description": "Number of steps to take in the MCMC sampling.", "title": "Num Steps" }, "thin_by": { "default": 10, "description": "How many samples to draw before for saving one.", "title": "Thin By", "type": "integer" }, "inverse_temp": { "default": 1.0, "description": "Inverse temperature for thermodynamic integration. Note that this is not yet fully implemented.", "title": "Inverse Temp", "type": "number" } }, "required": [ "storage_file" ] }
- field storage_file: Path [Required]#
Path to the HDF5 file in which to store results or from which to load the last state.
- field cores: int | None = 2#
Number of cores to use for parallel sampling. If None, no parallel processing is used.
- field trust_factor: float = 50.0#
Trust the autocorrelation time only when it’s smaller than this factor times the length of the chain.
- field burnin_steps: int | None = None#
Number of burn-in steps to take. If None, burn-in runs until convergence.
- lyscripts.configs.geometric_schedule(num: int, *_a) -> ndarray[source]#
Create a geometric sequence of num numbers from 0 to 1.
- lyscripts.configs.linear_schedule(num: int, *_a) -> ndarray[source]#
Create a linear sequence of num numbers from 0 to 1.
Equivalent to the power_schedule() with power=1.
- lyscripts.configs.power_schedule(num: int, power: float, *_a) -> ndarray[source]#
Create a power sequence of num numbers from 0 to 1.
This is essentially a linear_schedule() of num numbers from 0 to 1, but each number is raised to the power of power.
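The relationship between the linear and the power schedule described above can be sketched in pure Python. These are stand-ins that return lists rather than the ndarray of the real functions, and they omit the ignored extra positional arguments; requiring num >= 2 is a simplification of this sketch.

```python
def linear_schedule(num: int) -> list[float]:
    """Return `num` evenly spaced values from 0 to 1 (inclusive)."""
    if num < 2:
        raise ValueError("this sketch needs at least two values")
    return [i / (num - 1) for i in range(num)]


def power_schedule(num: int, power: float) -> list[float]:
    """Raise every value of the linear schedule to `power`."""
    return [value**power for value in linear_schedule(num)]


print(linear_schedule(5))
print(power_schedule(5, power=2.0))
```

A power larger than 1 concentrates the inverse temperatures near 0, while power=1 reproduces the linear schedule.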
- pydantic model lyscripts.configs.ScheduleConfig[source]#
Bases: BaseModel
Configuration for generating a schedule of inverse temperatures.
Show JSON schema
{ "title": "ScheduleConfig", "description": "Configuration for generating a schedule of inverse temperatures.", "type": "object", "properties": { "method": { "default": "power", "description": "Method to generate the inverse temperature schedule.", "enum": [ "geometric", "linear", "power" ], "title": "Method", "type": "string" }, "num": { "default": 32, "description": "Number of inverse temperatures in the schedule.", "title": "Num", "type": "integer" }, "power": { "default": 4.0, "description": "If a power schedule is chosen, use this as power.", "title": "Power", "type": "number" }, "values": { "anyOf": [ { "items": { "type": "number" }, "type": "array" }, { "type": "null" } ], "default": null, "description": "List of inverse temperatures to use instead of generating a schedule. If a list is provided, the other parameters are ignored.", "title": "Values" } } }
- field method: Literal['geometric', 'linear', 'power'] = 'power'#
Method to generate the inverse temperature schedule.
- lyscripts.configs.map_to_optional_bool(value: Any) -> Any[source]#
Try to convert the options in the PatternType to a boolean value.
- pydantic model lyscripts.configs.ScenarioConfig[source]#
Bases: BaseModel
Define a scenario for which e.g. prevalences and risks may be computed.
Show JSON schema
{ "title": "ScenarioConfig", "description": "Define a scenario for which e.g. prevalences and risks may be computed.", "type": "object", "properties": { "t_stages": { "description": "List of T-stages to marginalize over in the scenario.", "examples": [ [ "early" ], [ 3, 4 ] ], "items": { "anyOf": [ { "type": "integer" }, { "type": "string" } ] }, "title": "T Stages", "type": "array" }, "t_stages_dist": { "default": [ 1.0 ], "description": "Distribution over T-stages to use for marginalization.", "examples": [ [ 1.0 ], [ 0.6, 0.4 ] ], "items": { "type": "number" }, "title": "T Stages Dist", "type": "array" }, "midext": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": null, "description": "Whether the patient's tumor extends over the midline.", "title": "Midext" }, "mode": { "default": "HMM", "description": "Which underlying model architecture to use.", "enum": [ "HMM", "BN" ], "title": "Mode", "type": "string" }, "involvement": { "$ref": "#/$defs/InvolvementConfig", "default": { "ipsi": {}, "contra": {} } }, "diagnosis": { "$ref": "#/$defs/DiagnosisConfig", "default": { "ipsi": {}, "contra": {} } } }, "$defs": { "DiagnosisConfig": { "description": "Defines an ipsi- and contralateral diagnosis pattern.", "properties": { "ipsi": { "additionalProperties": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "type": "object" }, "default": {}, "description": "Observed diagnoses by different modalities on the ipsi neck.", "examples": [ { "CT": { "II": true, "III": false } } ], "title": "Ipsi", "type": "object" }, "contra": { "additionalProperties": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "type": "object" }, "default": {}, "description": "Observed diagnoses by different modalities on the contra neck.", "title": "Contra", "type": "object" } }, 
"title": "DiagnosisConfig", "type": "object" }, "InvolvementConfig": { "description": "Config that defines an ipsi- and contralateral involvement pattern.", "properties": { "ipsi": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "default": {}, "description": "Involvement pattern for the ipsilateral side of the neck.", "examples": [ { "II": true, "III": false } ], "title": "Ipsi", "type": "object" }, "contra": { "additionalProperties": { "anyOf": [ { "enum": [ false, 0, "healthy", true, 1, "involved", "micro", "macro", "notmacro" ] }, { "type": "null" } ] }, "default": {}, "description": "Involvement pattern for the contralateral side of the neck.", "title": "Contra", "type": "object" } }, "title": "InvolvementConfig", "type": "object" } }, "required": [ "t_stages" ] }
- field mode: Literal['HMM', 'BN'] = 'HMM'#
Which underlying model architecture to use.
- field involvement: InvolvementConfig = InvolvementConfig(ipsi={}, contra={})#
- field diagnosis: DiagnosisConfig = DiagnosisConfig(ipsi={}, contra={})#
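The schema's t_stages and t_stages_dist entries describe a marginalization over T-stages. As a rough sketch of what that means, the snippet below forms a weighted sum of a per-T-stage quantity. The helper marginalize and the per-stage risk values are made up for illustration, and treating the distribution as plain weights without renormalization is an assumption.

```python
def marginalize(per_t_stage: dict, t_stages: list, t_stages_dist: list[float]) -> float:
    """Weighted sum of a per-T-stage quantity over the scenario's T-stages."""
    if len(t_stages) != len(t_stages_dist):
        raise ValueError("need exactly one weight per T-stage")
    return sum(
        weight * per_t_stage[t_stage]
        for t_stage, weight in zip(t_stages, t_stages_dist)
    )


# Hypothetical risks computed separately for T-stages 3 and 4.
risk_by_t_stage = {3: 0.10, 4: 0.30}

# Scenario values taken from the examples in the JSON schema.
risk = marginalize(risk_by_t_stage, t_stages=[3, 4], t_stages_dist=[0.6, 0.4])
print(round(risk, 6))
```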
- lyscripts.configs.construct_model(model_config: ModelConfig, graph_config: GraphConfig) -> Model[source]#
Construct a model from a model_config.
The default/expected use of this is to specify a model class from the lymph package and pass the necessary arguments to its constructor. However, it is also possible to load a model from an external Python file via the external_file attribute of the model_config argument. In this case, a symbol with name model must be defined in the file that is to be loaded.
Note
No check is performed on the model’s compatibility with the command/pipeline it is used in. It is assumed the model complies with the model type specifications of the lymph package.
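Loading a symbol named model from an external Python file can be done with the standard library's importlib machinery. The following is a minimal sketch of that general technique, not the package's actual code; the helper load_model_symbol, the module name external_model, and the dictionary standing in for a real lymph model are all illustrative.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_model_symbol(path: Path):
    """Import the Python file at `path` and return its `model` symbol."""
    spec = importlib.util.spec_from_file_location("external_model", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, "model"):
        raise AttributeError(f"{path} does not define a symbol named 'model'")
    return module.model


# Write a throwaway file that defines `model`, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "my_model.py"
    model_file.write_text("model = {'name': 'stand-in for a lymph model'}\n")
    loaded = load_model_symbol(model_file)

print(loaded)
```

Since the file is executed on import, only trusted files should be loaded this way.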
- lyscripts.configs.add_distributions(model: Model, configs: dict[str | int, DistributionConfig], mapping: dict[Literal['binomial'], Callable] | None = None, inplace: bool = False) -> Model[source]#
Construct and add distributions over diagnose times to a model.
- lyscripts.configs.add_modalities(model: Model, modalities: dict[str, ModalityConfig], inplace: bool = False) -> Model[source]#
Add modalities to a model.
- lyscripts.configs.add_data(model: Model, path: Path, side: Literal['ipsi', 'contra'], mapping: dict[Literal[0, 1, 2, 3, 4], int | str] | None = None, inplace: bool = False) -> Model[source]#
Add data to a model.
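The mapping argument translates the numeric T-stages 0 to 4 found in the data into the T-stage labels a model uses. A small sketch of such a mapping follows; the labels "early" and "late", the helper map_t_stages, and the choice to pass unmapped stages through unchanged are assumptions for illustration.

```python
def map_t_stages(numeric_t_stages: list[int], mapping: dict) -> list:
    """Replace each numeric T-stage by its model T-stage; unmapped ones pass through."""
    return [mapping.get(t_stage, t_stage) for t_stage in numeric_t_stages]


# Hypothetical grouping of numeric T-stages into two model T-stages.
mapping = {0: "early", 1: "early", 2: "early", 3: "late", 4: "late"}

print(map_t_stages([1, 4, 2, 3], mapping))
```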
- class lyscripts.configs.DynamicYamlConfigSettingsSource(settings_cls, yaml_file: Path | str | Sequence[Path | str] | None = PosixPath('.'), yaml_file_encoding: str | None = None, yaml_file_path_field: str = 'configs')[source]#
Bases: YamlConfigSettingsSource
YAML config source that allows dynamic file path specification.
This is heavily inspired by a comment in the discussion on a related issue of the pydantic-settings GitHub repository.
Essentially, this little hack allows a user to specify one or multiple YAML files from which the CLI should read configurations. Normally, pydantic-settings only allows hard-coding the location of these config files.
- pydantic settings lyscripts.configs.BaseCLI[source]#
Bases: BaseSettings
Base settings class for all CLI scripts to inherit from.
Show JSON schema
{ "title": "BaseCLI", "description": "Base settings class for all CLI scripts to inherit from.", "type": "object", "properties": { "configs": { "default": [ "config.yaml" ], "description": "Path to the YAML file(s) that contain the configuration(s). Configs from YAML files may be overwritten by command line arguments. When multiple files are specified, the configs are merged in the order they are given. Note that every config file must have a `version: 1` key in it.", "items": { "format": "path", "type": "string" }, "title": "Configs", "type": "array" } } }
- field configs: list[Path] = ['config.yaml']#
Path to the YAML file(s) that contain the configuration(s). Configs from YAML files may be overwritten by command line arguments. When multiple files are specified, the configs are merged in the order they are given. Note that every config file must have a version: 1 key in it.
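How merging in the order given might behave can be sketched with already-parsed config dictionaries. The sketch assumes that later files override earlier ones and that nested sections are merged recursively; neither detail is confirmed by the description above. The helper names and the sampling section are illustrative.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge `override` into a copy of `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_configs(configs: list[dict]) -> dict:
    """Merge configs in order, insisting on a `version: 1` key in each of them."""
    result: dict = {}
    for index, config in enumerate(configs):
        if config.get("version") != 1:
            raise ValueError(f"config #{index} lacks a `version: 1` key")
        result = deep_merge(result, config)
    return result


# Stand-ins for the parsed contents of two YAML files.
first = {"version": 1, "sampling": {"seed": 42, "cores": 2}}
second = {"version": 1, "sampling": {"cores": 4}}

merged = merge_configs([first, second])
print(merged)
```

Under these assumptions, the second file only changes cores and leaves seed from the first file in place.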