Map to LyProX Format#
Consumes raw data and transforms it into a CSV that LyProX understands.
To do so, it needs a dictionary that defines a mapping from raw columns to the LyProX
style data format. See the documentation of the transform_to_lyprox() function
for more information.
- lyscripts.data.lyproxify.ensure_python_file(file: Path) Path[source]#
Check if the file is a Python file.
- lyscripts.data.lyproxify.ensure_column_map(file: Path) Path[source]#
Ensure the Python file contains a COLUMN_MAP dictionary.
- pydantic settings lyscripts.data.lyproxify.LyproxifyCLI[source]#
Bases: BaseCLI
Map any CSV file to the LyProX format with the help of a Python mapping dict.
Show JSON schema
{
    "title": "LyproxifyCLI",
    "description": "Map any CSV file to the LyProX format with the help of a Python mapping dict.",
    "type": "object",
    "properties": {
        "configs": {
            "default": ["config.yaml"],
            "description": "Path to the YAML file(s) that contain the configuration(s). Configs from YAML files may be overwritten by command line arguments. When multiple files are specified, the configs are merged in the order they are given. Note that every config file must have a `version: 1` key in it.",
            "items": {"format": "path", "type": "string"},
            "title": "Configs",
            "type": "array"
        },
        "input_file": {
            "description": "Location of raw CSV data.",
            "format": "file-path",
            "title": "Input File",
            "type": "string"
        },
        "num_header_rows": {
            "default": 1,
            "description": "Number of rows comprising the header of the raw CSV file.",
            "title": "Num Header Rows",
            "type": "integer"
        },
        "mapping_file": {
            "description": "Location of Python file containing a `COLUMN_MAP` dictionary. It may also contain an `EXCLUDE` list of tuples `(column, check)` to exclude patients.",
            "format": "file-path",
            "title": "Mapping File",
            "type": "string"
        },
        "drop_rows": {
            "default": [],
            "description": "Delete rows of specified indices. Counting of rows start at 0 _after_ the `header-rows`.",
            "items": {"type": "integer"},
            "title": "Drop Rows",
            "type": "array"
        },
        "drop_cols": {
            "default": [],
            "description": "Delete columns of specified indices.",
            "items": {"type": "integer"},
            "title": "Drop Cols",
            "type": "array"
        },
        "output_file": {
            "description": "Location to store the lyproxified CSV file.",
            "format": "path",
            "title": "Output File",
            "type": "string"
        }
    },
    "required": ["input_file", "mapping_file", "output_file"]
}
- field mapping_file: Annotated[Path, PathType(path_type=file), AfterValidator(func=ensure_python_file), AfterValidator(func=ensure_column_map)] [Required]#
Location of Python file containing a COLUMN_MAP dictionary. It may also contain an EXCLUDE list of tuples (column, check) to exclude patients.
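As a rough illustration, a mapping file might look like the sketch below. The raw column names ("birthday", "date of diagnosis", "T-category"), the helper compute_age_from_raw, the "institution" entry, and the exclusion rule are all hypothetical; only the names COLUMN_MAP and EXCLUDE and the overall entry structure come from this page.

```python
# mapping.py: hypothetical mapping file; all raw column names are assumptions.
from datetime import date


def compute_age_from_raw(birthday: str, diagnose_date: str, randomize: bool = False) -> int:
    """Compute the age in full years from two ISO dates (illustrative helper).

    The ``randomize`` keyword is accepted but ignored in this sketch.
    """
    born = date.fromisoformat(birthday)
    diagnosed = date.fromisoformat(diagnose_date)
    had_birthday = (diagnosed.month, diagnosed.day) >= (born.month, born.day)
    return diagnosed.year - born.year - (0 if had_birthday else 1)


# One entry per column of the final table.
COLUMN_MAP = {
    ("patient", "core", "age"): {
        "func": compute_age_from_raw,
        "kwargs": {"randomize": False},
        "columns": ["birthday", "date of diagnosis"],
    },
    ("patient", "core", "institution"): {
        "default": "Example Hospital",
    },
}

# Each tuple is (column, check); patients for which check(raw[column])
# evaluates to True are excluded.
EXCLUDE = [
    ("T-category", lambda series: series == 0),
]
```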
- field drop_rows: list[int] = []#
Delete rows at the specified indices. Row counting starts at 0 _after_ the header rows.
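To illustrate the indexing convention, the sketch below reads a small, made-up CSV with pandas and drops a row by position. Using pd.read_csv with a list of header rows and DataFrame.drop is an assumption about how such a step could be written, not a description of the actual implementation.

```python
import io

import pandas as pd

# Made-up raw data with two header rows.
raw_csv = io.StringIO(
    "patient info,patient info\n"  # header row 0
    "id,age\n"                     # header row 1
    "p1,43\n"                      # data row 0
    "p2,82\n"                      # data row 1
    "p3,18\n"                      # data row 2
)
num_header_rows = 2
drop_rows = [1]  # refers to data row 1, i.e. patient "p2"

table = pd.read_csv(raw_csv, header=list(range(num_header_rows)))
# Translate positions into index labels, then drop those rows.
table = table.drop(index=table.index[drop_rows])
remaining_ids = table.iloc[:, 0].tolist()
```

Because counting starts after the header, index 1 removes the second patient rather than the second line of the file.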
- cli_cmd() None[source]#
Start the lyproxify subcommand.
After reading in the specified file, it will first drop the rows and columns given by drop_rows and drop_cols, as specified in the command line arguments. Then, it will call exclude_patients(), which will further remove patients based on the EXCLUDE object in the mapping_file. Finally, it will call transform_to_lyprox() to transform the data into the LyProX format given the COLUMN_MAP object in the mapping_file.
- exception lyscripts.data.lyproxify.ParsingError[source]#
Bases: Exception
Error while parsing the CSV file.
- lyscripts.data.lyproxify.clean_header(table: DataFrame, num_cols: int, num_header_rows: int) DataFrame[source]#
Rename the header cells in the table.
- lyscripts.data.lyproxify.get_instruction_depth(nested_column_map: dict[tuple, dict[str, Any]]) int[source]#
Get the depth at which the column mapping instructions are nested.
Instructions are a dictionary that contains either a 'func' or 'default' key.
>>> nested_column_map = {"patient": {"age": {"func": int}}}
>>> get_instruction_depth(nested_column_map)
2
>>> flat_column_map = flatten(nested_column_map, max_depth=2)
>>> get_instruction_depth(flat_column_map)
1
>>> nested_column_map = {"patient": {"__doc__": "some patient info", "age": 61}}
>>> get_instruction_depth(nested_column_map)
Traceback (most recent call last):
    ...
ValueError: Leaf of column map must be a dictionary with 'func' or 'default' key.
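The behavior described above could be reproduced with a short recursive function. The following is a minimal sketch, not the library's implementation: the name instruction_depth_sketch is hypothetical, it skips "__doc__" keys, and it assumes every branch of the map is nested equally deep, so it only follows the first one.

```python
from typing import Any


def instruction_depth_sketch(column_map: dict[Any, Any]) -> int:
    """Return how many dictionary levels lead down to the instructions."""
    for key, value in column_map.items():
        if key == "__doc__":
            continue  # documentation strings are not part of the mapping
        if not isinstance(value, dict):
            break  # a non-dict leaf cannot hold instructions
        if "func" in value or "default" in value:
            return 1  # ``value`` is an instruction dictionary
        return 1 + instruction_depth_sketch(value)
    raise ValueError(
        "Leaf of column map must be a dictionary with 'func' or 'default' key."
    )


nested = {"patient": {"age": {"func": int}}}
flat = {("patient", "age"): {"func": int}}
depth_nested = instruction_depth_sketch(nested)
depth_flat = instruction_depth_sketch(flat)
```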
- lyscripts.data.lyproxify.generate_markdown_docs(nested_column_map: dict[tuple, dict[str, Any]], depth: int = 0, indent_len: int = 4) str[source]#
Generate a markdown nested, ordered list as documentation for the column map.
A key in the dictionary is considered documented when its value is a dictionary containing a "__doc__" key.
>>> nested_column_map = {
...     "patient": {
...         "__doc__": "some patient info",
...         "age": {
...             "__doc__": "age of the patient",
...             "func": int,
...             "columns": ["age"],
...         },
...     },
... }
>>> generate_markdown_docs(nested_column_map)
'1. **`patient:`** some patient info\n    1. **`age:`** age of the patient\n'
- lyscripts.data.lyproxify.transform_to_lyprox(raw: DataFrame, column_map: dict[tuple, dict[str, Any]]) DataFrame[source]#
Transform raw data into a table that can be uploaded directly to LyProX.
To do so, it uses instructions in the column_map dictionary, which needs to have a particular structure:
For each column in the final 'lyproxified' pd.DataFrame, one entry must exist in the column_map dictionary. E.g., for the column corresponding to a patient's age, the dictionary should contain a key-value pair of this shape:
column_map = {
    ("patient", "core", "age"): {
        "func": compute_age_from_raw,
        "kwargs": {"randomize": False},
        "columns": ["birthday", "date of diagnosis"]
    },
}
In this example, the function compute_age_from_raw is called with the values of the columns "birthday" and "date of diagnosis" as positional arguments, and the keyword argument "randomize" is set to False. The function then returns the patient's age, which is subsequently stored in the column ("patient", "core", "age").
Note that each entry in the column_map dictionary must have either a "default" key or a "func" key, the latter along with "columns" and "kwargs", depending on the function definition. If the function does not take any positional arguments, "columns" can be omitted. If it also does not take any keyword arguments, "kwargs" can be omitted, too.
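The mechanics described above can be sketched with plain pandas. This is a simplified stand-in under stated assumptions, not the actual transform_to_lyprox(): the name transform_sketch, the raw columns "birth year" and "diagnosis year", the "offset" keyword, and the "institution" entry are all hypothetical, and the map is assumed to be flat (tuple keys).

```python
import pandas as pd


def transform_sketch(raw: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    """Build one output column per entry of a flat ``column_map``."""
    result = {}
    for new_column, instruction in column_map.items():
        if "func" in instruction:
            func = instruction["func"]
            kwargs = instruction.get("kwargs", {})
            columns = instruction.get("columns", [])
            # Call ``func`` once per row, passing the listed raw values
            # as positional arguments and ``kwargs`` as keyword arguments.
            result[new_column] = [
                func(*(row[col] for col in columns), **kwargs)
                for _, row in raw.iterrows()
            ]
        elif "default" in instruction:
            # Fill the whole column with the same default value.
            result[new_column] = [instruction["default"]] * len(raw)
        else:
            raise ValueError("Entry needs a 'func' or 'default' key.")
    # Tuple keys turn into multi-level column headers.
    return pd.DataFrame(result, index=raw.index)


raw = pd.DataFrame({"birth year": [1950, 1980], "diagnosis year": [2020, 2021]})
column_map = {
    ("patient", "core", "age"): {
        "func": lambda born, diagnosed, offset=0: diagnosed - born + offset,
        "kwargs": {"offset": 0},
        "columns": ["birth year", "diagnosis year"],
    },
    ("patient", "core", "institution"): {"default": "Example Hospital"},
}
lyproxified = transform_sketch(raw, column_map)
```

Calling the function row by row keeps the sketch close to the description; a real implementation may well apply functions differently.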
- lyscripts.data.lyproxify.leftright_to_ipsicontra(data: DataFrame)[source]#
Change absolute side reporting to tumor-relative.
Transform reporting of LNL involvement by absolute side (right & left) to a reporting relative to the tumor (ipsi- & contralateral). The table data should already be in the format LyProX requires, except for the side-reporting of LNL involvement.
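The idea can be illustrated with a small pandas sketch. The column layout is an assumption made for this example only: the tumor side is taken to live in a column ("tumor", "1", "side"), and involvement columns are taken to have the shape (modality, side, LNL). The function name to_ipsi_contra_sketch is hypothetical, and patients whose tumor side is neither "left" nor "right" are simply dropped here.

```python
import pandas as pd

SIDE_COLUMN = ("tumor", "1", "side")  # assumed location of the tumor side


def to_ipsi_contra_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """Relabel 'left'/'right' involvement columns relative to the tumor side."""
    pieces = []
    for side, other in [("left", "right"), ("right", "left")]:
        # For patients whose tumor sits on ``side``, that side is ipsilateral.
        subset = data.loc[data[SIDE_COLUMN] == side]
        relabel = {side: "ipsi", other: "contra"}
        # Only rename labels in the second column level.
        pieces.append(subset.rename(columns=relabel, level=1))
    # Stack both groups again and restore the original patient order.
    return pd.concat(pieces).sort_index()


data = pd.DataFrame({
    ("tumor", "1", "side"): ["left", "right"],
    ("CT", "left", "II"): [True, False],
    ("CT", "right", "II"): [False, True],
})
converted = to_ipsi_contra_sketch(data)
```

In this toy table both patients have involvement on the same side as their tumor, so the ipsilateral column is True and the contralateral column is False for both.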
- lyscripts.data.lyproxify.exclude_patients(raw: DataFrame, exclude: list[tuple[str, Any]])[source]#
Exclude patients in the raw data based on a list of what to exclude.
The exclude list contains tuples (column, check). The check function will then exclude any patients from the cohort where check(raw[column]) evaluates to True.
>>> exclude = [("age", lambda s: s > 50)]
>>> table = pd.DataFrame({
...     "age":        [43, 82, 18, 67],
...     "T-category": [ 3,  4,  2,  1],
... })
>>> exclude_patients(table, exclude)
   age  T-category
0   43           3
2   18           2
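A filter with this behavior could be written in a few lines of pandas. The sketch below is one possible way to do it, assuming each check returns a boolean Series; the name exclude_sketch is hypothetical and the code is not taken from the library.

```python
import pandas as pd


def exclude_sketch(raw: pd.DataFrame, exclude: list) -> pd.DataFrame:
    """Keep only the rows for which no ``check`` evaluates to True."""
    keep = pd.Series(True, index=raw.index)
    for column, check in exclude:
        # ``check`` flags the patients to remove, so invert it.
        keep &= ~check(raw[column])
    return raw.loc[keep]


table = pd.DataFrame({
    "age": [43, 82, 18, 67],
    "T-category": [3, 4, 2, 1],
})
filtered = exclude_sketch(table, [("age", lambda s: s > 50)])
```

Combining all checks into one boolean mask means a patient is removed as soon as any single rule applies.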
Command Help#
Usage: lyscripts data lyproxify [-h] [--configs list[Path]]
[--input-file Path] [--num-header-rows int]
[--mapping-file Path] [--drop-rows list[int]]
[--drop-cols list[int]] [--output-file Path]
Map any CSV file to the LyProX format with the help of a Python mapping dict.
Options:
-h, --help show this help message and exit
--configs list[Path] Path to the YAML file(s) that contain the
configuration(s). Configs from YAML files may be
overwritten by command line arguments. When multiple
files are specified, the configs are merged in the
order they are given. Note that every config file must
have a `version: 1` key in it. (default:
['config.yaml'])
--input-file Path Location of raw CSV data. (required)
--num-header-rows int
Number of rows comprising the header of the raw CSV
file. (default: 1)
--mapping-file Path Location of Python file containing a `COLUMN_MAP`
dictionary. It may also contain an `EXCLUDE` list of
tuples `(column, check)` to exclude patients.
(required)
--drop-rows list[int]
Delete rows of specified indices. Counting of rows
start at 0 _after_ the `header-rows`. (default: [])
--drop-cols list[int]
Delete columns of specified indices. (default: [])
--output-file Path Location to store the lyproxified CSV file. (required)