Evaluation

Evaluate the performance of the trained model.

This is done by computing quantities such as the Bayesian information criterion (BIC) or, if thermodynamic integration was performed, the model evidence together with its error.

lyscripts.evaluate.comp_bic(log_probs: ndarray, num_params: int, num_data: int) → float

Compute the negative one half of the Bayesian Information Criterion (BIC).

The BIC is defined as [^1]

$$ BIC = k \ln{n} - 2 \ln{\hat{L}} $$

where $k$ is the number of parameters num_params, $n$ the number of data points num_data, and $\hat{L}$ the maximized value of the likelihood, whose logarithm is estimated from log_probs. It is constructed such that the following approximates the model evidence:

$$ p(D \mid m) \approx \exp{\left( - BIC / 2 \right)} $$

which is why this function returns the negative one half of the BIC.

[^1]: https://en.wikipedia.org/wiki/Bayesian_information_criterion
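The formula above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the library's source: the helper name `neg_half_bic` and the sample values are hypothetical, and it assumes that $\ln{\hat{L}}$ is approximated by the largest sampled value in `log_probs`.

```python
import numpy as np


def neg_half_bic(log_probs, num_params, num_data):
    """Return -BIC / 2 = ln(L_hat) - (k / 2) * ln(n).

    Assumption: ln(L_hat) is taken to be the largest sampled log-probability.
    """
    max_log_prob = np.max(log_probs)
    return float(max_log_prob - 0.5 * num_params * np.log(num_data))


# Hypothetical log-probabilities of three samples from a 3-parameter model
# that was fitted to 100 data points.
log_probs = np.array([-105.2, -101.7, -103.4])
result = neg_half_bic(log_probs, num_params=3, num_data=100)

# The BIC itself can be recovered from the returned value.
bic = -2.0 * result
print(f"-BIC/2 = {result:.4f}, BIC = {bic:.4f}")
```

Because the returned value approximates the log of the model evidence, larger values indicate a better trade-off between fit and model complexity.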

lyscripts.evaluate.compute_evidence(temp_schedule: ndarray, log_probs: ndarray) → float

Compute the evidence.

Given a temp_schedule of inverse temperatures and the corresponding sets of log_probs, the mean log_prob over all samples is computed for each step in the temp_schedule. This mean approximates the expectation value under the power posterior at that inverse temperature. The evidence is then evaluated by trapezoidal integration of these expectation values over the temp_schedule.
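The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `integrate_mean_log_probs` is hypothetical, `log_probs` is assumed to have shape `(len(temp_schedule), num_samples)`, and the trapezoidal rule is written out by hand. In standard thermodynamic integration, an integral of this form over inverse temperatures from 0 to 1 yields the logarithm of the evidence.

```python
import numpy as np


def integrate_mean_log_probs(temp_schedule, log_probs):
    """Trapezoidal integral of the per-temperature mean log-probability.

    Assumption: row i of `log_probs` holds the samples drawn at
    inverse temperature `temp_schedule[i]`.
    """
    temp_schedule = np.asarray(temp_schedule, dtype=float)
    log_probs = np.asarray(log_probs, dtype=float)

    # Mean over samples approximates the expectation under each power posterior.
    mean_log_probs = log_probs.mean(axis=1)

    # Trapezoidal rule: interval width times the average of its two end values.
    widths = np.diff(temp_schedule)
    heights = 0.5 * (mean_log_probs[1:] + mean_log_probs[:-1])
    return float(np.sum(widths * heights))


# Synthetic data: the expected log-probability rises linearly from -10 to -6,
# with symmetric per-sample offsets that average to zero.
beta = np.linspace(0.0, 1.0, 5)
offsets = np.array([-1.0, 0.0, 0.0, 1.0])
log_probs = (-10.0 + 4.0 * beta)[:, None] + offsets

log_z = integrate_mean_log_probs(beta, log_probs)
print(f"integral = {log_z:.4f}")
```

The synthetic integrand is linear in the inverse temperature, so the trapezoidal rule is exact here. For real runs the accuracy depends on how finely the temp_schedule resolves regions where the expectation changes quickly.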

lyscripts.evaluate.compute_ti_results(metrics: dict, params: dict, ndim: int, h5_file: Path, model: Path) → tuple[ndarray, ndarray]

Compute the results in case of a thermodynamic integration run.

lyscripts.evaluate.main(args: Namespace)

Run main script.
