Evaluation
Evaluate the performance of the trained model.
This is done by computing quantities like the Bayesian information criterion (BIC) or (if thermodynamic integration was performed) the actual evidence (with error) of the model.
- lyscripts.evaluate.comp_bic(log_probs: ndarray, num_params: int, num_data: int) → float
Compute the negative one half of the Bayesian Information Criterion (BIC).
The BIC is defined as [^1]

$$ \mathrm{BIC} = k \ln{n} - 2 \ln{\hat{L}} $$

where $k$ is the number of parameters `num_params`, $n$ the number of data points `num_data`, and $\hat{L}$ the maximum likelihood estimate of the `log_prob`. It is constructed such that the following is an approximation of the model evidence:

$$ p(D \mid m) \approx \exp{\left( - \mathrm{BIC} / 2 \right)} $$

which is why this function returns the negative one half of it.

[^1]: https://en.wikipedia.org/wiki/Bayesian_information_criterion
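The definition above can be sketched in a few lines of NumPy. This is an illustration only, not the `lyscripts` implementation: the function name `negative_half_bic` is hypothetical, and the sketch assumes that $\ln{\hat{L}}$ can be approximated by the largest value among the sampled log-probabilities.

```python
import numpy as np


def negative_half_bic(log_probs, num_params, num_data):
    """Return -BIC / 2 for a set of sampled log-probabilities.

    Hypothetical helper for illustration. It assumes the maximum of
    ``log_probs`` is a good stand-in for the maximized log-likelihood.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    # Assumption: the best sample approximates ln(L_hat).
    max_log_like = np.max(log_probs)
    # BIC = k * ln(n) - 2 * ln(L_hat)
    bic = num_params * np.log(num_data) - 2.0 * max_log_like
    return -bic / 2.0
```

Because the returned value approximates the logarithm of the model evidence, a larger (less negative) result indicates a model that is better supported by the data under this approximation.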
- lyscripts.evaluate.compute_evidence(temp_schedule: ndarray, log_probs: ndarray) → float
Compute the evidence.
Given a `temp_schedule` of inverse temperatures and corresponding sets of `log_probs`, we calculate the mean `log_prob` over all samples to approximate the expectation value under the corresponding power posterior for each step in the `temp_schedule`. The evidence is evaluated using trapezoidal integration of the expectation values over the `temp_schedule`.
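The procedure described above might look roughly like the following sketch. It is not the `lyscripts` code: the name `integrate_mean_log_probs` is hypothetical, and the sketch assumes that `log_probs` holds one array of samples per entry of `temp_schedule`. In the usual thermodynamic-integration identity, the integral of these expectation values from inverse temperature 0 to 1 gives the logarithm of the evidence, so the sketch is best read as returning a log-evidence estimate.

```python
import numpy as np


def integrate_mean_log_probs(temp_schedule, log_probs):
    """Trapezoidal integral of mean log-probabilities over the schedule.

    Hypothetical helper for illustration. ``log_probs`` is assumed to be
    a sequence with one array of sampled log-probabilities per inverse
    temperature in ``temp_schedule``.
    """
    temp_schedule = np.asarray(temp_schedule, dtype=float)
    # Expectation value under each power posterior, estimated by the sample mean.
    means = np.array([np.mean(samples) for samples in log_probs])
    if means.shape != temp_schedule.shape:
        raise ValueError("need one set of log_probs per temperature step")
    # Trapezoidal rule: interval width times the average of the two endpoints.
    widths = np.diff(temp_schedule)
    heights = 0.5 * (means[1:] + means[:-1])
    return float(np.sum(widths * heights))
```

The trapezoidal rule is written out by hand here so that the sketch does not depend on a particular NumPy version. A schedule with more steps near inverse temperature 0, where the expectation value typically changes fastest, generally reduces the discretization error.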
- lyscripts.evaluate.compute_ti_results(metrics: dict, params: dict, ndim: int, h5_file: Path, model: Path) → tuple[ndarray, ndarray]
Compute the results in case of a thermodynamic integration run.