From model outputs#

Perhaps you have already trained your model, and you are looking for calibrated uncertainty estimates starting from your model outputs.

If that is the case, you are in the right place. In this scenario, Fortuna can calibrate your model outputs, estimate uncertainty, compute metrics and obtain conformal prediction sets. Your model may have been written and trained in any language; Fortuna just needs the model outputs and target variables as numpy.ndarray or jax.numpy.ndarray objects.
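
For instance, if your model lives in another framework, you might export its outputs on a held-out calibration split to numpy before handing them to Fortuna. A minimal sketch, where the array names, shapes and random values are purely illustrative placeholders:

import numpy as np

# Hypothetical placeholders: replace these with the outputs your model
# produced on a calibration split and the matching targets. Shapes
# (1000 calibration points, 10 classes) are assumptions for illustration.
calib_outputs = np.random.randn(1000, 10)          # model outputs, e.g. logits
calib_targets = np.random.randint(10, size=1000)   # integer class labels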

Classification#

Let us show how to calibrate model outputs, estimate uncertainty, compute metrics and obtain conformal sets in classification.

Build a calibration classifier#

First, let us build a calibration classifier. This defines the output calibrator to attach to the model output, and the final probabilistic output layer used for calibration and to compute predictive statistics. The default output calibrator is temperature scaling, which is what we use in this example.

from fortuna.output_calib_model import OutputCalibClassifier
output_calib_model = OutputCalibClassifier()

Calibrate the model outputs#

Let’s calibrate the model outputs. Fortuna needs an array of model outputs computed over some calibration inputs, and a corresponding array of calibration target variables. We denote these as calib_outputs and calib_targets, respectively. You can configure the calibration process using a calibration configuration object. In this example, we will stick with the default configuration options.

status = output_calib_model.calibrate(
    calib_outputs=calib_outputs,
    calib_targets=calib_targets
)

Estimate statistics#

Given some test model outputs test_outputs, and potentially an array of test target variables test_targets, we are ready to estimate predictive statistics. These include the predictive mode, mean, log-pdf, variance, entropy and more; please consult the predictive reference.

Note

In classification, the predictive mode gives label predictions, i.e. the label predicted for a certain input, while the predictive mean gives probability predictions, i.e. the probability of each label.

References: log_prob(), mode(), mean()#
test_logprob = output_calib_model.predictive.log_prob(
    outputs=test_outputs, targets=test_targets
)
test_modes = output_calib_model.predictive.mode(
    outputs=test_outputs
)
test_means = output_calib_model.predictive.mean(
    outputs=test_outputs
)
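
Beyond the mode and mean, the statistics listed above include measures of uncertainty such as the predictive entropy. A short sketch, assuming the entropy statistic follows the same call pattern as the statistics shown above:

# Per-input predictive entropy as an uncertainty score; higher values
# indicate the calibrated model is less certain about that input.
test_entropies = output_calib_model.predictive.entropy(
    outputs=test_outputs
)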

Compute metrics#

Fortuna supports some classification metrics, e.g. accuracy, expected calibration error and Brier score. You are encouraged to bring in metrics from other frameworks and apply them to Fortuna’s predictions, as the latter are compatible with metrics operating on numpy.ndarray.

from fortuna.metric.classification import accuracy, expected_calibration_error
acc = accuracy(
    preds=test_modes,
    targets=test_targets
)
ece = expected_calibration_error(
    preds=test_modes,
    probs=test_means,
    targets=test_targets
)
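
For example, since test_modes and test_means are plain arrays, a metric from another framework such as scikit-learn can be applied directly. A small sketch, assuming scikit-learn is installed in your environment:

from sklearn.metrics import log_loss

# Fortuna's predictions are numpy-compatible, so external metrics
# work on them directly.
nll = log_loss(y_true=test_targets, y_pred=test_means)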

Compute conformal sets#

Finally, as in Classification, you can compute conformal sets starting from predictive statistics. Again, we need model outputs and data for this purpose. We denote the validation model outputs as val_outputs and the corresponding validation target variables as val_targets.

References: conformal_set()#
from fortuna.conformal import AdaptivePredictionConformalClassifier
val_means = output_calib_model.predictive.mean(
    outputs=val_outputs
)
conformal_sets = AdaptivePredictionConformalClassifier().conformal_set(
    val_probs=val_means,
    test_probs=test_means,
    val_targets=val_targets,
    error=0.05
)
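
As a quick follow-up, you could check the empirical coverage of the conformal sets against the test targets. A minimal sketch, assuming conformal_sets is a sequence with one set of candidate labels per test input:

import numpy as np

# Fraction of test points whose true label falls inside its conformal set;
# with error=0.05 this should be roughly 95% or higher.
coverage = np.mean(
    [target in conf_set for conf_set, target in zip(conformal_sets, test_targets)]
)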

Regression#

As in the classification example, let us show how to calibrate model outputs, estimate uncertainty, compute metrics and obtain conformal intervals in regression.

Note

In regression, Fortuna requires model outputs to be concatenations of the mean and log-variance models of a Gaussian likelihood function. Mathematically, suppose that \(\mu(\theta, x)\) is the mean model, \(\sigma^2(\theta, x)\) is the variance model, and \(N\Big(y|\mu(\theta, x), \sigma^2(\theta, x)\Big)\) is the likelihood function, where \(\theta\) are the model parameters, \(x\) is an input variable and \(y\) is an output variable. Then, for each input, the model output should be the concatenation \([\mu(\theta, x), \log\sigma^2(\theta, x)]\).
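
For instance, if your model predicts means and variances separately, you could assemble outputs in the required format yourself. A minimal sketch with hypothetical arrays mu and sigma2, each assumed to have one row per input:

import numpy as np

# mu and sigma2 are hypothetical arrays of predicted means and variances.
# Fortuna expects [mean, log-variance] concatenated along the last axis.
calib_outputs = np.concatenate([mu, np.log(sigma2)], axis=-1)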

Build a calibration regressor#

First, let us build a calibration regressor. This defines the output calibrator to attach to the model output, and the final probabilistic output layer used for calibration and to compute predictive statistics. The default output calibrator is temperature scaling, which is what we use in this example.

References: OutputCalibRegressor#
from fortuna.output_calib_model import OutputCalibRegressor
output_calib_model = OutputCalibRegressor()

Calibrate the model outputs#

Let’s calibrate the model outputs. Fortuna needs an array of model outputs computed over some calibration inputs, and a corresponding array of calibration target variables. We denote these as calib_outputs and calib_targets, respectively. You can configure the calibration process using a calibration configuration object. In this example, we will stick with the default configuration options.

status = output_calib_model.calibrate(
    calib_outputs=calib_outputs,
    calib_targets=calib_targets
)

Estimate statistics#

Given some test model outputs test_outputs, and potentially an array of test target variables test_targets, we are ready to estimate predictive statistics. These include the predictive mode, mean, log-pdf, variance, entropy and more; please consult the predictive reference.

Note

In contrast with classification, in regression both the predictive mean and the predictive mode provide predictions for the target variables, and do not represent measures of uncertainty.

test_logprob = output_calib_model.predictive.log_prob(
    outputs=test_outputs, targets=test_targets
)
test_means = output_calib_model.predictive.mean(
    outputs=test_outputs
)
test_cred_intervals = output_calib_model.predictive.credible_interval(
    outputs=test_outputs
)
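
The statistics listed above also include the predictive variance, which quantifies uncertainty around each prediction. A short sketch, assuming the variance statistic follows the same call pattern as the statistics shown above:

# Per-input predictive variance; its square root gives a standard
# deviation that can be reported as an error bar around test_means.
test_variances = output_calib_model.predictive.variance(
    outputs=test_outputs
)
test_stds = test_variances ** 0.5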

Compute metrics#

Fortuna supports some regression metrics, e.g. Root Mean-Squared Error (RMSE) and Prediction Interval Coverage Probability (PICP). You are encouraged to bring in metrics from other frameworks and apply them to Fortuna’s predictions, as the latter are compatible with metrics operating on numpy.ndarray.

from fortuna.metric.regression import root_mean_squared_error, prediction_interval_coverage_probability
rmse = root_mean_squared_error(
    preds=test_means,
    targets=test_targets
)
picp = prediction_interval_coverage_probability(
    lower_bounds=test_cred_intervals[:, 0],
    upper_bounds=test_cred_intervals[:, 1],
    targets=test_targets
)
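
For example, mean absolute error from scikit-learn works directly on these arrays. A small sketch, assuming scikit-learn is installed in your environment:

from sklearn.metrics import mean_absolute_error

# Fortuna's predictions are numpy-compatible, so external regression
# metrics can be applied directly.
mae = mean_absolute_error(y_true=test_targets, y_pred=test_means)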

Compute conformal intervals#

Finally, as in Conformal intervals from confidence or credibility intervals, you can compute conformal intervals starting from predictive statistics. Again, we need model outputs and data for this purpose. We denote the validation model outputs as val_outputs and the corresponding validation target variables as val_targets.

from fortuna.conformal import QuantileConformalRegressor
val_cred_intervals = output_calib_model.predictive.credible_interval(
    outputs=val_outputs
)
conformal_intervals = QuantileConformalRegressor().conformal_interval(
    val_lower_bounds=val_cred_intervals[:, 0],
    val_upper_bounds=val_cred_intervals[:, 1],
    test_lower_bounds=test_cred_intervals[:, 0],
    test_upper_bounds=test_cred_intervals[:, 1],
    val_targets=val_targets,
    error=0.05
)
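
As with conformal sets, you could check the conformal intervals empirically. A minimal sketch, assuming conformal_intervals is an array whose first and second columns hold the lower and upper bounds for each test input:

import numpy as np

targets = np.squeeze(test_targets)  # ensure a flat array of targets

# Empirical coverage: fraction of test targets falling inside their
# conformal interval; with error=0.05 this should be roughly 95% or higher.
coverage = np.mean(
    (targets >= conformal_intervals[:, 0])
    & (targets <= conformal_intervals[:, 1])
)

# Average interval width, a simple measure of how sharp the intervals are.
avg_width = np.mean(conformal_intervals[:, 1] - conformal_intervals[:, 0])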