# Methods

This section lists published methods for estimation and calibration of uncertainty supported by Fortuna.

## Posterior approximation methods

**Maximum-A-Posteriori (MAP)** [Bassett et al., 2018]: Approximates the posterior distribution with a Dirac delta centered at its estimated mode. It is the fastest and crudest posterior approximation method supported by Fortuna, and can be seen as a form of regularized Maximum Likelihood Estimation (MLE).
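To see how MAP relates to regularized MLE, consider a toy model where everything has a closed form: Gaussian observations with a Gaussian prior on the mean. This is a minimal numpy sketch, not Fortuna's API; `map_estimate` is an illustrative name.

```python
import numpy as np

# Toy MAP estimate: observations y_i ~ N(theta, 1) with prior theta ~ N(0, tau^2).
# The negative log-posterior is 0.5 * sum((y - theta)^2) + 0.5 * theta^2 / tau^2
# (up to constants), i.e. an L2-regularized MLE objective, with minimizer:
def map_estimate(y, tau2):
    return y.sum() / (len(y) + 1.0 / tau2)

y = np.array([1.2, 0.8, 1.0, 1.1])
theta_map = map_estimate(y, tau2=10.0)
theta_mle = y.mean()  # the unregularized MLE is just the sample mean
# The prior shrinks the MAP estimate toward zero relative to the MLE.
assert 0.0 < theta_map < theta_mle
```

As the prior variance `tau2` grows, the regularization vanishes and the MAP estimate converges to the MLE.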

**Automatic Differentiation Variational Inference (ADVI)** [Kucukelbir et al., 2017]: A variational inference approach that approximates the posterior distribution with a diagonal multivariate Gaussian distribution.

**Laplace approximation** [Daxberger et al., 2021]: The Laplace approximation approximates the posterior distribution with a Gaussian distribution. The mean is given by the MAP, i.e. an estimate of the mode of the posterior. The covariance matrix is expressed as the inverse of the Hessian of the negative log-posterior. In practice, the Hessian is also approximated; Fortuna currently supports a diagonal Generalized Gauss-Newton Hessian approximation.
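In one dimension the recipe is transparent: the Laplace variance is the inverse of the second derivative of the negative log-posterior at the MAP. The following toy sketch (not Fortuna's API; names are illustrative) uses a finite-difference Hessian on a posterior that happens to be exactly Gaussian, so the approximation is exact.

```python
import numpy as np

def neg_log_posterior(theta):
    # Exactly Gaussian for illustration: N(mean=2.0, var=0.25).
    return 0.5 * (theta - 2.0) ** 2 / 0.25

def laplace_1d(f, theta_map, eps=1e-4):
    # Finite-difference Hessian at the MAP; its inverse is the Laplace variance.
    hess = (f(theta_map + eps) - 2 * f(theta_map) + f(theta_map - eps)) / eps**2
    return theta_map, 1.0 / hess

mean, var = laplace_1d(neg_log_posterior, theta_map=2.0)
assert abs(var - 0.25) < 1e-6  # exact here because the posterior is Gaussian
```

For non-Gaussian posteriors the same construction gives a local Gaussian approximation around the mode.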

**Deep ensemble** [Lakshminarayanan et al., 2017]: An ensemble of Maximum-A-Posteriori (MAP) estimates trained from different initializations, approximating the posterior distribution as a mixture of Dirac deltas.
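Because the approximate posterior is a mixture of Dirac deltas, the predictive distribution is simply the average of the members' predictive distributions. A toy numpy sketch (not Fortuna's API; `member_probs` stands in for per-member class probabilities):

```python
import numpy as np

# Class probabilities produced by three independently trained ensemble members.
member_probs = np.array([
    [0.7, 0.3],  # member 1
    [0.5, 0.5],  # member 2
    [0.6, 0.4],  # member 3
])

# The ensemble predictive is the uniform mixture over members.
ensemble_probs = member_probs.mean(axis=0)
assert np.allclose(ensemble_probs, [0.6, 0.4])
assert np.isclose(ensemble_probs.sum(), 1.0)  # still a valid distribution
```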

**SWAG** [Maddox et al., 2019]: SWAG approximates the posterior with a Gaussian distribution. After a convergence regime is reached, the mean is obtained by averaging checkpoints along the stochastic optimization trajectory. The covariance is also estimated empirically along the trajectory, and is made of a diagonal component and a low-rank non-diagonal one.
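The diagonal part of SWAG reduces to running first and second moments of the flattened weights along the trajectory. A toy sketch under that simplification (the low-rank term is omitted for brevity; this is not Fortuna's API):

```python
import numpy as np

# Flattened weight checkpoints collected along the tail of the SGD trajectory.
checkpoints = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])

mean = checkpoints.mean(axis=0)                      # SWAG mean
second_moment = (checkpoints ** 2).mean(axis=0)
diag_cov = np.clip(second_moment - mean ** 2, 0.0, None)  # diagonal covariance

assert np.allclose(mean, [1.0, 2.0])
# Posterior samples: mean plus elementwise Gaussian noise.
rng = np.random.default_rng(0)
sample = mean + np.sqrt(diag_cov) * rng.standard_normal(2)
assert sample.shape == (2,)
```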

**Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)** [Chen et al., 2014]: SGHMC approximates the posterior as the steady-state distribution of a Markov chain with Hamiltonian dynamics. After the initial "burn-in" phase, each step of the chain generates samples from the posterior.
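The dynamics can be sketched on a toy 1D target where the posterior is a standard Gaussian, using exact gradients in place of stochastic minibatch gradients. Step sizes and names are illustrative, not Fortuna's API.

```python
import numpy as np

# SGHMC on exp(-U) with U(theta) = theta**2 / 2, i.e. a N(0, 1) target.
rng = np.random.default_rng(0)
lr, friction = 0.01, 0.1
theta, v = 0.0, 0.0
samples = []
for step in range(60_000):
    grad_u = theta                                  # dU/dtheta for this target
    noise = np.sqrt(2 * friction * lr) * rng.standard_normal()
    v = v - lr * grad_u - friction * v + noise      # momentum update with friction
    theta = theta + v                               # position update
    if step >= 10_000:                              # discard the burn-in phase
        samples.append(theta)

# After burn-in, the chain samples approximately from N(0, 1).
assert abs(np.mean(samples)) < 0.2
assert abs(np.var(samples) - 1.0) < 0.3
```

The friction term and matched injected noise are what allow SGHMC to tolerate the extra noise of stochastic gradients in the full algorithm.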

**Cyclical Stochastic Gradient Langevin Dynamics (Cyclical SGLD)** [Zhang et al., 2020]: Cyclical SGLD adopts a cyclical cosine step-size schedule, alternating between *exploration* and *sampling* stages to better explore the multimodal posteriors of deep neural networks.
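The cyclical cosine schedule is easy to state: within each cycle the step size decays from its initial value towards zero, then resets. Large early steps drive exploration; small late steps allow sampling. A sketch with illustrative names:

```python
import math

def cyclical_step_size(k, total_steps, num_cycles, alpha_0):
    # Step size at iteration k: cosine decay within the current cycle.
    cycle_len = math.ceil(total_steps / num_cycles)
    t = (k % cycle_len) / cycle_len          # position within the cycle, in [0, 1)
    return 0.5 * alpha_0 * (math.cos(math.pi * t) + 1.0)

alpha_0 = 0.1
# Start of a cycle: full step size; end of a cycle: close to zero.
assert math.isclose(cyclical_step_size(0, 1000, 4, alpha_0), alpha_0)
assert cyclical_step_size(249, 1000, 4, alpha_0) < 0.01 * alpha_0
# The schedule resets at the start of the next cycle.
assert math.isclose(cyclical_step_size(250, 1000, 4, alpha_0), alpha_0)
```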

**Spectral-normalized Neural Gaussian Process (SNGP)** [Liu et al., 2020]: A Deep Kernel Learning (DKL; see [Wilson et al., 2015]) approach with spectral normalization and random features. It is useful for improving epistemic uncertainty estimation out-of-distribution.

## Parametric calibration methods

Fortuna supports parametric calibration by adding an output calibration model on top of the outputs of the model used for training or posterior approximation, and training its parameters. A **temperature scaling** model [Guo et al., 2017] is supported explicitly, for both classification and regression, where the outputs are calibrated using a single scaling parameter.
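For classification, temperature scaling amounts to dividing the logits by a scalar temperature T before the softmax; T is fit by minimizing the negative log-likelihood on held-out calibration data. The minimal numpy sketch below just applies a given T (not Fortuna's API):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([4.0, 1.0, 0.0])
probs = softmax(logits)
calibrated = softmax(logits / 2.0)  # temperature T = 2 softens the distribution

# Scaling preserves the predicted class but reduces its confidence.
assert probs.argmax() == calibrated.argmax()
assert calibrated.max() < probs.max()
```

With T > 1 the probabilities are softened (useful for overconfident networks); with T < 1 they are sharpened; T = 1 leaves them unchanged.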

## Conformal prediction methods

We support conformal prediction methods for classification and regression.

For classification:

**A simple conformal prediction sets method** [Vovk et al., 2005]: A simple conformal prediction method that derives a score function from the probability assigned to the most likely class.
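Concretely, the score of a calibration point is one minus the probability the model assigns to its true class; the prediction set for a new input keeps every class whose probability is at least one minus the calibrated quantile. A toy numpy sketch, not Fortuna's API:

```python
import numpy as np

def conformal_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    # Non-conformity score: 1 minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-adjusted quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    # Keep all classes with probability at least 1 - qhat.
    return np.where(test_probs >= 1.0 - qhat)[0]

# Toy calibration set: the model assigns 0.8 to the true class everywhere.
cal_probs = np.tile([[0.8, 0.15, 0.05]], (99, 1))
cal_labels = np.zeros(99, dtype=int)
pred_set = conformal_set(cal_probs, cal_labels, np.array([0.85, 0.10, 0.05]))
assert pred_set.tolist() == [0]  # only the confident class survives
```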

**An adaptive conformal prediction sets method** [Romano et al., 2020]: A conformal prediction method that derives a score function from the full vector of class probabilities.
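In this adaptive variant, the score of a calibration point is the cumulative probability mass of all classes at least as likely as the true class; at test time, classes are added in decreasing probability until the cumulative mass reaches the calibrated threshold. A toy sketch, not Fortuna's API:

```python
import numpy as np

def aps_score(probs, label):
    order = np.argsort(probs)[::-1]      # classes sorted most likely first
    cum = np.cumsum(probs[order])
    # Cumulative mass up to and including the true class.
    return cum[np.where(order == label)[0][0]]

def aps_set(test_probs, qhat):
    order = np.argsort(test_probs)[::-1]
    cum = np.cumsum(test_probs[order])
    k = np.searchsorted(cum, qhat) + 1   # smallest prefix covering mass qhat
    return np.sort(order[:min(k, len(order))])

assert np.isclose(aps_score(np.array([0.6, 0.3, 0.1]), label=1), 0.9)
assert aps_set(np.array([0.6, 0.3, 0.1]), qhat=0.8).tolist() == [0, 1]
```

Because the score uses the whole probability vector, the resulting sets adapt in size to how spread out the predicted distribution is.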

**Adaptive conformal inference** [Gibbs et al., 2021]: A conformal prediction method that corrects the coverage of conformal prediction in a sequential prediction framework (e.g. time series forecasting) when the data distribution shifts over time.
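The core of the method is a one-line online update: after each time step, the working miscoverage level is nudged down when the interval missed the outcome (widening future intervals) and up when it covered (tightening them), so empirical coverage tracks the target under shift. An illustrative sketch:

```python
def aci_update(alpha_t, covered, target_alpha=0.1, gamma=0.01):
    # err is 1 when the last interval missed the outcome, 0 otherwise.
    err = 0.0 if covered else 1.0
    # Online update: alpha_{t+1} = alpha_t + gamma * (target_alpha - err).
    return alpha_t + gamma * (target_alpha - err)

alpha_t = 0.1
alpha_after_miss = aci_update(alpha_t, covered=False)
alpha_after_hit = aci_update(alpha_t, covered=True)
# A miss lowers alpha_t (wider future intervals); a hit raises it.
assert alpha_after_miss < alpha_t < alpha_after_hit
```

The learning rate `gamma` trades off how quickly coverage adapts against how much the interval widths fluctuate.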

**BatchMVP** [Jung C. et al., 2022]: A conformal prediction algorithm that satisfies coverage guarantees conditioned on group membership and non-conformity thresholds.

**Multicalibrate** [Hébert-Johnson Ú. et al., 2017], [Roth A., Algorithm 15]: Unlike standard conformal prediction methods, this algorithm returns scalar calibrated score values for each data point. For example, in binary classification, it can return calibrated probabilities of predictions. This method satisfies coverage guarantees conditioned on group membership and model values.

**One-Shot Multicalibrate** [Roth A., Algorithm 15]: Unlike standard conformal prediction methods, this algorithm returns scalar calibrated score values for each data point. This method satisfies coverage guarantees conditioned on model values.

For regression:

**Conformalized quantile regression** [Romano et al., 2019]: A conformal prediction method that takes a coverage interval as input and calibrates it.
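The calibration step is simple: given estimated interval endpoints, compute scores measuring how far each calibration target falls outside the interval, then widen (or shrink) the test interval by the adjusted quantile of those scores. A toy numpy sketch, not Fortuna's API:

```python
import numpy as np

def cqr_interval(cal_lo, cal_hi, cal_y, test_lo, test_hi, alpha=0.1):
    n = len(cal_y)
    # Score: signed distance of the target outside the estimated interval.
    scores = np.maximum(cal_lo - cal_y, cal_y - cal_hi)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_lo - qhat, test_hi + qhat

# Toy calibration data: every target overshoots the upper endpoint by 0.5.
cal_lo, cal_hi = np.zeros(99), np.ones(99)
cal_y = np.full(99, 1.5)
lo, hi = cqr_interval(cal_lo, cal_hi, cal_y, test_lo=2.0, test_hi=3.0)
assert np.isclose(lo, 1.5) and np.isclose(hi, 3.5)  # widened by qhat = 0.5
```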

**Conformal interval from scalar uncertainty measure** [Angelopoulos et al., 2022]: A conformal prediction method that takes a scalar measure of uncertainty (e.g. the standard deviation) as input and returns a conformal interval.
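With a point prediction `mu` and a scalar uncertainty `sigma` (e.g. a predictive standard deviation), one can calibrate the normalized residuals and return `mu ± qhat * sigma`. A toy numpy sketch under those assumptions, not Fortuna's API:

```python
import numpy as np

def scalar_conformal_interval(cal_mu, cal_sigma, cal_y, mu, sigma, alpha=0.1):
    n = len(cal_y)
    # Score: residual normalized by the model's own uncertainty estimate.
    scores = np.abs(cal_y - cal_mu) / cal_sigma
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    return mu - qhat * sigma, mu + qhat * sigma

# Toy calibration data: every residual is exactly 2 sigma.
cal_mu, cal_sigma = np.zeros(99), np.ones(99)
cal_y = np.full(99, 2.0)
lo, hi = scalar_conformal_interval(cal_mu, cal_sigma, cal_y, mu=1.0, sigma=0.5)
assert np.isclose(lo, 0.0) and np.isclose(hi, 2.0)  # 1.0 +/- 2 * 0.5
```

Inputs with larger `sigma` automatically receive wider intervals, which is the point of normalizing by the uncertainty measure.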

**Jackknife+, jackknife-minmax and CV+** [Barber et al., 2021]: Methods based on leave-one-out and K-fold cross-validation that, from model outputs alone, provide conformal intervals satisfying minimal coverage properties.

**BatchMVP** [Jung C. et al., 2022]: A conformal prediction algorithm that satisfies coverage guarantees conditioned on group membership and non-conformity thresholds.

**EnbPI** [Xu et al., 2021]: A conformal prediction method for time series regression based on data bootstrapping.

**Multicalibrate** [Hébert-Johnson Ú. et al., 2017], [Roth A., Algorithm 15]: Unlike standard conformal prediction methods, this algorithm returns scalar calibrated score values for each data point. This method satisfies coverage guarantees conditioned on group membership and model values.

**One-Shot Multicalibrate** [Roth A., Algorithm 15]: Unlike standard conformal prediction methods, this algorithm returns scalar calibrated score values for each data point. This method satisfies coverage guarantees conditioned on model values.

**Adaptive conformal inference** [Gibbs et al., 2021]: A conformal prediction method that corrects the coverage of conformal prediction in a sequential prediction framework (e.g. time series forecasting) when the data distribution shifts over time.

## Out-of-distribution (OOD) detection

We support the following methods for OOD detection in classification:

**Mahalanobis distance classifier** [Lee et al., 2018]: A classifier based on the Mahalanobis distance. It estimates an OOD score for each input.
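The underlying idea: fit a class-conditional Gaussian with a shared covariance on in-distribution features, and score a new input by its minimum Mahalanobis distance to any class mean. A toy numpy sketch with illustrative names, not Fortuna's API:

```python
import numpy as np

def mahalanobis_scores(train_feats, train_labels, test_feats):
    classes = np.unique(train_labels)
    # Per-class means and a covariance shared across classes.
    means = np.stack([train_feats[train_labels == c].mean(0) for c in classes])
    centered = train_feats - means[train_labels]
    cov = centered.T @ centered / len(train_feats)
    prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    # Squared Mahalanobis distance of each test point to each class mean.
    diffs = test_feats[:, None, :] - means[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diffs, prec, diffs)
    return d2.min(axis=1)  # larger distance => more likely OOD

rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
labels = np.repeat([0, 1], 200)
scores = mahalanobis_scores(feats, labels, np.array([[0.0, 0.0], [20.0, 20.0]]))
assert scores[1] > scores[0]  # the far-away point gets a larger OOD score
```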

**Deep Deterministic Uncertainty (DDU)** [Mukhoti et al., 2022]: Similar to the Mahalanobis distance classifier, it fits a Gaussian for each label and estimates an OOD score for each input.