Losses
This submodule consists of loss functions and divergences used in calculating the error of a neural network model. Loss functions are typically used to define a “distance” (used colloquially, not as a formal metric) between a prediction and a true value from a dataset. Typically, loss functions map two tensors to a scalar.
In contrast, divergences define a “distance” between a prior and posterior distribution during the training of a Bayesian neural network. These usually map a neural network (more specifically, a representation of the distribution of weights) to a scalar. While both terms contribute to the total loss during training, they are used in very different ways.
For most use cases, we recommend the corresponding Module of the same name rather than calling these functions directly.
For example, using the Module GaussianKullbackLeiblerDivergence is preferred over calling the function
gaussian_kullback_leibler_divergence(). If you do need to import these functions, we recommend the following
import statements to prevent naming conflicts with torch.nn.functional.
>>> import torch.nn.functional as F
>>> import UQpy.scientific_machine_learning.functional as func
Gaussian Kullback-Leibler Divergence
This is an implementation of Kullback and Leibler’s work in closed form [37].
The function gaussian_kullback_leibler_divergence() is imported using the following command:
>>> from UQpy.scientific_machine_learning.functional import gaussian_kullback_leibler_divergence
- gaussian_kullback_leibler_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, reduction='sum')
Compute the Gaussian Kullback-Leibler divergence for a prior and posterior distribution
- Parameters:
  - posterior_mu (Tensor) – Mean of the posterior distribution
  - posterior_sigma (Tensor) – Standard deviation of the posterior distribution
  - prior_mu (Tensor) – Mean of the prior distribution
  - prior_sigma (Tensor) – Standard deviation of the prior distribution
  - reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
- Return type:
  Tensor
- Returns:
  Gaussian KL divergence between prior and posterior distributions
- Raises:
  ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
Formula
The Gaussian Kullback-Leibler divergence \(D_{KL}\) for two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as
\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
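As a rough illustration, the standard closed form for the KL divergence between two univariate normals, \(\log(\sigma_1/\sigma_0) + (\sigma_0^2 + (\mu_0-\mu_1)^2)/(2\sigma_1^2) - 1/2\), can be sketched in plain Python. This is not UQpy's implementation: the names `gaussian_kl` and `reduce_terms` are hypothetical, the inputs are scalars rather than tensors, and the example parameters are made up. The reduction helper only mimics the documented ‘none’, ‘mean’, and ‘sum’ options.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) for scalar inputs (hypothetical helper)."""
    return 0.5 * (
        2.0 * math.log(sigma1 / sigma0)
        + (sigma0 ** 2 + (mu0 - mu1) ** 2) / sigma1 ** 2
        - 1.0
    )


def reduce_terms(terms, reduction="sum"):
    """Mimic the documented 'none' / 'mean' / 'sum' options on a list of floats."""
    if reduction == "none":
        return list(terms)
    if reduction == "mean":
        return sum(terms) / len(terms)
    if reduction == "sum":
        return sum(terms)
    raise ValueError("reduction must be one of 'none', 'mean', or 'sum'")


# Made-up example: two posterior weights (mu, sigma) against a standard normal prior.
posterior = [(0.0, 1.0), (1.0, 0.5)]
terms = [gaussian_kl(mu, sigma, 0.0, 1.0) for mu, sigma in posterior]
total = reduce_terms(terms, "sum")
```

The first weight matches the prior exactly, so it contributes nothing to `total`; only the second weight is penalized.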
Monte Carlo Kullback-Leibler Divergence
This is based on Kullback and Leibler’s work [37].
The function mc_kullback_leibler_divergence() is imported using the following command:
>>> from UQpy.scientific_machine_learning.functional import mc_kullback_leibler_divergence
- mc_kullback_leibler_divergence(posterior_distributions, prior_distributions, n_samples=1000, reduction='sum')
Compute the Kullback-Leibler divergence by sampling for a prior and posterior distribution
- Parameters:
  - posterior_distributions (list) – List of UQpy distributions defining the variational posterior
  - prior_distributions (list) – List of UQpy distributions defining the prior
  - n_samples (int) – Number of samples in the Monte Carlo estimation
  - reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
- Return type:
  Tensor
- Returns:
  KL divergence between prior and posterior distributions
- Raises:
  ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
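To give a feel for what a sampling-based estimate involves, here is a minimal plain-Python sketch for a single pair of normal distributions. It assumes the estimator draws samples from the posterior and averages the log-density ratio, which is one common approach and may differ from what UQpy does internally. The names `normal_logpdf` and `mc_kl_normal` are hypothetical, and scalars stand in for the lists of UQpy distributions that the documented function takes.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mc_kl_normal(post_mu, post_sigma, prior_mu, prior_sigma, n_samples=1000, rng=None):
    """Estimate KL(posterior || prior) by averaging log-density ratios over posterior samples."""
    if rng is None:
        rng = random.Random()
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(post_mu, post_sigma)  # sample from the posterior
        total += normal_logpdf(x, post_mu, post_sigma) - normal_logpdf(x, prior_mu, prior_sigma)
    return total / n_samples


# Made-up example: compare the estimate with the closed form for two normals.
rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = mc_kl_normal(1.0, 0.5, 0.0, 1.0, n_samples=20000, rng=rng)
exact = math.log(1.0 / 0.5) + (0.5 ** 2 + 1.0 ** 2) / 2.0 - 0.5
```

The estimate is noisy, and its error shrinks roughly as one over the square root of `n_samples`. Sampling is mainly useful when no closed form is available for the chosen distributions.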
Generalized Jensen-Shannon Divergence
This implements a Jensen-Shannon formula [38].
The function generalized_jensen_shannon_divergence() is imported using the following command:
>>> from UQpy.scientific_machine_learning.functional import generalized_jensen_shannon_divergence
- generalized_jensen_shannon_divergence(posterior_distributions, prior_distributions, n_samples=1000, alpha=0.5, reduction='sum', device=None)
Compute the generalized Jensen-Shannon divergence for a prior and posterior distribution
- Parameters:
  - posterior_distributions (list) – List of UQpy distributions defining the variational posterior
  - prior_distributions (list) – List of UQpy distributions defining the prior
  - n_samples (int) – Number of samples in the Monte Carlo estimation. Default: 1,000
  - alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
  - reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
- Return type:
  Tensor
- Returns:
  JS divergence between prior and posterior distributions
- Raises:
  - ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
  - RuntimeError – If len(posterior_distributions) is not equal to len(prior_distributions)
Formula
The Jensen-Shannon divergence \(D_{JS}\) is computed as
\[D_{JS}(Q, P) = (1- \alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]
where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
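The formula \(D_{JS}(Q, P) = (1-\alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\) with \(M=\alpha Q + (1-\alpha) P\) has no closed form even for Gaussians, because the mixture density sits inside a logarithm. The sketch below estimates each KL term by sampling, for a single pair of normals. It is an illustration under assumptions, not UQpy's code: `normal_pdf` and `mc_generalized_js` are hypothetical names, and each distribution is given as a `(mu, sigma)` tuple instead of a UQpy distribution object.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, rng=None):
    """Estimate (1 - alpha) * KL(Q || M) + alpha * KL(P || M), M = alpha*Q + (1 - alpha)*P.

    q and p are (mu, sigma) tuples describing normal distributions.
    """
    if rng is None:
        rng = random.Random()

    def mixture_pdf(x):
        return alpha * normal_pdf(x, *q) + (1.0 - alpha) * normal_pdf(x, *p)

    def kl_to_mixture(dist):
        # Average log(dist(x) / M(x)) over samples drawn from dist.
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*dist)
            total += math.log(normal_pdf(x, *dist) / mixture_pdf(x))
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


rng = random.Random(0)  # fixed seed so the run is repeatable
js_same = mc_generalized_js((0.0, 1.0), (0.0, 1.0), n_samples=1000, rng=rng)
js_far = mc_generalized_js((0.0, 1.0), (3.0, 1.0), n_samples=5000, rng=rng)
```

With \(\alpha = 0.5\) every sampled log-ratio is at most \(\log 2\), so the estimate stays bounded no matter how far apart the two distributions are. This boundedness is one way the Jensen-Shannon divergence differs from the Kullback-Leibler divergence.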
Geometric Jensen-Shannon Divergence
This implements a Jensen-Shannon formula [38] [39].
The function geometric_jensen_shannon_divergence() is imported using the following command:
>>> from UQpy.scientific_machine_learning.functional import geometric_jensen_shannon_divergence
- geometric_jensen_shannon_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, alpha=0.5, reduction='sum')
Compute the Geometric Jensen-Shannon divergence for Gaussian prior and posterior distributions
- Parameters:
  - posterior_mu (Tensor) – Mean of the posterior distribution
  - posterior_sigma (Tensor) – Standard deviation of the posterior distribution
  - prior_mu (Tensor) – Mean of the prior distribution
  - prior_sigma (Tensor) – Standard deviation of the prior distribution
  - alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
  - reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
- Return type:
  Tensor
- Returns:
  Geometric JS divergence between prior and posterior distributions
Formula
The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as
\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]
where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for the Geometric Jensen-Shannon divergence is given as
\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + \frac{(1-\alpha)(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]
where \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\) and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\).
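The Gaussian closed form can be checked numerically with a short plain-Python sketch. It reads the subscripts as \(P = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\), which is the reading under which \(\sigma_\alpha^2\) matches \(M=P^\alpha Q^{(1-\alpha)}\). The names `gaussian_kl` and `geometric_js` are hypothetical, the inputs are scalars rather than tensors, and this is not UQpy's implementation.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) for scalar inputs."""
    return 0.5 * (
        2.0 * math.log(sigma1 / sigma0)
        + (sigma0 ** 2 + (mu0 - mu1) ** 2) / sigma1 ** 2
        - 1.0
    )


def geometric_js(mu0, sigma0, mu1, sigma1, alpha=0.5):
    """Closed-form geometric JS divergence between N(mu0, sigma0^2) and N(mu1, sigma1^2)."""
    var0, var1 = sigma0 ** 2, sigma1 ** 2
    # Variance and mean of the weighted geometric mean M = P^alpha * Q^(1 - alpha).
    var_a = 1.0 / (alpha / var0 + (1.0 - alpha) / var1)
    mu_a = var_a * (alpha * mu0 / var0 + (1.0 - alpha) * mu1 / var1)
    return 0.5 * (
        ((1.0 - alpha) * var0 + alpha * var1) / var_a
        + math.log(var_a) - (1.0 - alpha) * math.log(var0) - alpha * math.log(var1)
        + (1.0 - alpha) * (mu_a - mu0) ** 2 / var_a
        + alpha * (mu_a - mu1) ** 2 / var_a
        - 1.0
    )


# Made-up example parameters for the two limiting cases of alpha.
at_zero = geometric_js(1.0, 0.5, 0.0, 1.0, alpha=0.0)  # reduces to KL(P || Q)
at_one = geometric_js(1.0, 0.5, 0.0, 1.0, alpha=1.0)   # reduces to KL(Q || P)
```

At \(\alpha = 0\) the geometric mean collapses onto \(Q\) and the divergence reduces to \(D_{KL}(P, Q)\); at \(\alpha = 1\) it collapses onto \(P\) and the divergence reduces to \(D_{KL}(Q, P)\). Intermediate values of \(\alpha\) interpolate between the two KL directions.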