Losses

This submodule consists of loss functions and divergences used in calculating the error of a neural network model. Loss functions are typically used to define a “distance” (used colloquially, not as a formal metric) between a prediction and the true value from a dataset. Typically, loss functions map two tensors to a scalar.

In contrast, divergences define a “distance” between a prior and posterior distribution during the training of a Bayesian neural network. These usually map a neural network (more specifically, a representation of the distribution of weights) to a scalar. While both terms contribute to the total loss during training, they are used in very different ways.

For most use cases, we recommend the corresponding Module of the same name rather than calling these functions directly. For example, using the Module GaussianKullbackLeiblerDivergence is preferred over calling the function gaussian_kullback_leibler_divergence(). If you do need to import these functions, we recommend the following import statements to prevent naming conflicts with torch.nn.functional.

>>> import torch.nn.functional as F
>>> import UQpy.scientific_machine_learning.functional as func

Gaussian Kullback-Leibler Divergence

This function implements the Kullback-Leibler divergence [37] in closed form for Gaussian distributions. The function gaussian_kullback_leibler_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import gaussian_kullback_leibler_divergence

gaussian_kullback_leibler_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, reduction='sum')[source]

Compute the Gaussian Kullback-Leibler divergence for a prior and posterior distribution

Parameters:

posterior_mu (Tensor) – Mean of the posterior distribution
posterior_sigma (Tensor) – Standard deviation of the posterior distribution
prior_mu (Tensor) – Mean of the prior distribution
prior_sigma (Tensor) – Standard deviation of the prior distribution
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

Gaussian KL divergence between prior and posterior distributions

Raises:

ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
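The three reduction modes can be illustrated with a small stand-alone sketch. It is not UQpy's code: the helper name apply_reduction is hypothetical, it operates on a plain Python list rather than a Tensor, and it assumes 'mean' averages over all element-wise divergence values.

```python
def apply_reduction(values, reduction="sum"):
    """Reduce a list of element-wise divergence values (hypothetical helper)."""
    if reduction == "none":
        # No reduction: return the element-wise values unchanged
        return list(values)
    if reduction == "mean":
        # Average over all elements (an assumption about how 'mean' is defined)
        return sum(values) / len(values)
    if reduction == "sum":
        # Add up all elements, matching the documented default
        return sum(values)
    raise ValueError(
        f"reduction must be 'none', 'mean', or 'sum', got {reduction!r}"
    )


elementwise = [0.5, 1.5, 1.0]
print(apply_reduction(elementwise, "none"))  # [0.5, 1.5, 1.0]
print(apply_reduction(elementwise, "mean"))  # 1.0
print(apply_reduction(elementwise, "sum"))   # 3.0
```

Any other string, such as 'max', raises ValueError, mirroring the behavior documented above.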

Formula

The Gaussian Kullback-Leibler divergence \(D_{KL}\) for two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as

\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
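As a sanity check, the standard closed form for the KL divergence between two univariate normals can be evaluated in plain Python. This is a scalar sketch, not UQpy's tensor implementation. The function name gaussian_kl is hypothetical, and it assumes index 0 refers to the first distribution and index 1 to the second.

```python
import math


def gaussian_kl(mu_0: float, sigma_0: float, mu_1: float, sigma_1: float) -> float:
    """KL divergence from N(mu_0, sigma_0^2) to N(mu_1, sigma_1^2) (hypothetical helper)."""
    if sigma_0 <= 0 or sigma_1 <= 0:
        raise ValueError("standard deviations must be positive")
    # 2 * log(sigma_1 / sigma_0): log of the variance ratio
    log_ratio = 2.0 * math.log(sigma_1 / sigma_0)
    # (sigma_0^2 + (mu_0 - mu_1)^2) / sigma_1^2: variance ratio plus scaled mean shift
    quadratic = (sigma_0**2 + (mu_0 - mu_1) ** 2) / sigma_1**2
    return 0.5 * (log_ratio + quadratic - 1.0)


print(gaussian_kl(0.0, 1.0, 0.0, 1.0))  # 0.0, identical distributions
print(gaussian_kl(1.0, 1.0, 0.0, 1.0))  # 0.5, unit mean shift with unit variances
```

The divergence vanishes only when both the means and the standard deviations agree, and it is not symmetric in its two arguments.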

Monte Carlo Kullback-Leibler Divergence

This is based on Kullback and Leibler’s work [37]. The function mc_kullback_leibler_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import mc_kullback_leibler_divergence

mc_kullback_leibler_divergence(posterior_distributions, prior_distributions, n_samples=1000, reduction='sum')[source]

Compute the Kullback-Leibler divergence by sampling for a prior and posterior distribution

Parameters:

posterior_distributions (list) – List of UQpy distributions defining the variational posterior
prior_distributions (list) – List of UQpy distributions defining the prior
n_samples (int) – Number of samples in the Monte Carlo estimation. Default: 1,000
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

KL divergence between prior and posterior distributions

Raises:

ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
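The idea behind a sampling-based estimate can be sketched for a single pair of univariate normals. This is an illustration only and does not use UQpy distributions. It assumes samples are drawn from the posterior and that the estimate is the sample mean of the log-density ratio; the names normal_log_pdf and mc_kl_normal are hypothetical.

```python
import math
import random


def normal_log_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mc_kl_normal(mu_q, sigma_q, mu_p, sigma_p, n_samples=1000, seed=None):
    """Monte Carlo estimate of KL(q, p) for two normals (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the first distribution q (assumed here to be the posterior)
        x = rng.gauss(mu_q, sigma_q)
        # Accumulate the log-density ratio log q(x) - log p(x)
        total += normal_log_pdf(x, mu_q, sigma_q) - normal_log_pdf(x, mu_p, sigma_p)
    return total / n_samples


# The exact value for q = N(1, 1) and p = N(0, 1) is 0.5
estimate = mc_kl_normal(1.0, 1.0, 0.0, 1.0, n_samples=20000, seed=0)
print(estimate)
```

The estimate is noisy, with an error that shrinks roughly as the inverse square root of n_samples, which is why a closed form is preferable when both distributions are Gaussian.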

Generalized Jensen-Shannon Divergence

This function estimates the generalized Jensen-Shannon divergence [38] by Monte Carlo sampling. The function generalized_jensen_shannon_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import generalized_jensen_shannon_divergence

generalized_jensen_shannon_divergence(posterior_distributions, prior_distributions, n_samples=1000, alpha=0.5, reduction='sum', device=None)[source]

Compute the generalized Jensen-Shannon divergence for a prior and posterior distribution

Parameters:

posterior_distributions (list) – List of UQpy distributions defining the variational posterior
prior_distributions (list) – List of UQpy distributions defining the prior
n_samples (int) – Number of samples in the Monte Carlo estimation. Default: 1,000
alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

JS divergence between prior and posterior distributions

Raises:

ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
RuntimeError – If len(posterior_distributions) is not equal to len(prior_distributions)

Formula

The Jensen-Shannon divergence \(D_{JS}\) is computed as

\[D_{JS}(Q, P) = (1- \alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
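The formula can be sketched for one pair of univariate normals, estimating each KL term against the mixture by sampling. This is a minimal illustration under stated assumptions, not UQpy's implementation: the distributions are passed as (mean, standard deviation) tuples, each KL term is estimated with samples from its own first argument, and the names normal_pdf and mc_generalized_js are hypothetical.

```python
import math
import random


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, seed=None):
    """Estimate (1 - alpha) KL(Q, M) + alpha KL(P, M) with M = alpha Q + (1 - alpha) P."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)

    def mixture_pdf(x):
        # M = alpha * Q + (1 - alpha) * P
        return alpha * normal_pdf(x, *q) + (1.0 - alpha) * normal_pdf(x, *p)

    def kl_to_mixture(dist):
        # Sample from dist and average log(dist(x) / M(x)).
        # Note: densities can underflow for very distant distributions.
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*dist)
            total += math.log(normal_pdf(x, *dist) / mixture_pdf(x))
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


print(mc_generalized_js((0.0, 1.0), (3.0, 1.0), alpha=0.5, n_samples=5000, seed=0))
```

With \(\alpha = 0.5\) the mixture weights both distributions equally and the result is bounded above by \(\log 2\), unlike the Kullback-Leibler divergence, which is unbounded.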

Geometric Jensen-Shannon Divergence

This function implements the geometric Jensen-Shannon divergence [38] [39] in closed form for Gaussian distributions. The function geometric_jensen_shannon_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import geometric_jensen_shannon_divergence

geometric_jensen_shannon_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, alpha=0.5, reduction='sum')[source]

Compute the Geometric Jensen-Shannon divergence for Gaussian prior and posterior distributions

Parameters:

posterior_mu (Tensor) – Mean of the posterior distribution
posterior_sigma (Tensor) – Standard deviation of the posterior distribution
prior_mu (Tensor) – Mean of the prior distribution
prior_sigma (Tensor) – Standard deviation of the prior distribution
alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

Geometric JS divergence between prior and posterior distributions

Formula

The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as

\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for Geometric Jensen-Shannon divergence is given as

\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + (1-\alpha) \frac{(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]

where \(P = \mathcal{N}(\mu_0, \sigma_0^2)\), \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\), \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\), and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\).
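The closed form can be checked numerically against its definition as a weighted sum of two Gaussian KL divergences to the geometric mean distribution. The following scalar sketch is not UQpy's tensor implementation. It assumes \(P\) carries index 0 and \(Q\) carries index 1, and the names gaussian_kl and geometric_js are hypothetical.

```python
import math


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL divergence from N(mu_a, sigma_a^2) to N(mu_b, sigma_b^2)."""
    return 0.5 * (
        2.0 * math.log(sigma_b / sigma_a)
        + (sigma_a**2 + (mu_a - mu_b) ** 2) / sigma_b**2
        - 1.0
    )


def geometric_js(mu_0, sigma_0, mu_1, sigma_1, alpha=0.5):
    """Closed-form geometric JS divergence for two univariate normals."""
    var_0, var_1 = sigma_0**2, sigma_1**2
    # Variance and mean of the geometric mean distribution M = P^alpha Q^(1 - alpha)
    var_a = 1.0 / (alpha / var_0 + (1.0 - alpha) / var_1)
    mu_a = var_a * (alpha * mu_0 / var_0 + (1.0 - alpha) * mu_1 / var_1)
    scale = ((1.0 - alpha) * var_0 + alpha * var_1) / var_a
    log_term = math.log(var_a) - (1.0 - alpha) * math.log(var_0) - alpha * math.log(var_1)
    shift = ((1.0 - alpha) * (mu_a - mu_0) ** 2 + alpha * (mu_a - mu_1) ** 2) / var_a
    return 0.5 * (scale + log_term + shift - 1.0)


# Compare the closed form with (1 - alpha) KL(P, M) + alpha KL(Q, M)
mu_0, sigma_0, mu_1, sigma_1, alpha = 0.3, 1.2, -0.5, 0.7, 0.3
var_a = 1.0 / (alpha / sigma_0**2 + (1.0 - alpha) / sigma_1**2)
mu_a = var_a * (alpha * mu_0 / sigma_0**2 + (1.0 - alpha) * mu_1 / sigma_1**2)
sigma_a = math.sqrt(var_a)
by_definition = (1.0 - alpha) * gaussian_kl(mu_0, sigma_0, mu_a, sigma_a) + alpha * gaussian_kl(
    mu_1, sigma_1, mu_a, sigma_a
)
print(geometric_js(mu_0, sigma_0, mu_1, sigma_1, alpha), by_definition)
```

The two printed values agree up to floating-point rounding, because the normalized geometric mean of two Gaussians is again Gaussian with variance \(\sigma_\alpha^2\) and mean \(\mu_\alpha\).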