Losses

This submodule consists of loss functions and divergences used in calculating the error of a neural network model. Loss functions are typically used to define a “distance” (used colloquially, not as a formal metric) between a prediction and true value from a dataset. Typically, loss functions map two tensors to a scalar.

In contrast, divergences define a “distance” between a prior and posterior distribution during the training of a Bayesian neural network. These usually map a neural network (more specifically, a representation of the distribution of weights) to a scalar. While both terms contribute to the total loss during training, they are used in very different ways.

For most use cases, we recommend the corresponding Module of the same name rather than calling these functions directly. For example, using the Module GaussianKullbackLeiblerDivergence is preferred over calling the function gaussian_kullback_leibler_divergence(). If you do need to import these functions, we recommend the following import statements to prevent naming conflicts with torch.nn.functional.

>>> import torch.nn.functional as F
>>> import UQpy.scientific_machine_learning.functional as func

Gaussian Kullback-Leibler Divergence

This is a closed-form implementation of Kullback and Leibler’s work [37]. The function gaussian_kullback_leibler_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import gaussian_kullback_leibler_divergence

gaussian_kullback_leibler_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, reduction='sum')

Compute the Gaussian Kullback-Leibler divergence for a prior and posterior distribution

Parameters:
  • posterior_mu (Tensor) – Mean of the posterior distribution

  • posterior_sigma (Tensor) – Standard deviation of the posterior distribution

  • prior_mu (Tensor) – Mean of the prior distribution

  • prior_sigma (Tensor) – Standard deviation of the prior distribution

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

Gaussian KL divergence between prior and posterior distributions

Raises:

ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’

Formula

The Gaussian Kullback-Leibler divergence \(D_{KL}\) for two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as

\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
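As an illustration, the closed-form divergence between two univariate Gaussians can be sketched in plain Python. This is not the UQpy implementation: it works on floats rather than tensors, the helper reduce_values is hypothetical, and treating index 0 as the posterior and index 1 as the prior is an assumption made for this example.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """Closed-form D_KL(N(mu0, sigma0^2) || N(mu1, sigma1^2)) for scalars."""
    return 0.5 * (
        2.0 * math.log(sigma1 / sigma0)
        + (sigma0**2 + (mu0 - mu1) ** 2) / sigma1**2
        - 1.0
    )


def reduce_values(values, reduction="sum"):
    """Hypothetical helper mirroring the 'none', 'mean', and 'sum' options."""
    if reduction == "none":
        return list(values)
    if reduction == "sum":
        return sum(values)
    if reduction == "mean":
        return sum(values) / len(values)
    raise ValueError("reduction must be one of 'none', 'mean', or 'sum'")


# (mean, standard deviation) pairs; assumed here to be posterior and prior.
posterior = [(0.0, 1.0), (0.0, 1.0), (0.5, 0.2)]
prior = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]

# One divergence per parameter pair, then a single scalar via reduction.
elementwise = [
    gaussian_kl(m0, s0, m1, s1) for (m0, s0), (m1, s1) in zip(posterior, prior)
]
total = reduce_values(elementwise, "sum")
average = reduce_values(elementwise, "mean")
```

The first pair has identical distributions, so its divergence is zero; the second differs only by a unit shift in the mean with unit variance, which gives 0.5.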

Monte Carlo Kullback-Leibler Divergence

This is based on Kullback and Leibler’s work [37]. The function mc_kullback_leibler_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import mc_kullback_leibler_divergence

mc_kullback_leibler_divergence(posterior_distributions, prior_distributions, n_samples=1000, reduction='sum')

Compute the Kullback-Leibler divergence by sampling for a prior and posterior distribution

Parameters:
  • posterior_distributions (list) – List of UQpy distributions defining the variational posterior

  • prior_distributions (list) – List of UQpy distributions defining the prior

  • n_samples (int) – Number of samples in the Monte Carlo estimation. Default: 1,000

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

KL divergence between prior and posterior distributions

Raises:

ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’
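
The idea of estimating a Kullback-Leibler divergence by sampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the UQpy implementation: it uses plain Python with Gaussian distributions given as (mean, standard deviation) values, it assumes samples are drawn from the posterior, and the names normal_logpdf and mc_kl are hypothetical.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((x - mu) / sigma) ** 2
    )


def mc_kl(post_mu, post_sigma, prior_mu, prior_sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of D_KL(posterior || prior).

    Averages log q(x) - log p(x) over samples x drawn from the posterior q.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(post_mu, post_sigma)
        total += normal_logpdf(x, post_mu, post_sigma) - normal_logpdf(
            x, prior_mu, prior_sigma
        )
    return total / n_samples


# Posterior N(0, 1) against prior N(1, 1); the closed-form value is 0.5.
estimate = mc_kl(0.0, 1.0, 1.0, 1.0, n_samples=20000)
```

The estimate fluctuates around the closed-form value with a standard error that shrinks as one over the square root of n_samples, which is the usual trade-off between cost and accuracy for this kind of estimator.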


Generalized Jensen-Shannon Divergence

This implements a Jensen-Shannon formula [38]. The function generalized_jensen_shannon_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import generalized_jensen_shannon_divergence

generalized_jensen_shannon_divergence(posterior_distributions, prior_distributions, n_samples=1000, alpha=0.5, reduction='sum', device=None)

Compute the generalized Jensen-Shannon divergence for a prior and posterior distribution

Parameters:
  • posterior_distributions (list) – List of UQpy distributions defining the variational posterior

  • prior_distributions (list) – List of UQpy distributions defining the prior

  • n_samples (int) – Number of samples in the Monte Carlo estimation. Default: 1,000

  • alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

JS divergence between prior and posterior distributions

Raises:
  • ValueError – If reduction is not one of ‘none’, ‘mean’, or ‘sum’

  • RuntimeError – If len(posterior_distributions) is not equal to len(prior_distributions)

Formula

The Jensen-Shannon divergence \(D_{JS}\) is computed as

\[D_{JS}(Q, P) = (1- \alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
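
A sampling-based sketch of this formula is given below. It is an illustration under stated assumptions rather than the UQpy implementation: \(Q\) and \(P\) are taken to be univariate Gaussians described by (mean, standard deviation) tuples, each Kullback-Leibler term is estimated from samples of its first argument, and the function names are hypothetical.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return (
        -0.5 * math.log(2.0 * math.pi)
        - math.log(sigma)
        - 0.5 * ((x - mu) / sigma) ** 2
    )


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, seed=0):
    """Estimate (1 - alpha) * D_KL(Q, M) + alpha * D_KL(P, M) by sampling.

    q and p are (mu, sigma) tuples; M = alpha * Q + (1 - alpha) * P.
    """
    rng = random.Random(seed)

    def log_mixture(x):
        # Density of the mixture M evaluated at x, then its logarithm.
        density = alpha * math.exp(normal_logpdf(x, *q)) + (
            1.0 - alpha
        ) * math.exp(normal_logpdf(x, *p))
        return math.log(density)

    def kl_to_mixture(source):
        # Average of log source(x) - log M(x) over samples from source.
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*source)
            total += normal_logpdf(x, *source) - log_mixture(x)
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


js_same = mc_generalized_js((0.0, 1.0), (0.0, 1.0))
js_apart = mc_generalized_js((0.0, 1.0), (3.0, 1.0), n_samples=5000)
```

When the two distributions coincide, the mixture equals both of them and the estimate is zero up to rounding. For \(\alpha = 0.5\) each sampled term is bounded above by \(\log 2\), so the estimate cannot exceed that value.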


Geometric Jensen-Shannon Divergence

This implements a Jensen-Shannon formula [38] [39]. The function geometric_jensen_shannon_divergence() is imported using the following command:

>>> from UQpy.scientific_machine_learning.functional import geometric_jensen_shannon_divergence

geometric_jensen_shannon_divergence(posterior_mu, posterior_sigma, prior_mu, prior_sigma, alpha=0.5, reduction='sum')

Compute the Geometric Jensen-Shannon divergence for Gaussian prior and posterior distributions

Parameters:
  • posterior_mu (Tensor) – Mean of the posterior distribution

  • posterior_sigma (Tensor) – Standard deviation of the posterior distribution

  • prior_mu (Tensor) – Mean of the prior distribution

  • prior_sigma (Tensor) – Standard deviation of the prior distribution

  • alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

Return type:

Tensor

Returns:

Geometric JS divergence between prior and posterior distributions

Formula

The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as

\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for Geometric Jensen-Shannon divergence is given as

\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + (1-\alpha) \frac{(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]

where \(P = \mathcal{N}(\mu_0, \sigma_0^2)\), \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\), \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\), and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\).
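
The closed form can be checked against the definition with a short plain-Python sketch. It is an illustration only, not the UQpy implementation: it works on scalars rather than tensors, the function names are hypothetical, and it assumes the geometric mean distribution is normalized to the Gaussian \(\mathcal{N}(\mu_\alpha, \sigma_\alpha^2)\).

```python
import math


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form D_KL(N(mu_a, sigma_a^2) || N(mu_b, sigma_b^2))."""
    return 0.5 * (
        2.0 * math.log(sigma_b / sigma_a)
        + (sigma_a**2 + (mu_a - mu_b) ** 2) / sigma_b**2
        - 1.0
    )


def geometric_mean_params(mu0, sigma0, mu1, sigma1, alpha):
    """Mean and variance of the normalized geometric mean P^alpha Q^(1-alpha)."""
    var_alpha = 1.0 / (alpha / sigma0**2 + (1.0 - alpha) / sigma1**2)
    mu_alpha = var_alpha * (
        alpha * mu0 / sigma0**2 + (1.0 - alpha) * mu1 / sigma1**2
    )
    return mu_alpha, var_alpha


def geometric_js_closed_form(mu0, sigma0, mu1, sigma1, alpha=0.5):
    """Closed-form geometric Jensen-Shannon divergence for two Gaussians."""
    var0, var1 = sigma0**2, sigma1**2
    mu_alpha, var_alpha = geometric_mean_params(mu0, sigma0, mu1, sigma1, alpha)
    return 0.5 * (
        ((1.0 - alpha) * var0 + alpha * var1) / var_alpha
        + math.log(var_alpha)
        - (1.0 - alpha) * math.log(var0)
        - alpha * math.log(var1)
        + (1.0 - alpha) * (mu_alpha - mu0) ** 2 / var_alpha
        + alpha * (mu_alpha - mu1) ** 2 / var_alpha
        - 1.0
    )


def geometric_js_from_definition(mu0, sigma0, mu1, sigma1, alpha=0.5):
    """(1 - alpha) * D_KL(P, M) + alpha * D_KL(Q, M) with Gaussian M."""
    mu_alpha, var_alpha = geometric_mean_params(mu0, sigma0, mu1, sigma1, alpha)
    sigma_alpha = math.sqrt(var_alpha)
    return (1.0 - alpha) * gaussian_kl(mu0, sigma0, mu_alpha, sigma_alpha) + (
        alpha * gaussian_kl(mu1, sigma1, mu_alpha, sigma_alpha)
    )


closed = geometric_js_closed_form(0.3, 0.8, -1.0, 1.5, alpha=0.3)
direct = geometric_js_from_definition(0.3, 0.8, -1.0, 1.5, alpha=0.3)
```

Both routes agree to floating-point precision, and the divergence vanishes when the two Gaussians are identical.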