Losses

Most loss functions behave like PyTorch loss functions: they take an input tensor \(x\) and a target tensor \(y\) and return a tensor representing the distance between the two.

In contrast, the divergence functions presented here do not follow the PyTorch loss convention. A divergence takes a single torch.nn.Module as input and computes a distance between the prior and posterior distributions of a Bayesian neural network.

Loss Baseclass

Loss is an abstract baseclass, a subclass of torch.nn.Module, and the parent class of all loss functions. Like all abstract baseclasses, it cannot be instantiated, but it can be subclassed to write custom losses.

The documentation of Loss may be inherited from PyTorch docstrings.
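The subclassing pattern described above can be sketched without any dependencies. LossSketch and MeanAbsoluteError below are hypothetical stand-ins, not library classes; a real custom loss would subclass Loss and operate on tensors.

```python
from abc import ABC, abstractmethod


class LossSketch(ABC):
    """Hypothetical stand-in for the Loss baseclass (no torch dependency)."""

    def __call__(self, *args, **kwargs):
        # Calling the instance dispatches to forward, as torch.nn.Module does.
        return self.forward(*args, **kwargs)

    @abstractmethod
    def forward(self, *args, **kwargs):
        """Define the computation performed at every call."""


class MeanAbsoluteError(LossSketch):
    """Hypothetical custom loss: mean of |x - y| over two equal-length sequences."""

    def forward(self, x, y):
        return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


loss = MeanAbsoluteError()
value = loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0])  # (0 + 2 + 2) / 3

try:
    LossSketch()  # the abstract baseclass itself cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```

Only forward needs to be overridden; the loss is then used by calling the instance rather than calling forward directly.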

Methods

class Loss[source]

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(*args, **kwargs)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.


List of Losses

\(L_p\) Loss

class LpLoss(ord=2, dim=None, reduction='mean')[source]

Construct a loss function \(L^p(x, y)\) where \(p=\text{ord}\)

Parameters:
  • ord (Union[int, float, str]) – Order of the norm. Default: 2

  • dim (Union[int, tuple, None]) – Dimensions over which to compute the norm specified as an integer or tuple. If dim=None, the vector is flattened before the norm is computed. Default: None

  • reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘mean’

Note

This is an implementation of torch.linalg.vector_norm as a torch.nn.Module. This class implements most, but not all, of the vector_norm keywords. See the PyTorch vector_norm documentation for details.

Formula

Ord               Norm
2 (default)       \(\sqrt{\sum (x-y)^2}\)
n (int, float)    \(\left(\sum |x-y|^n\right)^{1/n}\)
0                 sum(x - y != 0), the number of non-zero elements of x - y
-inf              \(\min{|x-y|}\)
inf               \(\max{|x-y|}\)

where inf refers to float('inf'), torch.inf, or any equivalent object.
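The rows of the table can be checked by hand with a small pure-Python sketch. The helper lp_distance is hypothetical and works on flat sequences only; it mirrors the vector-norm definitions applied to x - y and is not the library's implementation.

```python
import math


def lp_distance(x, y, ord=2):
    """Vector norm of the elementwise difference x - y for flat sequences.

    The parameter is named ord to match the documented keyword.
    """
    diff = [abs(a - b) for a, b in zip(x, y)]
    if ord == math.inf:
        return max(diff)
    if ord == -math.inf:
        return min(diff)
    if ord == 0:
        # Count of non-zero differences.
        return float(sum(d != 0 for d in diff))
    return sum(d**ord for d in diff) ** (1.0 / ord)


x = [3.0, 0.0, -1.0]
y = [0.0, 0.0, 3.0]
# The absolute differences are 3, 0, and 4.
norms = {
    "2": lp_distance(x, y),
    "1": lp_distance(x, y, ord=1),
    "0": lp_distance(x, y, ord=0),
    "inf": lp_distance(x, y, ord=math.inf),
    "-inf": lp_distance(x, y, ord=-math.inf),
}
```

For these inputs the 2-norm is 5, the 1-norm is 7, two elements are non-zero, and the largest and smallest absolute differences are 4 and 0.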

Example:

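>>> import torch
>>> import UQpy.scientific_machine_learning as sml  # assumed import behind the sml alias
>>> loss = sml.LpLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()

forward(x, y)[source]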

Compute the loss \(L_p(x, y)\).

The valid shapes for x and y depend on PyTorch broadcast semantics.

Parameters:
  • x (Tensor) – Tensor of any shape. Must be broadcastable with y.

  • y (Tensor) – Tensor of any shape. Must be broadcastable with x.

Return type:

Tensor

Returns:

Tensor of shape x or y (depending on broadcasting semantics).
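As a rough illustration of what "broadcastable" means for the shapes of x and y, the sketch below applies the usual broadcasting rule to plain tuples. The function broadcast_shape is a hypothetical helper written for this example; it does not call PyTorch.

```python
from itertools import zip_longest


def broadcast_shape(shape_x, shape_y):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Compare sizes from the trailing dimension backwards; a missing dimension counts as 1.
    for a, b in zip_longest(reversed(shape_x), reversed(shape_y), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"shapes {shape_x} and {shape_y} are not broadcastable")
    return tuple(reversed(result))


same = broadcast_shape((3, 5), (3, 5))
trailing = broadcast_shape((3, 5), (5,))
expanded = broadcast_shape((3, 1), (1, 5))

try:
    broadcast_shape((3, 5), (4, 5))
    compatible = True
except ValueError:
    compatible = False
```

Two dimensions are compatible when they are equal or one of them is 1, which is why (3, 5) and (4, 5) are rejected.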


Gaussian Kullback-Leibler

This is an implementation of Kullback and Leibler’s work in a closed form [37].

class GaussianKullbackLeiblerDivergence(reduction='sum', device=None)[source]

Analytic form for Gaussian KL divergence for all Bayesian layers in a module

Parameters:

reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Gaussian Kullback-Leibler divergence \(D_{KL}\) between two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as

\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
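For a quick numerical check, the closed-form Kullback-Leibler divergence between two univariate normals \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) can be evaluated in pure Python. The function gaussian_kl is a hypothetical scalar helper, not the library class, which operates on the parameters of Bayesian layers.

```python
import math


def gaussian_kl(mu_0, sigma_0, mu_1, sigma_1):
    """D_KL(p, q) for p = N(mu_0, sigma_0^2) and q = N(mu_1, sigma_1^2)."""
    return 0.5 * (
        2.0 * math.log(sigma_1 / sigma_0)
        + (sigma_0**2 + (mu_0 - mu_1) ** 2) / sigma_1**2
        - 1.0
    )


# Identical distributions have zero divergence.
identical = gaussian_kl(0.0, 1.0, 0.0, 1.0)
# Unit variances with means one apart give (mu_0 - mu_1)^2 / 2 = 0.5.
shifted = gaussian_kl(0.0, 1.0, 1.0, 1.0)
# Swapping the arguments changes the value: the divergence is not symmetric.
forward_kl = gaussian_kl(0.0, 1.0, 0.0, 2.0)
reverse_kl = gaussian_kl(0.0, 2.0, 0.0, 1.0)
```

The last two values differ, which is a reminder that the order of the prior and posterior matters.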

Examples:

>>> import torch.nn as nn
>>> import UQpy.scientific_machine_learning as sml  # assumed import behind the sml alias
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(layer)
>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(model)

forward(network)[source]

Compute the Gaussian KL divergence on all Bayesian layers in a module

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Gaussian KL divergence between prior and posterior distributions


Monte Carlo Kullback-Leibler

This is based on Kullback and Leibler’s work [37].

class MCKullbackLeiblerDivergence(posterior_distribution, prior_distribution, n_samples=1000, reduction='sum', device=None)[source]

KL divergence by sampling for all Bayesian layers in a module.

Note

This is not identical to the Kullback-Leibler divergence computed in Bayes by Backprop.

Parameters:
  • posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior

  • prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior

  • n_samples (int) – Number of samples used in the Monte Carlo estimate. Default: 1000

  • reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
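The idea of estimating a Kullback-Leibler divergence by sampling can be sketched in pure Python. This is the generic estimator (the sample mean of \(\log q(x) - \log p(x)\) with \(x\) drawn from \(q\)), using Gaussian densities as stand-ins for the distribution classes. The helper names and the seed argument are assumptions for this example, and the library's estimator may differ in detail.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def mc_kl(mu_q, sigma_q, mu_p, sigma_p, n_samples=1000, seed=0):
    """Estimate D_KL(Q, P) as the mean of log q(x) - log p(x) over samples x ~ Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu_q, sigma_q)
        total += normal_logpdf(x, mu_q, sigma_q) - normal_logpdf(x, mu_p, sigma_p)
    return total / n_samples


# For Q = N(0, 1) and P = N(1, 1) the exact divergence is 0.5.
estimate = mc_kl(0.0, 1.0, 1.0, 1.0, n_samples=20000)
exact = 0.5
```

The estimate approaches the exact value as n_samples grows, with an error that shrinks roughly like the inverse square root of the sample count.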

Examples:

>>> import torch.nn as nn
>>> import UQpy
>>> import UQpy.scientific_machine_learning as sml  # assumed import behind the sml alias
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)
>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)

forward(network)[source]

Compute the KL divergence by sampling the distributions on all Bayesian layers in a module

Parameters:

network (Module) – Network containing Bayesian layers

Return type:

Tensor

Returns:

KL divergence between prior and posterior distributions


Generalized Jensen-Shannon

This implements a Jensen-Shannon formula [38].

class GeneralizedJensenShannonDivergence(posterior_distribution, prior_distribution, alpha=0.5, n_samples=1000, reduction='sum', device=None)[source]

Estimate the Jensen-Shannon divergence using Monte Carlo sampling for all Bayesian layers in a module

Parameters:
  • posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior

  • prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior

  • alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5

  • n_samples (int) – Number of samples used in the Monte Carlo estimates. Default: 1000

  • reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Jensen-Shannon divergence \(D_{JS}\) is computed as

\[D_{JS}(Q, P) = (1-\alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
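A minimal pure-Python sketch of this formula, assuming Gaussian \(Q\) and \(P\) given as (mean, standard deviation) pairs. Each Kullback-Leibler term is estimated by sampling from its first argument. The function names and the seed argument are hypothetical; the library works with UQpy distribution classes and Bayesian layers instead.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, seed=0):
    """Estimate (1 - alpha) * KL(Q, M) + alpha * KL(P, M) with M = alpha*Q + (1 - alpha)*P."""
    rng = random.Random(seed)

    def mixture(x):
        return alpha * normal_pdf(x, *q) + (1.0 - alpha) * normal_pdf(x, *p)

    def kl_to_mixture(dist):
        # Sample from dist and average log(dist(x) / M(x)).
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*dist)
            total += math.log(normal_pdf(x, *dist) / mixture(x))
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


same = mc_generalized_js((0.0, 1.0), (0.0, 1.0))
apart = mc_generalized_js((0.0, 1.0), (3.0, 1.0), n_samples=5000)
```

With \(\alpha = 0.5\) the mixture is at least half of either density everywhere, so every sampled log-ratio is at most \(\log 2\) and the estimate stays bounded even when the two distributions barely overlap.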

Examples:

>>> import torch.nn as nn
>>> import UQpy
>>> import UQpy.scientific_machine_learning as sml  # assumed import behind the sml alias
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)
>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)

forward(network)[source]

Compute the Generalized Jensen-Shannon divergence on all Bayesian layers in a module

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Generalized JS divergence between prior and posterior distributions


Geometric Jensen-Shannon

This implements a Jensen-Shannon formula [38] [39].

class GeometricJensenShannonDivergence(alpha=0.5, reduction='sum', device=None)[source]

Analytic form for Geometric JS divergence for all Bayesian layers in a module

Parameters:
  • alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5

  • reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as

\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for Geometric Jensen-Shannon divergence is given as

\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + (1-\alpha) \frac{(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]

where \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\) and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\)
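The closed form can be evaluated directly for scalar parameters and compared against its definition as a weighted sum of two Gaussian Kullback-Leibler divergences to the geometric mean distribution \(M = \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)\). Both functions below are hypothetical helpers for this check, with \(P = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\).

```python
import math


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL divergence from N(mu_a, sigma_a^2) to N(mu_b, sigma_b^2)."""
    return 0.5 * (
        2.0 * math.log(sigma_b / sigma_a)
        + (sigma_a**2 + (mu_a - mu_b) ** 2) / sigma_b**2
        - 1.0
    )


def geometric_js(mu_0, sigma_0, mu_1, sigma_1, alpha=0.5):
    """Closed-form geometric JS divergence; also returns the parameters of M."""
    var_0, var_1 = sigma_0**2, sigma_1**2
    var_a = 1.0 / (alpha / var_0 + (1.0 - alpha) / var_1)
    mu_a = var_a * (alpha * mu_0 / var_0 + (1.0 - alpha) * mu_1 / var_1)
    value = 0.5 * (
        ((1.0 - alpha) * var_0 + alpha * var_1) / var_a
        + math.log(var_a / (var_0 ** (1.0 - alpha) * var_1**alpha))
        + (1.0 - alpha) * (mu_a - mu_0) ** 2 / var_a
        + alpha * (mu_a - mu_1) ** 2 / var_a
        - 1.0
    )
    return value, mu_a, math.sqrt(var_a)


alpha = 0.3
closed_form, mu_a, sigma_a = geometric_js(0.0, 1.0, 1.0, 2.0, alpha=alpha)
# Definition: (1 - alpha) * KL(P, M) + alpha * KL(Q, M).
from_definition = (1.0 - alpha) * gaussian_kl(0.0, 1.0, mu_a, sigma_a) + alpha * gaussian_kl(
    1.0, 2.0, mu_a, sigma_a
)
identical, _, _ = geometric_js(0.5, 1.5, 0.5, 1.5)
```

The two routes agree to floating-point precision, and identical distributions give zero divergence.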

Examples:

>>> import torch.nn as nn
>>> import UQpy.scientific_machine_learning as sml  # assumed import behind the sml alias
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(layer)
>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(model)

forward(network)[source]

Compute the Geometric JS divergence on all Bayesian layers in a module

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Geometric JS divergence between prior and posterior distributions