Losses

Most loss functions behave similarly to PyTorch loss functions. They take an input tensor \(x\) and a target \(y\) and return a tensor representing the distance between the two.

In contrast, the divergence functions presented here differ from the PyTorch loss functions. Divergences compute a distance between the prior and posterior distributions of a Bayesian neural network, so they take a single torch.nn.Module as input rather than a pair of tensors.

Loss Baseclass

Loss is an abstract baseclass and a subclass of torch.nn.Module. It is the parent class of all loss functions. Like all abstract baseclasses, it cannot be instantiated, but it can be subclassed to write custom losses.

The documentation in the Loss may be inherited from PyTorch docstrings.
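The abstract-baseclass pattern described above can be sketched without PyTorch. This is a hypothetical stand-in, not the library's Loss: the names SketchLoss and MeanAbsoluteError are illustrative only, and plain Python lists take the place of tensors.

```python
from abc import ABC, abstractmethod


class SketchLoss(ABC):
    """Hypothetical stand-in for an abstract loss baseclass (not UQpy's Loss)."""

    @abstractmethod
    def forward(self, x, y):
        """Return the loss between prediction x and target y."""

    def __call__(self, x, y):
        # Mirrors the convention of calling the instance rather than forward().
        return self.forward(x, y)


class MeanAbsoluteError(SketchLoss):
    """Concrete loss: mean of |x_i - y_i| over two equal-length sequences."""

    def forward(self, x, y):
        return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


loss = MeanAbsoluteError()
value = loss([1.0, 2.0, 3.0], [1.0, 0.0, 6.0])  # (0 + 2 + 3) / 3

try:
    SketchLoss()  # the abstract base itself cannot be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```

A real custom loss would presumably subclass Loss itself and operate on tensors in forward; only the subclass-and-override structure carries over from this sketch.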

Methods

class Loss

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(*args, **kwargs)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

List of Losses

\(L_p\) Loss

class LpLoss(ord=2, dim=None, reduction='mean')

Construct a loss function \(L^p(x, y)\) where \(p=\text{ord}\).

Parameters:

ord (Union[int, float, str]) – Order of the norm. Default: 2
dim (Union[int, tuple, None]) – Dimensions over which to compute the norm specified as an integer or tuple. If dim=None, the vector is flattened before the norm is computed. Default: None
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘mean’

Note

This is an implementation of torch.linalg.vector_norm as a torch.nn.Module. This class implements most, but not all, of the vector_norm keywords. See the PyTorch vector_norm documentation for details.

Formula

Ord	Norm
2 (default)	\(\sqrt{\sum_i (x_i-y_i)^2}\)
int, float	\(\left(\sum_i |x_i-y_i|^{n}\right)^{1/n}\), where \(n\) is ord
0	sum(x - y != 0), the number of non-zero elements of \(x-y\)
-inf	\(\min_i |x_i-y_i|\)
inf	\(\max_i |x_i-y_i|\)

where inf refers to float('inf'), torch.inf, or any equivalent object.
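The rows of the table can be checked with a small pure-Python sketch. It assumes the norm is taken over the flattened difference \(x-y\) (the dim=None case) and ignores the reduction step. The helper lp_distance is hypothetical and not part of the library.

```python
import math


def lp_distance(x, y, ord=2):
    """Vector norm of the difference x - y for the orders listed in the table."""
    diff = [abs(a - b) for a, b in zip(x, y)]
    if ord == math.inf:
        return max(diff)
    if ord == -math.inf:
        return min(diff)
    if ord == 0:
        # Counts the non-zero entries of x - y.
        return float(sum(d != 0 for d in diff))
    return sum(d ** ord for d in diff) ** (1.0 / ord)


x = [3.0, 1.0, 0.0]
y = [0.0, 1.0, 4.0]  # x - y = [3, 0, -4]

l2 = lp_distance(x, y)                    # sqrt(9 + 0 + 16)
l1 = lp_distance(x, y, ord=1)             # 3 + 0 + 4
l0 = lp_distance(x, y, ord=0)             # two non-zero entries
l_max = lp_distance(x, y, ord=math.inf)   # largest absolute difference
l_min = lp_distance(x, y, ord=-math.inf)  # smallest absolute difference
```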

Example:

>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> loss = sml.LpLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()

forward(x, y)

Compute the loss \(L_p(x, y)\).

The valid shapes for x and y depend on PyTorch broadcast semantics.

Parameters:

x (Tensor) – Tensor of any shape. Must be broadcastable with y.
y (Tensor) – Tensor of any shape. Must be broadcastable with x.

Return type:

Tensor

Returns:

Tensor of shape x or y (depending on broadcasting semantics).
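To see which pairs of shapes count as broadcastable, the rule can be sketched in plain Python: shapes are compared from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The helper broadcast_shape is hypothetical and only illustrates the rule. It is not a library function.

```python
from itertools import zip_longest


def broadcast_shape(shape_x, shape_y):
    """Shape produced by broadcasting two shapes, or ValueError if incompatible."""
    result = []
    # Compare dimensions from the trailing end; a missing dimension acts like 1.
    pairs = zip_longest(reversed(shape_x), reversed(shape_y), fillvalue=1)
    for dx, dy in pairs:
        if dx != dy and dx != 1 and dy != 1:
            raise ValueError(f"shapes {shape_x} and {shape_y} are not broadcastable")
        result.append(dy if dx == 1 else dx)
    return tuple(reversed(result))


trailing = broadcast_shape((3, 5), (5,))     # y is stretched along the first axis
both = broadcast_shape((3, 1), (1, 5))       # both inputs are stretched

try:
    broadcast_shape((3,), (4,))              # 3 and 4 differ and neither is 1
    incompatible_raises = False
except ValueError:
    incompatible_raises = True
```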

Gaussian Kullback-Leibler

This is a closed-form implementation of Kullback and Leibler’s work [37].

class GaussianKullbackLeiblerDivergence(reduction='sum', device=None)

Analytic form of the Gaussian KL divergence for all Bayesian layers in a module.

Parameters:

reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Gaussian Kullback-Leibler divergence \(D_{KL}\) for two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as

\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
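As a quick numerical check, the sketch below evaluates the standard closed form of the KL divergence between two univariate normals, with \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\). The function gaussian_kl is a hypothetical scalar helper. How the class applies this across the parameters of each Bayesian layer is not shown here.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL divergence from N(mu0, sigma0^2) to N(mu1, sigma1^2)."""
    return 0.5 * (
        2.0 * math.log(sigma1 / sigma0)
        + (sigma0 ** 2 + (mu0 - mu1) ** 2) / sigma1 ** 2
        - 1.0
    )


same = gaussian_kl(0.0, 1.0, 0.0, 1.0)        # identical distributions
shifted = gaussian_kl(0.0, 1.0, 1.0, 1.0)     # unit shift in the mean
forward_kl = gaussian_kl(0.0, 1.0, 0.0, 2.0)
reverse_kl = gaussian_kl(0.0, 2.0, 0.0, 1.0)  # swapping arguments changes the value
```

The divergence vanishes for identical distributions and is not symmetric in its arguments, which is why the order of prior and posterior matters.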

Examples:

>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(model)

forward(network)

Compute the Gaussian KL divergence on all Bayesian layers in a module.

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Gaussian KL divergence between prior and posterior distributions

Monte Carlo Kullback-Leibler

This is based on Kullback and Leibler’s work [37].

class MCKullbackLeiblerDivergence(posterior_distribution, prior_distribution, n_samples=1000, reduction='sum', device=None)

KL divergence by sampling for all Bayesian layers in a module.

Note

This is not identical to the Kullback-Leibler divergence computed in Bayes by Backprop.

Parameters:

posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior
prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior
n_samples (int) – Number of samples used in the Monte Carlo estimates. Default: 1,000
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
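The idea of estimating a KL divergence by sampling can be illustrated with a generic Monte Carlo estimator: draw samples from the first distribution and average the log-density ratio. This is a sketch under stated assumptions, not the estimator this class uses. The helpers normal_logpdf and mc_kl are hypothetical, both distributions are taken to be univariate normals, and the direction of the divergence is chosen for illustration only.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def mc_kl(mu_q, sigma_q, mu_p, sigma_p, n_samples=1000, seed=0):
    """Estimate KL(q, p) as the sample mean of log q(x) - log p(x) with x ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu_q, sigma_q)
        total += normal_logpdf(x, mu_q, sigma_q) - normal_logpdf(x, mu_p, sigma_p)
    return total / n_samples


estimate = mc_kl(0.0, 1.0, 1.0, 1.0, n_samples=20000)
exact = 0.5  # closed-form KL between N(0, 1) and N(1, 1)
```

The estimate fluctuates around the closed-form value with an error that shrinks roughly as \(1/\sqrt{n}\), which is the trade-off controlled by n_samples.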

Examples:

>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)

forward(network)

Compute the KL divergence by sampling the distributions on all Bayesian layers in a module.

Parameters:

network (Module) – Network containing Bayesian layers

Return type:

Tensor

Returns:

KL divergence between prior and posterior distributions

Generalized Jensen-Shannon

This implements a Jensen-Shannon formula [38].

class GeneralizedJensenShannonDivergence(posterior_distribution, prior_distribution, alpha=0.5, n_samples=1000, reduction='sum', device=None)

Estimate the Jensen-Shannon divergence using Monte Carlo sampling for all Bayesian layers in a module.

Parameters:

posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior
prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior
alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
n_samples (int) – Number of samples used in the Monte Carlo estimates. Default: 1,000
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Jensen-Shannon divergence \(D_{JS}\) is computed as

\[D_{JS}(Q, P) = (1-\alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
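A minimal Monte Carlo sketch of this formula for two univariate normals follows. It is an illustration under assumptions, not the library's implementation: the helpers normal_pdf and mc_generalized_js are hypothetical, and each distribution is passed as a (mean, standard deviation) pair.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, seed=0):
    """Estimate (1 - alpha) KL(Q, M) + alpha KL(P, M) with M = alpha Q + (1 - alpha) P."""
    rng = random.Random(seed)

    def mixture(x):
        return alpha * normal_pdf(x, *q) + (1.0 - alpha) * normal_pdf(x, *p)

    def kl_to_mixture(dist):
        # Sample from dist and average log(dist(x) / M(x)).
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*dist)
            total += math.log(normal_pdf(x, *dist) / mixture(x))
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


identical = mc_generalized_js((0.0, 1.0), (0.0, 1.0))
separated = mc_generalized_js((0.0, 1.0), (3.0, 1.0), n_samples=5000)
```

With \(\alpha = 0.5\) the estimate is zero for identical distributions and cannot exceed \(\log 2\), because each density is at most twice the mixture density.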

Examples:

>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)

forward(network)

Compute the Generalized Jensen-Shannon divergence on all Bayesian layers in a module.

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Generalized JS divergence between prior and posterior distributions

Geometric Jensen-Shannon

This implements a Jensen-Shannon formula [38] [39].

class GeometricJensenShannonDivergence(alpha=0.5, reduction='sum', device=None)

Analytic form of the Geometric JS divergence for all Bayesian layers in a module.

Parameters:

alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’

The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as

\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]

where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for Geometric Jensen-Shannon divergence is given as

\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + (1-\alpha) \frac{(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]

where \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\) and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\).
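The closed form can be cross-checked against the definition \((1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\), taking \(M\) to be the normal distribution with mean \(\mu_\alpha\) and variance \(\sigma_\alpha^2\). The sketch below assumes \(P = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\). The helpers gaussian_kl and geometric_js are hypothetical scalar functions, not library code.

```python
import math


def gaussian_kl(mu0, var0, mu1, var1):
    """KL divergence from N(mu0, var0) to N(mu1, var1), with variances as inputs."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)


def geometric_js(mu0, sigma0, mu1, sigma1, alpha=0.5):
    """Closed-form geometric JS divergence between two univariate normals."""
    var0, var1 = sigma0 ** 2, sigma1 ** 2
    var_a = 1.0 / (alpha / var0 + (1.0 - alpha) / var1)
    mu_a = var_a * (alpha * mu0 / var0 + (1.0 - alpha) * mu1 / var1)
    log_term = math.log(var_a) - (1.0 - alpha) * math.log(var0) - alpha * math.log(var1)
    return 0.5 * (
        ((1.0 - alpha) * var0 + alpha * var1) / var_a
        + log_term
        + (1.0 - alpha) * (mu_a - mu0) ** 2 / var_a
        + alpha * (mu_a - mu1) ** 2 / var_a
        - 1.0
    )


mu0, sigma0, mu1, sigma1, alpha = 0.0, 1.0, 2.0, 0.5, 0.3
closed_form = geometric_js(mu0, sigma0, mu1, sigma1, alpha=alpha)

# Rebuild the same quantity from the definition using the geometric mean M.
var0, var1 = sigma0 ** 2, sigma1 ** 2
var_a = 1.0 / (alpha / var0 + (1.0 - alpha) / var1)
mu_a = var_a * (alpha * mu0 / var0 + (1.0 - alpha) * mu1 / var1)
from_definition = (
    (1.0 - alpha) * gaussian_kl(mu0, var0, mu_a, var_a)
    + alpha * gaussian_kl(mu1, var1, mu_a, var_a)
)
```

Both routes agree to floating-point precision, and the divergence is zero when the two distributions coincide.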

Examples:

>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(model)

forward(network)

Compute the Geometric JS divergence on all Bayesian layers in a module.

Parameters:

network (Module) – Module containing Bayesian layers as class attributes

Return type:

Tensor

Returns:

Geometric JS divergence between prior and posterior distributions