Losses
Most loss functions behave like PyTorch loss functions. They take an input tensor \(x\) and a target \(y\) and return a tensor representing the distance between the two.
In contrast, the divergence functions presented here are not like the Torch loss functions.
Divergences compute a distance between the prior and posterior distributions of a Bayesian neural network.
They take a single torch.nn.Module as input, rather than an input and a target, and compute this distance for the Bayesian layers in that module.
Loss Baseclass
Loss is an abstract baseclass, a subclass of torch.nn.Module, and the parent class of all loss functions.
Like all abstract baseclasses, this cannot be instantiated but can be subclassed to write custom losses.
The documentation in the Loss may be inherited from PyTorch docstrings.
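The abstract baseclass pattern described above can be sketched in plain Python. The names AbstractLoss and MeanAbsoluteError are hypothetical stand-ins for illustration. They are not the library's Loss class, which additionally subclasses torch.nn.Module.

```python
from abc import ABC, abstractmethod


class AbstractLoss(ABC):
    """Hypothetical stand-in for an abstract loss baseclass (not the library's Loss)."""

    @abstractmethod
    def forward(self, *args, **kwargs):
        """Subclasses must override this with the loss computation."""

    def __call__(self, *args, **kwargs):
        # Calling the instance dispatches to forward, mirroring the Module convention.
        return self.forward(*args, **kwargs)


class MeanAbsoluteError(AbstractLoss):
    """Hypothetical custom loss: mean absolute difference of two flat sequences."""

    def forward(self, x, y):
        return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# The abstract baseclass itself cannot be instantiated.
try:
    AbstractLoss()
    instantiated = True
except TypeError:
    instantiated = False

loss = MeanAbsoluteError()
value = loss([1.0, 2.0, 3.0], [1.0, 4.0, 0.0])
```

A custom loss only has to provide forward. Calling the instance, rather than forward directly, is the convention the note on forward below recommends.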
Methods
- class Loss
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- abstract forward(*args, **kwargs)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
List of Losses
\(L_p\) Loss
- class LpLoss(ord=2, dim=None, reduction='mean')
Construct a loss function \(L^p(x, y)\) where \(p=\text{ord}\)
- Parameters:
ord (Union[int, float, str]) – Order of the norm. Default: 2
dim (Union[int, tuple, None]) – Dimensions over which to compute the norm, specified as an integer or tuple. If dim=None, the vector is flattened before the norm is computed. Default: None
reduction (str) – Specifies the reduction to apply to the output: ‘none’, ‘mean’, or ‘sum’. ‘none’: no reduction will be applied, ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘mean’
Note
This is an implementation of torch.linalg.vector_norm as a torch.nn.Module. This class implements most, but not all, of the vector_norm keywords. See the PyTorch vector_norm documentation for details.
Formula
Ord: Norm
- 2 (default): \(\sqrt{\sum_i (x_i-y_i)^2}\)
- int, float \(n\): \(\left(\sum_i |x_i-y_i|^n\right)^{1/n}\)
- 0: sum(x - y != 0), the number of non-zero elements of \(x-y\)
- -inf: \(\min_i |x_i-y_i|\)
- inf: \(\max_i |x_i-y_i|\)
where inf refers to float('inf'), torch.inf, or any equivalent object.
Example:
>>> loss = sml.LpLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
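The norms listed above can be checked by hand with a small pure-Python sketch. The helper lp_distance is hypothetical and works on flat sequences of equal length. It ignores the dim and reduction keywords and is not how LpLoss is implemented.

```python
import math


def lp_distance(x, y, ord=2):
    """Vector norm of the elementwise difference x - y (hypothetical helper)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if ord == math.inf:
        return max(diffs)  # largest absolute difference
    if ord == -math.inf:
        return min(diffs)  # smallest absolute difference
    if ord == 0:
        return float(sum(d != 0 for d in diffs))  # count of non-zero differences
    # General case: (sum_i |x_i - y_i|^ord)^(1/ord); assumes a positive finite ord.
    return sum(d ** ord for d in diffs) ** (1.0 / ord)


x = [3.0, 0.0]
y = [0.0, 4.0]
l2 = lp_distance(x, y)              # sqrt(3^2 + 4^2)
l1 = lp_distance(x, y, ord=1)       # 3 + 4
l_inf = lp_distance(x, y, ord=math.inf)
l_neg_inf = lp_distance(x, y, ord=-math.inf)
l0 = lp_distance(x, y, ord=0)
```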
- forward(x, y)
Compute the loss \(L_p(x, y)\).
The valid shapes for x and y depend on PyTorch broadcast semantics.
- Parameters:
x (Tensor) – Tensor of any shape. Must be broadcastable with y.
y (Tensor) – Tensor of any shape. Must be broadcastable with x.
- Return type:
Tensor
- Returns:
Tensor of shape x or y (depending on broadcasting semantics).
Gaussian Kullback-Leibler
This is an implementation of Kullback and Leibler’s work in a closed form [37].
- class GaussianKullbackLeiblerDivergence(reduction='sum', device=None)
Analytic form for Gaussian KL divergence for all Bayesian layers in a module
- Parameters:
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
The Gaussian Kullback-Leibler divergence \(D_{KL}\) for two univariate normal distributions \(p = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(q = \mathcal{N}(\mu_1, \sigma_1^2)\) is computed as
\[D_{KL}(p, q) = \frac{1}{2} \left( 2\log \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} -1 \right)\]
Examples:
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GaussianKullbackLeiblerDivergence()
>>> div = divergence_function(model)
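For a single pair of univariate normals, the standard closed form \(\frac{1}{2}\left(2\log\frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0-\mu_1)^2}{\sigma_1^2} - 1\right)\) can be evaluated directly. The helper gaussian_kl below is a hypothetical scalar sketch for checking values by hand. It does not show how the class iterates over the Bayesian layers of a module.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """Closed-form D_KL(p, q) for p = N(mu0, sigma0^2) and q = N(mu1, sigma1^2)."""
    return 0.5 * (
        2.0 * math.log(sigma1 / sigma0)
        + (sigma0 ** 2 + (mu0 - mu1) ** 2) / sigma1 ** 2
        - 1.0
    )


# Identical distributions have zero divergence.
identical = gaussian_kl(0.0, 1.0, 0.0, 1.0)
# Unit variances with means one apart give (mu0 - mu1)^2 / 2.
shifted = gaussian_kl(0.0, 1.0, 1.0, 1.0)
# The divergence is not symmetric in its arguments.
forward_kl = gaussian_kl(0.0, 1.0, 1.0, 2.0)
reverse_kl = gaussian_kl(1.0, 2.0, 0.0, 1.0)
```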
Monte Carlo Kullback-Leibler
This is based on Kullback and Leibler’s work [37].
- class MCKullbackLeiblerDivergence(posterior_distribution, prior_distribution, n_samples=1000, reduction='sum', device=None)
KL divergence by sampling for all Bayesian layers in a module.
Note
This is not identical to the Kullback-Leibler divergence computed in Bayes by Backprop
- Parameters:
posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior
prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
Examples:
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.MCKullbackLeiblerDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)
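The idea of estimating a KL divergence by sampling can be illustrated with two univariate normals, where the exact value is known. This is a generic Monte Carlo sketch under that assumption, not the class's implementation. The helpers normal_log_pdf and mc_kl are hypothetical names.

```python
import math
import random


def normal_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def mc_kl(mu0, sigma0, mu1, sigma1, n_samples=1000, seed=0):
    """Estimate D_KL(p, q) as the mean of log p(x) - log q(x) over samples x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu0, sigma0)  # draw from p = N(mu0, sigma0^2)
        total += normal_log_pdf(x, mu0, sigma0) - normal_log_pdf(x, mu1, sigma1)
    return total / n_samples


# Compare the sampled estimate with the closed form for p = N(0, 1), q = N(1, 4).
estimate = mc_kl(0.0, 1.0, 1.0, 2.0, n_samples=20000)
exact = math.log(2.0 / 1.0) + (1.0 + (0.0 - 1.0) ** 2) / (2.0 * 2.0 ** 2) - 0.5
```

The estimate approaches the closed-form value as n_samples grows. Sampling is mainly useful for distributions that have no analytic divergence.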
Generalized Jensen-Shannon
This implements a Jensen-Shannon formula [38].
- class GeneralizedJensenShannonDivergence(posterior_distribution, prior_distribution, alpha=0.5, n_samples=1000, reduction='sum', device=None)
Estimate the Jensen-Shannon divergence using Monte Carlo sampling for all Bayesian layers in a module
- Parameters:
posterior_distribution (object) – A class, not an instance, of a UQpy distribution defining the variational posterior
prior_distribution (object) – A class, not an instance, of a UQpy distribution defining the prior
alpha (float) – Weight of the mixture distribution, \(0 \leq \alpha \leq 1\). See formula for details. Default: 0.5
n_samples (int) – Number of samples used in the Monte Carlo estimates. Default: 1,000
reduction (str) – Specifies the reduction to apply to the output: ‘mean’ or ‘sum’. ‘mean’: the output will be averaged, ‘sum’: the output will be summed. Default: ‘sum’
The Jensen-Shannon divergence \(D_{JS}\) is computed as
\[D_{JS}(Q, P) = (1-\alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\]
where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=\alpha Q + (1-\alpha) P\) is the mixture distribution.
Examples:
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeneralizedJensenShannonDivergence(UQpy.Normal, UQpy.Normal)
>>> div = divergence_function(model)
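The formula \(D_{JS}(Q, P) = (1-\alpha) D_{KL}(Q, M) + \alpha D_{KL}(P, M)\) with \(M=\alpha Q + (1-\alpha) P\) can be estimated by sampling from each distribution in turn. The sketch below does this for two univariate normals given as (mean, standard deviation) tuples. The function names are hypothetical, it assumes \(0 < \alpha < 1\), and it is not the class's implementation.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mc_generalized_js(q, p, alpha=0.5, n_samples=1000, seed=0):
    """Monte Carlo estimate of D_JS(Q, P); q and p are (mu, sigma) tuples."""
    rng = random.Random(seed)

    def mixture(x):
        # M = alpha * Q + (1 - alpha) * P
        return alpha * normal_pdf(x, *q) + (1.0 - alpha) * normal_pdf(x, *p)

    def kl_to_mixture(dist):
        # Mean of log(dist(x) / M(x)) over samples x drawn from dist.
        total = 0.0
        for _ in range(n_samples):
            x = rng.gauss(*dist)
            total += math.log(normal_pdf(x, *dist) / mixture(x))
        return total / n_samples

    return (1.0 - alpha) * kl_to_mixture(q) + alpha * kl_to_mixture(p)


separated = mc_generalized_js((0.0, 1.0), (3.0, 1.0), n_samples=5000)
same = mc_generalized_js((0.0, 1.0), (0.0, 1.0), n_samples=5000)
```

With \(\alpha = 0.5\), each log ratio is at most \(\log 2\), so the estimate cannot exceed \(\log 2\). Identical distributions give zero.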
Geometric Jensen-Shannon
This implements a Jensen-Shannon formula [38] [39].
- class GeometricJensenShannonDivergence(alpha=0.5, reduction='sum', device=None)
Analytic form for Geometric JS divergence for all Bayesian layers in a module
- Parameters:
The Geometric Jensen-Shannon divergence \(D_{JSG}\) is computed as
\[D_{JSG}(P, Q) = (1-\alpha) D_{KL}(P, M) + \alpha D_{KL}(Q, M)\]
where \(D_{KL}\) is the Kullback-Leibler divergence and \(M=P^\alpha Q^{(1-\alpha)}\) is the geometric mean distribution. When the distributions \(P\) and \(Q\) are Gaussian, the closed form for Geometric Jensen-Shannon divergence is given as
\[D_{JSG}(P, Q) = \frac12 \left( \frac{(1-\alpha)\sigma_0^2 + \alpha\sigma_1^2}{\sigma_\alpha^2} + \log \frac{\sigma_\alpha^2}{\sigma_0^{2(1-\alpha)} \sigma_1^{2\alpha}} + (1-\alpha) \frac{(\mu_\alpha - \mu_0)^2}{\sigma_\alpha^2} + \frac{\alpha(\mu_\alpha - \mu_1)^2}{\sigma_\alpha^2} -1 \right)\]
where \(\sigma_\alpha^2 = \left( \frac{\alpha}{\sigma_0^2}+\frac{1-\alpha}{\sigma_1^2} \right)^{-1}\) and \(\mu_\alpha = \sigma_\alpha^2 \left[\frac{\alpha \mu_0}{\sigma_0^2} + \frac{(1-\alpha)\mu_1}{\sigma_1^2}\right]\)
Examples:
>>> # Divergence of a single Bayesian Layer
>>> layer = sml.BayesianLinear(4, 5)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(layer)

>>> # Divergence of a Bayesian neural network
>>> network = nn.Sequential(
...     sml.BayesianLinear(1, 4),
...     nn.ReLU(),
...     nn.Linear(4, 4),
...     nn.ReLU(),
...     sml.BayesianLinear(4, 1),
... )
>>> model = sml.FeedForwardNeuralNetwork(network)
>>> divergence_function = sml.GeometricJensenShannonDivergence()
>>> div = divergence_function(model)
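The Gaussian closed form above can be evaluated for a single pair of univariate normals, with \(P = \mathcal{N}(\mu_0, \sigma_0^2)\) and \(Q = \mathcal{N}(\mu_1, \sigma_1^2)\). The sketch below is a hypothetical scalar helper for checking values by hand, not the class's implementation. It also checks the limiting cases: at \(\alpha = 0\) the geometric mean \(M\) reduces to \(Q\), and at \(\alpha = 1\) it reduces to \(P\).

```python
import math


def gaussian_kl(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form KL divergence from N(mu_a, sigma_a^2) to N(mu_b, sigma_b^2)."""
    return 0.5 * (
        2.0 * math.log(sigma_b / sigma_a)
        + (sigma_a ** 2 + (mu_a - mu_b) ** 2) / sigma_b ** 2
        - 1.0
    )


def geometric_js(mu0, sigma0, mu1, sigma1, alpha=0.5):
    """Closed-form geometric Jensen-Shannon divergence for two univariate normals."""
    var0, var1 = sigma0 ** 2, sigma1 ** 2
    # Variance and mean of the geometric mean distribution M = P^alpha Q^(1 - alpha).
    var_a = 1.0 / (alpha / var0 + (1.0 - alpha) / var1)
    mu_a = var_a * (alpha * mu0 / var0 + (1.0 - alpha) * mu1 / var1)
    return 0.5 * (
        ((1.0 - alpha) * var0 + alpha * var1) / var_a
        + math.log(var_a / (var0 ** (1.0 - alpha) * var1 ** alpha))
        + (1.0 - alpha) * (mu_a - mu0) ** 2 / var_a
        + alpha * (mu_a - mu1) ** 2 / var_a
        - 1.0
    )


same = geometric_js(0.0, 1.0, 0.0, 1.0)
at_zero = geometric_js(0.0, 1.0, 1.0, 2.0, alpha=0.0)  # reduces to D_KL(P, Q)
at_one = geometric_js(0.0, 1.0, 1.0, 2.0, alpha=1.0)   # reduces to D_KL(Q, P)
```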