List of Bayesian Layers
All Bayesian layers use their counterparts in torch.nn.functional and/or
UQpy.scientific_machine_learning.functional to define their computation.
The difference between a PyTorch layer and its Bayesian counterpart is in the definition and training of the learnable parameters.
A PyTorch layer, like torch.nn.Conv1d, defines weights and biases as deterministic tensors
and learns a value for those parameters.
In contrast, UQpy’s Bayesian version, like UQpy.scientific_machine_learning.BayesianConv1d,
defines the weights and biases as random variables, and learns their distributions.
The purpose of these layers is not to recreate features in PyTorch, but to provide Bayesian implementations
that match PyTorch’s syntax as much as possible.
For example, BayesianLinear computes \(y=x A^T + b\) just as torch.nn.Linear does,
and uses torch.nn.functional.linear for the computation. For convenience, the first three parameters
of BayesianLinear (in_features, out_features, and bias) are identical in name and purpose
to those of Linear.
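As a rough illustration of the idea above, the following pure-Python sketch computes \(y = x A^T + b\) for a single input sample, either with the distribution means or with weights and biases drawn from their normal distributions. The names bayesian_linear_forward and softplus are hypothetical helpers written for this sketch and are not part of UQpy or PyTorch; the actual layers operate on tensors through torch.nn.functional.linear.

```python
import math
import random


def softplus(rho):
    """Map an unconstrained rho to a standard deviation, ln(1 + exp(rho))."""
    return math.log1p(math.exp(rho))


def bayesian_linear_forward(x, weight_mu, weight_rho, bias_mu, bias_rho,
                            sampling=True, rng=random):
    """Compute y = x A^T + b for one input sample x (a list of floats).

    Hypothetical helper for illustration only. When sampling is True, each
    weight and bias is drawn from N(mu, softplus(rho)); otherwise the
    distribution means are used directly.
    """
    y = []
    for mu_row, rho_row, b_mu, b_rho in zip(weight_mu, weight_rho, bias_mu, bias_rho):
        if sampling:
            row = [rng.gauss(m, softplus(r)) for m, r in zip(mu_row, rho_row)]
            b = rng.gauss(b_mu, softplus(b_rho))
        else:
            row, b = mu_row, b_mu
        y.append(sum(w * xi for w, xi in zip(row, x)) + b)
    return y


# Two input features, three output features.
x = [1.0, 2.0]
weight_mu = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight_rho = [[-3.0, -3.0] for _ in range(3)]
bias_mu = [0.0, 0.0, 1.0]
bias_rho = [-3.0, -3.0, -3.0]

deterministic = bayesian_linear_forward(
    x, weight_mu, weight_rho, bias_mu, bias_rho, sampling=False
)
sampled = bayesian_linear_forward(
    x, weight_mu, weight_rho, bias_mu, bias_rho, rng=random.Random(0)
)
print(deterministic)  # [1.0, 2.0, 4.0]
print(sampled)        # values scattered around the deterministic output
```

With sampling disabled the result is reproducible, while each sampled call returns a different draw around it.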
Bayesian Linear
- class BayesianLinear(in_features, out_features, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)
Construct a Bayesian Linear layer as \(xA^T + b\) where \(A\) and \(b\) are normal random variables.
- Parameters:
in_features (int) – Size of each input sample
out_features (int) – Size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
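To make the transform \(\sigma = \ln(1 + \exp(\rho))\) concrete, the short sketch below evaluates it for a few values of \(\rho\), including the default initial mean of -3.0. The function softplus is written here for illustration (it is not a UQpy function), and it uses an algebraically equivalent form that avoids overflow for large \(\rho\).

```python
import math


def softplus(rho):
    """Numerically stable evaluation of ln(1 + exp(rho))."""
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))


for rho in (-3.0, 0.0, 3.0):
    print(f"rho = {rho:+.1f} -> sigma = {softplus(rho):.4f}")
# rho = -3.0 -> sigma = 0.0486
# rho = +0.0 -> sigma = 0.6931
# rho = +3.0 -> sigma = 3.0486
```

A default \(\rho\) near -3.0 therefore corresponds to an initial standard deviation of roughly 0.05.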
Shape:
Input: \((*, H_\text{in})\) where \(*\) means any number of dimensions including none and \(H_\text{in} = \text{in_features}\).
Output: \((*, H_\text{out})\) where all but the last dimension are the same shape as the input and \(H_\text{out} = \text{out_features}\).
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of shape \((\text{out_features}, \text{in_features})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of shape \((\text{out_features}, \text{in_features})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of shape \((\text{out_features})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of shape \((\text{out_features})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
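The initialization described above can be pictured with a small pure-Python sketch: the mean parameters are filled with draws from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\), and, as an assumption made here based on the description of posterior_rho_initial, the \(\rho\) parameters are filled with draws from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The helper init_parameter is hypothetical and uses nested lists in place of torch.nn.Parameter tensors.

```python
import random


def init_parameter(shape, mean, std, rng):
    """Return a nested list of the given 2D shape with entries drawn from N(mean, std)."""
    rows, cols = shape
    return [[rng.gauss(mean, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
out_features, in_features = 15, 4
mu_mean, mu_std = (0.0, 0.1)     # posterior_mu_initial default
rho_mean, rho_std = (-3.0, 0.1)  # posterior_rho_initial default

# Both parameters share the weight shape (out_features, in_features).
weight_mu = init_parameter((out_features, in_features), mu_mean, mu_std, rng)
weight_rho = init_parameter((out_features, in_features), rho_mean, rho_std, rng)

flat_rho = [value for row in weight_rho for value in row]
print(len(weight_mu), len(weight_mu[0]))  # 15 4
print(sum(flat_rho) / len(flat_rho))      # close to -3.0
```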
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> layer = sml.BayesianLinear(4, 15)
>>> input = torch.rand(20, 4)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
Bayesian Convolution 1D
- class BayesianConv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)
Applies a Bayesian 1D convolution over an input signal composed of several input planes.
- Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple]) – Size of the convolving kernel
stride (Union[int, tuple]) – Stride of the convolution. Default: 1
padding (Union[int, str, tuple]) – Padding added to both sides of the input. Note padding='valid' is the same as no padding. padding='same' pads the input so the output has the same shape as the input. However, this mode doesn’t support any stride values other than 1. Default: 0
dilation (Union[int, tuple]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Note
This class calls torch.nn.functional.conv1d() with padding_mode='zeros'.
Shape:
Input: \((N, C_\text{in}, L_\text{in})\) or \((C_\text{in}, L_\text{in})\)
Output: \((N, C_\text{out}, L_\text{out})\) or \((C_\text{out}, L_\text{out})\),
where \(L_\text{out}= \left\lfloor \frac{L_\text{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel size} - 1) - 1}{\text{stride}} \right\rfloor + 1\)
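The output-length formula above can be checked with a few lines of plain Python. The function name conv1d_output_length is a hypothetical helper, not part of UQpy or PyTorch; the call mirrors the layer BayesianConv1d(16, 33, 3, stride=2) applied to an input of length 50.

```python
def conv1d_output_length(length_in, kernel_size, stride=1, padding=0, dilation=1):
    """Evaluate the L_out formula using integer (floor) division."""
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# kernel_size=3, stride=2, no padding, input length 50
print(conv1d_output_length(50, 3, stride=2))   # 24
# padding=1 with stride 1 keeps the length for a kernel of size 3
print(conv1d_output_length(50, 3, padding=1))  # 50
```

An input of shape (20, 16, 50) would therefore be expected to produce an output of shape (20, 33, 24).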
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> layer = sml.BayesianConv1d(16, 33, 3, stride=2)
>>> layer.sample(False)
>>> input = torch.randn(20, 16, 50)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
Bayesian Convolution 2D
- class BayesianConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)
Applies a Bayesian 2D convolution over an input signal composed of several input planes.
- Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple[int, int]]) – Size of the convolving kernel
stride (Union[int, tuple[int, int]]) – Stride of the convolution. Default: 1
padding (Union[str, int, tuple[int, int]]) – Padding added to both sides of the input. It can be a string, "valid" or "same", an integer, or a tuple of integers giving the amount of implicit padding applied on both sides. Default: 0
dilation (Union[int, tuple[int, int]]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Note
This class calls torch.nn.functional.conv2d() with padding_mode='zeros'.
Shape:
Input: \((N, C_\text{in}, H_\text{in}, W_\text{in})\) or \((C_\text{in}, H_\text{in}, W_\text{in})\)
Output: \((N, C_\text{out}, H_\text{out}, W_\text{out})\) or \((C_\text{out}, H_\text{out}, W_\text{out})\)
where \(H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\_size[0] - 1}) - 1}{\text{stride[0]}} + 1\right\rfloor\) and \(W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\_size[1] - 1}) - 1}{\text{stride[1]}} + 1\right\rfloor\)
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> # With square kernels and equal stride
>>> layer = sml.BayesianConv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> layer = sml.BayesianConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> layer = sml.BayesianConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
Bayesian Convolution 3D
- class BayesianConv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)
Applies a Bayesian 3D convolution over an input signal composed of several input planes.
- Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple[int, int, int]]) – Size of the convolving kernel
stride (Union[int, tuple[int, int, int]]) – Stride of the convolution. Default: 1
padding (Union[str, int, tuple[int, int, int]]) – Padding added to all six sides of the input. It can be either a string {'valid', 'same'} or a tuple of ints giving the amount of implicit padding applied on both sides. Default: 0
dilation (Union[int, tuple[int, int, int]]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Note
This class calls torch.nn.functional.conv3d() with padding_mode='zeros'.
Shape:
Input: \((N, C_\text{in},D_\text{in}, H_\text{in}, W_\text{in})\) or \((C_\text{in},D_\text{in}, H_\text{in}, W_\text{in})\)
Output: \((N, C_\text{out},D_\text{out}, H_\text{out}, W_\text{out})\) or \((C_\text{out},D_\text{out}, H_\text{out}, W_\text{out})\)
where \(D_\text{out} = \left\lfloor \frac{D_\text{in} + 2 \times \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\_size[0] - 1}) - 1}{\text{stride[0]}} + 1\right\rfloor\)
\(H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\_size[1] - 1}) - 1}{\text{stride[1]}} + 1\right\rfloor\)
\(W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding[2]} - \text{dilation[2]} \times (\text{kernel\_size[2] - 1}) - 1}{\text{stride[2]}} + 1\right\rfloor\)
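As a worked check of the output shape, the sketch below applies the size formula independently to depth, height, and width, with each dimension using its own entry of kernel_size, stride, padding, and dilation. The helper conv_output_shape is hypothetical and not part of UQpy or PyTorch; integer arguments are broadcast to all dimensions, mirroring how the layer accepts either an int or a tuple.

```python
def conv_output_shape(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Apply the output-size formula independently to each spatial dimension."""
    n = len(input_size)

    def as_tuple(value):
        # An int applies to every dimension; a tuple gives one value per dimension.
        return (value,) * n if isinstance(value, int) else tuple(value)

    kernel_size, stride, padding, dilation = map(
        as_tuple, (kernel_size, stride, padding, dilation)
    )
    return tuple(
        (size + 2 * p - d * (k - 1) - 1) // s + 1
        for size, k, s, p, d in zip(input_size, kernel_size, stride, padding, dilation)
    )


# Spatial size (D, H, W) = (10, 50, 100) with non-cubic kernels, unequal stride, and padding
print(conv_output_shape((10, 50, 100), (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)))
# (8, 50, 99)
# Cubic kernel of size 3 and stride 2
print(conv_output_shape((10, 50, 100), 3, stride=2))
# (4, 24, 49)
```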
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> # With cubic kernels and equal stride
>>> layer = sml.BayesianConv3d(16, 33, 3, stride=2)
>>> # non-cubic kernels and unequal stride and with padding
>>> layer = sml.BayesianConv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
- forward(x)
Apply F.conv3d() to x where the weight and bias are drawn from random variables
- Parameters:
x (Tensor) – Tensor of shape \((N, C_\text{in}, D_\text{in}, H_\text{in}, W_\text{in})\)
- Return type:
Tensor
- Returns:
Tensor of shape \((N, C_\text{out}, D_\text{out}, H_\text{out}, W_\text{out})\)
Bayesian Fourier 1D
- class BayesianFourier1d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)
A 1d Bayesian Fourier layer as \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are normal random variables.
- Parameters:
width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (int) – Number of Fourier modes to keep, at most \(\lfloor L / 2 \rfloor + 1\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Shape:
Input: \((N, \text{width}, L)\)
Output: \((N, \text{width}, L)\)
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the spectral convolution of shape \((\text{width}, \text{width}, \text{modes})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the spectral convolution of shape \((\text{width}, \text{width}, \text{modes})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size})\). The \(\text{kernel_size}=1\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size})\). The \(\text{kernel_size}=1\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the convolution of shape \((\text{width})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the convolution of shape \((\text{width})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> length = 128
>>> modes = (length // 2) + 1
>>> width = 9
>>> layer = sml.BayesianFourier1d(width, modes)
>>> layer.sample(False)
>>> x = torch.randn(2, width, length)
>>> deterministic_output = layer(x)
>>> layer.sample(True)
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
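To give a feel for the spectral term \(\mathcal{F}^{-1}(R(\mathcal{F}x))\) of the 1d Fourier layer, the sketch below implements a deterministic, single-channel version in pure Python with a naive discrete Fourier transform. It is only an illustration under stated assumptions: the functions rfft, irfft, and spectral_conv_1d are written here and are not UQpy functions, the term \(W(x)\) is omitted, \(R\) is a fixed list of multipliers rather than a random variable, and keeping the lowest modes while zeroing the rest follows the usual Fourier neural operator convention rather than anything confirmed by this page.

```python
import cmath


def rfft(x):
    """Naive one-sided DFT of a real sequence: coefficients k = 0 .. len(x) // 2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irfft(coeffs, n):
    """Inverse of rfft: rebuild the full spectrum by Hermitian symmetry, then invert."""
    full = list(coeffs) + [coeffs[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_conv_1d(x, r, modes):
    """Single-channel F^{-1}(R (F x)): scale the lowest `modes` coefficients, zero the rest."""
    coeffs = rfft(x)
    filtered = [r[k] * coeffs[k] if k < modes else 0j for k in range(len(coeffs))]
    return irfft(filtered, len(x))


x = [1.0, 2.0, 3.0, 4.0]
# Keeping only the zero-frequency mode leaves a constant signal at the mean of x.
print(spectral_conv_1d(x, r=[1.0], modes=1))
# Keeping all len(x) // 2 + 1 modes with unit multipliers recovers x up to round-off.
print(spectral_conv_1d(x, r=[1.0, 1.0, 1.0], modes=3))
```

The second call also shows why modes is bounded by \(\lfloor L / 2 \rfloor + 1\): a real signal of length \(L\) has only that many one-sided Fourier coefficients.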
Bayesian Fourier 2D
- class BayesianFourier2d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)
A 2d Bayesian Fourier layer as \(\mathcal{F}^{-1} ( R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are normal random variables.
- Parameters:
width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (tuple[int, int]) – Number of Fourier modes to keep, at most \((\lfloor H / 2 \rfloor + 1, \lfloor W / 2 \rfloor + 1)\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Shape:
Input: \((N, \text{width}, H, W)\)
Output: \((N, \text{width}, H, W)\)
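The upper bound on modes stated in the parameter list can be evaluated for a given spatial shape with a one-line helper. The function max_fourier_modes is a hypothetical name used only for this sketch; it simply evaluates \(\lfloor n / 2 \rfloor + 1\) for each spatial dimension.

```python
def max_fourier_modes(spatial_shape):
    """Upper bound floor(n / 2) + 1 on the retained modes, per spatial dimension."""
    return tuple(n // 2 + 1 for n in spatial_shape)


print(max_fourier_modes((32, 64)))      # (17, 33)
print(max_fourier_modes((16, 32, 64)))  # (9, 17, 33)
```

For an input with \(H = 32\) and \(W = 64\), the largest admissible setting is therefore modes = (17, 33).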
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the spectral convolution of shape \((2, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the spectral convolution of shape \((2, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]})\) with real entries. The \(\text{kernel_size} = (1, 1)\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]})\) with real entries. The \(\text{kernel_size} = (1, 1)\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean for the bias of the convolution of shape \((\text{width})\) with real entries. If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the bias of the convolution of shape \((\text{width})\) with real entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> h, w = 32, 64
>>> modes = (17, 33)
>>> width = 9
>>> layer = sml.BayesianFourier2d(width, modes)
>>> x = torch.randn(1, width, h, w)
>>> layer.sample(False)
>>> deterministic_output = layer(x)
>>> layer.sample()
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
Bayesian Fourier 3D
- class BayesianFourier3d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)
A 3d Bayesian Fourier layer as \(\mathcal{F}^{-1} ( R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are random variables.
- Parameters:
width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (tuple[int, int, int]) – Number of Fourier modes to keep, at most \((\lfloor D / 2 \rfloor + 1, \lfloor H / 2 \rfloor + 1, \lfloor W / 2 \rfloor + 1)\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
Shape:
Input: \((N, \text{width}, D, H, W)\)
Output: \((N, \text{width}, D, H, W)\)
Attributes:
Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the spectral convolution of shape \((4, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]}, \text{modes[2]})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the spectral convolution of shape \((4, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]}, \text{modes[2]})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\) with real entries. The \(\text{kernel_size} = (1, 1, 1)\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\) with real entries. The \(\text{kernel_size} = (1, 1, 1)\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean for the bias of the convolution of shape \((\text{width})\) with real entries. If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the bias of the convolution of shape \((\text{width})\) with real entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
Example:
>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> d, h, w = 16, 32, 64
>>> modes = (9, 17, 33)
>>> width = 4
>>> layer = sml.BayesianFourier3d(width, modes)
>>> x = torch.randn(1, width, d, h, w)
>>> layer.sample(False)
>>> deterministic_output = layer(x)
>>> layer.sample()
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)