List of Bayesian Layers

All Bayesian layers use their counterparts in torch.nn.functional and/or UQpy.scientific_machine_learning.functional to define their computation. The difference between a PyTorch layer and its Bayesian counterpart is in the definition and training of the learnable parameters. A PyTorch layer, like torch.nn.Conv1d, defines weights and biases as deterministic tensors and learns a value for each of those parameters. In contrast, UQpy’s Bayesian version, like UQpy.scientific_machine_learning.BayesianConv1d, defines the weights and biases as random variables and learns their distributions. The purpose of these layers is not to recreate features in PyTorch, but to provide Bayesian implementations that match PyTorch’s syntax as closely as possible.

For example, BayesianLinear computes \(y=x A^T + b\) just as torch.nn.Linear does, and uses torch.nn.functional.linear for the computation. For convenience, the first three parameters of BayesianLinear (in_features, out_features, and bias) are identical in name and purpose to those of torch.nn.Linear.
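As a rough illustration of learning distributions rather than values, the sketch below evaluates \(y = xA^T + b\) in plain Python for a single input sample. It assumes each weight is drawn as \(\mu + \sigma \epsilon\) with \(\epsilon \sim \mathcal{N}(0, 1)\) and \(\sigma = \ln(1 + \exp(\rho))\). The function name bayesian_linear_sketch and this drawing scheme are illustrative assumptions, not UQpy's implementation.

```python
import math
import random


def softplus(rho):
    """sigma = ln(1 + exp(rho)), always positive."""
    return math.log1p(math.exp(rho))


def bayesian_linear_sketch(x, weight_mu, weight_rho, bias_mu, bias_rho,
                           sampling=True, rng=random):
    """Hypothetical stand-in for a Bayesian linear layer on one sample x.

    weight_mu and weight_rho have shape (out_features, in_features);
    bias_mu and bias_rho have shape (out_features,).
    """
    output = []
    for row_mu, row_rho, b_mu, b_rho in zip(weight_mu, weight_rho, bias_mu, bias_rho):
        if sampling:
            # Assumed scheme: draw each parameter from N(mu, softplus(rho)^2).
            row = [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(row_mu, row_rho)]
            b = b_mu + softplus(b_rho) * rng.gauss(0.0, 1.0)
        else:
            # Deterministic mode: use the distribution means as the parameters.
            row, b = row_mu, b_mu
        output.append(sum(w * xi for w, xi in zip(row, x)) + b)
    return output


weight_mu = [[1.0, 2.0], [3.0, 4.0]]
weight_rho = [[-3.0, -3.0], [-3.0, -3.0]]
bias_mu = [0.5, -0.5]
bias_rho = [-3.0, -3.0]
x = [1.0, 1.0]

deterministic = bayesian_linear_sketch(x, weight_mu, weight_rho, bias_mu, bias_rho, sampling=False)
sampled = bayesian_linear_sketch(x, weight_mu, weight_rho, bias_mu, bias_rho,
                                 sampling=True, rng=random.Random(0))
```

With sampling disabled the result depends only on the means, while each sampled call perturbs every weight and bias before the same linear map is applied.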

Bayesian Linear

class BayesianLinear(in_features, out_features, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)[source]

Construct a Bayesian Linear layer as \(xA^T + b\) where \(A\) and \(b\) are normal random variables.

Parameters:

in_features (int) – Size of each input sample
out_features (int) – Size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
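The mapping \(\sigma = \ln(1 + \exp(\rho))\) can be explored with a small standard-library sketch. The helper names sigma_from_rho and rho_from_sigma are hypothetical and are not part of UQpy; the inverse is included only as a convenience for choosing a \(\rho\) that corresponds to a desired \(\sigma\).

```python
import math


def sigma_from_rho(rho):
    """Softplus: sigma = ln(1 + exp(rho)). Positive for every real rho.

    Note: math.exp overflows for very large rho (roughly rho > 709).
    """
    return math.log1p(math.exp(rho))


def rho_from_sigma(sigma):
    """Inverse softplus: rho = ln(exp(sigma) - 1), defined for sigma > 0."""
    return math.log(math.expm1(sigma))


# The default posterior_rho_initial mean of -3.0 corresponds to a small sigma.
default_sigma = sigma_from_rho(-3.0)

# The rho whose softplus equals 0.1, the default prior_sigma.
rho_for_prior_sigma = rho_from_sigma(0.1)
```

Here default_sigma is approximately 0.0486, so with the default settings the initial posterior standard deviation is roughly half of the default prior_sigma.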

Shape:

Input: \((*, H_\text{in})\) where \(*\) means any number of dimensions including none and \(H_\text{in} = \text{in_features}\).
Output: \((*, H_\text{out})\) where all but the last dimension are the same shape as the input and \(H_\text{out} = \text{out_features}\).

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of shape \((\text{out_features}, \text{in_features})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of shape \((\text{out_features}, \text{in_features})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of shape \((\text{out_features})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of shape \((\text{out_features})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> import torch
>>> import UQpy.scientific_machine_learning as sml
>>> layer = sml.BayesianLinear(4, 15)
>>> input = torch.rand(20, 4)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)
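
Because a layer in sampling mode returns a different output on each call, repeated forward passes can be summarized by a mean and a spread. The sketch below shows that pattern with the standard library only. The names monte_carlo_summary and toy_layer are hypothetical, and toy_layer is a stand-in for a stochastic forward pass rather than a UQpy layer.

```python
import random
import statistics


def monte_carlo_summary(stochastic_forward, x, n_samples=100):
    """Call a stochastic forward function repeatedly and summarize each output."""
    draws = [stochastic_forward(x) for _ in range(n_samples)]
    per_output = list(zip(*draws))  # regroup the draws by output index
    means = [statistics.mean(values) for values in per_output]
    stds = [statistics.stdev(values) for values in per_output]
    return means, stds


rng = random.Random(0)


def toy_layer(x):
    """Hypothetical one-weight layer: w ~ N(2.0, 0.05^2), output w * x."""
    w = 2.0 + 0.05 * rng.gauss(0.0, 1.0)
    return [w * xi for xi in x]


means, stds = monte_carlo_summary(toy_layer, [1.0, -3.0], n_samples=500)
```

The means approximate what a deterministic pass would give for this toy layer, and the standard deviations indicate how strongly the weight uncertainty propagates to each output.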

forward(x)[source]

Forward model evaluation

Parameters: x (Tensor) – Tensor of shape \((*, \text{in_features})\)
Return type: Tensor
Returns: Tensor of shape \((*, \text{out_features})\)

Bayesian Convolution 1D

class BayesianConv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)[source]

Applies a Bayesian 1D convolution over an input signal composed of several input planes.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple]) – Size of the convolving kernel
stride (Union[int, tuple]) – Stride of the convolution. Default: 1
padding (Union[int, str, tuple]) – Padding added to both sides of the input. Note that padding=’valid’ is the same as no padding, while padding=’same’ pads the input so the output has the same shape as the input. However, the ’same’ mode does not support any stride values other than 1. Default: 0
dilation (Union[int, tuple]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)

Note

This class calls torch.nn.functional.conv1d() with padding_mode='zeros'.

Shape:

Input: \((N, C_\text{in}, L_\text{in})\) or \((C_\text{in}, L_\text{in})\)
Output: \((N, C_\text{out}, L_\text{out})\) or \((C_\text{out}, L_\text{out})\),

where \(L_\text{out}= \left\lfloor \frac{L_\text{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel size} - 1) - 1}{\text{stride}} \right\rfloor + 1\)
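
The output length formula translates directly into integer arithmetic. The helper conv1d_output_length below is a hypothetical name used only for illustration, and it handles integer padding only (not the string modes).

```python
def conv1d_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Settings of the form BayesianConv1d(16, 33, 3, stride=2) on an input of length 50.
l_out = conv1d_output_length(50, kernel_size=3, stride=2)
```

For these settings l_out is 24, so an input of shape \((20, 16, 50)\) would produce an output of shape \((20, 33, 24)\).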

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> layer = sml.BayesianConv1d(16, 33, 3, stride=2)
>>> layer.sample(False)
>>> input = torch.randn(20, 16, 50)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Apply F.conv1d() to x where the weight and bias are drawn from random variables

Parameters: x (Tensor) – Tensor of shape \((N, C_\text{in}, L_\text{in})\) or \((C_\text{in}, L_\text{in})\)
Return type: Tensor
Returns: Tensor of shape \((N, C_\text{out}, L_\text{out})\) or \((C_\text{out}, L_\text{out})\)

Bayesian Convolution 2D

class BayesianConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)[source]

Applies a Bayesian 2D convolution over an input signal composed of several input planes.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple[int, int]]) – Size of the convolving kernel
stride (Union[int, tuple[int, int]]) – Stride of the convolution. Default: 1
padding (Union[str, int, tuple[int, int]]) – Padding added to all four sides of the input. It can be a string, "valid" or "same", an integer, or a tuple of integers giving the amount of implicit padding applied on both sides. Default: 0
dilation (Union[int, tuple[int, int]]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)

Note

This class calls torch.nn.functional.conv2d() with padding_mode='zeros'.

Shape:

Input: \((N, C_\text{in}, H_\text{in}, W_\text{in})\) or \((C_\text{in}, H_\text{in}, W_\text{in})\)
Output: \((N, C_\text{out}, H_\text{out}, W_\text{out})\) or \((C_\text{out}, H_\text{out}, W_\text{out})\)

where \(H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\_size[0] - 1}) - 1}{\text{stride[0]}} + 1\right\rfloor\) and \(W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\_size[1] - 1}) - 1}{\text{stride[1]}} + 1\right\rfloor\)

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> # With square kernels and equal stride
>>> layer = sml.BayesianConv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> layer = sml.BayesianConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> layer = sml.BayesianConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Apply F.conv2d() to x where the weight and bias are drawn from random variables

Parameters: x (Tensor) – Tensor of shape \((N, C_\text{in}, H_\text{in}, W_\text{in})\)
Return type: Tensor
Returns: Tensor of shape \((N, C_\text{out}, H_\text{out}, W_\text{out})\)

Bayesian Convolution 3D

class BayesianConv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None, dtype=None)[source]

Applies a Bayesian 3D convolution over an input signal composed of several input planes.

Parameters:

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (Union[int, tuple[int, int, int]]) – Size of the convolving kernel
stride (Union[int, tuple[int, int, int]]) – Stride of the convolution. Default: 1
padding (Union[str, int, tuple[int, int, int]]) – Padding added to all six sides of the input. It can be a string {‘valid’, ‘same’}, an int, or a tuple of ints giving the amount of implicit padding applied on both sides. Default: 0
dilation (Union[int, tuple[int, int, int]]) – Spacing between kernel elements. Default: 1
groups (int) – Number of blocked connections from input channels to output channels. in_channels and out_channels must both be divisible by groups. Default: 1.
bias (bool) – If True, adds a learnable bias to the output. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)

Note

This class calls torch.nn.functional.conv3d() with padding_mode='zeros'.

Shape:

Input: \((N, C_\text{in},D_\text{in}, H_\text{in}, W_\text{in})\) or \((C_\text{in},D_\text{in}, H_\text{in}, W_\text{in})\)
Output: \((N, C_\text{out},D_\text{out}, H_\text{out}, W_\text{out})\) or \((C_\text{out},D_\text{out}, H_\text{out}, W_\text{out})\)

where \(D_\text{out} = \left\lfloor \frac{D_\text{in} + 2 \times \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\_size[0] - 1}) - 1}{\text{stride[0]}} + 1\right\rfloor\)

\(H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\_size[1] - 1}) - 1}{\text{stride[1]}} + 1\right\rfloor\)

\(W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding[2]} - \text{dilation[2]} \times (\text{kernel\_size[2] - 1}) - 1}{\text{stride[2]}} + 1\right\rfloor\)
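
Each spatial axis uses its own entry of padding, dilation, kernel_size, and stride, in the order depth, height, width. A minimal sketch of that per-axis computation follows. The helper conv_output_shape is a hypothetical name, it works for any number of spatial axes, and it supports integer or tuple arguments only (not the string padding modes).

```python
def conv_output_shape(in_shape, kernel_size, stride=1, padding=0, dilation=1):
    """Apply the output-size formula independently to each spatial axis."""
    n_dims = len(in_shape)

    def per_axis(value):
        # An int applies to every axis; a tuple gives one entry per axis.
        return (value,) * n_dims if isinstance(value, int) else tuple(value)

    kernel_size, stride, padding, dilation = map(
        per_axis, (kernel_size, stride, padding, dilation)
    )
    return tuple(
        (size + 2 * p - d * (k - 1) - 1) // s + 1
        for size, k, s, p, d in zip(in_shape, kernel_size, stride, padding, dilation)
    )


# Spatial input (D, H, W) = (10, 50, 100) with non-cubic kernels and unequal stride.
out_shape = conv_output_shape(
    (10, 50, 100), kernel_size=(3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)
)
```

For these settings out_shape is (8, 50, 99). The same helper reproduces the 2D case when given two spatial sizes.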

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\).
weight_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the module of shape \((\text{out_channels}, \frac{\text{in_channels}}{\text{groups}}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the module of shape \((\text{out_channels})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> # With cubic kernels and equal stride
>>> layer = sml.BayesianConv3d(16, 33, 3, stride=2)
>>> # non-cubic kernels and unequal stride and with padding
>>> layer = sml.BayesianConv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> layer.sample(False)
>>> deterministic_output = layer(input)
>>> layer.sample()
>>> probabilistic_output = layer(input)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Apply F.conv3d() to x where the weight and bias are drawn from random variables

Parameters: x (Tensor) – Tensor of shape \((N, C_\text{in}, D_\text{in}, H_\text{in}, W_\text{in})\)
Return type: Tensor
Returns: Tensor of shape \((N, C_\text{out}, D_\text{out}, H_\text{out}, W_\text{out})\)

Bayesian Fourier 1D

class BayesianFourier1d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)[source]

A 1d Bayesian Fourier layer as \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are normal random variables.

Parameters:

width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (int) – Number of Fourier modes to keep, at most \(\lfloor L / 2 \rfloor + 1\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)
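The upper bound on modes, \(\lfloor L / 2 \rfloor + 1\), equals the number of non-negative frequencies in a discrete Fourier transform of length \(L\). The sketch below computes that bound for each spatial axis; max_fourier_modes is a hypothetical helper name, not a UQpy function.

```python
def max_fourier_modes(spatial_shape):
    """Largest documented value of modes for each spatial axis: floor(n / 2) + 1."""
    return tuple(n // 2 + 1 for n in spatial_shape)


modes_1d = max_fourier_modes((128,))         # signal of length 128
modes_2d = max_fourier_modes((32, 64))       # grid of height 32 and width 64
modes_3d = max_fourier_modes((16, 32, 64))   # grid of depth 16, height 32, width 64
```

These give (65,), (17, 33), and (9, 17, 33) respectively. For the 1d layer, pass the single integer, for example modes_1d[0].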

Shape:

Input: \((N, \text{width}, L)\)
Output: \((N, \text{width}, L)\)

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the spectral convolution of shape \((\text{width}, \text{width}, \text{modes})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the spectral convolution of shape \((\text{width}, \text{width}, \text{modes})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean of the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size})\). The \(\text{kernel_size}=1\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation of the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size})\). The \(\text{kernel_size}=1\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean of the bias of the convolution of shape \((\text{width})\). If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation of the bias of the convolution of shape \((\text{width})\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> length = 128
>>> modes = (length // 2) + 1
>>> width = 9
>>> layer = sml.BayesianFourier1d(width, modes)
>>> layer.sample(False)
>>> x = torch.randn(2, width, length)
>>> deterministic_output = layer(x)
>>> layer.sample(True)
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Compute \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\)

Parameters: x (Tensor) – Tensor of shape \((N, \text{width}, L)\)
Return type: Tensor
Returns: Tensor of shape \((N, \text{width}, L)\)
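
To make \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\) concrete, the following standard-library sketch evaluates it for a single channel with fixed, deterministic parameters. It uses a naive discrete Fourier transform, keeps only the first len(r) modes, and treats \(W\) as a scalar weight and bias. All function names are hypothetical, and this is not UQpy's implementation; in the Bayesian layer, \(R\) and the parameters of \(W\) would be random variables rather than fixed numbers.

```python
import cmath
import math


def rfft(x):
    """Naive one-sided DFT of a real sequence: frequencies 0..len(x)//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def irfft(spectrum, n):
    """Inverse of rfft for a real signal of length n."""
    out = []
    for t in range(n):
        total = spectrum[0].real
        for k in range(1, len(spectrum)):
            term = (spectrum[k] * cmath.exp(2j * math.pi * k * t / n)).real
            # Every frequency except 0 and Nyquist (even n) has a conjugate twin.
            total += term if (n % 2 == 0 and k == n // 2) else 2.0 * term
        out.append(total / n)
    return out


def fourier_layer_1d(x, r, w=0.0, b=0.0):
    """Single-channel F^{-1}(R(F x)) + W(x), keeping len(r) Fourier modes."""
    spectrum = rfft(x)
    # Multiply the kept modes by R and discard (zero) the remaining ones.
    kept = [r[k] * spectrum[k] if k < len(r) else 0j for k in range(len(spectrum))]
    spectral_branch = irfft(kept, len(x))
    # W(x) acts pointwise, like a convolution with kernel_size = 1.
    return [s + w * xi + b for s, xi in zip(spectral_branch, x)]


x = [1.0, 2.0, 3.0, 4.0]
all_modes = fourier_layer_1d(x, r=[1.0, 1.0, 1.0])  # identity on the spectrum
mean_only = fourier_layer_1d(x, r=[1.0])            # keep only the zero frequency
```

Keeping every mode with \(R = 1\) and \(W = 0\) returns the input unchanged, while keeping only the zero-frequency mode returns the signal mean at every position.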

Bayesian Fourier 2D

class BayesianFourier2d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)[source]

A 2d Bayesian Fourier layer as \(\mathcal{F}^{-1} ( R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are normal random variables.

Parameters:

width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (tuple[int, int]) – Number of Fourier modes to keep, at most \((\lfloor H / 2 \rfloor + 1, \lfloor W / 2 \rfloor + 1)\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)

Shape:

Input: \((N, \text{width}, H, W)\)
Output: \((N, \text{width}, H, W)\)

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the spectral convolution of shape \((2, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the spectral convolution of shape \((2, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]})\) with real entries. The \(\text{kernel_size} = (1, 1)\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]})\) with real entries. The \(\text{kernel_size} = (1, 1)\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean for the bias of the convolution of shape \((\text{width})\) with real entries. If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the bias of the convolution of shape \((\text{width})\) with real entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).

Example:

>>> h, w = 32, 64
>>> modes = (17, 33)
>>> width = 9
>>> layer = sml.BayesianFourier2d(width, modes)
>>> x = torch.randn(1, width, h, w)
>>> layer.sample(False)
>>> deterministic_output = layer(x)
>>> layer.sample()
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Compute \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\)

Parameters: x (Tensor) – Tensor of shape \((N, \text{width}, H, W)\)
Return type: Tensor
Returns: Tensor of shape \((N, \text{width}, H, W)\)

Bayesian Fourier 3D

class BayesianFourier3d(width, modes, bias=True, sampling=True, prior_mu=0.0, prior_sigma=0.1, posterior_mu_initial=(0.0, 0.1), posterior_rho_initial=(-3.0, 0.1), device=None)[source]

A 3d Bayesian Fourier layer as \(\mathcal{F}^{-1} ( R (\mathcal{F}x)) + W(x)\) where \(R\), along with the weights and bias for \(W\), are normal random variables.

Parameters:

width (int) – Number of neurons in the layer and channels in the spectral convolution
modes (tuple[int, int, int]) – Number of Fourier modes to keep, at most \((\lfloor D / 2 \rfloor + 1, \lfloor H / 2 \rfloor + 1, \lfloor W / 2 \rfloor + 1)\)
bias (bool) – If True, adds a learnable bias to the convolution. Default: True
sampling (bool) – If True, sample layer parameters from their respective Gaussian distributions. If False, use distribution mean as parameter values. Default: True
prior_mu (float) – Prior mean, \(\mu_\text{prior}\), of the prior normal distribution. Default: 0.0
prior_sigma (float) – Prior standard deviation, \(\sigma_\text{prior}\), of the prior normal distribution. Default: 0.1
posterior_mu_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\mu\). The initial posterior is \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\). Default: (0.0, 0.1)
posterior_rho_initial (tuple[float, float]) – Mean and standard deviation of the initial posterior distribution for \(\rho\). The initial posterior is \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\). The standard deviation of the posterior is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to ensure it is positive. Default: (-3.0, 0.1)

Shape:

Input: \((N, \text{width}, D, H, W)\)
Output: \((N, \text{width}, D, H, W)\)

Attributes:

Unless otherwise noted, all parameters are initialized using the priors with values from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).

weight_spectral_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the spectral convolution of shape \((4, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]}, \text{modes[2]})\) with complex entries.
weight_spectral_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the spectral convolution of shape \((4, \text{width}, \text{width}, \text{modes[0]}, \text{modes[1]}, \text{modes[2]})\) with complex entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
weight_conv_mu (torch.nn.Parameter): The learnable distribution mean for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\) with real entries. The \(\text{kernel_size} = (1, 1, 1)\).
weight_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the weights of the convolution of shape \((\text{width}, \text{width}, \text{kernel_size[0]}, \text{kernel_size[1]}, \text{kernel_size[2]})\) with real entries. The \(\text{kernel_size} = (1, 1, 1)\). The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive.
bias_conv_mu (torch.nn.Parameter): The learnable distribution mean for the bias of the convolution of shape \((\text{width})\) with real entries. If bias is True, the values are initialized from \(\mathcal{N}(\mu_\text{posterior}[0], \mu_\text{posterior}[1])\).
bias_conv_rho (torch.nn.Parameter): The learnable distribution standard deviation for the bias of the convolution of shape \((\text{width})\) with real entries. The standard deviation is computed as \(\sigma = \ln( 1 + \exp(\rho))\) to guarantee it is positive. If bias is True, the values are initialized from \(\mathcal{N}(\rho_\text{posterior}[0], \rho_\text{posterior}[1])\).
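
The attribute shapes listed above make it easy to estimate how many entries each parameter tensor holds. The sketch below tabulates them for a given width and modes. The helper fourier3d_parameter_shapes is a hypothetical name, and the counts are per tensor, so each applies once to the mu tensor and once to the rho tensor.

```python
from math import prod


def fourier3d_parameter_shapes(width, modes):
    """Shapes as documented for the 3d Fourier layer attributes."""
    return {
        "weight_spectral": (4, width, width, *modes),  # complex entries
        "weight_conv": (width, width, 1, 1, 1),        # kernel_size = (1, 1, 1)
        "bias_conv": (width,),
    }


shapes = fourier3d_parameter_shapes(width=4, modes=(9, 17, 33))
counts = {name: prod(shape) for name, shape in shapes.items()}
```

For width 4 and modes (9, 17, 33), the spectral weights hold 323136 entries, compared with 16 for the convolution weights and 4 for the bias, so the spectral convolution dominates the size of the layer.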

Example:

>>> d, h, w = 16, 32, 64
>>> modes = (9, 17, 33)
>>> width = 4
>>> layer = sml.BayesianFourier3d(width, modes)
>>> x = torch.randn(1, width, d, h, w)
>>> layer.sample(False)
>>> deterministic_output = layer(x)
>>> layer.sample()
>>> probabilistic_output = layer(x)
>>> print(torch.all(deterministic_output == probabilistic_output))
tensor(False)

forward(x)[source]

Compute \(\mathcal{F}^{-1} (R (\mathcal{F}x)) + W(x)\)

Parameters: x (Tensor) – Tensor of shape \((N, \text{width}, D, H, W)\)
Return type: Tensor
Returns: Tensor of shape \((N, \text{width}, D, H, W)\)