Gaussian Process Regression

The GaussianProcessRegression class defines an approximate surrogate model or response surface which can be used to predict the model response and its uncertainty at points where the model has not been previously evaluated. Gaussian Process regression utilizes the concepts of Bayesian modelling to identify a suitable function that fits the training data (\(X \in \mathbb{R}^{n \times d}, Y \in \mathbb{R}^{n \times 1}\)). First, it formulates a prior over the output variable (\(y = g(x)\)) as a Gaussian Process, which is defined by a mean function and a kernel:

\[p(y|x, \theta) = \mathcal{N}(m(x), K(x, x))\]

The mean function \(m(x)\) is given as a linear combination of \(p\) chosen scalar basis functions:

\[m(\beta, x) = \mathbb{E}[g(x)] = \beta_1 f_1(x) + \dots + \beta_p f_p(x) = f(x)^T \beta.\]

The kernel \(k(x, s)\) defines the covariance function as:

\[k(x, s) = \mathbb{E}[(g(x)-m(x))(g(s)-m(s))]\]

and the Gaussian process can be written as,

\[y = g(x) \sim \mathcal{GP}(m(x, \beta), k(x,s, \theta)),\]

where \(\beta\) is the vector of regression coefficients, estimated by the generalized least squares solution below, and \(\theta=\{l_1, ..., l_d, \sigma \}\) is a set of hyperparameters, generally governing the correlation lengths (lengthscales, \(l_i\)) and the process standard deviation (\(\sigma\)) of the model, determined by maximizing the log-likelihood function

\[\log p(y|x, \theta) = -\frac{1}{2}(Y-F\beta)^T K^{-1} (Y-F\beta) - \frac{1}{2}\log|K| - \frac{n}{2}\log(2\pi)\]

The covariance is evaluated between the existing sample points \(X\) in the domain of interest to form the covariance matrix \(K=K(X, X)\), and the basis functions are evaluated at the sample points \(X\) to form the matrix \(F\). Using these matrices, the regression coefficients \(\beta^*\) are computed by solving

\[(F^T K^{-1} F)\beta^* = F^T K^{-1} Y\]

The joint distribution between the training outputs (\(g(X)\)) and test outputs (\(g(X^*)\)) can be expressed as:

\[\begin{split}\begin{bmatrix} Y \\ g(X^*)\end{bmatrix} \sim \mathcal{N} \Bigg(\begin{bmatrix} m(X) \\ m(X^*)\end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, X^*)\\ K(X^*, X) & K(X^*, X^*) \end{bmatrix} \Bigg)\end{split}\]

The final predictor function is then given by the mean of the posterior distribution \(g(X^*)|Y, X, X^*\), defined as:

\[\hat{g}(X^*) = f(X^*)^T \beta^* + K(X, X^*)^T K^{-1}(Y - F\beta^*)\]

and the covariance matrix of the posterior distribution is expressed as:

\[\text{cov}(g(X^*)) = K(X^*, X^*) - K(X^*, X)K^{-1}K(X, X^*)\]

In the case of noisy output (i.e. \(y = g(x)+\epsilon\)), where the noise \(\epsilon\) is independent Gaussian with variance \(\sigma_n^2\), the GaussianProcessRegression class includes the noise standard deviation in the hyperparameters (\(\theta=\{l_1, ..., l_d, \sigma, \sigma_n \}\)) along with the lengthscales and the process standard deviation, and identifies them by maximizing the log-likelihood function. The mean and covariance of the posterior distribution are modified by substituting \(K\) with \(K+\sigma_n^2 I\):

\[\begin{split}\hat{g}(X^*) = f(X^*)^T \beta^* + K(X, X^*)^T (K+\sigma_n^2 I)^{-1}(Y - F\beta^*) \\ cov(g(X^*)) = K(X^*, X^*) - K(X^*, X)(K+\sigma_n^2 I)^{-1}K(X, X^*)\end{split}\]
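
To make these formulas concrete, the following minimal NumPy sketch (independent of UQpy; the data, constant basis, and hyperparameter values are purely illustrative) evaluates the regression coefficients and the posterior mean and covariance for the noisy case:

>>> import numpy as np
>>> from scipy.spatial.distance import cdist
>>>
>>> X = np.array([[0.0], [0.5], [1.0]])               # training inputs, n x d
>>> Y = np.sin(X)                                     # training outputs, n x 1
>>> l, sigma, sigma_n = 0.4, 1.0, 0.05                # lengthscale, process and noise std. dev.
>>>
>>> def rbf(a, b):                                    # RBF kernel (see the Kernels section)
>>>     return sigma**2 * np.exp(-cdist(a/l, b/l, metric='sqeuclidean')/2)
>>>
>>> K = rbf(X, X) + sigma_n**2 * np.eye(len(X))       # K + sigma_n^2 I
>>> F = np.ones((len(X), 1))                          # basis matrix for a constant basis f(x) = 1
>>> K_inv = np.linalg.inv(K)                          # a Cholesky factorization is used in practice
>>> beta = np.linalg.solve(F.T @ K_inv @ F, F.T @ K_inv @ Y)   # (F^T K^-1 F) beta* = F^T K^-1 Y
>>>
>>> Xs = np.array([[0.25]])                           # new point X*
>>> Ks = rbf(X, Xs)                                   # K(X, X*)
>>> mean = beta + Ks.T @ K_inv @ (Y - F @ beta)       # f(X*)^T beta* equals beta* for a constant basis
>>> cov = rbf(Xs, Xs) - Ks.T @ K_inv @ Ks             # posterior covariance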

Regression Models

The GaussianProcessRegression class offers a variety of built-in regression models, specified by the regression input described below.

Constant Model

The ConstantRegression class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.regression_models.ConstantRegression import ConstantRegression

In the constant model, the regression model is assumed to take a constant value, such that

\[f(\beta, x) = \beta_0\]

Linear Model

The LinearRegression class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.regression_models.LinearRegression import LinearRegression

The regression model is defined by a linear basis function in each input dimension:

\[f(\beta, x) = \beta_0 + \sum_{i=1}^d \beta_i x_i\]

Quadratic Model

The QuadraticRegression class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.regression_models.QuadraticRegression import QuadraticRegression

The quadratic regression model is given by:

\[f(\beta, x) = \beta_0 + \sum_{i=1}^d \beta_i x_i + \sum_{i=1}^d \sum_{j=1}^d \beta_{ij} x_i x_j\]
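
As an illustration, evaluating the basis matrices at a single two-dimensional point would produce the shapes noted in the comments below, assuming the implementations follow the formulas above (the exact column ordering of the quadratic basis follows the implementation):

>>> import numpy as np
>>> s = np.array([[1.0, 2.0]])        # one sample point, d = 2
>>> ConstantRegression().r(s)         # expected [[1.]], shape (1, 1)
>>> LinearRegression().r(s)           # expected [[1., 1., 2.]], shape (1, 1 + d)
>>> QuadraticRegression().r(s)        # constant, linear, and second-order terms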

User-Defined Regression Model

Adding a new regression model to the GaussianProcessRegression class is straightforward. This is done by creating a new class that evaluates the basis functions, extending the Regression abstract base class.

The Regression class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.regression_models.baseclass.Regression import Regression
class Regression[source]

Abstract base class of all Regressions. Serves as a template for creating new Gaussian Process regression functions.

abstract r(s)[source]

Abstract method that needs to be implemented by the user when creating a new Regression function.

An object of this new class may be passed directly to the regression input of the GaussianProcessRegression class. The new class must have a method r(self, s) that takes as input the sample points at which to evaluate the model and returns the value of the basis functions at these sample points.

The output of this method should be a two-dimensional numpy array with the first dimension being the number of samples and the second dimension being the number of basis functions.

An example user-defined model is given below:

>>> import numpy as np
>>>
>>> class UserRegression(Regression):
>>>
>>>     def r(self, s):
>>>         s = np.atleast_2d(s)                  # ensure a 2-D array of sample points
>>>         fx = np.ones([np.size(s, 0), 1])      # constant basis: one column of ones
>>>         return fx
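
An object of the new class is then passed to the regression_model input; for example (assuming a one-dimensional input, so the hyperparameters list holds one lengthscale and the process standard deviation, with illustrative values):

>>> from UQpy.surrogates.gaussian_process.GaussianProcessRegression import GaussianProcessRegression
>>> from UQpy.surrogates.gaussian_process.kernels.RBF import RBF
>>>
>>> gpr = GaussianProcessRegression(kernel=RBF(), hyperparameters=[1.0, 0.1],
>>>                                 regression_model=UserRegression())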

Kernels

The GaussianProcessRegression class offers a variety of built-in kernels, specified by the kernel input described below.

Radial Basis Function Kernel

The RBF class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.kernels.RBF import RBF

The RBF kernel takes the following form:

\[K(h, \theta) = \sigma^2 \prod_{i=1}^{d} \mathcal{R}_i(h_i, l_i) = \sigma^2 \prod_{i=1}^{d} \exp\bigg[ -\frac{h_i^2}{2l_i^2}\bigg]\]

where \(h_i = s_i-x_i\).

Matern Kernel

The Matern class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.kernels.Matern import Matern

The Matern kernel takes the following form:

\[K(x, s, \theta) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \bigg( \sqrt{2 \nu} \frac{d}{l} \bigg)^{\nu} K_\nu \bigg(\sqrt{2 \nu} \frac{d}{l} \bigg)\]

where \(d = ||x-s||_2\) is the Euclidean distance and \(\theta\) consists of the lengthscale (\(l\)), the process variance (\(\sigma^2\)), and the smoothing parameter (\(\nu\)). Also, \(\Gamma\) is the gamma function and \(K_\nu\) is the modified Bessel function of the second kind. This kernel coincides with the absolute exponential kernel for \(\nu=0.5\) and with the RBF kernel as \(\nu \to \infty\).
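
The following standalone sketch (independent of UQpy, assuming \(\nu = 1.5\) and illustrative hyperparameters) evaluates this expression directly with scipy.special; the \(d = 0\) case is handled separately because the expression approaches \(\sigma^2\) in that limit:

>>> import numpy as np
>>> from scipy.special import gamma, kv
>>>
>>> def matern(x, s, l=1.0, sigma=1.0, nu=1.5):
>>>     d = np.linalg.norm(np.asarray(x) - np.asarray(s))    # Euclidean distance ||x - s||_2
>>>     if d == 0.0:
>>>         return sigma**2                                  # limit of the expression as d -> 0
>>>     u = np.sqrt(2 * nu) * d / l
>>>     return sigma**2 * (2**(1 - nu) / gamma(nu)) * u**nu * kv(nu, u)

For \(\nu=0.5\) this expression reduces to \(\sigma^2 \exp(-d/l)\), i.e. the absolute exponential kernel.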

User-Defined Kernel

Adding a new kernel to the GaussianProcessRegression class is straightforward. This is done by creating a new class that extends the Kernel abstract base class. This new class must have a method c(self, x, s, params) that takes as inputs the new points, the training points, and the hyperparameters. Notice that the input params includes the lengthscales and the process standard deviation, but not the noise standard deviation (even for noisy data).

The Kernel class is imported using the following command:

>>> from UQpy.utilities.kernels.baseclass.Kernel import Kernel

The method should return the covariance matrix, i.e. a two-dimensional array whose first dimension is the number of new points and whose second dimension is the number of training points.

An example user-defined kernel is given below:

>>> import numpy as np
>>> from scipy.spatial.distance import cdist
>>>
>>> class RBF(Kernel):
>>>
>>>     def c(self, x, s, params):
>>>         l, sigma = params[:-1], params[-1]               # lengthscales and process std. dev.
>>>         stack = cdist(x/l, s/l, metric='euclidean')      # scaled pairwise distances
>>>         cx = sigma**2 * np.exp(-(stack**2)/2)
>>>         return cx
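
Evaluating this kernel on a few points (illustrative values; params packs the lengthscales followed by the process standard deviation, as noted above):

>>> x = np.array([[0.0], [1.0]])                 # new points
>>> s = np.array([[0.0], [0.5], [1.0]])          # training points
>>> RBF().c(x, s, params=np.array([0.5, 1.0]))   # returns a 2 x 3 covariance matrix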

Constraints

The GaussianProcessRegression class offers built-in constraints on the hyperparameter estimation while maximizing the log-likelihood function.

NonNegative

The NonNegative class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.constraints.NonNegative import NonNegative

The NonNegative constraint enforces non-negativity in the GP model, which is useful when the output variable represents a physical quantity that is always greater than or equal to zero. This constraint improves the prediction and error estimates in such cases. It is enforced by solving the maximum likelihood problem with the following constraints:

\[\begin{split}\mathrm{argmax}_{\theta} \log{p(y|x, \theta)} \quad \quad \quad \text{s.t. } \begin{matrix} \hat{g}(X_c) - z\sigma_{\hat{g}}(X_c) \geq 0 \\ |Y-\hat{g}(X)| \leq \epsilon \end{matrix}\end{split}\]

where \(X_c\) is the set of constraint points defined by the user, \(z\) is the number of standard deviations below the mean that must remain non-negative at the constraint points, and \(\epsilon\) is the error tolerance between the output values and the predictions at the sample points.
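
A constraint object is passed to the optimize_constraints input of the GaussianProcessRegression class together with a constrained optimizer; for example (illustrative constraint points and initial hyperparameters, assuming FminCobyla is importable from the UQpy.utilities module as noted below):

>>> import numpy as np
>>> from UQpy.utilities.FminCobyla import FminCobyla
>>>
>>> X_c = np.linspace(0.0, 1.0, 11).reshape(-1, 1)   # constraint points
>>> gpr = GaussianProcessRegression(kernel=RBF(), hyperparameters=[1.0, 0.1],
>>>                                 optimizer=FminCobyla(),
>>>                                 optimize_constraints=NonNegative(constraint_points=X_c,
>>>                                                                  observed_error=0.01,
>>>                                                                  z_value=2))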

User-Defined Constraints

Adding new constraints to the GaussianProcessRegression class is straightforward. This is done by creating a new class that extends the ConstraintsGPR abstract base class. This new class must have two methods: define_arguments(self, x_train, y_train, predict_function) and constraints(theta_, kwargs).

The ConstraintsGPR class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.constraints.baseclass.Constraints import ConstraintsGPR
class ConstraintsGPR[source]

Abstract base class of all Constraints. Serves as a template for creating new Gaussian Process constraints for the log-likelihood function.

abstract define_arguments(x_train, y_train, predict_function)[source]

Abstract method that needs to be implemented by the user, which stores all the arguments in a dictionary and returns that dictionary inside a list.

static constraints(theta_, kwargs)[source]

A static method that takes the hyperparameters and the constraint arguments, and evaluates the constraint values.

The first method, define_arguments, takes the sample points, output values, and prediction method as inputs. It combines the attributes defined while initializing the new class with these inputs, and returns a list containing a dictionary of all the arguments to the constraints function.

The second method, constraints, is a static method that takes as its first input the log-transformed hyperparameters (\(\theta\)) and as its second input a dictionary containing all the arguments assembled by define_arguments. The user can unpack the stored variables from the dictionary and return the constraint values. Notice that this is a static method because optimizers require the constraints to be a callable.

The constraints on the hyperparameters are only compatible with the FminCobyla class as the optimizer.

An example user-defined constraint is given below:

>>> import numpy as np
>>>
>>> class NonNegative(ConstraintsGPR):
>>>     def __init__(self, constraint_points, observed_error=0.01, z_value=2):
>>>         self.constraint_points = constraint_points
>>>         self.observed_error = observed_error
>>>         self.z_value = z_value
>>>         self.kwargs = {}
>>>         self.constraint_args = None
>>>
>>>     def define_arguments(self, x_train, y_train, predict_function):
>>>         self.kwargs['x_t'] = x_train
>>>         self.kwargs['y_t'] = y_train
>>>         self.kwargs['pred'] = predict_function
>>>         self.kwargs['const_points'] = self.constraint_points
>>>         self.kwargs['obs_err'] = self.observed_error
>>>         self.kwargs['z_'] = self.z_value
>>>         self.constraint_args = [self.kwargs]
>>>         return self.constraints
>>>
>>>     @staticmethod
>>>     def constraints(theta_, kwargs):
>>>         x_t, y_t, pred = kwargs['x_t'], kwargs['y_t'], kwargs['pred']
>>>         const_points, obs_err, z_ = kwargs['const_points'], kwargs['obs_err'], kwargs['z_']
>>>
>>>         tmp_predict, tmp_error = pred(const_points, True, hyperparameters=10**theta_)
>>>         constraint1 = tmp_predict - z_ * tmp_error
>>>
>>>         tmp_predict2 = pred(x_t, False, hyperparameters=10**theta_)
>>>         constraint2 = obs_err - np.abs(y_t[:, 0] - tmp_predict2)
>>>
>>>         constraints = np.concatenate((constraint1, constraint2), axis=None)
>>>         return constraints

GaussianProcessRegression Class

The GaussianProcessRegression class is imported using the following command:

>>> from UQpy.surrogates.gaussian_process.GaussianProcessRegression import GaussianProcessRegression

Methods

class GaussianProcessRegression(kernel, hyperparameters, regression_model=None, optimizer=None, bounds=None, optimize_constraints=None, optimizations_number=1, normalize=False, noise=False, random_state=None)[source]

The GaussianProcessRegression class defines a Gaussian process regression-based surrogate model to predict the model output at new sample points.

Parameters:
  • kernel (Kernel) – An object that specifies and evaluates the kernel. Built-in options: RBF, Matern

  • hyperparameters (list) – List or array of initial values for the kernel hyperparameters/scale parameters. This input attribute defines the dimension of the input training points, so its length/shape should be equal to the input dimension plus one (d+1): the list/array contains the d lengthscales and the process standard deviation. In the case of noisy observations/output, its length/shape should be equal to the input dimension plus two (d+2): the d lengthscales, the process standard deviation, and the noise standard deviation.

  • regression_model – A class object which computes the basis functions at a sample point. If regression_model is None, this class will train the GP without a regression model (i.e., with a zero mean function). Default: None

  • bounds – This input attribute defines the bounds of the loguniform distributions from which new starting points are randomly drawn for the optimization problem used to estimate the maximum likelihood estimator. These should be closed bounds. Default: [10**-3, 10**3] for each hyperparameter and [10**-10, 10**-1] for the noise standard deviation (if applicable).

  • optimizations_number (int) – Number of times the MLE optimization problem is to be solved with a random starting point. Default: 1.

  • normalize (bool) – Boolean flag used in case data normalization is required.

  • optimizer – A class object of MinimizeOptimizer or FminCobyla from the UQpy.utilities module. If optimizer is not provided, this class will not solve the MLE problem and predictions will be based on the hyperparameters provided as input. Default: None.

  • noise (bool) – Boolean flag used in case of noisy training data. Default: False

  • random_state (Union[None, int, RandomState]) – Random seed used to initialize the pseudo-random number generator. If an integer is provided, this sets the seed for an object of numpy.random.RandomState. Otherwise, the object itself can be passed directly.

fit(samples, values, optimizations_number=None, hyperparameters=None)[source]

Fit the surrogate model using the training samples and the corresponding model values.

The user can run this method multiple times after initializing the GaussianProcessRegression class object.

This method updates the samples and parameters of the GaussianProcessRegression object. It uses the hyperparameters from the previous run as the starting point for the MLE problem unless the user provides a new starting point.

Parameters:
  • samples – ndarray containing the training points.

  • values – ndarray containing the model evaluations at the training points.

  • optimizations_number – Number of times the MLE optimization problem is to be solved with a random starting point.

  • hyperparameters – List or array of initial values for the kernel hyperparameters/scale parameters.

The fit() method has no return value, although it creates the beta, err_var and C_inv attributes of the GaussianProcessRegression class.

predict(points, return_std=False, hyperparameters=None)[source]

Predict the model response at new points.

This method evaluates the regression and correlation models at the new sample points and predicts the function value and its standard deviation.

Parameters:
  • points – Points at which to predict the model response.

  • return_std (bool) – Indicator to estimate standard deviation.

  • hyperparameters (Optional[list]) – Hyperparameters for correlation model.

Returns:

Predicted values at the new points and, if return_std is True, the standard deviation of the predicted values at the new points.

Attributes

GaussianProcessRegression.beta

Regression coefficients.

GaussianProcessRegression.err_var

Variance of the Gaussian random process.

GaussianProcessRegression.C_inv

Inverse Cholesky decomposition of the correlation matrix.

Examples
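
A minimal end-to-end illustration with synthetic one-dimensional data (the initial hyperparameters are illustrative and are refined by MLE; MinimizeOptimizer is assumed importable from the UQpy.utilities module as noted above):

>>> import numpy as np
>>> from UQpy.surrogates.gaussian_process.GaussianProcessRegression import GaussianProcessRegression
>>> from UQpy.surrogates.gaussian_process.kernels.RBF import RBF
>>> from UQpy.utilities.MinimizeOptimizer import MinimizeOptimizer
>>>
>>> X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)      # training inputs, n x d
>>> Y = np.sin(2 * np.pi * X)                         # training outputs, n x 1
>>>
>>> gpr = GaussianProcessRegression(kernel=RBF(), hyperparameters=[0.3, 1.0],
>>>                                 optimizer=MinimizeOptimizer(method='L-BFGS-B'),
>>>                                 optimizations_number=5, random_state=1)
>>> gpr.fit(samples=X, values=Y)
>>> y_hat, y_std = gpr.predict(np.array([[0.25], [0.75]]), return_std=True)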