Note

Go to the end to download the full example code or to run this example in your browser via Binder

Complex probability distribution model

Here we define a bivariate probability model, with a dependence structure defined using a gumbel copula. The goal of inference is to learn the paremeters of the Gaussian marginals and the copula parameter, i.e., the model has 5 unknown parameters.

Initially we have to import the necessary modules.

import matplotlib.pyplot as plt
from UQpy.inference import DistributionModel, MLE
from UQpy.distributions import Normal
from UQpy.inference import MinimizeOptimizer
from UQpy.distributions import JointIndependent, JointCopula, Gumbel
from UQpy.sampling import ImportanceSampling

First data is generated from a true model. A distribution with copulas does not possess a fit method, thus sampling is performed using importance sampling/resampling.

# dist_true exhibits dependence between the two dimensions, defined using a gumbel copula
dist_true = JointCopula(marginals=[Normal(), Normal()], copula=Gumbel(theta=2.))

# generate data using importance sampling: sample from a bivariate gaussian without copula, then weight samples
u = ImportanceSampling(proposal = JointIndependent(marginals=[Normal(), Normal()]),
                       log_pdf_target = dist_true.log_pdf,
                       nsamples=500)
print(u.samples.shape)
print(u.weights.shape)
# Resample to obtain 5,000 data points
u.resample(nsamples=5000)
data_2 = u.unweighted_samples
print('Shape of data: {}'.format(data_2.shape))

fig, ax = plt.subplots()
ax.scatter(data_2[:, 0], data_2[:, 1], alpha=0.2)
ax.set_title('Data points from true bivariate normal with gumbel dependency structure')
plt.show()

Data points from true bivariate normal with gumbel dependency structure

(500, 2)
(500,)
Shape of data: (5000, 2)

To define a model for inference, the user must create a custom file, here bivariate_normal_gumbel.py, to compute the log_pdf of the distribution, given a bivariate data matrix and a parameter vector of length 5. Note that for any probability model that is not one of the simple univariate pdfs supported by UQpy, such a custom file will be necessary.

d_guess = JointCopula(marginals=[Normal(loc=None, scale=None), Normal(loc=None, scale=None)],
                      copula=Gumbel(theta=None))
print(d_guess.get_parameters())
candidate_model = DistributionModel(n_parameters=5, distributions=d_guess)
print(candidate_model.list_params)

{'loc_0': None, 'scale_0': None, 'loc_1': None, 'scale_1': None, 'theta_c': None}
['loc_0', 'scale_0', 'loc_1', 'scale_1', 'theta_c']

When calling MLEstimation, the function minimize from the scipy.optimize package is used by default. The user can define bounds for the optimization, a seed, the algorithm to be used, and set the algorithm to perform several optimization iterations, starting at a different random seed every time.

optimizer = MinimizeOptimizer(bounds=[[-5, 5], [0, 10], [-5, 5], [0, 10], [1.1, 4]], method="SLSQP")
ml_estimator = MLE(inference_model=candidate_model, data=data_2, optimizer=optimizer)

ml_estimator = MLE(inference_model=candidate_model, data=data_2, optimizer=optimizer,
                   initial_parameters=[1., 1., 1., 1., 4.])

print('ML estimates of the mean={0:.3f} and std. dev={1:.3f} of 1st marginal (true: 0.0, 1.0)'.
      format(ml_estimator.mle[0], ml_estimator.mle[1]))
print('ML estimates of the mean={0:.3f} and std. dev={1:.3f} of 2nd marginal (true: 0.0, 1.0)'.
      format(ml_estimator.mle[2], ml_estimator.mle[3]))
print('ML estimates of the copula parameter={0:.3f} (true: 2.0)'.format(ml_estimator.mle[4]))

/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/utilities/MinimizeOptimizer.py:29: OptimizeWarning: Unknown solver options: catol
  return minimize(function, initial_guess, args=args,
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:63: RuntimeWarning: divide by zero encountered in log
  c = exp(-(((-log(u)) ** theta + (-log(v)) ** theta) ** (1 / theta)))
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:65: RuntimeWarning: invalid value encountered in divide
  pdf_val = (c * 1 / u * 1 / v
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:66: RuntimeWarning: divide by zero encountered in log
  * ((-log(u)) ** theta + (-log(v)) ** theta) ** (-2 + 2 / theta)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:67: RuntimeWarning: divide by zero encountered in log
  * (log(u) * log(v)) ** (theta - 1)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:67: RuntimeWarning: invalid value encountered in multiply
  * (log(u) * log(v)) ** (theta - 1)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:68: RuntimeWarning: divide by zero encountered in log
  * (1 + (theta - 1) * ((-log(u)) ** theta + (-log(v)) ** theta) ** (-1 / theta)))
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:66: RuntimeWarning: divide by zero encountered in power
  * ((-log(u)) ** theta + (-log(v)) ** theta) ** (-2 + 2 / theta)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:65: RuntimeWarning: invalid value encountered in multiply
  pdf_val = (c * 1 / u * 1 / v
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/copulas/Gumbel.py:68: RuntimeWarning: divide by zero encountered in power
  * (1 + (theta - 1) * ((-log(u)) ** theta + (-log(v)) ** theta) ** (-1 / theta)))
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/scipy/stats/_distn_infrastructure.py:2029: RuntimeWarning: divide by zero encountered in divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/scipy/stats/_distn_infrastructure.py:2071: RuntimeWarning: divide by zero encountered in divide
  x = np.asarray((x - loc)/scale, dtype=dtyp)
/home/docs/checkouts/readthedocs.org/user_builds/uqpyproject/envs/stable/lib/python3.9/site-packages/UQpy/distributions/collection/JointCopula.py:121: RuntimeWarning: divide by zero encountered in log
  return np.log(c_) + logpdf_val
ML estimates of the mean=0.152 and std. dev=1.038 of 1st marginal (true: 0.0, 1.0)
ML estimates of the mean=0.165 and std. dev=1.066 of 2nd marginal (true: 0.0, 1.0)
ML estimates of the copula parameter=2.087 (true: 2.0)

Again, some known parameters can be fixed during learning.

d_guess = JointCopula(marginals=[Normal(loc=None, scale=None), Normal(loc=0., scale=1.)],
                      copula=Gumbel(theta=None))
candidate_model = DistributionModel(n_parameters=3, distributions=d_guess)

optimizer = MinimizeOptimizer(bounds=[[-5, 5], [0, 10], [1.1, 4]],
                              method="SLSQP")
ml_estimator = MLE(inference_model=candidate_model, data=data_2, optimizer=optimizer,
                   initial_parameters=[1., 1., 4.])

print('ML estimates of the mean={0:.3f} and std. dev={1:.3f} of 1st marginal (true: 0.0, 1.0)'.
      format(ml_estimator.mle[0], ml_estimator.mle[1]))
print('ML estimates of the copula parameter={0:.3f} (true: 2.0)'.format(ml_estimator.mle[2]))

ML estimates of the mean=0.025 and std. dev=0.983 of 1st marginal (true: 0.0, 1.0)
ML estimates of the copula parameter=1.923 (true: 2.0)

Total running time of the script: ( 0 minutes 0.359 seconds)

Gallery generated by Sphinx-Gallery