Stratified Sampling

Stratified sampling is a variance reduction technique that divides the parameter space into a set of disjoint and space-filling strata. Samples are then drawn from these strata in order to improve the space-filling properties of the sample design. Stratified sampling allows for unequally weighted samples, such that a Monte Carlo estimator of the quantity \(E[Y]\) takes the following form:

\[E[Y] \approx \sum_{i=1}^N w_i Y_i\]

where \(w_i\) are the sample weights and \(Y_i\) are the model evaluations. The individual sample weights are computed as:

\[w_i = \dfrac{V_{i}}{N_{i}}\]

where \(V_{i}\le 1\) is the volume of the stratum containing sample \(i\) in the unit hypercube (i.e. the probability that a random sample will fall in that stratum) and \(N_{i}\) is the number of samples drawn from that stratum.
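The weighted estimator above can be illustrated with a minimal sketch in plain Python. It assumes a one-dimensional uniform input on \([0, 1]\) split into interval strata; the function name stratified_estimate and its arguments are hypothetical and are not part of UQpy.

```python
import random


def stratified_estimate(model, edges, samples_per_stratum, rng):
    """Estimate E[model(U)] for U ~ Uniform(0, 1) using interval strata.

    edges: increasing stratum boundaries starting at 0.0 and ending at 1.0.
    samples_per_stratum: the number of samples N_i for each stratum.
    """
    estimate = 0.0
    bounds = zip(edges[:-1], edges[1:])
    for (low, high), n_i in zip(bounds, samples_per_stratum):
        volume = high - low      # V_i: probability of landing in this stratum
        weight = volume / n_i    # w_i = V_i / N_i, shared by every sample in it
        for _ in range(n_i):
            u = rng.uniform(low, high)       # draw a sample inside the stratum
            estimate += weight * model(u)    # accumulate w_i * Y_i
    return estimate


rng = random.Random(0)
edges = [0.0, 0.1, 0.4, 1.0]   # three strata with volumes 0.1, 0.3 and 0.6
counts = [5, 10, 20]           # unequal N_i, so the weights differ by stratum
approx = stratified_estimate(lambda u: u * u, edges, counts, rng)
```

Because the weights sum to one, a constant model is reproduced exactly regardless of how the samples are allocated; for \(Y = U^2\) the estimate approximates the exact value 1/3.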

TrueStratifiedSampling Class

The TrueStratifiedSampling class is the parent class for stratified sampling. The various TrueStratifiedSampling classes generate random samples from one or more specified probability distributions using stratified sampling, with strata specified by an object of one of the Strata classes.

The TrueStratifiedSampling class is imported using the following command:

>>> from UQpy.sampling.stratified_sampling.TrueStratifiedSampling import TrueStratifiedSampling

Methods

class TrueStratifiedSampling(distributions, strata_object, nsamples_per_stratum=None, nsamples=None, random_state=None)[source]

Class for Stratified Sampling ([18]).

Parameters:
  • distributions (Union[DistributionContinuous1D, JointIndependent, list[DistributionContinuous1D]]) – List of Distribution objects corresponding to each random variable.

  • strata_object (Strata) – Defines the stratification of the unit hypercube. This must be provided and must be an object of a Strata child class: Rectangular, Voronoi, or Delaunay.

  • nsamples_per_stratum (Union[int, list[int], None]) – Specifies the number of samples in each stratum. This must be either an integer, in which case an equal number of samples is drawn from each stratum, or a list. If it is provided as a list, the length of the list must be equal to the number of strata. If nsamples_per_stratum is provided when the class is defined, the run() method will be executed automatically. If neither nsamples_per_stratum nor nsamples is provided when the class is defined, the user must call the run() method to perform stratified sampling.

  • nsamples (Optional[int]) – Specifies the total number of samples. If nsamples is specified, the samples will be drawn in proportion to the volume of the strata. Thus, each stratum will contain round(V_i * nsamples) samples, where \(V_i \le 1\) is the volume of stratum i in the unit hypercube. If nsamples is provided when the class is defined, the run() method will be executed automatically. If neither nsamples_per_stratum nor nsamples is provided when the class is defined, the user must call the run() method to perform stratified sampling.

  • random_state (Union[None, int, RandomState]) – Random seed used to initialize the pseudo-random number generator. Default is None. If an int is provided, this sets the seed for an object of numpy.random.RandomState. Otherwise, the object itself can be passed directly.
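The proportional allocation described for nsamples can be sketched as follows. This is an illustration only: the helper allocate_samples is hypothetical, and it assumes Python's built-in round(), which rounds halves to the nearest even integer. Whether UQpy rounds in exactly this way is not stated in this section.

```python
def allocate_samples(volumes, nsamples):
    """Return round(V_i * nsamples) for each stratum volume V_i."""
    return [round(v * nsamples) for v in volumes]


volumes = [0.5, 0.25, 0.25]             # stratum volumes, summing to 1
counts = allocate_samples(volumes, 10)  # 5.0 -> 5, 2.5 -> 2, 2.5 -> 2
total = sum(counts)                     # can differ from the requested nsamples
```

Under this assumption, rounding each stratum independently means the number of samples actually drawn may not equal nsamples; here 10 samples are requested and 9 are allocated.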

transform_samples(samples01)[source]

Transform samples in the unit hypercube \([0, 1]^n\) to the prescribed distribution using the inverse CDF.

Parameters:

  • samples01 (numpy.ndarray) – numpy.ndarray containing the generated samples on \([0, 1]^n\).

Returns:

numpy.ndarray containing the generated samples following the prescribed distribution.
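The inverse CDF transformation can be sketched with the standard library alone. The function inverse_cdf_transform and the two marginals below (a standard normal and an exponential with rate 2) are assumptions chosen for illustration; they are not UQpy objects, and the sketch assumes independent marginals.

```python
import math
from statistics import NormalDist


def inverse_cdf_transform(samples01, inverse_cdfs):
    """Map each coordinate of samples on [0, 1]^n through its marginal inverse CDF."""
    return [
        [inv_cdf(u) for u, inv_cdf in zip(row, inverse_cdfs)]
        for row in samples01
    ]


# Hypothetical marginals: one inverse CDF per random variable.
inverse_cdfs = [
    NormalDist(mu=0.0, sigma=1.0).inv_cdf,   # standard normal quantile function
    lambda u: -math.log(1.0 - u) / 2.0,      # exponential with rate 2
]

samples01 = [[0.5, 0.5], [0.975, 0.1]]       # two points in the unit square
samples = inverse_cdf_transform(samples01, inverse_cdfs)
```

The first point maps to the medians of both marginals (0 for the normal and ln(2)/2 for the exponential), which is a quick sanity check on the transformation.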

run(nsamples_per_stratum=None, nsamples=None)[source]

Executes stratified sampling.

This method performs the sampling for each of the child classes by running two methods: create_samplesu01() and transform_samples(). The create_samplesu01() method is unique to each child class and therefore must be overridden when a new child class is defined. The transform_samples() method is common to all stratified sampling classes and is therefore defined by the parent class. It does not need to be modified.

If nsamples or nsamples_per_stratum is provided when the class is defined, the run() method will be executed automatically. If neither nsamples_per_stratum nor nsamples is provided when the class is defined, the user must call the run() method to perform stratified sampling.

Parameters:
  • nsamples_per_stratum (Union[None, int, list[int]]) – Specifies the number of samples in each stratum. This must be either an integer, in which case an equal number of samples is drawn from each stratum, or a list. If it is provided as a list, the length of the list must be equal to the number of strata. If nsamples_per_stratum is provided when the class is defined, the run() method will be executed automatically. If neither nsamples_per_stratum nor nsamples is provided when the class is defined, the user must call the run() method to perform stratified sampling.

  • nsamples (Optional[int]) – Specifies the total number of samples. If nsamples is specified, the samples will be drawn in proportion to the volume of the strata. Thus, each stratum will contain round(V_i * nsamples) samples, where \(V_i \le 1\) is the volume of stratum i in the unit hypercube. If nsamples is provided when the class is defined, the run() method will be executed automatically. If neither nsamples_per_stratum nor nsamples is provided when the class is defined, the user must call the run() method to perform stratified sampling.
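The division of labour in run(), where the parent class supplies transform_samples() and each child class supplies create_samplesu01(), can be sketched as below. Both classes are hypothetical stand-ins written for illustration; they only mirror the method and attribute names used in this section and do not reproduce UQpy's implementation or signatures.

```python
import random
from abc import ABC, abstractmethod


class StratifiedSamplerSketch(ABC):
    """Hypothetical parent class mirroring the run() flow described above."""

    def __init__(self, inverse_cdfs, seed=None):
        self.inverse_cdfs = inverse_cdfs
        self.rng = random.Random(seed)
        self.samplesU01 = None
        self.samples = None

    @abstractmethod
    def create_samplesu01(self, nsamples_per_stratum):
        """Return samples on the unit hypercube; each child class overrides this."""

    def transform_samples(self, samples01):
        # Shared by all children: apply each marginal inverse CDF coordinate-wise.
        return [[f(u) for u, f in zip(row, self.inverse_cdfs)] for row in samples01]

    def run(self, nsamples_per_stratum):
        self.samplesU01 = self.create_samplesu01(nsamples_per_stratum)
        self.samples = self.transform_samples(self.samplesU01)


class EqualIntervalSampler(StratifiedSamplerSketch):
    """Hypothetical 1D child class: n_strata equal-width strata on [0, 1]."""

    def __init__(self, inverse_cdfs, n_strata, seed=None):
        super().__init__(inverse_cdfs, seed)
        self.n_strata = n_strata

    def create_samplesu01(self, nsamples_per_stratum):
        width = 1.0 / self.n_strata
        return [
            [(k + self.rng.random()) * width]   # uniform draw inside stratum k
            for k in range(self.n_strata)
            for _ in range(nsamples_per_stratum)
        ]


# Inverse CDF of Uniform(0, 10), used here as an assumed target distribution.
sampler = EqualIntervalSampler([lambda u: 10.0 * u], n_strata=4, seed=1)
sampler.run(nsamples_per_stratum=2)
```

With this layout, adding a new stratification scheme only requires a new create_samplesu01(); the transformation to the target distribution is inherited unchanged.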

Attributes

TrueStratifiedSampling.weights: ndarray

Individual sample weights.

TrueStratifiedSampling.samples: ndarray

The generated samples following the prescribed distribution.

TrueStratifiedSampling.samplesU01: ndarray

The generated samples on the unit hypercube.

Examples

UQpy supports several stratified sampling variants, ranging from conventional stratified sampling designs to advanced gradient-informed methods for adaptive stratified sampling. The class structure supports a flexible range of stratified sampling designs and can be extended in a straightforward way. Specifically, the existing classes allow stratification of n-dimensional parameter spaces based on three common spatial discretizations: a rectilinear decomposition into hyper-rectangles (orthotopes), a Voronoi decomposition, and a Delaunay decomposition. This structure is based on three classes:

1. The Strata class defines the geometric structure of the stratification of the parameter space. It has three existing subclasses (Rectangular, Voronoi, and Delaunay) that correspond to geometric decompositions of the parameter space based on rectilinear strata of orthotopes, strata composed of Voronoi cells, and strata composed of Delaunay simplexes, respectively. These classes live in the UQpy.sampling.stratified_sampling.strata folder.

2. The TrueStratifiedSampling class defines a set of subclasses used to draw samples from strata defined by a Strata class object.

3. The RefinedStratifiedSampling class defines a set of subclasses for refinement of TrueStratifiedSampling stratified sampling designs.
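As a rough illustration of the rectilinear decomposition mentioned above, the following sketch builds a regular grid of hyper-rectangles on the unit hypercube and draws one point from each cell. The helpers rectangular_strata and sample_in_stratum are hypothetical and only assume a regular grid; they are not the interface of UQpy's Rectangular class.

```python
import itertools
import math
import random


def rectangular_strata(strata_per_dim):
    """Return (origin, widths, volume) for every cell of a regular grid on [0, 1]^n."""
    widths = [1.0 / m for m in strata_per_dim]
    volume = math.prod(widths)   # every cell of a regular grid has the same volume
    strata = []
    for index in itertools.product(*(range(m) for m in strata_per_dim)):
        origin = [k * w for k, w in zip(index, widths)]   # lower corner of the cell
        strata.append((origin, widths, volume))
    return strata


def sample_in_stratum(origin, widths, rng):
    """Draw one uniformly distributed point inside a hyper-rectangle."""
    return [o + w * rng.random() for o, w in zip(origin, widths)]


rng = random.Random(42)
strata = rectangular_strata([2, 3])   # 2 x 3 grid in two dimensions, 6 strata
points = [sample_in_stratum(o, w, rng) for o, w, _ in strata]
```

The cell volumes sum to one, so with one sample per stratum each sample would carry a weight equal to its cell volume.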

Strata