Chatterjee indices

The Chatterjee index measures the strength of the relationship between \(X\) and \(Y\) using rank statistics [36].

Consider \(n\) samples of the random variables \(X\) and \(Y\), rearranged as \((X_{(1)}, Y_{(1)}), \ldots,(X_{(n)}, Y_{(n)})\) such that \(X_{(1)} \leq \cdots \leq X_{(n)}\). Here, the random variable \(X\) can be one of the inputs of a model and \(Y\) the model response. If the \(X_{i}\)’s have no ties, this ordering is unique (the case of ties is also handled in the implementation, see [36]). Let \(r_{i}\) be the rank of \(Y_{(i)}\), that is, the number of \(j\) such that \(Y_{(j)} \leq Y_{(i)}\). The Chatterjee index \(\xi_{n}(X, Y)\) is defined as:

\[\xi_{n}(X, Y):=1-\frac{3 \sum_{i=1}^{n-1}\left|r_{i+1}-r_{i}\right|}{n^{2}-1}\]
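A minimal NumPy sketch of the formula above, assuming no ties in either \(X\) or \(Y\), so that \(r_i\) is simply the 1-based rank of \(Y_{(i)}\). The helper name chatterjee_xi is hypothetical; this illustrates the definition and is not UQpy's implementation.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's xi_n(X, Y) for samples without ties (hypothetical helper)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    # Rearrange the Y samples so that the corresponding X samples are increasing.
    y_by_x = y[np.argsort(x)]
    # r_i = #{j : Y_(j) <= Y_(i)}, which equals the 1-based rank when Y has no ties.
    r = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)


rng = np.random.default_rng(0)
x = rng.random(2000)
print(chatterjee_xi(x, np.sin(4 * np.pi * x)))  # Y is a function of X: close to 1
print(chatterjee_xi(x, rng.random(2000)))       # Y independent of X: close to 0
```

The index is not symmetric in \(X\) and \(Y\): it measures how well \(Y\) can be written as a function of \(X\), which is why the non-monotone sine example still scores close to 1.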

As \(n \rightarrow \infty\), the Chatterjee index converges to the Cramér-von Mises index, and it is faster to estimate than the Cramér-von Mises index computed with the pick-and-freeze approach.

Furthermore, the first-order Sobol indices can be estimated efficiently from the same rank statistics. This has the advantage that any sample can be used: no specific pick-and-freeze sampling scheme is required.

Chatterjee Class

The ChatterjeeSensitivity class is imported using the following command:

>>> from UQpy.sensitivity.ChatterjeeSensitivity import ChatterjeeSensitivity

Methods

class ChatterjeeSensitivity(runmodel_object, dist_object, random_state=None)

Compute sensitivity indices using the Chatterjee correlation coefficient.

Using the same model evaluations, we can also estimate the Sobol indices.

Parameters:

runmodel_object – The computational model. It should be of type RunModel. The output quantity of interest (QoI) can be a scalar or a vector of length ny; in the latter case, the sensitivity indices of each of the ny outputs are computed independently.
dist_object – List of Distribution objects corresponding to each random variable, or a JointIndependent object (multivariate RV with independent marginals).
random_state – Random seed used to initialize the pseudo-random number generator. Default is None.

Methods:

run(n_samples=1000, estimate_sobol_indices=False, n_bootstrap_samples=None, confidence_level=0.95)

Compute the sensitivity indices using the Chatterjee method. Calling the run method executes n_samples simulations using RunModel. To compute sensitivity indices from pre-computed inputs and outputs, use the static methods described below.

Parameters:

n_samples (int) – Number of samples used to compute the Chatterjee indices. Default is 1,000.
estimate_sobol_indices (bool) – If True, the first-order Sobol indices are also estimated from the same samples, using the rank-based analog of the pick-and-freeze samples (see compute_Sobol_indices).
n_bootstrap_samples (Optional[int]) – Number of bootstrap samples used to compute the confidence intervals of the Chatterjee indices. Default is None.
confidence_level (float) – Confidence level used to compute the confidence intervals of the Chatterjee indices. Default is 0.95.
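This section does not spell out how the bootstrap confidence intervals are formed. As an assumption, the sketch below uses a common choice, the percentile bootstrap: resample \((X, Y)\) pairs with replacement, recompute the index, and take the \(\alpha/2\) and \(1-\alpha/2\) quantiles with \(\alpha = 1 - \) confidence_level. Because resampling creates ties, it uses Chatterjee's general formula for ties and breaks ties in \(X\) at random. The helpers xi_with_ties and bootstrap_ci are hypothetical and are not UQpy's code.

```python
import numpy as np


def xi_with_ties(x, y, rng):
    """Chatterjee's general xi_n, valid with ties; ties in x are broken at random."""
    n = x.size
    # Primary key x, secondary random key: only reorders samples with equal x.
    order = np.lexsort((rng.random(n), x))
    y_ord = y[order]
    y_sorted = np.sort(y_ord)
    r = np.searchsorted(y_sorted, y_ord, side="right")      # #{j : Y_(j) <= Y_(i)}
    l = n - np.searchsorted(y_sorted, y_ord, side="left")   # #{j : Y_(j) >= Y_(i)}
    return 1.0 - n * np.abs(np.diff(r)).sum() / (2.0 * np.sum(l * (n - l)))


def bootstrap_ci(x, y, n_bootstrap_samples=1000, confidence_level=0.95, seed=None):
    """Percentile-bootstrap interval for xi_n (hypothetical helper, assumed scheme)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    estimates = np.empty(n_bootstrap_samples)
    for b in range(n_bootstrap_samples):
        idx = rng.integers(0, n, size=n)  # resample (x, y) pairs with replacement
        estimates[b] = xi_with_ties(x[idx], y[idx], rng)
    alpha = 1.0 - confidence_level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


rng = np.random.default_rng(1)
x = rng.random(500)
y = x + 0.5 * rng.standard_normal(500)
print(bootstrap_ci(x, y, n_bootstrap_samples=500, confidence_level=0.95, seed=2))
```

The parameter names mirror those of run, but how UQpy uses them internally is not described in this section.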

static compute_chatterjee_indices(X, Y, seed=None)

Compute the Chatterjee sensitivity indices between the input random vectors \(X=\left[ X_{1}, X_{2}, \ldots, X_{d} \right]\) and the output random vector \(Y\).

Parameters:

X (ndarray) – Input random vectors, numpy.ndarray of shape (n_samples, n_variables)
Y (ndarray) – Output random vector, numpy.ndarray of shape (n_samples, 1)
seed (Union[None, int, RandomState]) – Seed for the random number generator.

Returns:

Chatterjee sensitivity indices, numpy.ndarray of shape (n_variables, 1)
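For pre-computed data, the index is evaluated column by column of \(X\). The NumPy sketch below (hypothetical name chatterjee_indices, not UQpy's code) does the same and uses Chatterjee's general formula for ties, \(\xi_n = 1 - n\sum_{i=1}^{n-1}|r_{i+1}-r_i| \,/\, \left(2\sum_{i=1}^{n} l_i(n-l_i)\right)\) with \(l_i = \#\{j : Y_{(j)} \geq Y_{(i)}\}\), which reduces to the definition above when there are no ties. Breaking ties in \(X\) at random through seed is an assumption about the role of that parameter.

```python
import numpy as np


def chatterjee_indices(X, Y, seed=None):
    """One Chatterjee index per column of X, shape (n_variables, 1) (hypothetical helper).

    Ties in X are broken uniformly at random (assumed role of `seed`).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).ravel()
    n, n_variables = X.shape
    indices = np.empty((n_variables, 1))
    for i in range(n_variables):
        # Sort by X_i; the random secondary key only reorders tied X_i values.
        order = np.lexsort((rng.random(n), X[:, i]))
        y_ord = Y[order]
        y_sorted = np.sort(y_ord)
        r = np.searchsorted(y_sorted, y_ord, side="right")      # #{j : Y_(j) <= Y_(i)}
        l = n - np.searchsorted(y_sorted, y_ord, side="left")   # #{j : Y_(j) >= Y_(i)}
        indices[i, 0] = 1.0 - n * np.abs(np.diff(r)).sum() / (2.0 * np.sum(l * (n - l)))
    return indices


rng = np.random.default_rng(0)
X = rng.random((1000, 3))
Y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * X[:, 1]  # X_3 has no influence
print(chatterjee_indices(X, Y, seed=0).ravel())
```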

static rank_analog_to_pickfreeze(X, j)

Computes \(N(j)\) for a single index \(j \in \{1, \ldots, n\}\), as in Eq. (8) of [37], where \(n\) is the size of \(X\).

\begin{equation} N(j):= \begin{cases} \pi^{-1}(\pi(j)+1) &\text { if } \pi(j)+1 \leqslant n \\ \pi^{-1}(1) &\text { if } \pi(j)=n \end{cases} \end{equation}

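where \(\pi(j) := \mathrm{rank}(x_j)\) and \(\pi^{-1}\) is the inverse of \(\pi\), i.e., \(\pi^{-1}(k)\) is the index of the sample with rank \(k\).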

Parameters:

X (ndarray) – Input random vector, numpy.ndarray of shape (n_samples, 1)
j (Integral) – Index of the sample \(j \in \{1, \ldots, n\}\)

Returns:

\(N(j)\), int
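A minimal sketch of the looping definition, using 0-based indices (so the wrap-around case \(\pi(j) = n\) becomes rank \(n-1\)) and assuming no ties in \(X\). The function below is a hypothetical stand-alone version, not UQpy's code; it recomputes the ranks on every call, which illustrates why a per-sample loop is slow.

```python
import numpy as np


def rank_analog_to_pickfreeze(x, j):
    """Return N(j) for one 0-based sample index j (sketch, assumes no ties in x)."""
    x = np.asarray(x).ravel()
    n = x.size
    rank = np.argsort(np.argsort(x))   # pi(j), 0-based: 0 is the smallest x
    if rank[j] + 1 < n:                # pi(j) + 1 <= n in 1-based terms
        target = rank[j] + 1           # the next larger x
    else:                              # x_j is the largest value: wrap around
        target = 0                     # pi^{-1}(1): the smallest x
    return int(np.flatnonzero(rank == target)[0])  # pi^{-1}(target)


x = [22, 74, 44, 11, 1]
print([rank_analog_to_pickfreeze(x, j) for j in range(len(x))])  # [2, 4, 1, 0, 3]
```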

static rank_analog_to_pickfreeze_vec(X)

Computes \(N(j)\) for all \(j \in \{1, \ldots, n\}\) in a vectorized manner, where \(n\) is the size of \(X\).

This method is significantly faster than the looping version rank_analog_to_pickfreeze but is also more complicated.

\begin{equation} N(j):= \begin{cases} \pi^{-1}(\pi(j)+1) &\text { if } \pi(j)+1 \leqslant n \\ \pi^{-1}(1) &\text { if } \pi(j)=n \end{cases} \end{equation}

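where \(\pi(j) := \mathrm{rank}(x_j)\) and \(\pi^{-1}\) is the inverse of \(\pi\), i.e., \(\pi^{-1}(k)\) is the index of the sample with rank \(k\).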

Key idea: \(\pi^{-1}\) is given by rank_X.argsort().

Example: X = [22, 74, 44, 11, 1]

N_J = [3, 5, 2, 1, 4] (1-based indexing)

N_J = [2, 4, 1, 0, 3] (0-based indexing)

Parameters:

X (ndarray) – Input random vector, numpy.ndarray of shape (n_samples, 1)

Returns:

\(N(j)\), numpy.ndarray of shape (n_samples, 1)
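The key idea above can be sketched in a few NumPy lines, assuming no ties in \(X\) and 0-based indices. The modulo maps the largest rank back to rank 0, which covers the wrap-around case \(\pi(j) = n\). This is a hypothetical stand-alone version, not UQpy's code; without ties, pi_inv coincides with np.argsort(x).

```python
import numpy as np


def rank_analog_to_pickfreeze_vec(x):
    """Return N(j) for all 0-based j at once (sketch, assumes no ties in x)."""
    x = np.asarray(x).ravel()
    n = x.size
    rank = np.argsort(np.argsort(x))  # pi, 0-based ranks
    pi_inv = np.argsort(rank)         # pi^{-1}: sample index holding each rank
    # (rank + 1) % n sends the largest rank (n - 1) back to 0, i.e. the case pi(j) = n.
    return pi_inv[(rank + 1) % n]


print(rank_analog_to_pickfreeze_vec([22, 74, 44, 11, 1]))  # [2 4 1 0 3]
```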

static compute_Sobol_indices(A_model_evals, C_i_model_evals)

A method to estimate the first order Sobol indices using the Chatterjee method.

\begin{equation} \xi_{n}^{\mathrm{Sobol}}\left(X_{1}, Y\right):= \frac{\frac{1}{n} \sum_{j=1}^{n} Y_{j} Y_{N(j)}-\left(\frac{1}{n} \sum_{j=1}^{n} Y_{j}\right)^{2}} {\frac{1}{n} \sum_{j=1}^{n}\left(Y_{j}\right)^{2}-\left(\frac{1}{n} \sum_{j=1}^{n} Y_{j}\right)^{2}} \end{equation}

where the term \(Y_{N(j)}\) is computed using the method rank_analog_to_pickfreeze_vec.

Parameters:

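A_model_evals (ndarray) – Model evaluations \(Y_{j}\), numpy.ndarray of shape (n_samples, 1)
C_i_model_evals (ndarray) – Model evaluations \(Y_{N(j)}\), where column \(i\) uses \(N(j)\) computed from the ranks of the \(i\)-th input, numpy.ndarray of shape (n_samples, n_variables)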

Returns:

First order Sobol indices, numpy.ndarray of shape (n_variables, 1)
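A minimal NumPy sketch of the estimator above, assuming no ties in the inputs: for each input \(X_i\), \(N(j)\) is built from the ranks of \(X_i\) and \(Y_{N(j)}\) plays the role of the column \(i\) of C_i_model_evals. The helper first_order_sobol_rank is hypothetical and is not UQpy's code. The test function \(Y = X_1 + 0.5 X_2\) with independent \(X_i \sim U(0, 1)\) has analytical first-order indices \(S_1 = (1/12)/(5/48) = 0.8\) and \(S_2 = 0.2\).

```python
import numpy as np


def first_order_sobol_rank(X, Y):
    """Rank-based first-order Sobol estimates, shape (n_variables, 1) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).ravel()
    n, n_variables = X.shape
    mean_sq = Y.mean() ** 2                # ((1/n) sum_j Y_j)^2
    denominator = np.mean(Y**2) - mean_sq  # (1/n) sum_j Y_j^2 - mean^2
    indices = np.empty((n_variables, 1))
    for i in range(n_variables):
        rank = np.argsort(np.argsort(X[:, i]))  # pi for the i-th input, 0-based
        N = np.argsort(rank)[(rank + 1) % n]    # N(j), as in the vectorized method
        C_i = Y[N]                              # Y_{N(j)}
        indices[i, 0] = (np.mean(Y * C_i) - mean_sq) / denominator
    return indices


rng = np.random.default_rng(42)
X = rng.random((10_000, 2))
Y = X[:, 0] + 0.5 * X[:, 1]  # analytical values: S_1 = 0.8, S_2 = 0.2
print(first_order_sobol_rank(X, Y).ravel())
```

Only one set of model evaluations is needed; the "frozen" partner of each sample is simply its nearest neighbour in the ordering of \(X_i\).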

Attributes

ChatterjeeSensitivity.first_order_chatterjee_indices: Chatterjee sensitivity indices (First order), numpy.ndarray of shape (n_variables, 1)

ChatterjeeSensitivity.first_order_sobol_indices: Sobol indices computed using the rank statistics, numpy.ndarray of shape (n_variables, 1)

ChatterjeeSensitivity.confidence_interval_chatterjee: Confidence intervals for the Chatterjee sensitivity indices, numpy.ndarray of shape (n_variables, 2)

ChatterjeeSensitivity.n_variables: Number of input random variables, int

ChatterjeeSensitivity.n_samples: Number of samples used to estimate the sensitivity indices, int

Examples

Chatterjee Examples