Chatterjee indices
The Chatterjee index measures the strength of the relationship between \(X\) and \(Y\) using rank statistics [36].
Consider \(n\) samples of the random variables \(X\) and \(Y\), ordered as \((X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})\) such that \(X_{(1)} \leq \cdots \leq X_{(n)}\). Here, \(X\) can be one of the inputs of a model and \(Y\) the model response. If the \(X_{i}\)'s have no ties, there is a unique way of doing this (the case of ties is also taken into account in the implementation, see [36]). Let \(r_{i}\) be the rank of \(Y_{(i)}\), that is, the number of \(j\) such that \(Y_{(j)} \leq Y_{(i)}\). The Chatterjee index \(\xi_{n}(X, Y)\) is defined as:
\begin{equation} \xi_{n}(X, Y) := 1 - \frac{3 \sum_{i=1}^{n-1} \left| r_{i+1} - r_{i} \right|}{n^{2} - 1} \end{equation}
As \(n \rightarrow \infty\), the Chatterjee index converges to the Cramér-von Mises index, and it is faster to estimate than the Cramér-von Mises index computed with the pick-and-freeze approach.
Furthermore, the Sobol indices can be efficiently estimated by leveraging the same rank statistics, which has the advantage that any sample can be used and no specific pick and freeze scheme is required.
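The rank-based construction described above can be sketched in plain NumPy. This is an illustrative sketch, not the UQpy implementation: the function name `chatterjee_xi` is hypothetical, and the code assumes there are no ties in either sample, using the no-ties form of the coefficient from [36] that is written out in the docstring.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Hypothetical helper: xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1).

    Assumes no ties in x or y (ties need the extra handling described in [36]).
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    order = np.argsort(x, kind="stable")        # sort the pairs by increasing X
    y_sorted = y[order]                         # Y_(1), ..., Y_(n)
    # r_i = number of j with Y_(j) <= Y_(i); without ties this is the 1-based rank
    r = np.argsort(np.argsort(y_sorted)) + 1
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n**2 - 1)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
xi_dependent = chatterjee_xi(x, x**2)                      # Y is a function of X
xi_independent = chatterjee_xi(x, rng.normal(size=2000))   # Y is independent of X
```

For a noiseless functional relationship the value approaches 1 as \(n\) grows, even when the relationship is not monotone, while for independent variables it fluctuates around 0.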
Chatterjee Class
The ChatterjeeSensitivity class is imported using the following command:
>>> from UQpy.sensitivity.ChatterjeeSensitivity import ChatterjeeSensitivity
Methods
- class ChatterjeeSensitivity(runmodel_object, dist_object, random_state=None)
Compute sensitivity indices using the Chatterjee correlation coefficient.
Using the same model evaluations, we can also estimate the Sobol indices.
- Parameters:
runmodel_object – The computational model. It should be of type RunModel. The output QoI can be a scalar or a vector of length ny; in the latter case, the sensitivity indices of all ny outputs are computed independently.
dist_object – List of Distribution objects corresponding to each random variable, or a JointIndependent object (multivariate RV with independent marginals).
random_state – Random seed used to initialize the pseudo-random number generator. Default is None.
Methods:
- run(n_samples=1000, estimate_sobol_indices=False, n_bootstrap_samples=None, confidence_level=0.95)
Compute the sensitivity indices using the Chatterjee method. Calling the run method will execute n_samples simulations using RunModel. To compute sensitivity indices from pre-computed inputs and outputs, use the static methods described below.
- Parameters:
n_samples (int) – Number of samples used to compute the Chatterjee indices. Default is 1,000.
estimate_sobol_indices (bool) – If True, the Sobol indices are also estimated from the same samples, using the rank-based analog of the pick-and-freeze scheme. Default is False.
n_bootstrap_samples (Optional[int]) – Number of bootstrap samples used to compute the confidence intervals. Default is None.
confidence_level (float) – Confidence level used to compute the confidence intervals of the Chatterjee indices. Default is 0.95.
- static compute_chatterjee_indices(X, Y, seed=None)
Compute the Chatterjee sensitivity indices between the input random vectors \(X=\left[ X_{1}, X_{2}, \ldots, X_{d} \right]\) and the output random vector \(Y\).
- Parameters:
X (ndarray) – Input random vectors, numpy.ndarray of shape (n_samples, n_variables)
Y (ndarray) – Output random vector, numpy.ndarray of shape (n_samples, 1)
seed (Union[None, int, RandomState]) – Seed for the random number generator.
- Returns:
Chatterjee sensitivity indices, numpy.ndarray of shape (n_variables, 1)
- static rank_analog_to_pickfreeze(X, j)
Compute \(N(j)\) for a given \(j \in \{1, \ldots, n\}\) as in eq. (8) of [37], where \(n\) is the size of \(X\).
\begin{equation} N(j):= \begin{cases} \pi^{-1}(\pi(j)+1) &\text { if } \pi(j)+1 \leqslant n \\ \pi^{-1}(1) &\text { if } \pi(j)=n \end{cases} \end{equation}
where \(\pi(j) := \mathrm{rank}(x_j)\).
- Parameters:
X (ndarray) – Input random vector, numpy.ndarray of shape (n_samples, 1)
j (Integral) – Index of the sample, \(j \in \{1, \ldots, n\}\)
- Returns:
\(N(j)\), int
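The definition of \(N(j)\) can be illustrated for a single index as follows. This is a sketch under stated assumptions rather than the library code: `rank_successor` is a hypothetical name, the index `j` is taken to be 1-based to match the formula, and the sample is assumed to have no ties.

```python
import numpy as np


def rank_successor(x, j):
    """Hypothetical helper: return N(j) (1-based) for a 1-based index j.

    N(j) is the index of the sample whose rank is pi(j) + 1; the sample with
    the largest rank wraps around to the sample with rank 1.
    """
    x = np.asarray(x).ravel()
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1       # ranks[k - 1] = pi(k), no ties assumed
    pi_j = ranks[j - 1]
    target = pi_j + 1 if pi_j < n else 1        # wrap the largest rank back to rank 1
    return int(np.flatnonzero(ranks == target)[0]) + 1


x = [0.3, 0.9, 0.1, 0.5]                        # ranks pi = [2, 4, 1, 3]
successors = [rank_successor(x, j) for j in range(1, len(x) + 1)]
```

Each call recomputes the ranks, so looping over all \(j\) this way repeats the same sorting work \(n\) times.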
- static rank_analog_to_pickfreeze_vec(X)
Compute \(N(j)\) for every \(j \in \{1, \ldots, n\}\) in a vectorized manner, where \(n\) is the size of \(X\).
This method is significantly faster than the looping version rank_analog_to_pickfreeze but is also more complicated.
\begin{equation} N(j):= \begin{cases} \pi^{-1}(\pi(j)+1) &\text { if } \pi(j)+1 \leqslant n \\ \pi^{-1}(1) &\text { if } \pi(j)=n \end{cases} \end{equation}
where \(\pi(j) := \mathrm{rank}(x_j)\).
Key idea: \(\pi^{-1}\) is rank_X.argsort()
Example: X = [22, 74, 44, 11, 1]
N_J = [3, 5, 2, 1, 4] (1-based indexing)
N_J = [2, 4, 1, 0, 3] (0-based indexing)
- Parameters:
X (ndarray) – Input random vector, numpy.ndarray of shape (n_samples, 1)
- Returns:
\(N(j)\), numpy.ndarray of shape (n_samples, 1)
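The argsort idea can be sketched as below, reproducing the 0-based example given for this method. The function name `rank_successor_vec` is hypothetical, the sketch assumes no ties in the sample, and it is not claimed to match the library's internal code.

```python
import numpy as np


def rank_successor_vec(x):
    """Hypothetical helper: 0-based N(j) for all j at once (no ties assumed)."""
    x = np.asarray(x).ravel()
    n = x.size
    sorter = np.argsort(x)             # sorter[r] = index of the sample with 0-based rank r (pi^{-1})
    ranks = np.empty(n, dtype=int)
    ranks[sorter] = np.arange(n)       # ranks[j] = 0-based rank of x[j] (pi)
    # Move to the next rank; the modulo wraps the largest rank back to the smallest.
    return sorter[(ranks + 1) % n]


n_j = rank_successor_vec([22, 74, 44, 11, 1])
```

Because the ranks are a permutation, sorting the ranks and sorting the sample give the same index array, which is why a single `argsort` supplies \(\pi^{-1}\) here.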
- static compute_Sobol_indices(A_model_evals, C_i_model_evals)
A method to estimate the first order Sobol indices using the Chatterjee method.
\begin{equation} \xi_{n}^{\mathrm{Sobol}}\left(X_{1}, Y\right):= \frac{\frac{1}{n} \sum_{j=1}^{n} Y_{j} Y_{N(j)}-\left(\frac{1}{n} \sum_{j=1}^{n} Y_{j}\right)^{2}} {\frac{1}{n} \sum_{j=1}^{n}\left(Y_{j}\right)^{2}-\left(\frac{1}{n} \sum_{j=1}^{n} Y_{j}\right)^{2}} \end{equation}
where the term \(Y_{N(j)}\) is computed using the method rank_analog_to_pickfreeze_vec.
- Parameters:
A_model_evals (ndarray) – Model evaluations, numpy.ndarray of shape (n_samples, 1)
C_i_model_evals (ndarray) – Model evaluations, numpy.ndarray of shape (n_samples, n_variables)
- Returns:
First order Sobol indices, numpy.ndarray of shape (n_variables, 1)
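The estimator above can be sketched directly from one input column and the matching outputs. This is a minimal illustration under assumptions: `rank_sobol_first_order` is a hypothetical name that takes the raw input and output samples (it does not mirror the arguments of compute_Sobol_indices), ties are assumed absent, and the linear test model is chosen only because its first order indices are known analytically.

```python
import numpy as np


def rank_sobol_first_order(x, y):
    """Hypothetical helper: rank-based first order Sobol estimate for one input."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = x.size
    sorter = np.argsort(x)
    ranks = np.empty(n, dtype=int)
    ranks[sorter] = np.arange(n)
    n_j = sorter[(ranks + 1) % n]               # 0-based N(j): sample with the next rank of x
    mean_sq = np.mean(y) ** 2
    numerator = np.mean(y * y[n_j]) - mean_sq   # (1/n) sum Y_j Y_N(j) - (mean Y)^2
    denominator = np.mean(y**2) - mean_sq       # sample variance of Y
    return numerator / denominator


rng = np.random.default_rng(1)
x1 = rng.normal(size=20000)
x2 = rng.normal(size=20000)
y = x1 + 0.5 * x2        # analytic first order indices: 1/1.25 = 0.8 and 0.25/1.25 = 0.2
s1 = rank_sobol_first_order(x1, y)
s2 = rank_sobol_first_order(x2, y)
```

The same output sample `y` is reused for both inputs; only the ordering induced by each input column changes, which is what removes the need for a dedicated pick-and-freeze design.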
Attributes
- ChatterjeeSensitivity.first_order_chatterjee_indices
Chatterjee sensitivity indices (first order), numpy.ndarray of shape (n_variables, 1)
- ChatterjeeSensitivity.first_order_sobol_indices
Sobol indices computed using the rank statistics, numpy.ndarray of shape (n_variables, 1)
- ChatterjeeSensitivity.confidence_interval_chatterjee
Confidence intervals for the Chatterjee sensitivity indices, numpy.ndarray of shape (n_variables, 2)