Diffusion Maps
Diffusion Maps (Coifman and Lafon [3]) is a nonlinear dimension reduction technique used to learn (i.e., parametrize) a manifold from data. The method assumes that the data are represented in a high-dimensional space but lie on or close to a low-dimensional manifold. The algorithm defines a graph over the data and, on this graph, a random walk whose Markov transition probabilities are determined by a distance between data points. An eigendecomposition of the Markov transition probability matrix yields lower-dimensional coordinates that reveal the intrinsic structure of the data.
The DiffusionMaps class also implements the parsimonious Diffusion Maps representation from Dsilva et al. [4].
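The construction described above (graph weights, Markov transition matrix, eigendecomposition) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the UQpy implementation: the function name diffusion_maps_sketch, the Gaussian kernel form exp(-||x - y||^2 / (4 * epsilon)), and the scaling of the coordinates by the eigenvalues raised to the power t are choices made here for illustration only.

```python
import numpy as np


def diffusion_maps_sketch(X, epsilon=1.0, alpha=0.5, n_eigenvectors=2, t=1):
    """Illustrative diffusion-maps embedding (hypothetical helper, not UQpy code)."""
    # Pairwise squared Euclidean distances between the rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Graph weights from a Gaussian kernel (assumed form).
    K = np.exp(-sq_dists / (4.0 * epsilon))
    # Alpha-normalization: K_alpha = D^-alpha K D^-alpha.
    d = K.sum(axis=1)
    K_alpha = K / np.outer(d ** alpha, d ** alpha)
    # Row-normalize to obtain the Markov transition matrix P.
    d_alpha = K_alpha.sum(axis=1)
    P = K_alpha / d_alpha[:, None]
    # P is similar to the symmetric matrix S = D^-1/2 K_alpha D^-1/2,
    # so a symmetric eigensolver can be used.
    S = K_alpha / np.sqrt(np.outer(d_alpha, d_alpha))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are recovered as D^-1/2 v.
    psi = eigvecs / np.sqrt(d_alpha)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalue**t.
    coords = psi[:, 1:n_eigenvectors + 1] * eigvals[1:n_eigenvectors + 1] ** t
    return P, eigvals, coords


# Usage: embed 30 random points from a 3-dimensional space into 2 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
P, eigvals, coords = diffusion_maps_sketch(X, epsilon=1.0, alpha=0.5)
print(coords.shape)
```

Each row of P sums to one, which is what makes it a transition matrix of a random walk on the data graph. The leading eigenvalue equals one and its eigenvector is constant, which is why the sketch discards it before forming the coordinates.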
DiffusionMaps Class
The DiffusionMaps class is imported using the following command:
>>> from UQpy.dimension_reduction.diffusion_maps.DiffusionMaps import DiffusionMaps
One can use the following method to instantiate the DiffusionMaps class.
Methods
- class DiffusionMaps(kernel_matrix=None, data=None, kernel=None, alpha=0.5, n_eigenvectors=2, is_sparse=False, n_neighbors=1, random_state=None, t=1)
- Parameters:
kernel_matrix (Optional[ndarray]) – Kernel matrix defining the similarity between the data points. Either kernel_matrix or both data and kernel parameters must be provided. In the former case, kernel_matrix is precomputed using a Kernel class. In the latter case, the kernel matrix is computed internally and used for the evaluation of the DiffusionMaps. If all three of these parameters are provided, DiffusionMaps is fitted using only kernel_matrix.
data (Union[ndarray, list[GrassmannPoint], None]) – Set of data points. Either kernel_matrix or both data and kernel parameters must be provided.
kernel (Optional[Kernel]) – Kernel object used to compute the kernel matrix defining similarity between the data points. Either kernel_matrix or both data and kernel parameters must be provided.
alpha (Union[float, int]) – A scalar that corresponds to different diffusion operators. alpha should be between zero and one.
n_eigenvectors (int) – Number of eigenvectors to retain.
is_sparse (bool) – Work with sparse matrices to improve computational performance.
n_neighbors (int) – If is_sparse is True, defines the number of nearest neighbors to use when making matrices sparse.
random_state (Union[None, int, RandomState]) – Random seed used to initialize the pseudo-random number generator. If an int is provided, this sets the seed for an object of numpy.random.RandomState. Otherwise, the object itself can be passed directly.
t (int) – Time exponent.
- parsimonious(dim)
Selection of independent vectors for parsimonious data manifold embedding, based on local regression. The eigenvectors with the largest residuals are considered for the embedding. The scale of the kernel used for the local linear regression is:
scale = median(distances) / 3
- Parameters:
dim (int) – Number of eigenvectors to select with largest residuals.
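The residual-based selection described for parsimonious() can be illustrated with a small NumPy sketch. It is an approximation written for this page, not the UQpy code: the helper names local_linear_residual and parsimonious_sketch are hypothetical, the Gaussian weight form exp(-(distance / scale)^2), the leave-one-out fit, and the convention that the first eigenvector receives a residual of one are assumptions, and the input Phi is assumed to hold only non-trivial eigenvectors as columns. Only the scale rule, median(distances) / 3, is taken from the description above.

```python
import numpy as np


def local_linear_residual(Phi_prev, phi_k):
    """Normalized leave-one-out residual of fitting phi_k from the columns of Phi_prev."""
    n = Phi_prev.shape[0]
    # Pairwise distances in the space spanned by the previous eigenvectors.
    diffs = Phi_prev[:, None, :] - Phi_prev[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    # Kernel scale as stated in the text: median of the pairwise distances over 3.
    scale = np.median(dists[np.triu_indices(n, k=1)]) / 3.0
    W = np.exp(-(dists / scale) ** 2)  # assumed Gaussian weight form
    np.fill_diagonal(W, 0.0)  # leave the point itself out of its own fit
    fitted = np.empty(n)
    for i in range(n):
        # Local linear model centered at point i: intercept plus linear terms.
        A = np.hstack([np.ones((n, 1)), Phi_prev - Phi_prev[i]])
        sw = np.sqrt(W[i])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], phi_k * sw, rcond=None)
        fitted[i] = coef[0]  # prediction at the center is the intercept
    return np.sqrt(np.sum((phi_k - fitted) ** 2) / np.sum(phi_k ** 2))


def parsimonious_sketch(Phi, dim):
    """Return indices of the dim columns of Phi with the largest residuals."""
    n_vec = Phi.shape[1]
    residuals = np.ones(n_vec)  # assumed convention: first eigenvector gets residual 1
    for k in range(1, n_vec):
        residuals[k] = local_linear_residual(Phi[:, :k], Phi[:, k])
    indices = np.argsort(residuals)[::-1][:dim]
    return indices, residuals


# Usage: column 1 is a function of column 0 (a repeated direction),
# while column 2 depends on an independent variable.
rng = np.random.default_rng(0)
u, v = rng.uniform(size=200), rng.uniform(size=200)
Phi = np.column_stack([np.cos(np.pi * u), np.cos(2 * np.pi * u), np.cos(np.pi * v)])
indices, residuals = parsimonious_sketch(Phi, dim=2)
print(sorted(indices.tolist()))
```

In this synthetic example the second column can be predicted from the first, so its residual is small and it is skipped, while the third column carries new information and is retained.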
Attributes
- DiffusionMaps.parsimonious_indices
Indices of the most important eigenvectors. This attribute will only be populated if the parsimonious() method is invoked.
- DiffusionMaps.parsimonious_residuals
Residuals calculated from the parsimonious representation. This attribute will only be populated if the parsimonious() method is invoked.