Diffusion Maps

Diffusion Maps (Coifman and Lafon [3]) is a nonlinear dimension reduction technique used to learn (i.e., parametrize) a manifold from data. Diffusion maps assume that the data are represented in a high-dimensional space, while the points lie on or close to a low-dimensional manifold. The algorithm defines a graph over the data and, on this graph, a random walk whose Markov transition probabilities are determined by the distances between data points. An eigendecomposition of the Markov transition probability matrix then yields lower-dimensional coordinates that reveal the intrinsic structure of the data.
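The steps above can be sketched in a few lines of NumPy. This is a minimal dense illustration, not UQpy's implementation: the Gaussian kernel, its bandwidth `epsilon`, the function name `diffusion_maps`, and the choice to drop the trivial constant eigenvector are assumptions made for the example. The `alpha` and `t` arguments mirror the parameters of the same name listed in the class documentation.

```python
import numpy as np


def diffusion_maps(data, epsilon=1.0, alpha=0.5, n_eigenvectors=2, t=1):
    """Hypothetical dense diffusion maps sketch (Gaussian kernel, alpha-normalization)."""
    # Pairwise squared Euclidean distances between the rows of `data`.
    sq_dists = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel (an assumption): similarity decays with distance.
    kernel = np.exp(-sq_dists / epsilon)

    # Alpha-normalization: divide out the sampling density estimate q.
    q = kernel.sum(axis=1)
    kernel_alpha = kernel / np.outer(q ** alpha, q ** alpha)

    # Row-normalize to obtain the Markov transition probability matrix P.
    d = kernel_alpha.sum(axis=1)
    transition = kernel_alpha / d[:, None]

    # P is similar to the symmetric S = D^{-1/2} K_alpha D^{-1/2},
    # so eigh on S gives real eigenpairs; sort them by decreasing eigenvalue.
    d_sqrt = np.sqrt(d)
    sym = kernel_alpha / np.outer(d_sqrt, d_sqrt)
    eigenvalues, eigvecs_sym = np.linalg.eigh(sym)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigvecs_sym = eigvecs_sym[:, order]
    # Right eigenvectors of P are psi = D^{-1/2} v.
    eigenvectors = eigvecs_sym / d_sqrt[:, None]

    # Skip the constant eigenvector (eigenvalue 1) and scale by lambda^t.
    keep = slice(1, n_eigenvectors + 1)
    diffusion_coordinates = eigenvectors[:, keep] * eigenvalues[keep] ** t
    return transition, eigenvalues, eigenvectors, diffusion_coordinates
```

Working with the symmetric matrix S instead of P lets `numpy.linalg.eigh` be used, which guarantees real eigenvalues; the eigenvectors of P are recovered by a diagonal rescaling.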

The DiffusionMaps class also implements the parsimonious Diffusion Maps representation from Dsilva et al. [4].

DiffusionMaps Class

The DiffusionMaps class is imported using the following command:

>>> from UQpy.dimension_reduction.diffusion_maps.DiffusionMaps import DiffusionMaps

One can use the following method to instantiate the DiffusionMaps class.

Methods

class DiffusionMaps(kernel_matrix=None, data=None, kernel=None, alpha=0.5, n_eigenvectors=2, is_sparse=False, n_neighbors=1, random_state=None, t=1)
Parameters:
  • kernel_matrix (Optional[ndarray]) – Kernel matrix defining the similarity between the data points. Either kernel_matrix or both data and kernel must be provided. In the former case, kernel_matrix is precomputed using a Kernel class. In the latter case, the kernel matrix is computed internally and used for the evaluation of DiffusionMaps. If all three parameters are provided, DiffusionMaps is fitted using only kernel_matrix.

  • data (Union[ndarray, list[GrassmannPoint], None]) – Set of data points. Either kernel_matrix or both data and kernel parameters must be provided.

  • kernel (Optional[Kernel]) – Kernel object used to compute the kernel matrix defining similarity between the data points. Either kernel_matrix or both data and kernel parameters must be provided.

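  • alpha (Union[float, int]) – A scalar in the range [0, 1] that selects the diffusion operator. In Coifman and Lafon [3], alpha = 0 yields the normalized graph Laplacian, alpha = 0.5 the Fokker-Planck operator, and alpha = 1 the Laplace-Beltrami operator.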

  • n_eigenvectors (int) – Number of eigenvectors to retain.

  • is_sparse (bool) – Work with sparse matrices to improve computational performance.

  • n_neighbors (int) – If is_sparse is True, defines the number of nearest neighbors to use when making matrices sparse.

  • random_state (Union[None, int, RandomState]) – Random seed used to initialize the pseudo-random number generator. If an int is provided, this sets the seed for an object of numpy.random.RandomState. Otherwise, the object itself can be passed directly.

  • t (int) – Time exponent, i.e., the diffusion time: the number of steps taken by the random walk on the graph.
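When kernel_matrix is precomputed, or when is_sparse is True, it can help to see what these objects look like. The sketch below builds a dense Gaussian kernel matrix and then sparsifies it by keeping, for each point, its `n_neighbors` most similar points (itself included). The helper names `gaussian_kernel_matrix` and `sparsify_knn`, the Gaussian kernel, and the symmetrization rule are hypothetical; UQpy's own kernels and sparsification may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix


def gaussian_kernel_matrix(data, epsilon):
    """Hypothetical dense Gaussian kernel, one possible precomputed kernel matrix."""
    sq_dists = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / epsilon)


def sparsify_knn(kernel_matrix, n_neighbors):
    """Keep the n_neighbors largest entries of each row, then symmetrize."""
    n = kernel_matrix.shape[0]
    # Column indices of the n_neighbors largest similarities in each row
    # (the point itself has similarity 1 and is normally among them).
    idx = np.argpartition(-kernel_matrix, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    vals = kernel_matrix[rows, cols]
    sparse = csr_matrix((vals, (rows, cols)), shape=(n, n))
    # Keep an entry if either point is among the other's nearest neighbors,
    # so the sparse similarity matrix stays symmetric.
    return sparse.maximum(sparse.T)
```

For example, `sparsify_knn(gaussian_kernel_matrix(X, 1.0), n_neighbors=5)` stores at most about 10 nonzeros per row instead of all n, which is where the memory and runtime savings of a sparse formulation come from.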

parsimonious(dim)

Selects independent eigenvectors for a parsimonious embedding of the data manifold, based on local linear regression. The eigenvectors with the largest residuals, i.e., those least well predicted by the preceding eigenvectors, are used for the embedding. The scale of the kernel used for the local linear regression is:

scale = median(distances) / 3
Parameters:

dim (int) – Number of eigenvectors to select with largest residuals.
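The parsimonious selection of Dsilva et al. [4] can be sketched as follows, assuming that `eigenvectors` holds the nontrivial eigenvectors as columns ordered by decreasing eigenvalue. Each eigenvector is predicted from the preceding ones by leave-one-out local linear regression with a Gaussian kernel whose scale is median(distances) / 3, and its normalized residual measures how much new information it carries. The function names are hypothetical, and the code is an illustration rather than UQpy's parsimonious() implementation.

```python
import numpy as np


def local_linear_residual(y, X, eps_med_scale=3.0):
    """Hypothetical helper: leave-one-out local linear regression of y on X."""
    n = X.shape[0]
    dists = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    # Kernel scale: median pairwise distance divided by 3.
    eps = np.median(dists) / eps_med_scale
    weights = np.exp(-dists ** 2 / eps ** 2)

    y_hat = np.empty(n)
    for i in range(n):
        w = weights[i].copy()
        w[i] = 0.0  # leave point i out of its own fit
        # Design matrix [1, X_j - X_i]; the intercept is the prediction at X_i.
        A = np.hstack([np.ones((n, 1)), X - X[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        y_hat[i] = coef[0]
    # Residual normalized by the magnitude of y.
    return np.sqrt(np.sum((y - y_hat) ** 2) / np.sum(y ** 2))


def parsimonious(eigenvectors, dim):
    """Hypothetical sketch: rank eigenvectors by residual, keep the dim largest."""
    n_vec = eigenvectors.shape[1]
    residuals = np.zeros(n_vec)
    residuals[0] = 1.0  # the first eigenvector has nothing to regress on
    for k in range(1, n_vec):
        residuals[k] = local_linear_residual(eigenvectors[:, k], eigenvectors[:, :k])
    indices = np.argsort(residuals)[::-1][:dim]
    return indices, residuals
```

An eigenvector that is only a harmonic of an earlier one (for instance cos(2πs) after cos(πs) along a curve parametrized by s) is well predicted by local regression, receives a small residual, and is therefore not selected.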

Attributes

DiffusionMaps.transition_matrix: ndarray

Markov Transition Probability Matrix.

DiffusionMaps.diffusion_coordinates: ndarray

Coordinates of the data in the diffusion space.

DiffusionMaps.eigenvectors: ndarray

Eigenvectors of the transition probability matrix.

DiffusionMaps.eigenvalues: ndarray

Eigenvalues of the transition probability matrix.

DiffusionMaps.parsimonious_indices

Indices of the most important eigenvectors. This attribute will only be populated if the parsimonious() method is invoked.

DiffusionMaps.parsimonious_residuals

Residuals calculated from the Parsimonious Representation. This attribute will only be populated if the parsimonious() method is invoked.

Examples