The clustering module

The clustering module contains the Clustering class.

Density-based clustering algorithms are implemented as methods of this class.

class clustering.Clustering(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)

Perform clustering using various density-based clustering algorithms.

Inherits from the DensityEstimation class.

N_clusters

Number of clusters found

Type:

int

cluster_assignment

A list of length N containing the cluster assignment of each point as an integer from 0 to N_clusters-1.

Type:

list(int)

cluster_centers

Indices of the center (density peak) of each cluster.

Type:

list(int)

cluster_indices

A list of lists. Each sublist contains the indices belonging to the corresponding cluster.

Type:

list(list(int))
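The relation between cluster_assignment and cluster_indices described above can be illustrated with a minimal sketch. The label values are toy data chosen for illustration, not output of the library, and the grouping loop is an assumption about how the two attributes correspond, based only on their descriptions.

```python
# Toy cluster labels for N = 5 points (illustrative values, not library output).
cluster_assignment = [0, 1, 0, 1, 1]
n_clusters = max(cluster_assignment) + 1

# Group point indices by cluster label: one sublist per cluster.
cluster_indices = [[] for _ in range(n_clusters)]
for point_index, label in enumerate(cluster_assignment):
    cluster_indices[label].append(point_index)

print(cluster_indices)  # [[0, 2], [1, 3, 4]]
```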

log_den_bord

A matrix of dimensions N_clusters x N_clusters containing the estimated log density of the saddle point between each pair of peaks.

Type:

np.ndarray(float)

log_den_bord_err

A matrix of dimensions N_clusters x N_clusters containing the estimated error on the log density of the saddle point between each pair of peaks.

Type:

np.ndarray(float)

bord_indices

A matrix of dimensions N_clusters x N_clusters containing the indices of the saddle point between each pair of peaks.

Type:

np.ndarray(int)

compute_DecGraph()

Compute the decision graph.
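The internals of compute_DecGraph are not described here. As a rough illustration only, the sketch below computes the two decision-graph quantities as defined in Rodriguez and Laio (2014): the density of each point and delta, the distance to the closest point of higher density. The function name decision_graph, the one-dimensional toy points, and the hand-picked densities are all assumptions made for this example.

```python
def decision_graph(points, densities):
    """Return (delta, nearest_denser) for 1-D points with given densities.

    delta[i] is the distance from point i to the closest point of higher
    density. For the densest point, delta is by convention the distance
    to the farthest point, and nearest_denser stays at -1.
    """
    n = len(points)
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for i in range(n):
        best = None
        for j in range(n):
            if densities[j] > densities[i]:
                d = abs(points[i] - points[j])
                if best is None or d < best:
                    best = d
                    nearest_denser[i] = j
        if best is None:  # no denser point exists: this is the global peak
            best = max(abs(points[i] - points[j]) for j in range(n))
        delta[i] = best
    return delta, nearest_denser


# Toy data: two well-separated groups on a line (hypothetical values).
points = [0.0, 1.0, 2.0, 10.0, 11.0]
densities = [1.0, 3.0, 2.0, 2.5, 1.5]
delta, nearest_denser = decision_graph(points, densities)
print(delta)           # [1.0, 10.0, 1.0, 9.0, 1.0]
print(nearest_denser)  # [1, -1, 1, 1, 3]
```

Points that combine a high density with a large delta (indices 1 and 3 here) stand out in a plot of delta against density and are the natural candidates for cluster centers.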

compute_clustering_ADP(Z=1.65, halo=False, v2=False)

Compute clustering according to the Advanced Density Peaks (ADP) algorithm, called DPA in the reference below.

The only free parameter is the merging factor Z, which controls how density peaks are merged. The higher Z is, the more aggressive the merging and the smaller the number of clusters. The calculation is optimized through Cython.

Parameters:
  • Z (float) – merging parameter

  • halo (bool) – whether to compute the halo points

Returns:

cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters

References

M. d’Errico, E. Facco, A. Laio, A. Rodriguez, Automatic topography of high-dimensional data sets by non-parametric density peak clustering, Information Sciences 560 (2021) 476–492.
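To make the role of Z in compute_clustering_ADP concrete, here is a minimal sketch of a significance test for a single peak. It assumes, following one reading of the cited paper, that a peak is kept only if its log density exceeds the log density of the saddle point by more than Z times the sum of the two error estimates. The function peak_is_significant and the numeric values are hypothetical, and the exact criterion used by the library may differ in detail.

```python
def peak_is_significant(log_den_peak, log_den_peak_err,
                        log_den_saddle, log_den_saddle_err, Z=1.65):
    """Assumed merging test: keep the peak only if the peak-to-saddle gap
    in log density is larger than Z times the combined error."""
    gap = log_den_peak - log_den_saddle
    return gap > Z * (log_den_peak_err + log_den_saddle_err)


# Hypothetical peak and saddle values: gap = 0.6, combined error = 0.3.
peak, peak_err = -2.0, 0.2
saddle, saddle_err = -2.6, 0.1

kept_at_default_z = peak_is_significant(peak, peak_err, saddle, saddle_err, Z=1.65)
kept_at_high_z = peak_is_significant(peak, peak_err, saddle, saddle_err, Z=3.0)
print(kept_at_default_z, kept_at_high_z)  # True False
```

Under this assumption, raising Z from 1.65 to 3.0 turns a peak that was kept into one that gets merged, which matches the statement that a higher Z yields fewer clusters.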

compute_clustering_ADP_pure_python(Z=1.65, halo=False, v2=False)

Compute ADP clustering without the Cython optimization.

compute_clustering_DP(dens_cut=0.0, delta_cut=0.0, halo=False)

Compute clustering using the Density Peak algorithm.

Parameters:
  • dens_cut (float) – cutoff on density values

  • delta_cut (float) – cutoff on distance values

  • halo (bool) – whether to use halo points

Returns:

cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters

References

A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
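The following sketch illustrates how the two cutoffs of compute_clustering_DP could act, following the procedure of the paper cited above rather than the library's actual code. It assumes that each point's density, its delta, and the index of its nearest denser neighbor are already known; the toy values, the function name assign_clusters, and the omission of halo points are assumptions of this example.

```python
def assign_clusters(densities, delta, nearest_denser, dens_cut=0.0, delta_cut=0.0):
    """Pick centers above both cutoffs, then propagate labels downhill.

    Each non-center point takes the label of its nearest denser neighbor.
    Points are visited in order of decreasing density, so that neighbor
    has always been labeled first. Halo points are not handled here.
    """
    n = len(densities)
    order = sorted(range(n), key=lambda i: densities[i], reverse=True)
    centers = [i for i in order
               if densities[i] > dens_cut and delta[i] > delta_cut]
    assignment = [-1] * n
    for label, center in enumerate(centers):
        assignment[center] = label
    for i in order:
        if assignment[i] == -1 and nearest_denser[i] != -1:
            assignment[i] = assignment[nearest_denser[i]]
    return centers, assignment


# Toy decision-graph values for 5 points (hypothetical, not library output).
densities = [1.0, 3.0, 2.0, 2.5, 1.5]
delta = [1.0, 10.0, 1.0, 9.0, 1.0]
nearest_denser = [1, -1, 1, 1, 3]

centers, assignment = assign_clusters(densities, delta, nearest_denser,
                                      dens_cut=0.0, delta_cut=5.0)
print(centers)     # [1, 3]
print(assignment)  # [0, 0, 0, 1, 1]
```

In this sketch, leaving both cutoffs at 0.0 would turn every point with positive density and delta into its own center, so the cutoffs are best chosen by inspecting the decision graph.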