The clustering module
The clustering module contains the Clustering class.
Density-based clustering algorithms are implemented as methods of this class.
- class clustering.Clustering(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)
Perform clustering using various density-based clustering algorithms.
Inherits from the DensityEstimation class.
- N_clusters
Number of clusters found
- Type:
int
- cluster_assignment
A list of length N containing the cluster assignment of each point as an integer from 0 to N_clusters-1.
- Type:
list(int)
- cluster_centers
Indices of the points chosen as cluster centers (the density peaks), one per cluster.
- Type:
list(int)
- cluster_indices
A list of lists. Each sublist contains the indices belonging to the corresponding cluster.
- Type:
list(list(int))
- log_den_bord
A matrix of dimensions N_clusters x N_clusters containing the estimated log density of the saddle point between each pair of peaks.
- Type:
np.ndarray(float)
- log_den_bord_err
A matrix of dimensions N_clusters x N_clusters containing the estimated error on the log density of the saddle point between each pair of peaks.
- Type:
np.ndarray(float)
- bord_indices
A matrix of dimensions N_clusters x N_clusters containing the indices of the saddle point between each pair of peaks.
- Type:
np.ndarray(float)
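The attributes above are related in a simple way, which the following minimal sketch illustrates in plain Python. The assignment list is made up for illustration; in practice it would come from one of the clustering methods of the class.

```python
# Hypothetical assignment for N = 7 points (illustrative values only).
cluster_assignment = [0, 1, 0, 2, 1, 0, 2]

# Labels run from 0 to N_clusters-1, so the count is the largest label plus one.
N_clusters = max(cluster_assignment) + 1

# Group point indices by label, mirroring the layout described for cluster_indices.
cluster_indices = [[] for _ in range(N_clusters)]
for point, label in enumerate(cluster_assignment):
    cluster_indices[label].append(point)

print(N_clusters)       # 3
print(cluster_indices)  # [[0, 2, 5], [1, 4], [3, 6]]
```

The N_clusters x N_clusters matrices are indexed by the same cluster labels, so entry (i, j) refers to the border between clusters i and j.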
- compute_clustering_ADP(Z=1.65, halo=False, v2=False)
Compute clustering according to the ADP (Advanced Density Peaks) algorithm.
The only free parameter is the merging factor Z, which controls how the different density peaks are merged together. The higher the Z, the more aggressive the merging and the smaller the number of clusters. The calculation is optimized through Cython.
- Parameters:
Z (float) – merging parameter
halo (bool) – whether to compute the halo points
- Returns:
cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters
References
M. d’Errico, E. Facco, A. Laio, A. Rodriguez, Automatic topography of high-dimensional data sets by non-parametric density peak clustering, Information Sciences 560 (2021) 476–492.
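To make the role of Z concrete, the sketch below applies a significance test of the kind described in the cited paper: a peak is kept separate only if its log density exceeds the log density of the saddle point by more than Z times the combined error. The function `should_merge`, its arguments, and the numbers are illustrative assumptions, not part of the Clustering API, and the library's actual implementation may differ in detail.

```python
def should_merge(log_den_peak, log_den_peak_err,
                 log_den_saddle, log_den_saddle_err, Z=1.65):
    """Return True when the peak is not statistically distinct from the saddle.

    Hypothetical helper: compares the peak-to-saddle gap in log density
    with Z times the sum of the two error estimates.
    """
    gap = log_den_peak - log_den_saddle
    threshold = Z * (log_den_peak_err + log_den_saddle_err)
    return gap < threshold


# Illustrative values: a gap of 0.5 with a combined error of 0.35.
peak, peak_err = -1.0, 0.25
saddle, saddle_err = -1.5, 0.1

for Z in (1.0, 1.65):
    print(Z, should_merge(peak, peak_err, saddle, saddle_err, Z))
# 1.0 False
# 1.65 True
```

With the smaller Z the peak survives as its own cluster. With the larger Z it is merged, which is the sense in which a higher Z yields fewer clusters.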
- compute_clustering_ADP_pure_python(Z=1.65, halo=False, v2=False)
Compute ADP clustering, but without the cython optimization.
- compute_clustering_DP(dens_cut=0.0, delta_cut=0.0, halo=False)
Compute clustering using the Density Peak algorithm.
- Parameters:
dens_cut (float) – cutoff on density values
delta_cut (float) – cutoff on distance values
halo (bool) – whether to use halo points
- Returns:
cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters
References
A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
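The two cutoffs can be illustrated with a toy, pure-Python version of the Density Peak idea on 1-D data. This is a sketch under stated assumptions, not the library's implementation: the function `density_peak_sketch`, the `radius` parameter, and the neighbour-count density are all hypothetical choices made for the example, and halo points are not handled.

```python
def density_peak_sketch(points, radius, dens_cut, delta_cut):
    """Toy 1-D Density Peak clustering; returns (cluster_centers, cluster_assignment)."""
    n = len(points)
    dist = [[abs(a - b) for b in points] for a in points]

    # Assumption: density = number of other points closer than `radius`.
    density = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
               for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-density[i], i))

    # delta = distance to the nearest point that comes earlier in `order`.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j

    # Centers are the points that pass both cutoffs.
    centers = [i for i in order if density[i] > dens_cut and delta[i] > delta_cut]
    if not centers or centers[0] != order[0]:
        raise ValueError("cutoffs exclude the densest point; loosen dens_cut or delta_cut")

    assignment = [-1] * n
    for label, c in enumerate(centers):
        assignment[c] = label
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if assignment[i] == -1:
            assignment[i] = assignment[nearest_higher[i]]
    return centers, assignment


points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2]
centers, assignment = density_peak_sketch(points, radius=0.15, dens_cut=1.5, delta_cut=1.0)
print(centers)     # [1, 5]
print(assignment)  # [0, 0, 0, 0, 1, 1, 1]
```

Raising either cutoff makes it harder for a point to qualify as a center, so fewer clusters are found. In this example only one point in each group is both dense and far from any denser point.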