The clustering module

The clustering module contains the Clustering class.

Density-based clustering algorithms are implemented as methods of this class.

class clustering.Clustering(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)

Perform clustering using various density-based clustering algorithms.

Inherits from the DensityEstimation class.

N_clusters

Number of clusters found.

Type: int

cluster_assignment

A list of length N containing the cluster assignment of each point as an integer from 0 to N_clusters-1.

Type: list(int)

cluster_centers

Indices of the cluster centers (the density peak of each cluster).

Type: list(int)

cluster_indices

A list of lists. Each sublist contains the indices belonging to the corresponding cluster.

Type: list(list(int))
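The relationship between cluster_assignment and cluster_indices can be sketched in plain Python. This is only an illustration of the two data layouts described above, not the library's code; the helper name indices_from_assignment and the sample labels are hypothetical.

```python
def indices_from_assignment(cluster_assignment, n_clusters):
    """Group point indices by cluster label (labels 0 .. n_clusters-1).

    Hypothetical helper: mirrors the documented layout of cluster_indices.
    """
    cluster_indices = [[] for _ in range(n_clusters)]
    for point, label in enumerate(cluster_assignment):
        # Skip any label outside the documented range 0 .. n_clusters-1
        if 0 <= label < n_clusters:
            cluster_indices[label].append(point)
    return cluster_indices


# Made-up labels for six points in three clusters
assignment = [0, 1, 0, 2, 1, 0]
groups = indices_from_assignment(assignment, n_clusters=3)
print(groups)  # [[0, 2, 5], [1, 4], [3]]
```

Each sublist of groups plays the role of one entry of cluster_indices, and its position in the outer list is the cluster label.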

log_den_bord

A matrix of dimensions N_clusters x N_clusters containing the estimated log density of the saddle point between each pair of peaks.

Type: np.ndarray(float)

log_den_bord_err

A matrix of dimensions N_clusters x N_clusters containing the estimated error on the log density of the saddle point between each pair of peaks.

Type: np.ndarray(float)

bord_indices

A matrix of dimensions N_clusters x N_clusters containing the indices of the saddle point between each pair of peaks.

Type: np.ndarray(float)

compute_DecGraph(): Compute the decision graph.

compute_clustering_ADP(Z=1.65, halo=False, v2=False)

Compute clustering according to the Advanced Density Peaks (ADP) algorithm.

The only free parameter is the merging factor Z, which controls how the different density peaks are merged together. The higher the value of Z, the more aggressive the merging and the smaller the number of clusters. The calculation is optimized through Cython.

Parameters:

Z (float) – merging parameter
halo (bool) – whether to compute the halo points

Returns:

cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters

References

M. d’Errico, E. Facco, A. Laio, A. Rodriguez, Automatic topography of high-dimensional data sets by non-parametric density peak clustering, Information Sciences 560 (2021) 476–492.
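To illustrate how the merging factor Z of compute_clustering_ADP changes the number of clusters, here is a minimal sketch of a significance test between pairs of peaks. It assumes a simplified reading of the criterion in the reference above: two peaks are flagged for merging when the dip from the lower peak to their saddle point is smaller than Z times the combined error. The function peaks_to_merge, its arguments, and the numbers are hypothetical; the library's implementation merges iteratively and may differ in detail.

```python
def peaks_to_merge(log_den_peaks, log_den_peaks_err,
                   log_den_bord, log_den_bord_err, Z):
    """Flag pairs (i, j) whose saddle is not significantly below the lower peak.

    Assumed criterion (simplified, single pass, no iterative merging):
        log_den_peak_low - log_den_bord[i][j] < Z * (err_peak_low + err_bord[i][j])
    """
    n = len(log_den_peaks)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # The lower of the two peaks is the candidate to be absorbed
            low = i if log_den_peaks[i] < log_den_peaks[j] else j
            dip = log_den_peaks[low] - log_den_bord[i][j]
            noise = log_den_peaks_err[low] + log_den_bord_err[i][j]
            if dip < Z * noise:
                pairs.append((i, j))
    return pairs


# Made-up values for three peaks; the diagonal of the border matrices is unused
peaks = [5.0, 4.0, 3.0]
peaks_err = [0.2, 0.2, 0.2]
bord = [[0.0, 3.5, 1.0],
        [3.5, 0.0, 2.8],
        [1.0, 2.8, 0.0]]
bord_err = [[0.0, 0.1, 0.1],
            [0.1, 0.0, 0.1],
            [0.1, 0.1, 0.0]]

merge_low_z = peaks_to_merge(peaks, peaks_err, bord, bord_err, Z=1.0)
merge_high_z = peaks_to_merge(peaks, peaks_err, bord, bord_err, Z=2.0)
print(merge_low_z)   # [(1, 2)]
print(merge_high_z)  # [(0, 1), (1, 2)]
```

With the larger Z, more pairs fail the significance test and are flagged for merging, which is why a higher Z yields fewer clusters.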

compute_clustering_ADP_pure_python(Z=1.65, halo=False, v2=False): Compute ADP clustering without the Cython optimization.

compute_clustering_DP(dens_cut=0.0, delta_cut=0.0, halo=False)

Compute clustering using the Density Peak algorithm.

Parameters:

dens_cut (float) – cutoff on density values
delta_cut (float) – cutoff on distance values
halo (bool) – whether to use halo points

Returns:

cluster_assignment (np.ndarray(int)) – assignment of points to specific clusters

References

A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
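The role of dens_cut and delta_cut in compute_clustering_DP can be illustrated with a small standard-library sketch of the Density Peak idea from the reference above: each point gets a density and a distance delta to the nearest point of higher density (the decision graph), points exceeding both cutoffs become centers, and every other point inherits the label of its nearest higher-density neighbor. The function names, the neighbor-counting density with a fixed radius, and the toy data are assumptions made for illustration; the library uses its own density estimators and does not necessarily follow these steps.

```python
import math


def decision_graph(points, radius):
    """Return (rho, delta, dist) for a list of coordinate tuples.

    Assumption: density is the number of other points closer than `radius`.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance to any point
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta, dist


def density_peak_labels(rho, delta, dist, dens_cut, delta_cut):
    """Pick centers above both cutoffs, then label points in order of decreasing density."""
    n = len(rho)
    centers = [i for i in range(n) if rho[i] > dens_cut and delta[i] > delta_cut]
    labels = [-1] * n  # -1 marks points that have not been labeled
    for label, center in enumerate(centers):
        labels[center] = label
    for i in sorted(range(n), key=lambda k: -rho[k]):
        if labels[i] != -1:
            continue
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Inherit the label of the nearest point with higher density
            nearest = min(higher, key=lambda j: dist[i][j])
            labels[i] = labels[nearest]
    return centers, labels


# Toy data: a five-point blob around (0, 0) and a three-point blob around (5, 5)
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta, dist = decision_graph(points, radius=0.12)
centers, labels = density_peak_labels(rho, delta, dist, dens_cut=1, delta_cut=1.0)
print(rho)      # [4, 1, 1, 1, 1, 2, 1, 1]
print(centers)  # [0, 5]
print(labels)   # [0, 0, 0, 0, 0, 1, 1, 1]
```

Plotting delta against rho gives the decision graph: the two centers stand out because they combine a high density with a large delta, while all other points have a small delta.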