The density_estimation module
The density_estimation module contains the DensityEstimation class.
Each density estimation algorithm is implemented as a method of this class.
- class density_estimation.DensityEstimation(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)[source]
Computes the log-density and its error at each point, along with other properties.
Inherits from class KStar. The log-density and its error can be computed with any of several kNN-based methods.
- log_den
array containing the N log-densities
- Type:
np.ndarray(float), optional
- log_den_err
array containing the N errors on the log_den
- Type:
np.ndarray(float), optional
- compute_density_PAk(Dthr=23.92812698, optimized=True)[source]
Compute the density of each point using the PAk estimator.
- Parameters:
Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.
- Returns:
log_den (np.ndarray(float)) – estimated log density
log_den_err (np.ndarray(float)) – estimated error on log density
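The link between Dthr=23.92812698 and a p-value of 1e-6 can be checked numerically. The sketch below assumes that the likelihood ratio statistic is compared against a chi-squared distribution with one degree of freedom; this assumption is not stated in the text above, but it reproduces the quoted value. The helper name chi2_sf_one_dof is hypothetical and not part of the module.

```python
import math


def chi2_sf_one_dof(x):
    """Survival function P(X > x) of a chi-squared variable with one degree of freedom."""
    # Assumption: one degree of freedom, for which P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


Dthr = 23.92812698  # default value from the method signatures
p_value = chi2_sf_one_dof(Dthr)
print(f"p-value for Dthr={Dthr}: {p_value:.3e}")
```

Under the same assumption, a larger Dthr corresponds to a smaller p-value.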
- compute_density_kNN(k=10, bias=False)[source]
Compute the density of each point using a simple kNN estimator.
- Parameters:
k (int) – number of neighbours used to compute the density
- Returns:
log_den (np.ndarray(float)) – estimated log density
log_den_err (np.ndarray(float)) – estimated error on log density
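As an illustration of what a simple kNN estimator computes, here is a minimal pure-Python sketch of the textbook formula: the density at a point is k divided by N times the volume of the ball reaching its k-th nearest neighbour. It is not the library's implementation. The function name knn_log_density, the Euclidean metric, and the 1/sqrt(k) error approximation are assumptions made for the example, and the bias option is not modelled.

```python
import math


def knn_log_density(points, k):
    """Textbook kNN log-density estimate for points given as coordinate tuples."""
    n = len(points)
    d = len(points[0])
    # Log-volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_den = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]  # distance to the k-th nearest neighbour
        log_volume = log_unit_ball + d * math.log(r_k)
        log_den.append(math.log(k) - math.log(n) - log_volume)
    # Assumed approximation for the error on a kNN log-density estimate.
    log_den_err = [1.0 / math.sqrt(k)] * n
    return log_den, log_den_err


# Five equally spaced points on a line.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
log_den, log_den_err = knn_log_density(points, k=2)
print(log_den)
```

In this example the central point gets a higher log-density than the end points, because its second neighbour is closer.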
- compute_density_kpeaks(Dthr=23.92812698)[source]
Compute the density of each point as proportional to the optimal k value found for that point.
This method is mostly useful for the kpeaks clustering algorithm.
- Parameters:
Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.
- Returns:
log_den (np.ndarray(float)) – estimated log density
log_den_err (np.ndarray(float)) – estimated error on log density
- compute_density_kstarNN(Dthr=23.92812698, bias=False)[source]
Compute the density of each point using a simple kNN estimator with an optimal choice of k.
- Parameters:
Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.
- Returns:
log_den (np.ndarray(float)) – estimated log density
log_den_err (np.ndarray(float)) – estimated error on log density
- return_entropy()[source]
Compute a very rough estimate of the sample Shannon entropy of the data distribution.
The estimate is simply the average of the negative log-density values over the data points.
- Returns:
H (float) – the estimated entropy of the distribution
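The averaging described above amounts to one line of arithmetic. The sketch below assumes the log-densities are already available as a plain list; the function name entropy_from_log_density is hypothetical and only illustrates the formula.

```python
import math


def entropy_from_log_density(log_den):
    """Rough entropy estimate: the average negative log-density over the sample."""
    return -sum(log_den) / len(log_den)


# Made-up log-density values for three points.
log_den = [math.log(0.5), math.log(0.25), math.log(0.25)]
H = entropy_from_log_density(log_den)
print(H)
```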
- return_interpolated_density_PAk(X_new, Dthr=23.92812698)[source]
Return the PAk density of the primary dataset, evaluated on a new set of points “X_new”.
- Parameters:
X_new (np.ndarray(float)) – The points onto which the density should be computed
Dthr (float) – Likelihood ratio parameter used to compute the optimal k
- Returns:
log_den (np.ndarray(float)) – log density of dataset evaluated on X_new
log_den_err (np.ndarray(float)) – error on log density estimates
- return_interpolated_density_kNN(X_new, k)[source]
Return the kNN density of the primary dataset, evaluated on a new set of points “X_new”.
- Parameters:
X_new (np.ndarray(float)) – The points onto which the density should be computed
k (int) – the number of neighbours considered for the kNN estimator
- Returns:
log_den (np.ndarray(float)) – log density of dataset evaluated on X_new
log_den_err (np.ndarray(float)) – error on log density estimates
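To illustrate what evaluating a kNN density on new points means, here is a minimal sketch: for each new point, the k-th nearest neighbour is searched in the primary dataset only, and the textbook kNN formula is applied. This is an assumption about the general technique, not the library's code. The function name knn_log_density_at, the Euclidean metric, and the 1/sqrt(k) error approximation are all hypothetical choices for the example.

```python
import math


def knn_log_density_at(X_new, X_data, k):
    """kNN log-density of the dataset X_data, evaluated at the points in X_new."""
    n = len(X_data)
    d = len(X_data[0])
    # Log-volume of the unit ball in d dimensions.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_den = []
    for x in X_new:
        # Neighbours are searched among the primary dataset only.
        dists = sorted(math.dist(x, q) for q in X_data)
        r_k = dists[k - 1]  # distance to the k-th nearest data point
        log_volume = log_unit_ball + d * math.log(r_k)
        log_den.append(math.log(k) - math.log(n) - log_volume)
    # Assumed approximation for the error on a kNN log-density estimate.
    log_den_err = [1.0 / math.sqrt(k)] * len(X_new)
    return log_den, log_den_err


X_data = [(0.0,), (1.0,), (2.0,), (3.0,)]
X_new = [(1.5,), (10.0,)]  # one point inside the data, one far away
log_den, log_den_err = knn_log_density_at(X_new, X_data, k=2)
print(log_den)
```

The point far from the data receives a much lower log-density than the point in the middle of it.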
- return_interpolated_density_kstarNN(X_new, Dthr=23.92812698)[source]
Return the kstarNN density of the primary dataset, evaluated on a new set of points “X_new”.
- Parameters:
X_new (np.ndarray(float)) – The points onto which the density should be computed
Dthr (float) – Likelihood ratio parameter used to compute the optimal k
- Returns:
log_den (np.ndarray(float)) – log density of dataset evaluated on X_new
log_den_err (np.ndarray(float)) – error on log density estimates
- set_kstar(k=0)[source]
Set all elements of kstar to a fixed value k.
Overrides the set_kstar method of the parent class. It first calls the parent's set_kstar, then resets to None all other DensityEstimation attributes that depend on kstar.
- Parameters:
k – number of neighbours used to compute the density. It can be an integer or an array of integers
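The override pattern described for set_kstar can be sketched with two small stand-in classes. KStarSketch and DensityEstimationSketch are hypothetical names, and the attribute handling is an assumption made for illustration; the real classes hold more state and may validate k differently.

```python
class KStarSketch:
    """Hypothetical stand-in for the parent class; not the library's code."""

    def __init__(self, n_points):
        self.N = n_points
        self.kstar = None

    def set_kstar(self, k=0):
        # Accept either a single integer or one integer per point.
        if isinstance(k, int):
            self.kstar = [k] * self.N
        else:
            k = list(k)
            if len(k) != self.N:
                raise ValueError("k must have one entry per point")
            self.kstar = k


class DensityEstimationSketch(KStarSketch):
    """Hypothetical stand-in for the subclass that stores density results."""

    def __init__(self, n_points):
        super().__init__(n_points)
        self.log_den = None
        self.log_den_err = None

    def set_kstar(self, k=0):
        # First delegate to the parent class.
        super().set_kstar(k)
        # Results computed with the previous kstar are no longer valid.
        self.log_den = None
        self.log_den_err = None


de = DensityEstimationSketch(n_points=3)
de.log_den = [-1.0, -2.0, -1.5]  # pretend a density was already computed
de.log_den_err = [0.3, 0.3, 0.3]
de.set_kstar(5)
print(de.kstar, de.log_den, de.log_den_err)
```

Resetting the dependent attributes avoids reading a density that was computed with a different kstar.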