The density_estimation module

The density_estimation module contains the DensityEstimation class.

The different algorithms of density estimation are implemented as methods of this class.

class density_estimation.DensityEstimation(coordinates=None, distances=None, maxk=None, verbose=False, njobs=2)[source]

Computes the log-density and its error at each point and other properties.

Inherits from class IdEstimation. Can estimate the optimal number k* of neighbors for each point. Can compute the log-density and its error at each point, choosing among various kNN-based methods.

kstar

array containing the optimal number of neighbours k* chosen for each of the N points

Type:

np.array(float)

dc

array containing the distance of the k*th neighbor from each of the N points

Type:

np.array(float), optional

log_den

array containing the N log-densities

Type:

np.array(float), optional

log_den_err

array containing the N errors on the log_den

Type:

np.array(float), optional

compute_density_PAk(Dthr=23.92812698, optimized=True, bias=False)[source]

Compute the density of each point using the PAk estimator.

Parameters:

Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.

Returns:
  • log_den (np.ndarray(float)) – estimated log density

  • log_den_err (np.ndarray(float)) – estimated error on log density

compute_density_kNN(k=10, bias=False)[source]

Compute the density of each point using a simple kNN estimator.

Parameters:

k (int) – number of neighbours used to compute the density

Returns:
  • log_den (np.ndarray(float)) – estimated log density

  • log_den_err (np.ndarray(float)) – estimated error on log density
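As a rough illustration of what a simple kNN estimator computes, the sketch below uses the textbook formula log rho_i = log k - log N - log(omega_d * r_k^d), where r_k is the distance to the k-th neighbour and omega_d is the volume of the unit d-ball, with 1/sqrt(k) as the error. It is not the library's implementation: the function name knn_log_density is hypothetical, the intrinsic dimension is assumed to equal the number of columns of the data, and the normalisation and bias handling used by compute_density_kNN may differ.

```python
import math
import numpy as np


def knn_log_density(X, k):
    """Textbook kNN log-density at each row of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape  # assumption: intrinsic dimension == number of columns

    # Full pairwise Euclidean distance matrix (O(N^2) memory, fine for a sketch).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th neighbour.
    r_k = np.sort(dist, axis=1)[:, k]

    # log-volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)

    log_den = math.log(k) - math.log(n) - log_unit_ball - d * np.log(r_k)
    log_den_err = np.full(n, 1.0 / math.sqrt(k))
    return log_den, log_den_err


# Usage on a regular 1D grid, where the density is close to 1 away from the edges.
X_grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
log_den, log_den_err = knn_log_density(X_grid, k=10)
```

The brute-force distance matrix keeps the sketch short; for large datasets a neighbour search limited to maxk neighbours would be the more practical choice.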

compute_density_kpeaks(Dthr=23.92812698)[source]

Compute the density of each point as proportional to the optimal k value found for that point.

This method is mostly useful for the kpeaks clustering algorithm.

Parameters:

Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.

Returns:
  • log_den (np.ndarray(float)) – estimated log density

  • log_den_err (np.ndarray(float)) – estimated error on log density

compute_density_kstarNN(Dthr=23.92812698, bias=False)[source]

Compute the density of each point using a simple kNN estimator with an optimal choice of k.

Parameters:

Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.

Returns:
  • log_den (np.ndarray(float)) – estimated log density

  • log_den_err (np.ndarray(float)) – estimated error on log density
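The only change relative to a fixed-k estimator is that each point uses its own neighbourhood size. A minimal sketch of that idea follows, assuming the per-point values k* are already available as an integer array (here they are supplied by hand rather than derived from Dthr). The function name kstar_nn_log_density is hypothetical, the intrinsic dimension is assumed to equal the number of columns, and the exact formula used by compute_density_kstarNN may differ.

```python
import math
import numpy as np


def kstar_nn_log_density(X, kstar):
    """kNN log-density with a per-point neighbourhood size (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    kstar = np.asarray(kstar, dtype=int)
    n, d = X.shape  # assumption: intrinsic dimension == number of columns

    diff = X[:, None, :] - X[None, :, :]
    sorted_dist = np.sort(np.sqrt((diff ** 2).sum(axis=-1)), axis=1)

    # Row i, column kstar[i]: distance from point i to its kstar[i]-th neighbour
    # (column 0 is the point itself).
    r_kstar = sorted_dist[np.arange(n), kstar]

    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_den = np.log(kstar) - math.log(n) - log_unit_ball - d * np.log(r_kstar)
    log_den_err = 1.0 / np.sqrt(kstar)
    return log_den, log_den_err


# Usage: same k for every point except one, which uses a smaller neighbourhood.
X_grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
kstar_demo = np.full(101, 10)
kstar_demo[50] = 4
log_den_ks, log_den_err_ks = kstar_nn_log_density(X_grid, kstar_demo)
```

Points with a smaller k* get a more local but noisier estimate, which is reflected in the larger 1/sqrt(k*) error.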

compute_kstar(Dthr=23.92812698)[source]

Compute an optimal choice of k for each point.

Parameters:

Dthr (float) – Likelihood ratio parameter used to compute the optimal k. The default value Dthr=23.92 corresponds to a p-value of 1e-6.
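The stated correspondence between Dthr and the p-value can be checked numerically. The sketch below assumes that the likelihood-ratio statistic is compared against a chi-squared distribution with one degree of freedom; that assumption is not stated in this reference, but it reproduces the quoted numbers. The helper name chi2_sf_1dof is hypothetical.

```python
import math


def chi2_sf_1dof(x):
    """Survival function P(X > x) of a chi-squared variable with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-squared with 1 dof, so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


DTHR_DEFAULT = 23.92812698
p_value = chi2_sf_1dof(DTHR_DEFAULT)
print(f"Dthr = {DTHR_DEFAULT} -> p-value = {p_value:.3e}")
```

Under the same assumption, a looser threshold (smaller Dthr) corresponds to a larger p-value.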

return_entropy()[source]

Compute a very rough estimate of the entropy of the data distribution.

The computation simply returns the average of the negative log-density estimates.

Returns:

H (float) – the estimated entropy of the distribution
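The description above amounts to a resubstitution estimate, H ≈ -(1/N) Σ log rho_i. A minimal sketch, assuming log_den is an array of per-point log-density estimates; the function name rough_entropy is hypothetical and any extra handling done by return_entropy is not reproduced here.

```python
import numpy as np


def rough_entropy(log_den):
    """Average negative log-density over the sample (hypothetical helper)."""
    return float(-np.mean(np.asarray(log_den, dtype=float)))


# Usage with made-up log-density values.
H = rough_entropy([-1.0, -2.0, -3.0])
```

Because the same points are used both to estimate the density and to average it, the result inherits any bias of the underlying density estimator, which is why the estimate is described as very rough.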

return_interpolated_density_PAk(X_new, Dthr=23.92812698)[source]

Return the PAk density of the primary dataset, evaluated on a new set of points “X_new”.

Parameters:
  • X_new (np.ndarray(float)) – The points at which the density should be evaluated

  • Dthr (float) – Likelihood ratio parameter used to compute the optimal k

Returns:
  • log_den (np.ndarray(float)) – log density of dataset evaluated on X_new

  • log_den_err (np.ndarray(float)) – error on log density estimates

return_interpolated_density_kNN(X_new, k)[source]

Return the kNN density of the primary dataset, evaluated on a new set of points “X_new”.

Parameters:
  • X_new (np.ndarray(float)) – The points at which the density should be evaluated

  • k (int) – the number of neighbours considered for the kNN estimator

Returns:
  • log_den (np.ndarray(float)) – log density of dataset evaluated on X_new

  • log_den_err (np.ndarray(float)) – error on log density estimates
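Evaluating a kNN density at new points differs from the in-sample case in one detail: a query point is not part of the dataset, so there is no zero self-distance to skip. The sketch below illustrates this with the textbook kNN formula. It is not the library's implementation: the function name knn_log_density_at is hypothetical, the intrinsic dimension is assumed to equal the number of columns, and return_interpolated_density_kNN may use a different normalisation.

```python
import math
import numpy as np


def knn_log_density_at(X, X_new, k):
    """Textbook kNN log-density of dataset X evaluated at X_new (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    n, d = X.shape  # assumption: intrinsic dimension == number of columns

    # Distances from every query point to every dataset point.
    diff = X_new[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Query points are not in X, so the k-th neighbour sits at index k - 1.
    r_k = np.sort(dist, axis=1)[:, k - 1]

    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_den = math.log(k) - math.log(n) - log_unit_ball - d * np.log(r_k)
    log_den_err = np.full(X_new.shape[0], 1.0 / math.sqrt(k))
    return log_den, log_den_err


# Usage: dataset on a regular 1D grid, one query point between two grid nodes.
X_grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
X_query = np.array([[0.505]])
log_den_new, log_den_err_new = knn_log_density_at(X_grid, X_query, k=10)
```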

return_interpolated_density_kstarNN(X_new, Dthr=23.92812698)[source]

Return the kstarNN density of the primary dataset, evaluated on a new set of points “X_new”.

Parameters:
  • X_new (np.ndarray(float)) – The points at which the density should be evaluated

  • Dthr (float) – Likelihood ratio parameter used to compute the optimal k

Returns:
  • log_den (np.ndarray(float)) – log density of dataset evaluated on X_new

  • log_den_err (np.ndarray(float)) – error on log density estimates

set_kstar(k=0)[source]

Set all elements of kstar to a fixed value k.

Resets all other class attributes that depend on kstar.

Parameters:

k – number of neighbours used to compute the density. It can be an integer or an array of integers.