The density_advanced module

The density_advanced module contains the DensityAdvanced class.

Several algorithms to estimate the log-density, the log-density gradient and the log-density differences are implemented as methods of this class. Unlike the methods implemented in the DensityEstimation class, the methods of the DensityAdvanced class rely on the sparse neighbourhood graph structure implemented in the NeighGraph class.

class density_advanced.DensityAdvanced(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)

Computes the log-density gradient and its covariance at each point and other log-density-related properties.

Can return an estimate of the gradient of the log-density at each point and an estimate of the error on each component. Can return an estimate of the log-density differences and their errors at each point, based on the gradient estimates. Can compute the log-density and its error at each point using BMTI, i.e. by integrating the log-density differences on the neighbourhood graph.

grads

size N x dims. For each line i contains the gradient components estimated at point i

Type:

np.ndarray(float), optional

grads_var

size N x dims. For each line i contains the estimated variance of the gradient components at point i

Type:

np.ndarray(float), optional

grads_covmat

size N x dims x dims. For each line i contains the estimated covariance matrix of the gradient components at point i

Type:

np.ndarray(float), optional

pearson_array

size nspar. At position p corresponding to the directed edge (i,j) of the neighbourhood graph, it contains an estimate of the Pearson correlation coefficient between the directed deltaFij computed with the gradients in i and in j, namely between dot(g_i,(x_j-x_i)) and dot(g_j,(x_j-x_i)).

Type:

np.ndarray(float), optional

Fij_array

size nspar. Stores, for each pair in nind_list, the estimate of deltaF_ij computed from point i as the semisum of the gradients in i and in j, projected onto the displacement x_j - x_i

Type:

list(np.array(float)), optional

Fij_var_array

size nspar. Stores, for each pair in nind_list, the estimate of the squared error on the corresponding value in Fij_array

Type:

np.array(float), optional

inv_deltaFs_cov

size nspar. Stores, for each pair in nind_list, the estimate of the (diagonal) inverse of the cross-covariance of the deltaFs, cov[deltaFij, deltaFlm].

Type:

np.array(float), optional
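The per-edge attributes above (pearson_array, Fij_array, Fij_var_array, inv_deltaFs_cov) are arrays of length nspar whose p-th entry refers to the p-th directed edge (i,j) of the neighbourhood graph. As a minimal sketch of that layout, the hypothetical helper directed_knn_edges below builds such an edge list with scipy.spatial.cKDTree. It assumes a fixed k for every point instead of the per-point kstar and no duplicate points, and it is not the NeighGraph implementation.

```python
import numpy as np
from scipy.spatial import cKDTree


def directed_knn_edges(X, k):
    """Directed edges (i, j) where j is among the k nearest neighbours of i.

    Row p of the returned (N*k, 2) array is the edge that the p-th entry of
    any per-edge array (e.g. a Fij-like array) would refer to.
    """
    X = np.asarray(X, dtype=float)
    # k + 1 because each point is returned as its own nearest neighbour
    _, idx = cKDTree(X).query(X, k=k + 1)
    i = np.repeat(np.arange(len(X)), k)   # source point of each edge
    j = idx[:, 1:].ravel()                # drop self, keep the k neighbours
    return np.column_stack([i, j])
```

Any per-edge quantity, such as the displacements X[edges[:, 1]] - X[edges[:, 0]], can then be stored in an array aligned with this edge list.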

compute_deltaFs(similarity_method='jaccard', comp_p_mat=False)
Compute the deviations deltaFij (with respect to the standard kNN log-densities) at point j as seen from point i, using a linear expansion whose slope is the semisum of the average gradients of the log-density over the neighbourhoods of points i and j.

If the Pearson coefficients p (see docs for pearson_array) are not yet defined, compute them by running compute_pearson. Then use these p to estimate the variance of each deltaFij as 1/4*(E_i^2 + E_j^2 + 2*E_i*E_j*p), where E_i is the error on the estimate of grad_i*DeltaX_ij (see [Carli2024]). The log-density differences are stored in Fij_array and their variances in Fij_var_array.

Parameters:
  • similarity_method – see docs for neigh_graph.compute_neigh_similarity_index function

  • comp_p_mat – see docs for compute_pearson function
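The estimate and its variance can be written out for a single pair (i, j). The sketch below is a minimal illustration, not the library code: it assumes the gradients g_i and g_j, the errors E_i and E_j on g_i·(x_j-x_i) and g_j·(x_j-x_i), and the Pearson coefficient p_ij are already available. The function name delta_f_and_var is hypothetical.

```python
import numpy as np


def delta_f_and_var(x_i, x_j, g_i, g_j, E_i, E_j, p_ij):
    """Sketch of deltaF_ij and its variance for one edge (i, j).

    deltaF_ij = 1/2 * (g_i + g_j) . (x_j - x_i)
    var       = 1/4 * (E_i**2 + E_j**2 + 2 * E_i * E_j * p_ij)
    """
    dx = np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float)
    g_sum = np.asarray(g_i, dtype=float) + np.asarray(g_j, dtype=float)
    dF = 0.5 * float(np.dot(g_sum, dx))   # semisum of the two projections
    var = 0.25 * (E_i**2 + E_j**2 + 2.0 * E_i * E_j * p_ij)
    return dF, var
```

With p_ij = 0 the two projections are treated as independent; with p_ij = 1 their errors add linearly, which gives the largest variance.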

compute_density_BMTI(delta_F_inv_cov='uncorr', comp_log_den_err=False, solver='sp_direct', sp_direct_perm_spec='NATURAL', alpha=1, log_den=None, log_den_err=None)

Compute the log-density for each point using BMTI.

If alpha < 1, the algorithm also includes a regularisation term. The log-density of the regulariser and its errors can be passed as arguments (log_den and log_den_err). If either of them is not specified, the kstarNN estimator is used as the regulariser.

Parameters:
  • delta_F_inv_cov (str) –

    specify the method used to invert the cross-covariance matrix C of the log-density deviations cov[deltaF_ij, deltaF_kl]. Currently implemented methods:

    'uncorr' (default): all the deltaFs are assumed to be uncorrelated, i.e. C is assumed to be diagonal, with diagonal = Fij_var_array.

    'identity': C is assumed to be the identity matrix, so that all terms in the BMTI likelihood are unweighted (variance of deltaF_ij = 1 for all (i,j) pairs).

    'LSDI' (Least Squares with respect to a Diagonal Inverse): C is inverted by finding the approximate diagonal inverse which, multiplied by C, gives the matrix closest to the identity in the least-squares (Frobenius norm) sense.

  • comp_log_den_err (bool) – if True, compute the error on the BMTI estimates. This can be very time-consuming.

  • solver (str) –

    specify the solver to use when solving the BMTI linear system. Three sparse (memory-efficient) solvers and one dense solver are implemented:

    'sp_direct' (default): scipy.sparse.linalg.spsolve. Performs an LU decomposition of the matrix and then solves the linear system directly. More robust but less memory-efficient than the other implemented sparse solvers, and slower than the iterative solvers for very large and sparse matrices.

    'sp_cg': scipy.sparse.linalg.cg, the iterative conjugate gradient method. It might be preferred to 'sp_direct' for large and sparse matrices. If a log-density estimate is already stored in self.log_den, it is used as the initial guess for the solution, which can give a large speedup. If this option is chosen, we suggest calling compute_density_kstarNN() right before computing BMTI.

    'sp_cg_precond': same as 'sp_cg' (scipy.sparse.linalg.cg), but with a preconditioner estimated automatically from an incomplete LU decomposition of the matrix (scipy.sparse.linalg.spilu). In settings where 'sp_direct' performs better than 'sp_cg', 'sp_cg_precond' is likely to perform better than both 'sp_direct' and 'sp_cg'. If 'sp_cg' already performs better than 'sp_direct', 'sp_cg_precond' is likely to perform worse than 'sp_cg' alone.

    'dense': numpy.linalg.solve. Direct solver for dense matrices, with O(N^3) time and O(N^2) memory complexity. It can exploit multiple cores if the underlying linear-algebra libraries support it. This option is suited to small datasets, or to settings where memory and cores are not an issue.

  • sp_direct_perm_spec (str) – specify the permutation strategy to use when solving the linear system with the ‘sp_direct’ solver. See the scipy.sparse.linalg.spsolve documentation for more information.

  • alpha (float) – can take values from 0.0 to 1.0. Indicates the weight of BMTI in the combined likelihood alpha*L_BMTI + (1-alpha)*L_kstarNN. Setting alpha=1.0 corresponds to not regularising BMTI.

  • log_den (np.ndarray(float)) – size N. The array of the log-densities of the regulariser.

  • log_den_err (np.ndarray(float)) – size N. The array of the log-density errors of the regulariser.
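To illustrate the linear system behind the 'uncorr' option, here is a minimal sketch (the function bmti_sketch is hypothetical, not the DADApy implementation). It minimises alpha times the weighted squared residuals of F_j - F_i - deltaF_ij, plus (1-alpha) times a Gaussian regulariser, and solves the resulting normal equations with scipy.sparse.linalg.spsolve or scipy.sparse.linalg.cg. For alpha = 1 the log-density is only defined up to an additive constant, so the sketch pins the first point to 0. That gauge choice, and the assumption of a connected graph, belong to the sketch.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, spsolve


def bmti_sketch(n, edges, dF, dF_var, alpha=1.0, log_den=None,
                log_den_err=None, solver="sp_direct", x0=None):
    """Minimise over the log-densities F (length n):

        alpha * sum_e (F_j - F_i - dF_e)**2 / dF_var_e
      + (1 - alpha) * sum_i (F_i - log_den_i)**2 / log_den_err_i**2
    """
    edges = np.asarray(edges, dtype=int)
    i, j = edges[:, 0], edges[:, 1]
    w = alpha / np.asarray(dF_var, dtype=float)      # weight of each edge
    wd = w * np.asarray(dF, dtype=float)

    # Weighted graph Laplacian from the edge list (tocsr sums duplicates)
    rows = np.concatenate([i, j, i, j])
    cols = np.concatenate([i, j, j, i])
    vals = np.concatenate([w, w, -w, -w])
    A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

    # Right-hand side: each edge adds w*dF to b_j and subtracts it from b_i
    b = np.zeros(n)
    np.add.at(b, j, wd)
    np.add.at(b, i, -wd)

    if alpha < 1.0:
        if log_den is None or log_den_err is None:
            raise ValueError("alpha < 1 requires log_den and log_den_err")
        reg = (1.0 - alpha) / np.asarray(log_den_err, dtype=float) ** 2
        A = A + sp.diags(reg)
        b = b + reg * np.asarray(log_den, dtype=float)
    else:
        # Pure BMTI fixes F only up to a constant: pin F_0 = 0 (assumption)
        A = A + sp.coo_matrix(([1.0], ([0], [0])), shape=(n, n))

    A = sp.csc_matrix(A)
    if solver == "sp_direct":
        return spsolve(A, b)
    if solver == "sp_cg":
        # A is symmetric positive definite here; x0 is an optional warm start
        F, info = cg(A, b, x0=x0)
        if info != 0:
            raise RuntimeError(f"cg did not converge (info={info})")
        return F
    raise ValueError(f"unknown solver {solver!r}")
```

For a chain 0→1→2 with deltaFs 1 and 2, the sketch returns log-densities [0, 1, 3]. The 'identity' option corresponds to passing dF_var as all ones.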

compute_diag_inv_deltaFs_cross_covariance_LSDI(similarity_method='jaccard')

Compute the diagonal of the approximate inverse of the deltaFs cross-covariance cov[deltaFij, deltaFlm] using the LSDI approximation (see compute_density_BMTI docs).

Parameters:

similarity_method – see docs for neigh_graph.compute_neigh_similarity_index function
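The LSDI criterion has a closed form for a dense matrix. The rows of diag(d) @ C decouple, and minimising ||d_i C[i, :] - e_i||^2 over d_i gives d_i = C_ii / sum_k C_ik^2. The sketch below (hypothetical function lsdi_diagonal, dense NumPy, not the sparse computation performed by this method) just evaluates that formula. For a symmetric C, the same d is obtained from C @ diag(d).

```python
import numpy as np


def lsdi_diagonal(C):
    """Diagonal d minimising the Frobenius norm ||diag(d) @ C - I||_F.

    Each row decouples: d_i = C_ii / sum_k C_ik**2.
    """
    C = np.asarray(C, dtype=float)
    return np.diag(C) / np.sum(C**2, axis=1)
```

When C is already diagonal, the formula reduces to the exact inverse 1/C_ii, i.e. the 'uncorr' weighting.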

compute_grads(comp_covmat=False)

Compute the gradient of the log-density at each point using the kstar nearest neighbours.

Estimate the gradient using an improved version of the mean-shift gradient algorithm [Fukunaga1975] as presented in [Carli2024]. The computed gradients are stored in grads. The variance of the gradient components is also computed and stored in grads_var. Optionally, the whole covariance matrix of each gradient can be estimated.

Parameters:
  • comp_covmat (bool) – if True, the whole covariance matrix is computed for each gradient and stored in grads_covmat.
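For orientation, the plain mean-shift idea behind this estimator can be sketched in a few lines. This is the basic uniform-kernel version of [Fukunaga1975], not the improved estimator of [Carli2024]. It assumes a fixed number k of neighbours for every point (instead of kstar), assumes there are no duplicate points, and computes no variances. It relies on the leading-order relation mean(x_j - x_i) ≈ r^2/(d+2) * grad log f(x_i) for a ball of radius r in d dimensions. The function name is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def mean_shift_log_density_grad(X, k):
    """Basic mean-shift estimate of grad log-density at every point of X.

    Uniform kernel over the k nearest neighbours: the ball radius r is the
    distance to the k-th neighbour, and grad log f ~ (d + 2) / r**2 * shift.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # k + 1 because each point is returned as its own nearest neighbour
    dist, idx = cKDTree(X).query(X, k=k + 1)
    neigh = idx[:, 1:]                     # drop the point itself
    r2 = dist[:, -1] ** 2                  # squared radius of the kNN ball
    shift = X[neigh].mean(axis=1) - X      # mean displacement to neighbours
    return (d + 2) / r2[:, None] * shift
```

On a symmetric neighbourhood the mean shift vanishes, so the estimated gradient is zero. For Gaussian data, points in the tails get gradients pointing back towards the mode.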

compute_pearson(similarity_method='jaccard')

Compute, for each pair (i,j) of points connected on the directed neighbourhood graph, an estimate of the Pearson correlation coefficient between the directed deltaFij computed with the gradients in i and in j, namely between dot(g_i,(x_j-x_i)) and dot(g_j,(x_j-x_i)). These coefficients are needed to compute the errors on the deltaFs. They are estimated as the neighbourhood similarity index (see documentation for compute_neigh_similarity_index) times the sign of the product of the two directed deltaFij. The Pearson coefficients take values between -1 and 1 and are stored in the pearson_array attribute.

Parameters:

similarity_method (str) – similarity_method to compute the neighbourhood similarity index (see documentation for compute_neigh_similarity_index).
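The estimate for a single directed edge can be sketched as follows. The sketch uses the Jaccard index of two neighbour-index sets as the similarity index. Which neighbourhoods the library actually compares is not specified here, so treat that choice as an assumption. The helper names jaccard and pearson_estimate are hypothetical.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two neighbour-index sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def pearson_estimate(x_i, x_j, g_i, g_j, neigh_i, neigh_j):
    """Similarity of the two neighbourhoods times the sign of the product of
    the directed projections g_i.(x_j - x_i) and g_j.(x_j - x_i).
    The result lies in [-1, 1]."""
    dx = np.asarray(x_j, dtype=float) - np.asarray(x_i, dtype=float)
    a_i = float(np.dot(g_i, dx))
    a_j = float(np.dot(g_j, dx))
    return jaccard(neigh_i, neigh_j) * np.sign(a_i * a_j)
```

Identical neighbourhoods with agreeing projections give p = 1. Partially overlapping neighbourhoods shrink |p|, and opposite projections flip its sign.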

set_kstar(k=0)

Set all elements of kstar to a fixed value k.

Overrides the set_kstar method of the parent classes: it first calls the parent set_kstar, then resets to None all other DensityAdvanced attributes that depend on kstar.

Parameters:

k – number of neighbours used to compute the density. It can be an integer or an array of integers.
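The override pattern described above (call the parent method, then invalidate everything derived from kstar) can be sketched with two hypothetical classes. BaseDensity and AdvancedDensitySketch are stand-ins, not the actual class hierarchy, and only two of the cached attributes are shown.

```python
import numpy as np


class BaseDensity:
    """Hypothetical stand-in for the parent classes."""

    def __init__(self, n_points):
        self.N = n_points
        self.kstar = None

    def set_kstar(self, k=0):
        # Accept either a single integer or one value per point
        if np.isscalar(k):
            self.kstar = np.full(self.N, int(k), dtype=int)
        else:
            k = np.asarray(k, dtype=int)
            if k.shape != (self.N,):
                raise ValueError("k must be an integer or an array of length N")
            self.kstar = k.copy()


class AdvancedDensitySketch(BaseDensity):
    """Hypothetical subclass that caches kstar-dependent results."""

    def __init__(self, n_points):
        super().__init__(n_points)
        self.grads = None
        self.Fij_array = None

    def set_kstar(self, k=0):
        super().set_kstar(k)
        # Every cached quantity derived from kstar is now stale
        self.grads = None
        self.Fij_array = None
```

Resetting the cached arrays to None forces the next computation to rebuild them with the new kstar, rather than silently reusing values computed with the old neighbourhoods.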