The density_advanced module
The density_advanced module contains the DensityAdvanced class.
Different algorithms to estimate the log-density, the log-density gradients and the log-density differences are implemented as methods of this class. In particular, unlike the methods implemented in the DensityEstimation class, the methods in the DensityAdvanced class are based on the sparse neighbourhood graph structure implemented in the NeighGraph class.
- class density_advanced.DensityAdvanced(coordinates=None, distances=None, maxk=None, verbose=False, n_jobs=2)
Computes the log-density gradient and its covariance at each point and other log-density-related properties.
Can return an estimate of the gradient of the log-density at each point and an estimate of the error on each component. Can return an estimate of the log-density differences and of their errors at each point, based on the gradient estimates. Can compute the log-density and its error at each point using BMTI, i.e. by integrating the log-density differences on the neighbourhood graph.
- grads
size N x dims. Contains the gradient components estimated at each point i
- Type:
np.ndarray(float), optional
- grads_var
size N x dims. For each line i contains the estimated variance of the gradient components at point i
- Type:
np.ndarray(float), optional
- grads_covmat
size N x dims x dims. For each line i contains the estimated covariance matrix of the gradient components at point i
- Type:
np.ndarray(float), optional
- pearson_array
size nspar. At position p corresponding to the directed edge (i,j) of the neighbourhood graph, it contains an estimate of the Pearson correlation coefficient between the directed deltaFij computed with the gradients in i and in j, namely between dot(g_i,(x_j-x_i)) and dot(g_j,(x_j-x_i)).
- Type:
np.ndarray(float), optional
- Fij_array
size nspar. Stores, for each couple in nind_list, the estimate of deltaF_ij computed from point i using the semisum of the gradients in i and in j
- Type:
list(np.array(float)), optional
- Fij_var_array
size nspar. Stores for each couple in nind_list the estimates of the squared errors on the values in Fij_array
- Type:
np.array(float), optional
- inv_deltaFs_cov
size nspar. Stores, for each couple in nind_list, the estimate of the inverse cross-covariance of the deltaFs, i.e. of the inverse of cov[deltaFij, deltaFlm].
- Type:
np.array(float), optional
- compute_deltaFs(similarity_method='jaccard', comp_p_mat=False)
Compute the deviations deltaFij from the standard kNN log-densities at point j as seen from point i, using a linear expansion whose slope is the semisum of the average gradients of the log-density over the neighbourhoods of points i and j.
If the Pearson coefficients p (see docs for pearson_array) are not yet defined, compute them by running compute_pearson. Then use these p to estimate the variances of the deltaFij as 1/4*(E_i^2+E_j^2+2*E_i*E_j*p), where E_i is the error on the estimate of grad_i*DeltaX_ij (see [Carli2024]). The log-density differences are stored in Fij_array, their variances in Fij_var_array.
- Parameters:
similarity_method – see docs for neigh_graph.compute_neigh_similarity_index function
comp_p_mat – see docs for compute_pearson function
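The variance formula above can be illustrated for a single directed edge (i, j). This is a minimal pure-Python sketch, not the library's implementation: the function name and argument names are hypothetical, and p denotes the Pearson coefficient of the edge.

```python
def delta_f_and_variance(grad_i, grad_j, x_i, x_j, err_i, err_j, p):
    """Return (deltaF_ij, variance) for one directed edge (i, j).

    Hypothetical helper: err_i and err_j are the errors on the two
    directed estimates, p is their Pearson correlation coefficient.
    """
    # Displacement from point i to point j.
    dx = [b - a for a, b in zip(x_i, x_j)]
    # Directed estimates dot(g_i, x_j - x_i) and dot(g_j, x_j - x_i).
    from_i = sum(g * d for g, d in zip(grad_i, dx))
    from_j = sum(g * d for g, d in zip(grad_j, dx))
    # Semisum of the two directed estimates.
    delta_f = 0.5 * (from_i + from_j)
    # Variance of a semisum of two correlated estimates.
    variance = 0.25 * (err_i**2 + err_j**2 + 2.0 * err_i * err_j * p)
    return delta_f, variance


args = ([1.0, 0.0], [3.0, 0.0], [0.0, 0.0], [2.0, 1.0], 1.0, 1.0)
delta_f, var_corr = delta_f_and_variance(*args, p=1.0)
_, var_uncorr = delta_f_and_variance(*args, p=0.0)
_, var_anticorr = delta_f_and_variance(*args, p=-1.0)
```

With equal errors, fully correlated estimates (p=1) leave the variance unchanged, uncorrelated estimates (p=0) halve it, and fully anticorrelated estimates (p=-1) cancel it.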
- compute_density_BMTI(delta_F_inv_cov='uncorr', comp_log_den_err=False, solver='sp_direct', sp_direct_perm_spec='NATURAL', alpha=1, log_den=None, log_den_err=None)
Compute the log-density for each point using BMTI.
If alpha<1, the algorithm also includes a regularisation. The regulariser log-density and its errors can be passed as arguments: log_den and log_den_err. If either of the two is not specified, the kstarNN estimator is used as the regulariser.
- Parameters:
delta_F_inv_cov (str) –
specify the method used to invert the cross-covariance matrix C of the log-density deviations, cov[deltaF_ij,deltaF_kl]. Currently implemented methods:
- 'uncorr' (default): all the deltaFs are assumed uncorrelated, i.e. C is assumed to be diagonal with diagonal = Fij_var_array
- 'identity': C is assumed to be the identity matrix, so that all terms in the BMTI likelihood are taken unweighted (variance of deltaF_ij = 1 for all (i,j) couples)
- 'LSDI' (Least Squares with respect to a Diagonal Inverse): invert the cross-covariance C by finding the approximate diagonal inverse which, multiplied by C, gives the matrix closest to the identity in the Frobenius norm (least-squares sense)
comp_log_den_err (bool) – if True, compute the error on the BMTI estimates. Can be highly time consuming
solver (str) –
specify the solver to use when solving the BMTI linear system. Three sparse (memory-efficient) solvers and one dense solver are implemented:
- 'sp_direct' (default): scipy.sparse.linalg.spsolve. Performs an LU decomposition of the matrix and then solves the linear system directly. More robust but less memory efficient than the other implemented sparse solvers. Slower than iterative solvers for very large and sparse matrices.
- 'sp_cg': scipy.sparse.linalg.cg. This is the iterative conjugate gradient method. It might be preferred to 'sp_direct' for large and sparse matrices. If a log-density estimate is already stored in self.log_den, it is used as the initial guess for the solution, which gives a large speedup. If this option is chosen, we suggest calling compute_density_kstarNN() right before computing BMTI.
- 'sp_cg_precond': same as 'sp_cg' (scipy.sparse.linalg.cg), but with a preconditioner estimated in an unsupervised way with an incomplete LU decomposition (scipy.sparse.linalg.spilu) of the matrix. In settings where 'sp_direct' performs better than 'sp_cg', 'sp_cg_precond' is likely to perform better than both 'sp_direct' and 'sp_cg'. If 'sp_cg' already performs better than 'sp_direct', 'sp_cg_precond' is likely to perform worse than 'sp_cg' alone.
- 'dense': numpy.linalg.solve. Direct solver for dense matrices. O(N^3) time complexity, O(N^2) memory complexity. The solver automatically uses multiprocessing if available. This option is suited to small datasets or to cases where memory and cores are not an issue.
sp_direct_perm_spec (str) – specify the permutation strategy to use when solving the linear system with the ‘sp_direct’ solver. See the scipy.sparse.linalg.spsolve documentation for more information.
alpha (float) – can take values from 0.0 to 1.0. Indicates the weight of BMTI in the sum of the likelihoods alpha*L_BMTI + (1-alpha)*L_kstarNN. Setting alpha=1.0 corresponds to not regularising BMTI.
log_den (np.ndarray(float)) – size N. The array of the log-densities of the regulariser.
log_den_err (np.ndarray(float)) – size N. The array of the log-density errors of the regulariser.
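To make the roles of alpha, the regulariser and the iterative solver concrete, here is a toy sketch on a three-point graph. It assumes a quadratic objective, alpha * sum over edges of (F_j - F_i - deltaF_ij)^2 / var_ij plus (1 - alpha) * sum over points of (F_i - F_reg_i)^2 / err_i^2, with uncorrelated deltaFs (the 'uncorr' setting). That form, the helper names and the pure-Python conjugate gradient are illustrative assumptions, not the library's code, which relies on SciPy and NumPy solvers.

```python
def bmti_system(n, edges, alpha, reg_log_den, reg_log_den_err):
    """Build the normal equations A F = b of the assumed quadratic objective.

    edges holds tuples (i, j, deltaF_ij, var_ij), where deltaF_ij
    estimates F_j - F_i.
    """
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, delta_f, var in edges:
        w = alpha / var  # weight of this log-density difference
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
        b[i] -= w * delta_f
        b[j] += w * delta_f
    for i in range(n):
        w = (1.0 - alpha) / reg_log_den_err[i] ** 2  # regulariser weight
        A[i][i] += w
        b[i] += w * reg_log_den[i]
    return A, b


def conjugate_gradient(A, b, x0, tol=1e-18, max_iter=200):
    """Plain conjugate gradient for a symmetric positive-definite system."""
    n = len(b)
    x = list(x0)
    r = [b[i] - sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]
    p = list(r)
    rs = sum(v * v for v in r)  # squared residual norm
    for _ in range(max_iter):
        if rs < tol:
            break
        Ap = [sum(A[i][k] * p[k] for k in range(n)) for i in range(n)]
        step = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + step * p[i] for i in range(n)]
        r = [r[i] - step * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x


# Consistent differences: F_1 - F_0 = 1, F_2 - F_1 = 2, F_2 - F_0 = 3.
edges = [(0, 1, 1.0, 1.0), (1, 2, 2.0, 1.0), (0, 2, 3.0, 1.0)]
reg_log_den = [5.0, 6.0, 8.0]  # regulariser estimate, used as initial guess too
reg_log_den_err = [1.0, 1.0, 1.0]

A, b = bmti_system(3, edges, 0.5, reg_log_den, reg_log_den_err)
log_den = conjugate_gradient(A, b, x0=reg_log_den)
```

In this toy case the differences only fix the log-density up to an additive constant, and the regulariser term fixes that constant. Because the initial guess already solves the system, the iteration stops immediately, which is the effect the 'sp_cg' option exploits when a previous estimate is available.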
- compute_diag_inv_deltaFs_cross_covariance_LSDI(similarity_method='jaccard')
Compute the diagonal of the approximate inverse of the deltaFs cross-covariance cov[deltaFij,deltaFlm] using the LSDI approximation (see compute_density_BMTI docs).
- Parameters:
similarity_method – see docs for neigh_graph.compute_neigh_similarity_index function
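The least-squares diagonal inverse has a closed form that can be sketched on a small dense matrix. Minimising the Frobenius norm of D*C - I over diagonal D gives d_k = C_kk / sum_m C_km^2 for each row k. The function below is a hypothetical illustration of that formula on nested lists; how the library builds and stores the sparse cross-covariance is not shown here.

```python
def lsdi_diagonal_inverse(C):
    """Diagonal D minimising ||D C - I||_F for a square matrix C.

    Each diagonal entry is independent: d_k scales row k of C so that
    it is as close as possible to row k of the identity.
    """
    n = len(C)
    return [C[k][k] / sum(C[k][m] ** 2 for m in range(n)) for k in range(n)]


# For a diagonal C the result is the exact inverse.
d_diag = lsdi_diagonal_inverse([[2.0, 0.0], [0.0, 4.0]])

# Off-diagonal covariance shrinks the weights below 1 / C_kk.
d_corr = lsdi_diagonal_inverse([[2.0, 1.0], [1.0, 2.0]])
```

In the second case each weight is 2/5 = 0.4 instead of 1/2, so correlated deltaFs contribute less to the likelihood than they would under the 'uncorr' assumption.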
- compute_grads(comp_covmat=False)
Compute the gradient of the log-density at each point using the kstar nearest neighbours.
Estimate the gradient using an improved version of the mean-shift gradient algorithm [Fukunaga1975] as presented in [Carli2024]. Store the computed gradients in grads. Also compute the variance of the gradient and store it in grads_var. Optionally, the whole covariance matrix can be estimated for each gradient.
- Parameters:
comp_covmat (bool) – if True, the whole covariance matrix is computed for each gradient and stored in grads_covmat
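For orientation, the classic mean-shift relation that this method builds on can be sketched as follows. This is the textbook uniform-kernel estimator, not the improved version of [Carli2024]: for neighbours inside a ball of radius r in d dimensions, the gradient of the log-density is approximately (d + 2) / r^2 times the mean displacement to the neighbours. The function name is hypothetical, and taking r as the distance to the kstar-th neighbour is an assumption.

```python
def mean_shift_log_density_gradient(x, neighbours, radius):
    """Classic mean-shift estimate of grad log-density at point x.

    neighbours: coordinates of the points inside the ball of the given
    radius centred on x (x itself excluded).
    """
    d = len(x)
    k = len(neighbours)
    # Average displacement from x to its neighbours, component by component.
    mean_shift = [sum(nb[c] - x[c] for nb in neighbours) / k for c in range(d)]
    # Rescale by (d + 2) / r^2, the inverse second moment of a uniform ball.
    return [(d + 2) / radius**2 * m for m in mean_shift]


# Symmetric neighbourhood: no preferred direction, zero gradient.
grad_sym = mean_shift_log_density_gradient(
    [0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]], radius=1.0
)

# More neighbours towards +x and +y: the gradient points that way.
grad_skew = mean_shift_log_density_gradient(
    [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]], radius=2.0
)
```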
- compute_pearson(similarity_method='jaccard')
Compute, for any couple (i,j) of points connected on the directed neighbourhood graph, an estimate of the Pearson correlation coefficient between the directed deltaFij computed with the gradients in i and in j, namely between dot(g_i,(x_j-x_i)) and dot(g_j,(x_j-x_i)). These are needed in order to compute the errors on the deltaFs. They are estimated as the neighbourhood similarity index (see documentation for compute_neigh_similarity_index) times the sign of the product of the two directed deltaFijs. The Pearson coefficients take values between -1 and 1 and are stored in the pearson_array attribute.
- Parameters:
similarity_method (str) – similarity_method to compute the neighbourhood similarity index (see documentation for compute_neigh_similarity_index).
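The estimate described above, similarity index times the sign of the product of the two directed deltaFs, can be sketched for one edge. The Jaccard index over the two neighbour sets is used here as the similarity index; whether the library's compute_neigh_similarity_index defines it in exactly this way is an assumption, and the function names are hypothetical.

```python
def jaccard_index(neigh_i, neigh_j):
    """Size of the intersection over size of the union of two neighbour sets."""
    a, b = set(neigh_i), set(neigh_j)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson_estimate(neigh_i, neigh_j, delta_from_i, delta_from_j):
    """Similarity index times the sign of the product of the directed deltaFs."""
    product = delta_from_i * delta_from_j
    sign = (product > 0) - (product < 0)  # +1, 0 or -1
    return jaccard_index(neigh_i, neigh_j) * sign


# Two shared neighbours out of four distinct ones: similarity 0.5.
p_same_sign = pearson_estimate([1, 2, 3], [2, 3, 4], 0.3, 0.2)
p_opposite_sign = pearson_estimate([1, 2, 3], [2, 3, 4], 0.3, -0.2)
```

Since the similarity index lies between 0 and 1, the resulting coefficient stays between -1 and 1, as stated for pearson_array.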
- set_kstar(k=0)
Set all elements of kstar to a fixed value k.
Override the set_kstar method of the parent classes. First, call set_kstar from the parent classes. Then also reset to None all other DensityAdvanced attributes that depend on kstar.
- Parameters:
k – number of neighbours used to compute the density. It can be an integer or an array of integers
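The override pattern described for set_kstar can be sketched with two stand-in classes. Both class names and the list of kstar-dependent attributes are hypothetical and only illustrate the idea of invalidating cached results once kstar changes.

```python
class BaseEstimator:
    """Hypothetical stand-in for the parent class."""

    def __init__(self, n_points):
        self.n_points = n_points
        self.kstar = None

    def set_kstar(self, k=0):
        # Accept either a single integer or one value per point.
        if isinstance(k, int):
            self.kstar = [k] * self.n_points
        else:
            self.kstar = list(k)


class AdvancedEstimator(BaseEstimator):
    """Hypothetical stand-in for a subclass caching kstar-dependent results."""

    # Assumed list of attributes that become stale when kstar changes.
    _KSTAR_DEPENDENT = ("grads", "grads_var", "Fij_array", "Fij_var_array")

    def __init__(self, n_points):
        super().__init__(n_points)
        for name in self._KSTAR_DEPENDENT:
            setattr(self, name, None)

    def set_kstar(self, k=0):
        # First let the parent class update kstar itself...
        super().set_kstar(k)
        # ...then drop every cached quantity computed with the old kstar.
        for name in self._KSTAR_DEPENDENT:
            setattr(self, name, None)


est = AdvancedEstimator(n_points=3)
est.grads = [[0.1], [0.2], [0.3]]  # pretend gradients were computed
est.set_kstar(5)
```

After the call, est.kstar holds the new value for every point and est.grads is None again, so stale gradients cannot be reused by mistake.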