The diff_imbalance module

The diff_imbalance module contains the DiffImbalance class, implemented with JAX.

The main method intended to be called by the user is ‘train’, which carries out the automatic optimization of the Differentiable Information Imbalance (DII) as a function of the weights of the features in the first distance space. The code can be run on a GPU using the command

jax.config.update('jax_platform_name', 'gpu')  # set 'cpu' or 'gpu'

class diff_imbalance.DiffImbalance(data_A, data_B, periods_A=None, periods_B=None, num_epochs=200, batches_per_epoch=1, seed=0, l1_strength=0.0, point_adapt_lambda=True, k_init=1, k_final=1, lambda_factor=0.1, params_init=None, optimizer_name='sgd', learning_rate=0.01, learning_rate_decay=None, num_points_rows=None)

Carries out the optimization of the DII(A(w)->B) with respect to the weights in the first distance space.
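As a rough illustration of the quantity being optimized, the NumPy sketch below evaluates a DII of the form 2/N^2 * sum_ij c_ij * r_ij^B, where c_ij are softmax coefficients built from the scaled distances in space A and r_ij^B are neighbor ranks in space B. It is a minimal sketch based on our reading of the DII definition, not the class's JAX implementation; the helper names `squared_distances` and `dii` are hypothetical.

```python
import numpy as np

def squared_distances(X, weights=None):
    """Pairwise squared Euclidean distances, with optional per-feature scaling."""
    X = np.asarray(X, dtype=float)
    if weights is not None:
        X = X * np.asarray(weights, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff**2, axis=-1)

def dii(data_A, data_B, weights, lam):
    """Sketch of DII(A(w) -> B); lam is a scalar or an array of shape (n_points,)."""
    n = len(data_A)
    d2_A = squared_distances(data_A, weights)
    d2_B = squared_distances(data_B)
    np.fill_diagonal(d2_A, np.inf)  # a point is never its own neighbor
    np.fill_diagonal(d2_B, np.inf)

    # Ranks in space B: 1 for the nearest neighbor, 2 for the second, ...
    ranks_B = np.argsort(np.argsort(d2_B, axis=1), axis=1) + 1

    # Softmax coefficients c_ij in space A, computed row by row in a stable way.
    lam = np.broadcast_to(np.asarray(lam, dtype=float), (n,))[:, None]
    logits = -d2_A / lam
    logits -= logits.max(axis=1, keepdims=True)
    c = np.exp(logits)
    c /= c.sum(axis=1, keepdims=True)

    return 2.0 / n**2 * np.sum(c * ranks_B)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
B_unrelated = rng.normal(size=(100, 3))
dii_self = dii(A, A, np.ones(3), lam=1e-4)                 # A predicts itself
dii_unrelated = dii(A, B_unrelated, np.ones(3), lam=1e-4)  # A carries no information on B
```

With a very small lambda, the self-prediction value approaches 2/N, while the value for unrelated spaces stays close to 1.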

The class ‘DiffImbalance’ supports two schemes for setting the smoothing parameter lambda, which tunes the size of neighborhoods in space A. In both schemes lambda can be epoch-dependent, i.e. decreased during the training according to a cosine decay between ‘init’ and ‘final’ values. The schemes are:

1. Global adaptive: lambda is the same for all points and is set to a fraction (given by lambda_factor, default is 1/10) of the average squared distance to the k-th neighbor.

Example:
point_adapt_lambda=False, k_init=10, k_final=1, lambda_factor=1/10

2. Point-adaptive: lambda is different for each point. For point i, it is set to a fraction (given by lambda_factor) of the squared distance between i and its k-th neighbor.

Example:
point_adapt_lambda=True, k_init=10, k_final=1, lambda_factor=1/10

As a rule of thumb, we suggest setting k_init and k_final to ~5% of the points in the data set if mini-batches are not employed, or to ~5% of the points within each mini-batch if they are employed.
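The two schemes and the 5% rule of thumb can be sketched in NumPy as follows. This is an illustrative sketch, not the class's internal code: the helper `kth_neighbor_sq_dist` is hypothetical, and the distances are computed here on unscaled features, whereas during training they presumably depend on the current weights.

```python
import numpy as np

def kth_neighbor_sq_dist(data, k):
    """Squared distance between each point and its k-th neighbor (ranks start at 1)."""
    diff = data[:, None, :] - data[None, :, :]
    d2 = np.sum(diff**2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude the point itself
    return np.sort(d2, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
data_A = rng.normal(size=(200, 4))

k = max(1, round(0.05 * len(data_A)))  # rule of thumb: ~5% of the points
lambda_factor = 0.1
d2_k = kth_neighbor_sq_dist(data_A, k)

lambda_global = lambda_factor * d2_k.mean()  # one value for all points (point_adapt_lambda=False)
lambda_points = lambda_factor * d2_k         # one value per point (point_adapt_lambda=True)
```

The global value is simply the average of the point-wise values, so the two schemes differ only in whether the neighborhood size adapts to the local density.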

data_A

feature space A, matrix of shape (n_points, n_features_A).

Type: np.array(float), jnp.array(float)

data_B

feature space B, matrix of shape (n_points, n_features_B).

Type: np.array(float), jnp.array(float)

periods_A

array of shape (n_features_A,), periods of features A. Default is None, which means that features A are treated as nonperiodic. If not all features are periodic, the entries of the nonperiodic ones should be set to 0.

Type: np.array(float), jnp.array(float)

periods_B

array of shape (n_features_B,), periods of features B. Default is None, which means that features B are treated as nonperiodic. If not all features are periodic, the entries of the nonperiodic ones should be set to 0.

Type: np.array(float), jnp.array(float)
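To illustrate what a period means for a feature, the sketch below wraps feature-wise differences into the interval [-period/2, period/2] for periodic features and leaves the others unchanged, with 0 marking a nonperiodic feature as described above. The wrapping rule (a minimum-image convention) is an assumption about how periods enter the distances, and `periodic_difference` is a hypothetical helper, not part of the module.

```python
import numpy as np

def periodic_difference(x_i, x_j, periods=None):
    """Feature-wise difference, wrapped where the corresponding period is > 0."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    if periods is None:
        return diff  # all features treated as nonperiodic
    periods = np.asarray(periods, dtype=float)
    periodic = periods > 0
    safe = np.where(periodic, periods, 1.0)  # avoid dividing by zero
    wrapped = diff - safe * np.round(diff / safe)
    return np.where(periodic, wrapped, diff)

# Feature 0 is an angle in radians (period 2*pi); feature 1 is nonperiodic (period 0).
periods_A = np.array([2 * np.pi, 0.0])
x_i = np.array([0.1, 5.0])
x_j = np.array([2 * np.pi - 0.1, 1.0])
delta = periodic_difference(x_i, x_j, periods_A)
```

Here the two angles are 0.2 rad apart across the boundary, so the wrapped difference for feature 0 is 0.2 rather than about -6.08.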

num_epochs

number of training epochs. Default is 200.

Type: int

batches_per_epoch

number of mini-batches; must be a divisor of n_points. Each weight update is carried out by computing the DII gradient over n_points / batches_per_epoch points. Default is 1, which means that the gradient is computed over all the available points (full-batch gradient descent).

Type: int

seed

seed of JAX random generator, default is 0. Different seeds determine different mini-batch partitions.

Type: int
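The relation between batches_per_epoch, n_points and seed can be pictured with the sketch below, which shuffles the point indices and splits them into equally sized mini-batches. It uses NumPy's random generator as a stand-in for the JAX one, so the actual partitions produced by the class will differ; `minibatch_indices` is a hypothetical helper.

```python
import numpy as np

def minibatch_indices(n_points, batches_per_epoch, seed=0):
    """Random partition of the point indices into equally sized mini-batches."""
    if n_points % batches_per_epoch != 0:
        raise ValueError("batches_per_epoch must be a divisor of n_points")
    rng = np.random.default_rng(seed)  # NumPy stand-in for the JAX generator
    permutation = rng.permutation(n_points)
    # One row per mini-batch, each with n_points / batches_per_epoch indices.
    return permutation.reshape(batches_per_epoch, -1)

batches = minibatch_indices(n_points=100, batches_per_epoch=4, seed=0)
```

Each of the 4 rows holds 25 indices; changing the seed changes which points fall into which mini-batch.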

l1_strength

strength of the L1 regularization (LASSO) term. Default is 0.

Type: float

point_adapt_lambda

whether to use a global smoothing parameter lambda for the c_ij coefficients in the DII (if False), or a different parameter for each point (if True). Default is True.

Type: bool

k_init

initial rank of neighbors used to set lambda. Ranks are defined starting from 1. If batches_per_epoch > 1, neighbors are recomputed within each mini-batch. Default is 1.

Type: int

k_final

final rank of neighbors used to set lambda. If batches_per_epoch > 1, neighbors are recomputed within each mini-batch. Default is 1.

Type: int

lambda_factor

factor defining the scale of lambda. Default is 0.1.

Type: float

params_init

array of shape (n_features_A,) containing the initial values of the scaling weights to be optimized. If None, params_init is set to [0.1, 0.1, …, 0.1].

Type: np.array(float), jnp.array(float)

optimizer_name

name of the optimizer, taken from the Optax library. Possible choices are ‘sgd’ (default), ‘adam’ and ‘adamw’. See https://optax.readthedocs.io/en/latest/api/optimizers.html for additional details.

Type: str

learning_rate

value of the learning rate. Default is 1e-2.

Type: float

learning_rate_decay

schedule used to decay the learning rate, starting from the value provided with the attribute learning_rate. The available schedules are: cosine decay to zero ("cos"), exponential decay ("exp"; the initial learning rate is halved every 10 steps), or constant learning rate (None). Default is None (constant learning rate).

Type: str
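The three options can be written out explicitly as in the sketch below. The class delegates scheduling to Optax, so this plain-NumPy version only illustrates the shape of each schedule; the function `learning_rate_at` and its `total_steps` argument (the number of steps over which the cosine schedule reaches zero) are assumptions made for the example.

```python
import numpy as np

def learning_rate_at(step, learning_rate=0.01, decay=None, total_steps=200):
    """Learning rate at a given step for the schedules None, "exp" and "cos"."""
    if decay is None:
        return learning_rate  # constant learning rate
    if decay == "exp":
        # Halved every 10 steps.
        return learning_rate * 0.5 ** (step / 10)
    if decay == "cos":
        # Cosine decay from learning_rate (step 0) to zero (step total_steps).
        return learning_rate * 0.5 * (1.0 + np.cos(np.pi * step / total_steps))
    raise ValueError(f"unknown learning_rate_decay: {decay!r}")

lr_exp_after_10 = learning_rate_at(10, decay="exp")    # half of the initial value
lr_cos_halfway = learning_rate_at(100, decay="cos")    # half of the initial value
lr_cos_final = learning_rate_at(200, decay="cos")      # decayed to zero
```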

num_points_rows

number of points sampled from the rows of rank and distance matrices. In case of large datasets, choosing num_points_rows < n_points can significantly speed up the training. The default is None, for which num_points_rows == n_points.

Type: int

backward_greedy_feature_selection(n_features_min=1, n_best=10, compute_error=False, ratio_rows_columns=1, seed=0, discard_close_ind=0)

Performs backward greedy feature selection using the Differentiable Information Imbalance.

Starting with all features, the algorithm progressively removes the least informative features one at a time, until either no features are left or n_features_min is reached. For each iteration, the algorithm selects the n_best feature sets with the lowest DII values for consideration in the next round. The method should be called after calling the train() method, which performs the first optimization.

For each candidate feature set, the weights are optimized specifically for that subset. When mini-batches are used, the same random seed ensures consistent mini-batch sequences, and the same split of points along rows and columns of distance matrices if compute_error is True.

Parameters:

n_features_min (int) – minimum number of features to select. Default is 1.
n_best (int) – number of best feature tuples to consider at each iteration. Default is 10.
compute_error (bool) – whether to compute error estimates for the DII. Default is False.
ratio_rows_columns (float) – ratio between the number of points along rows and columns when computing the DII. Only used when compute_error is True. Default is 1.
seed (int) – seed for random number generation. Default is 0.
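discard_close_ind (int) – number of neighboring points, following the order along axis=0 of the data, that are treated as “close” to each point and discarded when computing the DII (see return_final_dii). Default is 0.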

Returns:

feature_sets (list) – list of lists, where each sublist contains the indices of the selected features at each iteration.
diis (list) – list of DII values corresponding to each set of selected features.
errors (list) – list of error estimates for each DII value. Only meaningful if compute_error is True.
best_weights_list (list) – list of arrays containing the optimal weights for each set of selected features.
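The backward search described above can be sketched independently of the DII with a generic scoring function. In the sketch below, `score` stands in for the optimized DII of a feature subset (lower is better); `backward_greedy` and `toy_score` are hypothetical names, and the toy score simply pretends that features 0 and 2 are the informative ones.

```python
def backward_greedy(n_features, score, n_features_min=1, n_best=10):
    """Remove one feature per iteration, keeping the n_best lowest-scoring subsets."""
    current = [tuple(range(n_features))]  # start from the full feature set
    feature_sets = [list(current[0])]
    scores = [score(current[0])]
    while len(current[0]) > n_features_min:
        # Candidates: every retained subset with one feature removed.
        candidates = {
            tuple(f for f in subset if f != removed)
            for subset in current
            for removed in subset
        }
        # Ties are broken by the feature indices to keep the result deterministic.
        ranked = sorted(candidates, key=lambda s: (score(s), s))
        current = ranked[:n_best]
        feature_sets.append(list(current[0]))
        scores.append(score(current[0]))
    return feature_sets, scores

def toy_score(subset):
    """Hypothetical score: penalizes missing informative features (0 and 2)."""
    missing = len({0, 2} - set(subset))
    return missing + 0.01 * len(subset)

feature_sets, scores = backward_greedy(n_features=4, score=toy_score)
```

With this toy score the search passes through the subsets of sizes 4, 3, 2 and 1, and the best 2-feature set is [0, 2].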

forward_greedy_feature_selection(n_features_max=None, n_best=10, compute_error=False, ratio_rows_columns=1, seed=0, discard_close_ind=0)

Performs forward greedy feature selection using the Differentiable Information Imbalance.

Starting with all individual features, the algorithm evaluates which single feature has the lowest DII. Then it combines the best n_best single features with each of the remaining features to find the best 2-feature combination. This process continues until n_features_max features are selected or all features are included.

For each candidate feature set, the weights are optimized specifically for that subset. When mini-batches are used, the same random seed ensures consistent mini-batch sequences, and the same split of points along rows and columns of distance matrices if compute_error is True.

Parameters:

n_features_max (int) – maximum number of features to select. If None, will select up to all features.
n_best (int) – number of best feature tuples to consider at each iteration. Default is 10.
compute_error (bool) – whether to compute error estimates for the DII. Default is False.
ratio_rows_columns (float) – ratio between the number of points along rows and columns when computing the DII. Only used when compute_error is True. Default is 1.
seed (int) – seed for random number generation. Default is 0.
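discard_close_ind (int) – number of neighboring points, following the order along axis=0 of the data, that are treated as “close” to each point and discarded when computing the DII (see return_final_dii). Default is 0.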

Returns:

best_feature_sets (list) – list of lists, where each sublist contains the indices of the selected features at each iteration.
best_diis (list) – list of DII values corresponding to each set of selected features.
best_errors (list) – list of error estimates for each DII value. Only meaningful if compute_error is True.
best_weights_list (list) – list of arrays containing the optimal weights for each set of selected features.
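The forward search can be sketched in the same generic way. As before, `score` stands in for the optimized DII of a feature subset (lower is better); `forward_greedy` and `toy_score` are hypothetical names, and the toy score pretends that features 0 and 2 are the informative ones.

```python
def forward_greedy(n_features, score, n_features_max=None, n_best=10):
    """Add one feature per iteration, extending the n_best lowest-scoring subsets."""
    if n_features_max is None:
        n_features_max = n_features
    candidates = {(f,) for f in range(n_features)}  # start from single features
    best_feature_sets, best_scores = [], []
    while True:
        # Ties are broken by the feature indices to keep the result deterministic.
        ranked = sorted(candidates, key=lambda s: (score(s), s))
        best_feature_sets.append(list(ranked[0]))
        best_scores.append(score(ranked[0]))
        if len(ranked[0]) >= n_features_max:
            break
        # Candidates: each retained subset extended by one of the remaining features.
        candidates = {
            tuple(sorted(subset + (added,)))
            for subset in ranked[:n_best]
            for added in range(n_features)
            if added not in subset
        }
    return best_feature_sets, best_scores

def toy_score(subset):
    """Hypothetical score: penalizes missing informative features (0 and 2)."""
    missing = len({0, 2} - set(subset))
    return missing + 0.01 * len(subset)

best_feature_sets, best_scores = forward_greedy(4, toy_score, n_features_max=2)
```

With this toy score the best single feature is 0 and the best pair is [0, 2].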

return_final_dii(compute_error=True, ratio_rows_columns=1, seed=0, discard_close_ind=0)

Returns final DII computed over the full data set using the optimal weights.

If the training was carried out with small mini-batches, this method provides a better estimate of the DII than the final value produced by ‘train’. If the training was performed without mini-batches (batches_per_epoch=1) and without row subsampling (num_points_rows=None), calling this method with compute_error=False and discard_close_ind=0 returns the same value as the final DII produced by ‘train’. The value of k used to compute the smoothing parameter lambda is chosen to keep the same ratio k/N used in the training phase (if batches_per_epoch > 1, N is the size of the mini-batches used during the training).

Parameters:

compute_error (bool) – whether to compute the final DII and its error by sampling different points along rows and columns of the distance matrix. If False, the final DII is computed using the same points along rows and columns, which does not allow for an error estimation. Default is True.
ratio_rows_columns (float) – only read when compute_error is True; defines the ratio between the number of points along rows (nrows) and along columns (ncolumns) of the distance and rank matrices, where the two groups of points are randomly sampled. In general, nrows and ncolumns are determined by solving the equations nrows / ncolumns = ratio_rows_columns and nrows + ncolumns = n_total_points. Default is 1, which means that both groups have n_points / 2 elements.
discard_close_ind (int) – given any point i, defines the “close” points (following the labelling order along axis=0 of data_A and data_B) that are known to be significantly correlated with i. For example, this may occur when the data set is a time series, and axis=0 is the time dimension. If compute_error is True, “time-correlated” points are excluded by subsampling the data along axis=0 with stride discard_close_ind + 1. If compute_error is False, distances between each point i and points within the time window [i-discard_close_ind, i+discard_close_ind] are discarded. Default is 0, for which no distances between points close in time are discarded.
seed (int) – seed of JAX random generator, default is 0.

Returns:

imb_final (float) – final DII, also accessible as attribute of the DiffImbalance object.
error_final (float) – error associated with the final DII, also accessible as attribute of the DiffImbalance object. If compute_error is False, error_final is set to None.
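The row/column split and the handling of close points described for ratio_rows_columns and discard_close_ind can be sketched as follows. The rounding to integers and the helper names (`split_sizes`, `subsample_with_stride`, `close_pairs_mask`) are assumptions made for illustration; the method may differ in these details.

```python
import numpy as np

def split_sizes(n_total_points, ratio_rows_columns=1.0):
    """Solve nrows / ncolumns = ratio and nrows + ncolumns = n_total_points."""
    ncolumns = int(round(n_total_points / (1.0 + ratio_rows_columns)))
    nrows = n_total_points - ncolumns
    return nrows, ncolumns

def subsample_with_stride(data, discard_close_ind=0):
    """compute_error=True case: keep one point every discard_close_ind + 1."""
    return data[:: discard_close_ind + 1]

def close_pairs_mask(n_points, discard_close_ind=0):
    """compute_error=False case: True where |i - j| <= discard_close_ind."""
    idx = np.arange(n_points)
    return np.abs(idx[:, None] - idx[None, :]) <= discard_close_ind

equal_split = split_sizes(100, ratio_rows_columns=1)   # (50, 50)
uneven_split = split_sizes(100, ratio_rows_columns=3)  # (75, 25)
kept = subsample_with_stride(np.arange(10), discard_close_ind=2)  # indices 0, 3, 6, 9
mask = close_pairs_mask(5, discard_close_ind=1)  # pairs whose distances are discarded
```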

train(bar_label=None)

Performs the full training of the DII, using the input attributes of the DiffImbalance object.

Note that when mini-batches are employed, for efficiency reasons the DII is not recomputed over the full data set at each training epoch. To access the value of the DII over the full data set, call the method ‘return_final_dii’ after training.

Parameters:

bar_label (str) – label on the tqdm training bar, useful when several training runs are performed.

Returns:

params_training (np.array(float)) – matrix of shape (num_epochs+1, n_features_A) containing the feature weights during the training, starting from their initialization. Also accessible as attribute of the DiffImbalance object.
imbs_training (np.array(float)) – array of shape (num_epochs+1,) containing the DII during the training. Element imbs_training[i] is the DII computed over the last mini-batch used in training epoch i. The same output is accessible as attribute of the DiffImbalance object.