The feature_weighting module

The feature_weighting module contains the FeatureWeighting class.

This class uses the Differentiable Information Imbalance (DII) to optimize weights of the input features with respect to a groundtruth data set.

class feature_weighting.FeatureWeighting(coordinates=None, distances=None, maxk=None, period=None, verbose=False, n_jobs=2)[source]
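Example – a minimal construction sketch. The import path follows the module name above; the toy data and variable names are illustrative assumptions, and coordinates are assumed to be passed as an N x D (samples x features) array, as in other dadapy data objects:

    import numpy as np
    from dadapy.feature_weighting import FeatureWeighting

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))        # input features, N x D
    y = X[:, 0] + 0.5 * X[:, 1]          # toy groundtruth built from two features

    f_in = FeatureWeighting(coordinates=X)                 # input-feature object
    f_gt = FeatureWeighting(coordinates=y.reshape(-1, 1))  # groundtruth object (target_data)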
return_backward_greedy_dii_elimination(target_data: Type[Base], initial_weights: ndarray | int | float = None, lambd: float = None, n_epochs: int = 100, learning_rate: float = None, constrain: bool = False, decaying_lr: str = 'exp')[source]
Do a stepwise backward elimination of feature weights, always eliminating the feature with the lowest weight; after each elimination the DII is optimized by gradient descent using the remaining features.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • initial_weights (np.ndarray or list) – D-dimensional array of initial weights for the input features. Zeros are not allowed here.

  • lambd (float) – softmax scaling. If None (preferred), it is chosen automatically with return_optimal_lambda.

  • n_epochs (int) – number of epochs in each optimization cycle

  • learning_rate (float) – learning rate. Has to be tuned, especially if constrain=True (otherwise the optimization could fail).

  • constrain (bool) – if True, rescale the weights so the biggest weight = 1

  • decaying_lr (string) – Default: "exp". "exp" for an exponentially decaying learning rate (halved every 10 epochs): lrate = l_rate_initial * 2**(-i_epoch/10); "cos" for a cosine decaying learning rate: lrate = l_rate_initial * 0.5 * (1 + cos((pi * i_epoch)/n_epochs)); "static" for no decay in the learning rate.

Returns:
  • final_diis – np.ndarray, shape (D,). Array of the optimized DII for each of the corresponding weight vectors.

  • final_weights – np.ndarray, shape (D x D). Array of the optimized weights for each number of non-zero weights.

History entries added to FeatureWeighting object:
weights_per_epoch: np.ndarray, shape (D, n_epochs+1, D). Weights during the optimization for every epoch and every number of non-zero weights. For the final weights: weights_per_epoch[:, -1, :]

dii_per_epoch: np.ndarray, shape (D, n_epochs+1). DII during the optimization for every epoch and every number of non-zero weights. For the final imbalances: dii_per_epoch[:, -1]

These history entries can be accessed as follows: objectname.history['entry_name']
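Example – a hedged sketch of the backward elimination, reusing f_in and f_gt from the construction example above; the unpacking order follows the Returns section:

    final_diis, final_weights = f_in.return_backward_greedy_dii_elimination(
        target_data=f_gt, n_epochs=50, decaying_lr="exp"
    )
    # one optimized DII per number of non-zero weights, plus one weight vector each
    weights_trace = f_in.history['weights_per_epoch']   # shape (D, n_epochs+1, D)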

return_dii(target_data: Type[Base], lambd: float = None)[source]
Computes the DII between two FeatureWeighting objects, based on distances of the input data and rank information of the groundtruth data.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • lambd (float, optional) – The softmax scaling parameter. Default: 0.1. The higher this value, the more nearest neighbors are included. It can be calculated automatically with return_optimal_lambda, which sets lambda to a distance smaller than the average distance in the data set but bigger than the minimal distance.

Returns:

dii (float) – The computed DII value. Depends on the softmax scale lambda.

Raises:

None.
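Example – computing the DII with an automatically chosen softmax scale (a sketch reusing f_in and f_gt from the construction example above):

    lambd = f_in.return_optimal_lambda()
    dii = f_in.return_dii(target_data=f_gt, lambd=lambd)
    # dii is a single float; smaller values indicate that input-space neighborhoods
    # better reproduce the groundtruth neighborhoods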

return_dii_gradient(target_data: Type[Base], weights: ndarray, lambd: float = None)[source]
Computes the gradient of the DII between two FeatureWeighting objects (the input object and the groundtruth object target_data) with respect to the weights of the input features.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • weights (np.ndarray) – The D-dimensional array of weight values for the input features, where D is the dimension of the data.

  • lambd (float, optional) – The softmax scaling parameter. Default: 0.1. The higher this value, the more nearest neighbors are included. It can be calculated automatically with return_optimal_lambda, which sets lambda to a distance smaller than the average distance in the data set but bigger than the minimal distance.

Returns:

dii_weight_gradient (np.ndarray) – The computed gradient of DII with respect to the weights. Depends on the softmax scale lambda.
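Example – one manual gradient step on uniform weights (a sketch reusing f_in, f_gt and lambd from the examples above; return_weights_optimize_dii below performs the full descent, so this only illustrates the gradient's shape and use):

    weights = np.ones(X.shape[1])      # D uniform starting weights
    grad = f_in.return_dii_gradient(target_data=f_gt, weights=weights, lambd=lambd)
    weights = weights - 0.01 * grad    # grad has the same shape (D,) as weights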

return_lasso_optimization_dii_search(target_data: Type[Base], initial_weights: ndarray | int | float = None, lambd: float = None, n_epochs: int = 100, learning_rate: float = None, constrain: bool = False, l1_penalties: list = None, decaying_lr: str = 'exp', refine: bool = False, plotlasso: bool = True)[source]
Search the numbers of resulting non-zero weights and the optimized DII for several l1-regularization strengths.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • initial_weights (np.ndarray or list) – D-dimensional array of initial weights for the input features. Zeros are not allowed. If None (default), the inverse standard deviation of the input features is used.

  • lambd (float or None) – softmax scaling. If None (default), lambd is chosen automatically with return_optimal_lambda.

  • n_epochs (int) – number of epochs in each optimization cycle. Default: 100.

  • learning_rate (float or None) – learning rate. If None (default), it is tuned and chosen automatically. Has to be tuned if constrain=True (otherwise the optimization could fail).

  • constrain (bool) – if True, rescale the weights so the biggest weight = 1. Default: False.

  • l1_penalties (list or None) – l1 regularization strengths to be tested. If None (default), a list of 10 sensible l1-penalties is tested, which are chosen depending on the learning rate.

  • decaying_lr (string) – Default: "exp". "exp" for an exponentially decaying learning rate (halved every 10 epochs): lrate = l_rate_initial * 2**(-i_epoch/10); "cos" for a cosine decaying learning rate: lrate = l_rate_initial * 0.5 * (1 + cos((pi * i_epoch)/n_epochs)); "static" for no decay in the learning rate.

  • refine (bool) – default: False. If True, the l1-penalties are added in between penalties where the number of non-zero weights changes by more than one. This is done to find the optimal l1-penalty for each number of non-zero weights. This option is not suitable for high-dimensional data with more than ~100 features, because the computational time scales with the number of dimensions.

  • plotlasso (bool) – default: True. If True, a plot is shown with the optimal DII for each number of non-zero weights, colored by the l1-penalty used. This plot can be used to select results with reasonably low DII.

Returns:
  • num_nonzero_features (np.ndarray) – D-dimensional. Numbers of non-zero features, in the same order as the corresponding l1-penalties used, final DIIs and final weights. Returns nan if no solution was found for a certain number of non-zero weights.

  • l1_penalties_opt_per_nfeatures (np.ndarray) – D-dimensional. L1-regularization strengths for each num_nonzero_features, in the same order as the corresponding final DIIs and final weights. If several l1-penalties led to the same number of non-zero weights, the solution with the lowest DII is selected. Returns nan if no solution was found for a certain number of non-zero weights.

  • dii_opt_per_nfeatures (np.ndarray) – D-dimensional. Final DIIs for each num_nonzero_features, in the same order as the corresponding l1-penalties used and final weights. Returns nan if no solution was found for a certain number of non-zero weights.

  • weights_opt_per_nfeatures (np.ndarray) – D x D-dimensional. Final weights for each num_nonzero_features, in the same order as the corresponding l1-penalties used and final DIIs. Returns nan if no solution was found for a certain number of non-zero weights.

History entries added to FeatureWeighting object:
l1_penalties (np.ndarray): len(l1_penalties). The l1-regularization strengths tested (in the order of the returned weights, DIIs and l1 loss contributions).

weights_per_l1_per_epoch (np.ndarray): len(l1_penalties) x n_epochs x D. All weights for each optimization step for each l1-regularization strength. For the final weights: weights_per_l1_per_epoch[:, -1, :]

dii_per_l1_per_epoch (np.ndarray): len(l1_penalties) x n_epochs. Imbalance for each optimization step for each l1-regularization strength. For the final imbalances: dii_per_l1_per_epoch[:, -1]

l1_term_per_l1_per_epoch (np.ndarray): len(l1_penalties) x n_epochs. L1 loss contributions for each optimization step for each l1-regularization strength. For the final l1 loss contributions: l1_term_per_l1_per_epoch[:, -1]

These history entries can be accessed as follows: objectname.history['entry_name']
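Example – a hedged sketch of the lasso search, reusing f_in and f_gt from above; the method name and unpacking order follow the signature and Returns section as reconstructed here, and plotlasso is disabled for non-interactive use:

    nnz, l1s, diis, ws = f_in.return_lasso_optimization_dii_search(
        target_data=f_gt, n_epochs=50, plotlasso=False
    )
    # inspect the optimized DII per number of non-zero features,
    # skipping entries where no solution was found (nan):
    for n, d in zip(nnz, diis):
        if not np.isnan(d):
            print(int(n), d)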

return_optimal_lambda(fraction: float = 1.0)[source]

Computes the optimal softmax scaling parameter lambda for the DII optimization. This parameter represents a reasonable scale of distances between the data points in the input data set.

Parameters:
  • fraction (float) – Zoom in or out from the optimal distance scale. Default: 1.0 (suggested to keep it at default). Values > 1 select a bigger scale (in the optimization, more neighbors are included in the softmax); values < 1 select a smaller scale (on average fewer neighbors are included, and very small values include only the first neighbor).
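Example – zooming the distance scale (a sketch reusing f_in from the construction example above):

    lambd = f_in.return_optimal_lambda()           # default scale, fraction=1.0
    lambd_local = f_in.return_optimal_lambda(0.5)  # smaller scale: fewer neighbors in the softmax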

return_optimal_learning_rate(target_data: Type[Base], n_epochs: int = 50, n_samples: int = 200, initial_weights: ndarray | int | float = None, lambd: float = None, decaying_lr: str = 'exp', trial_learning_rates: ndarray = None)[source]

Find the optimal learning rate for the optimization of the DII by testing several learning rates on a reduced data set.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • n_epochs (int) – number of epochs in each optimization cycle

  • n_samples (int) – Number of samples to use for the learning rate screening. Default: 200.

  • initial_weights (np.ndarray or list) – D-dimensional array of initial weights for the input features. Zeros are not allowed here.

  • lambd (float) – softmax scaling. If None (preferred), it is chosen automatically with return_optimal_lambda.

  • decaying_lr (string) – Default: "exp". "exp" for an exponentially decaying learning rate (halved every 10 epochs): lrate = l_rate_initial * 2**(-i_epoch/10); "cos" for a cosine decaying learning rate: lrate = l_rate_initial * 0.5 * (1 + cos((pi * i_epoch)/n_epochs)); "static" for no decay in the learning rate.

  • trial_learning_rates (np.ndarray or list or None) – learning rates to try. If None (default), a sensible set of learning rates is tested.

Returns:

opt_l_rate (float) – Learning rate which leads to the optimal unregularized (no l1-penalty) result in the specified number of epochs.

History entries added to FeatureWeighting object:

trial_learning_rates: np.ndarray. Learning rates which were tested to find the optimal one.

dii_per_epoch_per_lr: np.ndarray, shape (len(trial_learning_rates), n_epochs+1). DII for each trial learning rate at each epoch.

weights_per_epoch_per_lr: np.ndarray, shape (len(trial_learning_rates), n_epochs+1, D). Weights for each trial learning rate at each epoch.

These history entries can be accessed as follows: objectname.history['entry_name']
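Example – screening learning rates before a full optimization (a sketch reusing f_in and f_gt from above; the found rate is then passed to return_weights_optimize_dii, documented below):

    lr = f_in.return_optimal_learning_rate(
        target_data=f_gt, n_epochs=50, n_samples=200
    )
    tested = f_in.history['trial_learning_rates']   # the rates that were screened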

return_weights_optimize_dii(target_data: Type[Base], n_epochs: int = 100, constrain: bool = False, initial_weights: ndarray | int | float = None, lambd: float = None, learning_rate: float = None, l1_penalty: float = 0.0, decaying_lr: str = 'exp')[source]
Optimize the differentiable information imbalance (DII) between the input data object and the groundtruth data object (target_data) using gradient descent.

Parameters:
  • target_data – FeatureWeighting object, containing the groundtruth data (D_groundtruth x N array, period (optional)) to be compared to.

  • n_epochs (int, optional) – The number of epochs in the gradient descent optimization. If None, it is set to 100.

  • constrain (bool) – Constrain the sum of the weights to sum up to the number of weights. Default: False.

  • initial_weights (np.ndarray, shape (D,), optional) – The array of starting weight values for the input features, where D is the dimension of the data. If None, each weight is initialized to 1/var of the corresponding variable. The weights cannot be initialized to 0's; they can be initialized to all 1's or to the inverse of the standard deviation.

  • lambd (float, optional) – The lambda scaling parameter of the softmax. If None (default), it is calculated automatically.

  • learning_rate (float, optional) – The learning rate of the gradient descent. If None, it is automatically estimated to be fast.

  • l1_penalty (float, optional) – The l1-regularization strength, if sparsity is needed. Default: 0 (l1-regularization turned off).

  • decaying_lr (string) – Default: "exp". "exp" for an exponentially decaying learning rate (halved every 10 epochs): lrate = l_rate_initial * 2**(-i_epoch/10); "cos" for a cosine decaying learning rate: lrate = l_rate_initial * 0.5 * (1 + cos((pi * i_epoch)/n_epochs)); "static" for no decay in the learning rate.

Returns:

final_weights – np.ndarray, shape (D,). Array of the optimized weights.

History entries added to FeatureWeighting object:
weights_per_epoch: np.ndarray, shape (n_epochs+1, D). Weights during the optimization for every epoch.

dii_per_epoch: np.ndarray, shape (n_epochs+1, ). Differentiable information imbalances during the optimization for every epoch.

l1_term_per_epoch: np.ndarray, shape (n_epochs+1, ). L1-penalty terms contributing to the loss function during the optimization for every epoch.

These history entries can be accessed as follows: objectname.history['entry_name']
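Example – a full weight optimization with inspection of its history (a sketch reusing f_in, f_gt and the learning rate lr from the examples above; checking convergence via the history entry is an illustrative pattern, not part of the API):

    final_weights = f_in.return_weights_optimize_dii(
        target_data=f_gt, n_epochs=100, learning_rate=lr, l1_penalty=0.0
    )
    dii_trace = f_in.history['dii_per_epoch']     # shape (n_epochs+1,)
    print(final_weights, dii_trace[-1])           # final weights and final DII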