xai_compare.comparisons.feature_selection

Classes

FeatureSelection(model, data, target[, ...])

A class to evaluate different feature elimination strategies provided by the list of explainers on a specified model.

class xai_compare.comparisons.feature_selection.FeatureSelection(model, data: DataFrame, target: DataFrame | Series | ndarray, mode: str = 'regression', fast_mode: bool = False, random_state: int = 42, verbose: bool = True, threshold: float = 0.2, metric: str | None = None, default_explainers: List[str] = ['shap', 'lime', 'permutations'], custom_explainer: Type[Explainer] | List[Type[Explainer]] | None = None)

A class to evaluate different feature elimination strategies provided by the list of explainers on a specified model.

Attributes:

model (Model):: The machine learning model to be evaluated.
data (pd.DataFrame):: The feature dataset used for model training and explanation.
target (Union[pd.DataFrame, pd.Series, np.ndarray]):: The target variables associated with the data.
mode (str):: The mode of operation, either 'regression' (the default) or 'classification'.
fast_mode (bool):: If True, uses a faster but potentially less accurate method for feature importance extraction and elimination.
random_state (int):: Seed used by the random number generator to ensure reproducibility.
verbose (bool):: If True, prints additional information during execution.
threshold (float):: The threshold for feature importance below which features are considered for elimination.
metric (Union[str, None]):: The evaluation metric used for assessing model performance after feature elimination.
default_explainers (List[str]):: List of default explainers to be used.
custom_explainer (Union[Type[Explainer], List[Type[Explainer]], None]):: Custom explainer(s) provided by the user.

add_best_feature_set()

Appends the best feature set analysis results to each entry in the results dictionary based on a specified metric.

Returns:

dict:: Updated results dictionary with best feature set analysis appended.

Notes:

The method iterates over the results dictionary, selects the best feature set for each entry based on the specified main metric, and appends the result back into the dictionary.

apply(): Applies feature elimination and updates the results dictionary with the best feature set analysis.

best_result()

Determines the best feature set based on the specified metric and optionally visualizes the results.

Returns:

float:: The maximum value of the specified evaluation metric across all explainer results, indicating the best feature set performance.
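The reduction that best_result() describes can be sketched as below. This is a minimal illustration, not the library's code: it assumes a hypothetical results layout in which each explainer name maps to a list of metric values, one per number of removed features. The documented method returns only the float; the sketch also reports which explainer reached it.

```python
from typing import Dict, List, Optional, Tuple


def best_result(results: Dict[str, List[float]]) -> Tuple[Optional[str], float]:
    """Return (explainer name, best metric value) across all explainer results.

    `results` is a hypothetical layout: explainer name -> list of metric
    values, one per number of features removed (index 0 = nothing removed).
    """
    best_name: Optional[str] = None
    best_value = float("-inf")
    for name, scores in results.items():
        top = max(scores)  # best value this explainer reached at any step
        if top > best_value:
            best_name, best_value = name, top
    return best_name, best_value


scores = {
    "shap": [0.81, 0.83, 0.84, 0.80],
    "lime": [0.81, 0.82, 0.79, 0.78],
    "permutations": [0.81, 0.85, 0.83, 0.77],
}
print(best_result(scores))  # ('permutations', 0.85)
```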

choose_best_feature_set(evaluation_results, data_type='val')

Evaluates and visualizes the best feature set based on a provided metric from model evaluation results.

This method computes performance metrics for each number of removed features and identifies the optimal number by finding the highest metric value.

Returns:

int:: The number of features suggested to be removed for optimal performance.
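The selection step amounts to an argmax over the metric values indexed by how many features were removed. The sketch below is a hypothetical stand-in (choose_n_removed is not a library function). The higher_is_better flag is an assumption added here because error metrics such as mean squared error improve as they decrease; the documentation itself only mentions taking the highest value.

```python
from typing import Sequence


def choose_n_removed(metric_by_removed: Sequence[float], higher_is_better: bool = True) -> int:
    """Return how many features to remove for the best metric value.

    metric_by_removed[k] is the metric after removing k features.
    Ties resolve to the smallest k, so the larger feature set is kept.
    """
    sign = 1 if higher_is_better else -1
    # max() returns the first index that attains the maximum.
    return max(range(len(metric_by_removed)), key=lambda k: sign * metric_by_removed[k])


val_f1 = [0.78, 0.80, 0.82, 0.82, 0.75]
print(choose_n_removed(val_f1))  # 2
val_mse = [4.1, 3.9, 3.5, 3.7]
print(choose_n_removed(val_mse, higher_is_better=False))  # 2
```

Breaking ties toward fewer removed features is a design choice of this sketch; the library may resolve ties differently.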

create_list_explainers(custom_explainer: Type[Explainer] | List[Type[Explainer]] | None) → List[Explainer]

Creates a list of explainer classes from default and custom explainers.

Parameters:

custom_explainer (Union[Type[Explainer], List[Type[Explainer]], None]):: Custom explainer or a list of custom explainer classes.

Returns:

List[Explainer]:: A list of initialized explainer classes.
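Merging default and custom explainers mainly means normalizing custom_explainer (None, a single class, or a list of classes) into a list and appending it to the defaults. The sketch below assumes a hypothetical name-to-class registry and uses stub classes in place of the real xai_compare Explainer hierarchy. It returns classes rather than instances, because the documentation leaves that point ambiguous.

```python
from typing import Dict, List, Sequence, Type, Union


class Explainer:
    """Stand-in base class; the real xai_compare Explainer is not reproduced here."""


class ShapStub(Explainer): ...
class LimeStub(Explainer): ...
class PermutationStub(Explainer): ...


# Hypothetical registry mapping default explainer names to classes.
REGISTRY: Dict[str, Type[Explainer]] = {
    "shap": ShapStub,
    "lime": LimeStub,
    "permutations": PermutationStub,
}


def create_list_explainers(
    default_explainers: Sequence[str],
    custom_explainer: Union[Type[Explainer], List[Type[Explainer]], None],
) -> List[Type[Explainer]]:
    classes = [REGISTRY[name] for name in default_explainers]
    if custom_explainer is None:
        return classes
    # Accept either a single class or a list of classes.
    extra = custom_explainer if isinstance(custom_explainer, list) else [custom_explainer]
    for cls in extra:
        if not (isinstance(cls, type) and issubclass(cls, Explainer)):
            raise TypeError(f"{cls!r} is not an Explainer subclass")
    return classes + list(extra)


class MyExplainer(Explainer):
    """A user-supplied explainer."""


names = [c.__name__ for c in create_list_explainers(["shap", "lime"], MyExplainer)]
print(names)  # ['ShapStub', 'LimeStub', 'MyExplainer']
```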

display(): Displays the results of feature selection through visualizations. This method generates plots to visualize the outcomes of feature selection, including feature importances and other relevant metrics.

evaluate_explainer(explainer)

Evaluates the performance of a machine learning model with progressively fewer features, ordered by the feature importances determined by the given explainer.

This method iteratively eliminates the least important features, as ranked by the explainer, until the number of features is reduced to the desired threshold.

Returns:

list:: A list containing a list of DataFrames with feature importances and model evaluation results.
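The elimination loop can be pictured as follows. This is a hypothetical sketch in which plain callables stand in for the explainer (importance_fn) and for model retraining and evaluation (evaluate_fn). The min_features stopping rule is one possible reading of "threshold", and the sketch recomputes importances after every removal. Whether the library does the same, for example when fast_mode is set, is not stated here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def eliminate_features(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    evaluate_fn: Callable[[List[str]], float],
    min_features: int,
) -> List[Tuple[List[str], float]]:
    """Remove the least important feature one step at a time and score each subset.

    Returns (remaining features, score) pairs; the first pair is the full set.
    """
    remaining = list(features)
    history = [(list(remaining), evaluate_fn(remaining))]
    while len(remaining) > min_features:
        importances = importance_fn(remaining)  # re-rank on the current subset
        weakest = min(remaining, key=lambda f: importances[f])
        remaining.remove(weakest)
        history.append((list(remaining), evaluate_fn(remaining)))
    return history


# Toy stand-ins: fixed integer importances, and a "score" that sums them.
WEIGHTS = {"a": 5, "b": 1, "c": 3, "d": 0}


def toy_importance(feats: List[str]) -> Dict[str, float]:
    return {f: WEIGHTS[f] for f in feats}


def toy_score(feats: List[str]) -> float:
    return sum(WEIGHTS[f] for f in feats)


for feats, score in eliminate_features(["a", "b", "c", "d"], toy_importance, toy_score, min_features=2):
    print(feats, score)
# ['a', 'b', 'c', 'd'] 9
# ['a', 'b', 'c'] 9
# ['a', 'c'] 8
```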

static evaluate_models(model, X_train, y_train, X_val, y_val, X_test, y_test, mode)

Evaluates a model’s performance metrics on training, validation, and test datasets.

Parameters:

model:: The model to evaluate.
X_train (pd.DataFrame):: Training features.
y_train (Union[pd.DataFrame, pd.Series, np.ndarray]):: Training labels.
X_val (pd.DataFrame):: Validation features.
y_val (Union[pd.DataFrame, pd.Series, np.ndarray]):: Validation labels.
X_test (pd.DataFrame):: Test features.
y_test (Union[pd.DataFrame, pd.Series, np.ndarray]):: Test labels.
mode (str):: Operation mode, ‘classification’ or ‘regression’.

Returns:

pd.DataFrame:: A DataFrame containing performance metrics for each dataset.

Notes:

The method calculates accuracy, precision, recall, and F1 score in classification mode, and mean squared error and mean absolute error in regression mode.
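The metrics named above follow their standard definitions, sketched here in plain Python for a single split. Two assumptions apply: the classification case is binary with 1 as the positive class, and the library's averaging strategy for multi-class targets is not documented here. In practice, scikit-learn's sklearn.metrics module provides accuracy_score, precision_score, recall_score, f1_score, mean_squared_error, and mean_absolute_error. evaluate_models would presumably apply such metrics to each of the train, validation, and test splits and collect them in a DataFrame.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {"mse": sum(e * e for e in errors) / n, "mae": sum(abs(e) for e in errors) / n}


clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print({k: round(v, 3) for k, v in clf.items()})
# {'accuracy': 0.6, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
reg = regression_metrics([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print({k: round(v, 3) for k, v in reg.items()})
# {'mse': 1.417, 'mae': 0.833}
```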

get_feature_elimination_results()

Evaluates different feature elimination strategies provided by the list of explainers on a specified model.

Each explainer assesses feature importance, and the model's performance is then evaluated as features are progressively eliminated in order of that importance.

Returns:

dict:: A dictionary containing the results from each explainer.

plot_feature_selection_outcomes()

Generates visualizations of performance metrics for the selected feature sets, evaluated on the test set.

This method creates two types of visualization:

- Bar charts of feature importance.
- A table of performance metrics.

train_test_val()

Splits the data into training, validation, and testing sets.

This method splits the data into a training set, a validation set, and a testing set. It uses a stratified split to maintain the distribution of the target variable across the sets.

Returns:

tuple:: (X_train, y_train, X_val, y_val, X_test, y_test), where
X_train (pd.DataFrame):: Training features.
y_train (Union[pd.DataFrame, pd.Series, np.ndarray]):: Training target.
X_val (pd.DataFrame):: Validation features.
y_val (Union[pd.DataFrame, pd.Series, np.ndarray]):: Validation target.
X_test (pd.DataFrame):: Testing features.
y_test (Union[pd.DataFrame, pd.Series, np.ndarray]):: Testing target.
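A stratified three-way split groups the row indices by class, shuffles each group with a seeded generator, and cuts every group in the same proportions. The sketch below returns index lists and uses 60/20/20 proportions. Those proportions and the function name are assumptions, because the documentation does not state the split sizes. With scikit-learn, the same effect is commonly achieved by calling train_test_split twice with stratify= set to the relevant labels.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_three_way_split(
    y: Sequence[Hashable],
    val_size: float = 0.2,
    test_size: float = 0.2,
    random_state: int = 42,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(random_state)  # seeded for reproducibility
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        n_val = round(len(indices) * val_size)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return sorted(train), sorted(val), sorted(test)


y = [0] * 10 + [1] * 5
train_idx, val_idx, test_idx = stratified_three_way_split(y)
print(len(train_idx), len(val_idx), len(test_idx))  # 9 3 3
```

Stratifying by class only makes sense for a discrete target. For a continuous regression target, the values would first need to be binned, or a plain random split used instead.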