Captum v0.4.0 Release
The Captum 0.4.0 release adds new functionalities for concept-based interpretability, evaluating model robustness, new attribution methods including Layerwise Relevance Propagation (LRP), and improvements to existing attribution methods.
Concept-Based Interpretability
Captum 0.4.0 adds TCAV (Testing with Concept Activation Vectors) to Captum, allowing users to identify significance of user-defined concepts on a model’s prediction. TCAV has been implemented in a generic manner, allowing users to define custom concepts with example inputs for any modality including vision and text.
Robustness
Captum 0.4.0 also includes new tools to understand model robustness including implementations of adversarial attacks (Fast Gradient Sign Method and Projected Gradient Descent) as well as robustness metrics to evaluate the impact of different attacks or perturbations on a model. Robustness metrics included in this release include:
- Attack Comparator - Allows users to quantify the impact of any input perturbation (such as torchvision transforms, text augmentation, etc.) or adversarial attack on a model and compare the impact of different attacks
- Minimal Perturbation - Identifies the minimum perturbation needed to cause a model to misclassify the perturbed input
This robustness tooling enables model developers to better understand potential model vulnerabilities as well as analyze counterfactual examples to better understand a model’s decision boundary.
Layerwise Relevance Propagation (LRP)
We also add a new attribution method LRP (Layerwise Relevance Propagation) to Captum in the 0.4.0 release, as well as a layer attribution variant, Layer LRP. Layer-wise relevance propagation is based on a backward propagation mechanism applied sequentially to all layers of the model. The model output score represents the initial relevance which is decomposed into values for each neuron of the underlying layers. Thanks to @nanohanno for contributing this method to Captum and @rGure for providing feedback!
New Tutorials
We have added new tutorials to demonstrate Captum with BERT, usage of Lime, and DLRM recommender models. These tutorials are:
- Interpreting BERT Models (Part 2)
- LIME for Image & Text Classification
- Interpreting Deep Learning Recommender Model (DLRMs) (https://github.com/facebookresearch/dlrm) with Captum
Additionally, the following fixes and updates to existing tutorials have been added:
- The IMDB tutorial has been updated with a new model (trained with a larger embedding and updated dependencies) for reproducibility.
- Interpreting BERT Models (Part 1) has been updated to make use of LayerIntegratedGradients with multiple layers to obtain attributions simultaneously, and
Attribution Improvements
Captum 0.4.0 has added improvements to existing attribution methods including:
- Neuron conductance now supports a selector function (in addition to providing a neuron index) to select the target neuron for attribution, which enables support for layers with input / output as a tuple of tensors (PR #602).
- Lime now supports a generator to be returned by the perturbation function, rather than only a single sample, to better support enumeration of perturbations for interpretable model training (PR #619).
- KernelSHAP has been improved to perform weighted sampling of vectors for interpretable model training, rather than uniformly sampling vectors and weighting only when training. This change scales better with larger numbers of features, since weights for larger numbers of features were previously leading to arithmetic underflow (PR #619).
- A new option show_progress has been added to all perturbation-based attribution methods, which shows a progress bar to help users track progress of attribution computation (Issue #630 , PR #581).
- A new option / flag normalize has been added to infidelity evaluation metric that normalizes and scales infidelity score based on an input flag normalize (Issue: #613, PR: #639 )
- All perturbation-based attribution methods now support boolean input tensors (PR #666).
- Lime’s default regularization for Lasso regression has been reduced from 1.0 to 0.01 to avoid frequent issues with attribution results being 0 (Issue #679, PR #689).
Bug Fixes
- Gradient-based attribution methods have been fixed to not zero previously stored grads, which avoids warnings related to accessing grad of non-leaf tensors (Issue #421, #491, PR #597).
- Captum tests were previously included in Captum distributions unnecessarily; tests are no longer packaged with Captum releases (Issue #629 , PR #635).
- Captum’s dependency on matplotlib in Conda environments has been changed to matplotlib-base, since pyqt is not used in Captum (Issue #644, PR #648).
- Layer attribution methods now set gradient requirements only starting at the target layer rather than at the inputs, which ensures support for models with int or boolean input tensors (PR #647, #643).
- Lime and Kernel SHAP int overflow issues (with sklearn interpretable model training) have been resolved, and all interpretable model inputs / outputs are converted to floats prior to training (PR #649).
- Original parameter names which were renamed in v0.3 for NoiseTunnel, Kernel Shap, and Lime no longer lead to deprecation warnings and were removed in 0.4.0 (PR #558).