Adding FastCAM Functionality #442
base: master
Conversation
I have updated the code such that it uses the multi-layer LayerActivation. Thanks~
Thanks for updating with LayerActivation @ryanchankh! I am finishing a review and will share comments soon, by the end of the week.
Sorry for the delay in getting back with feedback! Thanks again for adding this to Captum, it looks good overall @ryanchankh . I've added some comments on the API and tests, let us know if you have any questions.
```python
self,
forward_func: Callable,
layers: ModuleOrModuleList,
norm: Any = "gamma",
```
nit: the `norm` type hint should be `str` rather than `Any`.
```
Note that currently it is assumed that either the input
or the outputs of internal layers, depending on whether we
attribute to the input or output, are single tensors.
Support for multiple tensors will be added later.
```
nit: Can update / remove these lines, LayerActivation already supports multiple tensors, but I don't think it's applicable for FastCAM, would be good to document that limitation as well.
```python
additional_forward_args: Any = None,
attribute_to_layer_input: bool = False,
) -> Tuple[Tensor, ...]:
    r"""
```
For all other input attribution methods, we've generally followed the method of having attribute return attributions matching the input shape. Could we possibly keep that convention here to return the combined results and provide another instance method to obtain just the normalized activations for each layer?
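Concretely, that convention could look something like the skeleton below. This is an illustrative sketch only, not the PR's implementation: the class and method names (e.g. `compute_layer_saliency`) are hypothetical, and the saliency math is stubbed out with constant maps.

```python
from typing import Callable, List, Sequence

class MultiscaleFastCamSketch:
    def __init__(self, forward_func: Callable, layers: Sequence) -> None:
        self.forward_func = forward_func
        self.layers = list(layers)

    def compute_layer_saliency(self, inputs: List[List[float]]) -> List[List[List[float]]]:
        # Hypothetical helper exposing the per-layer normalized maps.
        # Stubbed here: one constant map per layer, at the input resolution.
        h, w = len(inputs), len(inputs[0])
        return [[[0.5] * w for _ in range(h)] for _ in self.layers]

    def attribute(self, inputs: List[List[float]]) -> List[List[float]]:
        # Returns a single combined map matching the input's spatial shape,
        # consistent with other Captum attribution methods' conventions.
        maps = self.compute_layer_saliency(inputs)
        h, w = len(inputs), len(inputs[0])
        return [[sum(m[i][j] for m in maps) / len(maps)
                 for j in range(w)] for i in range(h)]
```

Users wanting only the per-layer results could then call the separate helper directly, while `attribute` keeps the input-shaped return value.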
```
The recommended use case for FastCAM is to compute saliency maps for multiple
layers with different scales in a deep network, then combine them to obtain
a more meaningful saliency map for the original input.
```
Would be good to add more details summarizing the approach as well as explain any limitations, e.g. this is primarily intended for CNN architectures, or particularly where intermediate layers are spatially aligned with the input.
Some more details here would be great if possible!
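For context, the multiscale combination step at the heart of the method can be sketched roughly as follows. This is a pure-Python illustration, not the PR's code: maps are assumed already normalized, and `upsample_nearest` stands in for the bilinear interpolation a real implementation would use.

```python
def upsample_nearest(smap, out_h, out_w):
    # Nearest-neighbor resize of one 2D map (a stand-in for the bilinear
    # interpolation used in practice, e.g. via F.interpolate).
    in_h, in_w = len(smap), len(smap[0])
    return [[smap[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

def combine_saliency_maps(layer_maps, weights, out_h, out_w):
    # Resize each (already normalized) per-layer saliency map to the input
    # resolution, then take a weighted sum -- the multiscale combination
    # FastCAM performs across layers of different spatial scales.
    combined = [[0.0] * out_w for _ in range(out_h)]
    for smap, w in zip(layer_maps, weights):
        resized = upsample_nearest(smap, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                combined[i][j] += w * resized[i][j]
    return combined
```

Since the combination relies on intermediate layers being spatially aligned with the input, the docstring could note that the method is primarily intended for CNN-style architectures.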
```python
        th = k * m
        return th

    def _compute_gamma_norm(self, inputs):
```
It seems in the paper only normalizing with the Gaussian CDF is mentioned, could we possibly keep that as the default to be consistent?
It might be also worth considering whether it's essential to include this gamma normalization code here. This makes the implementation trickier to follow, and is not essential to the core method. Can we possibly only support Gaussian normalization for now and not include this here? From the original repo, it seems that GammaNorm leads to only a very slight improvement over Gaussian normalization, but substantially slower, and isn't mentioned in the paper.
Additionally, it seems that this code is very similar to that of the normalization in the original repo (https://github.com/LLNL/fastcam/blob/master/norm.py), I'm not sure if this could cause any copyright issues, since the copyright header needs to generally be included for reproductions. cc: @NarineK
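For reference, the Gaussian (CDF) normalization described in the paper is simple on its own, which is part of the appeal of making it the default. A minimal pure-Python sketch (population statistics over one 2D map; the actual code would operate on tensors):

```python
import math

def gaussian_cdf_norm(smap):
    # Squash a 2D saliency map into (0, 1) by passing the z-score of each
    # value through the standard normal CDF -- the Gaussian normalization
    # from the FastCAM paper.
    vals = [v for row in smap for v in row]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    sd = math.sqrt(var) if var > 0 else 1.0  # guard against constant maps
    return [[0.5 * (1.0 + math.erf((v - mu) / (sd * math.sqrt(2.0))))
             for v in row] for row in smap]
```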
```python
        )
        attributes = []
        for layer_attr in layer_attrs:
            smoe_attr = self._compute_smoe_scale(layer_attr)
```
Could we expose some baseline approaches here in addition to SMOE scale for combining channels? From a quick skim of the paper, it seems that standard deviation also performs reasonably well compared to other methods. It would be great to have the flexibility to experiment with different approaches of combining information from activations, particularly the simple baselines of standard deviation and mean.
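For instance, the per-location channel reducers could sit behind a common interface so SMOE scale, standard deviation, and mean are interchangeable. A rough pure-Python sketch (function names are illustrative; the SMOE formula follows the `th = k * m` snippet quoted earlier and assumes strictly positive, post-ReLU-style activations):

```python
import math

def smoe_scale(channel_vals):
    # SMOE scale at one spatial location: m * (log2(m) - mean(log2(x))),
    # matching the `th = k * m` computation in the diff above. Requires
    # strictly positive activations (e.g. post-ReLU plus a small epsilon).
    m = sum(channel_vals) / len(channel_vals)
    k = math.log2(m) - sum(math.log2(v) for v in channel_vals) / len(channel_vals)
    return k * m

def std_scale(channel_vals):
    # Simple baseline: population standard deviation across channels.
    m = sum(channel_vals) / len(channel_vals)
    return math.sqrt(sum((v - m) ** 2 for v in channel_vals) / len(channel_vals))

def mean_scale(channel_vals):
    # Simplest baseline: channel mean.
    return sum(channel_vals) / len(channel_vals)

SCALE_FNS = {"smoe": smoe_scale, "std": std_scale, "mean": mean_scale}
```

With a registry like `SCALE_FNS`, experimenting with different channel-combination strategies becomes a one-argument change.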
```python
class Test(BaseTest):
    def test_one_layer_gamma(self) -> None:
```
It seems like these tests all hardcode expected attributions for BasicModel_ConvNet, which randomly initializes parameters. PyTorch doesn't guarantee that random seeds lead to consistent behavior across versions / releases, so this could potentially lead to issues in the future. It would be preferable to instead use simple models with fixed parameters to ensure reproducibility. It would also be great to manually confirm the expected result, which can be done with small models. A potential model that can be used here is BasicModel_ConvNet_One_Conv, which has fixed parameters for the convolution layer.
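To illustrate the point: with fixed parameters, the expected activation values can be derived by hand instead of being copied from one PyTorch run. A toy example (plain Python with a hypothetical 2x2 kernel, not Captum's actual test model):

```python
def conv2d_valid(image, kernel):
    # Minimal single-channel "valid" 2D convolution (really
    # cross-correlation, as in torch.nn.Conv2d) with fixed weights.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

# With fixed weights, every expected value is hand-checkable:
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # each output is x[i][j] + x[i+1][j+1]
activation = conv2d_valid(image, kernel)
```

Tests built on such fixed-parameter models do not depend on random initialization, so they stay valid across PyTorch releases.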
```python
        ex["attributions"] = [
            [
                [
                    26.5969,
```
Just curious, how were these example results obtained? Also, when updating, it would be ideal to have mostly smaller examples, so the tests have fewer hardcoded values and would be easier to read and verify correctness (e.g. for LayerActivation here).
```python
        weights = [1.0, 1.0]
        self._fastcam_test_assert(net, inp, ex, layers, norm, combine, weights)

    def _fastcam_test_assert(
```
Would be good to include type hints here, as well as all utility methods in the main implementation.
```python
from ..helpers.basic import BaseTest, assertTensorAlmostEqual
from ..helpers.basic_models import BasicModel_ConvNet
```
It would also be good to add DataParallel tests and ensure all functionality works appropriately on GPU. You can take a look at the test generator here: https://github.com/pytorch/captum/blob/master/tests/attr/test_data_parallel.py, which is based on this config https://github.com/pytorch/captum/blob/master/tests/attr/helpers/test_config.py.
If possible, would be great if these DataParallel / GPU tests could be added as well, thanks!
Hey @vivekmig, thank you for the feedback! And don't worry about the delays. Once I have everything fixed up and added, I'll let you know again in this thread.

Hi @ryanchankh, just wanted to follow up on this, will you still be able to make the updates to this PR? Thanks! cc: @NarineK

Sorry, I got caught up with some of my other projects this past month. I am aiming to finish updating this pull request by the end of this month. Is this timeline okay?

No problem, sure, that sounds great, thanks!
… normalizing and scaling functions
Hi @ryanchankh! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Hi @vivekmig, I recently updated the
As to the copyright issue, I am actually an employee at LLNL, working with the original authors. So there shouldn't be any copyright issues from the LLNL side. But if there is anything we need to do to ensure everything is okay legal-wise, I am willing to add any changes.

Last but not least, I am having a hard time passing some of the automatic tests. It seems to have something to do with files I didn't touch. What is the best way to resolve these issues?

Sorry for the long wait! And we really appreciate the feedback! Thank you!
Thanks for addressing the comments @ryanchankh , this looks great :) ! I've added a few more minor comments. @NarineK may also take another look to provide some additional feedback.
Regarding the CircleCI tests, I think the errors are related to issues we have fixed previously, can you try pulling the latest master into your branch? That should likely resolve the issues.
```python
    r"""
    Args:

        inputs (tensor or tuple of tensors): Input for which attributions
```
Below, it looks like `inputs` should only be a single tensor, based on `bn, channels, height, width = inputs.shape`. If so, can we update this docstring and the type hint to reflect this?
```
        are provided, the examples must be aligned appropriately.
    scale (str, optional): The choice of scale to pass through attributes.
        The available options are:
```
Can we add some detail to clarify that all these methods are different methods of combining elements on the channel dimension (for each spatial location independently) and resulting in size from N x C X H x W to N x 1 x H x W? This might also be clarified by more details in the algorithm description explaining the scale / normalization steps applied to layer activations prior to interpolation and weighted summation?
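To make that shape contract concrete: every scale method collapses the channel dimension independently at each spatial location. A small nested-list sketch of the reduction (illustrative only; real code would use tensor ops):

```python
def reduce_channels(x, reducer):
    # x has shape (N, C, H, W) as nested lists; the result has shape
    # (N, 1, H, W): `reducer` collapses the C channel values at each
    # spatial location into a single saliency value.
    n, c = len(x), len(x[0])
    h, w = len(x[0][0]), len(x[0][0][0])
    return [[[[reducer([x[b][ch][i][j] for ch in range(c)])
               for j in range(w)] for i in range(h)]] for b in range(n)]
```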
```python
    Args:

        forward_func (callable): The forward function of the model or any
            modification of it
        layers (torch.nn.Module or listt(torch.nn.Module)): A list of layers
```
nit: typo, `listt` should be `list`.
```python
        else:
            msg = (
                f"{norm} norming option not found or invalid. "
                + "Available options: [gamma, normal, None]"
```
nit: Options should be gaussian, identity?
```python
        else:
            msg = (
                f"{scale} scaling option not found or invalid. "
                + "Available options: [smoe, std, mean, normal]"
```
nit: Missing some options, e.g. max, identity?
```python
            [0.5515, 0.5881, 0.6614, 0.6980],
            [0.7127, 0.7140, 0.7165, 0.7178],
        ]
        self._fastcam_test_assert(net, layers, inp, ex, scale, norm, ex)
```
Would be good to add a few more tests here with multiple layers provided to confirm behavior and aggregation of multiple interpolated layer results. A simple addition could be testing with both conv1 and relu1 of this model, as well as another model where the output shape is different for the 2 layers.
```python
        )
        weighted_maps = [[] for _ in range(bn)]  # type: List[List[Any]]
        for m, smap in enumerate(attributes):
            for i in range(bn):
```
Is this inner loop necessary? Can we just interpolate an input of N x 1 x H x W , which should be equivalent to interpolating each sample in the batch individually?
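Right, interpolation acts on each sample independently, so resizing the whole N x 1 x H x W batch in one call gives the same result as looping over samples. A quick pure-Python demonstration of that equivalence, using a nearest-neighbor resize as a stand-in for `F.interpolate`:

```python
def resize_map(smap, out_h, out_w):
    # Nearest-neighbor resize of one H x W map.
    in_h, in_w = len(smap), len(smap[0])
    return [[smap[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

def resize_batch(batch, out_h, out_w):
    # Resize an N x 1 x H x W batch in one pass. Because resizing touches
    # each sample independently, this matches a per-sample loop exactly,
    # so the inner loop over the batch dimension is unnecessary.
    return [[resize_map(sample[0], out_h, out_w)] for sample in batch]
```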
```
            False, return the weighted maps individually.
            Default: True
        resize_mode (str, optional): An argument to interpolation method for
            rescaling.
```
nit: Can we include options here or link to available options?
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Hello there, any updates on this one?

@vivekmig, is there a possibility you will merge this? If there is some work that needs to be done, LMK, I am more than willing to help. Thanks!
Hi Captum Developers,
We want to add FastCAM, an attribution method that uses information at the end of each network scale, combined into a single saliency map, to the current Captum repository. We implemented our method in a file named multiscale_fast_cam.py and created test cases for it. I have attached links to the method, including a Jupyter notebook demo, below.

Links:
- paper
- original repo
- jupyter notebook demo