
Adding FastCAM Functionality #442

Open
wants to merge 48 commits into master
Conversation

@ryanchankh (Contributor)

Hi Captum Developers,

We want to add FastCAM, an attribution method that uses information at the end of each network scale and combines it into a single saliency map, to the current Captum repository. We implemented our method in a file named multiscale_fast_cam.py and created test cases for it. I have attached links for the method, including a Jupyter notebook demo, below.

Links:
paper
original repo
jupyter notebook demo
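For readers skimming the thread, the method described in the paper can be sketched roughly as follows. All names here are illustrative, not the PR's actual API, and a plain per-channel standard deviation stands in for the SMOE Scale statistic used in the paper:

```python
import torch
import torch.nn.functional as F

def fastcam_sketch(layer_activations, out_hw, weights=None):
    """Rough sketch of the FastCAM idea: reduce each layer's activations to a
    2-D saliency map, normalize, upscale to the input resolution, and combine
    the per-layer maps with a weighted average."""
    maps = []
    for act in layer_activations:              # each act: (N, C, H, W)
        smap = act.std(dim=1, keepdim=True)    # (N, 1, H, W) channel statistic
        # min-max normalize per sample so maps from different layers are comparable
        n = smap.flatten(1)
        lo = n.min(dim=1, keepdim=True).values
        hi = n.max(dim=1, keepdim=True).values
        n = (n - lo) / (hi - lo + 1e-8)
        smap = n.view_as(smap)
        # upscale each layer's map to the target (input) resolution
        maps.append(F.interpolate(smap, size=out_hw, mode="bilinear",
                                  align_corners=False))
    w = weights or [1.0] * len(maps)
    return sum(wi * m for wi, m in zip(w, maps)) / sum(w)   # (N, 1, *out_hw)

acts = [torch.rand(2, 8, 4, 4), torch.rand(2, 16, 2, 2)]
combined = fastcam_sketch(acts, (8, 8))
```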

@vivekmig (Contributor)

No worries about the delay! I am happy to keep updating my current implementation with the newest codebase you have at master. For instance, I see that you just added Multi-layer support, which is exactly something that I can use, and I am happy to update that within this week.

I can also add FastCAM to the Awesome List for now. The implementation of FastCAM is actually fairly simple and straightforward. All the functions you will need are already implemented, and they should be easy to update if your guidelines change.

The results in the paper have already been reproduced using the implementation I had; you can see this in the demo linked above. I admit the test cases could be more thorough. I am happy to wait for your feedback and suggestions.

Thank you for getting back to me! For now, let me add FastCAM to the awesome list and update my code with the multi-layer support.

Awesome, sounds great! I have merged my multi-layer support diff, so you should be able to rebase with that now. We'll also take a closer look next week and provide more detailed code review comments.

In the meantime, the addition to the Awesome List looks good! I just have a minor style change request there; we can merge that soon. Thanks again for the great contribution 👍!

@ryanchankh (Contributor, Author)

I have updated the code such that it uses the multi-layer LayerActivation() code. What other updates should I add to my code to match the contribution guidelines?

Thanks~

@vivekmig (Contributor)

> I have updated the code such that it uses the multi-layer LayerActivation() code. What other updates should I add to my code to match the contribution guidelines?
>
> Thanks~

Thanks for updating with LayerActivation @ryanchankh ! I am finishing a review and will share comments soon, by the end of the week.

@vivekmig vivekmig left a comment

Sorry for the delay in getting back with feedback! Thanks again for adding this to Captum, it looks good overall @ryanchankh . I've added some comments on the API and tests, let us know if you have any questions.

self,
forward_func: Callable,
layers: ModuleOrModuleList,
norm: Any = "gamma",

nit: String

Note that currently it is assumed that either the input
or the outputs of internal layers, depending on whether we
attribute to the input or output, are single tensors.
Support for multiple tensors will be added later.

nit: Can update / remove these lines; LayerActivation already supports multiple tensors, but I don't think that's applicable for FastCAM. It would be good to document that limitation as well.

additional_forward_args: Any = None,
attribute_to_layer_input: bool = False,
) -> Tuple[Tensor, ...]:
r"""

For all other input attribution methods, we've generally followed the method of having attribute return attributions matching the input shape. Could we possibly keep that convention here to return the combined results and provide another instance method to obtain just the normalized activations for each layer?
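The convention being suggested could look something like the following; the class and method names are purely illustrative, not the PR's or Captum's actual API:

```python
from typing import List

import torch
import torch.nn.functional as F

class FastCAMLike:
    """Illustrative API shape only, not the PR's implementation."""

    def attribute(self, inputs: torch.Tensor) -> torch.Tensor:
        # Combined saliency, broadcast to the input's own shape (N, C, H, W),
        # matching the convention of other Captum input attribution methods.
        maps = self.normalized_activations(inputs)
        resized = [
            F.interpolate(m, size=inputs.shape[-2:], mode="bilinear",
                          align_corners=False)
            for m in maps
        ]
        combined = sum(resized) / len(resized)
        return combined.expand_as(inputs)

    def normalized_activations(self, inputs: torch.Tensor) -> List[torch.Tensor]:
        # One (N, 1, Hl, Wl) map per layer; stubbed with a single toy map here.
        return [inputs.mean(dim=1, keepdim=True)]

x = torch.rand(2, 3, 8, 8)
out = FastCAMLike().attribute(x)
```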


The recommended use case for FastCAM is to compute saliency maps for multiple
layers with different scales in a deep network, then combine them to obtain
a more meaningful saliency map for the original input.

Would be good to add more details summarizing the approach as well as explain any limitations, e.g. this is primarily intended for CNN architectures, or particularly where intermediate layers are spatially aligned with the input.


Some more details here would be great if possible!

th = k * m
return th

def _compute_gamma_norm(self, inputs):

It seems in the paper only normalizing with the Gaussian CDF is mentioned, could we possibly keep that as the default to be consistent?

It might also be worth considering whether it's essential to include this gamma normalization code here. It makes the implementation trickier to follow and is not essential to the core method. Can we possibly support only Gaussian normalization for now and not include this here? From the original repo, it seems that GammaNorm leads to only a very slight improvement over Gaussian normalization, but is substantially slower, and isn't mentioned in the paper.

Additionally, it seems that this code is very similar to the normalization code in the original repo (https://github.com/LLNL/fastcam/blob/master/norm.py). I'm not sure if this could cause any copyright issues, since the copyright header generally needs to be included for reproductions. cc: @NarineK
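For reference, the Gaussian normalization discussed here amounts to mapping each saliency value through the CDF of a normal distribution fit to the map's own statistics. A minimal sketch, not the PR's code (the paper uses a truncated-normal CDF, which this approximates with the plain Gaussian CDF):

```python
import torch

def gaussian_cdf_norm(smap: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map each saliency value through the CDF of a Gaussian fit to the
    per-sample mean and std of the map, yielding values in (0, 1).
    smap: (N, 1, H, W)."""
    flat = smap.flatten(1)
    mu = flat.mean(dim=1, keepdim=True)
    sd = flat.std(dim=1, keepdim=True) + eps
    z = (flat - mu) / sd
    cdf = 0.5 * (1.0 + torch.erf(z / 2.0 ** 0.5))  # standard normal CDF
    return cdf.view_as(smap)

x = torch.randn(2, 1, 4, 4)
y = gaussian_cdf_norm(x)
```

One appeal of this normalization is that it is monotonic per sample, so the saliency ordering of pixels is preserved while the values become comparable across layers.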

)
attributes = []
for layer_attr in layer_attrs:
smoe_attr = self._compute_smoe_scale(layer_attr)

Could we expose some baseline approaches here in addition to SMOE scale for combining channels? From a quick skim of the paper, it seems that standard deviation also performs reasonably well compared to other methods. It would be great to have the flexibility to experiment with different approaches of combining information from activations, particularly the simple baselines of standard deviation and mean.



class Test(BaseTest):
def test_one_layer_gamma(self) -> None:

It seems like these tests all hardcode expected attributions for BasicModel_ConvNet, which randomly initializes parameters. PyTorch doesn't guarantee that random seeds lead to consistent behavior across versions / releases, so this could potentially lead to issues in the future. It would be preferable to instead use simple models with fixed parameters to ensure reproducibility. It would also be great to manually confirm the expected result, which can be done with small models. A potential model that can be used here is BasicModel_ConvNet_One_Conv, which has fixed parameters for the convolution layer.
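To illustrate the fixed-parameter idea: with hand-set weights, expected values can be verified by hand rather than depending on a random seed. The toy model and numbers below are made up for this sketch, not taken from the PR or from BasicModel_ConvNet_One_Conv:

```python
import torch
import torch.nn as nn

class TinyFixedConv(nn.Module):
    """1-input-channel conv with hand-set weights so outputs are deterministic."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 2, kernel_size=2, bias=False)
        with torch.no_grad():
            self.conv.weight.fill_(1.0)  # every output = sum of a 2x2 window

    def forward(self, x):
        return self.conv(x)

inp = torch.arange(16.0).view(1, 1, 4, 4)
out = TinyFixedConv()(inp)           # (1, 2, 3, 3)
# Both channels are identical, so the channel mean equals each channel,
# and every output pixel is a 2x2 window sum that is easy to check by hand:
# top-left window 0+1+4+5 = 10, bottom-right window 10+11+14+15 = 50.
mean_map = out.mean(dim=1)
```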

ex["attributions"] = [
[
[
26.5969,

Just curious, how were these example results obtained? Also, when updating, it would be ideal to have mostly smaller examples, so the tests have fewer hardcoded values and would be easier to read and verify correctness (e.g. for LayerActivation here).

weights = [1.0, 1.0]
self._fastcam_test_assert(net, inp, ex, layers, norm, combine, weights)

def _fastcam_test_assert(

Would be good to include type hints here, as well as all utility methods in the main implementation.


from ..helpers.basic import BaseTest, assertTensorAlmostEqual
from ..helpers.basic_models import BasicModel_ConvNet


It would also be good to add DataParallel tests and ensure all functionality works appropriately on GPU. You can take a look at the test generator here: https://github.com/pytorch/captum/blob/master/tests/attr/test_data_parallel.py, which is based on this config: https://github.com/pytorch/captum/blob/master/tests/attr/helpers/test_config.py.
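As a rough illustration of the pattern (hypothetical test, separate from Captum's generator-based setup), a DataParallel check typically compares the wrapped and unwrapped model outputs and skips itself on CPU-only machines:

```python
import unittest
import torch

class DataParallelSmoke(unittest.TestCase):
    """Hypothetical smoke test; skipped automatically without CUDA."""

    @unittest.skipUnless(torch.cuda.is_available(), "CUDA required")
    def test_dp_matches_single_device(self):
        torch.manual_seed(0)
        model = torch.nn.Conv2d(1, 2, kernel_size=3).cuda()
        x = torch.rand(4, 1, 8, 8, device="cuda")
        dp = torch.nn.DataParallel(model)
        # DataParallel should not change the numerical result
        self.assertTrue(torch.allclose(model(x), dp(x), atol=1e-6))

suite = unittest.TestLoader().loadTestsFromTestCase(DataParallelSmoke)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```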


If possible, would be great if these DataParallel / GPU tests could be added as well, thanks!

@ryanchankh (Contributor, Author)

Hey @vivekmig, thank you for the feedback! And don't worry about the delays. Once I have everything fixed up and added, I'll let you know again in this thread.

@vivekmig (Contributor)

Hi @ryanchankh , just wanted to follow up on this, will you still be able to make the updates to this PR? Thanks! cc: @NarineK

@ryanchankh (Contributor, Author)

Sorry I got caught up with some of my other projects this past month. I am aiming to finish updating this pull request by the end of this month. Is this timeline okay?

@vivekmig (Contributor)

> Sorry I got caught up with some of my other projects this past month. I am aiming to finish updating this pull request by the end of this month. Is this timeline okay?

No problem, sure, that sounds great, thanks!

@facebook-github-bot (Contributor)

Hi @ryanchankh!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@ryanchankh (Contributor, Author)

ryanchankh commented Jan 16, 2021

Hi @vivekmig, I recently updated the captum/attr/_core/multiscale_fast_cam.py file based on your suggestions from above. In short, these are the changes:

  1. I added the Gaussian normalization method and made it the default choice for normalization.
  2. I added the mean, std, and normal entropy scaling methods. This addresses the suggestion of having comparisons between different scaling methods and the SMOE scale.
  3. I also modified the code so that the output shape of the attribute method matches the shape of the input. Essentially, there is an argument called combine (defaults to True) that tells the method to combine the different scales. If the user wants the individual scaled maps, one can simply set combine=False. Hopefully this better aligns with the code philosophy of this repository.
  4. Since the method has changed quite a bit, I also modified the test cases and used smaller inputs for better readability. The expected outputs are based on the original repo (link), implemented by the original author Nathan. The test cases are quite simple, so please feel free to suggest any more comprehensive test cases. Most test cases I created reference the ones from other attribution methods such as GradCAM.
  5. I also updated the docstring for the method. Since there are more arguments, please kindly look over them to see if they are clear to the user.

As to the copyright issue, I am actually an employee at LLNL, working with the original authors, so there shouldn't be any copyright issues from the LLNL side. But if there is anything we need to do to ensure everything is okay legally, I am willing to make any changes.

Last but not least, I am having a hard time passing some of the automatic tests. It seems like it has something to do with the files I didn't touch. What is the best way to resolve these issues?

Sorry for the long wait! And we really appreciate the feedback! Thank you!

@vivekmig vivekmig left a comment

Thanks for addressing the comments @ryanchankh , this looks great :) ! I've added a few more minor comments. @NarineK may also take another look to provide some additional feedback.

Regarding the CircleCI tests, I think the errors are related to issues we have fixed previously, can you try pulling the latest master into your branch? That should likely resolve the issues.


The recommended use case for FastCAM is to compute saliency maps for multiple
layers with different scales in a deep network, then combine them to obtain
a more meaningful saliency map for the original input.

Some more details here would be great if possible!

r"""
Args:

inputs (tensor or tuple of tensors): Input for which attributions

Below, it looks like inputs should only be a single tensor based on

bn, channels, height, width = inputs.shape

If so, can we update this docstring and the type hint to reflect this?

are provided, the examples must be aligned appropriately.
scale (str, optional): The choice of scale to pass through attributes.
The available options are:


Can we add some detail to clarify that all these methods are different ways of combining elements along the channel dimension (for each spatial location independently), reducing the size from N x C x H x W to N x 1 x H x W? This might also be clarified by more detail in the algorithm description explaining the scale / normalization steps applied to layer activations prior to interpolation and weighted summation.
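Concretely, each scale option is a channel-wise reduction applied independently at every spatial location. A sketch of a few simple variants follows (the SMOE Scale formula itself is in the paper and omitted here; the function name is illustrative):

```python
import torch

def reduce_channels(act: torch.Tensor, scale: str = "std") -> torch.Tensor:
    """Collapse (N, C, H, W) activations to an (N, 1, H, W) saliency map by
    applying a statistic across the channel dimension at each location."""
    if scale == "std":
        return act.std(dim=1, keepdim=True)
    if scale == "mean":
        return act.mean(dim=1, keepdim=True)
    if scale == "max":
        return act.max(dim=1, keepdim=True).values
    raise ValueError(f"unknown scale option: {scale}")

act = torch.rand(2, 16, 7, 7)
maps = {s: reduce_channels(act, s) for s in ("std", "mean", "max")}
```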

Args:
forward_func (callable): The forward function of the model or any
modification of it
layers (torch.nn.Module or listt(torch.nn.Module)): A list of layers

nit: list

else:
msg = (
f"{norm} norming option not found or invalid. "
+ "Available options: [gamma, normal, None]"

nit: Options should be gaussian, identity?

else:
msg = (
f"{scale} scaling option not found or invalid. "
+ "Available options: [smoe, std, mean, normal]"

nit: Missing some options, e.g. max, identity, etc.?


from ..helpers.basic import BaseTest, assertTensorAlmostEqual
from ..helpers.basic_models import BasicModel_ConvNet


If possible, would be great if these DataParallel / GPU tests could be added as well, thanks!

[0.5515, 0.5881, 0.6614, 0.6980],
[0.7127, 0.7140, 0.7165, 0.7178],
]
self._fastcam_test_assert(net, layers, inp, ex, scale, norm, ex)

Would be good to add a few more tests here with multiple layers provided to confirm behavior and aggregation of multiple interpolated layer results. A simple addition could be testing with both conv1 and relu1 of this model, as well as another model where the output shape is different for the 2 layers.

)
weighted_maps = [[] for _ in range(bn)] # type: List[List[Any]]
for m, smap in enumerate(attributes):
for i in range(bn):

Is this inner loop necessary? Can we just interpolate an input of N x 1 x H x W , which should be equivalent to interpolating each sample in the batch individually?
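To illustrate the point: `F.interpolate` already operates batch-wise, so interpolating the full N x 1 x H x W map in one call matches interpolating each sample separately:

```python
import torch
import torch.nn.functional as F

smap = torch.rand(4, 1, 7, 7)  # batch of per-sample saliency maps
batched = F.interpolate(smap, size=(28, 28), mode="bilinear",
                        align_corners=False)
# Equivalent per-sample loop, as in the inner loop under discussion
looped = torch.cat([
    F.interpolate(smap[i : i + 1], size=(28, 28), mode="bilinear",
                  align_corners=False)
    for i in range(smap.shape[0])
])
```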

False, return the weighted maps individually.
Default: True
resize_mode (str, optional): An argument to interpolation method for
rescaling.

nit: Can we include options here or link to available options?

@facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!


@krebso

krebso commented Aug 26, 2023

Hello there, any updates on this one?

@krebso

krebso commented Sep 4, 2023

@vivekmig is there a possibility you will merge this? If there is some work that needs to be done, LMK, I am more than willing to help. Thanks
