
Use FX to have a more robust intermediate feature extraction #3597

Closed
fmassa wants to merge 6 commits

Conversation

@fmassa (Member) commented Mar 23, 2021

No description provided.

@nairbv (Contributor) commented Mar 29, 2021

    It has a strong assumption that the modules have been registered
    into the model in the same order as they are used.
    This means that one should **not** reuse the same nn.Module
    twice in the forward if you want this to work.

    Additionally, it is only able to query submodules that are directly
    assigned to the model. So if `model` is passed, `model.feature1` can
    be returned, but not `model.feature1.layer2`.

Does this approach resolve both of those constraints?

@fmassa (Member, Author) commented Mar 29, 2021

@nairbv Yes, this FX-based approach addresses both of those constraints of the current implementation in torchvision (under the assumption that FX can symbolically trace the model).
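
For later readers, here is a minimal sketch of the FX-based idea (extract_features is a hypothetical helper, not the exact code in this PR): trace the model, rewire the graph output to return the requested intermediate nodes, and prune everything downstream of them. Nested qualified names are addressable, and module registration order no longer matters.

import torch
import torchvision
from torch import fx, nn

def extract_features(model: nn.Module, return_nodes: dict) -> fx.GraphModule:
    traced = fx.symbolic_trace(model)
    requested = {}
    for node in traced.graph.nodes:
        # call_module targets are fully qualified names, so nested submodules
        # such as 'layer1.0.conv2' can be requested directly.
        if node.op == "call_module" and node.target in return_nodes:
            requested[return_nodes[node.target]] = node
    for node in reversed(traced.graph.nodes):
        if node.op == "output":
            node.args = (requested,)  # return a dict of the intermediates
            break
    traced.graph.eliminate_dead_code()  # drop layers past the last requested node
    traced.recompile()
    return traced

m = extract_features(torchvision.models.resnet18(),
                     {'layer1.0.conv2': 'feat1', 'avgpool': 'feat2'})
out = m(torch.rand(1, 3, 224, 224))  # {'feat1': ..., 'feat2': ...}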

@datumbox (Contributor) left a comment

Sorry for snooping around before you mark the PR as complete; hope you don't mind. This feature is going to be super useful for some of the things I'm looking after, so I wanted to have an early sneak peek. 😄

I like the approach. Below I've just highlighted a few corner cases. Let me know what you think.

# Get output node
orig_output_node: Optional[torch.fx.Node] = None
for n in reversed(m.graph.nodes):
    if n.op == "output":
@datumbox (Contributor):

What happens in cases where we have multiple outputs (for example Inception3, which has auxiliary outputs)? It seems that FX has another node, called inception_outputs:

>>> list(reversed(m.graph.nodes))
[output, inception_outputs, fc, flatten_1, dropout, ....]

You can see this by swapping in this input in your test:

        model = torchvision.models.inception_v3(pretrained=False)
        return_layers = {'Mixed_7c': '0', 'avgpool': '1'}

def test_old_new_match(self):
    model = torchvision.models.resnet18(pretrained=False)

    return_layers = {'layer2': '5', 'layer4': 'pool'}
@datumbox (Contributor):

FYI, it fails when we include the final output in the return layers:

Suggested change
return_layers = {'layer2': '5', 'layer4': 'pool'}
return_layers = {'layer2': '5', 'layer4': 'pool', 'fc': 'fc1'}

with:

E       RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x1 and 512x1000)

@fmassa (Member, Author):

Thanks for the catch! I need to check more carefully, but the old implementation doesn't work in this case because of the torch.flatten call (which is not an nn.Module); I believe this should work with the new implementation, though. To be verified.

>>> [('feat1', torch.Size([1, 64, 56, 56])),
>>> ('feat2', torch.Size([1, 256, 14, 14]))]
"""
# TODO come up with a better name for this
@datumbox (Contributor):

I think `return_layers` is fine. I understand it remaps, but it still stores the mapping of the returned layers.

@mthrok (Contributor) commented Jun 7, 2021

@fmassa

I was exploring ways to achieve something similar and found this PR. The approach below looks simpler:

https://discuss.pytorch.org/t/how-can-l-load-my-best-model-as-a-feature-extractor-evaluator/17254/6

Is there an advantage/disadvantage to this PR's approach?

@fmassa (Member, Author) commented Jun 7, 2021

Hi @mthrok

That's a good question; in fact I implemented a similar solution 4 years ago (although it looks like I don't have the code lying around anymore).

I've discussed using hooks to get intermediate features in pytorch/pytorch#21064, but hooks have the following drawbacks compared to the approach in this PR:

  • we always have to forward the whole model, even though we may not need all of it (wasting compute)
  • we always keep all the model parameters around
  • in principle, we might want to return elements which are not outputs of an nn.Module, but intermediate values inside one

From those perspectives, hooks can get us many of the things we are looking for, but if the model can be FX-traced, the FX-based approach can be a bit more powerful.
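
For reference, here is a minimal sketch of the hook-based alternative from the forum thread (extract_with_hooks is an assumed name, not code from this PR); note that the full forward always runs:

import torch
import torchvision
from torch import nn

def extract_with_hooks(model: nn.Module, layer_names, x):
    # Register a forward hook on each requested submodule, run the *entire*
    # forward pass, then collect what the hooks captured.
    features, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        hook = lambda mod, inp, out, name=name: features.__setitem__(name, out)
        handles.append(modules[name].register_forward_hook(hook))
    model(x)  # the whole model runs, even the layers we don't need
    for handle in handles:
        handle.remove()
    return features

feats = extract_with_hooks(torchvision.models.resnet18().eval(),
                           ['layer2', 'layer4'], torch.rand(1, 3, 224, 224))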

@mthrok (Contributor) commented Jun 7, 2021

    Hi @mthrok

    That's a good question; in fact I implemented a similar solution 4 years ago (although it looks like I don't have the code lying around anymore).

    I've discussed using hooks to get intermediate features in pytorch/pytorch#21064, but hooks have the following drawbacks compared to the approach in this PR:

      • we always have to forward the whole model, even though we may not need all of it (wasting compute)
      • we always keep all the model parameters around
      • in principle, we might want to return elements which are not outputs of an nn.Module, but intermediate values inside one

    From those perspectives, hooks can get us many of the things we are looking for, but if the model can be FX-traced, the FX-based approach can be a bit more powerful.

I see. Thanks for the clarification.

@ppwwyyxx (Contributor) commented

This looks pretty useful! Any plans on pushing it further?

@alexander-soare (Contributor) commented Jul 15, 2021

@datumbox @fmassa I'm trying to do something similar for the timm library (cc @rwightman). Right now it uses two kinds of approaches: hooks and something along the lines of IntermediateLayerGetter. But ultimately the FX approach seems the most flexible/robust, as you say.

Unfortunately, only 42% of the models are traceable in their current state, with control flow being the most frequent blocker (and that's just from catching the first error). Other frequent issues are tensor constructors that need concrete arguments, and places where we make use of Tensor.shape.

Do you know of any workarounds for these limitations of symbolic tracing that could be applied without having to touch the models (much)? For instance, using concrete args (I tried, but the "concreteness" gets washed away when we trace through non-custom modules), forwarding a fully representative set of inputs for building up control-flow paths, or customising the tracer class to deal with problem nodes?

I also wonder if there are any near-future developments in the pipeline that will help with this.

Thanks!

@fmassa (Member, Author) commented Aug 12, 2021

Hi,

Sorry @ppwwyyxx and @alexander-soare for the delay in replying; I missed the notifications as I was on holiday.

@ppwwyyxx yes, we would like to get this finalized and merged in torchvision sometime soon. Currently all classification models in torchvision work with this approach (and thus detection / segmentation models can be adapted as well), but as @alexander-soare pointed out there are many models in the community that wouldn't work out of the box.

I do have some ideas on how to push the FX-based approach to work for all models (with some caveats on what can be obtained). The main idea is as follows (see the sketch after this list):

  • use FX to recursively trace each module
  • if a module can't be traced (e.g., due to control flow or an unsupported feature), do not trace inside the module but instead keep the module as a leaf

This approach would enable all models to be traced, with the caveat that in the worst possible case the whole model would be a leaf node (and thus we would only be able to get its output and no other intermediate activation -- this case should be rare though).
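
A rough sketch of that fallback idea (LeafFallbackTracer is hypothetical; a real version would need to probe modules more carefully, and if the top-level forward itself is untraceable this still fails, matching the worst case above):

import torch.fx as fx
from torch import nn

class LeafFallbackTracer(fx.Tracer):
    # If a submodule can't be traced on its own, keep it as an opaque leaf
    # instead of failing the whole trace.
    def is_leaf_module(self, m: nn.Module, qualname: str) -> bool:
        if super().is_leaf_module(m, qualname):
            return True  # standard leaves (nn.Conv2d, nn.Linear, ...)
        try:
            fx.symbolic_trace(m)  # probe: does this submodule trace cleanly?
            return False  # traceable, so recurse into it as usual
        except Exception:
            return True  # control flow etc.: keep the whole module as a leaf

def robust_trace(model: nn.Module) -> fx.GraphModule:
    tracer = LeafFallbackTracer()
    return fx.GraphModule(model, tracer.trace(model))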

Thoughts on this approach?

@alexander-soare (Contributor) commented Aug 12, 2021

@fmassa thanks for that. I actually went ahead and implemented some of those ideas. In case you find it useful, here's a somewhat outdated write-up from a different branch than the one I'm currently working on.

One thing I'm still trying to work out is how to make model.train() and model.eval() retain their effect when there is control flow based on model.training.

@fmassa (Member, Author) commented Aug 12, 2021

@alexander-soare Nice! Do you think you could work on getting your code from https://github.com/alexander-soare/pytorch-image-models/blob/fx-feature-extract-new/timm/models/fx_features.py to be submitted as a PR to torchvision when it's ready?

@fmassa (Member, Author) commented Aug 12, 2021

About your question:

    One thing I'm still trying to work out is how to make model.train() and model.eval() retain their effect when there is control flow based on model.training.

I can reach out to some folks on the FX team to figure out a possible approach. It would probably involve tracing the model twice with different flags and stitching the results together in FX somehow.
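
For reference, a sketch of the twice-tracing idea (DualModeModule is not this PR's code; it relies on self.training being a concrete bool during symbolic tracing, so each trace bakes in one branch):

import torch.fx as fx
from torch import nn

class DualModeModule(nn.Module):
    # self.training is a plain bool at trace time, so control flow on it is
    # resolved during tracing: trace once per mode and dispatch at runtime.
    # Both traced graphs share the original model's parameters.
    def __init__(self, model: nn.Module):
        super().__init__()
        model.train()
        self.train_graph = fx.symbolic_trace(model)  # train-mode branch baked in
        model.eval()
        self.eval_graph = fx.symbolic_trace(model)   # eval-mode branch baked in

    def forward(self, x):
        graph = self.train_graph if self.training else self.eval_graph
        return graph(x)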

@alexander-soare (Contributor) commented

@fmassa I would love to make that PR. What are the reasons this one hasn't gone forward, so I can make sure I'm addressing them?

@rwightman (Contributor) commented

@fmassa this approach for feature extraction feels like it's close to usable; it would be great to hash out the final details and smooth out some of the wrinkles (mostly due to tracing limitations, control flow, etc.). I'd like to add @alexander-soare's work here to timm, but there is still some testing to do, and we need to determine whether there will be any show-stoppers to using it for downstream tasks like object detection and segmentation (undue constraints on users of the downstream models w.r.t. scripting, training, exporting, checkpoint saving/loading, etc.).

Also, on the timm end I need to spend some time figuring out how to better specify the interface for selecting the features to use in different use cases (feature pyramid, attention maps, arbitrary taps, etc.) for each model. It'd be good to know what you'd like the torchvision API for such functionality to cover, so I can roughly match it and eventually use the torchvision code w/ timm feature specs...

@fmassa (Member, Author) commented Aug 13, 2021

@alexander-soare

    What are the reasons this one hasn't gone forward, so I can make sure I'm addressing them?

There were only minor reasons that we didn't get this merged in torchvision yet (for classification models at least).

  • The current module-to-node assignment that we do here has a few rough edges. One node can belong to multiple modules (i.e., layer0.3.2, layer0.3 and layer0 can represent the same tensor), but in the current approach only the last one is valid (i.e., layer0 will work, but layer0.3 and layer0.3.2 won't be visible). I would have liked to fix this before getting this merged, but I went on holiday for a few weeks and didn't get to finish it.
  • Although all models in torchvision work with this approach, I had given it a quick try on timm models and, as you noted, many models wouldn't work out of the box. So I was wondering if I should first adapt the tracing to make it work for timm models, or just go ahead with the v1 version and improve it over time.

I think we can get started with just fixing the first point I mentioned, and collaboratively work to get the more robust tracing working.

@rwightman ultimately I would love to see a generic solution for pytorch/pytorch#21064, and I think using FX can be a way to get there, including for detection / segmentation models.
I do think we should go step by step here, though. Even though FX allows querying arbitrary nodes in the computation graph, it doesn't (yet?) guarantee that the names will be consistent across versions. So querying arbitrary features (which are not the output of an nn.Module) is possible, but will probably not be "officially" supported until we come up with some more guarantees.

    Also, on the timm end I need to spend some time figuring out how to better specify the interface for selecting the features to use in different use cases (feature pyramid, attention maps, arbitrary taps, etc.) for each model. It'd be good to know what you'd like the torchvision API for such functionality to cover, so I can roughly match it and eventually use the torchvision code w/ timm feature specs

I've been trying to avoid specifying / exposing what the different "levels" of a model should be in torchvision, because it is a rather arbitrary decision, and ultimately it's up to the user to decide what works best for their application.
The question was somewhat easier to answer with resnet-style models because of the different "stages" (so one could just assume that the output of a stage is what we want), but there are newer models like ViT where the definition of a stage is far less clear.

My take is that the user should just specify a list of strings corresponding to the modules they want to gather information from, and that we maybe provide a helper function that prints / returns all possible layer names for a given model.
Something along the lines of:

model = resnet50()

possible_layers = get_all_layer_names_in_execution_order(model)
# now take some of the layers, proportional to the number of layers
n = len(possible_layers)
layers = [possible_layers[int(n * frac)] for frac in [0.25, 0.5, 0.75]]

new_model = get_intermediate_layers(model, layers)

which allows us to be "somewhat" generic.
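
The helpers above are hypothetical, but for a traceable model get_all_layer_names_in_execution_order could be as simple as listing the call_module targets in graph order (a sketch):

import torch.fx as fx
from torch import nn

def get_all_layer_names_in_execution_order(model: nn.Module):
    # Graph nodes are stored in execution order, and call_module targets are
    # fully qualified submodule names ('layer1.0.conv1', ...).
    graph = fx.symbolic_trace(model).graph
    return [node.target for node in graph.nodes if node.op == "call_module"]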

For models that have more structure, this metadata can also be present within the model itself (like a .stages attribute or something like that) which returns the layer names for the different stages.
This leaves room for the user to use some pre-selected layers, while keeping the flexibility to choose something else if they want.

Thoughts?

@alexander-soare (Contributor) commented

@fmassa I believe my implementation covers your point 1. I actually got rid of this line from your implementation, meaning you won't get layer0, as it's not a leaf. Then, if the user specifies a truncated qualified name like layer0, the intermediate_layer_getter will pick the last matching node in order of execution (so maybe layer0.3.2). Currently this behaviour is silent, so we might need to come up with a nice way to make sure the user knows it's happening.
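
A toy illustration of that resolution rule (resolve_truncated_name is a hypothetical helper, just to pin down the behaviour described above):

def resolve_truncated_name(node_names, query):
    # A truncated qualified name resolves to the last matching node in
    # execution order, e.g. 'layer0' -> 'layer0.3.2'.
    matches = [n for n in node_names if n == query or n.startswith(query + '.')]
    return matches[-1]

assert resolve_truncated_name(
    ['layer0.0', 'layer0.3.1', 'layer0.3.2', 'fc'], 'layer0') == 'layer0.3.2'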

Regarding your get_all_layer_names_in_execution_order, you can get that with print_graph_node_qualified_names from my branch:

import timm
from timm.models.fx_features import print_graph_node_qualified_names

model = timm.create_model('resnet50')
print_graph_node_qualified_names(model)

Still though, there are probably many unknown things to smooth out which will only become apparent when it's applied in a variety of use cases.

So, I'd suggest that timm could be an iterative test bed to start with. We could implement it with the foresight that there will be a "generic" tool which requires the user to specify the node names. Then around that we can wrap @rwightman's interface. Then, once we're ready, and if it makes sense, we can move the generic core to torchvision, and it should just be a matter of changing the import paths in timm to point to torchvision. @fmassa this would just mean that we need to stay connected on the timm end to make sure it converges towards what's required for torchvision (or not: maybe we decide it differs at some point and we need to fork it, in which case I'd be happy to continue working on it in torchvision as well).

@rwightman @fmassa does that arrangement sound like it could work?

@fmassa (Member, Author) commented Aug 13, 2021

This makes sense to me, if it means you'll be able to move faster on this front.
Ultimately, I would love it if we could join efforts to get some generic tooling out for our users.

Depending on how generic the implementation is, I think it could even live within PyTorch's fx folder as a set of helper functions, as this is the type of feature that I think could also be used in torchtext / torchaudio / etc.

@alexander-soare (Contributor) commented Aug 21, 2021

@fmassa nevertheless, I've gone ahead and made a draft to help keep this moving along. I know it's a bit of a U-turn from my suggestion above, but I realised it's mostly done... I've moved the conversation there.

@fmassa (Member, Author) commented Sep 8, 2021

Superseded by #4302

@fmassa fmassa closed this Sep 8, 2021
@fmassa fmassa deleted the intermediate_layer_getter_2 branch September 8, 2021 08:46