
Unsupported OpTypes #69

jbrea opened this issue Mar 15, 2022 · 5 comments


jbrea commented Mar 15, 2022

Cool package!

Playing around with it, I ran into OpType not supported errors, e.g. when I tried loading https://github.com/onnx/models/raw/main/vision/classification/efficientnet-lite4/model/efficientnet-lite4-11.onnx (Transpose) or https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx (Clip).
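
For reference, this is roughly what I ran (a minimal sketch; I'm assuming the package's load function here and paraphrasing the error message):

using ONNXNaiveNASflux, Downloads

path = Downloads.download("https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx")
model = ONNXNaiveNASflux.load(path)
# => throws an error about the OpType (Clip in this case) not being supported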

How much work would it be to add support for further OpTypes?

@DrChainsaw (Owner)

How much work would it be to add support for further OpTypes?

A large number of OpTypes are trivial to implement, and I'm pretty sure that Clip is one of them. The reason why not a lot of those trivial OpTypes are implemented is just that I like to keep the amount of code low to make refactoring easier, as I'm not 100% happy with the internal structure and scope of the package.
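
To give an idea of what trivial means here, Clip as a pure function is basically just a broadcasted clamp, something like the rough sketch below (attribute/default handling simplified, and the hookup into this package's deserialization code omitted):

# Rough sketch of the Clip op as a plain Julia function.
# min/max come from the ONNX node (attributes in older opsets, inputs in newer ones).
clip(x; min=-Inf, max=Inf) = clamp.(x, min, max)

clip([-2.0, 0.5, 3.0]; min=0.0, max=1.0) # => [0.0, 0.5, 1.0]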

Transpose also seems trivial in principle, except that this package has the additional functionality that users shall be able to expect that graph mutation operations (e.g. changing the size of a parameter array) keep all parameters in the model consistent in shape, and that the mutated graph represents the same function to the largest possible extent.

For many "traditional" neural network layers this is already built into NaiveNASflux, so no further action is needed, but Transpose unfortunately does not fall into that category. One would need to add MILP constraints to fulfil the above for it, and that might be very non-trivial depending on which axes are permuted.
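
To be clear, the forward pass itself is not the hard part; as a plain function, Transpose is just a permutedims with the perm attribute, roughly as sketched below (ignoring that ONNX and Julia use reversed dimension orders, which a real implementation has to account for). The hard part is making the size constraints aware of the permutation.

# Rough sketch of the Transpose op as a plain function; dimension order conventions are ignored here.
onnx_transpose(x, perm) = permutedims(x, perm)

size(onnx_transpose(rand(2, 3, 4), (3, 1, 2))) # => (4, 2, 3)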

It's not a religious issue for me though, and I suppose I could accept breaking this aspect (perhaps by displaying a warning if the user mutates the model in a way that hits an Op without mutation support).


jbrea commented Mar 16, 2022

Thanks a lot for the detailed response! Meanwhile I also stumbled over FluxML/ONNX.jl#49. What makes more sense: adding OpTypes here or in ONNX.jl (which has the same issues)?

@DrChainsaw (Owner)

Thanks for showing interest!

What makes more sense: adding OpTypes here or in ONNX.jl (which has the same issues)?

This might be a bit of a tangent, but hopefully it clarifies things:
The main difference between this package and ONNX.jl is that this package tries to convert an ONNX model to a Julia model using the layer definitions from Flux, while ONNX.jl has the goal of making Julia AD-compatible models without the constraint of using Flux definitions. In some sense, ONNX.jl is pretty much the equivalent of a new framework which happens to be designed based on the ONNX spec (although it can and does leverage NNlib quite a bit to make it slightly less daunting than it might appear).

I would say that the approach used in this package is doomed to hit a dead end before the whole ONNX spec is covered, while ONNX.jl should not have this problem. As a simple example, the ONNX spec is defined so that it is perfectly valid to have a layer which acts as Flux.Conv but has both the weights and what Flux considers the input as trainable parameters, while the bias vector is the output of another layer in the model. That does not necessarily make the package useless, as it might still turn out that the set of models it can handle is large enough to be mostly useful.

I'm still maintaining this package, but as described above I'm trying to keep the feature growth to a minimum so that any eventual refactoring will be easier. One way I see things evolving is that this package ends up depending on ONNX.jl for most of the work and just adds some NaiveNASflux-specific things, so that e.g. NAS seeded by ONNX models is something the package is capable of.

Currently this is far off, so right now, if one wants to load the models from the OP, my guess is that the path of least resistance is to add them here. Let me know if you have use for them and I can add the missing OpTypes. I got the impression that you were just trying out the package, and for that purpose I would be less motivated to add them :)


jbrea commented Mar 16, 2022

Wow, thanks for your answer! I don't have an urgent need for the missing OpTypes. I wrote this little tutorial on transfer learning for my students (and I wasn't really sure whether I should use ONNX or this package; I had both versions at some point). My students may wonder at some point whether it is possible to load other models and then run into missing OpTypes. For a moment I considered adding some OpTypes myself, but for now I'm fine with telling the students that there would be a little bit of work to be done if they ever run into this issue.

@DrChainsaw (Owner)

Feel free to suggest that your students file issues and PRs here if they are interested. I'll try to make it a good experience (as long as they stay away from transformers so I don't need to think about Gather).

Tutorial looks very good btw.

I kinda wish I had made replacing the last layer a simpler thing in NaiveNASlib, given that this is what you do in 99.9% of transfer learning cases (much less so in NAS though). For now, one needs to do something like this:

# Assumes `resnet` is a CompGraph that was loaded from an ONNX file earlier
using Flux, NaiveNASflux # assumed to be what provides Dense, fluxvertex and the graph API used below
using NaiveNASlib.Advanced # To bring strategies into namespace

nlabels = 100
newmodel = deepcopy(resnet)

# Remove the old head.
# We don't want NaiveNASlib to automatically change the size so that the model keeps the same output shape.
remove!(newmodel[end], RemoveStrategy(NoSizeChange()))
# Insert a new head, or maybe do this after training to make it easier to store the input to the head
insert!(newmodel[end-1], v -> fluxvertex(Dense(nout(v), nlabels), v))

# The output of the layer before the last one is the new layer we added, ugh...
newmodel = CompGraph(inputs(newmodel), outputs(newmodel[end-1]))

newhead = newmodel[end]
# Train newhead only...

I suppose one could also just resize the last layer and use whatever is left as initialization (or just re-init the params):

newmodel = deepcopy(resnet)

# Change the output size of the last vertex to nlabels; NaiveNASlib keeps the rest of the graph consistent
Δnout!(newmodel[end], nlabels - nout(newmodel[end]))

nout(newmodel[end])
# returns 100 (i.e. nlabels)

Note that the above works even if the last layer is just a (flattened) global pooling layer since NaiveNASlib propagates the size change.
