-
-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor repo with more complete models and documentation #70
Refactor repo with more complete models and documentation #70
Conversation
Currently tests are failing without FluxML/Flux.jl#1305 |
Nice, great work!. However, because there is missing functionallity from previous version, I think it could be better to update it in another branch, and when the missing functionality (specially the pre-trained weights and missing models) merge it with master. It is a topic of style, but I way it to avoid problems to people that could be using #master version. Don't think I don't believe that MetalHead should be changed, I'm convinced that it should be clearly improved.But we must avoid temporarily generating problems to users. |
I guess I can't mark a PR as a draft after it's already been created? But I added a disclaimer at the top of the PR. I agree, we shouldn't merge this into |
(There should be a little text field under the reviewers area marked "convert to draft") |
Why is this removing densenet and squeezenet? |
It isn't. I just haven't updated the PR to have them which is why I marked it as WIP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rishabhvarshney14 looks good. I added a couple of comments to use utility functions instead of repeating code.
Are you interested in working on the other remaining models (SqueezeNet and DenseNet)?
A comment about the commit about inception, thank you for that, sorry but I do not know how to comment the commit, I think the test has a small error, because in the test the model is googlenet, not inception. |
Good catch! @rishabhvarshney14 can you address the test error too? |
Awesome, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! It looks great. After these changes, I'll spin some cycles on my GPU to add pre-trained weights.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks. I'll add the pre-trained weights, and then we can merge this PR!
@darsnack, referencing this comment, do you mean to train the models from scratch on Imagenet dataset myself or writing the code for training? Since it is such a massive dataset, It would be difficult for me to train it myself but I can surely write the code which can be used by you/another contributor with the appropriate resources to train the model. |
Yeah even the code would help. Can you run for a few epochs? If that's feasible, then just doing that and posting a gist of the script that trains the models will allow me to just run that script on GPU machine. Then we'd have the trained weights in short order. |
@darsnack, I have created the gist for training alexnet using the architecture here but haven't tried running it though. |
Thanks @cyborg1995, I'll give it a go tonight and report back. |
@darsnack @cyborg1995 Sorry if I misunderstand this but in Transformers.jl it uses pretrained weights of Hugging Face models to provide pretrained models in Julia/Flux. It might be better if we use pretrained weights of other models and use it here instead of training it from the start. |
@rishabhvarshney14 Unfortunately, that isn't a simple translation since a Flux model and a PyTorch model aren't the same structure. I think it's an easier "once and done" to just train from scratch. It's just a question of someone with GPUs (e.g. me) to have the bandwidth to do it. |
Actually I take that back. There would be a lot of parsing work involved to map the pre-trained weights from PyTorch to the corresponding layers in the Flux models. If someone wants to attempt to tackle that, then that would be great. Otherwise, I'll have to find some time to train these models later this weekend. |
Not quite. It's easy to read the weights in with ONNX.jl, but since ONNX serializes to more primitive operations, you need some translation code to figure out which higher level Flux layer contains the convolution etc. that you read from ONNX. |
I think doing it from ONNX would be easier. I'll try to figure it out. |
@cyborg1995 You might look at ONNXmutable.jl which is more up to date/maintained. If you do figure out some translation code that works, then do you mind sharing it in this issue? Having a good translation flow from ONNX to Flux is one of our goals. |
I think I'd prefer to have the backbone and classifier separated since there are many models that drop the classifier to use resnet as a backbone, so this should make it easier. Any reason why we couldn't achieve the same results of "indexing" using structs. It seems much more Julian. |
It would still be easy to drop the classifier. I would argue the struct approach is not Julian and more Python. Defining tons of little classes or wrapper classes is very PyTorch-like. You achieve the same "indexing" result, but if all you need is named indexing, then I don't see why you wouldn't use a More importantly, going with the struct solution means we only solve the naming problem for these models in this repo. It isn't a flexible solution that helps anyone else. I'd say that on top of anything is the best reason to solve this more generically. |
Well if we see Chains everywhere anyway, its harder to tell what exactly is happening in the forwards pass, and named tuples aren't great for when you have large objects inside them. structs are good for containerisation and labelling, printing etc. It makes it easier to track how something is implemented too. I agree we should have a nicer solution to avoid the boilerplate, which is where the functional aspect is better. Not a fan of boilerplate here :) |
Its good that Functors can already handle most of the plumbing we would want from these models |
We might need to ease up on the batch size for CI. Could we make it 2 or something? |
Is there a performance degradation when using I'm still a bit unconvinced about this issue. Can we solve this issue in another PR to Metalhead or Flux (depending on which solution we select)? Right now, the outermost object is a struct. It's possible to swap out parts of the model easily like I described above. We can always enrich the repo later instead of bikeshedding on this issue now. |
Wouldn't you want a separate show method for different containers? Looking at Chains is the bottleneck to me. I'd love to see this in once CI is happy. We should try to bring in bindings for datasets etc back and have docs using FluxML - we can do that later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be too much for GA to handle large models on CI - we'll set up a buildkite but in the meanwhile.
Thanks all! I'll let this be on master for a couple days before releasing. |
Great! I'll update the install instructions in FastAI.jl as soon as a new release is tagged. |
Any chance we could a new release tagged? This was merged quite a while ago. |
I also want to know the status of the training pipeline for pretrained weights as a todo. I plan on releasing this this week or next. |
We also need to get rid of some unnecessary typing for example Metalhead.jl/src/squeezenet.jl Line 60 in 3f458cd
|
So would you want to just leave them untyped? |
I think it would be fine to have some kind of helpful hints that are useful for dispatch, but avoid a massive type when its not necessary for inference. Currently, this would yield a massive type, which slows down the compiler and type inference significantly. Since the layers themselves are typed already, we get the inference either way. In fact, we may want to consider dropping the type param from Chain and checking the impact of that on small models. Both latency and runtime wise. |
Can't find my earlier comment on this, but could we define type aliases for certain basic blocks in order to help with display? AIUI that was a big reason for having dedicated types in the first place, right? |
Are we waiting for some more work before tagging a new release? |
Yeah, there's a few cleanups that we need before a full release. I'll get a pr in a couple days. |
WIP: DO NOT MERGE!This is a major refactor to Metalhead.jl with the following changes:
Overall, the repository will now operate as the de-facto source of vision models for the Flux ecosystem (similar to
torchvision.models
). The codebase has been slimmed down so that we can provide users with a lightweight dependency to get standard pre-trained models. The models are also updated to use the latest layers in Flux likeSkipConnection
,Parallel
, and adaptive pooling.Most importantly, the model is code is now more flexible. For example, the ResNet code is generic enough that users can extend it to create variants of ResNet that are not in the original paper. This is useful for research where varying model architecture parameters is important (e.g. double descent). We also provide all the ResNet variants from the paper, not just ResNet-50 like most other ecosystems. The same can be said for other models like VGG, etc.
Future work:
alexnet(future PR)vgg11(future PR)vgg11bn(future PR)vgg13(future PR)vgg13bn(future PR)vgg16(future PR)vgg16bn(future PR)vgg19bn(future PR)resnet18(future PR)resnet34(future PR)resnet101(future PR)resnet152(future PR)densenet_161(future PR)densenet_169(future PR)densenet_201(future PR)This PR will supersede #69. cc @DhairyaLGandhi