C++ Models #728
Conversation
Codecov Report
@@ Coverage Diff @@
## master #728 +/- ##
==========================================
+ Coverage 62.69% 62.71% +0.01%
==========================================
Files 65 65
Lines 5080 5101 +21
Branches 761 765 +4
==========================================
+ Hits 3185 3199 +14
- Misses 1678 1680 +2
- Partials 217 222 +5
Continue to review full report at Codecov.
|
this is really great, thanks a lot for the PR. |
Thanks @soumith |
Hi @soumith |
It's our fault for letting this drop off. I pinged @goldsborough, @yf225, and @fmassa, and we'll review and help on this early next week, i.e. Monday or Tuesday. |
Hi,
Thanks a lot for the PR and sorry for the long time to review.
I went over the implementation of the models, by comparing them line by line with the Python versions, and they look good to me (minus some comments in the code).
There are a few orthogonal points that I'd like to make regarding the C++ style:
- there are tabs everywhere in the implementation. Can we replace those with 2 spaces?
- do we want to enforce some style guidelines?
- I think it would be good to have the pre-trained weights from Python converted to C++, so that users can just download them and start using the models right away. @goldsborough what's the recommended format in this case, and is there a simple way of achieving this?
- it would be great to have some kind of tests that ensure that the outputs are the same as their corresponding Python versions. A potential way of handling this would be to forward the same tensor (a serialized image, maybe?), given the same initialization (a pre-trained model, maybe?), and assert that the output tensor is the same (a rough sketch follows after this comment).
Thoughts?
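As a rough illustration of the testing idea above, here is a minimal sketch; the model, file names for the exported weights, the shared input tensor, and the recorded Python output are all hypothetical, not part of this PR:
```cpp
#include <torch/torch.h>
#include <iostream>
#include <memory>

// Hypothetical stand-in for one of the C++ models, with weights loadable from disk.
struct TinyNet : torch::nn::Module {
  TinyNet() { fc = register_module("fc", torch::nn::Linear(8, 4)); }
  torch::Tensor forward(torch::Tensor x) { return fc(x); }
  torch::nn::Linear fc{nullptr};
};

int main() {
  auto model = std::make_shared<TinyNet>();
  torch::load(model, "tinynet_params.pt");      // weights exported from the Python model (assumed file)
  model->eval();

  torch::Tensor input, expected;
  torch::load(input, "input.pt");               // the same tensor the Python test forwards (assumed file)
  torch::load(expected, "expected_output.pt");  // output recorded from the Python model (assumed file)

  auto output = model->forward(input);
  std::cout << (torch::allclose(output, expected, /*rtol=*/1e-5, /*atol=*/1e-6)
                    ? "outputs match"
                    : "outputs differ")
            << "\n";
  return 0;
}
```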
Hi @fmassa
Regarding comments: the comments that are currently present are copied from the Python code. What is wrong with them?
Regarding tabs and guidelines: I prefer tabs to spaces, and I've made myself a custom clang-format style based on the Google style, with tabs and breaks before the brace that begins blocks. But I checked and PyTorch is apparently using the Google style, so I'll change it.
Regarding testing: once we manage to add pre-trained models to the C++ models, testing them the way you said will be easy. These can be the C++ API's tests, and every time the Python API changes, these tests will fail, indicating that the code needs to be changed. |
Please use https://github.com/pytorch/pytorch/blob/master/.clang-format to format your code. |
Nothing wrong, I left comments in the code, that's what I meant :-)
Yes. This will also mean adding tests to the Python models, which I can do later as well. |
Looks pretty good overall! My main request is to remove the custom `MaxPool` and `ReLU` modules and use `torch::nn::Functional` instead -- I've left inline comments about this.
Isn't the explicit call to a module's forward method unnecessary, given the operator overloading of the call operator? Removing those would make the code both less verbose and more Pythonic... |
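For reference, a small sketch (hypothetical layer, not from this PR) of the two call styles in question; the module holder's call operator dispatches to forward(), so the explicit call is indeed optional:
```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::Linear linear(4, 2);   // hypothetical layer
  auto x = torch::randn({3, 4});

  auto y1 = linear->forward(x);     // explicit forward()
  auto y2 = linear(x);              // call operator, dispatches to forward()

  std::cout << torch::allclose(y1, y2) << "\n";  // prints 1
  return 0;
}
```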
Hi @goldsborough |
I'll look into the |
As far default arguments and |
@goldsborough |
Remaining tasks:
|
Hi @goldsborough |
I think there might still be a few differences left in the initialization
@ShahriarSS about pre-trained weights, I think we should convert the models from torchvision straight away. @goldsborough @yf225 what would be the recommended way here to convert a |
@fmassa I looked through the C++ API, and saving the torchvision models as JIT models and loading them into C++ might be the best way to share models between Python and C++ (https://pytorch.org/tutorials/advanced/cpp_export.html). Would it be feasible to do? |
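For context, a minimal sketch of that route, assuming a torchvision model already traced/scripted and saved from Python under the hypothetical name traced_resnet18.pt, and following the loading pattern from the linked cpp_export tutorial:
```cpp
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
  // Load the serialized JIT model produced in Python (assumed file name).
  torch::jit::script::Module module = torch::jit::load("traced_resnet18.pt");

  // Run it on a dummy input of the usual ImageNet shape.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));

  at::Tensor output = module.forward(inputs).toTensor();
  std::cout << output.sizes() << "\n";  // expected: [1, 1000] for a classification model
  return 0;
}
```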
@fmassa @ShahriarSS I think this is the recommended way to share pre-trained weights from Python to C++:
This is a minimal example:
```python
import torch

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(2, 2)

model = Model()
print(model.linear.weight)  # For comparison
model.save('model1.pt')
```
C++ frontend model:
```cpp
#include <torch/torch.h>
#include <iostream>
#include <memory>

struct Net : torch::nn::Module {
  Net() {
    linear = register_module("linear", torch::nn::Linear(2, 2)); // Exactly match the Python model
  }
  torch::nn::Linear linear{nullptr};
};
```
Load parameters of the JIT model into the C++ frontend model:
```cpp
int main(int argc, const char* argv[]) {
  auto module = std::make_shared<Net>();
  std::cout << module->linear->named_parameters()["weight"] << "\n"; // before loading pre-trained params
  torch::load(module, argv[1]);
  std::cout << module->linear->named_parameters()["weight"] << "\n"; // after loading pre-trained params
  return 0;
}
```
|
I was thinking of the same method. I think that the models I have written match the structure of the Python models, but I'm not sure. I will try this, and if I run into any problems I'll inform you. |
@yf225 @fmassa
I checked, and BatchNorm only has weight and bias parameters; there is no "running_variance". I can push the code that converts the models if you want me to. |
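For reference, a small sketch showing that BatchNorm's running statistics are buffers rather than parameters, which is why only weight and bias show up in the parameter list. It uses the current module name torch::nn::BatchNorm2d; at the time of this thread the C++ module was plain BatchNorm and, as the commit below describes, its buffer was still named running_variance:
```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::BatchNorm2d bn(torch::nn::BatchNorm2dOptions(16));

  for (const auto& p : bn->named_parameters()) {
    std::cout << "parameter: " << p.key() << "\n";  // weight, bias
  }
  for (const auto& b : bn->named_buffers()) {
    std::cout << "buffer:    " << b.key() << "\n";  // running_mean, running_var (and num_batches_tracked in recent versions)
  }
  return 0;
}
```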
The Python BatchNorm has |
Summary: Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameter loading to fail (pytorch/vision#728 (comment)):
```
terminate called after throwing an instance of 'c10::Error'
  what(): No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem. This is a BC-breaking change, but it should be easy for end users to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: #17371
Reviewed By: goldsborough
Differential Revision: D14172775
Pulled By: yf225
fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
@ShahriarSS The PR pytorch/pytorch#17371 is merged and the libtorch nightly build in the next few days should fix the BatchNorm running_variance problem. |
@yf225 I will compile PyTorch from source to get the fix faster |
Hi @yf225
Is there any way that we can replace these names with their original names (indexes in Sequential, I think)? |
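For context on the naming question, a small sketch (hypothetical layers, not from this PR) showing that a C++ torch::nn::Sequential registers its children under their index, just like the Python one, so the parameter keys come out as "0.weight", "2.weight", and so on:
```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Children of a Sequential are registered under their index, as in Python.
  torch::nn::Sequential features(
      torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 8, 3)),
      torch::nn::Functional(torch::relu),
      torch::nn::Conv2d(torch::nn::Conv2dOptions(8, 16, 3)));

  for (const auto& p : features->named_parameters()) {
    std::cout << p.key() << "\n";  // 0.weight, 0.bias, 2.weight, 2.bias
  }
  return 0;
}
```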
Hi @fmassa can you help me with the above problem? |
Hi @fmassa |
Can you put the models in evaluation mode? Many of the models have dropout / batch norm, and those layers have non-determinism during training mode. |
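A minimal sketch of that suggestion, using a hypothetical toy model rather than one of the PR's models; eval() switches dropout and batch norm to inference behavior, and NoGradGuard additionally turns off gradient tracking for the comparison:
```cpp
#include <torch/torch.h>
#include <iostream>

// Outputs of a model with dropout only become deterministic after eval().
torch::Tensor run_inference(torch::nn::Sequential& model, const torch::Tensor& input) {
  model->eval();               // dropout disabled, batch norm uses running stats
  torch::NoGradGuard no_grad;  // no autograd bookkeeping needed for the comparison
  return model->forward(input);
}

int main() {
  torch::nn::Sequential model(torch::nn::Linear(8, 8), torch::nn::Dropout(0.5),
                              torch::nn::Linear(8, 4));
  auto x = torch::randn({2, 8});

  auto y1 = run_inference(model, x);
  auto y2 = run_inference(model, x);
  std::cout << torch::allclose(y1, y2) << "\n";  // 1: deterministic in eval mode
  return 0;
}
```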
@fmassa putting the models in eval mode fixed the issue, but the test for GoogLeNet still fails. I'll look into it. |
Hi @fmassa |
Hi @fmassa what's next? |
Hi @ShahriarSS, sorry for the delay in replying, I was very busy with the 0.3 release. Looks like there is a merge conflict, can you fix that? Once you fix it and the tests are passing, this will be ready to merge, thanks! |
@ShahriarSS tests are still failing, can you have a look? |
@fmassa Tests are passing now |
Thanks a lot for the awesome work @ShahriarSS !
Great. I will update C++ models when python ones are updated. Also looking forward to implementing transforms. |
Thanks for all of this @ShahriarSS, it's great stuff! To get it to compile on Visual Studio, I had to change the 'or' to '||' on line 33 of csrc/models/resnet.cpp. Should this be changed to keep those (at least 3) of us using Windows happy? |
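For illustration, a tiny sketch (hypothetical condition and values, not the actual line from resnet.cpp): `or` is an alternative token that MSVC rejects in its default mode, while `||` compiles everywhere:
```cpp
#include <iostream>

int main() {
  int stride = 2, planes = 64, expansion = 1;  // hypothetical values
  // GCC/Clang accept "stride != 1 or planes != expansion * 64" as-is,
  // but MSVC's default mode does not; "||" is portable.
  if (stride != 1 || planes != expansion * 64) {
    std::cout << "downsample branch taken\n";
  }
  return 0;
}
```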
@philipNoonan yes, can you send a PR fixing it? |
Small fix to allow for the built library to be used in windows pytorch#728
* Enabling exporting symbols on windows: Small fix to allow for the built library to be used in windows #728
* added macro to allow for exported symbols on windows
* added macro to allow for exported symbols on windows
* removed cmake command
* added dllimport using torchvision_EXPORTS preprocessor
Hi @fmassa
I have implemented C++ models as discussed here. But I ran into some problems:
If you can help me solve these issues I will be able to finish the code.