ResNet model takes forever to load in Metalhead 0.8 compared to 0.7. #247

christiangnrd · 2023-07-16T17:10:21Z

I've been migrating some code from Metalhead 0.7 to 0.8, and I've noticed that loading the pretrained ResNet takes an abnormally long time in 0.8.

The pretrained weights are loaded for each test. Without pretrained weights there is still a roughly 4x slowdown (1.2 s vs. 4.8 s), but it is MUCH worse when loading the weights (4.5 s vs. 161 s).

Metalhead 0.7.4:

julia> using Metalhead

julia> @time model = ResNet(152, pretrain=false).layers[1];
  1.234614 seconds (1.47 M allocations: 556.321 MiB, 4.13% gc time)

julia> using Metalhead

julia> @time model = ResNet(152, pretrain=true).layers[1];
  4.520097 seconds (6.87 M allocations: 1.314 GiB, 2.11% gc time)

Metalhead 0.8.1:

julia> using Metalhead

julia> @time model = ResNet(152, pretrain=false).layers[1];
  4.833957 seconds (5.21 M allocations: 801.770 MiB, 1.94% gc time)

julia> using Metalhead

julia> @time model = ResNet(152, pretrain=true).layers[1];
161.141156 seconds (16.60 M allocations: 1.897 GiB, 0.14% gc time)


theabhirath · 2023-07-17T01:50:13Z

Thank you for the bug report! I can reproduce this on my machine. The regression without the weights is probably caused by the larger number of function calls needed to construct the model in 0.8 compared to 0.7. The model is now built by a more elaborate function that allows for more flexibility, so a one-time cost there is probably acceptable? I will investigate whether this can be made faster, though.

The weights seem to be taking a little too long even with that allowed for, so it's probably hitting an edge case on the first compile. Subsequent calls seem to be fine on Metalhead 0.8:

julia> using Metalhead

julia> @time model = ResNet(152, pretrain=true).layers[1]; # cold start
196.966291 seconds (16.40 M allocations: 1.887 GiB, 0.08% gc time, 99.67% compilation time)

julia> @time model = ResNet(152, pretrain=true).layers[1];
  0.759214 seconds (2.62 M allocations: 1.018 GiB, 10.27% gc time)

And for a different model, just to be sure:

julia> @time model = ResNeXt(50, pretrain=true).layers[1]; # cold start
 30.753351 seconds (19.45 M allocations: 1.495 GiB, 0.57% gc time, 99.36% compilation time: 2% of which was recompilation)

julia> @time model = ResNeXt(50, pretrain=true).layers[1];
  0.136975 seconds (153.16 k allocations: 297.606 MiB, 17.00% gc time)
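The cold-start versus warm-call pattern in the timings above can be reproduced in isolation. This is a minimal Base-only sketch, not Metalhead code: `build` is a hypothetical stand-in for a model constructor, and the only point is that the first call to a method includes JIT compilation while the second reuses the compiled code.

```julia
# Hypothetical stand-in for a constructor such as ResNet(152, pretrain=true).
# It has nothing to do with Metalhead; it only gives the compiler some work.
build(n::Int) = sum(sqrt(i) for i in 1:n)

# First call: the measured time includes compiling `build` for an Int argument.
cold = @elapsed build(1_000)

# Second call with the same argument types: the compiled method is reused.
warm = @elapsed build(1_000)

println("cold: ", cold, " s, warm: ", warm, " s")
```

On most machines the cold figure should come out noticeably larger than the warm one, but the exact ratio depends on the Julia version and hardware, so no numbers are claimed here.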

From 0.7 to 0.8, ResNet's exact structure has changed quite a bit, even though the model has the same overall flow and number of parameters. For one, we no longer use Chain(::Vector); there are also more nested Chains and some identity functions in 0.8 because of the way the model is created.

This regression also seems to be happening on VGG for some reason.

Metalhead v0.7:

julia> using Metalhead

julia> @time model = VGG(pretrain=true).layers[1];
  2.640502 seconds (6.24 M allocations: 2.444 GiB, 3.02% gc time, 70.84% compilation time)

julia> model
Chain([
  Conv((3, 3), 3 => 64, relu, pad=1),   # 1_792 parameters
  Conv((3, 3), 64 => 64, relu, pad=1),  # 36_928 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 64 => 128, relu, pad=1),  # 73_856 parameters
  Conv((3, 3), 128 => 128, relu, pad=1),  # 147_584 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 128 => 256, relu, pad=1),  # 295_168 parameters
  Conv((3, 3), 256 => 256, relu, pad=1),  # 590_080 parameters
  Conv((3, 3), 256 => 256, relu, pad=1),  # 590_080 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 256 => 512, relu, pad=1),  # 1_180_160 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  MaxPool((2, 2)),
])                  # Total: 26 arrays, 14_714_688 parameters, 56.134 MiB.

vs Metalhead v0.8:

julia> using Metalhead

julia> @time model = VGG(16, pretrain=true).layers[1];
  6.499447 seconds (11.99 M allocations: 2.279 GiB, 1.96% gc time, 94.77% compilation time: 9% of which was recompilation)

julia> model
Chain(
  Conv((3, 3), 3 => 64, relu, pad=1),   # 1_792 parameters
  Conv((3, 3), 64 => 64, relu, pad=1),  # 36_928 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 64 => 128, relu, pad=1),  # 73_856 parameters
  Conv((3, 3), 128 => 128, relu, pad=1),  # 147_584 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 128 => 256, relu, pad=1),  # 295_168 parameters
  Conv((3, 3), 256 => 256, relu, pad=1),  # 590_080 parameters
  Conv((3, 3), 256 => 256, relu, pad=1),  # 590_080 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 256 => 512, relu, pad=1),  # 1_180_160 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  MaxPool((2, 2)),
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  Conv((3, 3), 512 => 512, relu, pad=1),  # 2_359_808 parameters
  MaxPool((2, 2)),
)                   # Total: 26 arrays, 14_714_688 parameters, 56.137 MiB.

I chose VGG because its model structure and the way the model is created changed very little from v0.7 to v0.8. There is still one minor change: models in 0.7 used Chain(::Vector), which we later removed because we were unclear about its consequences. Could that actually be helping compilation times for loadmodel! here? Not sure. I will try looking into this more.
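To make the Chain(::Vector) question concrete, here is a Base-only sketch of how much type information a tuple-backed container carries compared to a vector-backed one. It does not use Flux or Metalhead; the "layers" are plain functions and `apply_all` is a hypothetical helper. Whether this difference actually explains the loadmodel! timings is an open question, not something this sketch demonstrates.

```julia
# Plain functions as hypothetical stand-ins for layers.
layers = (sin, cos, tan, exp)

# Tuple-backed container: the type records every element's type, so code that
# walks it can be specialized (and compiled) for this exact sequence.
tuple_container = layers

# Vector-backed container: one shared element type, so the container's type
# does not change when the elements or their number change.
vector_container = Function[layers...]

# Apply each "layer" in order, feeding the output of one into the next.
apply_all(fs, x) = foldl((acc, f) -> f(acc), fs; init = x)

println(typeof(tuple_container))
println(typeof(vector_container))
println(apply_all(tuple_container, 0.5) ≈ apply_all(vector_container, 0.5))
```

Both containers compute the same result; what differs is how much structure is visible to the compiler through the container's type, which is one plausible way the two Chain forms could end up with different compile-time costs.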

Tagging @darsnack @ToucheSir @CarloLucibello for further investigation here as well, since they know a lot about Flux internals that I don't 😅

darsnack · 2023-07-17T01:57:50Z

> This regression also seems to be happening on VGG for some reason.

With or without loading the weights?

theabhirath · 2023-07-17T02:03:10Z

> > This regression also seems to be happening on VGG for some reason.
>
> With or without loading the weights?

With the weights. I'm sorry, I should have made that clearer. Without the weights, the time taken is almost exactly the same (which is to be expected because they are being constructed in the same manner 😅).

Metalhead 0.7:

julia> using Metalhead

julia> @time model = VGG(16, pretrain=false).layers[1];
  0.835150 seconds (2.29 M allocations: 1.178 GiB, 7.37% gc time, 77.39% compilation time)

Metalhead 0.8:

julia> using Metalhead

julia> @time model = VGG(16, pretrain=false).layers[1];
  0.885210 seconds (2.30 M allocations: 1.179 GiB, 4.96% gc time, 78.59% compilation time)

theabhirath added the "bug" (Something isn't working) label · Jul 17, 2023
