Opportunities to Speed Up Conv Operation on Arm-V8? #8320

Answered by abadams
FabianSchuetze asked this question in Q&A
If I were to optimize something like this, I'd start here and change the floats to int8s:

https://github.com/halide/Halide/blob/main/apps/conv_layer/conv_layer_generator.cpp

Note that it does indeed use more vector accumulator registers (20 instead of 8), and it does indeed stage some of its inputs (the .in() scheduling calls).

Answer selected by FabianSchuetze