The debug binary now shows how much memory it will try to allocate on the heap #595
total_mem_bytes += sizeof(QUANTIZED_PACKED) * max_device_input_elems;
total_mem_bytes += sizeof(BIN_CONV_OUTPUT) * max_device_output_elems;
#endif
return total_mem_bytes;
Several temporary buffers are allocated by some operators (e.g. kn2row_buf in src/func/impl/generic/quantized_conv2d_kn2row.cpp), but these are not counted yet.
I'm working on allocating these hidden temporary buffers in Network::init() (#473), but it may take a while...
Until that is resolved, we should count them by hand...
Ah thank you! Didn't know about these buffers.
We can wait until they are allocated from Network::init(), if you want.
Or we can add it later and count by hand as you said 😄
These buffers are allocated by std::make_unique.
I found such buffers in the following files:
src/func/conv2d.cpp
src/func/*/batch_normalization.cpp
src/func/impl/generic/quantized_conv2d_kn2row.cpp
src/func/impl/arm_neon/quantized_conv2d_tiling.cpp
src/func/impl/x86_avx/quantized_conv2d_tiling.cpp
src/matrix/multiplication.cpp
Ah, yes, seems that for changing layout we need these temporaries...
Once #473 is fixed, the buffers may be unified into a single buffer, so the total buffer size will be obvious.
Waiting for that fix seems like a reasonable approach.
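For reference, counting the temporaries by hand, as discussed in this thread, could look roughly like the sketch below. The TempBuffer struct and both helper functions are hypothetical names made up for illustration and are not part of the project. The second helper also assumes that the temporaries are never live at the same time, which is what would allow one shared buffer sized to the largest of them; that assumption is not confirmed anywhere in this conversation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record describing one temporary buffer (not a project type).
struct TempBuffer {
  std::string name;        // e.g. "kn2row_buf"
  std::size_t elem_size;   // sizeof the element type, in bytes
  std::size_t elem_count;  // number of elements the operator allocates
};

// Sum of all temporaries in bytes: the "count by hand" figure when every
// operator keeps its own buffer.
std::size_t count_temp_bytes(const std::vector<TempBuffer>& buffers) {
  std::size_t total = 0;
  for (const auto& b : buffers) {
    total += b.elem_size * b.elem_count;
  }
  return total;
}

// Size of the largest temporary in bytes. Under the ASSUMPTION that no two
// temporaries are in use at once, a single shared buffer of this size would
// be enough.
std::size_t largest_temp_bytes(const std::vector<TempBuffer>& buffers) {
  std::size_t largest = 0;
  for (const auto& b : buffers) {
    largest = std::max(largest, b.elem_size * b.elem_count);
  }
  return largest;
}
```

Either figure would then be added on top of the total_mem_bytes value computed in the diff above, depending on how the buffers end up being allocated.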
@@ -28,6 +28,7 @@ class SYM_PUBLIC Network
Network();
~Network();

int memory();
To keep the API consistent, it would be better to use a verb for the method name. Given that this function's responsibility is to output debug information, how about renaming it to get_debug_info?
Also, I propose changing this API to return a string (or char*), so that we can see the buffer size layer by layer. (Is it possible to change the behavior of this API?)
Yes, it's true. The name can be better and we should return more detailed information.
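A string-returning variant along the lines proposed above might look like the following sketch. LayerBuffer and format_debug_info are hypothetical names chosen for illustration; how the per-layer sizes would actually be collected inside Network is not specified in this conversation, so the function simply takes them as input.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-layer record (not a project type).
struct LayerBuffer {
  std::string layer_name;  // name of the layer that owns the buffer
  std::size_t bytes;       // size of that layer's buffer, in bytes
};

// Build a human-readable report: one line per layer, then the total.
std::string format_debug_info(const std::vector<LayerBuffer>& layers) {
  std::ostringstream out;
  std::size_t total = 0;
  for (const auto& layer : layers) {
    out << layer.layer_name << ": " << layer.bytes << " bytes\n";
    total += layer.bytes;
  }
  out << "total: " << total << " bytes\n";
  return out.str();
}
```

Returning std::string rather than char* avoids questions about who owns the returned memory, at the cost of exposing a standard library type in the public interface.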
Currently there is no way of knowing how much memory a network model requires for activations on deployment.
Motivation and Context
The required memory can be estimated from the TensorFlow model, but the estimate will not match the amount actually required because of buffer reuse and quantization. This PR makes the debug binary (lm_xxx.elf) print a message that shows how much memory will be allocated on the heap before calling Network::init(). This gives you an idea of how much memory will be used and helps debug possibly related memory issues.
Description
Add a method to the Network class that returns the amount of memory in bytes.
Call it from mains/main.cpp and print how many megabytes will be required by the model.
How has this been tested?
Tested with 2 of the provided examples:
Screenshots (if appropriate):
For object detection this will show:
Types of changes
Checklist: