The following is mentioned in the FINN-R paper:
> The BRAM and LUT requirements for the actual compute of the max pooling layers are very small. The block basically implements C parallel comparators, one per channel, each of which sequentially compares two A-bit words while holding onto the maximum of its pooling window. The total computational LUT cost is therefore roughly the product of A and C.
The paper also says that a sliding window unit is included in a max pool layer, but its LUT cost should be fairly small.
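To make the cost model concrete, the compute the paper describes boils down to something like the following behavioral sketch (plain Python, purely illustrative; the actual FINN HLS implementation is structured differently):

```python
def maxpool_compute(window, num_channels):
    """Behavioral model of the paper's cost argument: one running-max
    comparator per channel, each sequentially fed the A-bit words of
    its pooling window. `window` is a list of per-pixel channel vectors.
    Assumes unsigned activations, so 0 is a valid identity element."""
    acc = [0] * num_channels           # running maximum per channel
    for pixel in window:               # window pixels arrive sequentially
        for c in range(num_channels):  # the C comparators operate in parallel in HW
            if pixel[c] > acc[c]:      # one A-bit magnitude compare per channel
                acc[c] = pixel[c]
    return acc
```

Each per-channel compare is an A-bit magnitude comparator, which is where the A * C LUT estimate comes from.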
However, in the model I am working on, the LUT utilization of the StreamingMaxPool_hls layers is significantly larger than this estimate. The layers use 4-bit data (A = 4), and the resource utilization after synthesis with PE = 1 is as follows:
| Input shape (H, W, C) | Kernel size | A * C | Logic LUT usage |
|---|---|---|---|
| (32, 64, 56) | (2, 2) | 224 | 1417 |
| (16, 32, 112) | (2, 2) | 448 | 2701 |
| (8, 16, 216) | (2, 2) | 864 | 5533 |
| (4, 8, 296) | (4, 8) | 1184 | 4205 |
In the first three layers, the actual utilization is roughly 6x the A * C estimate. I'm not sure whether a sliding window unit is included in these numbers, but even so, the utilization is considerably higher than what the paper's model predicts. My guess is that the overhead comes from other logic implemented alongside the parallel comparators.
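For concreteness, here is the arithmetic from the table as a quick script (no new assumptions, just the figures above):

```python
# (H, W, C), kernel size, A*C estimate, measured logic LUTs (from the table above)
layers = [
    ((32, 64, 56),  (2, 2), 224,  1417),
    ((16, 32, 112), (2, 2), 448,  2701),
    ((8, 16, 216),  (2, 2), 864,  5533),
    ((4, 8, 296),   (4, 8), 1184, 4205),
]

for shape, kernel, estimate, measured in layers:
    print(f"{shape} k={kernel}: {measured} / {estimate} = {measured / estimate:.1f}x")
# -> 6.3x, 6.0x, 6.4x, 3.6x
```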
Do you know of any way to reduce this LUT usage? An option to control the number of parallel comparators would be useful, but I didn't find one anywhere.
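To illustrate what I mean: the standard trade-off would be to time-multiplex the comparators, i.e., fold the C channels over a smaller number of physical comparators at the cost of extra cycles per pixel. The sketch below is hypothetical Python pseudocode for that folding idea, not an existing FINN attribute; whether something like it is exposed for this layer is essentially my question:

```python
def folded_maxpool(window, num_channels, pe):
    """Hypothetical channel folding: `pe` physical comparators each
    handle num_channels // pe channels in turn, cutting comparator
    LUTs by a factor of ~C/pe at the cost of C/pe times more cycles
    per pixel. Assumes pe divides num_channels and unsigned data."""
    assert num_channels % pe == 0
    folds = num_channels // pe
    acc = [0] * num_channels
    for pixel in window:
        for f in range(folds):        # sequential folds cost extra cycles
            for p in range(pe):       # only pe comparators exist in parallel
                c = f * pe + p
                if pixel[c] > acc[c]:
                    acc[c] = pixel[c]
    return acc
```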