The following is mentioned in the FINN-R paper:
> The BRAM and LUT requirements for the actual compute of the max pooling layers are very small. The block basically implements C parallel comparators, one per channel, each of which sequentially compares two A-bit words while holding onto the maximum of its pooling window. The total computational LUT cost is therefore roughly the product of A and C.
The paper also says that a sliding window unit is included in a max pool layer, but its LUT cost should be fairly small.
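To make the cost model concrete, the compute the paper describes boils down to something like the following behavioral sketch (plain Python, purely illustrative; the actual FINN HLS implementation is structured differently):

```python
def maxpool_compute(window, num_channels):
    """Behavioral model of the paper's cost argument: one running-max
    comparator per channel, each sequentially fed the A-bit words of
    its pooling window. `window` is a list of per-pixel channel vectors.
    Assumes unsigned activations, so 0 is a valid identity element."""
    acc = [0] * num_channels           # running maximum per channel
    for pixel in window:               # window pixels arrive sequentially
        for c in range(num_channels):  # the C comparators operate in parallel in HW
            if pixel[c] > acc[c]:      # one A-bit magnitude compare per channel
                acc[c] = pixel[c]
    return acc
```

Each per-channel compare is an A-bit magnitude comparator, which is where the A * C LUT estimate comes from.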
However, in the model I am working on, the LUT utilization of the StreamingMaxPool_hls layers is significantly larger than this estimate. The layers use 4-bit data (A = 4), and the resource utilization after synthesis with PE = 1 is as follows:
| Input shape (H, W, C) | Kernel size | A * C | Logic LUT usage |
|---|---|---|---|
| (32, 64, 56) | (2, 2) | 224 | 1417 |
| (16, 32, 112) | (2, 2) | 448 | 2701 |
| (8, 16, 216) | (2, 2) | 864 | 5533 |
| (4, 8, 296) | (4, 8) | 1184 | 4205 |
In the first three layers, the actual utilization is roughly 6x the A * C estimate. I'm not sure whether a sliding window unit is included in these numbers, but even so, the utilization is considerably higher than what the paper's model predicts. My guess is that the overhead comes from other logic implemented alongside the parallel comparators.
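For concreteness, here is the arithmetic from the table as a quick script (no new assumptions, just the figures above):

```python
# (H, W, C), kernel size, A*C estimate, measured logic LUTs (from the table above)
layers = [
    ((32, 64, 56),  (2, 2), 224,  1417),
    ((16, 32, 112), (2, 2), 448,  2701),
    ((8, 16, 216),  (2, 2), 864,  5533),
    ((4, 8, 296),   (4, 8), 1184, 4205),
]

for shape, kernel, estimate, measured in layers:
    print(f"{shape} k={kernel}: {measured} / {estimate} = {measured / estimate:.1f}x")
# -> 6.3x, 6.0x, 6.4x, 3.6x
```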
Do you know of any way to reduce this LUT usage? An option to control the number of parallel comparators would be useful, but I didn't find one anywhere.
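To illustrate what I mean: the standard trade-off would be to time-multiplex the comparators, i.e., fold the C channels over a smaller number of physical comparators at the cost of extra cycles per pixel. The sketch below is hypothetical Python pseudocode for that folding idea, not an existing FINN attribute; whether something like it is exposed for this layer is essentially my question:

```python
def folded_maxpool(window, num_channels, pe):
    """Hypothetical channel folding: `pe` physical comparators each
    handle num_channels // pe channels in turn, cutting comparator
    LUTs by a factor of ~C/pe at the cost of C/pe times more cycles
    per pixel. Assumes pe divides num_channels and unsigned data."""
    assert num_channels % pe == 0
    folds = num_channels // pe
    acc = [0] * num_channels
    for pixel in window:
        for f in range(folds):        # sequential folds cost extra cycles
            for p in range(pe):       # only pe comparators exist in parallel
                c = f * pe + p
                if pixel[c] > acc[c]:
                    acc[c] = pixel[c]
    return acc
```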