2D downsampling of uint8 data inefficient #737

Closed
nkemnitz opened this issue Jul 3, 2024 · 5 comments · Fixed by #894
Labels
enhancement New feature or request

Comments

@nkemnitz
Collaborator

nkemnitz commented Jul 3, 2024

All our tensors are passed to torch as NCXYZ and converted to float32. That's not just an extra copy; float32 also takes 4x the memory (a 4096 x 4096 uint8 patch grows from 16 MiB to 64 MiB).

  • pytorch's interpolate supports 'bilinear' downsampling for uint8 data, but requires an NCXY tensor as input
  • tinybrain has fast AvgPool implementations for (2,2), (2,2,1), and (2,2,1,1), as well as for (2,2,2) and (2,2,2,1) ndarrays

Another thing to consider: CloudVolume data is already in Fortran order, which is what tinybrain expects. Some benchmarks:

data = np.asfortranarray(np.random.randint(0,255, size=(1,1,4096,4096,1), dtype=np.uint8))
# Torch CPU, uint8->float32->uint8
%timeit torch.nn.functional.interpolate(torch.from_numpy(data).float(), scale_factor=[0.5,0.5,1.0], mode='trilinear').byte()
84 ms ± 774 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Torch GPU, uint8->float32->uint8
%timeit torch.nn.functional.interpolate(torch.from_numpy(data).cuda().float(), scale_factor=[0.5,0.5,1.0], mode='trilinear').byte().cpu()
6.34 ms ± 32.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Torch CPU, uint8
%timeit torch.nn.functional.interpolate(torch.from_numpy(data).squeeze(-1), scale_factor=[0.5,0.5], mode='bilinear')
162 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Torch CUDA, uint8
%timeit torch.nn.functional.interpolate(torch.from_numpy(data).cuda().squeeze(-1), scale_factor=[0.5,0.5], mode='bilinear').cpu()
RuntimeError: "upsample_bilinear2d_out_frame" not implemented for 'Byte'
# Tinybrain, uint8->float32->uint8
%timeit tinybrain.downsample_with_averaging(data.astype(np.float32).squeeze((0,1)), factor=[2,2])[0].astype(np.uint8)
32.1 ms ± 254 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Tinybrain, uint8
%timeit tinybrain.downsample_with_averaging(data.squeeze((0,1)), factor=[2,2])[0]
1.45 ms ± 12.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
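
For reference, here is a minimal sketch of how the uint8 path could skip the float32 round trip by handing the Fortran-ordered XYZ block straight to tinybrain. The function name and the N == C == 1 restriction are my assumptions for illustration, not existing zetta_utils code:

import numpy as np
import tinybrain

def downsample_uint8_avg(data: np.ndarray, factor=(2, 2, 1)) -> np.ndarray:
    """Average-downsample an NCXYZ uint8 block without a float32 round trip.

    Sketch only: assumes N == C == 1 and one of the pooling factors tinybrain
    supports natively, e.g. (2, 2, 1).
    """
    assert data.dtype == np.uint8 and data.shape[:2] == (1, 1)
    xyz = np.asfortranarray(data[0, 0])  # no copy if the block is already F-ordered
    out = tinybrain.downsample_with_averaging(xyz, factor=list(factor))[0]
    return out[np.newaxis, np.newaxis]   # back to NCXYZ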
@nkemnitz added the enhancement label Jul 3, 2024
@supersergiy
Member

It's surprising to me that performance matters here; I thought interpolation would be mostly bandwidth-bound.

@nkemnitz
Collaborator Author

nkemnitz commented Jul 3, 2024

Just checked: downloading a 4k x 4k uint8 JPG patch takes 100-150 ms, similar to the current downsampling behavior.

@supersergiy
Member

Wow, that's a crazy fast download! But doesn't that mean there's basically no inefficiency if we use pipelining? Then again, it may not matter: we should just use tinybrain instead of the default torch behavior. It's not a hard fix.

@supersergiy
Member

@nkemnitz we've been using tinybrain for segmentation for a while now. Should this be closed?

if mode == "segmentation" and (
scale_factor_tuple is not None
and (
tuple(scale_factor_tuple)
in (
[(0.5 ** i, 0.5 ** i) for i in range(1, 5)] # 2D factors of 2
+ [(0.5 ** i, 0.5 ** i, 1) for i in range(1, 5)]
+ [(0.5 ** i, 0.5 ** i, 0.5 ** i) for i in range(1, 5)] # #D factors of 2
)
)
and data.shape[0] == 1
): # use tinybrain
result_raw = _interpolate_segmentation_with_tinybrain(
data=data, scale_factor_tuple=scale_factor_tuple
)
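
A hypothetical analogue for uint8 images, mirroring the segmentation branch above; the mode check, the supported-factor list, and _interpolate_img_with_tinybrain are assumptions for illustration, not existing code:

if (
    mode == "img"
    and data.dtype == np.uint8
    and data.shape[0] == 1
    and scale_factor_tuple is not None
    and tuple(scale_factor_tuple)
    in (
        [(0.5 ** i, 0.5 ** i) for i in range(1, 5)]
        + [(0.5 ** i, 0.5 ** i, 1) for i in range(1, 5)]
    )
):  # use tinybrain average pooling instead of the float32 torch path
    result_raw = _interpolate_img_with_tinybrain(  # hypothetical helper
        data=data, scale_factor_tuple=scale_factor_tuple
    )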

@nkemnitz
Collaborator Author

Still relevant for average downsampling of uint8 images, especially the 4x memory savings.
