-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Most efficient region reading from tiled multi-resolution tiff images #100
Comments
Hello @nd26, WSI files are usually compressed with jpeg2000, so almost all your runtime will be spent inside your jp2k decoder. The speed of the disc won't really have any effect. You'd need to share your benchmark for me to comment on the speed. Is this for machine learning? pyvips 2.1.6 / libvips 8.8 have a new region feature which can give a nice speedup for this type of application (generating 1000s of eg. 32x32 pixel areas). If it's for image display, libvips has a fast path for this already, have a look at vipsdisp. Yes, you can read out levels from the pyramid, have a look at the https://github.com/jcupitt/vipsdisp https://libvips.github.io/libvips/API/current/VipsForeignSave.html#vips-openslideload OpenSlide uses OpenJPEG for decode, so you could investigate ways of speeding that up. There's been some talk of GPU acceleration for decode, and there are patches for AVX as well. They should help. In terms of hardware, a huge CPU with many cores will help too. |
For a simple benchmark, I'd try:
ie. find the average pixel value. That should max your CPU. If it doesn't, there's some more speed to squeeze from your IO system. |
Hi @jcupitt, I appreciate your help. Thank you!
SSD
NVME
HDD
Yes. Just to clarify, I do not need to simply preprocess the whole slide images a priori into tiles, but instead read patches on the go while the network is training, which is why I am trying to make the reading part as efficient as possible.
Is this what it's been used when I employ the "crop" method?
Could you elaborate on this? I am more than interested.
Any recommendations? Thanks again! |
You wrote:
That's curious, you should see user about 2x real on a two-core machine. What processor are you using? libvips has quite a high set-up cost for pipelines, but when set up, runs quickly. If you are generating images of over 512x512 or so, it's fine, but for 32x32 patches or whatever, pipeline cost will dominate. Here's a benchmark: #!/usr/bin/python3
import sys
import pyvips
image = pyvips.Image.new_from_file(sys.argv[1])
patch_size = 64
n_across = image.width // patch_size
n_down = image.height // patch_size
x_max = n_across - 1
y_max = n_down - 1
n_patches = 0
for y in range(0, n_down):
print("row {} ...".format(y))
for x in range(0, n_across):
patch = image.crop(x * patch_size, y * patch_size,
patch_size, patch_size)
patch_f = patch.fliphor()
patch0 = patch.write_to_memory()
patch1 = patch.rot90().write_to_memory()
patch2 = patch.rot180().write_to_memory()
patch3 = patch.rot270().write_to_memory()
patch4 = patch_f.write_to_memory()
patch5 = patch_f.rot90().write_to_memory()
patch6 = patch_f.rot180().write_to_memory()
patch7 = patch_f.rot270().write_to_memory()
n_patches += 8
print("{} patches generated".format(n_patches)) So this is building a new pipeline for every patch, executing it, and shutting it down. On this machine, it generates 12,512 patches in 41s from Here's a version using the new libvips 8.8 #!/usr/bin/python3
import sys
import pyvips
def fetch(region, patch_size, x, y):
return region.fetch(patch_size * x, patch_size * y, patch_size, patch_size)
image[0] = pyvips.Image.new_from_file(sys.argv[1])
image[1] = image0.rot90()
image[2] = image0.rot180()
image[3] = image0.rot270()
image[4] = image[0].fliphor()
image[5] = image[4].rot90()
image[6] = image[4].rot180()
image[7] = image[4].rot270()
reg = [pyvips.Region.new(x) for x in image]
patch_size = 64
n_across = image0.width // patch_size
n_down = image0.height // patch_size
x_max = n_across - 1
y_max = n_down - 1
n_patches = 0
for y in range(0, n_down):
print("row {} ...".format(y))
for x in range(0, n_across):
patch0 = fetch(reg[0], patch_size, x, y)
patch1 = fetch(reg[1], patch_size, y_max - y, x)
patch2 = fetch(reg[2], patch_size, x_max - x, y_max - y)
patch3 = fetch(reg[3], patch_size, y, x_max - x)
patch4 = fetch(reg[4], patch_size, x_max - x, y)
patch5 = fetch(reg[5], patch_size, y_max - y, x_max - x)
patch6 = fetch(reg[6], patch_size, x, y_max - y)
patch7 = fetch(reg[7], patch_size, y, x)
n_patches += 8
print("{} patches generated".format(n_patches)) Now this is making 8 pipelines and fetching patches from them. It's not rebuilding the pipeline each time, so it's much quicker. This version makes 12,512 patches in 0.5s from |
re. OpenJPEG, have a look at at the mailing list, there has been some discussion of AVX and OpenCL acceleration, plus some branches with code. |
I generate patches with a resolution that is a multiple of 224x224. 448x448 and 896x896 pixels are the most common. It's a hyperparameter.
I've ran into the "libvips too old" error while trying to test this. Installation seems to have went well, but not integrated into anaconda. I've downloaded "vips-8.8.0-rc3.tar.gz" from https://github.com/libvips/libvips/releases/tag/v8.8.0-rc3 and followed the typical configure-make-install procedure with no errors. Do you know why this might be the case?
Where can I find their mailing list? Sorry, I couldn't find it. |
I'd guess you have several libvipses installed and it's picking up the wrong one. Try: import logging
logging.basicConfig(level=logging.DEBUG)
import pyvips And you'll see some diagnostics. ojg mailing list: |
Hi @jcupitt, I just had the chance to get back to this. I've managed to run your code, and indeed, the latter (with fetch) is impressively faster. Correct me if I am wrong, but I think the following way is the only way I can take advantage of the fetch method for my problem. If I am not mistaken, the speedup is due to having the pipelines set up once, instead of every time a patch is needed. However, in my application, a single patch is retrieved from a single whole slide image. This happens around 10 (.. the batch size) times a second from different whole slide images. I cannot know a priori which patches will need to be extracted, and since patches are extracted sequentially from all whole slide images, the only way I can see the above working is to have a pipeline setup for each whole slide image before the training commences, and then when a patch needs to be extracted, to simply retrieve the pipeline and use the fetch function instead of "read_region". Is this correct? Empirically, do you think this will speedup the reading operation? Note that the patches have 896 x 896 pixels. |
Those are big patches! How many slide images will you be working from? |
Indeed!
Around 650! |
650 isn't so many. I would open them all (ie. make 650 pipelines) and then use fetch to pull out tiles. You could try using crop as well, but fetch would probably be quicker. Hopefully you have loads of memory! |
Alright, I will do that once the current experiment ends. I have 32GB RAM, and 24GB GPU VRAM. Hopefully, it will be enough. By the way, you said:
Is this an issue/suboptimality given the following CPU?
|
Try
So I have 4 cores, 8 threads and see ~4x speedup. |
@nd26 did you manage to get any boost in reading performance using pyvips instead of openslide in the batch generator? I'm studying the same thing right now, but I'm not finding openslide to be that slow tbh. Of course storing patches in a hdf5-file and reading from that directly is faster, but not more than about 2x. And isn't openslide already using libvips? |
@jcupitt can you write this same code in C language for generating patches using libvips functions: vips_image_new_from_file(),vips_crop() ,vips_image_write_to_file()? |
It's pretty easy, just build the pipeline, then use https://libvips.github.io/libvips/API/current/VipsRegion.html#vips-region-fetch |
@jcupitt Thanks for helping me out. I have used libvips to generate patches. But,when I included the function vips_region_fetch, it says implicit declaration of vips_region_fetch function.
In which header file vips_region_fetch() function is defined? |
It's part of the standard API from libvips 8.8, I think. Perhaps you have an old version installed? |
I have installed libvips 8.4.5. |
You'll need to update -- your libvips is five years old. |
Yeah!! I have updated it now to libvips 8.8 and its working fine. Thank you. |
@jcupitt I applied |
Hi @wenqf11,
This is because of how they work. Here's a benchmark: #!/usr/bin/python3
import random
import time
import sys
import pyvips
if len(sys.argv) != 4:
print("usage: ./fetch-vs-crop.py IMAGE SIZE N-TILES")
size = int(sys.argv[2])
n_tiles = int(sys.argv[3])
print(f"{n_tiles} tiles, each {size}x{size} pixels")
image = pyvips.Image.new_from_file(sys.argv[1])
print(f"random read")
start = time.time()
for _ in range(n_tiles):
x = random.randint(0, image.width - size)
y = random.randint(0, image.height - size)
tile = image.crop(x, y, size, size)
# necessary to force evaluation
avg = tile.avg()
end = time.time()
print(f"crop: time per tile = {1000 * (end - start) / n_tiles:.2f}ms")
start = time.time()
region = pyvips.Region.new(image)
for _ in range(n_tiles):
x = random.randint(0, image.width - size)
y = random.randint(0, image.height - size)
p = region.fetch(x, y, size, size)
end = time.time()
print(f"fetch: time per tile = {1000 * (end - start) / n_tiles:.2f}ms")
print(f"sequential read")
image = pyvips.Image.new_from_file(sys.argv[1])
start = time.time()
tile_number = 0
# most tile formats are column-major internally
for y in range(0, image.height - size, size):
for x in range(0, image.width - size, size):
tile = image.crop(x, y, size, size)
# necessary to force evaluation
avg = tile.avg()
tile_number += 1
if tile_number > n_tiles:
break
if tile_number > n_tiles:
break
end = time.time()
print(f"crop: time per tile = {1000 * (end - start) / n_tiles:.2f}ms")
start = time.time()
tile_number = 0
region = pyvips.Region.new(image)
for y in range(0, image.height - size, size):
for x in range(0, image.width - size, size):
x = random.randint(0, image.width - size)
y = random.randint(0, image.height - size)
p = region.fetch(x, y, size, size)
tile_number += 1
if tile_number > n_tiles:
break
if tile_number > n_tiles:
break
end = time.time()
print(f"fetch: time per tile = {1000 * (end - start) / n_tiles:.2f}ms") I see:
So for one small tile (no flips and rotates), The situation changes with large tiles:
Now fetch is very slow because it can't parallelize. There's little difference between sequential and random, since the libvips pixel caches won't help. tiff is slower for parallel JPEG read since the libtiff JPEG decoder is single-threaded. |
@jcupitt ok, thanks a lot! |
Hi, |
Hi @lafith, Please open a new issue for new questions -- it'll make it easier for other people to find answers. |
Hi!
This is not really an issue, but more of advice seeking. The context is that I am working with histopathological images (known as whole slide images) that have a typical size of 100k x 100k pixels, and have therefore dedicated libraries to perform reading and writing. The most popular example is openslide, and then there is also asap. However, following a recent acquisition of an NVME disk with >3gb/s read speeds, I've noticed zero to negligible improvements in reading regions from such images, using these libraries. Therefore, I had to delve deeper to figure out the reasons. And this has led me to lower-level libraries, i.e. libvips.
Reading regions from these images is the general bottleneck of my methodology (as it happens many times per second) and as such, I need to speed it up as much as possible, up until another (probably CPU-bound) bottleneck appears. What i've done so far is that I benchmarked the "Image.new_from_file" method by reading the same large tiff image from three different disks, an HDD, an SSD, and then the two new NVMes.
So here are the questions: - Is this the most efficient way to do the reading, i.e. do we really need to read the whole image before reading the region? - Since these are pyramidal tiffs, is there a way to access a specific level/resolution? - Generally, do you think these times are appropriate for disks with the following read speeds and given that the image has a size of 2.4 GB (2,402,766,863 bytes)?
HDD: ~100MB/s
SSD: ~500MB/s
NVMe: ~2.8 GB/s
The text was updated successfully, but these errors were encountered: