Skip to content
This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

Supporting DICOM via pydicom #1136

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Supporting DICOM via pydicom #1136

wants to merge 3 commits into from

Conversation

IsaacYangSLA
Copy link
Contributor

  1. Importing dicom if dicom module can be loaded.
  2. Storing dicom image in LMDB data field as float.
  3. Several small modifications on dealing with PIL.Image mode
  4. Notify users to use None on higher bit images.

Notify users on encoding for higher depth images.
Fix _save_means.
Copy link
Contributor

@gheinrich gheinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to prevent users from mixing 8-bit and 16-bit images in the classification path I would have added a "depth" field to the image transformations in the dataset (much like we are doing for image size and channels). This way, during training and inference, all images would use the same depth and this would avoid a lot of issues.

write_queue.put(datum)
else:
write_queue.put((image, label))
except IOError: # try to save 16-bit images with PNG/JPG encoding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can IOError occur for other reasons, besides attempting to store 16-bit data with PNG/JPG encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me catch the error message, too, and set _notification with that error message.

@@ -442,6 +454,9 @@ def _create_hdf5(image_count, write_queue, batch_size, output_dir,

assert images_written == writer.count()

if _notification():
raise WriteError('Encoding should be None for images with color depth higher than 8 bits.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the notion of "encoding" is irrelevant for HDF5 databases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. It clearly states on the beginning of Hdf5Writer function DTYPE='float32.'

@@ -269,6 +269,9 @@ def create_db(input_file, output_dir,
write_queue = Queue.Queue(2*batch_size)
summary_queue = Queue.Queue()

# Init helper function for notification between threads
_notification(set_flag=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be more explicitly named or be made more generic? We could make the flag a bit field, and support a number of reasons for terminating threads (wrong encoding being just one of those reasons).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is too vague, I agreed. I thought Python didn't natively support bit field. I will make this a dict with meaningful key names. Hopefully, it can be a better helpful function.

@@ -667,7 +695,11 @@ def _save_means(image_sum, image_count, mean_files):
"""
Save mean[s] to file
"""
mean = np.around(image_sum / image_count).astype(np.uint8)
mean = np.around(image_sum / image_count)
if mean.max()>255:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not always use float for the mean? We don't use encoding for the mean.
if you want to keep the float v.s. uint8 dtype, I think you also need to check the lower bound

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gheinrich , thanks on the comments. My understanding is image_sum is always positive, so the variable mean is also positive. The only reason that astype(np.uint8) doesn't convert properly is some pixels in mean are greater than 255. So I keep the original codes to handle original case.
However, it will be easier to understand if we make mean to float. Let me try that.

@@ -693,7 +725,10 @@ def _save_means(image_sum, image_count, mean_files):
with open(mean_file, 'wb') as outfile:
outfile.write(blob.SerializeToString())
elif mean_file.lower().endswith(('.jpg', '.jpeg', '.png')):
image = PIL.Image.fromarray(mean)
if np.issubdtype(mean.dtype, np.float):
image = PIL.Image.fromarray(mean*255/mean.max()).convert('L')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we resort to 16-bit png in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I don't think any pixel in mean can be greater than 65535. We should be able to do that for PNG case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gheinrich , it looks like if the max value is greater than 255, but much smaller than 65536 (for example, around 800), and we save it to png file. That image is almost black. The reason, I believe, is the dynamic range of 16-bit PNG is much higher than mean pixel value. We can either scale it up to full dynamic range of 16-bit PNG, or scale it down to 8-bit png/jpg. Both look the same on screen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it's impossible to know how to best visualize data on server side. This is something we discussed with @jmancewicz. Joe thinks the best way to deal with this is to have the user choose the amount of contrast, etc. though sliders on client side. I agree with him.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds a good idea.

@@ -11,6 +11,20 @@
except ImportError:
from StringIO import StringIO

try:
Copy link
Contributor

@gheinrich gheinrich Oct 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you want to add pydicom to requirements.txt? If the package is not installed, DIGITS will silently refuse to read DICOM files, which could be confusing to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's required, we're going to need a deb package for it. Does this package work?
http://packages.ubuntu.com/trusty/python/python-dicom

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check if 0.9.7 works. Pip installed 0.9.9 and it's the one I tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested 0.9.7 and it worked without problem. We should be able to use apt python-dicom package of 0.9.7.

Copy link
Contributor

@gheinrich gheinrich Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can add pydicom to requirements.txt and import dicom without a try/except then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

except dicom.errors.InvalidDicomError:
raise errors.LoadImageError, 'Invalid Dicom file'
pixels = dm.pixel_array
image = PIL.Image.fromarray(pixels.astype(np.float))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you deal with 3D/4D data (which seem to be the most common use of DICOM)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This definitely breaks when pixels.shape is higher than 2D.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can support that in the future. Adding 3D/4D data support from one single DICOM file would involve many modifications. Maybe that's because my limited understanding on the current codes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that can be a future improvement. If pixels is a 3-D array, how do you determine whether it's an RGB image (OK in DIGITS) or a volumetric grayscale image (not OK in DIGITS if more than there aren't exactly 3 slices)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each slice should have its meta data. Therefore, we should be able to tell if it's a multi-slice DICOM, or a multi-channel DICOM. I need to dig out that information, but I remember I saw that before.

Make _notification more general
 Passing 8/32 bit-depth among jobs, create_db
Copy link
Contributor

@gheinrich gheinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think inference.py needs to be modified to perform the required depth conversion on test images (similar to how we perform channel conversion and resize), correct?

@@ -21,6 +21,7 @@
<li><b>Image Count</b> - {{task.image_count}}</li>
<li><b>Image Dimensions</b> -
{{task.image_width}}x{{task.image_height}}x{{task.image_channels}}</li>
<li><b>Image bit-depth</b></li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this useful?

Copy link
Contributor Author

@IsaacYangSLA IsaacYangSLA Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed your point. That line of code is complete nonsense. I removed it in the latest commit.

@@ -11,6 +11,20 @@
except ImportError:
from StringIO import StringIO

try:
Copy link
Contributor

@gheinrich gheinrich Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you can add pydicom to requirements.txt and import dicom without a try/except then?

@@ -35,6 +49,27 @@
# List of supported file extensions
# Use like "if filename.endswith(SUPPORTED_EXTENSIONS)"
SUPPORTED_EXTENSIONS = ('.png','.jpg','.jpeg','.bmp','.ppm')
if dicom is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we always import dicom maybe you don't need this if?

Copy link
Contributor Author

@IsaacYangSLA IsaacYangSLA Oct 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. We can remove that if after we require pydicom package.

except dicom.errors.InvalidDicomError:
raise errors.LoadImageError, 'Invalid Dicom file'
pixels = dm.pixel_array
image = PIL.Image.fromarray(pixels.astype(np.float))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that can be a future improvement. If pixels is a 3-D array, how do you determine whether it's an RGB image (OK in DIGITS) or a volumetric grayscale image (not OK in DIGITS if more than there aren't exactly 3 slices)?

@IsaacYangSLA
Copy link
Contributor Author

If an inference task applies to a high bit-depth image a model which is trained with 8-bit images, it's likely to generate incorrect results. If resize is needed for test image, the function will follow test image's bit-depth to preserve its dynamic range.

@gheinrich
Copy link
Contributor

gheinrich commented Oct 7, 2016

When you create a classification dataset, all images are resized to the specified dimensions and converted to the specified color scheme (grayscale or RGB). It would feel awkward if this wasn't the case for the bit depth too. I understand it's probably more work than was originally planned but I think the bit depth should be enforced. Besides, doing this (both during dataset creation and inference) would avoid a lot of issues that could arise if a user mixes 8bpc and 18bpc images.

Remove import error handling on dicom package
Use model's train data to limit inference data
Conditional display bit-depth of dataset
Save to 16-bit png for Torch inferencing
@IsaacYangSLA
Copy link
Contributor Author

@gheinrich, thanks a lot. I think we are getting closer to finish it. Let me explain it and see if it makes sense to you.
An dicom image, after being loaded, will have image mode 'F.' And during resize, that 'F' is maintained, and no scaling is done on pixel values. At the very last stage, if user selected bpp = 32, and unencoded for LMDB, DIGITS save the complete information. The trained model remembers its dataset and its bpp. At inferencing, only the combination of mode 'F' images and model trained with bpp=8 will raise exception. If no exception, the resize stage of test images will apply original dataset's bpp (any image modes with bpp=32, or non-'F' modes with bpp=8). So the model will see the test images processed by the same bpp.

resize_bpp = utils.forms.SelectField(u'Bits per pixel',
default='8',
choices=[('8', '8-bit (color or grayscale)'), ('32', '32-bit floating point (grayscale only)')],
tooltip="Storing 32-bit floating point for certain medical images."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be useful to have a more verbose tooltip to state the fact that bit depth is only checked, not enforced through conversion

@gheinrich
Copy link
Contributor

@IsaacYangSLA I think it makes sense indeed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants