-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Supporting DICOM via pydicom #1136
base: master
Are you sure you want to change the base?
Conversation
IsaacYangSLA
commented
Oct 3, 2016
- Importing dicom if dicom module can be loaded.
- Storing dicom image in LMDB data field as float.
- Several small modifications on dealing with PIL.Image mode
- Notify users to use None on higher bit images.
Notify users on encoding for higher depth images. Fix _save_means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to prevent users from mixing 8-bit and 16-bit images in the classification path I would have added a "depth" field to the image transformations in the dataset (much like we are doing for image size and channels). This way, during training and inference, all images would use the same depth and this would avoid a lot of issues.
write_queue.put(datum) | ||
else: | ||
write_queue.put((image, label)) | ||
except IOError: # try to save 16-bit images with PNG/JPG encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can IOError
occur for other reasons, besides attempting to store 16-bit data with PNG/JPG encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me catch the error message, too, and set _notification with that error message.
@@ -442,6 +454,9 @@ def _create_hdf5(image_count, write_queue, batch_size, output_dir, | |||
|
|||
assert images_written == writer.count() | |||
|
|||
if _notification(): | |||
raise WriteError('Encoding should be None for images with color depth higher than 8 bits.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the notion of "encoding" is irrelevant for HDF5 databases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. It clearly states on the beginning of Hdf5Writer function DTYPE='float32.'
@@ -269,6 +269,9 @@ def create_db(input_file, output_dir, | |||
write_queue = Queue.Queue(2*batch_size) | |||
summary_queue = Queue.Queue() | |||
|
|||
# Init helper function for notification between threads | |||
_notification(set_flag=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be more explicitly named or be made more generic? We could make the flag a bit field, and support a number of reasons for terminating threads (wrong encoding being just one of those reasons).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is too vague, I agreed. I thought Python didn't natively support bit field. I will make this a dict with meaningful key names. Hopefully, it can be a better helpful function.
@@ -667,7 +695,11 @@ def _save_means(image_sum, image_count, mean_files): | |||
""" | |||
Save mean[s] to file | |||
""" | |||
mean = np.around(image_sum / image_count).astype(np.uint8) | |||
mean = np.around(image_sum / image_count) | |||
if mean.max()>255: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not always use float
for the mean? We don't use encoding for the mean.
if you want to keep the float
v.s. uint8
dtype, I think you also need to check the lower bound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gheinrich , thanks on the comments. My understanding is image_sum
is always positive, so the variable mean
is also positive. The only reason that astype(np.uint8) doesn't convert properly is some pixels in mean
are greater than 255. So I keep the original codes to handle original case.
However, it will be easier to understand if we make mean to float. Let me try that.
@@ -693,7 +725,10 @@ def _save_means(image_sum, image_count, mean_files): | |||
with open(mean_file, 'wb') as outfile: | |||
outfile.write(blob.SerializeToString()) | |||
elif mean_file.lower().endswith(('.jpg', '.jpeg', '.png')): | |||
image = PIL.Image.fromarray(mean) | |||
if np.issubdtype(mean.dtype, np.float): | |||
image = PIL.Image.fromarray(mean*255/mean.max()).convert('L') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we resort to 16-bit png in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I don't think any pixel in mean can be greater than 65535. We should be able to do that for PNG case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gheinrich , it looks like if the max value is greater than 255, but much smaller than 65536 (for example, around 800), and we save it to png file. That image is almost black. The reason, I believe, is the dynamic range of 16-bit PNG is much higher than mean pixel value. We can either scale it up to full dynamic range of 16-bit PNG, or scale it down to 8-bit png/jpg. Both look the same on screen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it's impossible to know how to best visualize data on server side. This is something we discussed with @jmancewicz. Joe thinks the best way to deal with this is to have the user choose the amount of contrast, etc. though sliders on client side. I agree with him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds a good idea.
@@ -11,6 +11,20 @@ | |||
except ImportError: | |||
from StringIO import StringIO | |||
|
|||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you want to add pydicom
to requirements.txt
? If the package is not installed, DIGITS will silently refuse to read DICOM files, which could be confusing to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's required, we're going to need a deb package for it. Does this package work?
http://packages.ubuntu.com/trusty/python/python-dicom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check if 0.9.7 works. Pip installed 0.9.9 and it's the one I tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested 0.9.7 and it worked without problem. We should be able to use apt python-dicom package of 0.9.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can add pydicom
to requirements.txt
and import dicom
without a try/except
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
except dicom.errors.InvalidDicomError: | ||
raise errors.LoadImageError, 'Invalid Dicom file' | ||
pixels = dm.pixel_array | ||
image = PIL.Image.fromarray(pixels.astype(np.float)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you deal with 3D/4D data (which seem to be the most common use of DICOM)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. This definitely breaks when pixels.shape is higher than 2D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can support that in the future. Adding 3D/4D data support from one single DICOM file would involve many modifications. Maybe that's because my limited understanding on the current codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that can be a future improvement. If pixels
is a 3-D array, how do you determine whether it's an RGB image (OK in DIGITS) or a volumetric grayscale image (not OK in DIGITS if more than there aren't exactly 3 slices)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each slice should have its meta data. Therefore, we should be able to tell if it's a multi-slice DICOM, or a multi-channel DICOM. I need to dig out that information, but I remember I saw that before.
Make _notification more general Passing 8/32 bit-depth among jobs, create_db
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think inference.py
needs to be modified to perform the required depth conversion on test images (similar to how we perform channel conversion and resize), correct?
@@ -21,6 +21,7 @@ | |||
<li><b>Image Count</b> - {{task.image_count}}</li> | |||
<li><b>Image Dimensions</b> - | |||
{{task.image_width}}x{{task.image_height}}x{{task.image_channels}}</li> | |||
<li><b>Image bit-depth</b></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed your point. That line of code is complete nonsense. I removed it in the latest commit.
@@ -11,6 +11,20 @@ | |||
except ImportError: | |||
from StringIO import StringIO | |||
|
|||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can add pydicom
to requirements.txt
and import dicom
without a try/except
then?
@@ -35,6 +49,27 @@ | |||
# List of supported file extensions | |||
# Use like "if filename.endswith(SUPPORTED_EXTENSIONS)" | |||
SUPPORTED_EXTENSIONS = ('.png','.jpg','.jpeg','.bmp','.ppm') | |||
if dicom is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we always import dicom
maybe you don't need this if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. We can remove that if
after we require pydicom
package.
except dicom.errors.InvalidDicomError: | ||
raise errors.LoadImageError, 'Invalid Dicom file' | ||
pixels = dm.pixel_array | ||
image = PIL.Image.fromarray(pixels.astype(np.float)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that can be a future improvement. If pixels
is a 3-D array, how do you determine whether it's an RGB image (OK in DIGITS) or a volumetric grayscale image (not OK in DIGITS if more than there aren't exactly 3 slices)?
If an inference task applies to a high bit-depth image a model which is trained with 8-bit images, it's likely to generate incorrect results. If resize is needed for test image, the function will follow test image's bit-depth to preserve its dynamic range. |
When you create a classification dataset, all images are resized to the specified dimensions and converted to the specified color scheme (grayscale or RGB). It would feel awkward if this wasn't the case for the bit depth too. I understand it's probably more work than was originally planned but I think the bit depth should be enforced. Besides, doing this (both during dataset creation and inference) would avoid a lot of issues that could arise if a user mixes 8bpc and 18bpc images. |
Remove import error handling on dicom package Use model's train data to limit inference data Conditional display bit-depth of dataset Save to 16-bit png for Torch inferencing
@gheinrich, thanks a lot. I think we are getting closer to finish it. Let me explain it and see if it makes sense to you. |
resize_bpp = utils.forms.SelectField(u'Bits per pixel', | ||
default='8', | ||
choices=[('8', '8-bit (color or grayscale)'), ('32', '32-bit floating point (grayscale only)')], | ||
tooltip="Storing 32-bit floating point for certain medical images." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be useful to have a more verbose tooltip to state the fact that bit depth is only checked, not enforced through conversion
@IsaacYangSLA I think it makes sense indeed. |