Supporting DICOM via pydicom #1136

IsaacYangSLA · 2016-10-03T22:02:37Z

Importing dicom if dicom module can be loaded.
Storing dicom image in LMDB data field as float.
Several small modifications on dealing with PIL.Image mode
Notify users to use None on higher bit images.

Notify users on encoding for higher depth images. Fix _save_means.

gheinrich

In order to prevent users from mixing 8-bit and 16-bit images in the classification path I would have added a "depth" field to the image transformations in the dataset (much like we are doing for image size and channels). This way, during training and inference, all images would use the same depth and this would avoid a lot of issues.

gheinrich · 2016-10-04T13:50:53Z

digits/tools/create_db.py

+                write_queue.put(datum)
+            else:
+                write_queue.put((image, label))
+        except IOError:  # try to save 16-bit images with PNG/JPG encoding


Can IOError occur for other reasons, besides attempting to store 16-bit data with PNG/JPG encoding?

Let me catch the error message, too, and set _notification with that error message.

gheinrich · 2016-10-04T13:51:26Z

digits/tools/create_db.py

@@ -442,6 +454,9 @@ def _create_hdf5(image_count, write_queue, batch_size, output_dir,

    assert images_written == writer.count()

+    if _notification():
+        raise WriteError('Encoding should be None for images with color depth higher than 8 bits.')


the notion of "encoding" is irrelevant for HDF5 databases

My bad. Hdf5Writer clearly states DTYPE='float32' right at the beginning.

gheinrich · 2016-10-04T13:54:03Z

digits/tools/create_db.py

@@ -269,6 +269,9 @@ def create_db(input_file, output_dir,
    write_queue = Queue.Queue(2*batch_size)
    summary_queue = Queue.Queue()

+    # Init helper function for notification between threads
+    _notification(set_flag=False)


can this be more explicitly named or be made more generic? We could make the flag a bit field, and support a number of reasons for terminating threads (wrong encoding being just one of those reasons).

I agree, this name is too vague. I thought Python didn't natively support bit fields. I will make this a dict with meaningful key names, which should make it a more useful helper function.
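A minimal sketch of the dict-based idea, assuming a lock-protected store shared between the reader and writer threads. The class name `Notifications`, its methods, and the key `"encoding_error"` are hypothetical and are not taken from the PR:

```python
import threading


class Notifications:
    """Thread-safe store of named termination reasons (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._reasons = {}

    def set(self, key, message):
        # Record why a worker thread wants the job to stop.
        with self._lock:
            self._reasons[key] = message

    def get(self, key):
        # Return the message stored under key, or None if it was never set.
        with self._lock:
            return self._reasons.get(key)

    def has_any(self):
        # True once at least one reason has been recorded.
        with self._lock:
            return bool(self._reasons)


def worker(notifications):
    try:
        # Stand-in for an encoding failure raised while writing an image.
        raise IOError("cannot write mode F as PNG")
    except IOError as e:
        # Keep the original error text so the caller can report it.
        notifications.set("encoding_error", str(e))


notifications = Notifications()
thread = threading.Thread(target=worker, args=(notifications,))
thread.start()
thread.join()
```

Keyed reasons let the main thread report the specific failure (and its message) instead of a single boolean flag.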

gheinrich · 2016-10-04T13:56:43Z

digits/tools/create_db.py

@@ -667,7 +695,11 @@ def _save_means(image_sum, image_count, mean_files):
    """
    Save mean[s] to file
    """
-    mean = np.around(image_sum / image_count).astype(np.uint8)
+    mean = np.around(image_sum / image_count)
+    if mean.max()>255:


Why not always use float for the mean? We don't use encoding for the mean.
If you want to keep the float vs. uint8 dtype distinction, I think you also need to check the lower bound.

@gheinrich, thanks for the comments. My understanding is that image_sum is always positive, so the variable mean is also positive. The only reason astype(np.uint8) doesn't convert properly is that some pixels in mean are greater than 255. So I kept the original code to handle the original case.
However, it will be easier to understand if we make the mean a float. Let me try that.
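As an illustration of the "always float" option, here is a small sketch. The helper names `compute_mean` and `fits_uint8` and the sample arrays are hypothetical, and the range check covers both bounds as suggested in the review:

```python
import numpy as np


def compute_mean(image_sum, image_count):
    """Return the per-pixel mean as float32, with no uint8 cast."""
    return (image_sum / float(image_count)).astype(np.float32)


def fits_uint8(mean):
    """Check both bounds before any optional conversion to uint8."""
    return bool(mean.min() >= 0 and mean.max() <= 255)


# Two hypothetical 2x2 16-bit images, accumulated in float64.
images = [
    np.array([[100, 300], [800, 65535]], dtype=np.uint16),
    np.array([[100, 500], [800, 65535]], dtype=np.uint16),
]
image_sum = np.zeros((2, 2), dtype=np.float64)
for img in images:
    image_sum += img

mean = compute_mean(image_sum, len(images))
needs_float = not fits_uint8(mean)
```

Keeping the mean in float avoids relying on what an out-of-range float-to-uint8 cast produces.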

gheinrich · 2016-10-04T13:59:26Z

digits/tools/create_db.py

@@ -693,7 +725,10 @@ def _save_means(image_sum, image_count, mean_files):
            with open(mean_file, 'wb') as outfile:
                outfile.write(blob.SerializeToString())
        elif mean_file.lower().endswith(('.jpg', '.jpeg', '.png')):
-            image = PIL.Image.fromarray(mean)
+            if np.issubdtype(mean.dtype, np.float):
+                image = PIL.Image.fromarray(mean*255/mean.max()).convert('L')


can't we resort to 16-bit png in this case?

You are right. I don't think any pixel in mean can be greater than 65535. We should be able to do that for the PNG case.

@gheinrich, it looks like if the max value is greater than 255 but much smaller than 65536 (for example, around 800) and we save it to a PNG file, the image is almost black. The reason, I believe, is that the dynamic range of 16-bit PNG is much larger than the mean pixel values. We can either scale it up to the full dynamic range of 16-bit PNG, or scale it down to 8-bit PNG/JPG. Both look the same on screen.

Indeed it's impossible to know how to best visualize data on the server side. This is something we discussed with @jmancewicz. Joe thinks the best way to deal with this is to have the user choose the amount of contrast, etc. through sliders on the client side. I agree with him.

That sounds like a good idea.
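To make the "almost black" effect concrete, the sketch below maps a chosen intensity window onto 8 bits, which is roughly what a contrast slider would control. The function `window_to_uint8` and the sample values are assumptions for illustration only, not code from DIGITS:

```python
import numpy as np


def window_to_uint8(pixels, low, high):
    """Map [low, high] linearly onto [0, 255], clipping values outside the window."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (np.asarray(pixels, dtype=np.float64) - low) / (high - low)
    return np.around(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


# A hypothetical mean image whose maximum is around 800.
mean = np.array([[0.0, 200.0], [400.0, 800.0]])

# Window covering the whole 16-bit range: every pixel lands near zero.
full_range = window_to_uint8(mean, 0, 65535)

# Window stretched to the data: the brightest pixel reaches 255.
stretched = window_to_uint8(mean, 0, mean.max())
```

With the full 16-bit window the brightest pixel maps to 3 out of 255, which is why the saved image looks black; the stretched window uses the whole display range.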

gheinrich · 2016-10-04T14:00:54Z

digits/utils/image.py

@@ -11,6 +11,20 @@
 except ImportError:
    from StringIO import StringIO

+try:


don't you want to add pydicom to requirements.txt? If the package is not installed, DIGITS will silently refuse to read DICOM files, which could be confusing to the user.

If it's required, we're going to need a deb package for it. Does this package work?
http://packages.ubuntu.com/trusty/python/python-dicom

Let me check if 0.9.7 works. Pip installed 0.9.9 and it's the one I tested.

I tested 0.9.7 and it worked without problem. We should be able to use apt python-dicom package of 0.9.7.

so you can add pydicom to requirements.txt and import dicom without a try/except then?

gheinrich · 2016-10-04T14:28:32Z

digits/utils/image.py

+    except dicom.errors.InvalidDicomError:
+        raise errors.LoadImageError, 'Invalid Dicom file'
+    pixels = dm.pixel_array
+    image = PIL.Image.fromarray(pixels.astype(np.float))


how do you deal with 3D/4D data (which seem to be the most common use of DICOM)?

You are right. This definitely breaks when pixels.shape is higher than 2D.

I think we can support that in the future. Adding 3D/4D data support from one single DICOM file would involve many modifications, though that may be down to my limited understanding of the current code.

Sure, that can be a future improvement. If pixels is a 3-D array, how do you determine whether it's an RGB image (OK in DIGITS) or a volumetric grayscale image (not OK in DIGITS if there aren't exactly 3 slices)?

Each slice should have its own metadata. Therefore, we should be able to tell whether it's a multi-slice DICOM or a multi-channel DICOM. I need to dig out that information, but I remember seeing it before.
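One possible way to make that distinction is to read the standard DICOM attributes SamplesPerPixel and NumberOfFrames. The sketch below is an assumption about how this could be done, not what the PR implements. `classify_pixel_layout` is a hypothetical helper, and `SimpleNamespace` objects stand in for a loaded dataset so the example runs without pydicom:

```python
from types import SimpleNamespace


def classify_pixel_layout(ds):
    """Classify a dataset-like object by its pixel layout attributes.

    Missing attributes default to 1, i.e. a single-frame grayscale image.
    """
    samples = int(getattr(ds, "SamplesPerPixel", 1))
    frames = int(getattr(ds, "NumberOfFrames", 1))
    if frames > 1:
        return "multi-frame"   # volumetric or temporal data
    if samples == 3:
        return "color"         # e.g. RGB, three samples per pixel
    if samples == 1:
        return "grayscale"
    return "unsupported"


# Stand-ins for datasets; a real one would come from the DICOM reader.
rgb = SimpleNamespace(SamplesPerPixel=3)
volume = SimpleNamespace(SamplesPerPixel=1, NumberOfFrames=24)
plain = SimpleNamespace()
```

Checking the metadata rather than the array shape avoids guessing whether a third axis of length 3 means channels or slices.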

Make _notification more general
Passing 8/32 bit-depth among jobs, create_db

gheinrich

I think inference.py needs to be modified to perform the required depth conversion on test images (similar to how we perform channel conversion and resize), correct?

gheinrich · 2016-10-06T20:51:04Z

digits/templates/datasets/images/generic/summary.html

@@ -21,6 +21,7 @@
        <li><b>Image Count</b> - {{task.image_count}}</li>
        <li><b>Image Dimensions</b> -
        {{task.image_width}}x{{task.image_height}}x{{task.image_channels}}</li>
+        <li><b>Image bit-depth</b></li>


is this useful?

I missed your point. That line of code is complete nonsense. I removed it in the latest commit.

gheinrich · 2016-10-06T21:03:27Z

digits/utils/image.py

@@ -35,6 +49,27 @@
 # List of supported file extensions
 # Use like "if filename.endswith(SUPPORTED_EXTENSIONS)"
 SUPPORTED_EXTENSIONS = ('.png','.jpg','.jpeg','.bmp','.ppm')
+if dicom is not None:


if we always import dicom maybe you don't need this if?

That's correct. We can remove that if after we require pydicom package.

IsaacYangSLA · 2016-10-07T04:03:11Z

If an inference task applies a model trained with 8-bit images to a high bit-depth image, it is likely to generate incorrect results. If a test image needs resizing, the function follows the test image's bit depth to preserve its dynamic range.

gheinrich · 2016-10-07T14:47:51Z

When you create a classification dataset, all images are resized to the specified dimensions and converted to the specified color scheme (grayscale or RGB). It would feel awkward if this wasn't the case for the bit depth too. I understand it's probably more work than was originally planned but I think the bit depth should be enforced. Besides, doing this (both during dataset creation and inference) would avoid a lot of issues that could arise if a user mixes 8bpc and 16bpc images.

Remove import error handling on dicom package
Use model's train data to limit inference data
Conditional display bit-depth of dataset
Save to 16-bit png for Torch inferencing

IsaacYangSLA · 2016-10-07T21:00:36Z

@gheinrich, thanks a lot. I think we are getting close to finishing it. Let me explain the design and see if it makes sense to you.
A DICOM image, once loaded, has image mode 'F'. During resize, that 'F' mode is maintained and no scaling is applied to pixel values. At the very last stage, if the user selected bpp = 32 and no encoding for LMDB, DIGITS saves the complete information. The trained model remembers its dataset and its bpp. At inference, only the combination of a mode 'F' image and a model trained with bpp=8 raises an exception. If there is no exception, the resize stage for test images applies the original dataset's bpp (any image mode with bpp=32, or non-'F' modes with bpp=8). So the model sees test images processed with the same bpp.
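The rule described above can be sketched as a small check. The names `BitDepthError` and `check_bit_depth` are hypothetical, and the function only mirrors the rule as stated in this comment, not the actual inference code:

```python
class BitDepthError(ValueError):
    """Raised when a test image cannot be used with the model's dataset bpp."""


def check_bit_depth(image_mode, dataset_bpp):
    """Return the bpp to apply when resizing a test image.

    image_mode is a PIL-style mode string such as 'L', 'RGB' or 'F'.
    dataset_bpp is the bit depth the model's dataset was created with.
    """
    if dataset_bpp not in (8, 32):
        raise ValueError("dataset_bpp must be 8 or 32")
    if image_mode == 'F' and dataset_bpp == 8:
        # A float image would lose its dynamic range in an 8-bit pipeline.
        raise BitDepthError("mode 'F' image used with a model trained at bpp=8")
    # Every other combination follows the dataset's bit depth.
    return dataset_bpp
```

For example, `check_bit_depth('L', 32)` returns 32, so an 8-bit grayscale test image is processed at the dataset's depth, while `check_bit_depth('F', 8)` raises.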

gheinrich · 2016-10-10T20:36:10Z

digits/dataset/images/forms.py

+    resize_bpp = utils.forms.SelectField(u'Bits per pixel',
+            default='8',
+            choices=[('8', '8-bit (color or grayscale)'), ('32', '32-bit floating point (grayscale only)')],
+            tooltip="Storing 32-bit floating point for certain medical images."


may be useful to have a more verbose tooltip to state the fact that bit depth is only checked, not enforced through conversion

gheinrich · 2016-10-10T20:41:20Z

@IsaacYangSLA I think it makes sense indeed.

Loading/classifying/inferencing

179fcb5

Notify users on encoding for higher depth images. Fix _save_means.

lukeyeager added the enhancement label Oct 4, 2016

gheinrich suggested changes Oct 4, 2016

UI to select 8 or 32 bit-depth

3120bca

Make _notification more general
Passing 8/32 bit-depth among jobs, create_db

gheinrich suggested changes Oct 6, 2016

Add PyDicom in requirements.txt

b20261b

Remove import error handling on dicom package
Use model's train data to limit inference data
Conditional display bit-depth of dataset
Save to 16-bit png for Torch inferencing

gheinrich approved these changes Oct 10, 2016

lukeyeager mentioned this pull request Oct 21, 2016

Fix python lint under plugins/ #1198

Merged
