Preprocessors

Preprocessors to remove DICOM masks and generate segmentations.

Prepare Dependencies

pip install opencv-python pydicom matplotlib pandas

Command Usage

usage: main.py [-h] --dcm-dir DCM_DIR 
                    --label-dir LABEL_DIR
                    --target-dir TARGET_DIR
                    --mode {inspector,mask,roi,spreadsheet,cvat,transform,classify,label}
                    [--filterable-csv-file FILTERABLE_CSV_FILE]
                    [--filterable-dataset-type {train,valid,test}]
                    [--filterable-keep-issues]
                    [--overwrite-label-type {cvat}]
                    [--overwrite-label-file OVERWRITE_LABEL_FILE]
                    [--new-shape NEW_SHAPE]
                    [--crop-image CROP_IMAGE]
                    [--jobs JOBS]

This is a preprocessor to remove DICOM masks and generate segmentations and
its inspectors, masks and ROIs.

optional arguments:
  -h, --help            show this help message and exit
  --dcm-dir DCM_DIR     The DICOM root directory
  --label-dir LABEL_DIR
                        The JSON labels root directory
  --target-dir TARGET_DIR
                        The destination root directory for outputs
  --mode {inspector,mask,roi,spreadsheet,cvat}
                        inspector    Generate four-in-one images to compare masks, overlay 
                                     and noise-eliminated with original image
                        mask         Generate binary masks that will be used as Dataset
                                     for segmentation models
                        roi          Generate region-of-interest images that will be used 
                                     as Dataset for classification model
                        spreadsheet  Generate CSV files that contains encrypted
                                     patients identifiers and its file name
                        cvat         Generate a XML file that contains segmentation mask polygons
                                     to be uploaded on CVAT
                        transform    Generate the original dataset images but necessarily transformed
                        classify     Generate region-of-interest images that will be used
                                     as Dataset for classification model, but sorts into phase labels
                        label        Generate a CSV file that contains file name and its cancer phases
  --filterable-csv-file FILTERABLE_CSV_FILE
                        The CSV file to be used for filtering broken datasets out
  --filterable-dataset-type {train,valid,test}
                        The type of dataset source directory for querying filterable CSV file
  --filterable-keep-issues
                        A flag to keep issued rows in filterable CSV file
  --overwrite-label-type {cvat}
                        The type of overrideable labels format to parse
                        
                        cvat        CVAT 1.1 XML annotation format
                                    Pass 'annotations.xml' file to --overwrite-label-file argument
  --overwrite-label-file OVERWRITE_LABEL_FILE
                        The label file to be used for overwriting dataset labels
  --new-shape NEW_SHAPE
                        WxH. Resize the output image with desired width and height - e.g.) 224x224
  --crop-image CROP_IMAGE
                        X:Y,W:H. Crop the output image to desired rectangle - e.g.) 90:0,480:480
  --jobs JOBS           Number of workers

usage: assign.py [-h] --csv-file CSV_FILE 
                      --source-dirs SOURCE_DIRS
                      --target-dir TARGET_DIR

This is an assigner for assigning dataset validation job fairly.

optional arguments:
  -h, --help            show this help message and exit
  --csv-file CSV_FILE   The assignees CSV file
  --source-dirs SOURCE_DIRS
                        dir1,dir2,dir3,..  The source directories to be assigned to
  --target-dir TARGET_DIR
                        The destination directory where the assigned directory will be located

Dataset Hierarchy

Dataset:
- {LABEL_DIR}
    - *
        - ENDO
            - *.json
- {DCM_DIR}
    - *
        - ENDO
            - *.dcm

CSV Formats

Filterable CSV File

filterable-dataset-type,patient_id,image_id,assignee,issue

train,00000001,00000001_0001,John,TRUE
valid,00000002,00000002_0001,James,FALSE
test,00000003,00000003_0001,Alice,FALSE

TRUE means the row have an issue, and the image will be truncated in the result.

Assignees CSV File

John,James,Alice
,,
00000001,00000002,00000003
00000004,00000005,
,00000006,

Proper assignees CSV file is required to separate the dataset fairly.

Result:
- John
    - 00000001
        - *.jpg
    - 00000004
        - *.jpg
- James
    - 00000002
        - *.jpg
    - 00000005
        - *.jpg
    - 00000006
        - *.jpg
- Alice
    - 00000003
        - *.jpg

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
filters		filters
modes		modes
overwriters		overwriters
pipes		pipes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
assign.py		assign.py
main.py		main.py
preprocessors.py		preprocessors.py
statistic.py		statistic.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Preprocessors

Prepare Dependencies

Command Usage

Dataset Hierarchy

CSV Formats

Filterable CSV File

Assignees CSV File

About

Releases

Languages

License

Basars/preprocessors

Folders and files

Latest commit

History

Repository files navigation

Preprocessors

Prepare Dependencies

Command Usage

Dataset Hierarchy

CSV Formats

Filterable CSV File

Assignees CSV File

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages