Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processor cookiecutter #207

Closed
16 tasks done
pkcakeout opened this issue Aug 21, 2019 · 10 comments
Closed
16 tasks done

Processor cookiecutter #207

pkcakeout opened this issue Aug 21, 2019 · 10 comments
Assignees

Comments

@pkcakeout
Copy link

pkcakeout commented Aug 21, 2019

Required features

  • Loading of images must be automated
  • Processor specifications must describe which input formats are expected
    • If files are not specified they should be ignored
    • Inputs should support MHD/MHA/TIFF image formats
    • Inputs should support CSV format
  • Support for multi-case processing is desirable
  • Call a standard function to execute processing
    • Similar to evalutils, this should be generated automatically from the cookiecutter project
  • Results:
    • JSON output (as python data structures)
    • Later: Overlays
  • Configurable required resources for running the container
    • Automatically switch GPU enabled docker image if req. gpu_count > 0
  • Dockerfile generation
  • Local test run support
  • Write tests
    • processor template generation
    • processor CLI tests
    • processor template runs on dummy data
    • local test / build / export tools works
  • Partly updated documentation (needs a more comprehensive guide in the future)
@pkcakeout
Copy link
Author

@pkcakeout
Copy link
Author

pkcakeout commented Aug 21, 2019

  • Setup Continuous Integration builds. Options:
    • CircleCI
    • GitHub Actions (in beta)

@silvandeleemput
Copy link
Member

After studying the processor specifications I came up with some changes to the /output/results.json format specifications which I think are both simpler and an improvement over the current specification.

To accommodate most use cases, processors should support one-to-many, many-to-one, and many-to-many file mappings (input -> output). To be complete, processors should support zero-to-many and many-to-zero mappings as well.

This feature is currently supported by entity specifying a list of [inputs] and under metrics specifying an output "filepath:". However, specifying the output files under metrics seems arbitrary and a bit off.

My proposal is to specify both inputs and outputs (as a list of file names / strings) under entity and remove the filepath: specification altogether frommetrics. This will simplify the output, neatly combining the input / output relation under the entity section and disentangle the outputs from the unrelated metrics section. An additional benefit is that this would simplify the automatic generation of the entity section in the results file using processor templating.

An example proposal results.json file would look something like the following:

[
  {
    "entity": {       
        "inputs": ["filenames", ...],
        "outputs": ["filenames", ...],
        ...
    },
    "metrics": ...,
    "error_messages": [
      ...
    ]
  },
  ...
]

I have discussed this with Paul and he agrees that this change will probably improve the specification. However, there seem to be some processors which follow the current specification.

@jmsmkn jmsmkn transferred this issue from comic/grand-challenge.org Aug 29, 2019
@silvandeleemput
Copy link
Member

  • Setup of continuous integrations builds can be done through the existing Travis CI setup, since we want to integrate the processor templating inside evalutils. Tests still need to be written.

  • Required resources can now be set through the evalutils CLI and get inserted into the Dockerfile using cookiecutter. See the following list:

    • com.radboudumc.diag.processor.hardware.cpu.count (int +)
    • com.radboudumc.diag.processor.hardware.cpu.capabilities (list of string *)
    • com.radboudumc.diag.processor.hardware.memory (string)
    • com.radboudumc.diag.processor.hardware.gpu.count (int *)
    • com.radboudumc.diag.processor.hardware.gpu.cuda_compute_capability (string)
    • com.radboudumc.diag.processor.hardware.gpu.memory (string)
  • Additionally, the following meta data will be required to set:

    • com.radboudumc.diag.processor.author (string)
    • com.radboudumc.diag.processor.ticket (string)
    • com.radboudumc.diag.processor.name (string)

@silvandeleemput
Copy link
Member

I studied the EnvoyAI API, which do some interesting things:

  • Specification of inputs and outputs through (json/yaml) schema in the DockerFile including specific datatypes for all
    • Input and output file paths are hardcoded.
  • Apparently it usually expects to process only a single case / run, but it does support e.g. dicom-series datatypes
  • Support for labels in the dockerfile for displaying certain datatypes

@jmsmkn
Copy link
Member

jmsmkn commented Sep 3, 2019

Looks good!

  • Additionally, the following meta data will be required to set:

    • com.radboudumc.diag.processor.author (string)
    • com.radboudumc.diag.processor.ticket (string)
    • com.radboudumc.diag.processor.name (string)

I wouldn't make these required, or at least have an option to turn this off. We will want to use this library elsewhere, eg, challenges on grand challenge that take in processors from users, rather than prediction files.

@silvandeleemput
Copy link
Member

@jmsmkn I would opt to keep the com.radboudumc.diag.processor.name label, since this will be used by cookiecutter for project folder and class creation (and maybe roughly specify its purpose)

The author label might even be unnecessary since a processor/algorithm will be tagged upon upload to grand-challenge.

The ticket label is only for internal use and I think prompting might be strange for external users. We could keep it optional for internal use though and maybe let users specify it with a flag. (should be well documented)

silvandeleemput added a commit that referenced this issue Sep 4, 2019
…ookiecutter #207

* Added basic lung nodule classification use-case
* Added basic BaseProcess class for templating
* Added pip-wheel-metadata to .gitignore
* Removed required author tag from template Dockerfile and CLI
* Made required ticket tag from template optional and non-prompt in CLI
silvandeleemput added a commit that referenced this issue Sep 4, 2019
silvandeleemput added a commit that referenced this issue Sep 4, 2019
* Added BaseProcess to __init__
* Added bootstrapping of evalutils into template on development option
* Fixed some cookiecutter variables and also in the CLI
* Added local evalutils injection for Dockerfile when the development option is specified
* Fixed bugs with the process.py.j2 template
* Inserted memory specifications in /test.sh
* Added cli test for test_templates.py
silvandeleemput added a commit that referenced this issue Sep 6, 2019
…med code cleanup #207

* Moved bootstrap code from CLI to utils.py
* Moved bootstrap calls to post-hooks in templates
* Moved convert_line_endings to utils.py
* Changed dev distribution directory name to devdist
silvandeleemput added a commit that referenced this issue Sep 6, 2019
* Fixed processor template validator spec and processing call
* Renamed test_utils.py -> test_scorers.py because it does not test utils.py but scorers.py
* Fixed BaseProcess validation, result saving, and case loading
* Fixed some docstring in utils.py and added explicit str conversion for file
silvandeleemput added a commit that referenced this issue Sep 6, 2019
Additionally shortened docstring in utils.py for flake8 tests
silvandeleemput added a commit that referenced this issue Sep 6, 2019
 * Processor test files
   * Removed ground-truth files from processor template
   * Added candidates_001.csv file to /test for processor template
   * Adjusted Dockerfile for ground-truth removal
 * Added optional pattern matching for FileLoaders to solve CSVLoader parsing .mhd files
 * Changed test.bat / test.sh to cat results.json instead of metrics.json
 * Fixed file_loaders input for process.py.j2 and various bugs
 * Fixed BaseProcess and test_templates.py to work until producing results
   * TODO validate that the results are correct, probably not yet.
silvandeleemput added a commit that referenced this issue Sep 9, 2019
* io.py shortend some long lines for flake checking
* Added hard test for output of process debugging on Travis
* Changed line endings for build.bat and export.bat to Windows format
silvandeleemput added a commit that referenced this issue Sep 10, 2019
…rn error. #207

* Reverted io.py back to last edit by James
* Removed FileLoaderPattern error.
* Removed OrderedDict as file_loaders expected type from BaseProcess and added the index_key parameter instead.
* Removed file filtering from io.py and moved it to the BaseProcess file_filters parameter instead.
* Modified test_processor.p and process.py.j2 template to reflect the changes.
silvandeleemput added a commit that referenced this issue Sep 13, 2019
* Set BaseProcess file_sorter_key to None by default
* Removed integers from candidates.csv filenames in template and in testing resource
* Modified the test to check for the new files
silvandeleemput added a commit that referenced this issue Oct 7, 2019
* Changed output json writing in evalutils.py
* Added expected test result.json resource for testing
* Modified test_processor.py to include output testing as well
* Added more realistic processing in test_processor.py
* Updated processor cookiecutter template
silvandeleemput added a commit that referenced this issue Oct 7, 2019


* Updated results.json with 2 input files
* Pointed tests towards resourcs in processor templates
* Removed candidates.csv under resources
* Finalized processor tests in test_templates.py
@jmsmkn
Copy link
Member

jmsmkn commented Jan 28, 2020

What is the status of this issue? The PR was merged but there are probably some things left over. Should we continue in a new issue for these?

@silvandeleemput
Copy link
Member

silvandeleemput commented Feb 2, 2020

@jmsmkn I think, the main things left to do are:

Since these are addressed in other issues, I think we can close this issue.

@jmsmkn
Copy link
Member

jmsmkn commented Mar 16, 2020

Continued in #246

@jmsmkn jmsmkn closed this as completed Mar 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants