Processor cookiecutter #207

pkcakeout · 2019-08-21T14:53:01Z

pkcakeout · 2019-08-21T15:07:31Z

Rough template for the Dockerfile is available here:

https://github.com/DIAGNijmegen/rse-unit-of-knowledge-template/blob/master/processor/Dockerfile.template

pkcakeout · 2019-08-21T15:14:45Z

Setup Continuous Integration builds. Options:
- CircleCI
- GitHub Actions (in beta)

silvandeleemput · 2019-08-27T10:00:25Z

After studying the processor specifications I came up with some changes to the /output/results.json format specifications which I think are both simpler and an improvement over the current specification.

To accommodate most use cases, processors should support one-to-many, many-to-one, and many-to-many file mappings (input -> output). To be complete, processors should support zero-to-many and many-to-zero mappings as well.

This feature is currently supported by entity specifying a list of [inputs] and under metrics specifying an output "filepath:". However, specifying the output files under metrics seems arbitrary and a bit off.

My proposal is to specify both inputs and outputs (as a list of file names / strings) under entity and remove the filepath: specification altogether frommetrics. This will simplify the output, neatly combining the input / output relation under the entity section and disentangle the outputs from the unrelated metrics section. An additional benefit is that this would simplify the automatic generation of the entity section in the results file using processor templating.

An example proposal results.json file would look something like the following:

[
  {
    "entity": {       
        "inputs": ["filenames", ...],
        "outputs": ["filenames", ...],
        ...
    },
    "metrics": ...,
    "error_messages": [
      ...
    ]
  },
  ...
]

I have discussed this with Paul and he agrees that this change will probably improve the specification. However, there seem to be some processors which follow the current specification.

silvandeleemput · 2019-09-02T15:11:18Z

Setup of continuous integrations builds can be done through the existing Travis CI setup, since we want to integrate the processor templating inside evalutils. Tests still need to be written.
Required resources can now be set through the evalutils CLI and get inserted into the Dockerfile using cookiecutter. See the following list:
- com.radboudumc.diag.processor.hardware.cpu.count (int +)
- com.radboudumc.diag.processor.hardware.cpu.capabilities (list of string *)
- com.radboudumc.diag.processor.hardware.memory (string)
- com.radboudumc.diag.processor.hardware.gpu.count (int *)
- com.radboudumc.diag.processor.hardware.gpu.cuda_compute_capability (string)
- com.radboudumc.diag.processor.hardware.gpu.memory (string)
Additionally, the following meta data will be required to set:
- com.radboudumc.diag.processor.author (string)
- com.radboudumc.diag.processor.ticket (string)
- com.radboudumc.diag.processor.name (string)

silvandeleemput · 2019-09-02T15:29:09Z

I studied the EnvoyAI API, which do some interesting things:

Specification of inputs and outputs through (json/yaml) schema in the DockerFile including specific datatypes for all
- Input and output file paths are hardcoded.
Apparently it usually expects to process only a single case / run, but it does support e.g. dicom-series datatypes
Support for labels in the dockerfile for displaying certain datatypes

jmsmkn · 2019-09-03T07:24:45Z

Looks good!

Additionally, the following meta data will be required to set:

com.radboudumc.diag.processor.author (string)

com.radboudumc.diag.processor.ticket (string)

com.radboudumc.diag.processor.name (string)

I wouldn't make these required, or at least have an option to turn this off. We will want to use this library elsewhere, eg, challenges on grand challenge that take in processors from users, rather than prediction files.

silvandeleemput · 2019-09-03T10:38:12Z

@jmsmkn I would opt to keep the com.radboudumc.diag.processor.name label, since this will be used by cookiecutter for project folder and class creation (and maybe roughly specify its purpose)

The author label might even be unnecessary since a processor/algorithm will be tagged upon upload to grand-challenge.

The ticket label is only for internal use and I think prompting might be strange for external users. We could keep it optional for internal use though and maybe let users specify it with a flag. (should be well documented)

…ookiecutter #207 * Added basic lung nodule classification use-case * Added basic BaseProcess class for templating * Added pip-wheel-metadata to .gitignore * Removed required author tag from template Dockerfile and CLI * Made required ticket tag from template optional and non-prompt in CLI

* Added BaseProcess to __init__ * Added bootstrapping of evalutils into template on development option * Fixed some cookiecutter variables and also in the CLI * Added local evalutils injection for Dockerfile when the development option is specified * Fixed bugs with the process.py.j2 template * Inserted memory specifications in /test.sh * Added cli test for test_templates.py

…207

…med code cleanup #207 * Moved bootstrap code from CLI to utils.py * Moved bootstrap calls to post-hooks in templates * Moved convert_line_endings to utils.py * Changed dev distribution directory name to devdist

* Fixed processor template validator spec and processing call * Renamed test_utils.py -> test_scorers.py because it does not test utils.py but scorers.py * Fixed BaseProcess validation, result saving, and case loading * Fixed some docstring in utils.py and added explicit str conversion for file

Additionally shortened docstring in utils.py for flake8 tests

* Processor test files * Removed ground-truth files from processor template * Added candidates_001.csv file to /test for processor template * Adjusted Dockerfile for ground-truth removal * Added optional pattern matching for FileLoaders to solve CSVLoader parsing .mhd files * Changed test.bat / test.sh to cat results.json instead of metrics.json * Fixed file_loaders input for process.py.j2 and various bugs * Fixed BaseProcess and test_templates.py to work until producing results * TODO validate that the results are correct, probably not yet.

* io.py shortend some long lines for flake checking * Added hard test for output of process debugging on Travis * Changed line endings for build.bat and export.bat to Windows format

…mplate #207

…rn error. #207 * Reverted io.py back to last edit by James * Removed FileLoaderPattern error. * Removed OrderedDict as file_loaders expected type from BaseProcess and added the index_key parameter instead. * Removed file filtering from io.py and moved it to the BaseProcess file_filters parameter instead. * Modified test_processor.p and process.py.j2 template to reflect the changes.

* Set BaseProcess file_sorter_key to None by default * Removed integers from candidates.csv filenames in template and in testing resource * Modified the test to check for the new files

…207

* Changed output json writing in evalutils.py * Added expected test result.json resource for testing * Modified test_processor.py to include output testing as well * Added more realistic processing in test_processor.py * Updated processor cookiecutter template

* Updated results.json with 2 input files * Pointed tests towards resourcs in processor templates * Removed candidates.csv under resources * Finalized processor tests in test_templates.py

jmsmkn · 2020-01-28T08:11:35Z

What is the status of this issue? The PR was merged but there are probably some things left over. Should we continue in a new issue for these?

silvandeleemput · 2020-02-02T22:25:48Z

@jmsmkn I think, the main things left to do are:

Documentation for the processor template Documentation for the algorithm-templating #271
Adding schemas for input/output formatting and better grand-challenge integration Using schemas for explicit input and output types results.json #246

Since these are addressed in other issues, I think we can close this issue.

jmsmkn · 2020-03-16T12:37:48Z

Continued in #246

jmsmkn transferred this issue from comic/grand-challenge.org Aug 29, 2019

jmsmkn assigned silvandeleemput Aug 29, 2019

silvandeleemput added a commit that referenced this issue Sep 4, 2019

apply black code style #207

93091ec

silvandeleemput added a commit that referenced this issue Sep 4, 2019

made req_cpu_capabilities_prompt use shorter variable name #207

ec4f10b

silvandeleemput added a commit that referenced this issue Sep 6, 2019

Added evalutils injection on --dev flag also to evaluator dockerfile #…

b2246f3

…207

silvandeleemput added a commit that referenced this issue Sep 6, 2019

Fix posthook cookiecutter argument passing for flake8 tests #207

fe62bb9

Additionally shortened docstring in utils.py for flake8 tests

silvandeleemput added a commit that referenced this issue Sep 9, 2019

Reset patterns to string format #207

58d4e93

silvandeleemput added a commit that referenced this issue Sep 9, 2019

Increased required GB in test_templates.py to 2G #207

d9cecf1

silvandeleemput added a commit that referenced this issue Sep 9, 2019

Reverted tox output setting and number of images loaded in process te…

41b7754

…mplate #207

silvandeleemput added a commit that referenced this issue Sep 10, 2019

Added memory string type validation to cli.py #207

8598b18

silvandeleemput added a commit that referenced this issue Sep 10, 2019

fix length of line in evalutils.py for flake8 tests #207

e619c2b

silvandeleemput added a commit that referenced this issue Sep 13, 2019

Update description of file_sorter_key #207

bd6777a

silvandeleemput mentioned this issue Sep 30, 2019

Feature processor template generation #210

Merged

silvandeleemput added a commit that referenced this issue Oct 2, 2019

Updated the CLI commands "init" to "init-evaluator" in the usage docs #…

f946cb4

…207

silvandeleemput mentioned this issue Feb 2, 2020

Documentation for the algorithm-templating #271

Closed

jmsmkn closed this as completed Mar 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Processor cookiecutter #207

Processor cookiecutter #207

pkcakeout commented Aug 21, 2019 •

edited by silvandeleemput

Loading

pkcakeout commented Aug 21, 2019

pkcakeout commented Aug 21, 2019 •

edited by silvandeleemput

Loading

silvandeleemput commented Aug 27, 2019

silvandeleemput commented Sep 2, 2019

silvandeleemput commented Sep 2, 2019

jmsmkn commented Sep 3, 2019

silvandeleemput commented Sep 3, 2019

jmsmkn commented Jan 28, 2020

silvandeleemput commented Feb 2, 2020 •

edited

Loading

jmsmkn commented Mar 16, 2020

Processor cookiecutter #207

Processor cookiecutter #207

Comments

pkcakeout commented Aug 21, 2019 • edited by silvandeleemput Loading

Required features

pkcakeout commented Aug 21, 2019

pkcakeout commented Aug 21, 2019 • edited by silvandeleemput Loading

silvandeleemput commented Aug 27, 2019

silvandeleemput commented Sep 2, 2019

silvandeleemput commented Sep 2, 2019

jmsmkn commented Sep 3, 2019

silvandeleemput commented Sep 3, 2019

jmsmkn commented Jan 28, 2020

silvandeleemput commented Feb 2, 2020 • edited Loading

jmsmkn commented Mar 16, 2020

pkcakeout commented Aug 21, 2019 •

edited by silvandeleemput

Loading

pkcakeout commented Aug 21, 2019 •

edited by silvandeleemput

Loading

silvandeleemput commented Feb 2, 2020 •

edited

Loading