Add instructions for adding a method #9

Merged
merged 8 commits into from
Apr 22, 2024
6 changes: 5 additions & 1 deletion README.md
@@ -18,7 +18,11 @@ You need to have Docker, Java, and Viash installed. Follow
[these instructions](https://openproblems.bio/documentation/fundamentals/requirements)
to install the required dependencies.

-## First steps
+## Add a method
+
+To add a method to the repository, follow the instructions in the `scripts/add_a_method.sh` script.
+
+## Frequently used commands

To get started, you can run the following commands:

42 changes: 42 additions & 0 deletions scripts/add_a_method.sh
@@ -0,0 +1,42 @@
#!/bin/bash

echo "This script is not supposed to be run directly."
echo "Please run the script step-by-step."
exit 1

# sync resources
scripts/download_resources.sh

# create a new component
method_id="my_method"
method_lang="python" # change this to "r" if need be

viash run src/common/create_component/config.vsh.yaml -- \
  --language "$method_lang" \
  --name "$method_id"

# TODO: fill in the required fields in src/task/methods/$method_id/config.vsh.yaml
# TODO: edit src/task/methods/$method_id/script.py (or script.R)

# test the component
viash test "src/task/methods/$method_id/config.vsh.yaml"

# rebuild the container (only needed if you change something in the docker platform)
# You can reduce the memory and CPU allotted to jobs in _viash.yaml by modifying .platforms[.type == "nextflow"].config.labels
viash run "src/task/methods/$method_id/config.vsh.yaml" -- \
  ---setup cachedbuild ---verbose

# run the method
viash run "src/task/methods/$method_id/config.vsh.yaml" -- \
  --de_train "resources/neurips-2023-kaggle/de_train.parquet" \
  --id_map "resources/neurips-2023-kaggle/id_map.csv" \
  --output "output/prediction.parquet"

# run evaluation metric
viash run src/task/metrics/mean_rowwise_rmse/config.vsh.yaml -- \
  --de_test "resources/neurips-2023-kaggle/de_test.parquet" \
  --prediction "output/prediction.parquet" \
  --output "output/score.h5ad"

# print score on kaggle test dataset
python -c 'import anndata; print(anndata.read_h5ad("output/score.h5ad").uns)'
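
For readers wondering what the `mean_rowwise_rmse` score printed above measures, here is a minimal sketch. It assumes the metric follows the usual "mean rowwise RMSE" definition (the RMSE of each row, averaged over rows); the component in `src/task/metrics/mean_rowwise_rmse` is not shown in this diff and may differ in detail. The function name and the toy data are illustrative only.

```python
import math


def mean_rowwise_rmse(y_true, y_pred):
    """Average, over rows, of each row's root mean squared error."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    row_rmses = []
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of columns")
        squared_errors = [(t - p) ** 2 for t, p in zip(true_row, pred_row)]
        row_rmses.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(row_rmses) / len(row_rmses)


# Toy data, made up for illustration: 2 rows and 2 columns.
y_true = [[0.0, 0.0], [1.0, 1.0]]
y_pred = [[3.0, 4.0], [1.0, 1.0]]
score = mean_rowwise_rmse(y_true, y_pred)
print(score)
```

Under this definition lower is better, and a perfect prediction scores 0.
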
56 changes: 56 additions & 0 deletions src/common/create_component/config.vsh.yaml
@@ -0,0 +1,56 @@
functionality:
  name: create_component
  namespace: common
  description: |
    Create a Viash component.

    Usage:
    ```
    bin/create_component --task denoising --type method --language r --name foo
    bin/create_component --task denoising --type metric --language python --name bar
    ```
  arguments:
    - type: string
      name: --language
      description: Which scripting language to use. Options are 'python', 'r'.
      default: python
      choices: [python, r]
    - type: string
      name: --name
      example: new_comp
      description: Name of the new method, formatted in snake case.
    - type: file
      name: --output
      direction: output
      # required: true
      description: Path to the component directory. Suggested location is `src/<TASK>/<TYPE>s/<NAME>`.
      default: src/task/methods/${VIASH_PAR_NAME}
    - type: file
      name: --api_file
      description: |
        Which API file to use. Defaults to `src/<TASK>/api/comp_<TYPE>.yaml`.
        In tasks with different subtypes of method, this location might not exist and you might need
        to manually specify a different API file to inherit from.
      must_exist: false
      # required: true
      default: src/task/api/comp_method.yaml
    - type: file
      name: --viash_yaml
      description: |
        Path to the project config file. Needed for knowing the relative location of a file to the project root.
      # required: true
      default: "_viash.yaml"
  resources:
    - type: python_script
      path: script.py
    - path: read_and_merge_yaml.py
platforms:
  - type: docker
    image: python:3.10-slim
    setup:
      - type: python
        pypi: ruamel.yaml
  - type: native
  - type: nextflow
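
The `--name` argument is documented as snake case, and the `--output` default is built from it. The sketch below shows one way such a name could be checked before a component is created. The helper names `is_snake_case` and `default_output_dir` and the regular expression are assumptions for illustration and are not part of this PR; the sketch also assumes `${VIASH_PAR_NAME}` expands to the value passed as `--name`.

```python
import re

# Hypothetical pattern: starts with a lowercase letter, followed by lowercase
# letters or digits, with single underscores separating the groups.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name):
    """Return True if the whole name matches the snake case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


def default_output_dir(name):
    """Mirror the `--output` default, src/task/methods/${VIASH_PAR_NAME}."""
    return f"src/task/methods/{name}"


for candidate in ["my_method", "MyMethod", "my-method"]:
    print(candidate, is_snake_case(candidate))

print(default_output_dir("my_method"))
```
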


52 changes: 52 additions & 0 deletions src/common/create_component/read_and_merge_yaml.py
@@ -0,0 +1,52 @@
def read_and_merge_yaml(path):
    """Read a Viash YAML

    If the YAML contains a "__merge__" key anywhere in the yaml,
    the path specified in that YAML will be read and the two
    lists will be merged. This is a recursive procedure.

    Arguments:
    path -- Path to the Viash YAML"""
    from ruamel.yaml import YAML

    yaml = YAML(typ='safe', pure=True)

    with open(path, 'r') as stream:
        data = yaml.load(stream)
    return _ram_process_merge(data, path)

def _ram_deep_merge(dict1, dict2):
    if isinstance(dict1, dict) and isinstance(dict2, dict):
        keys = set(list(dict1.keys()) + list(dict2.keys()))
        out = {}
        for key in keys:
            if key in dict1:
                if key in dict2:
                    out[key] = _ram_deep_merge(dict1[key], dict2[key])
                else:
                    out[key] = dict1[key]
            else:
                out[key] = dict2[key]
        return out
    elif isinstance(dict1, list) and isinstance(dict2, list):
        return dict1 + dict2
    else:
        return dict2

def _ram_process_merge(data, path):
    import os
    if isinstance(data, dict):
        processed_data = {k: _ram_process_merge(v, path) for k, v in data.items()}

        if "__merge__" in processed_data:
            new_data_path = os.path.join(os.path.dirname(path), processed_data["__merge__"])
            new_data = read_and_merge_yaml(new_data_path)
        else:
            new_data = {}

        return _ram_deep_merge(new_data, processed_data)
    elif isinstance(data, list):
        return [_ram_process_merge(dat, path) for dat in data]
    else:
        return data
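
To make the merge rules above concrete, here is a small self-contained sketch. `deep_merge` is a condensed restatement of `_ram_deep_merge`, written so the example runs without `ruamel.yaml`: dictionaries merge key by key, lists are concatenated, and for anything else the second value wins. The `api` and `component` dictionaries and the `../../api/comp_method.yaml` target are made-up stand-ins, not content from the repository.

```python
import posixpath


def deep_merge(base, override):
    """Condensed restatement of the merge rules in `_ram_deep_merge`."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + override  # lists are concatenated, base entries first
    return override  # scalars and mismatched types: the second value wins


# Made-up stand-ins for an API file (base) and a component config (override).
api = {
    "functionality": {
        "arguments": [{"name": "--input"}],
        "info": {"type": "method"},
    }
}
component = {
    "functionality": {
        "name": "my_method",
        "arguments": [{"name": "--output"}],
        "info": {"type": "custom"},
    }
}
merged = deep_merge(api, component)
print(merged)

# A "__merge__" value is resolved relative to the directory of the file that
# contains it (posixpath keeps the result the same on every platform).
config_path = "src/task/methods/my_method/config.vsh.yaml"
merge_target = "../../api/comp_method.yaml"  # hypothetical value
resolved = posixpath.normpath(
    posixpath.join(posixpath.dirname(config_path), merge_target)
)
print(resolved)
```

Because `_ram_process_merge` calls `_ram_deep_merge(new_data, processed_data)`, the file named by `__merge__` acts as the base and the file containing the key overrides it. The `__merge__` key itself also remains in the merged result.
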
