Tonkaz is a CLI tool to verify workflow reproducibility. It compares the RO-Crates of two workflow execution results and calculates the reproducibility level of each output file.
Reproducibility level is defined as follows:
- Level3 ⭐⭐⭐ : Files are identical with the same checksum
- Level2 ⭐⭐ : Files are different, but their features (file size, map rate, etc.) are similar (within threshold: 0.05)
- Level1 ⭐ : Files are different, and their features are different (beyond threshold)
- Level0 : File not found
Level3: "Fully Reproduced" <---> Level0: "Not Reproduced"
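The level definitions above can be read as a small decision procedure. The following Python sketch is an illustration only, not Tonkaz's actual implementation: the function names, the dictionary-of-features representation, and the relative-difference metric are assumptions. Only the four levels and the default threshold of 0.05 come from the definitions above.

```python
from typing import Mapping, Optional


def relative_difference(a: float, b: float) -> float:
    """Difference scaled by the larger magnitude (an assumed similarity metric)."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 0.0
    return abs(a - b) / largest


def reproducibility_level(
    checksum1: Optional[str],
    checksum2: Optional[str],
    features1: Mapping[str, float],
    features2: Mapping[str, float],
    threshold: float = 0.05,
) -> int:
    """Return 3, 2, 1, or 0 following the level definitions.

    A checksum of None stands for "file not found" (hypothetical convention).
    """
    if checksum1 is None or checksum2 is None:
        return 0  # Level0: file not found
    if checksum1 == checksum2:
        return 3  # Level3: identical files
    # Compare only the features that both files report.
    shared = features1.keys() & features2.keys()
    similar = bool(shared) and all(
        relative_difference(features1[k], features2[k]) <= threshold
        for k in shared
    )
    return 2 if similar else 1
```

For example, two files with different checksums whose sizes are 100 and 103 bytes differ by about 3%, so this sketch assigns them Level2 under the default threshold.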
To try it quickly, run the following command. It compares the results of two runs of nf-core/rnaseq v3.7 in the same Linux environment.
$ tonkaz ./tests/example_crate/rnaseq_1st.json ./tests/example_crate/rnaseq_2nd.json
# Example output:
$ cat ./tests/comparison_results/rnaseq_same_env.log
More examples are available in tests/README.md.
Tonkaz is distributed as a single binary with no external dependencies. Download the one for your platform:
# for Linux x86_64
$ curl -fsSL -o ./tonkaz https://github.com/sapporo-wes/tonkaz/releases/latest/download/tonkaz_x86_64-unknown-linux-gnu
# for Mac x86_64
$ curl -fsSL -o ./tonkaz https://github.com/sapporo-wes/tonkaz/releases/latest/download/tonkaz_x86_64-apple-darwin
# for Mac Apple silicon
$ curl -fsSL -o ./tonkaz https://github.com/sapporo-wes/tonkaz/releases/latest/download/tonkaz_aarch64-apple-darwin
$ chmod +x ./tonkaz
$ ./tonkaz --help
Or, use the Docker environment:
docker run -it --rm ghcr.io/sapporo-wes/tonkaz:latest --help
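When the download step needs to be scripted, the platform-specific URLs listed above can be selected programmatically. The Python sketch below is one possible approach under stated assumptions: the function name `release_url` is hypothetical, and the mapping relies on the values returned by Python's `platform.system()` and `platform.machine()` (for example, `arm64` on Apple silicon). The asset names and base URL are taken from the commands above.

```python
import platform
from typing import Optional

RELEASE_BASE = "https://github.com/sapporo-wes/tonkaz/releases/latest/download"

# (system, machine) pairs as reported by the platform module,
# mapped to the asset names used in the download commands.
ASSETS = {
    ("Linux", "x86_64"): "tonkaz_x86_64-unknown-linux-gnu",
    ("Darwin", "x86_64"): "tonkaz_x86_64-apple-darwin",
    ("Darwin", "arm64"): "tonkaz_aarch64-apple-darwin",
}


def release_url(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the download URL for the given (or current) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        asset = ASSETS[(system, machine)]
    except KeyError:
        raise ValueError(
            f"No prebuilt binary listed for {system} {machine}"
        ) from None
    return f"{RELEASE_BASE}/{asset}"
```

On platforms that are not listed, the function raises `ValueError`; the Docker image is the alternative in that case.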
Pass two crates as arguments to the `tonkaz` command. Each crate can be a local file or a URL.
tonkaz crate1.json crate2.json
For more details:
$ tonkaz -h
Tonkaz 0.1.0 by @suecharo
CLI tool to verify workflow reproducibility
Usage: tonkaz [options] crate1 crate2
Options:
  -a, --all                    Use all output files for comparison
  -t, --threshold <threshold>  Set threshold for comparison (default: 0.05)
  -h, --help                   Show this help message and exit
  -v, --version                Show version and exit
Examples:
$ tonkaz crate1 crate2
$ tonkaz crate1 https://example.com/crate2
$ tonkaz https://example.com/crate1 https://example.com/crate2
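As the examples above show, each crate argument may be a local path or a URL. The following Python sketch illustrates one way such an argument could be resolved into parsed JSON. It is not how Tonkaz itself is implemented; the names `is_url` and `load_crate` are hypothetical, and treating only `http` and `https` schemes as remote is an assumption.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(location: str) -> bool:
    """Treat only http(s) locations as remote (an assumption of this sketch)."""
    return urlparse(location).scheme in ("http", "https")


def load_crate(location: str) -> dict:
    """Load an RO-Crate JSON document from a local file or a URL."""
    if is_url(location):
        with urlopen(location) as response:
            return json.load(response)
    return json.loads(Path(location).read_text(encoding="utf-8"))
```

With this helper, `load_crate("crate1.json")` reads a local file, while `load_crate("https://example.com/crate2")` would fetch the document over HTTP.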
Tonkaz supports ONLY RO-Crates generated by Sapporo-service (version 1.6.0 or newer) or Yevis-cli.
For more information about Sapporo and Yevis, please see their respective repositories.
An RO-Crate can be generated by passing the `--fetch-ro-crate` option to Yevis-cli's `test` command as follows:
# Execute the workflow
$ yevis test --fetch-ro-crate https://example.com/path/to/yevis-metadata-file
# The RO-Crate is generated in the `test-logs` directory
$ ls test-logs/
ro-crate-metadata_c13b6e27-a4ee-426f-8bdb-8cf5c4310bad_1.0.0_test_1.json
Alternatively, an RO-Crate can be generated from Sapporo's `run_dir`:
# At Sapporo run_dir
$ ls
cmd.txt run.sh state.txt
exe/ run_request.json stderr.log
executable_workflows.json sapporo_config.json stdout.log
outputs/ service_info.json workflow_engine_params.txt
run.pid start_time.txt yevis-metadata.yml
# Execute sapporo/ro_crate.py script
$ docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v $PWD:$PWD -w $PWD ghcr.io/sapporo-wes/sapporo-service:latest python3 /app/sapporo/ro_crate.py $PWD
Since version 1.6.0, Sapporo-service runs MultiQC after the workflow completes and adds its results to the RO-Crate. Tonkaz can compare these MultiQC results. MultiQC scans the workflow outputs and aggregates the statistics reported by each tool on a per-sample basis.
Previously, Tonkaz compared results only at the file level. With the MultiQC comparison, it can also compare them at the sample level. Because file-level and sample-level comparisons are difficult to handle together, the MultiQC comparison is provided as an additional feature.
Even so, the sample-level comparison is useful because it allows a more detailed comparison of workflow results.
An example of the comparison results is as follows:
- Salmon Num_mapped
.--------------------------------------------------------------------------.
| Sample                         | in Crate1    | in Crate2    | Level     |
|--------------------------------|--------------|--------------|-----------|
| RAP1_IAA_30M_REP1              | 38268        | 40165        | ⭐⭐      |
| RAP1_UNINDUCED_REP1            | 39317        | 39317        | ⭐⭐      |
| RAP1_UNINDUCED_REP2            | 78884        | 81361        | ⭐⭐      |
| WT_REP1                        | 74109        | 74109        | ⭐⭐      |
| WT_REP2                        | 37368        | 37368        | ⭐⭐      |
'--------------------------------------------------------------------------'
- Samtools Flagstat_total
.--------------------------------------------------------------------------.
| Sample                         | in Crate1    | in Crate2    | Level     |
|--------------------------------|--------------|--------------|-----------|
| RAP1_IAA_30M_REP1              | 94912        | 94912        | ⭐⭐      |
| RAP1_UNINDUCED_REP1            | 49040        | 49040        | ⭐⭐      |
| RAP1_UNINDUCED_REP2            | 98338        | 98338        | ⭐⭐      |
| WT_REP1                        | 188243       | 188241       | ⭐⭐      |
| WT_REP2                        | 94419        | 94419        | ⭐⭐      |
'--------------------------------------------------------------------------'
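The sample-level comparison shown in the tables above can be approximated with a per-sample threshold check. The Python sketch below is an illustration under assumptions, not Tonkaz's actual logic: the function name `sample_levels`, the relative-difference metric, and the handling of samples missing from one crate (Level0, by analogy with "File not found") are all hypothetical. Because identical values are shown as ⭐⭐ in the tables above, the sketch never returns Level3; that reading of the tables is also an assumption. The input values are the Salmon Num_mapped figures from the first table.

```python
from typing import Dict, Mapping


def sample_levels(
    crate1: Mapping[str, float],
    crate2: Mapping[str, float],
    threshold: float = 0.05,
) -> Dict[str, int]:
    """Assign a level to each sample by comparing one MultiQC statistic."""
    levels: Dict[str, int] = {}
    for sample in sorted(crate1.keys() | crate2.keys()):
        if sample not in crate1 or sample not in crate2:
            levels[sample] = 0  # assumed: a missing sample is treated like a missing file
            continue
        a, b = crate1[sample], crate2[sample]
        largest = max(abs(a), abs(b))
        diff = 0.0 if largest == 0 else abs(a - b) / largest
        levels[sample] = 2 if diff <= threshold else 1
    return levels


# Salmon Num_mapped values copied from the example table.
salmon_crate1 = {
    "RAP1_IAA_30M_REP1": 38268,
    "RAP1_UNINDUCED_REP1": 39317,
    "RAP1_UNINDUCED_REP2": 78884,
    "WT_REP1": 74109,
    "WT_REP2": 37368,
}
salmon_crate2 = {
    "RAP1_IAA_30M_REP1": 40165,
    "RAP1_UNINDUCED_REP1": 39317,
    "RAP1_UNINDUCED_REP2": 81361,
    "WT_REP1": 74109,
    "WT_REP2": 37368,
}

levels = sample_levels(salmon_crate1, salmon_crate2)
```

Under the default threshold of 0.05, every sample in this data lands at Level2, which matches the table: the largest relative difference (RAP1_IAA_30M_REP1, 38268 versus 40165) is a little under 5%.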
## Development
We use [Deno](https://deno.land/) `v1.40.2`.
If you want to use the Docker environment, please run the following command:
```bash
$ docker run -it --rm -v $PWD:$PWD -w $PWD denoland/deno:1.40.2 deno --version
deno 1.40.2 (release, x86_64-unknown-linux-gnu)
v8 12.1.285.6
typescript 5.3.3
```
Please see the `./tests` directory.
Apache-2.0. See the LICENSE.