mdjaffardjy/Reuse_in_processes

MIT licensed Version 1.0.1

Reuse_in_processes

This repository contains the code for investigating reuse in Nextflow and Snakemake processors using the Levenshtein distance.
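As a reminder of what the Levenshtein distance measures, here is a minimal sketch in plain Python: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The similarity() helper, which normalises the distance to a score between 0 and 1 by dividing by the longer string's length, is an assumption made for illustration. The scores actually computed in the script folder may be defined differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to limit memory use
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation: 1.0 for identical strings, 0.0 for entirely different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

For example, "samtools sort in.bam" and "samtools sort -@ 4 in.bam" differ by five inserted characters, so under this normalisation their similarity is 1 - 5/25 = 0.8.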

  • The script folder contains the "library" of all methods for computing the similarity scores, grouping the processors and compiling the results into dataframes. It also contains the pipeline that produces these scores and dataframes.
  • The notebook folder contains the notebooks in which we visualize the analyses of our investigation.
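The grouping step can be pictured with the following sketch, which is not the repository's own method: every pair of shell scripts is compared, and scripts are linked whenever their similarity reaches a threshold (single linkage, implemented with a union-find). The threshold value of 0.9 and the function names are hypothetical. To keep the block self-contained, ratio() uses the standard library's difflib.SequenceMatcher as a stand-in for a Levenshtein-based score.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Sequence


def group_by_similarity(
    scripts: Sequence[str],
    sim: Callable[[str, str], float],
    threshold: float = 0.9,  # hypothetical cut-off
) -> List[List[int]]:
    """Single-linkage grouping: two scripts share a group if a chain of pairs
    with sim >= threshold connects them. Returns groups of script indices."""
    parent = list(range(len(scripts)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise comparisons: quadratic in the number of scripts.
    for i in range(len(scripts)):
        for j in range(i + 1, len(scripts)):
            if sim(scripts[i], scripts[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(scripts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def ratio(a: str, b: str) -> float:
    # Stand-in similarity from the standard library, NOT the repository's Levenshtein score.
    return SequenceMatcher(None, a, b).ratio()
```

With these helpers, group_by_similarity(["fastqc reads.fq", "fastqc  reads.fq", "multiqc ."], ratio) puts the two near-identical FastQC calls together and leaves the MultiQC call on its own. The all-pairs loop is quadratic, which is consistent with a full run over a large crawl taking hours.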

Source data

The original data on which the analysis is performed can be found in the folder json/source_files. This folder contains workflow and author metadata from a crawl of Snakemake and Nextflow workflows, as well as files with information on the processors. Note that these are the outputs of the [Nextflow parser] and the [Snakemake parser], merged into a single list of dictionaries in a JSON file, with an additional "shell" key containing the isolated shell script of each process. They were also filtered to keep only the processors whose shell script contains at least one tool.
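A processors file of this shape (one JSON list of dictionaries, each with a "shell" key) could be read as in the sketch below. The function names are hypothetical, and any file name you pass to load_processors() is a placeholder, since the exact file names in json/source_files are not listed here. with_shell() only checks that the "shell" entry is present and non-empty. It does not reproduce the "at least one tool" filter, which has already been applied to the provided files.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def parse_processors(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text that should hold a single list of processor dictionaries."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON list, got {type(data).__name__}")
    return data


def load_processors(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a processors JSON file from disk (path is a placeholder)."""
    return parse_processors(Path(path).read_text(encoding="utf-8"))


def with_shell(processors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only processors whose "shell" entry exists and is non-empty."""
    return [p for p in processors if p.get("shell")]
```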

Contribute

Please submit GitHub issues to provide feedback or ask for new features, and contact me with any related questions.

To Run

A conda environment can be found here.

Step 1: Compute the scores and generate the dataframes

This step can take a few hours.

python3 script/run_levenshtein_all.py

Step 2: Read the notebooks

The notebooks containing the analyses can now be launched:

  • To look at how process reuse is distributed across workflows, see this notebook. It contains the figures for Nextflow and Snakemake processor reuse, as well as the comparison between nf-core and non-nf-core processor reuse.
  • To see how tools are distributed across workflows, the code can be found in this notebook.
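One simple way to summarise how reuse is distributed across workflows is sketched below; it is an illustration, not necessarily what the notebooks compute. Given groups of similar processors and the workflow each processor comes from, it counts how many groups span 1, 2, 3, ... distinct workflows. The function name and the workflow_of input are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def reuse_distribution(
    groups: Sequence[Sequence[int]],
    workflow_of: Sequence[str],
) -> Dict[int, int]:
    """Map 'number of distinct workflows a group appears in' -> 'number of such groups'.

    groups:      lists of processor indices judged similar (e.g. by a grouping step)
    workflow_of: hypothetical workflow identifier of each processor, by index
    """
    spans = Counter(len({workflow_of[i] for i in group}) for group in groups)
    return dict(sorted(spans.items()))
```

A group whose processors all come from one workflow counts as no cross-workflow reuse, while larger keys indicate processors reused in more workflows.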
