Merge pull request #18 from bebatut/split_commands
Split script into 2 commands: 1 to extract, 1 to filter tools

Tested locally, got:
1295 Tool "Suits" with 16 duplicates which accounts to 2915 tool ids.

Thank you @bebatut !
paulzierep authored Nov 1, 2023
2 parents 2aa1087 + c667c6e commit 30cb077
Showing 7 changed files with 187 additions and 121 deletions.
11 changes: 11 additions & 0 deletions .isort.cfg
@@ -0,0 +1,11 @@
[settings]
combine_as_imports=true
force_alphabetical_sort_within_sections=true
# Override force_grid_wrap value from profile=black, but black is still happy
force_grid_wrap=2
# Same line length as for black
line_length=120
no_lines_before=LOCALFOLDER
profile=black
reverse_relative=true
skip_gitignore=true
55 changes: 28 additions & 27 deletions README.md
@@ -38,24 +38,22 @@ Galaxy Tool extractor
$ python3 -m pip install -r requirements.txt
```
# Extract tools for categories in the ToolShed
## Extract all tools
1. Get an API key ([personal token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens)) for GitHub
2. (Optional) Create a text file with ToolShed categories for which tools need to be extracted: 1 ToolShed category per row ([example for microbial data analysis](data/microgalaxy/categories))
3. (Optional) Create a text file with list of tools to exclude: 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_exclude))
4. (Optional) Create a text file with list of tools to really keep (already reviewed): 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_keep))
4. Run the tool extractor script
2. Export the GitHub API key as an environment variable:
```
$ python bin/extract_galaxy_tools.py \
--api <GitHub API key> \
--output <Path to output file> \
[--categories <Path to ToolShed category file>] \
[--exclude <Path to excluded tool file category file>]\
[--keep <Path to to-keep tool file category file>]
$ export GITHUB_API_KEY=<your GitHub API key>
```
3. Run the script
```
$ bash bin/extract_all_tools.sh
```
The script will generate a CSV file with each tool found in the list of GitHub repository and several information for these tools:
The script will generate a TSV file with each tool found in the list of GitHub repositories and metadata for these tools:
1. Galaxy wrapper id
2. Description
@@ -73,27 +71,30 @@ The script will generate a CSV file with each tool found in the list of GitHub r
14. Galaxy wrapper version
15. Conda id
16. Conda version
17. Reviewed
18. To keep
## For microbial related tools
## Filter tools based on their categories in the ToolShed
For microGalaxy, a Bash script in `bin` can used by:
1. Exporting the GitHub API key as an environment variable:
1. Run the extraction as explained before
2. (Optional) Create a text file with ToolShed categories for which tools need to be extracted: 1 ToolShed category per row ([example for microbial data analysis](data/microgalaxy/categories))
3. (Optional) Create a text file with list of tools to exclude: 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_exclude))
4. (Optional) Create a text file with list of tools to really keep (already reviewed): 1 tool id per row ([example for microbial data analysis](data/microgalaxy/tools_to_keep))
4. Run the tool extractor script
```
$ export GITHUB_API_KEY=<your GitHub API key>
$ python bin/extract_galaxy_tools.py \
--tools <Path to TSV file with all extracted tools> \
--filtered_tools <Path to output CSV file with filtered tools> \
[--categories <Path to ToolShed category file>] \
[--excluded <Path to excluded tool file category file>]\
[--keep <Path to to-keep tool file category file>]
```
2. Running the script
```
$ bash bin/extract_microgalaxy_tools.sh
```
### Filter tools for microbial data analysis
It will:
1. Update the files in the `data/microgalaxy` folder
2. Export the tools into `microgalaxy_tools.csv`
For microGalaxy, a Bash script in `bin` can be used by running the script
```
$ bash bin/extract_microgalaxy_tools.sh
```
It will take the files in the `data/microgalaxy` folder and export the tools into `microgalaxy_tools.csv`
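The filtering step described in the README could be sketched roughly as follows. This is an illustration only, not the code in `bin/extract_galaxy_tools.py`: the `ToolShed categories` column name, its comma-separated format, the helper names `read_list` and `filter_tools`, and the way `Reviewed` and `To keep` are derived from the two tool lists are all assumptions. The example rows are made up.

```python
import csv
import io


def read_list(text):
    """Return the non-empty, stripped lines of a one-item-per-row file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def filter_tools(rows, categories, excluded, kept):
    """Keep rows matching a category and flag their review status.

    Assumptions (not confirmed by the source): categories are stored as a
    comma-separated string in a "ToolShed categories" column, and the
    "Reviewed" / "To keep" columns are derived from the two tool lists.
    """
    wanted = set(categories)
    reviewed = set(excluded) | set(kept)
    filtered = []
    for row in rows:
        tool_categories = {c.strip() for c in row["ToolShed categories"].split(",")}
        # With no category list, every tool passes the category filter.
        if wanted and not (wanted & tool_categories):
            continue
        tool_id = row["Galaxy wrapper id"]
        filtered.append({**row, "Reviewed": tool_id in reviewed, "To keep": tool_id in kept})
    return filtered


# Tiny in-memory TSV standing in for the extracted tool table (made-up rows).
example_tsv = (
    "Galaxy wrapper id\tToolShed categories\n"
    "tool_a\tMetagenomics\n"
    "tool_b\tText Manipulation\n"
    "tool_c\tMetagenomics, Sequence Analysis\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
result = filter_tools(rows, read_list("Metagenomics\n"), excluded=["tool_c"], kept=["tool_a"])
for row in result:
    print(row["Galaxy wrapper id"], row["Reviewed"], row["To keep"])
```

In this sketch the exclude and keep lists only annotate rows rather than drop them, which would match the new `Reviewed` and `To keep` columns; whether the real script also removes excluded tools is not visible in this diff.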
8 changes: 8 additions & 0 deletions bin/extract_all_tools.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

mkdir -p 'results/'

python bin/extract_galaxy_tools.py \
extractools \
--api $GITHUB_API_KEY \
--all_tools 'results/all_tools.tsv'
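The wrapper above passes `$GITHUB_API_KEY` straight through, so an unset variable would leave `--api` without a value. A caller could guard against that with a small check like the one below. This is a hypothetical helper (`get_api_key` is not part of the repository), shown only as a sketch of reading the key from the environment with a clear failure message.

```python
import os


def get_api_key(env=os.environ, name="GITHUB_API_KEY"):
    """Return the API key from the environment or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise SystemExit(f"{name} is not set; export it before running the extraction")
    return key


# Demonstration with an explicit mapping so the example does not depend
# on the real environment; "dummy-token" is a placeholder value.
print(get_api_key({"GITHUB_API_KEY": "dummy-token"}))
```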