Parallelisation of taxonomy_ranks #1

Open
HuoJnx opened this issue Nov 2, 2022 · 2 comments

HuoJnx commented Nov 2, 2022

Hello. I think taxonomy_ranks is a very convenient tool for lineage annotation, but it is a little slow, so I wrote a script to parallelize it, and it works. It ran about 4 times faster for 21579 queries on a 48-thread server. I hope the script helps anyone who needs it. ^_^

Code

taxaranks_parallel(){

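    ## stop after error
    ## (caution: if this function is sourced into an interactive shell,
    ## set -e stays active after the function returns)
    set -e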
    
    ## get current directory
    current_dir=$(pwd)
    
    ## parse the input path (quoted so that paths with spaces work)
    input=$1
    dir=$(realpath "$(dirname "$input")")
    base=$(basename "$input")
    real_input="${dir}/${base}"
    echo "Input is $input."
    
    ## go to sub_dir
    sub_dir="${dir}/split_${base}"
    rm -rf "$sub_dir"; mkdir -p "$sub_dir"
    cd "$sub_dir"
    echo "Create temporary directory $sub_dir."
    
    ## get parameters for splitting, then split into one chunk per thread
    ## ("l/N" makes N chunks without breaking lines; requires GNU split)
    threads=$(nproc)
    need_length=3
    split -a "$need_length" -d -n "l/${threads}" "$real_input"
    echo "Have $threads threads, split the file into $threads parts."
    
    ## run taxaranks in parallel
    echo "Annotating..."
    ls .|parallel "taxaranks -i {} -o {}.lineage -t"
    
    ## merge
    merge_file="../${base}.lineage"

    #### write the header once, taken from the first chunk's output
    head_line=$(ls *.lineage|head -n1|xargs head -n1)
    printf '%s\n' "$head_line" > "$merge_file"

    #### drop the first line of each file, then append one file at a time in
    #### chunk order (concurrent appends from parallel could interleave lines)
    for f in *.lineage; do
        tail -n +2 "$f" >> "$merge_file"
    done
    
    ## back to the previous working directory
    cd "$current_dir"

    ## remove the sub_dir (after leaving it)
    rm -rf "$sub_dir"
    echo "Clear temporary directory."
    
    ## prompt
    echo "All finished."
}
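
The split, annotate, and merge pattern used by the function can be tried without taxaranks installed. The following is only a minimal sketch under stated assumptions: `fake_annotate` is a hypothetical stand-in for the annotation step (it is not part of taxonomy_ranks), the chunk count of 3 and the `chunk_` prefix are arbitrary, plain background jobs are used in place of GNU parallel, and `split -n l/N` requires GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the annotation step: writes a header line,
# then one tab-separated output line per input line.
fake_annotate() {
    local src=$1 dst=$2
    printf 'query\tresult\n' > "$dst"
    awk '{ print $0 "\tannotated" }' "$src" >> "$dst"
}

work=$(mktemp -d)
seq 1 10 > "$work/queries.txt"

# Split into 3 chunks without breaking lines: chunk_00, chunk_01, chunk_02.
split -d -a 2 -n l/3 "$work/queries.txt" "$work/chunk_"

# Process every chunk concurrently, then wait for all jobs to finish.
for chunk in "$work"/chunk_??; do
    fake_annotate "$chunk" "$chunk.out" &
done
wait

# Merge: write the header once, then each chunk's body in chunk order.
merged="$work/merged.tsv"
first=1
for out in "$work"/chunk_??.out; do
    if [ "$first" -eq 1 ]; then
        head -n 1 "$out" > "$merged"
        first=0
    fi
    tail -n +2 "$out" >> "$merged"
done

cat "$merged"
```

Merging sequentially after all jobs have finished keeps the output rows in the same order as the input queries. In the function above, the corresponding merged result is written next to the input file as `<input>.lineage`.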

Example

Without parallelization

image

With parallelization

image

@linzhi2013
Owner

Hi HuoJnx,

Thanks a lot for your suggestion!

I will post it on the main page of the project.

Cheers
Guanliang

linzhi2013 added a commit that referenced this issue Nov 2, 2022
Author

HuoJnx commented Nov 3, 2022

Wow! I'm happy to be of help! ☺️
