How to perform parallel BLAST

BLAST can be time-consuming especially when it includes a large number of the query sequence. Therefore, parallel BLAST can be useful. This Bash script can perform parallel BLASt in three common steps
  • Split fasta file into files with a given number of sequences 
  • BLAST them in parallel fashion
  • Combine their result
Since it doesn't include any additional software except BLAST+, therefore, it is easy to use.  If you want to parse your BLAST result you can always use NCBI BLAST parser from here.

USES

This bash script require following commands
script_name: Name of given script
blast_type: What kind of BLAST you want to perform e.g. blastn
query_file: Name of query file
database_name: Name of the database
number_of_sequence_each_file: how many sequences you want per query file

SCRIPT

#!/bin/bash

########################################################################################################################
#
#     split fasta file into files with given number of sequences 
#     blast them in parallel fashion
#     combine their result
########################################################################################################################

prog="$1"
query="$2"
database="$3"
num_seq="$4"


if [[ $# -lt 3 ]] ; then
    printf "\033[1;31mGive me a proper command\033[0m\n"
    printf "\033[1;31mUsage: script_name blast_type query_file database name number_of_sequence_each_file\033[0m\n\n"
    
exit;


else

start=`date +%s`

#split fasta sequences in given number of sequences per file 
awk -v a_seq=$num_seq 'BEGIN {n_seq=0;} /^>/ {if(n_seq%a_seq==0){file=sprintf("myseq%d.fa",n_seq);} print >> file; n_seq++; next;} { print >> file; }' < $query



#run blast
ls *.fa | parallel -a - $prog -query {} -db $database -out {.}.out -evalue 0.001 -num_descriptions 1 -num_alignments 1 -num_threads 8 

cat *.out >$query.blast.result


while true; do
    read -p "Do you want to delete intermediate files?" yn
    case $yn in
        [Yy]* ) rm -f *.out *.fa; break;;
        [Nn]* ) exit;;
        * ) echo "Please answer yes or no.";;

    esac
done

runtime=$((end-start))
printf "\033[1;31mYour analysis was done in $((($(date +%s)-$start)/60)) minutes\033[0m\n"${reset}
printf "\033[1;31mTHANKS\033[0m\n"${reset}
   
fi



No comments:

Post a Comment

Have Problem ?? Drop a comments here!