How to perform parallel BLAST
|
BLAST can be time-consuming especially when it includes a large number of the query sequence. Therefore, parallel BLAST can be useful. This Bash script can perform parallel BLASt in three common steps
script_name: Name of given script
blast_type: What kind of BLAST you want to perform e.g. blastn
query_file: Name of query file
database_name: Name of the database
number_of_sequence_each_file: how many sequences you want per query file
- Split fasta file into files with a given number of sequences
- BLAST them in parallel fashion
- Combine their result
USES
This bash script require following commandsscript_name: Name of given script
blast_type: What kind of BLAST you want to perform e.g. blastn
query_file: Name of query file
database_name: Name of the database
number_of_sequence_each_file: how many sequences you want per query file
SCRIPT
#!/bin/bash ######################################################################################################################## # # split fasta file into files with given number of sequences # blast them in parallel fashion # combine their result ######################################################################################################################## prog="$1" query="$2" database="$3" num_seq="$4" if [[ $# -lt 3 ]] ; then printf "\033[1;31mGive me a proper command\033[0m\n" printf "\033[1;31mUsage: script_name blast_type query_file database name number_of_sequence_each_file\033[0m\n\n" exit; else start=`date +%s` #split fasta sequences in given number of sequences per file awk -v a_seq=$num_seq 'BEGIN {n_seq=0;} /^>/ {if(n_seq%a_seq==0){file=sprintf("myseq%d.fa",n_seq);} print >> file; n_seq++; next;} { print >> file; }' < $query #run blast ls *.fa | parallel -a - $prog -query {} -db $database -out {.}.out -evalue 0.001 -num_descriptions 1 -num_alignments 1 -num_threads 8 cat *.out >$query.blast.result while true; do read -p "Do you want to delete intermediate files?" yn case $yn in [Yy]* ) rm -f *.out *.fa; break;; [Nn]* ) exit;; * ) echo "Please answer yes or no.";; esac done runtime=$((end-start)) printf "\033[1;31mYour analysis was done in $((($(date +%s)-$start)/60)) minutes\033[0m\n"${reset} printf "\033[1;31mTHANKS\033[0m\n"${reset} fi
Related Posts Bioinformatics resources,
BLAST,
HOW TO
|
Was This Post Useful? Add This To Del.icio.us Share on Facebook StumbleUpon This Add to Technorati Share on Twitter |
Labels:
Bioinformatics resources,
BLAST,
HOW TO
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Have Problem ?? Drop a comments here!