NCBI BLAST parser : Extract query and best hits

BLAST is a wonderful utility for sequence analysis, and bioinformatic analysis of thousands of sequences without it is unthinkable. But if your data set is really big, analyzing the BLAST results is another Herculean job. That is why people write BLAST parsers: tools that reduce your results to a compact summary so that you can easily analyse the final result. Perl is one of the most popular languages researchers use to write parsers, so Perl scripts are hard to avoid. NCBI BLAST is the most commonly used bioinformatics software for sequence analysis. You can also find an online NCBI BLAST parser on the GreenGene server.

  • How to install BLAST on Windows HERE
  • How to run BLAST on Windows HERE
Please remember that I have tested these NCBI BLAST parsers with the standalone NCBI BLAST program only. They may or may not work with the web version of NCBI BLAST.

There are two possible scenarios for parsing your BLAST results.

  • You want to get the information about all your queries.
  • You want to get the information about only those queries which have some hits. 
Although you can use -outfmt in your BLAST+ query to get output in tabular, tabular with comments, or comma-separated value format, it would not be easy to sort out your sequences. So let's parse your NCBI BLAST results. These parsers were originally written by Dr. Xiaodong Bai.
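
If you do go the tabular route, the sorting can be scripted instead of done by hand. Here is a minimal Python sketch, not part of Dr. Bai's parsers, that keeps the best hits per query from tabular output. It assumes the default 12-column layout of -outfmt 6 (or -outfmt 7, whose '#' comment lines are skipped). The function name top_hits and the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6 / -outfmt 7).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(handle, max_hits=1):
    """Return {query_id: [row, ...]} with the best `max_hits` subjects per query.

    Each input row is one alignment (HSP). For every query/subject pair only
    the row with the highest bit score is kept, then subjects are ranked by
    that bit score, highest first.
    """
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")  # skip blanks and comments
    )
    best = defaultdict(dict)  # query -> subject -> best row seen so far
    for fields in csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < len(COLUMNS):
            continue  # not a standard 12-column row
        row = dict(zip(COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best[row["qseqid"]].get(row["sseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]][row["sseqid"]] = row
    return {
        query: sorted(subjects.values(),
                      key=lambda r: r["bitscore"], reverse=True)[:max_hits]
        for query, subjects in best.items()
    }


# Made-up sample: q1 has two alignments to s1 and one to s2, q2 has one hit.
SAMPLE = (
    "# comment line as written by -outfmt 7\n"
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "q1\ts2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "q1\ts1\t85.0\t50\t7\t0\t1\t50\t1\t50\t1e-10\t60.5\n"
    "q2\ts3\t99.0\t200\t2\t0\t1\t200\t1\t200\t0.0\t360\n"
)

if __name__ == "__main__":
    for query, rows in top_hits(io.StringIO(SAMPLE), max_hits=2).items():
        for row in rows:
            print(query, row["sseqid"], row["evalue"], row["bitscore"])
```

Note that a query with no hits produces no rows at all in tabular output, so it never shows up in this dictionary. That is one reason a dedicated parser is handy for the first scenario.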

Dependencies
Both Perl NCBI BLAST parser scripts depend on the BioPerl module Bio::SearchIO, so make sure it is installed on your computer.


  1. NCBI BLAST parser 1
  2. Final Steps

  1. Extract queries with or without hits from an NCBI BLAST result file
ncbiblastparser.pl

Script name Download
blastparser.pl

Usage

perl ncbiblastparser.pl <blast-result-file> <no of hits> <result-file name> 

If my BLAST result is saved in 'blast-result.txt' and I want the top 5 hits in my parsed result file 'parsed-result.txt', then my command will be
perl ncbiblastparser.pl blast-result.txt 5 parsed-result.txt 

parsed-result.txt will contain the following information about hits and queries
query_name\query_length\Hit accession_number\Hit length\Hit description\E value\bit score\frame\query_start
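
If you would rather process the parsed file in a script than in a spreadsheet, here is a rough Python sketch for loading it. It rests on assumptions I have not checked against the script itself: that the columns are tab-separated and come in the order listed above, and that any header line starts with query_name. The snake_case keys and the function name read_parsed_result are my own, chosen for illustration.

```python
import csv
import io

# Column order copied from the list above; the key names are illustrative.
COLUMNS = [
    "query_name", "query_length", "hit_accession", "hit_length",
    "hit_description", "evalue", "bit_score", "frame", "query_start",
]


def read_parsed_result(handle):
    """Read an (assumed) tab-separated parsed result into a list of dicts."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0] == "query_name":
            continue  # skip blank lines and a header line, wherever it appears
        # Pad short rows so that every column key is present.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


if __name__ == "__main__":
    # Made-up example row, only to show the shape of the result.
    demo = "q1\t300\tABC123\t280\tsome protein\t1e-20\t95.5\t0\t1\n"
    for row in read_parsed_result(io.StringIO(demo)):
        print(row["query_name"], row["hit_accession"], row["evalue"])
```

All values are kept as strings, so convert the E value and bit score with float() before comparing them numerically.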

Finally, you may want to sort out or extract the queries without any hits. You can simply import the tabulated NCBI BLAST parse file into Excel and sort it alphabetically, then copy those queries for further processing. Follow this Bioinformatics video tutorial for more.
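
The same sorting step can also be done without Excel. The Python sketch below is only an illustration under assumptions: the parsed file is tab-separated, the query name is in the first column, the hit accession is in the third, and a query without hits either has an empty accession or is missing from the file altogether. The function name split_queries and the optional all_queries list (for example, IDs taken from your query FASTA file) are hypothetical.

```python
import io


def split_queries(handle, all_queries=None):
    """Return (with_hits, without_hits) as alphabetically sorted lists."""
    with_hits = set()
    seen = set()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0] or fields[0] == "query_name":
            continue  # skip blank lines and a header line
        seen.add(fields[0])
        # Assumption: a non-empty third column means the query has a hit.
        if len(fields) > 2 and fields[2].strip():
            with_hits.add(fields[0])
    universe = seen | set(all_queries or [])
    return sorted(with_hits), sorted(universe - with_hits)


if __name__ == "__main__":
    # Made-up data: q1 has a hit, q2 is listed without one, q3 is absent.
    demo = "q1\t300\tABC123\t280\tsome protein\t1e-20\t95.5\t0\t1\nq2\t200\t\n"
    hits, no_hits = split_queries(io.StringIO(demo), all_queries=["q1", "q2", "q3"])
    print("with hits:", hits)
    print("without hits:", no_hits)
```

Check a few lines of your own parsed file first, since how queries without hits are written depends on the parser.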




15 comments:

  1. doesn't work for me; "Search pattern not terminated at ncbiblastparser.pl line 74"

    1. Please download the Perl script from the link given above. Let me know if you face any problem again.

  2. Hi Priyanka, the script doesn't work for me, only displays the header at the end of output. My input file is in .xml format.

    1. Hi, actually I have tested this script on .txt files only. You can save your BLAST result as a .txt file and then use this Perl script to parse the NCBI BLAST results.

  3. Hi, Thank you for the script. We got some issues when we were trying to use it.

    perl: symbol lookup error: ..../perl/5.16.3/site/lib/auto/Data/Dumper/Dumper.so: undefined symbol: Perl_Istack_sp_ptr

    Has anyone experienced this?

  4. Hi Teshome Mulugeta,
    Very sorry that you got this error. Please make sure that BioPerl is installed on your computer. Are you using this script on a Windows machine?

  5. Hi, Yes, Bioperl is installed and tested. We are using it on Linux machine (CentOS6.5).

    1. Hi
      Did you solve the problem on Linux ? If yes, please let me know how.
      Thank you in advance.

    2. Hi,
      I have been using this script on Ubuntu for a long time without any trouble.

  6. Hi Teshome Mulugeta,
    Thanks for letting me know about your OS. Actually I have never tested this script on Linux. You can try this other NCBI parser, which doesn't depend on BioPerl, but remember that I have tested it on Windows only: NCBI BLAST parser : Extract query and best hits II

  7. Hi,

    how can we cite your script?

    1. Hi,
      Thanks for your question. This script was written not by me but by Dr. Xiaodong Bai. However, you can cite this page as the reference.

  8. Thanks for the great tool! One suggestion: it would be great to have an option to include exclusion terms, eg. if my top hit is a "hypothetical protein" and the second hit has an actual descriptor, exclude the top hit and take the second hit instead (if parsing to 1).

  9. Thanks for the great tool! One suggestion: it would be useful to have an option to exclude certain terms, eg. if my top hit is a "hypothetical protein" but my second hit is an actual description, do not include the top hit but instead include the second (if parsing to 1).

    1. Glad to hear that it was useful for you. Thanks for your suggestion, but I think it would complicate things: if the most significant hit is a hypothetical protein, there is little point in looking for another protein just because it has a description. However, you can always parse for 2 or more hits and remove the hits with a hypothetical description in Excel. Hope this will help you.

