NCBI BLAST parser : Extract query and best hits
BLAST is a wonderful utility for sequence analysis. Bioinformatic analysis of thousands of sequences without BLAST is unthinkable. But if you are analyzing thousands of sequences or, in other words, if your data set is really big, then analyzing the BLAST results is another Herculean job. Therefore, people write BLAST parsers. BLAST parsers are tools that condense your results into a compact summary so that you can easily analyse them. Perl is the language researchers most often use to write parsers, so Perl scripts are unavoidable tools. NCBI BLAST is the most commonly used bioinformatics software for sequence analysis. You can also find an online NCBI BLAST parser on the GreenGene server.
Please remember that I have tested these NCBI BLAST parsers with the standalone NCBI BLAST program. They may or may not work with the web version of NCBI BLAST.
There are two possible scenarios for parsing your BLAST results:
- You want to get the information about all your queries.
- You want to get the information about only those queries which have some hits.
Although you can use the outfmt option in your BLAST+ query to get output in tabular, tabular with comments, or comma-separated value formats, it would not be easy to sort out your sequences. So let's parse your NCBI BLAST results. These parsers were originally written by Dr. Xiaodong Bai.
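For comparison, here is a minimal Python sketch of how the tabular-with-comments output could be sorted into queries with and without hits. It assumes the file was produced with BLAST+ outfmt 7, where each query is introduced by a "# Query:" comment line and hit rows are tab-separated with the query ID in the first column. The function name split_queries and the sample data are made up for illustration, and this is not one of the Perl parsers described in this post.

```python
def split_queries(lines, max_hits=5):
    """Group BLAST+ tabular-with-comments output by query.

    Returns (with_hits, no_hits): with_hits maps a query ID to its first
    max_hits rows (each row is a list of column strings); no_hits lists
    the query IDs that had no rows at all.
    """
    rows_by_query = {}  # dicts keep insertion order (Python 3.7+)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# Query:"):
            # Assumption: the query ID is the first token after "# Query:"
            rest = line[len("# Query:"):].split()
            if rest:
                rows_by_query.setdefault(rest[0], [])
        elif line and not line.startswith("#"):
            cols = line.split("\t")
            rows = rows_by_query.setdefault(cols[0], [])
            if len(rows) < max_hits:
                rows.append(cols)
    with_hits = {q: r for q, r in rows_by_query.items() if r}
    no_hits = [q for q, r in rows_by_query.items() if not r]
    return with_hits, no_hits


# Made-up sample in the assumed layout (not real BLAST results)
SAMPLE = "\n".join([
    "# Query: seq1 example sequence",
    "# 2 hits found",
    "seq1\tsubjA\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t360",
    "seq1\tsubjB\t90.00\t180\t18\t0\t1\t180\t5\t184\t1e-60\t230",
    "# Query: seq2",
    "# 0 hits found",
])

if __name__ == "__main__":
    with_hits, no_hits = split_queries(SAMPLE.splitlines(), max_hits=1)
    for query, rows in with_hits.items():
        print(query, [row[1] for row in rows])
    print("no hits:", no_hits)
```

Each row in this format is one alignment, so the same subject can appear more than once for a query; the sketch simply keeps the first max_hits rows in file order.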
Dependencies
Both Perl NCBI BLAST parser scripts depend on the BioPerl module Bio::SearchIO, so make sure it is installed on your computer.
- Extract queries with or without hits from an NCBI BLAST result file
Script name | Download |
---|---|
blastparser.pl |
Uses
perl ncbiblastparser.pl <blast-result-file> <no of hits> <result-file name>
If my BLAST result is saved in 'blast-result.txt' and I want the top 5 hits in my parsed result file 'parsed-result.txt', then my command will be
perl ncbiblastparser.pl blast-result.txt 5 parsed-result.txt
parsed-result.txt will contain the following information about hits and queries
query_name\query_length\Hit accession_number\Hit length\Hit description\E value\bit score\frame\query_start
Finally, you may want to sort out or extract the queries without any hits. You can simply import the tabular NCBI BLAST parse file into Excel and sort it alphabetically. You can then copy those queries for further processing. Follow this Bioinformatics video tutorial for more
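If you would rather not use Excel, the same step could be sketched in Python. This is only a sketch under assumptions that you should check against your own file: the columns are tab-separated, the first column is the query name, the hit accession number is the third column, and a query without hits shows up as a row whose hit accession field is empty or missing. The function name queries_without_hits is made up for illustration.

```python
import csv
import io


def queries_without_hits(handle, hit_column=2):
    """Return the sorted query names that never have a value in hit_column."""
    has_hit = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and a header row starting with "query_name"
        if not row or row[0] == "query_name":
            continue
        query = row[0]
        hit = row[hit_column].strip() if len(row) > hit_column else ""
        has_hit[query] = has_hit.get(query, False) or bool(hit)
    return sorted(q for q, found in has_hit.items() if not found)


# Made-up sample in the assumed layout (not real parser output)
SAMPLE_PARSED = io.StringIO(
    "query_name\tquery_length\tHit accession_number\n"
    "seqB\t300\t\n"
    "seqA\t250\tsubj1\n"
    "seqC\t120\n"
)

if __name__ == "__main__":
    print(queries_without_hits(SAMPLE_PARSED))
```

For a real file you could call it as `with open("parsed-result.txt", newline="") as fh: print(queries_without_hits(fh))`.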
doesn't work for me; "Search pattern not terminated at ncbiblastparser.pl line 74"
Please download the PERL script from the link given above. Let me know if you face any problem again.
Hi Priyanka, the script doesn't work for me, only displays the header at the end of output. My input file is in .xml format.
Hi, Actually I have tested this script on a .txt file. You can save your BLAST result in a .txt file to check this PERL script to parse the NCBI BLAST results
Hi, Thank you for the script. We got some issues when we were trying to use it.
perl: symbol lookup error: ..../perl/5.16.3/site/lib/auto/Data/Dumper/Dumper.so: undefined symbol: Perl_Istack_sp_ptr
Has anyone experienced this?
Hi Teshome Mulugeta,
Very sorry that you got this error. Please make sure that Bioperl is installed on your computer. Are you using this script on a Windows machine?
Hi, Yes, Bioperl is installed and tested. We are using it on Linux machine (CentOS6.5).
Hi
Did you solve the problem on Linux? If yes, please let me know how.
Thank you in advance.
Hi,
I have been using this script on Ubuntu for a long time without any trouble.
Hi Teshome Mulugeta,
Thanks for letting me know about your OS. Actually I have never tested this script on Linux. You can test this NCBI parser that doesn't depend upon Bioperl, but remember that I have tested it on Windows only: NCBI BLAST parser : Extract query and best hits II
Hi,
how can we cite your script?
Hi,
Thanks for your question. This script was not written by me but by Dr. Xiaodong Bai. However, you can give the reference of this page.
Thanks for the great tool! One suggestion: it would be great to have an option to include exclusion terms, eg. if my top hit is a "hypothetical protein" and the second hit has an actual descriptor, exclude the top hit and take the second hit instead (if parsing to 1).
Thanks for the great tool! One suggestion: it would be useful to have an option to exclude certain terms, eg. if my top hit is a "hypothetical protein" but my second hit is an actual description, do not include the top hit but instead include the second (if parsing to 1).
Glad to hear that it was useful for you. Thanks for your suggestion, but I think it would complicate the situation, because if the significant hit is a hypothetical protein then there is no point in looking for another protein with some description. However, you can always parse for 2 or more hits and remove the hits with a hypothetical description in Excel. Hope this will help you.