How to Predict Genes from Multiple Sequences

Gene prediction tools, or ORF finders, are indispensable for both molecular biologists and bioinformaticians, and many software packages and web servers are available to predict genes in genomic DNA sequences. Some of these tools are trained to predict genes in a specific genome, while others also work ab initio. The problem with most gene prediction servers is their output. It is manageable when the input contains a single gene or a few, but hard to handle when the input file is very large. In other words, gene prediction for multiple sequences is difficult if you don't know a programming language.
In this post, let's discuss a server that predicts genes from multiple sequences.
ORF FIND is hosted on the GreenGene server at the University of Massachusetts Lowell. Its simple interface is easy to use. This ORF finder finds ORFs in a multiple-sequence DNA file, using GLIMMER to find the ORF coordinates and EMBOSS to extract the amino acid sequences of the predicted ORFs.
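To get a feel for what "find ORF coordinates, then translate them" means for many sequences at once, here is a deliberately naive Python sketch. It is not GLIMMER (which uses a trained interpolated Markov model) and not EMBOSS; it simply scans the three forward reading frames of each sequence for ATG...stop stretches and translates them with the standard genetic code. The function names and the `min_codons` threshold are assumptions chosen for illustration.

```python
# Naive ORF finder: a simplified stand-in for the GLIMMER + EMBOSS pipeline.
# Assumptions (not taken from ORF FIND): forward strand only, ATG start,
# TAA/TAG/TGA stops, standard genetic code, minimum ORF length in codons.
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (first base varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Translate DNA codon by codon; codons with unknown bases become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end, protein) tuples on the forward strand.

    Coordinates are 1-based and inclusive; 'end' includes the stop codon.
    Within a frame, the first ATG after the previous stop opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start + 1, i + 3, translate(seq[start:i])))
                start = None
    return sorted(orfs)


def orfs_for_records(records, min_codons=30):
    """records: {sequence_id: dna}. Returns {sequence_id: list of ORFs}."""
    return {name: find_orfs(seq, min_codons) for name, seq in records.items()}
```

For example, `orfs_for_records({"seq1": "CCATGAAATTTGGGTAACC"}, min_codons=2)` reports one ORF from position 3 to 17 encoding `MKFG`. Real gene finders also scan the reverse strand and score candidates statistically, which is why a server like ORF FIND is more reliable than this sketch.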


Steps in gene prediction from multiple sequences with ORF FIND
Finally, the results of gene prediction for all sequences appear in a temporary folder, where the predicted ORFs, the predicted proteins and the input file can be easily found.

ORF Finder result folder
Note that the input file format is critical for successful prediction from multiple sequences: in your multi-FASTA file, each sequence should always sit on a single line after its '>sequence description' line. See the examples below:

> Correct Format
CCTCCTCCTGTTTTTCCCTCAATACAACCTCATTGGATTATTCAATTCACCATCCTGCCCTTGTTCCTTCCATTATACAGCTGTCTTTGCCCTCTCCTTCTCTCGCTGGACTGTTCACCAACTCTCAGCCCGCGATCCCAATTTCCAGACAACCCATCTTATCAGCTTGGCCACGGCCTCGACCCGAACAGACCGGCGTCCAGCGAGAAGAGCGTCGCCTCGACGCCTCTGCTTGACCGCACCTTGATGCTCAAGACTTATCGCGATGCCAAGAAGCGTCTCATCATGTTCGACTACGA
> Wrong Format
CGAAACGGGCACCTATACAACGATTGAAACCATTATTCAAGCTCAGCAAGCGTCTATGC
TAGCGGTTATTGCGAGCACTTCAGCGGTTGCTACTACGACTACTACTTGATAAATGAAA
CGGCTATAAAAGAGGCTGGGGCAAAAGTATGTTAGTTGAAGGGTGACCTGAACGATGAA
TCGGTCGAATTTTTTATTGGCAGAGGGAAGGTAGGTTTACTCAATTTAGTTACTTCTAG
CCGTTGATTGGAGGAGCGCAAGCGACGAGGAGGCTCATCGGCCGCCCGCGGAAAGCGTA
GTCTTACACGGAAATCAACGGCGGTGTCATAAGCGAG
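If your sequences are wrapped like the "Wrong Format" example, a short script can join each sequence onto one line before you upload the file. The sketch below is a minimal Python example and is not part of ORF FIND; the script name unwrap_fasta.py and the function names are hypothetical.

```python
# unwrap_fasta.py (hypothetical name): put every FASTA sequence on one line.
# Usage: python unwrap_fasta.py wrapped.fasta > single_line.fasta
import sys


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Wrapped sequence lines are joined, blank lines are skipped, and any
    sequence lines appearing before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_single_line_fasta(text):
    """Rewrite FASTA text so each sequence sits on one line after its header."""
    return "".join(f"{h}\n{s}\n" for h, s in read_fasta(text))


def main(paths):
    """Convert each FASTA file named on the command line, writing to stdout."""
    for path in paths:
        with open(path) as handle:
            sys.stdout.write(to_single_line_fasta(handle.read()))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the whole sequence is held in memory as one string per record, this is fine for typical uploads but not intended for very large genome files.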


