How to Extract Longest Sequence Region Between Stop Codons in Translated DNA Sequences
So you have many FASTA sequences in a file, you have translated those nucleotide sequences, and now you want to extract, from each translation, the longest region that lies between two stop codons. I have already shared a couple of tools for translating many DNA sequences in one go, so you can translate several DNA sequences easily.
How to translate multiple DNA FASTA sequences HERE
Input
>Seq1
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
>seq2
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
Script 1 : Long1.pl
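#!/usr/bin/perl
use strict;
use warnings;

$/ = "\n>";                                  # read one FASTA record at a time

while (<>) {
    s/>//g;                                  # strip the '>' record markers
    my ($id, @seq) = split (/\n/, $_);       # first line is the header
    my $seq = join "", @seq;                 # rejoin wrapped sequence lines
    my @orfs = split (/\-/, $seq, -1);       # -1 keeps a trailing empty field
    shift @orfs;                             # drop the region before the first stop
    pop @orfs;                               # drop the region after the last stop
    my $sel = shift @orfs;
    next unless defined $sel;                # fewer than two stops: skip the record
    foreach my $next (@orfs) {
        $sel = $next if ((length $sel) < (length $next));
    }
    print ">$id\n$sel\n";
}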
Usage
perl Long1.pl input.txt > result.txt
The script prints to standard output, so the redirect (>) is what writes result.txt.
Result
>Seq1
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
>seq2
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
If stop codons are written as '*' instead of '-', replace '-' with '*' in the split pattern (line 11 of the script), so that /\-/ becomes /\*/, and this Perl script works just fine.
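If you do not know in advance which stop symbol your translation tool used, one option is to accept both at once. The snippet below is only a minimal sketch of that idea, not part of Long1.pl: the helper name longest_between_stops is hypothetical, and it assumes that '-' and '*' never appear in the sequence for any reason other than marking a stop codon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Long1.pl): return the longest region
# lying between two stop symbols, where a stop is either '-' or '*'.
# Returns undef when the sequence contains fewer than two stop symbols.
sub longest_between_stops {
    my ($seq) = @_;
    # The character class [-*] matches either stop symbol. The limit -1
    # keeps trailing empty fields, so a stop at the very end of the
    # sequence still closes the region in front of it.
    my @parts = split /[-*]/, $seq, -1;
    return if @parts < 3;            # fewer than two stops: nothing is bounded
    shift @parts;                    # drop the piece before the first stop
    pop @parts;                      # drop the piece after the last stop
    my $best = shift @parts;
    for my $next (@parts) {
        # On ties the earlier region is kept, as in Long1.pl.
        $best = $next if length($next) > length($best);
    }
    return $best;
}

print longest_between_stops("MKT*AAAA-CC*GG"), "\n";   # prints AAAA
```

To use it on a whole file, you could call the helper on $seq inside the while loop of Long1.pl in place of the split, shift and pop lines, and skip any record for which it returns undef.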