How to Extract Longest Sequence Region Between Stop Codons in Translated DNA Sequences


So you have a file with many FASTA sequences, you have translated those nucleotide sequences, and now you want to extract, from each one, the longest region between two stop codons. I have already shared a couple of tools that translate many DNA sequences in one go, so translating several DNA sequences is easy.
See the earlier post: How to translate multiple DNA FASTA sequences.

Input

>Seq1
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV
>seq2
ASKAENM-SRSHFEKLTF-VSVSKFNRMYLRQ-LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-FCIKS-KCVVSKSFREIDVLSFCFQIQTDVSSPIIVRS-NFFQHYEALFTYFDPASKAENM-SRSHFEKLEFLSTL-SPFYIF-FCIKS-KYVVSKSFRN-RFKFLFPNSNGCIFANNCQKLEFLSTL-SPFYIF-LCIKS-KYVVSKSFREIDV

Script 1 : Long1.pl

#!/usr/bin/perl

use strict;
use warnings;

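$/ = "\n>";                                   # read one FASTA record at a time
while (<>) {
    s/>//g;                                   # strip the '>' left over from the record separator
    my ($id, @seq) = split (/\n/, $_);        # first line is the ID, the rest is sequence
    my $seq = join "", @seq;                  # rejoin wrapped sequence lines
    my @orfs = split (/\-/, $seq, -1);        # cut at stop codons; -1 keeps trailing empty pieces
    shift @orfs; pop @orfs;                   # drop both ends: they are not flanked by two stops
    my $sel = @orfs ? shift @orfs : "";       # empty if there is no region between two stops
    foreach my $next (@orfs) {
        $sel = $next if ((length $sel) < (length $next));   # keep the longest piece seen so far
    }
    print ">$id\n$sel\n";
}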

Usage

perl Long1.pl input.txt > result.txt

Result

>Seq1
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
>seq2
LSAVRISFNIMKPFLYILILHQKLKICSLEVISRNRRFKFLFPNSNGCIFANNCQKLEFLSTL
If the stop codon is written as '*' instead of '-', replace the '-' in the split pattern on line 11 of the script (/\-/) with '*', so that it reads /\*/, and this Perl script works just the same.
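If you would rather not edit the script each time, the same idea can be wrapped in a small subroutine that takes the stop symbol as an argument. This is only a minimal sketch: the name longest_between_stops and its second parameter are my own assumptions and are not part of Long1.pl. It uses \Q...\E so that a symbol like '*' is matched literally, and the -1 limit on split so that a region ending right at the final stop codon is still counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Long1.pl): return the longest region
# lying between two stop symbols. $stop defaults to '-'.
sub longest_between_stops {
    my ($seq, $stop) = @_;
    $stop = '-' unless defined $stop;

    # \Q...\E escapes regex metacharacters such as '*';
    # the -1 limit keeps trailing empty fields.
    my @parts = split /\Q$stop\E/, $seq, -1;

    # Fewer than three pieces means fewer than two stop symbols.
    return '' if @parts < 3;

    # The first and last pieces are not flanked by two stops.
    shift @parts;
    pop @parts;

    my $best = '';
    for my $p (@parts) {
        $best = $p if length($p) > length($best);   # first longest wins on ties
    }
    return $best;
}

# Toy sequences, made up for illustration only.
print longest_between_stops('MK-AAAA-CC-DDDDDD-EE'), "\n";   # prints DDDDDD
print longest_between_stops('MK*AAAA*CC*EE', '*'), "\n";     # prints AAAA
```

When two regions have the same length, the first one is kept, which matches what the script above does.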
