How to Extract Multiple Sequences from a Multi-FASTA File with Perl - II
Previously I shared a Perl script to extract multiple sequences from a multi-FASTA file. If you have accession numbers stored in one file
and sequences in another, you can fetch the sequences with the help of that script. Here the situation is different: we have FASTA sequences in one file (sequence.txt)
and accession numbers/IDs in another file (id.txt), but the IDs are grouped into different rows. We want to extract the FASTA sequences for the IDs in each row
and store each group in its own file (out_1, out_2, out_3): the IDs in row 1 go to out_1, those in row 2 to out_2, and so on.
SCRIPT 1: extract-seq.pl
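#!/usr/bin/perl
use strict;
use warnings;

my ( %list, %FHs, $id );

# First file (IDs): every ID in row N is mapped to the output file out_N.
while (<>) {
    $list{$_} = "out_$." for split;
    last if eof;
}

# Second file (sequences): read one FASTA record at a time, split on '>'.
local $/ = '>';
while (<>) {
    chomp;    # strip the trailing '>'
    # The header line must match an ID exactly.
    if ( ($id) = /(.+)/ and exists $list{$id} ) {
        open $FHs{ $list{$id} }, '>', $list{$id} or die $!
            unless defined $FHs{ $list{$id} };
        print { $FHs{ $list{$id} } } ">$_";
    }
}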
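The second loop of the script depends on setting the input record separator ($/) to '>', so that each read returns one whole FASTA record. The sketch below isolates that step to show what a record looks like after chomp and how the header is pulled out. The sub name read_fasta_records and the two-record sample are made up for illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split FASTA text into records using '>' as the input record separator.
# Returns a list of [ $header, $record ] pairs; $record lacks the leading '>'.
sub read_fasta_records {
    my ($fh) = @_;
    local $/ = '>';                    # each read ends at the next '>'
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;                    # removes the trailing '>' (the current $/)
        next unless $rec =~ /(.+)/;    # first non-empty line is the header;
                                       # the empty chunk before the first '>' is skipped
        push @records, [ $1, $rec ];
    }
    return @records;
}

# Hypothetical two-record input held in memory instead of sequence.txt.
my $fasta = ">Seq1\nACGT\nTTGA\n>Seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $pair ( read_fasta_records($fh) ) {
    my ( $header, $record ) = @$pair;
    print "header=$header length=", length($record), "\n";
}
close $fh;
```

Because the whole header line is used as the lookup key, a header such as ">Seq1 some description" would not match the ID "Seq1"; the IDs in the ID file need to equal the header text exactly.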
Usage
perl extract-seq.pl id.txt sequence.txt
Input
Sequences
>Seq1
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGC
CAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAAC
ACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCC
AGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGC
ATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTG
AAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCA
AGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCT
TCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGG
GGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq2
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq3
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq4
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq5
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATAT
>Seq6
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
IDs
Seq1 Seq2 Seq3
Seq4 Seq5
Seq6
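The row-to-file rule can be tried out separately from the extraction step. The following is a minimal sketch, assuming an ID file with three rows (Seq1 to Seq3, then Seq4 and Seq5, then Seq6); the sub name build_id_map and the in-memory sample are illustrative assumptions. It uses an explicit row counter where the script uses the $. variable, which gives the same numbering.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map every ID to an output file name based on the row it appears in:
# row 1 -> out_1, row 2 -> out_2, and so on.
sub build_id_map {
    my ($fh) = @_;
    my ( %file_for, $row );
    while ( my $line = <$fh> ) {
        $row++;    # count every row, including blank ones
        $file_for{$_} = "out_$row" for split ' ', $line;
    }
    return %file_for;
}

# Hypothetical ID file contents (three rows), held in memory for the demo.
my $ids = "Seq1 Seq2 Seq3\nSeq4 Seq5\nSeq6\n";
open my $fh, '<', \$ids or die "Cannot open in-memory file: $!";
my %file_for = build_id_map($fh);
close $fh;

print "$_ -> $file_for{$_}\n" for sort keys %file_for;
```

With the rows assumed above, Seq1 to Seq3 map to out_1, Seq4 and Seq5 to out_2, and Seq6 to out_3. If the same ID appears in more than one row, the last row wins, because each ID is a single hash key.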
Results
out_1
>Seq1
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGC
CAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAAC
ACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCC
AGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGC
ATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTG
AAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCA
AGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCT
TCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGG
GGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq2
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq3
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
out_2
>Seq4
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
>Seq5
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATAT
out_3
>Seq6
TGCTCCCGGCCGCACTGGCGGCCGCGGGAATTCGATTCGACAAGGCGTTGGTGCTGCCCACAAAGGCCAGTTCGATATCGCGCTCGCGGGTTTGCAACTGCAACAGGCTCTGGTCATAGCCTTTGGGCACGAACACCGCATCAAAGCCTTCTTCGCGCAGGCGTTCGCTGACCATGAAGCCCGAACAGATCACCCGCGCCCAGGGCAGCTTGCGATAGTGCGCGCTGAACTTGCCGGTGTACTTGCAGGGGATGTAGTTCTGGTAGGCATCGTGTTCAAGGATGACCAGATTGGGAATCGTGCGGATGAACCCGACCTGACGGACTTCCTGCTTGAAGCGCAGAAAAAACACGATCCGGTCATAACGCTCGACATCCACTTCACGGCGGAAATAGCCACGCAAGGTTGCGCTGCTCATCGGAGCTCCAGCCCAACCGCACCTCGCACTCACAATACGCGGCGATGCCCTTCATAAAGACGGTCGAGAATGGCCCGCTGCTCTTTCTTGGACCCCAGAAGTANAAACCTTTTCATGGGGTNTTCCCTTGCCAGTTACCTGCGCCCCTGCCTGAAATCACGATATT
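To confirm which records ended up in an output file, it can help to list only the header lines. This is a small sketch under the assumption that the file's contents have already been read into a string; the sub name fasta_headers and the sample text are illustrative, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the header lines (without the leading '>') found in FASTA text.
sub fasta_headers {
    my ($text) = @_;
    return $text =~ /^>(.*)$/mg;    # list context: one capture per header line
}

# Hypothetical stand-in for the contents of one output file.
my $out = ">Seq4\nACGT\n>Seq5\nTTGA\n";
print join( ', ', fasta_headers($out) ), "\n";
```

Comparing this list with the IDs in the matching row of the ID file shows quickly whether any record is missing.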