How to rename FASTA headers according to a matching name list

FaBox has several utilities for manipulating FASTA sequences. I wanted to replace the FASTA headers with new headers or descriptions that are stored in a separate file. FaBox can do this, but it struggles when the number of sequences is huge. This Perl script renames the FASTA sequences using the names stored in another file.

Header

Each line holds the current header and the new FASTA header, separated by a TAB.
M54089d protein1
M54089c protein2
M54089b protein3
M54089a protein4

Sequence 

Each sequence should be on a single line.
If your file wraps sequences over several lines, convert the multi-line FASTA file into a single-line FASTA file first.
>M54089d
MEQCRQGSRQNGSVTSGKGLALRAGHGGPSPEPVGCRWTARAAPAARAGRRVPAGGRTGNGSFGGLPRASHSQLRTGTDKGNPTV
>M54089c
MINFDHLFACLHGHYGEVENKLKCILHYFGRICSSMPLGYVSFERKVLSLECTPSCIPYPKEKAWSQSNISLCPIEITISGLIEDQSREAIEVDFANMYLGGGALVRGCVQQEEIRFMINPELIAGMLFLPCMADNEAVEIVGTERFSSYTGRLTKHFVASWINSSVISINSFSKMMASWDFNMIKMLKTPVEGPLLIFCRLVILQLHLKKLRKHRKTS
>M54089b
MIGRADIEGSKSNVAMNAWLHKPVIPVVTFLTPLASNSEGLKIVRPRFHGSYSYWKSESNELLPSVPHEISVRVELILGHLRYLLTDVPPQPNSPPDNVFRRIGLQASLGSKKRGSAPLPLHGISKITLEVVVFHFRLSAPTYTTPLKSFTKSD
>M54089a
MNGLTRFHCPCLLSSETTAKGTGLAESAGKEDPVELDSSRLCEMT
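If your sequences are wrapped over several lines, something like the following may be enough to bring them into the single-line layout shown above. This is only a sketch: the function name linearize_fasta is hypothetical, and it assumes plain-text FASTA with Unix line endings.

```shell
# Hypothetical helper: join wrapped sequence lines so that every record is
# one header line followed by exactly one sequence line.
linearize_fasta() {
  awk '
    /^>/ {                        # header line starts a new record
      if (seq != "") print seq    # flush the sequence collected so far
      print                       # print the header unchanged
      seq = ""
      next
    }
    { seq = seq $0 }              # append one wrapped sequence line
    END { if (seq != "") print seq }   # flush the last record
  ' "$1"
}
```

Usage (file names are placeholders): linearize_fasta wrapped.fasta > sequence.fasta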

Script 

This Perl script asks for the header list and the FASTA sequence file (in the formats shown above) and saves the FASTA file with the new headers as result.fasta.
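As a rough sketch of the behaviour described above (this is not the Perl script itself), the same renaming can be written as a small shell function around awk. The function name rename_fasta is hypothetical. It assumes the list is TAB-separated, so a new header may contain spaces, and it matches on the first word of each FASTA header. It also assumes the header list is not empty.

```shell
# Hypothetical helper: rename_fasta LIST FASTA [OUT]
# LIST holds "old_id<TAB>new_header" per line; OUT defaults to result.fasta.
rename_fasta() {
  awk -F '\t' '
    FNR == NR { new[$1] = $2; next }   # first file: load the old -> new map
    /^>/ {
      id = substr($0, 2)               # header text without the leading ">"
      sub(/[ \t].*/, "", id)           # keep only the first word as the ID
      if (id in new) { print ">" new[id]; next }
    }
    { print }                          # sequence lines and unmatched headers
  ' "$1" "$2" > "${3:-result.fasta}"
}
```

Usage: rename_fasta header_list.txt sequence.fasta
Headers that have no entry in the list are written unchanged.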
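If you are working on a Unix-based system, this AWK one-liner is very useful. It prints the result to standard output and puts the new name in front of the old ID, separated by "|" (for example >protein1|M54089d), rather than dropping the old ID.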
awk 'FNR==NR{  a[">"$1]=$2;next}$1 in a{  sub(/>/,">"a[$1]"|",$1)}1' header_list.txt sequence.fasta
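The one-liner leaves any header without a match in the list unchanged and gives no warning. With a huge file it can help to check for such headers before renaming. A minimal sketch, where the function name missing_ids is hypothetical:

```shell
# Hypothetical helper: missing_ids LIST FASTA
# Prints every FASTA ID (first word of the header) that is not in LIST.
missing_ids() {
  awk '
    FNR == NR { seen[$1]; next }     # first file: remember the known IDs
    /^>/ {
      id = substr($1, 2)             # first word of the header, minus ">"
      if (!(id in seen)) print id    # report IDs that will not be renamed
    }
  ' "$1" "$2"
}
```

Usage: missing_ids header_list.txt sequence.fasta
No output means every header has a new name in the list.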
