KEGG Sequence Downloader: retrieve gene sequences in FASTA format from the KEGG database

I wanted to download the gene sequences of tobacco. NCBI also returns isoforms and other unwanted entries, so I chose to get them from KEGG instead. KEGGREST is a wonderful R package for retrieving data from KEGG, but it limits how many entries can be retrieved per request. The following bash script can download thousands of sequences in a single run. It is a crude solution and there is probably a more efficient way to do it, but it worked for me. The script works in three steps:
  • Split the IDs into chunks of a given size
  • Download the FASTA sequences for each chunk as an HTML file
  • Clean the HTML files and save the result
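
The three steps can be sketched roughly as follows. This is a minimal illustration, not the downloadable script itself. The DBGET URL pattern, the function names (split_ids, clean_html, main), the default chunk size of 10, the output file name sequences.fasta and the one-second pause are all assumptions made for the example, and the HTML cleaning assumes each tag sits on a single line.

```bash
#!/usr/bin/env bash
# Sketch of the split / download / clean workflow (not the original script).

# ASSUMPTION: DBGET address that returns nucleotide FASTA wrapped in HTML.
KEGG_URL="https://www.genome.jp/dbget-bin/www_bget?-f+-n+n+"

# Step 1: read IDs (one per line) from stdin and print them as
# '+'-joined chunks holding at most $1 IDs each. Blank lines are skipped.
split_ids() {
    awk -v n="$1" '
        NF  { buf = (buf == "" ? $1 : buf "+" $1)
              if (++c == n) { print buf; buf = ""; c = 0 } }
        END { if (buf != "") print buf }'
}

# Step 3: strip HTML tags and comments from stdin, decode a few entities,
# and keep only FASTA headers plus the sequence lines that follow them.
clean_html() {
    sed -e 's/<[^>]*>//g' -e 's/&gt;/>/g' -e 's/&lt;/</g' -e 's/&amp;/\&/g' |
    awk '/^>/                      { in_rec = 1; print; next }
         in_rec && /^[A-Za-z*-]+$/ { print; next }
                                   { in_rec = 0 }'
}

# Step 2 and driver: fetch each chunk as HTML and append the cleaned FASTA.
main() {
    local query_file=$1 chunk_size=${2:-10} out=${3:-sequences.fasta}
    : > "$out"                                  # start with an empty output file
    split_ids "$chunk_size" < "$query_file" |
    while read -r chunk; do
        curl -fsS "${KEGG_URL}${chunk}" | clean_html >> "$out"
        sleep 1                                 # hypothetical pause between requests
    done
}

# Run only when arguments are given, so the functions can also be sourced.
if [ "$#" -ge 1 ]; then main "$@"; fi
```

Saved under a hypothetical name such as sketch.sh, it would be run as bash sketch.sh query_file number_of_sequence, where query_file lists one KEGG gene ID per line. Keeping the cleaning step as a separate filter makes it easy to check against a saved HTML page before downloading anything.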

Usage

bash KEGG_sequence_downloader.sh query_file number_of_sequence

Script


Script name Download
KEGG_sequence_downloader.sh
