How Do I Install and Use BUSCO
What is BUSCO
BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It is used to assess the completeness of genome assemblies and annotations.
Why BUSCO
I tried several times to use CEGMA (Core Eukaryotic Genes Mapping Approach) on my Ubuntu 14.0 machine but failed. Then I found that the developers of CEGMA have stopped supporting it and suggest using BUSCO instead.
Requirements
- Python 3
Run this command in your terminal:
sudo apt-get install python3
- NCBI BLAST+
sudo apt-get install ncbi-blast+
- HMMER (HMMER 3.1b2)
sudo apt-get install hmmer
- Augustus 3.0.x (genome only)
- EMBOSS tools 6.x.x (transcriptome only)
sudo apt-get install emboss
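Before running BUSCO, it can help to confirm that the tools from the list above are actually on your PATH. The following is a minimal sketch, not part of BUSCO itself. The executable names (tblastn, makeblastdb, hmmsearch, augustus, transeq) are assumptions about what the packages above install, so adjust them if your system differs.

```python
import shutil

# Executable names are assumptions based on the packages listed above;
# adjust them if your distribution installs the tools under other names.
REQUIRED_TOOLS = {
    "all modes": ["tblastn", "makeblastdb", "hmmsearch"],
    "genome": ["augustus"],
    "trans": ["transeq"],
}


def find_missing_tools(tools_by_mode=REQUIRED_TOOLS):
    """Return {mode: [missing executables]} for tools not found on PATH."""
    missing = {}
    for mode, tools in tools_by_mode.items():
        absent = [tool for tool in tools if shutil.which(tool) is None]
        if absent:
            missing[mode] = absent
    return missing


if __name__ == "__main__":
    missing = find_missing_tools()
    if not missing:
        print("All expected executables were found on PATH.")
    for mode, tools in missing.items():
        print("Missing for {}: {}".format(mode, ", ".join(tools)))
```

If the script reports a missing executable, install the matching package with apt-get as shown in the list above.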
Installation
- Download the latest BUSCO script from HERE (Software & User Guide section) and unzip it. This creates a directory 'busco', which should contain the following files: BUSCO_userguide.pdf, LICENSE, release_notes, BUSCO_v1.1.py, README.txt, sample_data
- Download the lineage-specific BUSCO dataset from HERE (Dataset section) and extract it into the same directory as the script. I downloaded the eukaryote-specific dataset, which is named eukaryota
Uses
- Genome assembly assessment
python BUSCO_v1.1b.py -o NAME -in ASSEMBLY -l LINEAGE -m genome
- Gene set assessment:
python BUSCO_v1.1b.py -o NAME -in GENE_SET -l LINEAGE -m OGS
- Transcriptome assessment:
python BUSCO_v1.1b.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans
NAME - name to use for the run and all temporary files
ASSEMBLY/GENE_SET/TRANSCRIPTOME - input file in FASTA format
LINEAGE - path to the lineage to be used (-l eukaryota for example)
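The three invocations above differ only in the input file and the -m value, so they can be assembled by one small helper. This is a minimal sketch under stated assumptions: build_busco_command is a hypothetical helper name, the default interpreter python3 reflects the setup used in this post, and the optional -f flag is included only because it appears in the example run in this post.

```python
# Modes taken from the usage lines above.
VALID_MODES = ("genome", "OGS", "trans")


def build_busco_command(script, name, infile, lineage, mode,
                        force=False, python="python3"):
    """Build the argument list for one BUSCO run (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(
            "mode must be one of {}, got {!r}".format(VALID_MODES, mode))
    command = [python, script, "-o", name, "-in", infile,
               "-l", lineage, "-m", mode]
    if force:
        # -f as used in the example run in this post
        command.append("-f")
    return command


if __name__ == "__main__":
    cmd = build_busco_command("BUSCO_v1.1.py", "test", "input",
                              "eukaryota", "trans", force=True)
    # Print the command; pass the list to subprocess.call(cmd) to run it.
    print(" ".join(cmd))
```

Building the command as a list avoids shell quoting problems when file names contain spaces.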
How to run BUSCO
To test BUSCO, I downloaded the core eukaryotic gene list from CEGMA and chose 9 Arabidopsis genes:
At1g73030 At3g60360 At5g11900 At3g56490 At3g25980 At5g23900 At1g06790 At5g49510 At5g10780
I ran the BUSCO Python script like this:
python3 BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
which produced the following in my terminal:
*** Running tBlastN ***
Building a new DB, current time: 08/13/2015 09:09:35
New DB name: test
New DB title: input
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9 sequences in 0.000671148 seconds.
*** Getting coordinates for candidate transcripts! ***
*** Extracting candidate transcripts! ***
Translating candidate transcripts !
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
*** Running HMMER to confirm transcript orthology ***
Total complete BUSCOs found in assembly (<2 sigma) : 3 (... duplicated)
Total BUSCOs partially recovered (>2 sigma) : 0
Total groups searched: 429
Total BUSCOs not found: 426
Total running time: 5.930385589599609 seconds
Since I didn't use the plant-specific library, that may be the reason BUSCO identified only 3 genes in default mode.
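The summary lines at the end of the output can be turned into a quick recovery figure. The sketch below is an illustration, not part of BUSCO: summarize is a hypothetical helper, and it assumes the "Total groups searched" and "Total BUSCOs not found" lines look exactly like the ones in the run above. Here "found" means every group that is not reported as not found.

```python
import re

# Patterns match the summary lines printed in the run above.
SUMMARY_PATTERNS = {
    "searched": re.compile(r"Total groups searched:\s*(\d+)"),
    "not_found": re.compile(r"Total BUSCOs not found:\s*(\d+)"),
}


def summarize(output_text):
    """Extract searched/not-found counts and derive found and percent found."""
    counts = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = pattern.search(output_text)
        if match is None:
            raise ValueError("could not find the '{}' line".format(key))
        counts[key] = int(match.group(1))
    counts["found"] = counts["searched"] - counts["not_found"]
    if counts["searched"]:
        counts["percent_found"] = 100.0 * counts["found"] / counts["searched"]
    else:
        counts["percent_found"] = 0.0
    return counts


if __name__ == "__main__":
    sample = "Total groups searched: 429\nTotal BUSCOs not found: 426\n"
    result = summarize(sample)
    print("{found} of {searched} groups found ({percent_found:.2f}%)".format(**result))
```

For the test run in this post, this gives 3 found out of 429 groups searched.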
Problems
I noticed a few problems during installation and use of BUSCO:
- As you may have noticed, I ran the BUSCO script with python3, not python. If you run the BUSCO script like this
python BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
you may get an error like this
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 32, in <module>
    import queue
ImportError: No module named queue
because the python command is using the older Python version (Python 2, where this module is named Queue).
- After a successful run, the BUSCO script produces a directory run_test (the name depends on your '-o' value) and a file 'temp'. If you run the BUSCO script again without deleting the old file and directory, it gives the following error
python3 BUSCO_v1.1.py -o test -in input1 -l eukaryota -m trans -f
*** Running HMMER to confirm transcript orthology ***
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 598, in <module>
    name=i[:-7];group=transdic[name]
KeyError: 'AT1G06790.1'
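Both problems can be guarded against before BUSCO is launched. The following is a minimal sketch, assuming the leftovers are exactly the run_NAME directory and the 'temp' file described above and that they sit in the directory where BUSCO was run. require_python3 and clean_previous_run are hypothetical helper names, not BUSCO functions.

```python
import os
import shutil
import sys


def require_python3():
    """Exit with a hint if this interpreter is Python 2 (no 'queue' module)."""
    if sys.version_info[0] < 3:
        sys.exit("This interpreter is Python 2; run the script with python3.")


def clean_previous_run(name, workdir="."):
    """Remove run_<name> and 'temp' left by an earlier run.

    Returns the list of paths that were actually removed.
    """
    removed = []
    run_dir = os.path.join(workdir, "run_" + name)
    temp_file = os.path.join(workdir, "temp")
    if os.path.isdir(run_dir):
        shutil.rmtree(run_dir)
        removed.append(run_dir)
    if os.path.isfile(temp_file):
        os.remove(temp_file)
        removed.append(temp_file)
    return removed


if __name__ == "__main__":
    require_python3()
    # "test" matches the -o value used in the runs above.
    for path in clean_previous_run("test"):
        print("Removed " + path)
```

Note that this deletes the earlier results, so copy anything you want to keep out of run_NAME first.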