How Do I Install and Use BUSCO~ Bioinformatics Made Simple.com

What is BUSCO

BUSCO stands for Benchmarking Universal Single-Copy Orthologs. It can be used to assess the completeness of genome assemblies, gene sets (annotations) and transcriptomes.

Why BUSCO

I tried several times to use CEGMA (Core Eukaryotic Genes Mapping Approach) on my Ubuntu 14 machine, but it failed every time. Then I found that the CEGMA developers no longer support it and recommend using BUSCO instead.

Requirements

Python 3

Run this command in your terminal:

sudo apt-get install python3

NCBI BLAST+

Run this command in your terminal:

sudo apt-get install ncbi-blast+

HMMER 3.1b2

Run this command in your terminal:

sudo apt-get install hmmer

Augustus 3.0.x (genome only)

Download it from HERE and follow its installation instructions.

EMBOSS tools 6.x.x (transcriptome only)

Run this command in your terminal:

sudo apt-get install emboss
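Before installing BUSCO itself, it can help to confirm that these tools are actually on your PATH. The sketch below assumes the executable names tblastn and makeblastdb (BLAST+), hmmsearch (HMMER), augustus (Augustus) and transeq (EMBOSS). These names are my assumptions, not a list taken from the BUSCO documentation, so check the user guide for what your version actually calls.

```python
import shutil

# Assumed executable names for each BUSCO dependency. This mapping is my
# own assumption; check the BUSCO user guide for your version.
REQUIRED = {
    "common": ["tblastn", "makeblastdb", "hmmsearch"],  # BLAST+ and HMMER
    "genome": ["augustus"],                             # genome mode only
    "trans": ["transeq"],                               # EMBOSS, transcriptome mode only
}


def missing_tools(mode):
    """Return the assumed tools needed for `mode` that are not on PATH."""
    tools = REQUIRED["common"] + REQUIRED.get(mode, [])
    return [tool for tool in tools if shutil.which(tool) is None]


# Report what is missing for each -m mode used in this post.
for mode in ("genome", "OGS", "trans"):
    missing = missing_tools(mode)
    print(mode, "->", "all found" if not missing else "missing: " + ", ".join(missing))
```

Anything reported as missing either is not installed or is installed somewhere outside your PATH.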

Installation

Download the latest BUSCO script from HERE (Software & User Guide section) and unzip it. This creates a directory named 'busco', which should contain the following files: BUSCO_userguide.pdf, LICENSE, release_notes, BUSCO_v1.1.py, README.txt and sample_data.

Download a lineage-specific BUSCO dataset from HERE (Dataset section) and extract it into the same directory as the script. I downloaded the eukaryote-specific dataset, named eukaryota.
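To double-check the layout, here is a small sketch that lists anything missing from the unzipped directory. The file names come from the list above; the helper names missing_entries and check_install are hypothetical, and the default lineage eukaryota is simply the dataset I used.

```python
import os

# File names from the BUSCO v1.1 download described above; adjust them
# if your release ships different files.
EXPECTED_FILES = [
    "BUSCO_userguide.pdf",
    "LICENSE",
    "release_notes",
    "BUSCO_v1.1.py",
    "README.txt",
    "sample_data",
]


def missing_entries(present, lineage="eukaryota"):
    """Return expected names (plus the lineage dataset) absent from `present`."""
    present = set(present)
    return [name for name in EXPECTED_FILES + [lineage] if name not in present]


def check_install(busco_dir="busco", lineage="eukaryota"):
    """List what is missing from the unzipped BUSCO directory (hypothetical helper)."""
    if not os.path.isdir(busco_dir):
        return [busco_dir]
    return missing_entries(os.listdir(busco_dir), lineage)
```

For example, print(check_install("busco")) prints an empty list when the script, its files and the eukaryota dataset are all in place.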

Usage

Genome assembly assessment:

python3 BUSCO_v1.1.py -o NAME -in ASSEMBLY -l LINEAGE -m genome

Gene set assessment:

python3 BUSCO_v1.1.py -o NAME -in GENE_SET -l LINEAGE -m OGS

Transcriptome assessment:

python3 BUSCO_v1.1.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans

NAME - name to use for the run and all temporary files
ASSEMBLY / GENE_SET / TRANSCRIPTOME - input file in FASTA format
LINEAGE - path to the lineage dataset to use (for example, -l eukaryota)
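If you run BUSCO repeatedly, it may be convenient to build the command from Python instead of retyping it. This is a sketch under my own assumptions: the helper names and the kind labels (genome, gene_set, transcriptome) are hypothetical, and it calls python3 with BUSCO_v1.1.py, the script name in my download. Only the -o, -in, -l and -m options come from the usage lines above.

```python
import subprocess

# Hypothetical labels mapped to the -m values shown in the usage lines above.
MODES = {"genome": "genome", "gene_set": "OGS", "transcriptome": "trans"}


def busco_command(name, fasta, lineage, kind, script="BUSCO_v1.1.py"):
    """Build the BUSCO v1 command line as an argument list."""
    if kind not in MODES:
        raise ValueError("kind must be one of %s" % sorted(MODES))
    return ["python3", script, "-o", name, "-in", fasta,
            "-l", lineage, "-m", MODES[kind]]


def run_busco(*args, **kwargs):
    """Run BUSCO; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(busco_command(*args, **kwargs), check=True)
```

For example, run_busco("test", "input", "eukaryota", "transcriptome") runs the same kind of transcriptome assessment as the test below, without the -f flag. Passing the arguments as a list avoids shell quoting problems.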

How to run BUSCO

To test BUSCO, I downloaded the core eukaryotic gene list from CEGMA and chose 9 Arabidopsis genes, which I saved in a FASTA file named input.
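If you want to build a similar small test set, a sketch like the one below pulls selected records out of a FASTA file by ID (the first word of each header). The function names and the source file name in the usage note are hypothetical; only the ID AT1G06790.1 actually appears in my runs.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def select_records(lines, wanted_ids):
    """Keep records whose first header word is in wanted_ids."""
    return [(h, s) for h, s in read_fasta(lines)
            if (h.split() or [""])[0] in wanted_ids]


def write_fasta(records, path, width=60):
    """Write records to path, wrapping sequences at `width` characters."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(">" + header + "\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

For example, opening a hypothetical cegma_genes.fa with a with-block and passing the handle and a set of IDs to select_records, then handing the result to write_fasta(records, "input"), produces an input file like the one used here.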

I ran the BUSCO Python script like this:

python3 BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f

which printed the following in my terminal:

*** Running tBlastN ***


Building a new DB, current time: 08/13/2015 09:09:35
New DB name:   test
New DB title:  input
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9 sequences in 0.000671148 seconds.
*** Getting coordinates for candidate transcripts! ***
*** Extracting candidate transcripts! ***
Translating candidate transcripts !
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
Translate nucleic acid sequences
*** Running HMMER to confirm transcript orthology ***
Total complete BUSCOs found in assembly (<2 sigma) :  3
Total BUSCOs partially recovered (>2 sigma) :  0
Total groups searched: 429
Total BUSCOs not found:  426
Total running time:   5.930385589599609 seconds

Since I didn't use the plant-specific dataset, that may be why BUSCO identified only 3 genes in default mode. My input also contained only 9 sequences, so most of the 429 groups could not have been found in any case.
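The summary counts are easy to turn into a percentage. Here is a minimal sketch that parses the two summary lines shown above (Total groups searched and Total BUSCOs not found) and derives how many groups were found. The function name is hypothetical, and it only understands these two lines; it raises KeyError if either is missing.

```python
import re

# Patterns for the two summary lines printed at the end of the run above.
PATTERNS = {
    "searched": re.compile(r"Total groups searched:\s*(\d+)"),
    "not_found": re.compile(r"Total BUSCOs not found:\s*(\d+)"),
}


def summarize(lines):
    """Return searched/not_found/found counts and the percentage found."""
    counts = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[key] = int(match.group(1))
    counts["found"] = counts["searched"] - counts["not_found"]
    counts["percent_found"] = round(100.0 * counts["found"] / counts["searched"], 1)
    return counts
```

With my numbers, 429 - 426 = 3 groups were found, which is about 0.7% of the eukaryota set.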

Problems

I ran into a few problems while installing and using BUSCO.

As you may have noticed, I ran the BUSCO script with python3, not python. If you run it with python instead, like this
```
python BUSCO_v1.1.py -o test -in input -l eukaryota -m trans -f
```
you may get an error like this:
```
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 32, in <module>
    import queue
ImportError: No module named queue
```
because the python command runs the script with Python 2, where this module is named Queue (it was renamed queue in Python 3).
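One way to make this failure easier to understand is a version check near the top of a script, before any Python 3-only imports. This is a hypothetical guard, not something BUSCO v1.1 includes; running the script with python3 is still the actual fix.

```python
import sys

# Hypothetical guard: stop early with a readable message under Python 2,
# instead of failing later with "ImportError: No module named queue".
if sys.version_info[0] < 3:
    sys.exit("Please run this script with python3 (found Python %d.%d)."
             % sys.version_info[:2])

import queue  # called Queue in Python 2, which caused the error above
```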

After a successful run, the BUSCO script creates a directory named run_test (the name depends on your -o value) and a file named temp. If you rerun the script without deleting this old file and directory, you may get an error like the following, here from a second run on a different input file (input1) with the same -o value:

```
python3 BUSCO_v1.1.py -o test -in input1 -l eukaryota -m trans -f
*** Running HMMER to confirm transcript orthology ***
Traceback (most recent call last):
  File "BUSCO_v1.1.py", line 598, in <module>
    name=i[:-7];group=transdic[name]
KeyError: 'AT1G06790.1'
```

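Here, AT1G06790.1 is a FASTA header from my previous run, so the script apparently reads leftover results from the old run; deleting run_test and temp before rerunning avoids the error.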
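A small sketch of that cleanup, assuming the output names described above (run_ followed by your -o value, plus temp in the working directory). The helper name clean_previous_run is hypothetical, and it deletes files, so double-check the directory you point it at.

```python
import os
import shutil


def clean_previous_run(name, workdir="."):
    """Remove run_<name> and temp left in workdir by an earlier BUSCO run.

    Returns the paths that were removed. The output names are assumptions
    based on what BUSCO v1.1 left behind in my runs.
    """
    removed = []
    for entry in ("run_" + name, "temp"):
        path = os.path.join(workdir, entry)
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)
        else:
            continue  # nothing to delete for this entry
        removed.append(path)
    return removed
```

For example, calling clean_previous_run("test") before rerunning with -o test removes run_test and temp from the current directory.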
