How to Analyse Protein Domains in Given Amino Acid Sequences

What is it?
A protein domain is a part of a protein's sequence and structure that can evolve, function, and fold into a stable structure independently of the rest of the protein.

Why is it important?
Protein domains play a pivotal role in the proper functioning of a protein and thus contribute to key processes in cells. For example, protein domains can
·         Bind to other molecules in the cell and help in their function
·         Mediate signals in signal transduction pathways
·         Be shuffled by genetic engineering to create novel proteins

Importance of protein domain prediction
Protein domain prediction can help to

·         Identify the putative function of an amino acid sequence

·         Identify the amino acids in a protein sequence that are putatively involved in its function

·         Support evolutionary studies


How to detect protein domains in a given protein?

There are several servers that identify conserved domains in given protein sequences. Some of my favorites are listed below
  1. Domain Analysis by CDD database
  2. Domain Analysis by Pfam Database
CD-Search of the Conserved Domain Database (CDD) at NCBI uses RPS-BLAST to scan a set of pre-calculated position-specific scoring matrices (PSSMs) with a protein query.
The results of CD-Search are presented in two parts: the first part shows the annotation of protein domains on the user query sequence (figure 1), and the second part contains the domain descriptions and the multiple sequence alignments of the domains with the user query (figure 2). You can also search for related domains in the CDD database (yellow highlighted area).
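To give a feel for what scanning a query against a PSSM means, here is a minimal sketch in Python. It is only a toy: the matrix values, the `default` penalty and the function names are invented for illustration, and the real RPS-BLAST search uses gapped alignments and statistical scoring that this sketch does not attempt to reproduce.

```python
# Toy illustration of scoring a protein query against a position-specific
# scoring matrix (PSSM). All scores below are invented for this example.

def score_window(pssm, window, default=-1):
    """Sum the per-position scores of `window` against `pssm`."""
    return sum(col.get(res, default) for col, res in zip(pssm, window))

def best_match(pssm, sequence, default=-1):
    """Slide the PSSM along `sequence`; return (start, score) of the best ungapped window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    scored = [
        (score_window(pssm, sequence[i:i + width], default), i)
        for i in range(len(sequence) - width + 1)
    ]
    # Highest score wins; on a tie the earliest start position is kept.
    score, start = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return start, score

# Hypothetical 3-column PSSM: one dict of residue scores per motif position.
TOY_PSSM = [
    {"G": 3, "A": 1},
    {"K": 4, "R": 2},
    {"S": 3, "T": 3},
]

start, score = best_match(TOY_PSSM, "MAGKTLLGRS")
print(f"best window starts at index {start} with score {score}")
```

Each column of the matrix scores the residues differently, which is what makes the scoring "position-specific": the same amino acid can be rewarded at one position of the domain and penalised at another.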

Figure 1

If you have multiple protein sequences and want to analyse them for the presence of conserved protein domains, you can go to the batch conserved domain analysis page of CDD and submit the sequences in bulk. On the output page you can change the file format (1), the result information (2) and the level of information (3), and then download the results. Alternatively, you can browse your results online (4).
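Once the batch results are downloaded, a short script can summarise which domains were found in each sequence. The sketch below assumes a tab-delimited table with a header row containing the columns `Query`, `From`, `To`, `E-Value` and `Short name`, and comment lines starting with `#`. These column names and the sample rows are assumptions made for illustration, so check them against your own downloaded file before relying on this.

```python
import csv
import io
from collections import defaultdict

def domains_per_query(table_text, max_evalue=0.01):
    """Group domain hits by query from a tab-delimited hit table.

    Assumed columns (adjust to your file): Query, From, To, E-Value, Short name.
    """
    # Drop blank lines and '#' comment lines before handing the text to csv.
    lines = [ln for ln in table_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    hits = defaultdict(list)
    for row in reader:
        # Keep only hits at or below the chosen E-value threshold.
        if float(row["E-Value"]) <= max_evalue:
            hits[row["Query"]].append(
                (row["Short name"], int(row["From"]), int(row["To"]))
            )
    return dict(hits)

# Invented example rows, not real CD-Search output.
SAMPLE = (
    "# invented example rows\n"
    "Query\tFrom\tTo\tE-Value\tShort name\n"
    "seq1\t10\t120\t1e-30\tdomA\n"
    "seq1\t150\t200\t0.5\tdomB\n"
    "seq2\t5\t90\t2e-12\tdomA\n"
)

for query, domains in domains_per_query(SAMPLE).items():
    print(query, domains)
```

The `max_evalue` threshold of 0.01 is an arbitrary choice for this example; a stricter or looser cut-off may suit your data better.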

The Pfam database is a 'large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)'. On Pfam, you can search for conserved domains in several ways: you can use accession numbers, keywords, amino acid sequences, DNA sequences, etc. as a query.
If you have multiple sequences, you can also use the Pfam Batch Query interface to predict conserved domains in bulk. To do so, upload your FASTA amino acid sequence file (1), set the stringency (2) and finally submit after giving your email ID (3).
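Before uploading a FASTA file for a batch search, it can save time to check it locally for obvious problems. The following is a minimal sketch, not part of any server's tooling: the function names are hypothetical, and restricting sequences to the 20 standard amino acid letters is an assumption (a server may well accept ambiguity codes such as X).

```python
def read_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[header] += line.upper()
    return records

# Assumption: only the 20 standard amino acid letters are treated as valid.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

def invalid_residues(records, allowed=STANDARD):
    """Map each header to the sorted list of characters outside `allowed`."""
    problems = {}
    for header, seq in records.items():
        bad = sorted(set(seq) - allowed)
        if bad:
            problems[header] = bad
    return problems

# Invented example input with one clean and one problematic record.
EXAMPLE = ">p1\nMKT\nLLV\n>p2\nMK1TJ\n"
records = read_fasta(EXAMPLE)
print(invalid_residues(records))
```

Any record reported by `invalid_residues` is worth inspecting by hand before submission, since stray digits or symbols usually point to a copy-and-paste problem.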
Although there are other databases such as SMART, I always prefer to use CDD or Pfam because of their simple user interfaces. I may discuss other databases in the future.

