10.1.7 Primary sequence databases for protein

Protein Databases

Protein databases are the most comprehensive source of information on proteins. It is necessary to distinguish between universal databases, which cover proteins from all species, and specialized data collections, which store information about specific protein families, groups of proteins, or individual organisms.

Types

  • Primary Protein Databases, like UniProt/Swiss-Prot.
  • Secondary Protein Databases, like InterPro.
  • Specialised Protein Databases, like GOA and ENZYME.
  • Structure Databases, like PDB.

UniProt:

  • It stands for Universal Protein Resource.
  • It is a central repository of protein sequence and function data, created by joining the information contained in Swiss-Prot, TrEMBL and PIR-PSD.

PDB:

  • PDB stands for Protein Data Bank.
  • The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.

PDB viewer:

  • It provides an introduction to macromolecular modeling with Deep View, including a review of many basic concepts of protein structure.
  • Structure files are first opened for viewing; the user then carries out exercises in manipulating, analyzing, and comparing protein structures.

SWISS PDB VIEWER:

  • Deep View (formerly called Swiss-Pdb Viewer) is a friendly but powerful molecular graphics program.
  • It is designed for use with the computing tools available from the ExPASy (Expert Protein Analysis System) Molecular Biology Server in Geneva, Switzerland.
  • It allows us to build models from a given amino-acid sequence.
  • It can find hydrogen bonds within proteins and between proteins and ligands.
  • It allows us to view several proteins simultaneously and superimpose them to compare their structures and sequences.
  • It computes electrostatic potentials and molecular surfaces, and carries out energy minimization.
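The notes do not say how Deep View decides that two atoms are hydrogen-bonded, and its actual criteria (which also take geometry into account) are not reproduced here. As a rough illustration only, the Python sketch below flags nitrogen/oxygen pairs lying within an assumed 3.5 Å distance cutoff; the `candidate_hbonds` function and its `(label, element, (x, y, z))` atom tuples are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical atom tuple: (label, element symbol, (x, y, z) in angstroms).
Atom = tuple[str, str, tuple[float, float, float]]

# Assumed distance cutoff; real programs also check donor-H-acceptor angles.
HBOND_CUTOFF = 3.5
POLAR = {"N", "O"}


def candidate_hbonds(atoms: list[Atom], cutoff: float = HBOND_CUTOFF) -> list[tuple[str, str, float]]:
    """Return (label1, label2, distance) for N/O pairs no farther apart than `cutoff`.

    Distance-only heuristic: it does not check angles or hydrogen positions,
    and it does not exclude atoms that are covalently bonded to each other.
    """
    hits = []
    for (lab1, el1, xyz1), (lab2, el2, xyz2) in combinations(atoms, 2):
        # Only nitrogen/oxygen pairs are treated as possible donor/acceptor partners.
        if el1 in POLAR and el2 in POLAR:
            d = math.dist(xyz1, xyz2)
            if d <= cutoff:
                hits.append((lab1, lab2, d))
    return hits
```

Because it compares every pair of atoms, this check scales quadratically and over-reports; it is meant only to show the idea of a distance criterion, not to replace a molecular graphics program.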

Getting started: go to the website www.rcsb.org

Enter the protein name or its PDB ID (four characters) into the search box, then download the PDB file.
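The same download can be scripted. The minimal Python sketch below assumes the RCSB file-download URL pattern `https://files.rcsb.org/download/<ID>.pdb` and the classic ID rule (a digit from 1 to 9 followed by three letters or digits); the helper names `pdb_download_url` and `download_pdb` are hypothetical.

```python
import re
import urllib.request

# Classic 4-character PDB ID: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_download_url(pdb_id: str) -> str:
    """Build the (assumed) RCSB download URL for a PDB-format file."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, dest: str) -> str:
    """Save the PDB file to `dest`; requires network access."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), dest)
    return dest
```

For example, `pdb_download_url("1crn")` gives `https://files.rcsb.org/download/1CRN.pdb`, and `download_pdb("1crn", "1crn.pdb")` fetches that file. Very large structures are distributed only in PDBx/mmCIF format, so a `.pdb` file may not exist for every entry.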

RASMOL

RasMol is a computer program written for molecular graphics visualization intended and used primarily for the depiction and exploration of biological macromolecule structures, such as those found in the Protein Data Bank. It is available for Windows, Macintosh and UNIX platforms.

Cn3D

  • Cn3D is a helper application that allows us to view 3-dimensional structures from NCBI’s Entrez retrieval service.
  • Cn3D runs on Windows, Macintosh, and Unix.
  • Cn3D is a visualization tool for biomolecular structures, sequences, and sequence alignments and has powerful annotation and alignment editing features.
  • Cn3D displays structure-structure alignments along with their structure-based sequence alignments, to emphasize what regions of a group of related proteins are most conserved in structure and sequence.

When Cn3D is running, two windows appear: the main Cn3D structure window, where the protein is displayed, and a sequence window, which shows the chain’s amino acid sequence.

When a single structure is loaded into Cn3D, the sequence viewer shows the sequences of all protein and nucleic acid chains in the structure. The color of each residue is coordinated between the structure and sequence windows: each letter of the sequence represents a residue in the structure and always adopts the color of the backbone’s alpha carbon (or phosphorus, for nucleotides), even if side chains are colored differently from the backbone in the structure window. Cn3D’s sequence window also functions as an alignment viewer when displaying more than one structure, or a structure to which multiple sequences have been aligned.
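Cn3D's rule that each letter in the sequence window corresponds to one residue's alpha carbon can be mimicked by reading a PDB file directly. The minimal Python sketch below assumes the fixed-column layout of PDB `ATOM` records (atom name in columns 13-16, residue name 18-20, chain ID 22, residue number 23-26, insertion code 27) and emits one letter per `CA` atom per chain. The function name is hypothetical, non-standard residue names map to `X`, and `HETATM` records (ligands, ions, some modified residues) are ignored.

```python
from collections.abc import Iterable

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines: Iterable[str]) -> dict[str, str]:
    """Return {chain ID: one-letter sequence}, one letter per alpha-carbon (CA) atom."""
    letters: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_lines:
        # Only ATOM records; HETATM lines are skipped.
        if not line.startswith("ATOM  "):
            continue
        if line[12:16].strip() != "CA":   # atom name, columns 13-16
            continue
        chain = line[21]                  # chain ID, column 22
        residue = (chain, line[22:27])    # residue number + insertion code, columns 23-27
        if residue in seen:               # skip alternate-location copies of the same CA
            continue
        seen.add(residue)
        res_name = line[17:20]            # residue name, columns 18-20
        letters.setdefault(chain, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain: "".join(seq) for chain, seq in letters.items()}
```

A downloaded file can be passed straight in, e.g. `with open("1crn.pdb") as fh: print(chain_sequences(fh))`, which prints one sequence string per chain.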

PIR

The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies.

PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information.

MIPS

The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max Planck Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes, and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database.

SWISS PROT

Swiss-Prot is a protein sequence database created in 1986 by Amos Bairoch and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute.

It is a manually curated biological database of protein sequences. Swiss-Prot strives to provide reliable protein sequences associated with a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. In 1996, a computer-annotated supplement to SWISS-PROT was created, termed TrEMBL.

TrEMBL

It was created in 1996 as a computer-annotated supplement to SWISS-PROT. The database follows the SWISS-PROT format and contains translations of all coding sequences (CDS) in the EMBL Nucleotide Sequence Database. It has two main sections:

i) SP-TrEMBL (SWISS-PROT TrEMBL): it contains entries that will eventually be incorporated into SWISS-PROT but have not yet been manually annotated.

ii) REM-TrEMBL: it contains sequences that are not destined to be included in SWISS-PROT. These include immunoglobulins, T-cell receptors, fragments of fewer than eight amino acids, synthetic sequences, patented sequences and codon translations.
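UniProt still marks the curated/automatic split in its FASTA headers: entries from the manually reviewed Swiss-Prot section begin with `sp|`, and computer-annotated TrEMBL entries begin with `tr|`. The minimal Python sketch below assumes the header layout `>db|accession|entry_name description`; the function and field names are hypothetical.

```python
from typing import NamedTuple

# Database codes used at the start of UniProtKB FASTA headers.
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


class UniProtHeader(NamedTuple):
    section: str      # "Swiss-Prot" (reviewed) or "TrEMBL" (unreviewed)
    accession: str
    entry_name: str
    description: str  # remaining free text (protein name and any further fields)


def parse_uniprot_header(header: str) -> UniProtHeader:
    """Parse an assumed '>db|accession|entry_name description' header line."""
    # Raises ValueError if the header has fewer than three '|'-separated parts.
    db, accession, rest = header.strip().lstrip(">").split("|", 2)
    if db not in SECTIONS:
        raise ValueError(f"unexpected database code: {db!r}")
    entry_name, _, description = rest.partition(" ")
    return UniProtHeader(SECTIONS[db], accession, entry_name, description)
```

For example, `parse_uniprot_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha")` reports section `Swiss-Prot`, accession `P69905` and entry name `HBA_HUMAN`.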

Last modified: Tuesday, 8 November 2011, 5:29 AM