10.1.6 Tools for Sequence Analysis

1. BLAST (Basic Local Alignment Search Tool)

BLAST performs local alignment: it finds regions of local similarity between sequences rather than aligning them end to end. The program compares nucleotide or protein query sequences against sequence databases and calculates the statistical significance of the matches.
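To make the idea of local alignment concrete, the following is a minimal sketch of the classic Smith-Waterman scoring scheme, which finds the best-scoring pair of matching regions inside two sequences. It is an illustration only: BLAST itself uses a faster heuristic search rather than this exhaustive calculation, and the function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment
            # start anywhere: this is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Only the shared "ACG" region contributes to the score.
    print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

Because the score is reset to zero whenever it would become negative, dissimilar flanking regions do not penalize a well-matching region in the middle.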

Types of BLAST

I. Blastn: searches a nucleotide database using a nucleotide query.

II. Blastp: searches a protein database using a protein query.

III. Blastx: searches a protein database using a nucleotide query that is translated in all six reading frames.

IV. Tblastx: searches a nucleotide database translated in all six reading frames using a nucleotide query that is also translated in all six reading frames.
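The "translated" searches work on protein translations of a nucleotide sequence. The sketch below shows what a six-frame translation looks like, assuming the standard genetic code; the function names are hypothetical, and this is not BLAST's own code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    frames = {}
    reverse = reverse_complement(dna)
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```

Each of the six protein strings would then be compared against the database, which is why translated searches can find protein-level similarity that is hidden at the nucleotide level.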

2. CLUSTALW

ClustalW is a multiple sequence alignment program for proteins or nucleotides, available at <ebi.ac.uk/clustalw>.

Multiple sequence alignment extends pairwise alignment to incorporate more than two sequences at a time.

It is often used to identify conserved sequence regions across a group of sequences hypothesized to be evolutionarily related.

ClustalW calculates the best match for the selected sequences and lines them up so that identities, similarities and differences can be seen.
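Once sequences have been aligned, conserved positions can be read straight off the columns of the alignment. The following sketch assumes the sequences are already aligned to equal length, with "-" marking gaps; the function name and the example sequences are made up for illustration and are not ClustalW output.

```python
def conserved_columns(aligned):
    """Return the 0-based column indexes where every sequence
    carries the same residue and no sequence has a gap."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(length):
        residues = {seq[i] for seq in aligned}
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


if __name__ == "__main__":
    alignment = [
        "ACGT-A",
        "ACGTTA",
        "ACCTTA",
    ]
    print(conserved_columns(alignment))
```

Runs of adjacent conserved columns correspond to the conserved sequence regions mentioned above.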

Evolutionary relationships can be viewed as cladograms and phylograms.

A phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny; its branch lengths are proportional to the amount of inferred evolutionary change.

A cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny in which the branches are of equal length. Cladograms therefore show common ancestry, but do not indicate the amount of evolutionary “time” separating taxa.

ClustalW can align either nucleotide or protein sequences. Nucleotide sequences are aligned exactly as they are input; the program does not provide the option of specifying DNA strands.

The program accepts sequences in formats such as NBRF/PIR, EMBL/UniProt, Pearson (FASTA), GDE and ALN/ClustalW. The sequences can either be pasted into the web form or uploaded to it as a file.

3. FASTA

FASTA is a sequence similarity search tool.

It searches a query sequence against nucleotide and protein databases using the FASTA programs.

It can be used for fast protein comparison or fast nucleotide comparison. FASTA is also the name of a sequence file format, which consists of a one-line header followed by lines of sequence data.

In a FASTA-formatted file, each sequence is preceded by a header line starting with a “>” symbol. The first word on this line is the name of the sequence, and the rest of the line is a description of the sequence. The remaining lines contain the sequence itself. FASTA files containing multiple sequences follow the same pattern, with one sequence listed right after another. This format is accepted by many multiple sequence alignment programs.
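The layout described above can be read with a few lines of code. This is a minimal sketch of a FASTA reader, assuming plain text input; the function name and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into (name, description, sequence) tuples."""
    records = []
    name, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence data may be spread over several lines.
            chunks.append(line)
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1 first example\nACGT\nACGT\n>seq2\nTTGA\n"
    for record in parse_fasta(example):
        print(record)
```

The first word after “>” becomes the name, the rest of the header becomes the description, and all following lines up to the next header are joined into one sequence.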

Last modified: Tuesday, 8 November 2011, 5:28 AM