Bioinformatics FAQ (Frequently Asked Questions) - Glossary of bioinformatics terms
This resource is maintained by and © Damian Counsell, UK Medical Research Council Rosalind Franklin Centre
for Genomic Research (the RFCGR) 1998-2004.
Glossary of bioinformatics
- What is an alignment?
- What is a DNA array?
- What is a homologue?
- What is an ontology?
- What is a scoring matrix?
Here I attempt to define some common terms in bioinformatics. I
have tried to balance clarity, brevity and rigour. Let me know if I
let one of these priorities over-ride the others.
What is an alignment?
When two symbolic representations of DNA or protein sequences are
arranged next to one another so that their most similar elements are
juxtaposed, they are said to be aligned. Many bioinformatics tasks
depend upon successful alignments. Alignments are conventionally
shown as traces.
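The text does not say how the most similar elements are found. One
widely used method for aligning two whole sequences end to end is the
Needleman-Wunsch dynamic-programming algorithm. The sketch below is a
minimal version of it, assuming a toy scoring scheme (+1 for identical
residues, -1 for a mismatch, -1 per gap position) in place of a real
scoring matrix. The function name `global_align` and its parameters
are hypothetical, chosen for illustration.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns both sequences padded with '-' so that they line up column by column.
    """

    def pair(x: str, y: str) -> int:
        # Toy score for placing residue x opposite residue y (assumed scheme).
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # residue opposite residue
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))
```

With these scores, `global_align("ACGT", "AGT")` returns
`("ACGT", "A-GT")`. Where several alignments tie for the best score,
this traceback simply prefers a diagonal step, then a gap in the second
sequence; other tie-breaking rules give different but equally scoring
alignments.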
In a symbolic sequence each base or residue (monomer) is represented
by a letter. The convention is to print the single-letter codes for
the constituent monomers in order in a fixed-width font (from the
N-terminal to the C-terminal end of a protein sequence, or from the
5' to the 3' end of a nucleic acid molecule). This is based on the
assumption that the monomers are evenly spaced along the single
dimension of the molecule's primary structure. From now on I shall
refer to an alignment of two protein sequences.
Every element (column) in a trace is either a pair of aligned residues
or a gap. Where a residue in one of two aligned sequences is identical
to its counterpart in the other, the corresponding amino-acid letter
codes in the two sequences are vertically aligned in the trace: a
match. (Aligned residues that differ represent a substitution.) When a
residue in one sequence seems to have been deleted since the assumed
divergence of the sequence from its counterpart, its "absence" is
labelled by a dash in the derived sequence. When a residue appears to
have been inserted to produce a longer sequence, a dash appears
opposite it in the unaugmented sequence. Since these dashes represent
"gaps" in one or other sequence, the action of inserting such spacers
is known as gapping.
A deletion in one sequence is symmetric with an insertion in the
other: when one sequence is gapped relative to another, a deletion in
sequence a can equally be seen as an insertion in sequence b.
Indeed, the two types of mutation are referred to together as
indels. If we imagine that at some point one of the
sequences was identical to its ancestral homologue, then a trace can
represent the three ways in which divergence could have occurred at
that position.
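A trace can be printed from two gapped sequences of equal length. In
this minimal sketch the hypothetical helper `trace` adds a middle line
using a common display convention that is assumed here rather than
taken from the text: `|` marks a match, `.` marks two different
aligned residues, and a space marks a gap column.

```python
def trace(top: str, bottom: str) -> str:
    """Render two gapped sequences as a three-line trace.

    Middle line (assumed convention): '|' = match, '.' = different residues,
    ' ' = gap column.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            marks.append(" ")   # a gap in either sequence
        elif x == y:
            marks.append("|")   # identical residues: a match
        else:
            marks.append(".")   # different residues in the same column
    return "\n".join([top, "".join(marks), bottom])
```

For example, `trace("ACGT", "A-GT")` returns the three lines `ACGT`,
`| ||` and `A-GT`. Whether the gap column is read as a deletion in the
second sequence or an insertion in the first depends only on which
sequence we assume is closer to the ancestor.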
Biological interpretation of an alignment
A trace can represent:
- a substitution;
- a deletion;
- an insertion.
I do not represent a silent mutation, since it leaves the amino-acid
sequence unchanged and so cannot show up in a trace of two protein
sequences.
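The three cases can be labelled mechanically once we make the
assumption stated above, that one sequence is identical to the common
ancestor. The sketch below is a minimal illustration of that
reasoning, not a standard tool; `classify_columns` and its event
names are hypothetical.

```python
def classify_columns(ancestral: str, derived: str) -> list[str]:
    """Label each column of a pairwise alignment as a divergence event.

    Assumes (hypothetically) that `ancestral` is identical to the common
    ancestor, so every difference is attributed to changes in `derived`.
    """
    if len(ancestral) != len(derived):
        raise ValueError("aligned sequences must have the same length")
    events = []
    for old, new in zip(ancestral, derived):
        if old == "-" and new == "-":
            raise ValueError("a column cannot be gapped in both sequences")
        if new == "-":
            events.append("deletion")       # residue lost from the derived sequence
        elif old == "-":
            events.append("insertion")      # residue gained by the derived sequence
        elif old == new:
            events.append("match")          # no visible change at this position
        else:
            events.append("substitution")   # one residue replaced by another
    return events
```

`classify_columns("ACGT", "A-GT")` gives
`["match", "deletion", "match", "match"]`; swapping the arguments turns
the same column into an `"insertion"`, which is the symmetry between
deletions and insertions described earlier.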
Traces may represent recent genetic changes that obscure older
changes. For simplicity I have represented only point mutations
here; actual mutations often insert or delete several residues at
once.
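In a gapped sequence, a mutation that inserts or deletes several
residues shows up as a run of consecutive dashes rather than as
isolated ones. A minimal sketch, using a hypothetical helper
`indel_runs`, that reports each run's starting column and length.
Treating a run as a single event is also why many alignment programs
charge more for opening a gap than for extending it.

```python
import re


def indel_runs(gapped: str) -> list[tuple[int, int]]:
    """Return (start column, length) for each run of consecutive gaps."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", gapped)]
```

For example, `indel_runs("AC---GT-A")` returns `[(2, 3), (7, 1)]`: one
three-residue indel and one single-residue indel.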
What is a DNA array?
Thanks to Bioinformatics.Org member Ravi Jain for the following
answer, which I present verbatim.
DNA microarrays consist of thousands of immobilized DNA sequences
present on a miniaturized surface the size of a business card or
less. Arrays are used to analyze a sample for the presence of gene
variations or mutations (genotyping), or for patterns of gene
expression, performing the equivalent of ca. 5 000 to 10 000
individual "test tube" experiments in approximately two days of
(Continued on next part...)