Bioinformatics Glossary

Map unit

A measure of genetic distance between two linked genes that corresponds to a recombination frequency of 1%. 
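
As an illustration, the definition reduces to simple arithmetic: the recombination frequency, expressed as a percentage, gives the distance in map units. The sketch below assumes hypothetical offspring counts, and the function name `map_distance` is illustrative rather than taken from any library. The estimate is only reasonable for closely linked genes, where multiple crossovers are rare.

```python
def map_distance(recombinants: int, total: int) -> float:
    """Return the genetic distance in map units.

    One map unit corresponds to 1% recombinant offspring.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of offspring")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    return 100.0 * recombinants / total


# Hypothetical cross: 12 recombinant offspring out of 400 scored.
print(map_distance(12, 400))  # 3.0
```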

Markov chain

Any multivariate probability density whose independence diagram is a chain. The variables are ordered, and each variable "depends" only on its neighbors, in the sense of being conditionally independent of all the other variables given those neighbors. Markov chains are an integral component of hidden Markov models.
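
A minimal sketch of the chain property, assuming a first-order chain over the DNA alphabet: the probability of a sequence factorizes into the probability of the first symbol times one transition probability per adjacent pair. The numbers in `INITIAL` and `TRANSITION` are made up for illustration and are not estimated from any data.

```python
import math

# Hypothetical parameters; each row of TRANSITION sums to 1.
INITIAL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
TRANSITION = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}


def log_probability(seq: str) -> float:
    """Log-probability of seq under the first-order chain above."""
    if not seq:
        raise ValueError("sequence must not be empty")
    logp = math.log(INITIAL[seq[0]])
    # Each symbol depends only on its immediate predecessor.
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(TRANSITION[prev][cur])
    return logp


print(log_probability("AACG"))
```

Working in log space avoids numerical underflow when the sequence is long.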

Meiosis

A process within the cell nucleus that reduces the chromosome number from diploid (two copies of each chromosome) to haploid (a single copy) through two successive divisions in germ cells.

Melting (of DNA)

The denaturation of double-stranded DNA into two single strands by the application of heat. Denaturation breaks the hydrogen bonds holding the two strands together.

Messenger RNA (mRNA)

The complementary RNA copy of DNA, formed from a single-stranded DNA template during transcription. In eukaryotes it is processed and then exported from the nucleus to the cytoplasm, where it carries the information that codes for a polypeptide.
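
The complementarity between template and transcript can be sketched in a few lines. This is a simplified illustration that assumes the template strand is written 3' to 5', so the returned mRNA reads 5' to 3'; the function name `transcribe` is hypothetical, and processing steps such as splicing are ignored.

```python
# Base-pairing rules from a DNA template to RNA (uracil replaces thymine).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Return the mRNA complementary to a DNA template strand.

    Assumes the template is given 3'->5'; the result reads 5'->3'.
    """
    try:
        return "".join(COMPLEMENT[base] for base in template.upper())
    except KeyError as err:
        raise ValueError(f"unexpected base in template: {err}") from None


print(transcribe("TACGGT"))  # AUGCCA
```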

Methylation

The addition of methyl (-CH3) groups to a target site. Such addition typically occurs on the cytosine bases of DNA (see maternal imprinting).

Microarray

A 2D array, typically on glass, a filter, or a silicon wafer, on which genes or gene fragments are deposited or synthesized in a predetermined spatial order, making them available as probes in a high-throughput, parallel manner.


The miniaturization of chemical reactions or pharmacological assays into microscopic tubes or vessels, placed side by side in an array, in order to greatly increase their throughput.

Mimetics

Compounds that mimic the function of other molecules by virtue of a high degree of structural (conformational) similarity, and hence similar physicochemical properties.

Missense mutation

A point mutation in which one codon (triplet of bases) is changed into another designating a different amino acid. 
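
To make the distinction concrete, the sketch below classifies a single-base codon change using the standard genetic code. The function name `classify_point_mutation` and the example codons are illustrative assumptions; codons are written as DNA (with T rather than U).

```python
# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change that differs at exactly one base."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(r != a for r, a in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one base")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop) encoded
    if alt_aa == "*":
        return "nonsense"    # new stop codon
    if ref_aa == "*":
        return "stop-loss"   # stop codon replaced by an amino acid
    return "missense"        # a different amino acid is designated


print(classify_point_mutation("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_point_mutation("GAG", "GAA"))  # synonymous (Glu -> Glu)
print(classify_point_mutation("TAC", "TAA"))  # nonsense (Tyr -> stop)
```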

Mitosis

The nuclear division that results in the replication of the genetic material and its redistribution into each of the daughter cells during cell division. 

Modeling

In bioinformatics, modeling usually refers to molecular modeling, a process whereby the three-dimensional architecture of biological molecules is interpreted (or predicted), visually represented, and manipulated in order to determine their molecular properties. More generally, a model is a series of mathematical equations or procedures that simulates a real-life process, given a set of assumptions, boundary parameters, and initial conditions.

Monomer

A single unit of any biological molecule or macromolecule, such as an amino acid, nucleotide, polypeptide domain, or protein.

Monovalent

Having one binding site; strictly, an atom with only one free electron available for binding in its highest energy shell. 

Motif

A conserved element of a protein sequence alignment that usually correlates with a particular function. Motifs are generated from a local multiple protein sequence alignment corresponding to a region whose function or structure is known. Conservation alone is sufficient to define a motif, which is hence likely to be predictive of any subsequent occurrence of such a structural or functional region in a novel protein sequence.
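
One common way to use a motif predictively is to express it as a pattern and scan new sequences for it. The sketch below assumes the N-glycosylation pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) written as a regular expression; the protein sequence and the function name `find_motif` are hypothetical.

```python
import re

# N-{P}-[ST]-{P} as a regular expression. The lookahead lets
# overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every occurrence."""
    return [(m.start(), m.group(1)) for m in N_GLYCOSYLATION.finditer(sequence)]


# Made-up sequence: "NPSA" is skipped because proline follows the N.
print(find_motif("MKNGTLLNPSAANASG"))  # [(2, 'NGTL'), (12, 'NASG')]
```

A pattern match only suggests a candidate site; it does not by itself establish function.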

Multigene family

A set of genes derived by duplication of an ancestral gene, followed by independent mutational events resulting in a series of independent genes either clustered together on a chromosome or dispersed throughout the genome. 

Multiple (sequence) alignment

A multiple alignment of k sequences s_1, ..., s_k is a rectangular array of characters taken from the alphabet A that satisfies the following conditions: there are exactly k rows; ignoring the gap character "-", row number i is exactly the sequence s_i; and each column contains at least one character different from "-". In practice, multiple sequence alignments include a cost/weight function that defines the penalty for inserting gaps and weights identities and conservative substitutions accordingly. Multiple alignment algorithms attempt to create the optimal alignment, defined as the one with the lowest cost/weight score.
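
The three conditions in the definition can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as a list of equal-length strings; the function name `is_valid_alignment` and the example sequences are illustrative only, and no cost/weight function is evaluated.

```python
GAP = "-"


def is_valid_alignment(rows: list[str], sequences: list[str]) -> bool:
    """Check the defining conditions of a multiple alignment."""
    # Exactly k rows for k sequences.
    if len(rows) != len(sequences):
        return False
    # Rectangular array: all rows have the same length.
    if len({len(row) for row in rows}) > 1:
        return False
    # Ignoring gaps, row i is exactly sequence i.
    if any(row.replace(GAP, "") != seq for row, seq in zip(rows, sequences)):
        return False
    # Every column contains at least one non-gap character.
    return all(any(ch != GAP for ch in column) for column in zip(*rows))


seqs = ["ACGT", "AGT", "ACT"]
print(is_valid_alignment(["ACGT", "A-GT", "AC-T"], seqs))     # True
print(is_valid_alignment(["ACG-T", "A-G-T", "AC--T"], seqs))  # False: all-gap column
```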

Multiplex sequencing

Approach to high-throughput sequencing that uses several pooled DNA samples run through gels simultaneously and then separated and analyzed. 

Mutagen

Any agent that can cause an increase in the rate of mutations in an organism. 

Mutation

A heritable alteration to the genome, ranging from point (single-base) changes to larger-scale alterations such as chromosomal deletions or rearrangements.

