
Bioinformatics Glossary



Data Cleaning

A process in which automated or semi-automated algorithms are applied to experimental data to remove noise, experimental errors and other artifacts, so that high-quality data can be generated and stored for use in subsequent analysis. Data cleaning is typically required in high-throughput sequencing, where compressions and other experimental artifacts limit the amount of usable sequence obtained from each "read" (the sequence generated from a single template in a sequencing run).
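
As an illustration of one common cleaning step, the sketch below trims low-quality bases from the 3' end of a read. It assumes that base qualities are Phred scores stored as ASCII characters with an offset of 33 (the usual FASTQ convention) and uses an arbitrary cutoff of 20; the function names are hypothetical, and real pipelines typically combine several such filters.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq: str, quality: str, threshold: int = 20) -> tuple[str, str]:
    """Trim bases from the 3' end of a read while their quality is below `threshold`."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have equal length")
    scores = phred_scores(quality)
    end = len(seq)
    # Walk back from the 3' end until a base meets the quality cutoff
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


read, qual = trim_read("ACGTACGTNN", "IIIIIIII##")
print(read)  # ACGTACGT
```

Here "I" decodes to Phred 40 and "#" to Phred 2, so only the last two bases are removed.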

Data Mining

The process of querying very large databases to test a hypothesis ("top-down" data mining), or of interrogating a database to generate new hypotheses from rigorous statistical correlations ("bottom-up" data mining).
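
A toy version of "bottom-up" mining computes the Pearson correlation between every pair of variables and reports strongly correlated pairs as candidate hypotheses. The gene names, values and the 0.9 threshold are invented for illustration, and a real analysis would also need to correct for the many comparisons being made.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def candidate_pairs(table: dict[str, list[float]], threshold: float = 0.9):
    """Return (name_a, name_b, r) for every pair of variables with |r| >= threshold."""
    hits = []
    for (name_a, xs), (name_b, ys) in combinations(table.items(), 2):
        r = pearson(xs, ys)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, r))
    return hits


# Hypothetical expression measurements across four samples
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
for a, b, r in candidate_pairs(expression):
    print(f"{a} ~ {b}: r = {r:.2f}")  # geneA ~ geneB: r = 1.00
```

A flagged pair is only a starting hypothesis; correlation alone does not establish a biological relationship.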

Data Processing

Data processing is the systematic performance of operations on data, such as handling, merging, sorting and computing. The semantic content of the original data should not be changed, although the semantic content of the processed data may differ from it.

Data Warehouses

Vast arrays of heterogeneous (biological) data, stored within a single logical data repository, that are accessible to different querying and manipulation methods. 


Database

Any file system in which data are stored according to a logical organization (see also relational database).


Deconvolution

A mathematical procedure for separating the overlapping effects of molecules in a mixture, such as mixtures of compounds in a high-throughput screen or mixtures of cDNAs in a high-density array.
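
In the simplest linear case, each measured signal is a weighted sum of known reference profiles, and deconvolution means solving for the unknown weights. The sketch below assumes exactly as many independent measurements as components (a square, non-singular system), noise-free data and invented numbers; noisy real data would normally call for least-squares or non-negative fitting instead.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Choose the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference profiles are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Rows are measurement channels, columns are components (hypothetical unit responses)
profiles = [[1.0, 0.3],
            [0.2, 1.0]]
observed = [3.5, 5.4]  # generated from true amounts 2.0 and 5.0
abundances = solve_linear(profiles, observed)
print([round(v, 6) for v in abundances])  # [2.0, 5.0]
```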


Deletion

A chromosomal alteration in which a portion of the chromosome or the underlying DNA is lost.

Deletion mapping

A process in which different deletions in a region of DNA are created and used to map the functionally critical areas of that DNA. For example, the minimal region of DNA required for the activity of a test promoter can be determined by making systematic deletions in the region of interest.
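
The promoter example can be simulated in a few lines. The sketch assumes a hypothetical assay function that reports whether a construct retains activity, that the critical element is a single contiguous stretch, and that trimming past it always abolishes activity. It runs a 5' deletion series and then a 3' deletion series, in the spirit of nested deletion constructs; the TATA-like motif used as the "assay" is only a stand-in.

```python
from typing import Callable


def minimal_region(seq: str, is_active: Callable[[str], bool], step: int = 1) -> tuple[int, int]:
    """Estimate the minimal active region as (start, end) indices into `seq`.

    Runs a 5' deletion series, then a 3' deletion series on the shortest
    still-active 5'-deleted construct. Assumes activity, once lost, is not regained.
    """
    if not is_active(seq):
        raise ValueError("the full-length sequence is inactive")
    start = 0
    # 5' deletions: remove `step` bases at a time from the left end while activity remains
    while start + step <= len(seq) and is_active(seq[start + step:]):
        start += step
    end = len(seq)
    # 3' deletions: remove bases from the right end of the 5'-trimmed construct
    while end - step >= start and is_active(seq[start:end - step]):
        end -= step
    return start, end


# Hypothetical assay: a construct counts as "active" if it still contains this motif
def assay(construct: str) -> bool:
    return "TATAAA" in construct


region = "GGCCTATAAAGGCC"
start, end = minimal_region(region, assay)
print(start, end, region[start:end])  # 4 10 TATAAA
```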

Dendrogram

A graphical representation of the output of a hierarchical clustering method. Strictly, a dendrogram is a binary tree with a distinguished root that has all the data items at its leaves. Conventionally, all the leaves are drawn at the same level. The ordering of the leaves is arbitrary, as is their horizontal position. The heights of the internal nodes may be arbitrary, or may be related to the metric information used to form the clustering.
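
To make the tree concrete, here is a minimal agglomerative clustering sketch using single linkage on one-dimensional values; the linkage rule, the function name and the data are choices made for this example. Each internal node records the distance at which its two children were merged, which corresponds to taking internal-node heights from the metric information used for the clustering.

```python
def single_linkage(points: dict[str, float]):
    """Agglomerative single-linkage clustering of labelled 1-D values.

    Returns the root of a binary tree: a leaf is a label, and an internal
    node is a tuple (left, right, height) where height is the merge distance.
    """
    if not points:
        raise ValueError("need at least one data item")
    # Each cluster is (subtree, list of member values)
    clusters = [(label, [value]) for label, value in points.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of two clusters
                d = min(abs(a - b) for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (tree_i, vals_i), (tree_j, vals_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j, d), vals_i + vals_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


tree = single_linkage({"a": 1.0, "b": 2.0, "c": 6.0, "d": 6.5})
print(tree)  # (('c', 'd', 0.5), ('a', 'b', 1.0), 4.0)
```

Swapping the two children of any internal node gives an equivalent dendrogram, which is why the leaf ordering is arbitrary.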


Dimer

A composite molecule formed by the binding of two molecules (see homodimers and heterodimers).

Disulfide bond

Covalent link formed between the sulfur atoms of two different cysteine residues in a protein. Important in maintaining the folded structure of a protein, and also for linking different proteins in a complex. 
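
In structural analysis, candidate disulfide bonds are often identified geometrically: the two cysteine sulfur (SG) atoms of a disulfide lie roughly 2.05 Å apart. The sketch flags cysteine pairs whose SG atoms fall within a cutoff; the 2.5 Å value, the residue numbers and the coordinates are assumptions for illustration, and practical tools also check bond angles.

```python
import math
from itertools import combinations


def candidate_disulfides(sg_coords: dict[int, tuple[float, float, float]],
                         cutoff: float = 2.5) -> list[tuple[int, int]]:
    """Cysteine residue pairs whose SG atoms lie within `cutoff` angstroms."""
    pairs = []
    # Sort by residue number so each pair is reported as (lower, higher)
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_coords.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((res_a, res_b))
    return pairs


# Hypothetical SG coordinates (angstroms) keyed by cysteine residue number
coords = {3: (0.0, 0.0, 0.0), 17: (2.05, 0.0, 0.0), 40: (10.0, 0.0, 0.0)}
print(candidate_disulfides(coords))  # [(3, 17)]
```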

DNA (deoxyribonucleic acid)

The chemical that forms the basis of the genetic material in virtually all organisms. DNA is composed of the four nitrogenous bases Adenine, Cytosine, Guanine, and Thymine, which are covalently bonded to a backbone of deoxyribose-phosphate to form a DNA strand. Two complementary strands (where all Gs pair with Cs and As with Ts) form a double helical structure which is held together by hydrogen bonding between the cognate bases. 
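
Complementary base pairing is easy to express in code. This sketch returns the partner strand written 5' to 3'; because the two strands of the helix are antiparallel, the base-by-base complement is reversed. Characters other than A, C, G and T (for example N) are passed through unchanged.

```python
# Watson-Crick pairing: A<->T, C<->G (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, both written 5'->3'.

    The strands are antiparallel, so the base-by-base complement is reversed.
    Characters other than A/C/G/T (e.g. N) are left unchanged.
    """
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```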

DNA fingerprinting

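A technique for identifying individual humans based on a restriction enzyme digest of tandemly repeated DNA sequences that are scattered throughout the human genome. Because the number of repeat units at these loci varies between individuals, the resulting pattern of fragment lengths is, in practice, unique to each individual (identical twins excepted).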
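
The principle can be shown with a toy digest that cuts at every occurrence of a recognition site and lists the fragment lengths. The two sequences below are invented: each has the same flanking EcoRI sites (G^AATTC, cut after the first base) around a different number of copies of a hypothetical 4-base repeat, so only the middle fragment length differs. Real fingerprinting detects such fragments by gel electrophoresis and probe hybridization across many loci, and this sketch models only the top strand.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting `seq` (top strand only) at every `site`.

    `cut_offset` is where the enzyme cuts within its site, e.g. 1 for EcoRI (G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: identical EcoRI-flanked locus, different repeat copy numbers
person_a = "AAGAATTC" + "TCCA" * 3 + "GAATTCAA"
person_b = "AAGAATTC" + "TCCA" * 5 + "GAATTCAA"
print(digest(person_a, "GAATTC", 1))  # [3, 18, 7]
print(digest(person_b, "GAATTC", 1))  # [3, 26, 7]
```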

DNA microarrays

The deposition of oligonucleotides or cDNAs onto an inert substrate such as glass or silicon. Thousands of molecules may be organized spatially into a high-density matrix. These DNA chips may be probed to monitor the expression of many thousands of genes simultaneously. Uses include the study of gene polymorphisms, de novo sequencing and the molecular diagnosis of disease.
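
For expression monitoring, one common summary on a two-color array (an assumption; the definition above does not specify the platform) is the per-gene log2 ratio of sample to reference intensity. The sketch omits background correction and normalization, and the intensities are invented.

```python
import math


def log2_ratios(sample: dict[str, float], reference: dict[str, float]) -> dict[str, float]:
    """Per-gene log2(sample / reference) intensity ratio.

    Genes missing from the reference, or with a non-positive intensity in
    either channel, are skipped because the log ratio is undefined.
    """
    ratios = {}
    for gene, s in sample.items():
        r = reference.get(gene)
        if r is None or s <= 0 or r <= 0:
            continue
        ratios[gene] = math.log2(s / r)
    return ratios


# Invented spot intensities for two channels (no background correction or normalization)
sample = {"geneA": 800.0, "geneB": 100.0, "geneC": 400.0, "geneD": 50.0}
reference = {"geneA": 200.0, "geneB": 400.0, "geneC": 400.0}
print(log2_ratios(sample, reference))
# {'geneA': 2.0, 'geneB': -2.0, 'geneC': 0.0}
```

A positive value means higher signal in the sample than in the reference, a negative value lower, and each unit is a twofold change.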

DNA polymerase

An enzyme that catalyzes the synthesis of DNA from a DNA template given the deoxyribonucleotide precursors. 

DNA probes

Short, single-stranded DNA molecules of specific base sequence, labeled either radioactively or immunologically, that are used to detect and identify the complementary base sequence in a gene or genome by hybridizing specifically to it.
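
A perfect-match search shows where a probe could hybridize: because the probe pairs antiparallel with its target, it binds wherever the target strand contains the probe's reverse complement. This sketch ignores mismatches, melting temperature and secondary structure, all of which matter in practice; the function name and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hybridization_sites(probe: str, target: str) -> list[int]:
    """0-based start positions where `probe` pairs perfectly with `target` (both 5'->3')."""
    # The probe binds antiparallel, so look for its reverse complement in the target
    site = probe.upper().translate(COMPLEMENT)[::-1]
    target = target.upper()
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


print(hybridization_sites("GGAT", "TTATCCGGATCCAA"))  # [2, 8]
```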

DNA sequencing

The technique in which the specific sequence of bases forming a particular DNA region is deciphered. 

DNase (Deoxyribonuclease)

One of a series of enzymes that can digest DNA. 

Domain (protein)

A region of special biological interest within a single protein sequence. A domain may also be defined as a region within the three-dimensional structure of a protein that accomplishes a specific function and may encompass segments of several distinct protein sequences. A domain class is a group of domains that share a common set of well-defined properties or characteristics.


Drug

An agent that affects a biological process. Specifically, a molecule whose molecular structure can be correlated with its pharmacological activity.

Drug discovery cycle

The cycle of events required to develop a new drug. Typically this involves research, preclinical testing and clinical development, and can take from 5 to 12 years. 

