Bioinformatics Glossary
(Continued from previous part...)
Secondary structure (protein)
The organization of the peptide backbone of a protein that occurs as
a result of hydrogen bonds, e.g. alpha helix, beta-pleated sheet.
Selectivity
Selectivity of bioinformatics similarity search algorithms is defined
as the significance threshold for reporting database sequence matches.
As an example, for BLAST searches, the parameter E is interpreted as the
upper bound on the expected frequency of chance occurrence of a match within
the context of the entire database search. E may be thought of as
the number of matches one expects to observe by chance alone during the
database search.
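As a rough illustration of how E relates to an alignment score, BLAST-style
statistics use the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S),
where S is the raw score, m the query length and n the database length. The
Python sketch below evaluates that formula. The function name and the K and
lambda values are illustrative assumptions; real values depend on the scoring
matrix and gap penalties in use.

```python
import math


def expected_chance_matches(score, query_len, db_len, k, lam):
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


# Illustrative parameter values only (assumptions, not taken from a BLAST run).
K, LAMBDA = 0.1, 0.3

for raw_score in (40, 60, 80):
    e_value = expected_chance_matches(
        raw_score, query_len=250, db_len=1_000_000, k=K, lam=LAMBDA
    )
    print(f"score={raw_score}  E={e_value:.3g}")
```

E falls exponentially as the score rises and grows linearly with database
size, which is why the same alignment is less significant in a larger database.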
Sense strand
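The strand of double-stranded DNA whose sequence corresponds to that of
the RNA transcript (with T in place of U); also called the coding strand.
The complementary antisense strand acts as the template strand for RNA
synthesis. Typically only one gene product is produced per gene, corresponding
to the sense strand only. (Some viruses have open reading frames in both
the sense and the antisense strands).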
Sensitivity
Sensitivity of bioinformatics similarity search algorithms centers on
two questions: first, how well the method can detect biologically meaningful
relationships between two related sequences in the presence of mutations
and sequencing errors; second, how the heuristic nature of the algorithm
affects the probability that a matching sequence will not be detected. At
the user's discretion, the speed of most similarity search programs can
be sacrificed in exchange for greater sensitivity, with an emphasis on
detecting lower-scoring matches.
Sequence Tagged Site (STS)
A unique sequence from a known chromosomal location that can be amplified
by PCR. STSs act as physical markers for genomic mapping and cloning.
Sexual PCR (Molecular Diversity)
Sexual PCR is a form of PCR in which similar, but not identical, DNA
sequences are reassembled to obtain novel juxtapositions, simulating the
result of genetic recombination. The result is the creation of an array
of related genes which may possess improved characteristics. By repeated
rounds of recombination, selection and PCR-based amplification, vastly improved
gene products, such as enzymes with greater activity, may be generated
and selected.
Shotgun cloning
The cloning of an entire gene segment or genome by generating a random
set of fragments using restriction endonucleases to create a gene library
that can be subsequently mapped and sequenced to reconstruct the entire
genome.
Similarity (homology) search
Given a newly sequenced gene, there are two main approaches to the prediction
of structure and function from the amino acid sequence. Homology methods
are the most powerful and are based on the detection of significant extended
sequence similarity to a protein of known structure, or of a sequence pattern
characteristic of a protein family. Statistical methods are less successful
but more general and are based on the derivation of structural preference
values for single residues, pairs of residues, short oligopeptides or short
sequence patterns. The transfer of structure/function information to a
potentially homologous protein is straightforward when the sequence similarity
is high and extended in length, but the assessment of the structural significance
of sequence similarity can be difficult when sequence similarity is weak
or restricted to a short region.
Signal sequence (leader sequence)
A short sequence added to the amino-terminal end of a polypeptide chain
that forms an amphipathic helix allowing the nascent polypeptide to migrate
through membranes such as the endoplasmic reticulum or the cell membrane.
It is cleaved from the polypeptide after the protein has crossed the membrane.
Single nucleotide polymorphisms (SNPs)
Variations of single base pairs scattered throughout the human genome
that serve as measures of the genetic diversity in humans. About 1 million
SNPs are estimated to be present in the human genome, and SNPs are useful
markers for gene mapping studies.
Single-pass sequencing
Rapid sequencing of large segments of the genome of an organism by isolating
as many expressed (cDNA) sequences as possible and performing single sequencer
runs on their 5’ or 3’ ends. Single-pass sequencing typically results in
individual, error-prone sequencing reads of 400-700 bases, depending on
the type of sequencer used. However, if many of these are generated from
numerous clones from different tissues, they may be overlapped and assembled
to remove the errors and generate a contiguous sequence for the entire
expressed gene.
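The overlap step described above can be sketched in Python as follows. This
is a deliberately simplified illustration: it requires an exact match between
the end of one read and the start of the next, whereas real assemblers
tolerate sequencing errors and build a consensus from many reads. The function
names and the example reads are hypothetical.

```python
def suffix_prefix_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left, right, min_len=3):
    """Join two reads on their exact overlap, or return None if there is none."""
    size = suffix_prefix_overlap(left, right, min_len)
    return left + right[size:] if size else None


# Two made-up reads that share the six bases TACGTT.
read_a = "ATGGCGTACGTT"
read_b = "TACGTTAGCCA"
print(suffix_prefix_overlap(read_a, read_b))
print(merge_reads(read_a, read_b))
```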
Site
Sites in sequences can be located either in DNA (e.g. binding sites,
cleavage sites) or in proteins. In order to identify a site in DNA, ambiguity
symbols are used to allow several different symbols at one position. Proteins,
however, need a different mechanism (see Pattern). Restriction enzyme cleavage
sites, for instance, have the following properties: limited length
(typically, less than 20 base pairs); definition of the cleavage site and
its appearance (3', 5' overhang or blunt); definition of the binding site.
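To illustrate how ambiguity symbols describe a DNA site, the Python sketch
below expands the standard IUPAC nucleotide codes into a regular expression
and reports every position where a site occurs. The function names and the
test sequence are hypothetical, and only the strand given is searched. The
example site GTYRAC is the recognition sequence of the restriction enzyme
HincII.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_to_regex(site):
    """Turn a site written with ambiguity symbols into a regex string."""
    return "".join(f"[{IUPAC[symbol]}]" for symbol in site.upper())


def find_sites(site, dna):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets finditer report matches that overlap one another.
    pattern = re.compile(f"(?=({site_to_regex(site)}))")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Made-up sequence containing GTCGAC and GTTAAC, both instances of GTYRAC.
print(find_sites("GTYRAC", "AAGTCGACTTGTTAACGG"))
```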
Southern blotting
A procedure for the identification of DNA by transferring a fragment
separated on an agarose gel to a nitrocellulose filter, where it can be
hybridized with a complementary "probe" sequence.
Splice form
By using alternative splicing, a single message precursor from DNA can
generate an entire family of mRNAs and proteins. This can be utilized to
create specificity in cell-cell or cell-ligand interactions. A cell may
produce a given protein, but it will be a different splice-form of the
protein than that produced by an adjacent cell. In this manner, the two
cells have the potential to interact differently with other cells or molecules.
Two places where this has been extremely important are in the production
of cell-surface specificity proteins in the immune and nervous systems.
Splice site
The sequence found at the 5’ and 3’ region of exon/intron boundaries,
usually defined by a consensus sequence:
Intron
5’ CAGGTAAGT---------TNCAGG 3’
A G C T
N represents any nucleotide; the bottom line represents alternative
nucleotides at the indicated positions.
Splicing
The joining together of separate DNA or RNA component parts. For example,
RNA splicing in eukaryotes involves the removal of introns and the stitching
together of the exons from the pre-mRNA transcript before maturation.
Solvent accessibility
The surface area (typically measured in square angstroms) of a biological
molecule, usually a protein, that is exposed to solvent in its native,
folded form. Determining the solvent accessibility of a protein helps define
which amino acids in its molecular sequence are on the exterior of the
molecule, and thus available to participate in interactions with other
molecules.
Structural gene
Gene which encodes a structural protein (cf. Regulatory gene).
Structure prediction
Algorithms that predict the secondary, tertiary and sometimes even quaternary
structure of proteins from their sequences. Determining protein structure
from sequence has been dubbed "the second half of the Genetic Code" since
it is the folded tertiary structure of a protein that governs how it functions
as a gene product. As yet most structure prediction methods are only
partially successful, and typically work best for certain well-defined
classes of proteins.
Substitution matrix
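A model of protein evolution at the sequence level resulting in the
development of a set of widely used substitution matrices. These are frequently
called Dayhoff, MDM (Mutation Data Matrix), PAM (Point Accepted Mutation)
or BLOSUM matrices. Dayhoff, MDM and PAM matrices are derived from global
alignments of closely related sequences, and matrices for greater evolutionary
distances are extrapolated from those for lesser ones. BLOSUM matrices are
instead derived directly from conserved, ungapped blocks of aligned sequences,
with each matrix built from blocks clustered at a given level of sequence
identity.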
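The extrapolation to greater evolutionary distances can be sketched as
repeated multiplication of a one-step mutation probability matrix, which is
the idea behind deriving PAM250 from PAM1. The Python example below uses a
made-up matrix for a toy three-letter alphabet, not real amino acid data; the
function names are hypothetical. Published scoring matrices are log-odds
scores computed from such probabilities, a step omitted here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def extrapolate(one_step, steps):
    """Raise a one-step mutation probability matrix to the power `steps`."""
    n = len(one_step)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, one_step)
    return result


# Hypothetical one-step matrix: row i gives the probabilities that letter i
# stays the same or changes to each other letter over one unit of distance.
TOY_M1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.005, 0.005, 0.990],
]

for row in extrapolate(TOY_M1, 250):
    print([round(p, 3) for p in row])
```

Each row still sums to 1 after extrapolation, while the diagonal shrinks as
more substitutions accumulate.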
Subtraction library
A cDNA library that only contains cDNAs uniquely expressed in a given
cell or tissue. For example, T cells and B cells will express many common RNAs,
as well as a very small percentage which will be unique for T cells and
B cells respectively. To make a T cell subtraction library, the cDNA from
a T cell library is hybridized with a vast excess of B cell RNA. The commonly
expressed genes will result in RNA-cDNA hybrids which can be removed (or
subtracted) to leave only T cell specific cDNAs.
(Continued on next part...)