Bioinformatics Glossary
FASTA format
A sequence in FASTA format begins with a single-line description, followed
by lines of sequence data. The description line is distinguished from the
sequence data by a greater-than (">") symbol in the first column. It is
recommended that all lines of text be shorter than 80 characters in length.
An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
A FASTA file can also contain multiple sequences:
>VECTOR32 Synthetic vector sequence #32
ATGAGCGGCGGCCCCATGGGCGGCAGGCCCGGCGGCAGGGGCGCCCCCGCCGTGCAGCAG
AACATCCCCAGCACCCTGCTGCAGGACCACGAGAACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR33 Synthetic vector sequence #33
ACGAGCGGCGGTCCCATGGGCGCCAGGCCCGGCGGCAGGGGCGCTGCCGCCGTGCAGCAC
ATCATCCCCAGCACCCTGCAGCAGGACCACGAGTACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR34 Synthetic vector sequence #34
GTGAGCGGCGGCTACTTGGGCGGCAGGCCCGGCGGCAGGGGCGCCCACGCCGTGCAGCAG
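The layout described above (a ">" description line followed by one or more
lines of sequence data) can be read with a few lines of Python. This is a
minimal sketch, not a reference implementation: the function name parse_fasta
is made up for illustration, and the sketch assumes the whole file fits in
memory and that text before the first ">" line can be ignored.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    header = None   # description of the record being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the last record once the input ends.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>VECTOR32 Synthetic vector sequence #32
ATGAGCGGCGGCCCCATGGGCGGCAGGCCCGGCGGCAGGGGCGCCCCCGCCGTGCAGCAG
AACATCCCCAGCACCCTGCTGCAGGACCACGAGAACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR34 Synthetic vector sequence #34
GTGAGCGGCGGCTACTTGGGCGGCAGGCCCGGCGGCAGGGGCGCCCACGCCGTGCAGCAG
"""

for description, sequence in parse_fasta(example):
    # Print the identifier (first word of the description) and the length.
    print(description.split()[0], len(sequence))
```

Sequence lines belonging to one record are joined into a single string, so the
line length used in the file does not affect the result.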
Sequences are expected to be represented in the standard IUB/IUPAC amino
acid and nucleic acid codes, with these exceptions: lower-case letters
are accepted and are mapped into upper-case; a single hyphen or dash can
be used to represent a gap of indeterminate length; and in amino acid sequences,
U (selenocysteine) and * (translation stop) are acceptable letters. Invalid
characters, such as digits and blanks, are removed automatically.
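The clean-up rules above can be sketched as a small Python function. The name
normalize_sequence is hypothetical, and the sketch assumes that only digits and
whitespace count as invalid characters, since those are the only ones the entry
names; a real reader may reject or strip more.

```python
def normalize_sequence(raw):
    """Upper-case a raw sequence and drop digits and whitespace.

    Hyphens (gaps) and the letters U and * pass through unchanged.
    Assumption: only digits and whitespace are treated as invalid.
    """
    return "".join(
        ch.upper() for ch in raw if not (ch.isdigit() or ch.isspace())
    )


print(normalize_sequence("1 acgtu-nn 10"))
```

No check against the IUB/IUPAC alphabet is made here; adding one would require
knowing whether the sequence is nucleic acid or amino acid.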
Fingerprint
A fingerprint is a set of motifs used to predict the occurrence of similar
motifs, in either an individual sequence or in a database. Fingerprints
are refined by iterative scanning of a composite protein sequence database.
A composite or multiple-motif fingerprint contains a number of aligned
motifs taken from different parts of a multiple alignment. True family
members are then easy to identify by virtue of possessing all elements
of the fingerprint, while subfamily members may be identified by possessing
only part of it.
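The idea of full and partial fingerprint matches can be illustrated with a
simplified Python sketch. It is only an approximation of the concept: real
fingerprint searches score aligned motifs rather than matching patterns
exactly, and the regular-expression motifs and the names match_fingerprint and
classify below are made up for illustration.

```python
import re


def match_fingerprint(sequence, motifs):
    """Return the motif patterns that occur somewhere in the sequence."""
    return [motif for motif in motifs if re.search(motif, sequence)]


def classify(sequence, motifs):
    """Label a sequence by how much of the fingerprint it contains."""
    hits = match_fingerprint(sequence, motifs)
    if len(hits) == len(motifs):
        return "full match"      # candidate family member
    if hits:
        return "partial match"   # candidate subfamily member
    return "no match"


# Hypothetical three-motif fingerprint, written as regular expressions.
fingerprint = ["C.PAG", "N[GS]SY", "W[FY]NC"]

print(classify("AACAPAGXXNGSYXXWFNC", fingerprint))
print(classify("AACAPAGXX", fingerprint))
print(classify("AAAA", fingerprint))
```

This sketch ignores the order of and spacing between motifs, both of which a
real fingerprint search would normally take into account.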
Frameshift
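An insertion or deletion (including a duplication) of a number of bases that
is not a multiple of three, which causes the reading-frame of a structural
gene to shift from the normal series of triplets. A substitution of one base
for another does not shift the frame.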
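The effect on the series of triplets can be shown with a short Python sketch.
The sequence and the helper names codons and causes_frameshift are made up for
illustration; the sketch relies only on the fact that an insertion or deletion
shifts the frame when its length is not a multiple of three.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping a trailing partial one."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def causes_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length divides by 3."""
    return indel_length % 3 != 0


original = "ATGGCCAAGTTTGGA"
one_deleted = original[:3] + original[4:]     # remove one base after the first codon
three_deleted = original[:3] + original[6:]   # remove one whole codon

print(codons(original))       # ['ATG', 'GCC', 'AAG', 'TTT', 'GGA']
print(codons(one_deleted))    # ['ATG', 'CCA', 'AGT', 'TTG']
print(codons(three_deleted))  # ['ATG', 'AAG', 'TTT', 'GGA']
```

Deleting a single base changes every triplet downstream of the deletion, while
deleting three bases removes one triplet and leaves the rest in frame.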
Functional genomics
The use of genomic information to delineate protein structure, function,
pathways and networks. Function may be determined by "knocking out" or
"knocking in" expressed genes in model organisms such as worm, fruit fly,
yeast or mouse.
Fusion protein
The protein resulting from the genetic joining and expression of two different
genes (see chimeric).