Bioinformatics Glossary

FASTA format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is: 
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK

A FASTA file can also contain multiple sequences, each introduced by its own description line:
>VECTOR32    Synthetic vector sequence #32
ATGAGCGGCGGCCCCATGGGCGGCAGGCCCGGCGGCAGGGGCGCCCCCGCCGTGCAGCAG
AACATCCCCAGCACCCTGCTGCAGGACCACGAGAACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR33    Synthetic vector sequence #33
ACGAGCGGCGGTCCCATGGGCGCCAGGCCCGGCGGCAGGGGCGCTGCCGCCGTGCAGCAC
ATCATCCCCAGCACCCTGCAGCAGGACCACGAGTACCAGAGGCTGTTCGAGATGCTGGGC
>VECTOR34    Synthetic vector sequence #34
GTGAGCGGCGGCTACTTGGGCGGCAGGCCCGGCGGCAGGGGCGCCCACGCCGTGCAGCAG

Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U (selenocysteine) and * (translation stop) are acceptable letters. Invalid characters (digits, blanks) are automatically removed.
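
The rules above can be sketched as a small reader. This is a minimal illustration in Python, not a reference implementation: the function names parse_fasta and clean_sequence are hypothetical, whitespace around each line is stripped as a tolerance assumption, and the cleaning step keeps any ASCII letter rather than checking it against the IUB/IUPAC alphabet.

```python
import string

# Characters kept in sequence data: ASCII letters plus '-' (gap) and '*'.
# Assumption: letters are not validated against the IUB/IUPAC code tables.
ALLOWED = set(string.ascii_uppercase) | {"-", "*"}


def clean_sequence(raw):
    """Map to upper-case and drop digits, blanks and other invalid characters."""
    return "".join(ch for ch in raw.upper() if ch in ALLOWED)


def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()  # assumption: tolerate stray surrounding whitespace
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(clean_sequence(line))
    if description is not None:
        records.append((description, "".join(chunks)))
    return records
```

Called on the text of a multi-sequence file, parse_fasta returns one pair per ">" line, with the sequence lines that follow it joined into a single string.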

Fingerprint

A fingerprint is a set of motifs used to predict the occurrence of similar motifs, either in an individual sequence or in a database. Fingerprints are refined by iterative scanning of a composite protein sequence database. A composite or multiple-motif fingerprint contains a number of aligned motifs taken from different parts of a multiple alignment. True family members are then easy to identify because they possess all elements of the fingerprint, while subfamily members may be identified by possessing only part of it.
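
The all-or-part rule in the last sentence can be illustrated with a toy classifier. This is only a sketch under strong assumptions: each motif is written as a regular expression, which is a simplification of how aligned motifs are actually scored, and the function name classify_by_fingerprint and the example patterns are hypothetical.

```python
import re


def classify_by_fingerprint(sequence, motifs):
    """Label a sequence by how many of the fingerprint's motifs it contains."""
    if not motifs:
        raise ValueError("a fingerprint needs at least one motif")
    # Collect the motifs (regular expressions, by assumption) found in the sequence.
    hits = [m for m in motifs if re.search(m, sequence)]
    if len(hits) == len(motifs):
        label = "family member"
    elif hits:
        label = "possible subfamily member"
    else:
        label = "no match"
    return label, hits


# Hypothetical three-motif fingerprint, for illustration only.
example_fingerprint = ["C..C", "GH[ED]R", "W.K"]
```

A sequence containing all three patterns would be labelled a family member; one containing only some of them would be flagged as a possible subfamily member.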

Frameshift

An insertion, deletion, or duplication of a number of bases that is not a multiple of three, which causes the reading frame of a structural gene to shift from the normal series of triplets. A base substitution alone does not change the reading frame.
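
To make the shift concrete, the sketch below splits a short, made-up DNA string into triplets before and after a deletion. The helper name codons and the example sequence are hypothetical; the point is only that removing one base changes every downstream triplet, while removing three bases leaves the remaining triplets intact.

```python
def codons(dna):
    """Split a DNA string into consecutive triplets, dropping any partial one."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


original = "ATGGCCAAATGA"                      # made-up example sequence
one_base_deleted = original[:4] + original[5:]  # removes the base at index 4
one_codon_deleted = original[:3] + original[6:]  # removes bases 3, 4 and 5

print(codons(original))           # ['ATG', 'GCC', 'AAA', 'TGA']
print(codons(one_base_deleted))   # ['ATG', 'GCA', 'AAT']
print(codons(one_codon_deleted))  # ['ATG', 'AAA', 'TGA']
```

After the single-base deletion, the triplets downstream of the change no longer match the original ones, which is the frameshift; the three-base deletion removes one triplet but leaves the rest of the frame unchanged.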

Functional genomics

The use of genomic information to delineate protein structure, function, pathways and networks. Function may be determined by "knocking out" or "knocking in" expressed genes in model organisms such as worm, fruit fly, yeast or mouse.

Fusion protein

The protein resulting from the genetic joining and expression of two different genes (see chimeric).
