Bioinformatics FAQ (Frequently Asked Questions) - Practical tips
Part:
1
2
This resource is maintained by and © Damian Counsell, UK Medical Research Council Rosalind Franklin Centre
for Genomic Research (the RFCGR) 1998-2004.
Practical tips
- How can I find a sequence?
- ...I have a description.
- ...I have an accession number.
- ...I have another sequence.
- ...I'm not sure whether to use the defaults.
- How can I align two sequences?
- How can I predict the function of a gene (product)?
- How can I predict the structure of a sequence?
- How can I write up?
This section includes some simple rules-of-thumb to apply when
performing common bioinformatics tasks. I try to give a reference to
a more detailed source of guidance where I know of one.
How can I find a sequence?
The most common task in bioinformatics must be the acquisition of
some data on which to operate. Usually this is in the
form of a nucleic acid or protein sequence, stored as characters in
the appropriate alphabet together with a header of related
information: for example, some kind of unique identifying number, the
species from which the original biological substrate was obtained,
the names of any authors who published the sequence, and so on.
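As a rough illustration of "characters plus a header", here is a minimal Python sketch that reads a record in FASTA format. FASTA is my assumption here; the text above does not name a file format, and richer formats carry far more header information. The identifier EXAMPLE0001 and the sequence are made up, and the alphabet check is deliberately crude.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_alphabet(sequence):
    """Crude guess: nucleic acid if only A, C, G, T, U or N occur, else protein."""
    return "nucleic acid" if set(sequence) <= set("ACGTUN") else "protein"


# Hypothetical record, invented for illustration only.
EXAMPLE = """>EXAMPLE0001 hypothetical actin fragment [Homo sapiens]
ATGGATGATGATATCGCC
GCGCTCGTC
"""

for header, sequence in parse_fasta(EXAMPLE):
    identifier, _, description = header.partition(" ")
    print(identifier, guess_alphabet(sequence), len(sequence))
```

Short protein sequences made up only of A, C, G, T or N would fool the guess, so treat it as a convenience and not as a test.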
You may have already generated your own sequence data
experimentally. In this case you are likely to want to find
sequences which are identical or similar (and therefore possibly
related) to yours. The task is then one of similarity
search.
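To make "identical or similar" a little more concrete, the sketch below computes the percentage of matching positions between two sequences that are already aligned to the same length. This is only a toy measure under that assumption: real similarity search programs such as BLAST also handle gaps, scoring matrices and statistical significance. The function name and the two sequences are invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two made-up ten-base sequences differing at one position.
query = "ATGGATGATG"
candidate = "ATGGACGATG"
print(percent_identity(query, candidate))
```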
...I have a description.
A paradoxical problem generated by the success of the
bioinformatics revolution is the increasing difficulty of navigating
the huge amount of data available. Once you could print out most of
the existing sequence databases onto paper and cram them into a
single binder. Now a search for "actin" alone will pull out hundreds
and hundreds of sequences. The key to finding what you want is to
develop your own discriminatory skills rather than rely on computers
to figure out what it is you're really after.
Use Entrez-PubMed
Make sure you are clear about your aim first. If you are looking
for a sequence for a specific scientific purpose then you might do
best to start with a relevant human-generated publication. For
example, if you have cloned a gene which is part of a
well-characterised biochemical pathway and you want to find other
sequences of the same functional gene product in other species
(orthologues), Entrez PubMed is your friend.
PubMed is a huge and very comprehensive database of the
biomedical scientific literature, created by the U.S. National Library of
Medicine (NLM). Entrez PubMed is another indispensable resource
of the U.S. National Center
for Biotechnology Information (NCBI). Both the NLM and the NCBI are part of the
U.S. Department of Health and Human Services' National Institutes of Health.
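If you would rather script a PubMed search than type it into the web form, the sketch below shows one way to build a query. It assumes NCBI's E-utilities "esearch" service and its db, term and retmax parameters, none of which are described in this FAQ; the function name and the search terms are my own illustrative choices. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(required_terms, excluded_terms=(), max_results=20):
    """Combine terms with AND / NOT and return an esearch URL for PubMed."""
    if not required_terms:
        raise ValueError("at least one required term is needed")
    query = " AND ".join(required_terms)
    for term in excluded_terms:
        query += f" NOT {term}"
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    # urlencode escapes spaces and square brackets for use in a URL.
    return ESEARCH + "?" + urlencode(params)


# Illustrative query: papers with "actin" in the title that mention
# orthologues, leaving out review articles.
url = build_pubmed_search_url(
    ["actin[Title]", "orthologue"],
    excluded_terms=["review[Publication Type]"],
)
print(url)
```

Narrowing a broad term such as "actin" with a field tag and a second required term is usually a quicker route to a manageable list than scanning every hit.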
Use Swiss-Prot
Swiss-Prot is curated by human beings.
Use SRS at the RFCGR
[XXXX INSERT DETAILED ADVICE HERE]
Use Boolean logic
[XXXX INSERT DETAILED ADVICE HERE]
Use cunning
[XXXX INSERT DETAILED ADVICE HERE]
...I have an accession number.
[XXXX INSERT DETAILED SEQUENCE ADVICE HERE]
...I have another sequence.
This section will be expanded, and there will be a more basic
and detailed explanation for novice searchers, but in the meantime
here are the top tips cribbed from the excellent paper
by Hugh B. Nicholas Jr., David W. Deerfield II and Alexander J.
Ropelewski in BioTechniques.
(Continued on next part...)