Bioinformatics FAQ (Frequently Asked Questions)

Bioinformatics FAQ (Frequently Asked Questions) - Practical tips

This resource is maintained by and © Damian Counsell, UK Medical Research Council Rosalind Franklin Centre for Genomic Research (the RFCGR) 1998-2004.

Practical tips

How can I find a sequence?
- ...I have a description.
- ...I have an accession number.
- ...I have another sequence.
- ...I'm not sure whether to use the defaults.
How can I align two sequences?
How can I predict the function of a gene (product)?
How can I predict the structure of a sequence?
How can I write up?

This section includes some simple rules-of-thumb to apply when performing common bioinformatics tasks. I try to give a reference to a more detailed source of guidance where I know of one.

How can I find a sequence?

The most common task in bioinformatics must be the acquisition of some bioinformatics data on which to operate. Usually this is in the form of a nucleic acid or protein sequence, stored as characters in the appropriate alphabet together with a header of related information: for example, some kind of unique identifying number, the species from which the original biological substrate was obtained, the names of any authors who published the sequence, and so on.
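As a rough illustration of the "header plus sequence" layout described above, the sketch below parses a single record in FASTA format. FASTA is an assumption here (this FAQ does not name a particular format), and the example record, its identifier `EXAMPLE1`, and the names `SequenceRecord` and `parse_fasta_record` are all invented for illustration.

```python
from typing import NamedTuple


class SequenceRecord(NamedTuple):
    identifier: str   # unique identifier taken from the header
    description: str  # remaining header text (species, authors, ...)
    sequence: str     # residues as characters in the sequence alphabet


def parse_fasta_record(text: str) -> SequenceRecord:
    """Parse one FASTA-style record: a '>' header line, then sequence lines."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("record must start with a '>' header line")
    header = lines[0][1:]
    # The first whitespace-delimited token is treated as the identifier.
    identifier, _, description = header.partition(" ")
    # Sequence lines are joined and upper-cased into one string.
    sequence = "".join(lines[1:]).upper()
    return SequenceRecord(identifier, description, sequence)


# Hypothetical record, invented for this example.
EXAMPLE = """>EXAMPLE1 hypothetical protein [Example species]
mkvlaagtc
ddeeffgg
"""

record = parse_fasta_record(EXAMPLE)
print(record.identifier)
print(record.description)
print(record.sequence)
```

Real database entries carry much richer headers than this; the point is only that the identifier and descriptive text travel alongside the sequence characters.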

You may have already generated your own sequence data experimentally. In this case you are likely to want to find sequences which are identical or similar (and therefore possibly related) to yours. The task is then one of similarity search.

...I have a description.

A paradoxical problem generated by the success of the bioinformatics revolution is the increasing difficulty of navigating the huge amount of data available. Once you could print out most of the existing sequence databases onto paper and cram them into a single binder. Now a search for "actin" alone will pull out hundreds and hundreds of sequences. The key to finding what you want is to develop your own discriminatory skills rather than rely on computers to figure out what it is you're really after.

Use Entrez-PubMed

Make sure you are clear about your aim first. If you are looking for a sequence for a specific scientific purpose, you might do best to start with a relevant human-generated publication. For example, if you have cloned a gene which is part of a well-characterised biochemical pathway and you want to find sequences of the same functional gene product in other species (orthologues), Entrez PubMed is your friend.

PubMed is a huge and very comprehensive database of the biomedical scientific literature, created by the U.S. National Library of Medicine (NLM). Entrez PubMed is another indispensable resource of the U.S. National Center for Biotechnology Information (NCBI). Both the NLM and the NCBI are part of the U.S. Department of Health and Human Services National Institutes of Health.
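If you would rather script a literature search than use the web form, one option is NCBI's E-utilities ESearch service. The sketch below only builds the request URL and does not contact the server. Using E-utilities at all is an assumption (this FAQ describes only the Entrez web interface), and the function name `build_pubmed_search_url` and the search terms are invented for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(terms, max_results=20):
    """Return an ESearch URL that combines the given terms with AND in PubMed."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Requiring every term at once narrows the result set.
    query = " AND ".join(terms)
    params = {"db": "pubmed", "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# Illustrative terms only; pick ones that match your own aim.
url = build_pubmed_search_url(["actin", "orthologue"], max_results=5)
print(url)
```

Fetching that URL returns a list of PubMed identifiers for matching articles. Check NCBI's current usage guidelines before sending automated requests.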

Use Swiss-Prot

Swiss-Prot is curated by human beings.

Use SRS at the RFCGR

[XXXX INSERT DETAILED ADVICE HERE]

Use Boolean logic

[XXXX INSERT DETAILED ADVICE HERE]

Use cunning

[XXXX INSERT DETAILED ADVICE HERE]

...I have an accession number.

[XXXX INSERT DETAILED SEQUENCE ADVICE HERE]

...I have another sequence.

This section will be expanded, and there will be a more basic and detailed explanation for novice searchers, but in the meantime here are the top tips cribbed from the excellent paper by Hugh B. Nicholas Jr., David W. Deerfield II and Alexander J. Ropelewski in BioTechniques.

(Continued on next part...)
