BioPerl FAQ (Frequently Asked Questions)

Accession numbers are not present for FASTA sequence files

If you parse a FASTA-format sequence file with Bio::SeqIO, the resulting sequence objects won't have their accession number set. What to do?

All the data is in $seq->display_id; it just needs to be parsed out. Here is some code to set the accession number.

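# Assumes a pipe-delimited ID of the form gi|<gi number>|<db>|<accession>|<locus>
my ($gi, $acc, $locus);
(undef, $gi, undef, $acc, $locus) = split(/\|/, $seq->display_id);
# Copy the parsed accession onto the sequence object
$seq->accession_number($acc);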

Why don't we just go ahead and do this automatically? For one, we don't make any assumptions about the format of the ID part of the sequence. Perhaps the parser code could try to detect whether it is a GenBank-formatted ID and, if so, set the accession number field. It would be trivial to do, but no one has volunteered the time. Put it on the project priority list if you think it is important or, better yet, volunteer the code patch!
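If you want that kind of detection in your own script today, one possible approach is to check the ID against the expected layout before trusting the split. The following is only a minimal sketch: accession_from_display_id is a hypothetical helper, not part of BioPerl, the regular expression is an assumption about what a GenBank-style "gi|...|db|accession|locus" ID looks like, and the example IDs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the accession from a
# pipe-delimited ID such as "gi|12345|gb|AF123456.1|LOCUS", or nothing
# (undef in scalar context) if the ID does not match that layout.
sub accession_from_display_id {
    my ($id) = @_;
    return unless defined $id;

    # Assumed layout: gi|<digits>|<db tag>|<accession>, optionally followed
    # by "|<locus>" (the locus part may be empty).
    if ( $id =~ /^gi\|\d+\|[a-z]+\|([^|]+)(?:\|[^|]*)?$/ ) {
        return $1;
    }
    return;
}

# Illustrative IDs only: one in the assumed layout, one free-form.
for my $id ( 'gi|12345|gb|AF123456.1|LOCUS', 'my_local_seq_01' ) {
    my $acc = accession_from_display_id($id);
    printf "%s => %s\n", $id, defined $acc ? $acc : '(no accession found)';
}
```

Inside a Bio::SeqIO loop you could then call the helper on $seq->display_id and pass the result to $seq->accession_number only when it is defined, so that IDs in any other format are left untouched.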

Also see http://bioperl.org/pipermail/bioperl-l/2005-August/019579.html
