BioPerl FAQ (Frequently Asked Questions)
Cannot get an accession from GenBank when I know it is there
I'm using Bio::DB::GenBank to query GenBank and I'm certain that the ID is there, but I'm seeing the error MSG: acc does not
exist. This is a bug in versions 1.2 and 1.2.1 that was fixed in 1.2.2. Either upgrade to 1.2.2 or higher, or edit the module
Bio::DB::GenBank and change protein to nucleotide in the BEGIN block.
Also see http://bioperl.org/pipermail/bioperl-l/2004-February/014958.html
Accession numbers are not present for FASTA sequence files
If you parse a FASTA-format sequence file with Bio::SeqIO, the sequences won't have an accession number set. What to do?
All the data is in $seq->display_id; it just needs to be parsed out. Here is some code that sets the accession
number, assuming a GenBank-style ID with pipe-separated fields:
my ($gi,$acc,$locus);
(undef,$gi,undef,$acc,$locus) = split(/\|/,$seq->display_id);
$seq->accession_number($acc);
Why don't we just go ahead and do this by default? For one, we make no assumptions about the format of the ID part of the sequence.
Perhaps the parser code could try to detect whether it is a GenBank-formatted ID and, if so, set the accession number field. It would
be trivial to do, but no one has volunteered the time. Put it on the Project priority list if you think it is important or, better yet,
volunteer the code patch!
Also see http://bioperl.org/pipermail/bioperl-l/2005-August/019579.html
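The detection step discussed above could look something like the following. This is only a minimal sketch, not part of BioPerl: the helper name parse_ncbi_id is hypothetical, the sample ID is made up for illustration, and the code assumes the gi|number|db|accession|locus layout implied by the split() shown earlier. Anything that does not match that layout is left alone rather than guessed at.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns (gi, accession, locus)
# for IDs shaped like "gi|number|db|accession|locus", or an empty list otherwise.
sub parse_ncbi_id {
    my ($id) = @_;
    return unless defined $id;

    # Accept only the five-field gi|...|db|...|... layout; the locus may be empty.
    if ($id =~ /^gi\|(\d+)\|(\w+)\|([^|]*)\|([^|]*)$/) {
        return ($1, $3, $4);
    }
    return;
}

# Illustrative IDs only; the first one is invented for this example.
for my $id ('gi|12345|gb|AB000001.1|EXAMPLE', 'my_custom_id') {
    my ($gi, $acc, $locus) = parse_ncbi_id($id);
    if (defined $acc) {
        print "$id: gi=$gi accession=$acc locus=$locus\n";
    }
    else {
        print "$id: not a gi-style ID, left unchanged\n";
    }
}
```

With a Bio::Seq object in hand, the result could then be applied only when the ID matched, for example by passing $seq->display_id to the helper and calling $seq->accession_number($acc) if $acc is defined. Note that the accession field may carry a version suffix, which this sketch keeps as-is.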
How do I get genomic sequences when all I have is a gene identifier or name?
This question has a few different answers, so it deserves its own page.
I would like to make my own custom fasta header - how do I do this?
You want to use the method preferred_id_type(). Here's some example code:
use strict;
use warnings;
use Bio::SeqIO;

# the GenBank file to convert is taken from the command line
my $file = shift @ARGV or die "Usage: $0 genbank_file\n";

my $seqin = Bio::SeqIO->new(-file   => $file,
                            -format => 'genbank');
my $seqout = Bio::SeqIO->new(-fh     => \*STDOUT,
                             -format => 'fasta');
# From Bio::SeqIO::fasta
$seqout->preferred_id_type('display');
my $count = 1;
while (my $seq = $seqin->next_seq) {
    # override the regular display_id with your own
    $seq->display_id('foo'.$count);
    $seqout->write_seq($seq);
    $count++;
}
You can pass one of the following values to preferred_id_type: "accession", "accession.version", "display", "primary".
The description is automatically appended after the preferred ID in the header line, but it can also be set, like so:
$seq->desc($some_string);
Report Parsing
I want to parse BLAST, how do I do this?
As of version 1.1, BioPerl supports only one approach: the Bio::SearchIO interface. There are other BLAST parsing modules in the
package, but they remain only to support legacy code. Bio::SearchIO supports:
- BLAST
- MegaBLAST (PSL)
- PSIBLAST
- HMMER
- WABA
- BLASTZ (AXT)
- exonerate
- SIM4
- Wise tools
- FASTA reports
It is strongly recommended you read the HOWTO:SearchIO for more information.
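As a rough illustration of the Bio::SearchIO approach, the sketch below walks a BLAST report result by result, hit by hit, and HSP by HSP. The subroutine name print_blast_hits, its parameters, and the E-value cutoff are assumptions made for this example. It requires BioPerl to be installed, and Bio::SearchIO is loaded inside the subroutine only so that the file still compiles where BioPerl is absent.

```perl
use strict;
use warnings;

# Sketch: print one tab-separated line per HSP in a BLAST report and
# return the number of lines printed. $report_file and $max_evalue are
# illustrative parameters, not part of any BioPerl interface.
sub print_blast_hits {
    my ($report_file, $max_evalue) = @_;

    require Bio::SearchIO;    # needs BioPerl installed

    my $in = Bio::SearchIO->new(-format => 'blast',
                                -file   => $report_file);
    my $count = 0;
    while (my $result = $in->next_result) {       # one result per query
        while (my $hit = $result->next_hit) {     # one hit per matched sequence
            while (my $hsp = $hit->next_hsp) {    # one HSP per aligned segment
                next if defined $max_evalue && $hsp->evalue > $max_evalue;
                print join("\t",
                           $result->query_name,
                           $hit->name,
                           $hsp->percent_identity,
                           $hsp->evalue), "\n";
                $count++;
            }
        }
    }
    return $count;
}

# Run only when a report file is given on the command line.
print_blast_hits($ARGV[0], 1e-5) if @ARGV;
```

The same loop structure should apply to the other supported report types by changing the -format argument; see HOWTO:SearchIO for the format names and the full set of result, hit, and HSP methods.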