BioPerl FAQ (Frequently Asked Questions)
How do I submit a patch or enhancement to BioPerl?
We suggest the following. Post your idea to the appropriate mailing list. If it is a really new idea, consider taking us through your
thought process. We'll help you tease out the necessary information, such as which methods you'll want and how the module can interact
with other BioPerl modules. If it is a port of something you've already worked on, give us a summary of the current methods. Make sure
there is an interface to the module, not just an implementation, and make sure a set of tests in the t/ directory ensures that your
module is tested. If you have a suggested patch or code enhancement, the SubmitPatch HOWTO gives guidelines on how to submit it
properly via Bugzilla. See also Advanced BioPerl for more information.
Why can't I easily get a list of all the methods an object can call?
This is a problem with Perl in general, not only with BioPerl. To list all the methods, you have to walk the inheritance tree, and
core Perl has no built-in function that does this for you. As usual, help can be found on CPAN. Install the CPAN module
Class::Inspector, put the following script, perlmethods, into your path, and run it, for example: perlmethods Bio::Seq
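#!/usr/bin/perl
use strict;
use warnings;
use Class::Inspector;
# Name of the class to inspect, e.g. Bio::Seq
my $class = shift or die "Usage: perlmethods perl_class_name\n";
# Load the class so that its methods (and those of its parents) are defined
eval "require $class; 1" or die "Could not load $class: $@";
# 'full' returns Package::method names; 'public' skips names that start with an underscore
print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";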
There is also a project called Deobfuscator, developed during the 2005 Bioinformatics course at Cold Spring Harbor Labs. The
Deobfuscator displays the available methods for an object type and provides links to the return types of those methods. An older
version is also available.
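As a rough illustration of what walking the inheritance tree involves, here is a sketch that uses only core Perl (the mro module,
bundled since Perl 5.10). The helper name list_methods and the toy classes Animal and Dog are made up for this example. It is not how
Class::Inspector is implemented, and it does less: it does not load the class for you, it does not filter private names, and any
function imported into a package will show up as if it were a method.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical helper: return a sorted list of "Package::method" names that
# can be called on $class, including inherited ones.
sub list_methods {
    my ($class) = @_;
    my %seen;
    # get_linear_isa returns $class followed by its ancestors, in the order
    # Perl searches them when resolving a method call.
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            next if $name =~ /::$/;                    # nested package, not a sub
            next unless defined &{"${pkg}::${name}"};  # keep subroutines only
            # The first package in resolution order wins (overrides).
            $seen{$name} //= "${pkg}::${name}";
        }
    }
    return sort values %seen;
}

# Two toy classes, purely for demonstration.
{
    package Animal;
    sub new   { return bless {}, shift }
    sub speak { return 'generic noise' }
}
{
    package Dog;
    our @ISA = ('Animal');
    sub speak { return 'woof' }
    sub fetch { return 'stick' }
}

print "$_\n" for list_methods('Dog');
```

For a real class you would first load it (for example with require) so that its symbol table and @ISA are populated before calling
the helper.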
Can you explain the Object Model design and rationale?
There is no simple answer to this question. Simply put, this is a toolkit which has grown organically. The goals and user audience
have evolved. Some decisions have been made and we have been forced to live by them rather than destroy backward compatibility. In
addition, there are different philosophies of software development. The major developers on the project have tried to impose a set of
standards on the code so that the project can be coordinated without every commit being cleared by a few key individuals (see Eric S.
Raymond's essay "The Cathedral and the Bazaar" for different styles of running an open source project; we are clearly on the Bazaar
end).
Advanced BioPerl talks more about specific design goals.
The clear consensus of the project developers is that BioPerl should be consistent. The price we pay is some copy-and-paste of code:
the hand-written Get/Set accessor methods are a sore spot for some, as is the decision not to use AUTOLOAD. By being consistent we
hope that someone can grok the gist of a module from the basic documentation, see example code, and get a set of methods from the API
documentation. We aim to make the core object design easy to understand. This has not been realized by any stretch of the
imagination, as the toolkit has well over 1000 modules in bioperl-live and bioperl-run alone.
That said, we do want to improve things. We want to experiment with newer modules which make Perl more object-oriented. We have
high hopes for some of the promises of Perl 6. To realize this goal we are encouraging developers to play with new object models
in a bioperl-experimental project.
Some useful discussion on the mailing list can be found at
http://bioperl.org/pipermail/bioperl-l/2003-December/014406.html. We encourage you to participate in the discussion and to join in the
development process, either on existing BioPerl code or on the bioperl-experimental code, if you have a particular interest in making
the toolkit more object-oriented.
Sequences
How do I parse a sequence file?
Use the Bio::SeqIO system. This will create Bio::Seq objects for you. For more information see the BioPerl Tutorials, the SeqIO HOWTO,
the Bio::SeqIO Wiki page, or the Bio::SeqIO POD documentation (or type perldoc Bio::SeqIO).
I can't get sequences with Bio::DB::GenBank any more, why not?
NCBI changed the web CGI script that provided this access, so old BioPerl versions can no longer retrieve sequences. You must use a
modern version such as 1.4.x or 1.5.x.
How can I get NT_ or NM_ or NP_ accessions from NCBI (Reference sequences)?
To retrieve GenBank reference sequences, or RefSeqs, use Bio::DB::RefSeq, not Bio::DB::GenBank or Bio::DB::GenPept. This is still an
area of active development because the data providers have not provided the best interface for us to query. EBI has provided a mirror
with their dbfetch system, which is accessible through the Bio::DB::RefSeq object. However, there are cases where NT_ accession
numbers will not be retrievable.
How can I use Bio::SeqIO to parse sequence data to or from a string?
Use this code to parse sequence records from a string:
use IO::String;
use Bio::SeqIO;
# $string holds the sequence records, here in FASTA format
my $stringfh = IO::String->new($string);
my $seqio = Bio::SeqIO->new(-fh     => $stringfh,
                            -format => 'fasta');
while ( my $seq = $seqio->next_seq ) {
    # process each seq
}
And here is how to write to a string:
use IO::String;
use Bio::SeqIO;
my $s;
my $io = IO::String->new(\$s);
my $seqOut = Bio::SeqIO->new(-format => 'swiss', -fh => $io);
# $seq1 is a Bio::Seq object obtained elsewhere
$seqOut->write_seq($seq1);
print $s; # $s now contains the record in Swiss-Prot format
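The idea behind both snippets is a filehandle backed by a string rather than a file. Core Perl (5.8 and later) can also create such
handles directly by opening a reference to a scalar. The sketch below shows only that mechanism, with made-up sample data and a
trivial header filter standing in for real parsing; it does not use Bio::SeqIO, and whether you prefer this over IO::String is a
matter of taste.

```perl
use strict;
use warnings;

# Made-up FASTA text standing in for $string.
my $string = ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n";

# Reading: open a read handle on a reference to the scalar.
open(my $in, '<', \$string) or die "Cannot open in-memory read handle: $!";

# Writing: open a write handle on a second scalar; everything printed to
# the handle accumulates in $out_text.
my $out_text = '';
open(my $out, '>', \$out_text) or die "Cannot open in-memory write handle: $!";

# Stand-in for real parsing: copy only the FASTA header lines across.
while (my $line = <$in>) {
    print {$out} $line if $line =~ /^>/;
}
close $in;
close $out;

print $out_text;
```

A handle opened this way is an ordinary filehandle, so it can be handed to code that expects one in place of an IO::String object.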
How do I use Bio::Index::Fasta and index on different ids?
I'm using Bio::Index::Fasta in order to retrieve sequences from my indexed fasta file but I keep seeing MSG: Did not provide a
valid Bio::PrimarySeqI object when I call fetch followed by write_seq() on a Bio::SeqIO handle. Why?
It's likely that fetch didn't retrieve a Bio::Seq object. There are a few possible explanations, but the most common
cause is that the id you're passing to fetch is not the key to that sequence in the index. For example, if the FASTA header
is >gi|12366 and your id is 12366, then fetch won't find the sequence, because it expects to see
gi|12366. You need to pass your own function (called get_id here) to the id_parser method to specify the key used in indexing, like this:
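my $inx = Bio::Index::Fasta->new(-filename   => $indexname,
                                 -write_flag => 1);   # needed to create or update the index
$inx->id_parser(\&get_id);
$inx->make_index($fastaname);
# Use the numeric gi from a header such as ">gi|12366" as the index key
sub get_id {
    my $header = shift;
    $header =~ /^>gi\|(\d+)/;
    return $1;
}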
The same issue arises when you use Bio::DB::Fasta, but in that case the code might look like this:
$inx = Bio::DB::Fasta->new($fastaname, -makeid => \&get_id);
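Because the parser is an ordinary subroutine, it can be tried on a few header lines before any index is built. The sketch below is a
variation on get_id that returns undef when the header has no gi field, since a bare $1 after a failed match may still hold a value
from an earlier match. The sample headers are made up, and how the indexing modules treat an undefined key is not covered here.

```perl
use strict;
use warnings;

# Variation on get_id: return the numeric gi, or undef when the header
# does not start with ">gi|<digits>".
sub get_id {
    my $header = shift;
    return $header =~ /^>gi\|(\d+)/ ? $1 : undef;
}

# Made-up headers, for illustration only.
my @headers = (
    '>gi|12366 hypothetical protein',
    '>sp|P12345|EXAMPLE_HUMAN example entry',
);

for my $header (@headers) {
    my $id = get_id($header);
    printf "%-45s => %s\n", $header, defined $id ? $id : '(no gi found)';
}
```

With a parser like this in place, the id you pass to fetch is the bare number (12366 in the example above) rather than gi|12366.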