BioTech FYI Center - Resources

BioPerl FAQ (Frequently Asked Questions)

Part:   1  2  3  4  5  6  7 

(Continued from previous part...)

How do I submit a patch or enhancement to BioPerl?

We suggest the following. Post your idea to the appropriate mailing list. If it is a genuinely new idea, consider walking us through your thought process; we'll help you work out the necessary details, such as which methods you'll want and how the module can interact with other BioPerl modules. If it is a port of something you've already worked on, give us a summary of its current methods. Make sure the module has an interface, not just an implementation, and that it comes with a set of tests in the t/ directory to ensure that it is actually tested. If you have a suggested patch or code enhancement, the SubmitPatch HOWTO gives guidelines on how to submit it properly via Bugzilla. See also Advanced BioPerl for more information.

Why can't I easily get a list of all the methods an object can call?

This is a limitation of Perl itself, not just of BioPerl. To list all the methods you have to walk the inheritance tree, and core Perl has no single built-in function that does this for you. As usual, help can be found on CPAN: install the Class::Inspector module, save the following script as perlmethods somewhere in your PATH, and run it, e.g. perlmethods Bio::Seq.

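  #!/usr/bin/perl
  use strict;
  use warnings;
  use Class::Inspector;
  my $class = shift or die "Usage: perlmethods perl_class_name\n";
  eval "require $class; 1" or die "Cannot load $class: $@";
  my $methods = Class::Inspector->methods($class, 'full', 'public')
      or die "Could not list methods for $class\n";
  print join("\n", sort @$methods), "\n";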
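If you cannot install Class::Inspector, the same inheritance walk can be sketched in core Perl 5.10 or later. This is a minimal sketch under stated assumptions, not a replacement for Class::Inspector: list_methods is a hypothetical helper name, it relies on the core mro::get_linear_isa function to order the classes, it treats names starting with an underscore as private, and it also reports functions imported into a package (such as croak), which are not really methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# list_methods is a hypothetical helper (not part of BioPerl or
# Class::Inspector). It returns a hash ref mapping every public sub name
# callable on $class to the package that supplies it.
sub list_methods {
    my ($class) = @_;
    my %methods;
    # $class first, then its ancestors, in method resolution order.
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            # Plain public names only: skips _private subs, nested
            # "Pkg::" stashes and overload entries such as "(+".
            next unless $name =~ /^[A-Za-z]\w*$/;
            next unless defined &{"${pkg}::$name"};
            # The nearest package wins, just as in a real method call.
            $methods{$name} //= $pkg;
        }
    }
    return \%methods;
}

if (@ARGV) {
    my $class = shift @ARGV;
    eval "require $class; 1" or die "Cannot load $class: $@";
    my $methods = list_methods($class);
    print join('::', $methods->{$_}, $_), "\n" for sort keys %$methods;
}
```

Each output line names the package that actually supplies the method, which helps show which BioPerl parent class an inherited method comes from.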

There is also a project called the Deobfuscator, developed during the 2005 Bioinformatics course at Cold Spring Harbor Laboratory. The Deobfuscator displays the methods available for an object type and provides links to the return types of those methods. An older version is also available.

Can you explain the Object Model design and rationale?

There is no simple answer to this question. In short, this is a toolkit that has grown organically, and its goals and user audience have evolved. Some decisions were made early on, and we have been forced to live with them rather than break backward compatibility. In addition, there are different philosophies of software development. The major developers on the project have tried to impose a set of standards on the code so that the project can be coordinated without every commit being cleared by a few key individuals (see Eric S. Raymond's essay "The Cathedral and the Bazaar" for different styles of running an open source project; we are clearly at the Bazaar end). Advanced BioPerl says more about specific design goals.

The clear consensus of the project developers is that BioPerl should be consistent. We may pay for this with some copy-and-paste code; the explicit Get/Set accessor methods, and the decision not to use AUTOLOAD, are a sore spot for some. By being consistent we hope that someone can grok the gist of a module from its basic documentation and example code, and find its set of methods in the API documentation. We aim to make the core object design easy to understand. We are far from fully achieving this, since the toolkit has well over 1000 modules in bioperl-live and bioperl-run alone.
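To make the trade-off concrete, here is a sketch of the explicit get/set accessor style described above, written for a hypothetical My::Feature class (not a BioPerl module). Each attribute gets its own near-identical method, which is the copy-and-paste cost; in return every method is visible in the source and the documentation, with no AUTOLOAD dispatch to decode.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Feature;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Route constructor arguments through the accessors.
    $self->name($args{-name})   if exists $args{-name};
    $self->score($args{-score}) if exists $args{-score};
    return $self;
}

# Combined get/set accessor: with an argument it stores the value,
# and it always returns the current value.
sub name {
    my $self = shift;
    $self->{'_name'} = shift if @_;
    return $self->{'_name'};
}

# The same pattern, copied for the next attribute.
sub score {
    my $self = shift;
    $self->{'_score'} = shift if @_;
    return $self->{'_score'};
}

package main;

my $feat = My::Feature->new(-name => 'exon1');
$feat->score(0.95);                               # setter call
printf "%s\t%.2f\n", $feat->name, $feat->score;   # getter calls
```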

That said, we do want to improve things. We want to experiment with newer modules that make Perl more object-oriented, and we have high hopes for some of the promises of Perl 6. To help realize this goal, we encourage developers to experiment with new object models in the bioperl-experimental project.

Some useful mailing-list discussion can be found at http://bioperl.org/pipermail/bioperl-l/2003-December/014406.html. We encourage you to take part in the discussion and to join the development process, either on existing BioPerl code or, if you are particularly interested in making the toolkit more object-oriented, on the bioperl-experimental code.

Sequences

How do I parse a sequence file?

Use the Bio::SeqIO system. This will create Bio::Seq objects for you. For more information see the BioPerl Tutorials, the SeqIO HOWTO, the Bio::SeqIO Wiki page, or the Bio::SeqIO POD documentation (or type perldoc Bio::SeqIO).
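A minimal sketch of the usual read loop, assuming the file name and format (for example genbank, fasta or swiss) are given on the command line. summarize_seqs is a hypothetical helper name; Bio::SeqIO->new, next_seq, display_id and length are the standard BioPerl calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every record in $file and return
# [display_id, length] pairs, one per sequence.
sub summarize_seqs {
    my ($file, $format) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => $format);
    my @summary;
    # next_seq returns one Bio::Seq object per record, undef at end of file
    while (my $seq = $in->next_seq) {
        push @summary, [ $seq->display_id, $seq->length ];
    }
    return @summary;
}

if (@ARGV == 2) {
    my ($file, $format) = @ARGV;    # e.g. input.gb genbank
    printf "%s\t%d\n", @$_ for summarize_seqs($file, $format);
}
```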

I can't get sequences with Bio::DB::GenBank any more, why not?

If you are running an old BioPerl version, it no longer works because NCBI changed the web CGI script that provided this access. You must upgrade to a more recent version such as 1.4.x or 1.5.x.

How can I get NT_ or NM_ or NP_ accessions from NCBI (Reference sequences)?

To retrieve GenBank reference sequences (RefSeqs) by these accession numbers, use Bio::DB::RefSeq rather than Bio::DB::GenBank or Bio::DB::GenPept. This is still an area of active development, because the data providers have not offered a convenient interface for us to query. EBI provides a mirror through its dbfetch system, which Bio::DB::RefSeq uses; however, there are cases where NT_ accession numbers cannot be retrieved.

How can I use Bio::SeqIO to parse sequence data to or from a string?

Use this code to parse sequence records from a string:

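  use strict;
  use warnings;
  use IO::String;
  use Bio::SeqIO;

  my $string = ">seq1\nACGTACGT\n";   # example: your FASTA record(s) as a string
  my $stringfh = IO::String->new($string);
  my $seqio = Bio::SeqIO->new(-fh     => $stringfh,
                              -format => 'fasta');
  while (my $seq = $seqio->next_seq) {
      # process each seq
  }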

And here is how to write to a string:

  use IO::String;
  use Bio::SeqIO;

  my $s = '';
  my $io = IO::String->new(\$s);
  my $seqOut = Bio::SeqIO->new(-format => 'swiss', -fh => $io);
  $seqOut->write_seq($seq1);   # $seq1: a Bio::Seq object you already have
  print $s;                    # $s now holds the record in SwissProt format

How do I use Bio::Index::Fasta and index on different ids?

I'm using Bio::Index::Fasta in order to retrieve sequences from my indexed fasta file but I keep seeing MSG: Did not provide a valid Bio::PrimarySeqI object when I call fetch followed by write_seq() on a Bio::SeqIO handle. Why?

Most likely fetch didn't return a Bio::Seq object. There are a few possible explanations, but the most common is that the id you pass to fetch is not the key under which that sequence was indexed. For example, if the FASTA header is >gi|12366, the record is indexed under gi|12366, so fetching with 12366 finds nothing. To index on a different part of the header, write your own parsing subroutine (get_id below) and register it with the id_parser method before building the index, like this:

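  use Bio::Index::Fasta;

  my $inx = Bio::Index::Fasta->new(-filename   => $indexname,
                                   -write_flag => 1);
  $inx->id_parser(\&get_id);    # register the custom key parser
  $inx->make_index($fastaname);

  # Receives the full FASTA description line, including the ">",
  # and returns the key(s) to index the record under.
  sub get_id {
      my $header = shift;
      return $header =~ /^>gi\|(\d+)/ ? $1 : ();
  }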
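The error message itself comes from passing undef to write_seq. A minimal sketch that checks the result of fetch before writing, assuming an index already built as above; write_indexed_seq is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;
use Bio::SeqIO;

# Hypothetical helper: look up $id in an existing index and write the
# record as FASTA to $out_fh. Returns 1 on success, 0 if the key is absent.
sub write_indexed_seq {
    my ($indexname, $id, $out_fh) = @_;
    # Opening without -write_flag gives read-only access to the index.
    my $inx = Bio::Index::Fasta->new(-filename => $indexname);
    my $seq = $inx->fetch($id);    # undef when $id is not an index key
    unless (defined $seq) {
        warn "No record under key '$id' in $indexname\n";
        return 0;
    }
    my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta');
    $out->write_seq($seq);
    return 1;
}

if (@ARGV == 2) {
    my ($indexname, $id) = @ARGV;
    write_indexed_seq($indexname, $id, \*STDOUT) or exit 1;
}
```

Checking defined $seq turns the confusing Bio::PrimarySeqI exception into a plain message naming the key that was not found.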

The same issue arises when you use Bio::DB::Fasta, but in that case the code might look like this:

  use Bio::DB::Fasta;
  my $db = Bio::DB::Fasta->new($fastaname, -makeid => \&get_id);
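A matching sketch for Bio::DB::Fasta, assuming gi-style headers. Here get_id adds a hypothetical fallback to the first word of the header so records without a gi number still get a key, and it tolerates a header with or without the leading ">"; seq_for_gi is likewise a hypothetical helper. Passing -reindex => 1 forces the index to be rebuilt, because an index created earlier without -makeid would otherwise typically be reused with its old keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Maps a FASTA description line to the key the record is indexed under.
sub get_id {
    my $header = shift;
    return $1 if $header =~ /^>?gi\|(\d+)/;    # index on the gi number
    return $1 if $header =~ /^>?\s*(\S+)/;     # fallback: first word
    return;
}

# Hypothetical helper: return the sequence string stored under key $gi.
sub seq_for_gi {
    my ($fastaname, $gi) = @_;
    my $db = Bio::DB::Fasta->new($fastaname,
                                 -makeid  => \&get_id,
                                 -reindex => 1);
    return $db->seq($gi);    # expected to be undef when no record has that key
}

if (@ARGV == 2) {
    my ($fastaname, $gi) = @ARGV;
    my $seq = seq_for_gi($fastaname, $gi);
    print defined $seq ? "$seq\n" : "No record under key '$gi'\n";
}
```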

(Continued on next part...)
