BioPerl FAQ (Frequently Asked Questions)
(Continued from previous part...)
How do I retrieve a nucleotide coding sequence when I have a protein gi number?
You can go through the protein's feature table and find the coded_by value of its CDS feature. The trick is to associate the
coded_by nucleotide coordinates with the nucleotide entry, which you retrieve using the accession number from the same value.
    use Bio::Factory::FTLocationFactory;
    use Bio::SeqFeature::Generic;
    use Bio::DB::GenPept;
    use Bio::DB::GenBank;

    # assume $protein_gi already holds the protein's gi number
    my $gp = Bio::DB::GenPept->new;
    my $gb = Bio::DB::GenBank->new;
    # factory to turn strings into Bio::Location objects
    my $loc_factory = Bio::Factory::FTLocationFactory->new;
    my $prot_obj = $gp->get_Seq_by_id($protein_gi);
    foreach my $feat ( $prot_obj->top_SeqFeatures ) {
        if ( $feat->primary_tag eq 'CDS' && $feat->has_tag('coded_by') ) {
            # example: 'coded_by="U05729.1:1..122"'
            my @coded_by = $feat->each_tag_value('coded_by');
            # split on the first colon only: accession, then location string
            my ($nuc_acc, $loc_str) = split /:/, $coded_by[0], 2;
            my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
            # create Bio::Location object from a string
            my $loc_object = $loc_factory->from_string($loc_str);
            # create a Feature object by using a Location
            my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
            # associate the Feature object with the nucleotide Seq object
            $nuc_obj->add_SeqFeature($feat_obj);
            my $cds_obj = $feat_obj->spliced_seq;
            print "CDS sequence is ", $cds_obj->seq, "\n";
        }
    }
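The simple split on a colon assumes a coded_by value of the form 'U05729.1:1..122'. Values can also wrap the accession in an operator, for example 'complement(U05729.1:1..122)'. The following is a minimal sketch in plain Perl of one way to separate the accession from the location in both cases. The helper name parse_coded_by is hypothetical and not part of BioPerl, and the sketch assumes that a single nucleotide entry is referenced.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a coded_by value into
# the nucleotide accession and a location string without the accession.
sub parse_coded_by {
    my ($coded_by) = @_;
    # an accession (with optional .version) directly followed by a colon
    my $acc_re = qr/([A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):/;
    my %accs = map { $_ => 1 } $coded_by =~ /$acc_re/g;
    # give up unless exactly one nucleotide entry is referenced
    return unless keys(%accs) == 1;
    # remove every "accession:" prefix, keeping operators and coordinates
    (my $loc_str = $coded_by) =~ s/$acc_re//g;
    return ( (keys %accs)[0], $loc_str );
}

for my $value ('U05729.1:1..122', 'complement(U05729.1:1..122)') {
    my ($acc, $loc_str) = parse_coded_by($value);
    print "$value => accession $acc, location $loc_str\n";
}
```

The returned location string is intended to be passed to from_string in place of $loc_str. A value that names several different accessions makes the helper return an empty list, so the caller should check for that case.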
How do I get the complete spliced nucleotide sequence from the CDS section?
You can use the spliced_seq method. For example:
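    # assume $db is a database object such as Bio::DB::GenBank
    # and $gi holds the gi number of the nucleotide entry
    my $seq_obj = $db->get_Seq_by_id($gi);
    foreach my $feat ( $seq_obj->top_SeqFeatures ) {
        if ( $feat->primary_tag eq 'CDS' ) {
            my $cds_obj = $feat->spliced_seq;
            print "CDS sequence is ", $cds_obj->seq, "\n";
        }
    }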
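To see what spliced_seq does without contacting a database, here is a minimal sketch that builds a toy sequence in memory. It assumes BioPerl is installed. The sequence, its id, and the 'join(1..6,13..18)' location are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Factory::FTLocationFactory;

# toy 18-base sequence, invented for this example
my $seq_obj = Bio::Seq->new(-id       => 'toy',
                            -seq      => 'ATGAAACCCGGGTTTTAG',
                            -alphabet => 'dna');

# a two-exon CDS: bases 1..6 and 13..18
my $loc = Bio::Factory::FTLocationFactory->new->from_string('join(1..6,13..18)');
my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS',
                                        -location    => $loc);

# the feature must be attached to a sequence before it can be spliced
$seq_obj->add_SeqFeature($cds);

my $spliced = $cds->spliced_seq->seq;
print "CDS sequence is $spliced\n";
```

The result should be the two exons joined in order, with the bases between them (7..12) left out.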
How do I get the reverse-complement of a sequence using the subseq method?
One way is to pass the location to subseq in the form of a Bio::LocationI object. This object holds strand information as
well as coordinates.
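    use Bio::Location::Simple;

    # assume $start and $end are 1-based coordinates on the sequence
    my $location = Bio::Location::Simple->new(-start  => $start,
                                              -end    => $end,
                                              -strand => -1);
    # assume we already have a sequence object;
    # subseq returns the subsequence as a plain string
    my $rev_comp_substr = $seq_obj->subseq($location);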
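As a cross-check on the result, the same operation can be sketched in plain Perl for an unambiguous DNA string. The function name revcomp_subseq is hypothetical. It uses 1-based, inclusive coordinates to match the BioPerl convention and does not handle IUPAC ambiguity codes.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement of the 1-based, inclusive
# range $start..$end of a plain DNA string (A, C, G, T only).
sub revcomp_subseq {
    my ($seq, $start, $end) = @_;
    my $sub = substr $seq, $start - 1, $end - $start + 1;
    my $rc  = scalar reverse $sub;      # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;       # complement each base
    return $rc;
}

print revcomp_subseq('AAACCGTTT', 4, 6), "\n";
```

Here bases 4..6 are 'CCG', so the value printed is the reverse-complement of that triplet.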
I get the warning (old style Annotation) on new style Annotation::Collection. What is wrong?
Wow, you're using an old version! You see this warning because the modules and their interface changed starting with BioPerl
1.0.
Before v1.0 there was a Bio::Annotation module with add_Comment, add_Reference,
each_Comment, and each_Reference methods.
After v1.0 there is a Bio::Annotation::Collection module with add_Annotation('comment', $ann) and
get_Annotations('comment').
Please update your code in order to avoid seeing these warning messages. In the future the Reference objects will likely be
implemented by the Bio::Biblio system but we hope to maintain a compatible API for these.
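A minimal sketch of the newer style, assuming BioPerl 1.0 or later is installed. The comment text is made up for illustration.

```perl
use strict;
use warnings;
use Bio::Annotation::Collection;
use Bio::Annotation::Comment;

my $collection = Bio::Annotation::Collection->new;

# new style: wrap the text in an annotation object and file it under a key
my $comment = Bio::Annotation::Comment->new(-text => 'Checked by hand');
$collection->add_Annotation('comment', $comment);

# retrieve every annotation stored under the 'comment' key
my @comments = $collection->get_Annotations('comment');
foreach my $ann (@comments) {
    print "Comment: ", $ann->text, "\n";
}
```

The same pattern should apply to the other annotation types, each stored under its own key.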
Utilities
How do I find all the ORFs in a nucleotide sequence? Antigenic sites in a protein? Calculate nucleotide melting temperature? Find
repeats?
None of these functions is built into BioPerl, but all of them, and many others, are available in the EMBOSS package. The
BioPerl developers created a simple interface to EMBOSS so that any EMBOSS program can be run from within BioPerl. See
Bio::Factory::EMBOSS, which is in the bioperl-run package, for more information.
If you can't find the functionality you want in BioPerl, look for it in EMBOSS; the two packages integrate quite
gracefully. Of course, you will have to install EMBOSS to get this functionality.
In addition, BioPerl releases after version 1.0.1 contain the Pise/BioPerl modules. The Pise package was designed to provide a uniform
interface to bioinformatics applications, and currently provides wrappers for more than 250 such applications. The wrapped
applications include HMMER, PHYLIP, BLAST, GENSCAN, and the EMBOSS suite. Using the Pise/BioPerl modules does not require a local
installation of Pise, because the applications are run remotely over HTTP. Also see the BioMOBY project for information on running
applications remotely.
How do I do motif searches with BioPerl? Can I do "find all sequences that are 75% identical" to a given motif?
There are a number of approaches. Within BioPerl, take a look at Bio::Tools::SeqPattern. Or take a look at the TFBS package, a
BioPerl-compliant package that specializes in pattern searching of nucleotide sequences using matrices.
The combination of BioPerl and Perl's regular expressions could also do the trick. You might also consider the CPAN module
String::Approx, which addresses the percent-match query. However, experienced users question whether its distance estimates are
correct, and the Unix agrep command is thought to be faster and more accurate. Finally, you could use EMBOSS, as discussed in
the previous question (or you could use Pise to run EMBOSS applications). The relevant programs would be fuzzpro or
fuzznuc.
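For the simplest form of the "75% identical" question, ungapped matches of the same length as the motif, a sliding window in plain Perl may be enough. The sketch below is not taken from any of the packages named above. The function name find_approx_matches is hypothetical, and it counts identical positions only, with no insertions or deletions.

```perl
use strict;
use warnings;

# Hypothetical helper: report every window of the sequence whose fraction
# of identical positions against the motif is at least $min_identity.
# Returns a list of [1-based start, matched text, identity] triples.
sub find_approx_matches {
    my ($seq, $motif, $min_identity) = @_;
    ($seq, $motif) = (uc $seq, uc $motif);
    my $len = length $motif;
    return unless $len;
    my @hits;
    for my $i (0 .. length($seq) - $len) {
        my $window = substr $seq, $i, $len;
        # XOR of two equal-length strings gives "\0" wherever they agree
        my $same     = ($window ^ $motif) =~ tr/\0//;
        my $identity = $same / $len;
        push @hits, [ $i + 1, $window, $identity ]
            if $identity >= $min_identity;
    }
    return @hits;
}

for my $hit ( find_approx_matches('GGATGCATGG', 'ATGC', 0.75) ) {
    printf "position %d: %s (%.0f%% identical)\n",
        $hit->[0], $hit->[1], 100 * $hit->[2];
}
```

With a sequence object, the string to scan would be $seq_obj->seq. Matches that need gaps or ambiguity codes are better served by the tools mentioned above.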
Can I query MEDLINE or other bibliographic repositories using BioPerl?
Yes! The solution lies in Bio::Biblio*, a set of modules that provide access to MEDLINE and OpenBQS-compliant servers using SOAP. See
Bio::Biblio, Bio::DB::BiblioI, scripts/biblio.PLS, or examples/biblio/* for details and example code.
(Continued on next part...)