BioPerl FAQ (Frequently Asked Questions)

How do I retrieve a nucleotide coding sequence when I have a protein gi number?

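One approach is to walk the protein's feature table and read the coded_by value of its CDS feature. That value holds both the accession number of the nucleotide entry and the coordinates of the coding region, for example U05729.1:1..122. The trick is to apply those coordinates to the nucleotide entry, which you retrieve using the accession number from the same value.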

use strict;
use warnings;
use Bio::Factory::FTLocationFactory;
use Bio::SeqFeature::Generic;
use Bio::DB::GenPept;
use Bio::DB::GenBank;

# protein gi number, taken here from the command line
my $protein_gi = shift @ARGV or die "Usage: $0 protein_gi\n";

my $gp = Bio::DB::GenPept->new;
my $gb = Bio::DB::GenBank->new;
# factory to turn strings into Bio::Location objects
my $loc_factory = Bio::Factory::FTLocationFactory->new;

my $prot_obj = $gp->get_Seq_by_id($protein_gi)
   or die "Could not retrieve protein $protein_gi\n";
foreach my $feat ( $prot_obj->get_SeqFeatures ) {
   # get_tag_values throws if the tag is absent, so check for it first
   if ( $feat->primary_tag eq 'CDS' && $feat->has_tag('coded_by') ) {
      # example: 'coded_by="U05729.1:1..122"'
      my @coded_by = $feat->get_tag_values('coded_by');
      # split on the first colon only: accession, then location string
      my ($nuc_acc,$loc_str) = split /:/, $coded_by[0], 2;
      my $nuc_obj = $gb->get_Seq_by_acc($nuc_acc);
      # create Bio::Location object from a string
      my $loc_object = $loc_factory->from_string($loc_str);
      # create a Feature object by using a Location
      my $feat_obj = Bio::SeqFeature::Generic->new(-location => $loc_object);
      # associate the Feature object with the nucleotide Seq object
      $nuc_obj->add_SeqFeature($feat_obj);
      # extract the (possibly spliced) coding sequence as a Seq object
      my $cds_obj = $feat_obj->spliced_seq;
      print "CDS sequence is ",$cds_obj->seq,"\n";
   }
}
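
The split on the colon in the code above assumes the simplest form of coded_by, a single accession followed by one range. Feature tables can also wrap the value in operators, as in complement(U05729.1:1..122) or join(U05729.1:1..50,U05729.1:80..122), where the accession sits inside the parentheses and may repeat. The following is a minimal sketch of one way to handle those forms in plain Perl. parse_coded_by is a hypothetical helper, not a BioPerl function, and the join and complement values are made-up variations on the example above. It assumes every segment refers to the same nucleotide entry and gives up otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a coded_by value into
# the nucleotide accession and a location string with the accession
# prefixes removed.
sub parse_coded_by {
    my ($coded_by) = @_;

    # Collect each distinct "ACCESSION:" prefix, e.g. "U05729.1:".
    my %seen;
    my @accs = grep { !$seen{$_}++ }
               $coded_by =~ /([A-Za-z0-9_]+(?:\.\d+)?):/g;

    die "No accession found in '$coded_by'\n" unless @accs;
    # Segments on several entries would need one fetch per accession,
    # which this sketch does not attempt.
    die "coded_by spans several entries: @accs\n" if @accs > 1;

    # Strip the prefixes but keep operators such as join() and complement().
    my $acc = $accs[0];
    (my $loc_str = $coded_by) =~ s/\Q$acc\E://g;

    return ($acc, $loc_str);
}

# Illustrative values only; the last two are invented for this example.
for my $value ( 'U05729.1:1..122',
                'complement(U05729.1:1..122)',
                'join(U05729.1:1..50,U05729.1:80..122)' ) {
    my ($acc, $loc_str) = parse_coded_by($value);
    print "$acc\t$loc_str\n";
}
```

The returned location string can then be passed to $loc_factory->from_string in place of the result of the plain split, and the accession to $gb->get_Seq_by_acc.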
