BioPerl FAQ (Frequently Asked Questions)

Does Bio::SearchIO parse the HTML output that BLAST creates using the -T option?

Yes, with a twist. You can override the _readline() method that Bio::SearchIO::blast inherits, so that each line of HTML is stripped of its tags with the HTML::Strip module before the parser sees it.

Please note: We do not suggest parsing BLAST HTML output if it can be avoided. We actively support parsing only the XML, tabular, and text output of NCBI BLAST reports. We have never supported parsing NCBI BLAST HTML output directly through BioPerl, and we will not attempt to fix problems where parsing the tag-stripped HTML breaks but parsing the text output works. Consider this fair warning.

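use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::blast;  # load the parser up front so its parent classes are known
use HTML::Strip;

# Assumption: the path to the HTML report is given on the command line
my $file = shift @ARGV or die "Usage: $0 blast_report.html\n";
my $hs = HTML::Strip->new();

# replace the blast parser's _readline method with one that
# auto-strips HTML; the enclosing block keeps the package change local
{
    package Bio::SearchIO::blast;

    sub _readline {
        my ($self, @args) = @_;
        # SUPER is resolved relative to Bio::SearchIO::blast
        my $line = $self->SUPER::_readline(@args);
        return unless defined $line;
        return $hs->parse($line);
    }
}

# now parse using the BLAST format module
my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);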
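
If the report can be regenerated, the supported route mentioned in the note is usually simpler: have BLAST write text, tabular, or XML output and hand that file to Bio::SearchIO unmodified. The following is a minimal sketch of that approach, not an official recipe. The helper name summarize_report and the choice to take the file path from the command line are assumptions made for illustration. The format names 'blast', 'blasttable', and 'blastxml' select the text, tabular, and XML parsers respectively.

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: returns one tab-separated row per HSP
# (query name, hit name, E-value) from a supported report format.
sub summarize_report {
    my ($file, $format) = @_;
    my $in = Bio::SearchIO->new(-format => $format, -file => $file);
    my @rows;
    while (my $result = $in->next_result) {      # one result per query
        while (my $hit = $result->next_hit) {    # one hit per database sequence
            while (my $hsp = $hit->next_hsp) {   # one HSP per aligned segment
                push @rows, join("\t",
                    $result->query_name, $hit->name, $hsp->evalue);
            }
        }
    }
    return @rows;
}

# Assumption: the first command-line argument is a plain-text BLAST report
if (my $file = shift @ARGV) {
    print "$_\n" for summarize_report($file, 'blast');
}
```

The same loop can be applied to the $in object created by the HTML-stripping example, with the caveat from the note that failures specific to stripped HTML are not supported.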
