How To Extract A Specific Row From A Html Table Using HTML::TreeBuilder
I am very new to Perl programming and am badly stuck. I have to parse an HTML file containing a single table, and I have to extract a row from it whose one c
Solution 1:
Use HTML::TableExtract to process tables in an HTML document. It's an excellent tool.
A very basic example:

use warnings;
use strict;
use feature 'say';
use List::MoreUtils qw(none);
use HTML::TableExtract;

my $file = shift @ARGV;
die "Usage: $0 html-file\n" if not $file or not -f $file;

my $html = do {  # read the whole file into $html string
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};

my $te = HTML::TableExtract->new;
$te->parse($html);

# Print all tables in this html page
foreach my $ts ($te->tables) {
    say "Table (", join(',', $ts->coords), "):";
    foreach my $row ($ts->rows) {
        say "\t", join ',', grep { defined } @$row;
    }
}

# Assume that the table of interest is the second one
# (use index 0 if the page contains a single table)
my $table = ($te->tables)[1];
die "No such table in $file\n" if not defined $table;

foreach my $row ($table->rows) {
    # Select the row you need; for example, identify distinct text in a cell
    next if none { defined and /Maximum_Capacity/ } @$row;
    say "\t", join ',', grep { defined } @$row;
}
The module provides many ways to set up parsing preferences, specify tables, retrieve elements, use headers, etc. Please see documentation and search this site for related posts.
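As one illustration of the header-based setup mentioned above, here is a minimal sketch that selects a table by its column headers and then picks the row by the text of one cell. The sample HTML, the column names (Name, Value) and the helper find_rows are made up for the example; only the headers option, parse, first_table_found and rows come from the module.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TableExtract;

# Hypothetical sample document; column names and values are invented
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Value</th><th>Unit</th></tr>
  <tr><td>Minimum_Capacity</td><td>10</td><td>GB</td></tr>
  <tr><td>Maximum_Capacity</td><td>500</td><td>GB</td></tr>
</table>
HTML

# Hypothetical helper: return the rows of the first table that has the
# given headers and whose first selected column equals $wanted
sub find_rows {
    my ($html, $wanted) = @_;

    # Only tables with these headers are extracted, and each row
    # contains just these columns, in this order
    my $te = HTML::TableExtract->new(headers => ['Name', 'Value']);
    $te->parse($html);

    my $table = $te->first_table_found or return;
    return grep { defined $_->[0] and $_->[0] eq $wanted } $table->rows;
}

for my $row (find_rows($html, 'Maximum_Capacity')) {
    say join ',', grep { defined } @$row;
}
```

Selecting by headers avoids depending on the position of the table in the page, which helps when the layout may change.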
I used none from List::MoreUtils to test whether no elements of a list satisfy a condition.
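If installing List::MoreUtils is not an option, the core module List::Util also exports none (from version 1.33 on). The sketch below shows the same row test in isolation; the data and the helper matching_rows are hypothetical, shaped like what $table->rows returns.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util 1.33 qw(none);

# Hypothetical rows; undef stands for an empty or missing cell
my @rows = (
    ['Minimum_Capacity', '10',  undef],
    ['Maximum_Capacity', '500', 'GB'],
);

# Hypothetical helper: keep the rows in which at least one defined
# cell matches $pattern
sub matching_rows {
    my ($pattern, @rows) = @_;
    my @found;
    for my $row (@rows) {
        # Skip the row if none of its cells is defined and matches
        next if none { defined and /$pattern/ } @$row;
        push @found, $row;
    }
    return @found;
}

for my $row (matching_rows(qr/Maximum_Capacity/, @rows)) {
    say join ',', grep { defined } @$row;
}
```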
Also see this post and this post, with different processing details, and search for more.
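Since the question title asks about HTML::TreeBuilder, here is a rough sketch of the same row selection with that module instead. The sample HTML and the helper rows_with_cell are invented for the example, and the approach assumes the table contains no nested tables, since look_down would also return the cells of inner tables.

```perl
use strict;
use warnings;
use feature 'say';
use HTML::TreeBuilder;

# Hypothetical sample document; values are invented
my $html = <<'HTML';
<table>
  <tr><th>Name</th><th>Value</th><th>Unit</th></tr>
  <tr><td>Minimum_Capacity</td><td>10</td><td>GB</td></tr>
  <tr><td>Maximum_Capacity</td><td>500</td><td>GB</td></tr>
</table>
HTML

# Hypothetical helper: return, as array refs of cell text, the rows
# in which some cell matches $pattern
sub rows_with_cell {
    my ($html, $pattern) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my @found;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        # Collect the text of every td/th element under this row
        my @cells = map { $_->as_trimmed_text }
                    $tr->look_down(_tag => qr/^t[dh]$/);
        push @found, \@cells if grep { /$pattern/ } @cells;
    }

    $tree->delete;  # release the tree explicitly
    return @found;
}

for my $row (rows_with_cell($html, qr/Maximum_Capacity/)) {
    say join ',', @$row;
}
```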