
How To Extract A Specific Row From An HTML Table Using HTML::TreeBuilder

I am very new to Perl programming and am now badly stuck. I have to parse an HTML file containing a single table and extract a row from it whose one c…

Solution 1:

Use HTML::TableExtract to process tables in an HTML document. It's an excellent tool.

A very basic example

use warnings;
use strict;
use feature 'say';

use List::MoreUtils qw(none);
use HTML::TableExtract;

my $file = shift @ARGV;
die "Usage: $0 html-file\n" if not $file or not -f $file;

my $html = do {  # read the whole file into $html string
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};

my $te = HTML::TableExtract->new;
$te->parse($html);

# Print all tables in this html page
foreach my $ts ($te->tables) {
    say "Table (", join(',', $ts->coords), "):";
    foreach my $row ($ts->rows) {
        say "\t", join ',', grep { defined } @$row;
    }
}

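# The question's file has a single table, so take the first one (index 0);
# change the index if the table of interest is elsewhere on the page
my $table = ($te->tables)[0] or die "No table found in $file\n";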
foreach my $row ($table->rows) {
    # Select the row you need; for example, identify distinct text in a cell
    next if none { defined and /Maximum_Capacity/ } @$row;
    say "\t", join ',', grep { defined } @$row;
}

The module provides many ways to set parsing preferences, specify tables, retrieve elements, use headers, and so on. Please see the documentation and search this site for related posts.
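To illustrate one of those preferences, here is a minimal sketch that selects the table by its column headers rather than by its position, using the `headers` option and the `first_table_found` method. The sample HTML, the header names `Parameter` and `Value`, and the helper `find_row` are hypothetical; adapt them to your file.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample document; replace with the contents of your file.
my $html = <<'HTML';
<table>
  <tr><th>Parameter</th><th>Value</th></tr>
  <tr><td>Minimum_Capacity</td><td>10</td></tr>
  <tr><td>Maximum_Capacity</td><td>250</td></tr>
</table>
HTML

# Hypothetical helper: return the cells of the first row whose first
# column matches $pattern, or an empty list if no row matches.
sub find_row {
    my ($html, $pattern) = @_;
    # 'headers' restricts extraction to tables that contain these header
    # cells, and returns only those columns, in this order.
    my $te = HTML::TableExtract->new(headers => ['Parameter', 'Value']);
    $te->parse($html);
    my $table = $te->first_table_found or return;
    for my $row ($table->rows) {
        return @$row if defined $row->[0] and $row->[0] =~ $pattern;
    }
    return;
}

my @cells = find_row($html, qr/Maximum_Capacity/);
print join(',', @cells), "\n";
```

With `headers`, the returned rows hold only the matched columns, and the header row itself is left out unless `keep_headers` is set.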

I used `none` from List::MoreUtils to test whether no element of a list satisfies a condition.
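If you would rather avoid the List::MoreUtils dependency, the core module List::Util has also exported `none` since version 1.33. The minimal sketch below uses hypothetical rows, shaped like those HTML::TableExtract returns, to show which rows the filter in the loop above keeps.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(none);   # core module; 'none' needs List::Util 1.33+

# Hypothetical rows; undef stands for an empty cell.
my @rows = (
    ['Minimum_Capacity', '10',  undef],
    ['Maximum_Capacity', '250', 'units'],
);

# A row is kept unless none of its defined cells mention Maximum_Capacity,
# i.e. it is kept when at least one cell matches.
my @wanted = grep {
    my $row = $_;
    not none { defined and /Maximum_Capacity/ } @$row;
} @rows;

print scalar(@wanted), "\n";
print join(',', grep { defined } @{ $wanted[0] }), "\n";
```

`not none { ... }` is equivalent to `any { ... }`, which may read more directly when you only want the matching rows.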

Also see this post and this post, with different processing details, and search for more.
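Since the question asks about HTML::TreeBuilder specifically, here is a hedged sketch of the same row selection with that module, using `look_down` to find `<tr>` elements and `as_trimmed_text` to read the cells. The helper `rows_matching` and the inline HTML are hypothetical; for a file on disk, `HTML::TreeBuilder->new_from_file($file)` can replace `new_from_content`.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the trimmed cell texts of every <tr>
# that has at least one cell matching $pattern.
sub rows_matching {
    my ($html, $pattern) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @found;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        # Count both header (<th>) and data (<td>) cells.
        my @cells = map { $_->as_trimmed_text } $tr->look_down(_tag => qr/^t[dh]$/);
        push @found, \@cells if grep { $_ =~ $pattern } @cells;
    }
    $tree->delete;    # explicitly release the parse tree
    return @found;
}

my $html = '<table><tr><td>Minimum_Capacity</td><td>10</td></tr>'
         . '<tr><td>Maximum_Capacity</td><td>250</td></tr></table>';

for my $row (rows_matching($html, qr/Maximum_Capacity/)) {
    print join(',', @$row), "\n";
}
```

Unlike HTML::TableExtract, this walks every `<tr>` in the document, including rows of nested tables, so narrow the search (for example, by first looking down for a specific `<table>`) if the page has more than one table.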

