How to Extract a Specific Row from an HTML Table Using HTML::TreeBuilder

I am very new to Perl programming and am now badly stuck. I have to parse an HTML file containing a single table, and I have to extract a row from it whose one c

Solution 1:

Use HTML::TableExtract to process tables in an HTML document. It's an excellent tool.

A very basic example

use warnings;
use strict;
use feature 'say';

use List::MoreUtils qw(none);
use HTML::TableExtract;

my $file = shift @ARGV;
die "Usage: $0 html-file\n" if not $file or not -f $file;

my $html = do {  # read the whole file into $html string
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};

my $te = HTML::TableExtract->new;
$te->parse($html);

# Print all tables in this html page
foreach my $ts ($te->tables) {
   say "Table (", join(',', $ts->coords), "):";
   foreach my $row ($ts->rows) {
      say "\t", join ',', grep { defined } @$row;
   }
}

# Assume that the table of interest is the second one (index 1);
# if the page has a single table, use index 0 instead
my $table = ($te->tables)[1];
foreach my $row ($table->rows) {
    # Select the row you need; for example, identify distinct text in a cell
    next if none { defined and /Maximum_Capacity/ } @$row;
    say "\t", join ',', grep { defined } @$row;
}

The module provides many ways to set up parsing preferences, specify tables, retrieve elements, use headers, etc. Please see documentation and search this site for related posts.
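As one illustration of using headers, the sketch below restricts extraction to chosen columns through the module's documented headers option. The inline HTML, the column names Name and Value, and the helper find_rows are hypothetical, made up for this example; adapt them to the real table.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample page; a real script would read the HTML from a file.
my $sample_html = <<'HTML';
<table>
  <tr><th>Name</th><th>Value</th><th>Unit</th></tr>
  <tr><td>Minimum_Capacity</td><td>10</td><td>GB</td></tr>
  <tr><td>Maximum_Capacity</td><td>500</td><td>GB</td></tr>
</table>
HTML

# Hypothetical helper: return the rows (array refs) whose first selected
# column equals $wanted. Only tables that contain all requested headers
# are extracted, and each row holds just the columns under those headers.
sub find_rows {
    my ($html, $wanted, @headers) = @_;
    my $te = HTML::TableExtract->new(headers => \@headers);
    $te->parse($html);
    $te->eof;    # tell the underlying parser that input is complete
    my @found;
    for my $ts ($te->tables) {
        for my $row ($ts->rows) {
            next unless defined $row->[0] and $row->[0] eq $wanted;
            push @found, $row;
        }
    }
    return @found;
}

# Print each matching row as comma-separated text.
for my $row (find_rows($sample_html, 'Maximum_Capacity', 'Name', 'Value')) {
    print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
}
```

Selecting by header text avoids depending on the table's position in the page, which can help when the layout changes.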

I used none from List::MoreUtils to test if no elements of a list satisfy a condition.
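To show what that test does in isolation, here is a minimal sketch on hand-written rows. It imports none from List::Util (bundled with Perl, available since List::Util 1.33) instead of List::MoreUtils so that it runs without extra modules; that swap, the sample rows, and the helper name row_lacks are assumptions made for this example.

```perl
use strict;
use warnings;
use List::Util qw(none);   # assumption: core substitute for List::MoreUtils::none

# Hypothetical helper: true if no defined cell in the row matches the pattern.
sub row_lacks {
    my ($row, $re) = @_;
    return none { defined $_ and $_ =~ $re } @$row;
}

# Made-up rows, shaped like what $table->rows returns (undef for empty cells).
my @rows = (
    [ 'Minimum_Capacity', 10,  undef ],
    [ 'Maximum_Capacity', 500, 'GB'  ],
);

for my $row (@rows) {
    next if row_lacks($row, qr/Maximum_Capacity/);   # skip rows without the text
    print join(',', grep { defined } @$row), "\n";
}
```

Checking defined before matching avoids "uninitialized value" warnings on empty cells.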

Also see this post and this post, with different processing details, and search for more.
