
How Do I Extract A Table From An HTML Page As A data.frame Using XML And RCurl In R

I need to extract a table as a data.frame from the following HTML Page: https://www.forbes.com/powerful-brands/list/#tab:rank.html

Solution 1:

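That table is live content filled in by JavaScript after the page loads, so you need a browser driven from R (ideally a headless one). RSelenium should be your first choice. You also need rvest to extract the table from the rendered page source.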

Note: After you navigate to that page, a transition page appears first. You can click the "Continue" button manually, or just wait a few seconds until the list page loads.

Code:

library(rvest)
library(RSelenium)

# Start a Selenium server plus a Chrome session; rsDriver() returns both
remDr <- rsDriver(port = 4445L, browser = "chrome")
myclient <- remDr$client

# Navigate to the page. A transition page appears first: click the
# "Continue" button manually, select and click it with CSS, or just wait.
myclient$navigate("https://www.forbes.com/powerful-brands/list/#tab:rank")
Sys.sleep(10)  # a few seconds for the transition page to pass; adjust as needed

# Scroll down several times, or you will get only the top 10 in the list
replicate(20, myclient$sendKeysToActiveElement(list(key = "page_down")))

# Get the rendered page source as a single string
mypagesource <- unlist(myclient$getPageSource())

# Use rvest to extract the table with id "the_list"
mytable <- read_html(mypagesource) %>% html_node("#the_list") %>% html_table()

# Close the browser and stop the Selenium server when done
myclient$close()
remDr$server$stop()


> str(mytable)
'data.frame':   109 obs. of  8 variables:
 $                    : logi  NA NA NA NA NA NA ...
 $ Rank               : chr  "#1" "#2" "#3" "#4" ...
 $ Brand              : chr  "Apple" "Google" "Microsoft" "Facebook" ...
 $ Brand Value        : chr  "$170 B" "$101.8 B" "$87 B" "$73.5 B" ...
 $ 1-Yr Value Change  : chr  "10%" "23%" "16%" "40%" ...
 $ Brand Revenue      : chr  "$214.2 B" "$80.5 B" "$85.3 B" "$25.6 B" ...
 $ Company Advertising: chr  "$1.8 B" "$3.9 B" "$1.6 B" "$310 M" ...
 $ Industry           : chr  "Technology" "Technology" "Technology" "Technology" ...

Then you can clean the data afterwards, since every column comes back as character (or as an empty logical column).
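As a rough illustration of that cleanup step, here is a minimal sketch that runs on a small stand-in data frame built from the first four rows of the str() output above, so it needs no browser. The helper name to_billions and the cleaned column names are made up for this example, and the sketch assumes money values always look like "$1.8 B" or "$310 M"; rows in the real table with other formats would need extra handling.

```r
# Stand-in for the scraped table: first four rows from the str() output.
mytable <- data.frame(
  Rank = c("#1", "#2", "#3", "#4"),
  Brand = c("Apple", "Google", "Microsoft", "Facebook"),
  `Brand Value` = c("$170 B", "$101.8 B", "$87 B", "$73.5 B"),
  `1-Yr Value Change` = c("10%", "23%", "16%", "40%"),
  `Company Advertising` = c("$1.8 B", "$3.9 B", "$1.6 B", "$310 M"),
  check.names = FALSE,        # keep the original names with spaces
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "$1.8 B" / "$310 M" into a number in billions.
to_billions <- function(x) {
  amount <- as.numeric(gsub("[^0-9.-]", "", x))      # keep digits, dot, minus
  scale <- ifelse(grepl("M\\s*$", x), 1 / 1000, 1)   # millions -> billions
  amount * scale
}

# Build a typed data frame; the new column names are illustrative only.
clean <- data.frame(
  rank = as.integer(sub("^#", "", mytable$Rank)),
  brand = mytable$Brand,
  brand_value_bn = to_billions(mytable[["Brand Value"]]),
  value_change_pct = as.numeric(sub("%$", "", mytable[["1-Yr Value Change"]])),
  advertising_bn = to_billions(mytable[["Company Advertising"]]),
  stringsAsFactors = FALSE
)

str(clean)
```

On the real scraped table you would probably also drop the unnamed all-NA first column, for example with mytable[, names(mytable) != ""], before converting the rest.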

Introductions and tutorials for those packages:
https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-basics.html
https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/
