Webscrape VBA With If Condition

I am trying to import the bullet points from a website into an Excel table (each bullet point is held in an li tag). Yet I am facing an important difficulty as some page I would lik

Solution 1:


I would gather nodeLists via CSS selectors that match the relevant nodes: one nodeList for the generalities and another for the parts. Because the parts repeat, I would determine the number of parts and loop that many times, concatenating the HTML of each part with the HTML of its later repeat. I would then put the combined HTML into a surrogate HTMLDocument variable and build a new nodeList of all the li elements it contains. A helper function returns the text of those nodes as an array, which is then written out to the sheet, one row per combined part.
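The concatenation step can be tried in isolation before pointing it at the real page. The following is only a minimal sketch: the markup is made up for illustration, and `CombinedPartText` and `PairRepeatedPartsDemo` are hypothetical names that do not appear in the routine further down. It assumes a reference to the Microsoft HTML Object Library and that each part appears exactly twice.

```vba
Option Explicit

' Hypothetical helper: joins the li text of part i with the li text of its later repeat.
' parts is the node list returned by querySelectorAll("h3 + ul").
Public Function CombinedPartText(ByVal parts As Object, ByVal i As Long, _
                                 ByVal numberOfParts As Long) As String
    Dim html2 As MSHTML.HTMLDocument: Set html2 = New MSHTML.HTMLDocument
    Dim liNodes As Object, j As Long, result As String

    ' Surrogate document holding the first occurrence and its repeat side by side
    html2.body.innerHTML = parts.Item(i).outerHTML & parts.Item(i + numberOfParts).outerHTML
    Set liNodes = html2.querySelectorAll("li")

    For j = 0 To liNodes.Length - 1
        If j > 0 Then result = result & " | "
        result = result & Trim$(liNodes.Item(j).innerText)
    Next j
    CombinedPartText = result
End Function

Public Sub PairRepeatedPartsDemo()
    Dim html As MSHTML.HTMLDocument: Set html = New MSHTML.HTMLDocument
    Dim parts As Object, numberOfParts As Long, i As Long

    ' Made-up markup: two parts, each listed twice (assumption for illustration only)
    html.body.innerHTML = _
        "<h3>Part 1</h3><ul><li>A1</li></ul>" & _
        "<h3>Part 2</h3><ul><li>B1</li></ul>" & _
        "<h3>Part 1</h3><ul><li>A2</li></ul>" & _
        "<h3>Part 2</h3><ul><li>B2</li></ul>"

    Set parts = html.querySelectorAll("h3 + ul")
    numberOfParts = parts.Length \ 2

    For i = 0 To numberOfParts - 1
        Debug.Print "Part " & (i + 1) & ": " & CombinedPartText(parts, i, numberOfParts)
    Next i
End Sub
```

Running `PairRepeatedPartsDemo` from the Immediate window should print one line per part, each combining the bullet text of both occurrences. The full routine I would use against the page follows.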


Option Explicit

Public Sub WindInfo()
    'VBE> Tools > References:
    '1. Microsoft XML, v6.0
    '2. Microsoft HTML Object Library
    '3. Microsoft Scripting Runtime
    Dim xhr As MSXML2.XMLHTTP60: Set xhr = New MSXML2.XMLHTTP60
    Dim html As MSHTML.HTMLDocument: Set html = New MSHTML.HTMLDocument
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    With xhr
        .Open "GET", "", False 'put the page URL between the quotes
        .send
        html.body.innerHTML = .responseText
    End With

    Dim generalities As Object, arrGen(), partsList As Object
    Dim r As Long

    Set generalities = html.querySelectorAll("#bloc_texte table ~ table li")
    arrGen = GetNodesTextAsArray(generalities)
    Dim parts As Object, numberOfParts As Long
    Set partsList = html.querySelectorAll("h1 ~ h3, ul ~ h3")
    r = 1
    If partsList.Length > 0 Then
        numberOfParts = html.querySelectorAll("h1 ~ h3, ul ~ h3").Length / 2
        Set parts = html.querySelectorAll("h3 + ul")
        Dim i As Long, liNodes As Object, arr()
        Dim html2 As MSHTML.HTMLDocument: Set html2 = New MSHTML.HTMLDocument
        For i = 0 To numberOfParts - 1
            ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
            html2.body.innerHTML = parts.Item(i).outerHTML & parts.Item(i + numberOfParts).outerHTML
            Set liNodes = html2.querySelectorAll("li")
            arr = GetNodesTextAsArray(liNodes)
            ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
            r = r + 1
        Next i
    Else
        Dim alternateNodeList As Object: Set alternateNodeList = html.querySelectorAll("#bloc_texte h1 + ul")
        If alternateNodeList.Length >= 1 Then
            arr = GetNodesTextAsArray(alternateNodeList.Item(1).getElementsByTagName("li"))
        Else
            arr = Array("No", "Data", vbNullString)
        End If
        ws.Cells(r, 1).Resize(1, UBound(arrGen)) = arrGen
        ws.Cells(r, 5).Resize(1, UBound(arr)) = arr
    End If
End Sub

Public Function GetNodesTextAsArray(ByVal nodeList As Object) As Variant()
    Dim i As Long, results()
    'Fall back to a placeholder row when nothing matched
    If nodeList.Length = 0 Then
        GetNodesTextAsArray = Array("No", "Data", vbNullString)
        Exit Function
    End If
    ReDim results(1 To nodeList.Length)

    'Node lists are zero-based; the results array is one-based
    For i = 0 To nodeList.Length - 1
        results(i + 1) = nodeList.Item(i).innerText
    Next i
    GetNodesTextAsArray = results
End Function
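
One detail of the write-out step is worth a note: `Resize(1, UBound(arr))` gives the right width for the one-based array the helper builds, but the `Array("No", "Data", vbNullString)` fallback is zero-based under the default `Option Base`, so its last element is not written. That is harmless here because the last element is empty. If the exact width matters, something along these lines could be used instead. This is a sketch only, and `WriteArrayToRow` and `WriteArrayToRowDemo` are hypothetical names, not part of the code above.

```vba
Option Explicit

' Hypothetical helper: writes a one-dimensional array across a single row,
' starting at (rowNum, colNum), whatever the array's lower bound is.
Public Sub WriteArrayToRow(ByVal ws As Worksheet, ByVal rowNum As Long, _
                           ByVal colNum As Long, ByRef values As Variant)
    Dim itemCount As Long
    itemCount = UBound(values) - LBound(values) + 1
    ws.Cells(rowNum, colNum).Resize(1, itemCount).Value = values
End Sub

Public Sub WriteArrayToRowDemo()
    ' Assumes the workbook has a sheet named "Sheet1", as in the code above
    Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("Sheet1")
    Dim oneBased(1 To 3) As Variant

    oneBased(1) = "a": oneBased(2) = "b": oneBased(3) = "c"
    WriteArrayToRow ws, 1, 1, oneBased                            ' one-based array
    WriteArrayToRow ws, 2, 1, Array("No", "Data", vbNullString)   ' zero-based by default
End Sub
```

In `WindInfo`, the calls of the form `ws.Cells(r, 5).Resize(1, UBound(arr)) = arr` could then be swapped for `WriteArrayToRow ws, r, 5, arr`.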


