
XPath Taking Text With Hyperlinks (Python)

I'm new to XPath (and a relative beginner at Python in general). I'm trying to use it to extract the text of the first paragraph of a Wikipedia page. Take for instance

Solution 1:

The links themselves are element nodes (<a> elements) with their own text, so you need to descend into them to get that text.
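To make the node structure visible, here is a minimal sketch using Python's standard-library xml.etree.ElementTree on a small hand-written paragraph (the markup is hypothetical, not copied from Wikipedia). It shows that the paragraph's own text stops at the first link, that each link's words live inside the <a> node, and that walking all descendants recovers the full sentence:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a Wikipedia paragraph containing two links.
snippet = (
    '<p>Python is a <a href="/wiki/Programming_language">programming language</a>'
    ' created by <a href="/wiki/Guido_van_Rossum">Guido van Rossum</a>.</p>'
)
p = ET.fromstring(snippet)

# The <p> element's own text ends where the first child element begins.
print(repr(p.text))

# Each link's words are stored on the <a> node; the text after it is its tail.
for a in p:
    print(repr(a.text), repr(a.tail))

# Descending through every node yields the complete sentence.
full = ''.join(p.itertext())
print(full)
```

ElementTree is used here only because it ships with Python; lxml elements expose the same text/tail model.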


Solution 2:

Your XPath query matches only the text nodes that are direct children of that node. The text of the embedded links lives in other nodes (the <a> elements) and is therefore excluded.
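A minimal lxml sketch of the problem, again on a hypothetical hand-written paragraph rather than a real Wikipedia page: a /text() step returns only the fragments sitting directly under <p>, so the link words are missing.

```python
from lxml import html

# Hypothetical stand-in for the first paragraph of an article.
doc = html.fromstring(
    '<div><p>Python is a <a href="/wiki/Programming_language">programming language</a>'
    ' created by <a href="/wiki/Guido_van_Rossum">Guido van Rossum</a>.</p></div>'
)

# p/text() only matches text nodes that are direct children of <p>,
# so the text inside the <a> elements is not included.
direct = doc.xpath('//p/text()')
print(direct)
print(''.join(direct))  # the sentence with the link words missing
```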

  1. To descend, use //text() as suggested; this retrieves the text of every descendant node of the node in question, not just its direct children.

  2. Alternatively, select the node in question itself and call lxml's text_content() method on it, which returns its text including the text of all its descendant nodes.
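Both approaches can be sketched with lxml on the same kind of hypothetical paragraph (hand-written markup, not fetched from Wikipedia); each one produces the full sentence, link text included:

```python
from lxml import html

# Hypothetical stand-in for the first paragraph of an article.
doc = html.fromstring(
    '<div><p>Python is a <a href="/wiki/Programming_language">programming language</a>'
    ' created by <a href="/wiki/Guido_van_Rossum">Guido van Rossum</a>.</p></div>'
)

# Option 1: //text() matches text nodes at any depth below <p>, links included.
via_xpath = ''.join(doc.xpath('//p//text()'))

# Option 2: select the <p> element itself and let lxml gather all descendant text.
p = doc.xpath('//p')[0]
via_method = p.text_content()

print(via_xpath)
print(via_method)
```

Option 1 keeps everything inside XPath and gives you the individual fragments if you need them; option 2 is shorter when you only want one flattened string per element.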

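from lxml import html
import requests

page = requests.get('')  # the article URL was elided in the original post
tree = html.fromstring(page.content)
firstp = tree.xpath('/html/body/div[3]/div[3]/div[4]/div/p[1]')

# Option 1: //text() collects the text of every descendant node, links included
text1 = ''.join(tree.xpath('/html/body/div[3]/div[3]/div[4]/div/p[1]//text()'))

# Option 2: call text_content() on the selected <p> element itself
text2 = firstp[0].text_content()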
