Issue
I am trying to use XPath to extract the text that follows the bolded politician's name in the code below. I am able to extract the politician's name and the URL of their profile, but how would I go about pulling the text that follows?
In the code below, I'm trying to extract it using:
desctext = elem.find_element(By.XPATH,".//b/following-sibling::text()")
I've tried a million other things, but to no avail. For example, the website says: "Corey Stapleton (R), former Montana Secretary of State, announced his candidacy on November 11, 2022.[35] Stapleton withdrew from the race on October 13, 2023"
I want to pull the text after "Corey Stapleton". The name is an <a href> link nested inside a <b> tag, and the text I want follows that tag.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
pres_candidates_url = "https://ballotpedia.org/Presidential_candidates,_2024"
driver.get(pres_candidates_url)
elems = driver.find_elements(By.XPATH, "//div[@class='mw-parser-output']//ul//li")
all_members = []
for elem in elems:
    member = {}
    try:
        linktext = elem.find_element(By.XPATH, ".//b//a")
    except:
        continue
    words = linktext.text.split()
    # words = elem.text.split()
    count = 0
    for w in words:  # linktext contains non-names, so skip links where any word is lowercase
        if w[0].islower():
            count += 1
    if count < 1:
        name = linktext.text
        member_url = linktext.get_attribute("href")
        try:
            # This is the line that fails
            desctext = elem.find_element(By.XPATH, ".//b/following-sibling::text()")
        except:
            print("error")
        if "(D)" in desctext:
            party = "Democrat"
        elif "(R)" in desctext:
            party = "Republican"
        else:
            party = desctext
        metadata = {"Party:": party}
        print(name, member_url, metadata)
        member["name"], member["url"], member["metadata"] = name, member_url, metadata
    else:
        continue
    all_members.append(member)
Solution
I don't see any option other than getting the parent element and parsing its text. You can get the parents with:
parents = elem.find_elements(By.XPATH,".//b/a/../..")
This finds every link inside a bold tag and goes up two levels (first to the <b>, then to its parent). You then have to parse the resulting text content yourself, for example by stripping the name off the front.
You can't find it using following-sibling::text() because that text is a bare text node, not a sibling element with a tag of its own, and Selenium's find_element can only return elements.
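As a rough sketch of the parsing step, assuming the parent's text starts with the candidate's name exactly as it appears in the link, something like the following could work. The helper names text_after_name and party_from_description are made up for illustration and are not part of Selenium. The sample string is the one quoted in the question; in the scraper it would instead come from the .text of a parent element found with the XPath above.

```python
def text_after_name(parent_text, name):
    """Return the part of parent_text that follows the first occurrence of name."""
    _, found, rest = parent_text.partition(name)
    # If the name is not present, fall back to the whole (stripped) text.
    return rest.strip() if found else parent_text.strip()


def party_from_description(description):
    """Map a leading "(D)" or "(R)" marker to a party name, else return None."""
    if description.startswith("(D)"):
        return "Democrat"
    if description.startswith("(R)"):
        return "Republican"
    return None


# Sample text from the question (assumption: parent.text looks like this).
sample = ("Corey Stapleton (R), former Montana Secretary of State, "
          "announced his candidacy on November 11, 2022.")
description = text_after_name(sample, "Corey Stapleton")
print(description)
print(party_from_description(description))
```

Inside the loop from the question, this would presumably be called as text_after_name(parent.text, linktext.text). Checking only the start of the description for "(D)" or "(R)" avoids matching a party marker that belongs to someone else mentioned later in the same list item.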
Answered By - Olivier Samson