Thursday, November 23, 2023

[FIXED] Using Selenium and XPATH, how would I extract text that follows a bolded link?

Labels: python, selenium-webdriver

Issue

I am trying to use XPath to extract the text that follows the bolded politician's name in the code below. I am able to extract the politician's name and the URL of their profile, but how would I go about pulling the text that follows?

In the code below, I'm trying to extract it using:

desctext = elem.find_element(By.XPATH,".//b/following-sibling::text()")

I've tried a million other things, but to no avail. For example, the website says: "Corey Stapleton (R), former Montana Secretary of State, announced his candidacy on November 11, 2022.[35] Stapleton withdrew from the race on October 13, 2023"

I want to pull the text after "Corey Stapleton". The link (an a href tag) is nested inside a bold tag, and the text follows it.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
pres_candidates_url = "https://ballotpedia.org/Presidential_candidates,_2024"
driver.get(pres_candidates_url)
   
elems = driver.find_elements(By.XPATH, "//div[@class='mw-parser-output']//ul//li")

all_members = []
for elem in elems:
    member = {}
    try:
        linktext = elem.find_element(By.XPATH,".//b//a")
    except:
        continue
    words = linktext.text.split()
    
    count = 0
    # linktext can contain non-names, so count the words that start with a lowercase letter
    for w in words:
        if w[0].islower(): 
            count +=1 
    if count < 1:
        name = linktext.text
        member_url = linktext.get_attribute("href")
        try:
            desctext = elem.find_element(By.XPATH,".//b/following-sibling::text()")
        except:
            print("error")
        if "(D)" in desctext:
            party = "Democrat"
        elif "(R)" in desctext:
            party = "Republican"
        else:
            party = desctext
        metadata = {"Party:": party}
        print(name, member_url, metadata)
        member["name"], member["url"], member["metadata"] = name, member_url, metadata 
    else:
        continue
    all_members.append(member)

Solution

I don't see any option other than getting the parent elements and parsing their text. You can get the parents by doing:

parents = elem.find_elements(By.XPATH,".//b/a/../..")

This will find all bold anchors/links and go up two levels (so up to <b> and then their parent). You then have to parse their resulting text content.
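The parsing step could look something like the sketch below. It assumes the parent's text contains the candidate's name followed by the description, as in the Corey Stapleton example from the question. The helper names text_after_name and party_from_description are hypothetical and not part of Selenium; the party mapping simply mirrors the logic in the question's code.

```python
def text_after_name(full_text: str, name: str) -> str:
    """Return whatever follows the first occurrence of `name` in `full_text`."""
    _, found, rest = full_text.partition(name)
    # If the name is not present, fall back to the whole text.
    return rest.strip() if found else full_text.strip()


def party_from_description(description: str) -> str:
    """Map the party marker in the description to a party name."""
    if "(D)" in description:
        return "Democrat"
    if "(R)" in description:
        return "Republican"
    # Same fallback as the question's code: keep the raw description.
    return description


# Sample text taken from the question; with Selenium this would be parent.text.
sample = ("Corey Stapleton (R), former Montana Secretary of State, "
          "announced his candidacy on November 11, 2022.")
description = text_after_name(sample, "Corey Stapleton")
print(description)
print(party_from_description(description))
```

Inside the question's loop, and assuming elem is the li that directly contains the bold link, the call would be text_after_name(elem.text, linktext.text).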

You can't find it using following-sibling because that text is not a sibling element with a tag of its own. It is a bare text node, and Selenium's find_element can only return elements.
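Since the text has no tag of its own, another option (not part of the original answer) is to ask the browser for it through JavaScript. The sketch below relies on Selenium's execute_script and the DOM's nextSibling and textContent properties; the names FOLLOWING_TEXT_JS and following_text are hypothetical. It collects the text of every node that comes after the bold element within the same parent, so footnote markers such as [35] would be included.

```python
# JavaScript that walks the siblings after the given element and joins
# their text (text nodes and nested elements alike).
FOLLOWING_TEXT_JS = """
let node = arguments[0].nextSibling;
let parts = [];
while (node) {
    parts.push(node.textContent);
    node = node.nextSibling;
}
return parts.join('');
"""


def following_text(driver, bold_elem) -> str:
    """Return the text that follows `bold_elem` inside its parent."""
    # execute_script passes bold_elem to the script as arguments[0].
    return driver.execute_script(FOLLOWING_TEXT_JS, bold_elem).strip()
```

In the question's loop this could be used by first locating the bold element with elem.find_element(By.XPATH, ".//b") and then calling following_text(driver, bold_elem) on the result.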

Answered By - Olivier Samson

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
