Friday, August 19, 2022

[FIXED] Find "a" element in BS4 by partial class name not working?

August 19, 2022 beautifulsoup, python No comments

Issue

I want to find an a element in a soup object by a substring present in its class name. This particular element will always have JobTitle inside the class name, with random preceding and trailing characters, so I need to locate it by its substring of JobTitle.

You can see the element here:

It's safe to assume there is only 1 a element to find, so using find should work, however my attempts (there have been more than the 2 shown below) have not worked. I've also included the top elements in case it's relevant for location for some reason.

I'm on Windows 10, Python 3.10.5, and BS4 4.11.1.

I've created a reproducible example below (I thought the regex way would have worked, but I guess not):

import re
from bs4 import BeautifulSoup

# Parse this HTML, getting the only a['href'] in it (line 22)
html_to_parse = """
    <li>
        <div class="cardOutline tapItem fs-unmask result job_5ef6bf779263a83c sponsoredJob resultWithShelf sponTapItem desktop vjs-highlight">
            <div class="slider_container css-g7s71f eu4oa1w0">
            <div class="slider_list css-kyg8or eu4oa1w0">
                <div class="slider_item css-kyg8or eu4oa1w0">
                <div class="job_seen_beacon">
                    <div class="fe_logo">
                    <img alt="CyberCoders logo" class="feLogoImg desktop" src="https://d2q79iu7y748jz.cloudfront.net/s/_squarelogo/256x256/f0b43dcaa7850e2110bc8847ebad087b" />
                    </div>
                    <table cellpadding="0" cellspacing="0" class="jobCard_mainContent big6_visualChanges" role="presentation">
                    <tbody>
                        <tr>
                        <td class="resultContent">
                            <div class="css-1xpvg2o e37uo190">
                            <h2 class="jobTitle jobTitle-newJob css-bdjp2m eu4oa1w0" tabindex="-1">
                                <a aria-label="full details of REMOTE Senior Python Developer" class="jcs-JobTitle css-jspxzf eu4oa1w0" data-ci="385558680" data-empn="8690912762161442" data-hide-spinner="true" data-hiring-event="false" data-jk="5ef6bf779263a83c" data-mobtk="1g9u19rmn2ea6000" data-tu="https://jsv3.recruitics.com/partner/a51b8de1-f7bf-11e7-9edd-d951492604d9.gif?client=521&amp;rx_c=&amp;rx_campaign=indeed16&amp;rx_group=110383&amp;rx_source=Indeed&amp;job=KE2-168714218&amp;rx_r=none&amp;rx_ts=20220808T034442Z&amp;rx_pre=1&amp;indeed=sp" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&amp;xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&amp;p=0&amp;fvj=0&amp;vjs=3" id="sj_5ef6bf779263a83c" role="button" target="_blank">
                                <span id="jobTitle-5ef6bf779263a83c" title="REMOTE Senior Python Developer">REMOTE Senior Python Developer</span>
                                </a>
                            </h2>
                            </div>
                        </td>
                        </tr>
                    </tbody>
                    </table>
                </div>
                </div>
            </div>
            </div>
        </div>
    </li>
"""

# Soupify it
soup = BeautifulSoup(html_to_parse, "html.parser")

# Start by making sure "find_all("a")" works
all_links = soup.find_all("a")
print(all_links)
# Good.

# Attempt 1
job_url = soup.find('a[class*="JobTitle"]').a['href']
print(job_url)
# Nope.

# Attempt 2
job_url = soup.find("a", {"class": re.compile("^.*jobTitle.*")}).a['href']
print(job_url)
# Nope...

Solution

To find an element with partial class name you need to use select, not find. The will give you the <a> tag, the href will be in it

job_url = soup.select_one('a[class*="JobTitle"]')['href']
print(job_url)
# /pagead/clk?mo=r&ad=-6NYlbfkN0CpFJQzrgRR8WqXWK1qKKEqALWJw739KlKqr2H-MSI4eoBlI4EFrmor2FYZMP3muM35UEpv7D8dnBwRFuIf8XmtgYykaU5Nl3fSsXZ8xXiGdq3dZVwYJYR2-iS1SqyS7j4jGQ4Clod3n72L285Zn7LuBKMjFoBPi4tB5X2mdRnx-UikeGviwDC-ahkoLgSBwNaEmvShQxaFt_IoqJP6OlMtTd7XlgeNdWJKY9Ph9u8n4tcsN_tCjwIc3RJRtS1O7U0xcsVy5Gi1JBR1W7vmqcg5n4WW1R_JnTwQQ8LVnUF3sDzT4IWevccQb289ocL5T4jSfRi7fZ6z14jrR6bKwoffT6ZMypqw4pXgZ0uvKv2v9m3vJu_e5Qit1D77G1lNCk9jWiUHjWcTSYwhhwNoRzjAwd4kvmzeoMJeUG0gbTDrXFf3V2uJQwjZhTul-nbfNeFPRX6vIb4jgiTn4h3JVq-zw0woq3hTrLq1z9Xpocf5lIGs9U7WJnZM-Mh7QugzLk1yM3prCk7tQYRl3aKrDdTsOdbl5Afs1DkatDI7TgQgFrr5Iauhiv7I9Ss-fzPJvezhlYR4hjkkmSSAKr3Esz06bh5GlZKFONpq1I0IG5aejSdS_kJUhnQ1D4Uj4x7X_mBBN-fjQmL_CdyWM1FzNNK0cZwdLjKL-d8UK1xPx3MS-O-WxVGaMq0rn4lyXgOx7op9EHQ2Qdxy9Dbtg6GNYg5qBv0iDURQqi7_MNiEBD-AaEyqMF3riCBJ4wQiVaMjSTiH_DTyBIsYc0UsjRGG4a949oMHZ8yL4mGg57QUvvn5M_urCwCtQTuyWZBzJhWFmdtcPKCn7LpvKTFGQRUUjsr6mMFTQpA0oCYSO7E-w2Kjj0loPccA9hul3tEwQm1Eh58zHI7lJO77kseFQND7Zm9OMz19oN45mvwlEgHBEj4YcENhG6wdB6M5agUoyyPm8fLCTOejStoecXYnYizm2tGFLfqNnV-XtyDZNV_sQKQ2TQ==&xkcb=SoD0-_M3b-KooEWCyR0LbzkdCdPP&p=0&fvj=0&vjs=3

Answered By - Guy

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Friday, August 19, 2022

[FIXED] Find "a" element in BS4 by partial class name not working?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels