Issue
I am trying to web scrape using Selenium and Beautiful Soup, but I cannot get Selenium to find the element I need and return its text.
Here is the HTML:
<span class="t-14 t-normal">
<span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span class="visually-hidden"><!---->Crédit Agricole CIB · Full-time<!----></span>
</span>
Do you know how to get the text 'Crédit Agricole CIB Full-time' from this html?
I am trying to do something like this:
src = driver.page_source
soup = BeautifulSoup(src, 'lxml') # Now using beautiful soup
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})
text_loc = intro.find( ???? ) # Extracting the text
text = text_loc.get_text().strip() # Removing extra blank space
I do not know what to put in the ????
Solution
I can't confirm this without seeing the full HTML, since there might be other, similarly nested elements before the snippet shared in the question. If there aren't, you can use soup.select_one with the CSS selectors below:
spanTxt1 = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]')
if spanTxt1 is not None: spanTxt1 = spanTxt1.get_text(strip=True)
spanTxt2 = soup.select_one('span.t-14.t-normal span.visually-hidden')
if spanTxt2 is not None: spanTxt2 = spanTxt2.get_text(strip=True)
print(f' Text1: "{spanTxt1}" \n Text2: "{spanTxt2}" ')
This should give the output:
Text1: "Crédit Agricole CIB · Full-time"
Text2: "Crédit Agricole CIB · Full-time"
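To check the selector without Selenium, the snippet from the question can be parsed on its own. This is only a minimal sketch: the html string is copied from the question, html.parser is used instead of lxml purely so that no extra parser is required, and the last step (dropping the "·" separator to match the text asked for in the question) is an optional assumption about the desired result.

```python
from bs4 import BeautifulSoup

# HTML copied from the question; on the real page it would come from driver.page_source
html = '''
<span class="t-14 t-normal">
<span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span class="visually-hidden"><!---->Crédit Agricole CIB · Full-time<!----></span>
</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# Pick only the aria-hidden copy; get_text() skips the empty <!----> comments
visible = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]')
text = visible.get_text(strip=True) if visible is not None else None

# Optional: remove the "·" separator and collapse the leftover whitespace
no_dot = ' '.join(text.replace('·', ' ').split()) if text is not None else None

print(text)
print(no_dot)
```

Calling get_text on the outer span.t-14.t-normal instead would return the text twice, because the visually-hidden span carries the same string.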
EDIT:
I think the ember.. section ids are dynamically generated and might be different every time. A more reliable selector for the jobs listed in the experience section might be:
expSel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'
(It targets the list container that follows the [empty] div id="experience" anchor.)
You can even choose a specific experience from the list by changing the end to li:nth-child(2) for the second experience, li:last-child for the last experience, li:nth-last-child(2) for the second-to-last experience, etc.
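The positional variants can be tried on a small stand-in document. In the sketch below the HTML is hypothetical and heavily simplified (four plain li items named Job A to Job D, not LinkedIn's real markup), and base is just the selector from above with the trailing li removed so each variant can be appended.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the experience section
html = '''
<div id="experience"></div>
<div class="pvs-list__outer-container">
  <ul class="pvs-list">
    <li>Job A</li>
    <li>Job B</li>
    <li>Job C</li>
    <li>Job D</li>
  </ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Same path as the experience selector, without the final "li"
base = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list '

# Each pseudo-class picks one list item by its position among its siblings
second = soup.select_one(base + 'li:nth-child(2)').get_text(strip=True)
last = soup.select_one(base + 'li:last-child').get_text(strip=True)
second_to_last = soup.select_one(base + 'li:nth-last-child(2)').get_text(strip=True)

print(second, last, second_to_last, sep=' | ')
```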
You could directly add on to the selector to get the first company:
c1span = soup.select_one(expSel+' span.t-14.t-normal span')
if c1span is not None:
    print(c1span.get_text(strip=True))
and that should print Crédit Agricole CIB · Full-time
You could also use expSel to get all the listed experiences:
expSelRef = {
    'Position': 'span.mr1.t-bold',
    'Company+Type': 'span.t-14.t-normal',
    'Dates': 'span.t-14.t-normal.t-black--light',
    'Location': 'span.t-14.t-normal.t-black--light + span'
}
for e in soup.select(expSel):
    for r in expSelRef:
        eDet = e.select_one(expSelRef[r]+' span[aria-hidden="true"]')
        if eDet is not None:
            print(f' [ {r}: "{eDet.get_text(strip=True)}" ] ', end='')
    print()
output:
[ Position: "Structured Products & Equity Derivatives Sales" ] [ Company+Type: "Crédit Agricole CIB · Full-time" ] [ Dates: "Jan 2020 - Present · 2 yrs 10 mos" ] [ Location: "Paris, Île-de-France, France" ]
[ Position: "Equity Sales Trader Assistant" ] [ Company+Type: "ODDO BHF · Internship" ] [ Dates: "Jun 2019 - Jan 2020 · 8 mos" ] [ Location: "Paris, Île-de-France, France" ]
[ Position: "Wealth Management Analyst" ] [ Company+Type: "HSBC · Internship" ] [ Dates: "Mar 2018 - Sep 2018 · 7 mos" ] [ Location: "Paris, Île-de-France, France" ]
[ Position: "Business Developper" ] [ Company+Type: "Capgemini · Internship" ] [ Dates: "Jan 2017 - Aug 2017 · 8 mos" ]
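If the details are needed for further processing rather than printing, the same loop can collect them into a list of dictionaries. This is a sketch under assumptions: the HTML below is a hypothetical, simplified stand-in built from the class names used in the selectors and two rows of the output above, not the real page, and exp_sel, exp_sel_ref and rows are illustrative names mirroring expSel and expSelRef.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is far more deeply nested
html = '''
<div id="experience"></div>
<div class="pvs-list__outer-container">
  <ul class="pvs-list">
    <li>
      <span class="mr1 t-bold"><span aria-hidden="true">Wealth Management Analyst</span></span>
      <span class="t-14 t-normal"><span aria-hidden="true">HSBC · Internship</span></span>
      <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Mar 2018 - Sep 2018 · 7 mos</span></span>
      <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Paris, Île-de-France, France</span></span>
    </li>
    <li>
      <span class="mr1 t-bold"><span aria-hidden="true">Business Developper</span></span>
      <span class="t-14 t-normal"><span aria-hidden="true">Capgemini · Internship</span></span>
      <span class="t-14 t-normal t-black--light"><span aria-hidden="true">Jan 2017 - Aug 2017 · 8 mos</span></span>
    </li>
  </ul>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

exp_sel = 'div#experience ~ div.pvs-list__outer-container ul.pvs-list li'
exp_sel_ref = {
    'Position': 'span.mr1.t-bold',
    'Company+Type': 'span.t-14.t-normal',
    'Dates': 'span.t-14.t-normal.t-black--light',
    'Location': 'span.t-14.t-normal.t-black--light + span',
}

rows = []
for item in soup.select(exp_sel):
    row = {}
    for field, selector in exp_sel_ref.items():
        found = item.select_one(selector + ' span[aria-hidden="true"]')
        if found is not None:  # fields missing from an entry are simply left out
            row[field] = found.get_text(strip=True)
    rows.append(row)

for row in rows:
    print(row)
```

Entries without a location, like the last one in the output above, end up without a 'Location' key, so row.get('Location') is the safer way to read it.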
Answered By - Driftr95