Issue
When the page loads, I can find 1 div, 2 divs, 3 divs, or no div at all; the number varies on every load.
An example with 3 divs below:
<div class="SOME_DIV">
<span class="class_title">TITLE-1</span>
<span class="_some_class1">VALUE-1</span>
<span class="_some_class2">VALUE-2</span>
<span class="_some_class3">VALUE-3</span>
</div>
<div class="SOME_DIV">
<span class="class_title">TITLE-2</span>
<span class="_some_class1">VALUE-10</span>
<span class="_some_class2">VALUE-20</span>
<span class="_some_class3">VALUE-30</span>
</div>
<div class="SOME_DIV">
<span class="class_title">TITLE-3</span>
<span class="_some_class1">VALUE-100</span>
<span class="_some_class2">VALUE-200</span>
<span class="_some_class3">VALUE-300</span>
</div>
My Python code
from bs4 import BeautifulSoup as bs
from selenium import webdriver

html = webdriver.Firefox()
html.get('DYNAMIC_URL')
html_source = html.page_source
html_source_bs = bs(html_source, 'html.parser')
all_divs = html_source_bs.find_all('div', class_='SOME_DIV')
# Only the first div (index 0) is read here
span_title = all_divs[0].find('span', class_='class_title')
# "c and ..." guards against spans that have no class attribute (c is None)
span_1 = all_divs[0].find_all('span', class_=lambda c: c and '_some_class1' in c)
span_2 = all_divs[0].find_all('span', class_=lambda c: c and '_some_class2' in c)
span_3 = all_divs[0].find_all('span', class_=lambda c: c and '_some_class3' in c)
title_list = ['Title']
span1_list = ['Span1']
span2_list = ['Span2']
span3_list = ['Span3']
title_list.append(span_title.text.strip())
for l_1 in span_1:
    span1_list.append(l_1.text.strip())
for l_2 in span_2:
    span2_list.append(l_2.text.strip())
for l_3 in span_3:
    span3_list.append(l_3.text.strip())
print(title_list)
print(span1_list)
print(span2_list)
print(span3_list)
Output
['Title', 'TITLE-1']
['Span1', 'VALUE-1']
['Span2', 'VALUE-2']
['Span3', 'VALUE-3']
Expected Output if there are 3 divs
['Title', 'TITLE-1']
['Span1', 'VALUE-1']
['Span2', 'VALUE-2']
['Span3', 'VALUE-3']
['Title', 'TITLE-2']
['Span1', 'VALUE-10']
['Span2', 'VALUE-20']
['Span3', 'VALUE-30']
['Title', 'TITLE-3']
['Span1', 'VALUE-100']
['Span2', 'VALUE-200']
['Span3', 'VALUE-300']
I'm scraping information from a website. When the page loads, it may contain one div with class 'SOME_DIV', two, three, or even more, and sometimes none at all (0).
If there are 3 divs with class 'SOME_DIV' when the webdriver loads the page, I want to get the info from all of them.
At the moment I can only get the data of the first div, using "all_divs[0].find_all". I want to get the data of the other divs too if they exist, but I don't know how many divs will be found until the page loads.
Solution
You could take the length of all_divs and use a for loop, with the corresponding index, to scrape and parse the data of every div. If no div is found, all_divs is empty and the loop body never runs.
See the sample code below:
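all_divs = html_source_bs.find_all('div', class_='SOME_DIV')
span_title = []
span_1 = []
span_2 = []
span_3 = []
for i in range(len(all_divs)):
    # The title is a span, not a div
    span_title.append(all_divs[i].find('span', class_='class_title'))
    # Index with i (not 0) so that each div is read in turn
    span_1.append(all_divs[i].find_all('span', class_=lambda c: c and '_some_class1' in c))
    span_2.append(all_divs[i].find_all('span', class_=lambda c: c and '_some_class2' in c))
    span_3.append(all_divs[i].find_all('span', class_=lambda c: c and '_some_class3' in c))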
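To get the printed lists in the format of the expected output, one option is to build the four lists inside the loop, once per div. The following is only a minimal sketch under some assumptions: SAMPLE_HTML is a hard-coded stand-in for html.page_source (two divs, for brevity), and the names scrape_divs and has_class_part are hypothetical helpers invented for this example, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Assumption: static markup standing in for html.page_source
SAMPLE_HTML = """
<div class="SOME_DIV">
  <span class="class_title">TITLE-1</span>
  <span class="_some_class1">VALUE-1</span>
  <span class="_some_class2">VALUE-2</span>
  <span class="_some_class3">VALUE-3</span>
</div>
<div class="SOME_DIV">
  <span class="class_title">TITLE-2</span>
  <span class="_some_class1">VALUE-10</span>
  <span class="_some_class2">VALUE-20</span>
  <span class="_some_class3">VALUE-30</span>
</div>
"""


def has_class_part(part):
    """Hypothetical helper: class_ filter matching any class that contains `part`."""
    # c is None for tags without a class attribute, hence the guard
    return lambda c: c is not None and part in c


def scrape_divs(html_source):
    """Hypothetical helper: return one group of four lists per SOME_DIV found."""
    soup = BeautifulSoup(html_source, 'html.parser')
    results = []
    # find_all returns an empty list when nothing matches, so with 0 divs
    # the loop is skipped and no IndexError is raised.
    for div in soup.find_all('div', class_='SOME_DIV'):
        title = div.find('span', class_='class_title')
        rows = [['Title'] + ([title.text.strip()] if title else [])]
        for label, part in (('Span1', '_some_class1'),
                            ('Span2', '_some_class2'),
                            ('Span3', '_some_class3')):
            spans = div.find_all('span', class_=has_class_part(part))
            rows.append([label] + [s.text.strip() for s in spans])
        results.append(rows)
    return results


# Print four lists per div, whatever the number of divs
for rows in scrape_divs(SAMPLE_HTML):
    for row in rows:
        print(row)
```

With the real page, scrape_divs(html.page_source) would replace the sample string. Iterating directly over the result of find_all avoids index handling altogether, so the same code should work for 0, 1, or any number of divs.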
Answered By - Sureshmani Kalirajan