Issue
I need to scrape some pages. The exact structure of the part that I want is as follows:
<div class="someclasses">
<h3>...</h3> # Not needed
<ul class="ul-class1 ul-class2">
<li id="li1-id" class="li-class1 li-class2">
<div id ="div1-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href1" data-control-id="id1" data-control-name="name" id ="a1-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text1</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
<li id="li2-id" class="li-class1 li-class2">
<div id ="div2-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href2" data-control-id="id2" data-control-name="name" id ="a2-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text2</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
# More <li> elements
</ul>
</div>
Now what I want is to get the Texts
as well as the hrefs
.I have used the naming in above example exactly realistic i.e the same names are also the same in the real webpage. The code that I am currently using is:
elems = driver.find_elements_by_xpath("//div[@class='someclasses']/ul[@class='ul-class1']/li[@class='li-class1']")
print(len(elems))
for elem in elems:
elem1 = driver.find_element_by_xpath("./a[@data-control-name='name']")
names2.append(elem1.text)
print(elem1.text)
hrefs.append(elem.get_attribute("href"))
The result of the print
statement above is 0
so basically the elements are not found. Can anyone please tell me what am I doing wrong.
Solution
You are using only part of the class name... in XPATH you need the full class name...
FYI: With CSS you can use part of the class name...
If you want to use XPATH try:
elems = driver.find_elements_by_xpath("//div[@class='someclasses']//li//a")
print(len(elems))
for elem in elems:
names2.append(elem.text)
print(elem.text)
new_href = elem.get_attribute("href")
print(new_href)
hrefs.append(new_href)
For CSS use: div.someclasses ul.ul-class1
elems = driver.find_elements_by_css_selector("div.someclasses ul.ul-class1 li a")
for elem in elems:
names2.append(elem.text)
print(elem.text)
new_href = elem.get_attribute("href")
print(new_href)
hrefs.append(new_href)
Answered By - Moshe Slavin
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.