Issue
With the following code:
data = driver.find_elements(By.XPATH, '//div[@class="postInfo desktop"]/span[@class="nameBlock"]')
I got the following HTML:
<span class="nameBlock">
<span class="name">Anonymous</span>
<span class="posteruid id_RDS8pJvL">(ID:
<span class="hand" title="Highlight posts by this ID" style="background-color: rgb(228, 51, 138); color: white;">RDS8pJvL</span>)</span>
<span title="United States" class="flag flag-us"></span>
</span>
And
<span class="nameBlock">
<span class="name">Pierre</span>
<span class="postertrip">!AYZrMZsavE</span>
<span class="posteruid id_y5EgihFc">(ID:
<span class="hand" title="Highlight posts by this ID"
style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)</span>
<span title="Australia" class="flag flag-au"></span>
</span>
Now I need to get the countries: "United States" and "Australia".
For the whole dataset (more than 120k entries), I was doing:
for i in data:
    country = i.find_element(By.XPATH, './/span[contains(@class,"flag")]').get_attribute('title')
But after a while I got empty entries, and I figured out that sometimes the class of the country span changes completely, from "flag something" to "bf something" or "cd something".
This is why I decided to select the country by position within each element (the third span):
for i in data:
    country = i.find_element(By.XPATH, './/span[3]').get_attribute('title')
But after a while I got errors again, because sometimes a <span class="postertrip">BLABLA</span> appears, moving the country to "span[4]".
So I changed it to the following:
for i in data:
    country = i.find_element(By.XPATH, './/span[last()]').get_attribute('title')
But this last one always gives me the second-level child (the child of posteruid):
<span class="hand" title="Highlight posts by this ID"
style="background-color: rgb(136, 179, 155); color: black;">y5EgihFc</span>)
One thing I'm certain of: the country is ALWAYS the last child (span) among the first-level children.
I'm out of ideas, which is why I'm asking this question.
Solution
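Use the following XPath to always identify the last first-level child span of each nameBlock:
//span[@class='nameBlock']/span[last()]
The single slash matters. With /span, only the direct children of nameBlock are considered, so last() picks the final first-level span. With //span (or .//span), last() is evaluated separately for every parent at every depth, so the nested "hand" span also matches, and find_element returns it because it comes first in document order.
Code block:
for country in driver.find_elements(By.XPATH, "//span[@class='nameBlock']/span[last()]"):
    print(country.get_attribute("title"))
If you already hold the nameBlock elements in data, the relative form is i.find_element(By.XPATH, './span[last()]').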
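To check the difference between the direct-child step and the descendant step without launching a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree in place of Selenium. It assumes that ElementTree's limited XPath subset treats these two expressions the same way a browser does for this markup. The SAMPLE string is a trimmed copy of the two nameBlock samples from the question (style attributes dropped), and the wrapping div is hypothetical, added only so the snippet has a single root.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two nameBlock samples from the question.
# The outer <div> is a hypothetical wrapper so the markup has one root.
SAMPLE = """
<div>
  <span class="nameBlock">
    <span class="name">Anonymous</span>
    <span class="posteruid id_RDS8pJvL">(ID:
      <span class="hand" title="Highlight posts by this ID">RDS8pJvL</span>)</span>
    <span title="United States" class="flag flag-us"></span>
  </span>
  <span class="nameBlock">
    <span class="name">Pierre</span>
    <span class="postertrip">!AYZrMZsavE</span>
    <span class="posteruid id_y5EgihFc">(ID:
      <span class="hand" title="Highlight posts by this ID">y5EgihFc</span>)</span>
    <span title="Australia" class="flag flag-au"></span>
  </span>
</div>
"""

root = ET.fromstring(SAMPLE)
blocks = root.findall(".//span[@class='nameBlock']")

# Direct-child step: last() ranks only the first-level spans of each block,
# so the extra postertrip span in the second block does not matter.
countries = [block.find("./span[last()]").get("title") for block in blocks]

# Descendant step: the nested "hand" span is the last span of its own parent
# and comes first in document order, so find() returns it instead.
nested = blocks[1].find(".//span[last()]")

print(countries)
print(nested.get("title"))
```

Run as is, the first print should show the two countries and the second should show the "Highlight posts by this ID" title, which is the same wrong element the question ran into with './/span[last()]'.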
Answered By - KunduK