Issue
I apologize for the weird title, but I did my best to describe my question.
I am scraping a real estate website using bs4. Everything went fine until I reached a building that has several rooms, each with a different price.
If I run:
monthly=soup.find_all('span',{'class':'cassetteitem_other-emphasis ui-text--bold'})
The result is:
[<span class="cassetteitem_other-emphasis ui-text--bold">6.6万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.6万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.4万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.2万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">6.4万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.3万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7万円</span>]
It is hard on the eyes, but the point is that this flat list does not show which prices belong to the one building with several rooms. I would like to tie each building to its rooms, strip the HTML, and keep only the text.
So I tried:
for i in range(len(rent)):
    monthly=rent[i].find_all('span',{'class':'cassetteitem_other-emphasis ui-text--bold'})
    print(monthly)
which gives:
[<span class="cassetteitem_other-emphasis ui-text--bold">6.6万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.6万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.5万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.4万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">7.2万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>, <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.5万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.7万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">6.4万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">7.3万円</span>]
[<span class="cassetteitem_other-emphasis ui-text--bold">7万円</span>]
This is where I am stuck. As you can see, one list is longer than the others because that building has 4 rooms. Iterating over the results to extract the text leaves me with a flat list:
['6.6万円', '6.6万円', '6.7万円', '6.5万円', '6.4万円', '7.2万円', '7.5万円', '7.5万円', '7.5万円', '6万円', '6.7万円', '6万円', '6.5万円', '6.7万円', '6.4万円', '7.3万円', '7万円']
My desired output would be something like:
['6.6万円', '6.6万円', '6.7万円', '6.5万円', '6.4万円', ['7.2万円', '7.5万円', '7.5万円', '7.5万円'], '6万円', '6.7万円', '6万円', '6.5万円', '6.7万円', '6.4万円', '7.3万円', '7万円']
Is it possible to get this result?
- BeautifulSoup gave me a list.
- I cannot call .text on it directly.
- I want to create a list within a list, as shown above.
I really appreciate everyone's help.
** This is the code I used to iterate, get the text, and build the list:
for ii in range(len(monthly)):
    cleared_monthly=monthly[ii].text
    monthly_list.append(cleared_monthly)
print(monthly_list)
Solution
There's no need to access items by index when iterating over a list. I also suggest checking the length of each list in monthly: if it contains a single element, take that element's text directly; otherwise, add an inner loop over the sublist.
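Written as an explicit loop, the length check might look like the sketch below. It works on plain strings so it can run without the site's HTML; the function name group_prices and the sample data are made up for illustration and are not part of the original code.

```python
def group_prices(price_groups):
    """Collapse one-item groups to a bare string; keep longer groups as lists."""
    result = []
    for group in price_groups:  # iterate directly, no index needed
        if len(group) == 1:
            result.append(group[0])      # single room: store the text itself
        else:
            result.append(list(group))   # several rooms: keep them together
    return result


# Hypothetical sample: one inner list per building, text already extracted.
texts = [["6.4万円"], ["7.2万円", "7.5万円", "7.5万円", "7.5万円"], ["6万円"]]
print(group_prices(texts))
```

Note that a building with no matching prices would come through as an empty list in this sketch.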
Using list comprehensions, that becomes:
monthly = [i.find_all('span',{'class':'cassetteitem_other-emphasis ui-text--bold'}) for i in rent]
monthly_list = [l[0].get_text() if len(l)==1 else [i.get_text() for i in l] for l in monthly]
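To try this without hitting the site, here is a minimal self-contained sketch. The question does not show how rent is built, so the div with class "building" is an assumed stand-in for one building's container; only the span class is taken from the question.

```python
from bs4 import BeautifulSoup

# Assumed markup: each <div class="building"> plays the role of one item in `rent`.
html = """
<div class="building"><span class="cassetteitem_other-emphasis ui-text--bold">6.4万円</span></div>
<div class="building">
  <span class="cassetteitem_other-emphasis ui-text--bold">7.2万円</span>
  <span class="cassetteitem_other-emphasis ui-text--bold">7.5万円</span>
</div>
<div class="building"><span class="cassetteitem_other-emphasis ui-text--bold">6万円</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
rent = soup.find_all("div", {"class": "building"})  # hypothetical container lookup

price_class = {"class": "cassetteitem_other-emphasis ui-text--bold"}
# One list of price spans per building.
monthly = [building.find_all("span", price_class) for building in rent]
# Single price -> bare string; several prices -> nested list of strings.
monthly_list = [
    spans[0].get_text() if len(spans) == 1 else [s.get_text() for s in spans]
    for spans in monthly
]
print(monthly_list)
```

If the mixed shape (strings next to lists) turns out to be awkward later on, keeping every entry as a list, even for one room, is an alternative worth considering.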
Answered By - RJ Adriaansen