Issue
This HTML has three divs with the same class, accounts-table__count, but each holds a different type of information.
I'm trying to get the post count and the follower count from this page. Is there a way to extract the text using a CSS selector?
Site: https://mastodon.online/explore
<div class='directory__card__extra'>
  <div class='accounts-table__count'>
    629
    <small>posts</small>
  </div>
  <div class='accounts-table__count'>
    72
    <small>followers</small>
  </div>
  <div class='accounts-table__count'>
    <time class='time-ago' datetime='2021-05-18' title='May 18, 2021'>May 18, 2021</time>
    <small>last active</small>
  </div>
</div>
My code:
def parse(self, response):
    for users in response.css('div.directory__card'):
        yield {
            'id': users.css('span::text').get().replace('@', '').replace('.', '-'),
            'name': users.css('strong.p-name::text').get(),
            'posts': '',  # this is the post count
            'followers': '',  # this is the follower count
            'description': users.css('p::text').get(),
            'fediverse': users.css('span::text').get(),
            'link': users.css('a.directory__card__bar__name').attrib['href'],
            'image': users.css('img.u-photo').attrib['src'],
            'bg-image': users.css('img').attrib['src'],
        }
    for nextpage in response.css('span.next'):
        next_page = nextpage.css('a').attrib['href']
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
Solution
For example, iterate over the cards; for each one, get the text nodes of the count divs
and filter out the empty values.
raw_data = response.css(".directory__card")[0].css(".accounts-table__count::text").getall()
values = list(filter(lambda s: s != "", map(lambda s: s.strip(), raw_data)))
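Some of the values returned by the .accounts-table__count::text selector
are empty after stripping, because ::text returns only the direct text nodes of each div.
The whitespace around child elements such as <small> comes back as whitespace-only strings, and the third div has no text of its own, only other HTML elements (<time> and <small>).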
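To fill the two empty fields in the question's dictionary, the filtering step above can be wrapped in a small helper. This is only a sketch: extract_counts is a hypothetical name, the sample list is an assumption about what .getall() might return for the HTML in the question, and it relies on the post count appearing before the follower count, as it does in that snippet.

```python
def extract_counts(raw_data):
    """Return (posts, followers) from the text nodes of one card."""
    # Strip whitespace and drop the strings that end up empty.
    values = [s.strip() for s in raw_data if s.strip()]
    # Assumption: posts come first and followers second, as in the question's HTML.
    posts = values[0] if len(values) > 0 else None
    followers = values[1] if len(values) > 1 else None
    return posts, followers


# Assumed sample: roughly what .getall() might return for the HTML in the question.
sample = ["\n629\n", "\n", "\n72\n", "\n", "\n", "\n", "\n"]
posts, followers = extract_counts(sample)
print(posts, followers)
```

Inside the spider's loop, the helper could be called as posts, followers = extract_counts(users.css('.accounts-table__count::text').getall()), and the two results placed in the 'posts' and 'followers' fields. Returning None for a missing value keeps the spider from raising an IndexError on a card that lacks one of the counts.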
Answered By - Serhii Shynkarenko