Tuesday, January 25, 2022

[FIXED] Trying to scrape texts with the same divs and no other info

Issue

This HTML has three divs with the same class name, accounts-table__count, but each holds a different type of information.

I'm trying to get the post count and follower count from this page. Is there a way to get the text using CSS selectors?

Site: https://mastodon.online/explore

<div class='directory__card__extra'>
    <div class='accounts-table__count'>
        629
        <small>posts</small>
    </div>
    <div class='accounts-table__count'>
        72
        <small>followers</small>
    </div>
    <div class='accounts-table__count'>
        <time class='time-ago' datetime='2021-05-18' title='May 18, 2021'>May 18, 2021</time>
        <small>last active</small>
    </div>
</div>

My code:

    def parse(self, response):
        for users in response.css('div.directory__card'):
            yield {
                'id': users.css('span::text').get().replace('@', '').replace('.', '-'),
                'name': users.css('strong.p-name::text').get(),
                'posts': '',      # this is the post count
                'followers': '',  # this is the follower count
                'description': users.css('p::text').get(),
                'fediverse': users.css('span::text').get(),
                'link': users.css('a.directory__card__bar__name').attrib['href'],
                'image': users.css('img.u-photo').attrib['src'],
                'bg-image': users.css('img').attrib['src'],
            }
        for nextpage in response.css('span.next'):
            # .get() returns None when there is no link, so the check below is meaningful
            next_page = nextpage.css('a').attrib.get('href')
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

Solution

For example, take one card, get the text nodes of its .accounts-table__count divs, then strip the whitespace and filter out the empty values. In the spider, do the same for each card in the loop.

raw_data = response.css(".directory__card")[0].css(".accounts-table__count::text").getall()
values = list(filter(lambda s: s != "", map(lambda s: s.strip(), raw_data)))

Some of the values returned by the .accounts-table__count::text selector are empty after stripping, because some divs with this class contain no text of their own, only other HTML elements.
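
To fill the 'posts' and 'followers' fields, the cleaned list can be unpacked by position. The following is a minimal sketch, not the answer author's code: the helper names clean_counts and posts_and_followers are hypothetical, and the sample list is an assumption about what getall() would return for the card shown in the question. It also assumes the posts div always comes before the followers div.

```python
def clean_counts(raw_texts):
    """Strip whitespace from each text node and drop the ones left empty."""
    return [text.strip() for text in raw_texts if text.strip()]


def posts_and_followers(raw_texts):
    """Return (posts, followers) as strings, or None for a missing value."""
    values = clean_counts(raw_texts)
    # Pad with None so a card with fewer than two counts does not raise.
    values += [None] * (2 - len(values))
    return values[0], values[1]


# Assumed sample: roughly what
# users.css(".accounts-table__count::text").getall()
# would return for the card in the question (whitespace-only nodes included).
sample = [
    "\n        629\n        ", "\n    ",
    "\n        72\n        ", "\n    ",
    "\n        ", "\n        ", "\n    ",
]

posts, followers = posts_and_followers(sample)
print(posts, followers)
```

Inside the spider's loop, the same helper could be called as posts_and_followers(users.css(".accounts-table__count::text").getall()). The values are kept as strings because the page may abbreviate large counts, in which case converting with int() would fail.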

Answered By - Serhii Shynkarenko

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
