Issue
I queried the HTML node where an article's date is stored. When scraping the site, I noticed that the date in the datetime attribute differs from the text inside the node. In Google Chrome's developer tools, the datetime attribute matches the displayed text. Why does Scrapy get a different datetime attribute than the developer tools show? And can I somehow get the correct date from the datetime attribute?
This is the code and the return value:
response.xpath("//*[@class='a20-news-date']/time").getall()
['<time datetime="2021-11-15T08:17:20+01:00">Sonntag, 08.03.2020 // 17:20 Uhr</time>']
Chrome's developer tools display the node as:
<div class="a20-news-date">
<time datetime="2020-03-08T17:20:16+01:00">8. März 2020</time>
</div>
Solution
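If you check the HTML source (Ctrl+U), you'll find that there are several <time> elements in the page. What you see in Dev Tools is the resulting DOM after JavaScript execution. Scrapy parses the raw HTML as sent by the server and does not run JavaScript, so it sees the source rather than that DOM. Your target element is located inside the <article> tag in the source HTML: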
response.xpath("//article//time/text()").get()
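Since the text inside the node carries the expected date, one option is to parse that text rather than rely on the datetime attribute. The following is a minimal sketch, assuming the text always follows the pattern seen in the question ("Sonntag, 08.03.2020 // 17:20 Uhr"). The names parse_article_date and DATE_PATTERN are hypothetical, and the result is a naive datetime because the text carries no UTC offset:

```python
import re
from datetime import datetime

# Assumed format, taken from the question: "Sonntag, 08.03.2020 // 17:20 Uhr".
# Only the numeric parts are matched, so the German weekday name is ignored
# and no locale setup is needed.
DATE_PATTERN = re.compile(r"(\d{2}\.\d{2}\.\d{4})\s*//\s*(\d{2}:\d{2})")


def parse_article_date(text):
    """Return a naive datetime parsed from the date text, or None if no match."""
    match = DATE_PATTERN.search(text or "")
    if match is None:
        return None
    date_part, time_part = match.groups()
    return datetime.strptime(f"{date_part} {time_part}", "%d.%m.%Y %H:%M")


print(parse_article_date("Sonntag, 08.03.2020 // 17:20 Uhr"))  # 2020-03-08 17:20:00
```

In a spider this could be called as parse_article_date(response.xpath("//article//time/text()").get()); it returns None when the node is missing or the text has a different layout.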
Answered By - gangabass