Issue
Before this is marked as a duplicate: I have searched SO and tried the solutions I found there, namely:
- scrapy css selector: get text of all inner tags
- How to get the text from child nodes if it is parents to other node in Scrapy using XPath
- scrapy get the entire text including children
The HTML I want to extract from is:
<span class="location">
Mandarin Oriental Hotel
<a class="" href="/search-results/Jalan+Pinang%252C+Kuala+Lumpur+City+Centre%252C+50088+Kuala+Lumpur%252C+Wilayah+Persekutuan./?state=Kuala+Lumpur" itemprop="addressRegion" title="Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan.">
Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan.
</a>
,
<a class="" href="/search-results/?neighbourhood=Kuala+Lumpur&state=Kuala+Lumpur" title="Kuala Lumpur">
Kuala Lumpur
</a>
,
<a class="" href="/search-results/?state=Kuala+Lumpur" title="Kuala Lumpur">
Kuala Lumpur
</a>
<span class="" itemprop="postalCode">
50088
</span>
</span>
I want to get all the text inside //span[@class='location'], including the text of its child elements.
I have tried:
response.xpath("//span[@class='location']//text()").extract_first()
response.css("span.location *::text").extract_first()
response.css("span.location ::text").extract_first()
All of them return only Mandarin Oriental Hotel, not the full address.
EDIT: The text should yield
Mandarin Oriental Hotel Jalan Pinang, Kuala Lumpur City Centre, 50088 Kuala Lumpur, Wilayah Persekutuan., Kuala Lumpur, Kuala Lumpur 50088
Solution
Try the code below to get the string representation of each span with an address:
# Assumes each listing is wrapped in a <div class="entry"> container
for entry in response.xpath("//div[@class='entry']"):
    print(entry.xpath("normalize-space(./span[@class='location'])").extract_first())
Answered By - Andersson