Issue
Hello, I was scraping a site but ran into trouble because of the site's structure.
Here is one page of the site: https://www.dehatilyrics.top/2018/09/dilli-wali-gori-ridam-tripathi-lyrics.html
I want to get the main body of the content, excluding the Song Info part. As you can see, there are many span tags, and I can't work out how to get the entire content at once.
Here is what I tried:
response.xpath('//*[@class="post-body entry-content"]/div[1]/span/text()').extract()
This returned only part of the content (the bottom part). How do I get the entire content?
Solution
By Song Info, do you mean this part?
Song :- Dilli Wali Gori
Singer :- Ridam Tripathi
Lyrics & Composition :- Ridam Tripathi
Music Director :- Ajay Verma "AV"
Video Director :- Shunty
Dop :- Govind Bist
Company/ Label :- Wave
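That part is in the first span, so you can exclude it using list slicing. Note that /text() returns only the text directly inside each span, while //text() also returns text inside tags nested in the span, which is likely why your attempt returned only part of the content.
You can get it like this: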
# //text() collects text from each span and from any tags nested inside it; [1:] drops the first text node.
entire_body = " ".join(response.xpath('//*[@class="post-body entry-content"]/div[1]/span//text()').extract()[1:])
You can always do that to check Selectors/Xpaths
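One thing to watch with the slicing approach: span//text() returns a flat list of text nodes, so [1:] removes only the first text node, not necessarily the whole first span. If the Song Info span holds several lines separated by br tags, the remaining lines would still come through. Below is a minimal sketch of the difference using only the standard library. The HTML fragment is made up for illustration and the real page may be structured differently. In Scrapy, the matching idea would be an XPath such as span[position() > 1]//text(), which I have not tested against this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the post body; the real markup may differ.
SAMPLE = """
<div class="post-body entry-content">
  <div>
    <span>Song :- Dilli Wali Gori<br/>Singer :- Ridam Tripathi</span>
    <span>first lyric line<br/>second <b>lyric</b> line</span>
    <span>third lyric line</span>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)
spans = root.findall("./div/span")

# Flat list of every text node under every span (comparable to span//text()).
all_nodes = [text for span in spans for text in span.itertext()]

# Slicing the flat list drops only the first text node;
# the second Song Info line is still present.
sliced_nodes = all_nodes[1:]

# Slicing the list of spans first drops the whole Song Info span.
lyric_nodes = [text for span in spans[1:] for text in span.itertext()]

# Strip whitespace-only fragments and join the rest with single spaces.
lyrics = " ".join(text.strip() for text in lyric_nodes if text.strip())
print(lyrics)
```

Here sliced_nodes still starts with the Singer line, while lyrics contains only the text of the second and third spans.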
Answered By - Umair Ayub