Issue
After hours of troubleshooting, I finally determined that the reason I couldn't scrape this data is that the most vital data is commented out in the HTML, and JavaScript must be loading it. Printing the response does show it, but Scrapy will not pull that data.
Solution
XPath has comment() to select comment nodes. But it returns the comment as normal text, so you have to remove <!-- and --> and then parse the remaining text to search inside this HTML. In Scrapy you can use the class Selector() to parse it.
Minimal working code
from scrapy.selector import Selector
sel = Selector(text='''
<div>
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
</div>''')
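# comment() selects comment nodes; get() returns the first one as a string,
# including the <!-- and --> delimiters
comment = sel.xpath('//comment()').get()
print(comment)
# strip the delimiters: '<!--' is 4 characters, '-->' is 3
#html = comment.replace('<!--', '').replace('-->', '')
html = comment[4:-3]
print(html)
# parse the extracted text as HTML again so it can be searched
sel = Selector(text=html)
divs = sel.xpath('//div').getall()
print(divs)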
Result:
<!--
<div class="outer">
<div class="inner">Hello World</div>
</div>
-->
<div class="outer">
<div class="inner">Hello World</div>
</div>
['<div class="outer">\n<div class="inner">Hello World</div>\n</div>', '<div class="inner">Hello World</div>']
Answered By - furas