Issue
class LinkSpider(scrapy.Spider):
    name = "link"

    def start_requests(self):
        urlBasang = "https://bloomberg.com"
        yield scrapy.Request(url = urlBasang, callback = self.parse)

    def parse(self, response):
        newCsv = open('data_information/link.csv', 'a')
        for j in response.xpath('//a'):
            title_to_save = j.xpath('/text()').extract_first()
            href_to_save= j.xpath('/@href').extract_first()
            print("test")
            print(title_to_save)
            print(href_to_save)
            newCsv.write(title_to_save+ "\n")
        newCsv.close()
This is my code, but title_to_save and href_to_save both return None.
I want to get the text inside every "a" tag, along with its href.
Solution
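You want

    title_to_save = j.xpath('./text()').get()
    href_to_save = j.xpath('./@href').get()

Note the dot before the path: it makes the XPath relative to the current a element. Without it, /text() and /@href are evaluated from the document root, which has no text or href of its own, so both come back as None.
(I use get instead of extract_first; in Scrapy's selector API the two do the same thing, and get is the newer name.)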
As for the output CSV, perhaps you are already aware of this, but you should probably yield the information you want to write out and then run your spider with the -o data_information/link.csv option (for example, scrapy crawl link -o data_information/link.csv from inside your Scrapy project). That is a bit more flexible than opening a file for appending in your parse method. Your code would then look something like this:
import scrapy


class LinkSpider(scrapy.Spider):
    name = "link"
    # No need for start_requests, as requesting start_urls and
    # calling parse on each response is the default anyway
    start_urls = ["https://bloomberg.com"]

    def parse(self, response):
        for j in response.xpath('//a'):
            # The leading dot keeps each query relative to the current <a>
            title_to_save = j.xpath('./text()').get()
            href_to_save = j.xpath('./@href').get()
            print("test")
            print(title_to_save)
            print(href_to_save)
            # Yielded items are written out by the -o feed export
            yield {'title': title_to_save}
Answered By - tomjn