Issue
EDIT:
Goal:
I want to see whether I can implement functions in Scrapy's items.py and call them from the main spider. The problem is that I cannot export the returned output, i.e.
    scrapy crawl SQL -o test.csv
returns nothing. What is the best way to convert the Field values in items.py into a pandas DataFrame and store that output? I had assumed my simple example would get me at least part of the way there.
If I use return print(table) instead, the correct table values are printed, but I still cannot export the output from the command line.
items.py:
import scrapy
import pandas as pd

class ScrapyExercisesItem(scrapy.Item):
    def __init__(self):
        self._name = scrapy.Field()
        self._keyword = scrapy.Field()

    def returnTable(self):
        table = pd.DataFrame([self._name, self._keyword])
        return table
scraper:
import scrapy
from scrapy_exercises.items import ScrapyExercisesItem

class SQLTest(scrapy.Spider):
    name = 'SQL'
    start_urls = [f'https://quotes.toscrape.com/page/{i}/' for i in range(1, 11)]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse
            )

    def parse(self, response):
        content = response.xpath("//div[@class='col-md-8']//div")
        for items in content:
            table = ScrapyExercisesItem()
            table._name = items.xpath(".//span//@href").get()
            table._keyword = items.xpath(".//div[@class = 'tags']//a[1]//text()").get()
            yield table.returnTable()
Solution
The error occurs because you never assign values to the attributes your function operates on. Set them first:
table = functionInit()
table.string = 'some string'
table.integer = 111
From here you can create your dataframe:
table.convert()
Answered By - try_hard