Issue
my script is below, first it reads plate_num_xlsx
value from excel file data.xlsx
successfully then requests scrapy to scrape data from url. At each parse() invocation, I am taking values parsed to item
then trying to export them to excel with pandas.
if plate_num_xlsx==plate.replace(" ","").strip():
item= {"plate": plate.strip(), "price": price.strip()}
else:
item = {"plate": plate.strip(), "price": "-"}
df_output=pd.DataFrame([item],columns=["PLATE","PRICE"])
df_output.to_excel("output_res.xlsx",sheet_name="result",index=False,header=True)
Excel file output_res.xlsx
has been created successfully. But parsed data in item is not being exported to (written to) that file. What can be issue?
import scrapy
from scrapy.crawler import CrawlerProcess
import pandas as pd
class plateScraper(scrapy.Spider):
name = 'scrapePlate'
allowed_domains = ['dvlaregistrations.direct.gov.uk']
def start_requests(self):
df=pd.read_excel('data.xlsx')
columnA_values=df['PLATE']
for row in columnA_values:
global plate_num_xlsx
plate_num_xlsx=row
base_url =f"https://dvlaregistrations.dvla.gov.uk/search/results.html?search={plate_num_xlsx}&action=index&pricefrom=0&priceto=&prefixmatches=¤tmatches=&limitprefix=&limitcurrent=&limitauction=&searched=true&openoption=&language=en&prefix2=Search&super=&super_pricefrom=&super_priceto="
url=base_url
yield scrapy.Request(url)
def parse(self, response):
for row in response.css('div.resultsstrip'):
plate = row.css('a::text').get()
price = row.css('p::text').get()
if plate_num_xlsx==plate.replace(" ","").strip():
item= {"plate": plate.strip(), "price": price.strip()}
else:
item = {"plate": plate.strip(), "price": "-"}
df_output=pd.DataFrame([item],columns=["PLATE","PRICE"])
df_output.to_excel("output_res.xlsx",sheet_name="result",index=False,header=True)
process = CrawlerProcess()
process.crawl(plateScraper)
process.start()
Solution
The keys of dictionary item
are used as columns of your new pd.DataFrame
. As per pandas documentation, providing columns to pd.DataFrame
names columns by provided column names if data doesn't have column names already, and if data does contain column names (as it does in your case), it uses column names from pd.DataFrame method to do a column select.
As you don't have dictionary items named "PLATE" or "PRICE", the resulting dataframe is empty.
So, skip providing column names in pd.DataFrame
, and if you want to rename them, do so after creating the dataframe.
df_output=pd.DataFrame([item])
df_output.columns = ["PLATE","PRICE"]
Answered By - dm2
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.