Issue
I want to scrape information from this site: https://www.atl.no/finn-trafikkskole?limit=0&limitstart=0 (a national list of driving schools) and plot zip codes and company names on a map to find areas with a significant concentration of schools (I already have the mapping from zip codes to coordinates). The ideal result would be a selector that extracts all the relevant information for each of the 710 companies.
[Screenshot: highlighted zip code of the first driving school]
I've tried copying both the CSS selector and the XPath of the table I want (as shown in Chrome DevTools), but running either of them in Scrapy returns nothing.
Example of a copied CSS selector that returns nothing when run in a Scrapy shell:
In(1): response.css("#adminForm > table > tbody").extract()
Out(1): []
What have I done wrong, and how should I proceed to get the result I want?
Solution
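Chrome DevTools shows the DOM after the browser has parsed the page, and browsers insert a <tbody> element into tables even when the raw HTML has none. Scrapy only sees the raw HTML, so a copied selector that contains tbody will most likely match nothing. Based on the page structure, I would select the rows through the table's class and split the parsing job as follows: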
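import re

# Both functions are methods of your scrapy.Spider subclass.
def extract_text(self, item):
    # item.get() returns None when nothing matched (e.g. a row without an address cell)
    text = item.get() or ''
    # Strip the HTML tags and keep only the text
    text = re.sub(r'<.*?>', '', text)
    return text.strip()

def parse(self, response):
    # Select the rows through the table's class, without tbody in the path
    for school in response.css('.uk-table tr'):
        yield {
            'address': self.extract_text(school.css('.school-address')),
            'school': school.css('tr > td > a::text').get(),
        }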
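As a possible next step towards the map, the zip code can be pulled out of each scraped address and counted. This is only a minimal sketch: it assumes the address text ends with a four-digit Norwegian zip code followed by the place name, and the helper names (`extract_zip`, `schools_per_zip`) and the sample items are hypothetical, not taken from the site.

```python
import re
from collections import Counter

# Assumption: the address ends with "<4-digit zip> <place name>".
ZIP_RE = re.compile(r'\b(\d{4})\s+[^\d,]+$')

def extract_zip(address):
    """Return the four-digit zip code at the end of an address, or None."""
    match = ZIP_RE.search(address.strip())
    return match.group(1) if match else None

def schools_per_zip(items):
    """Count schools per zip code, skipping items without a recognizable zip."""
    zips = (extract_zip(item.get('address') or '') for item in items)
    return Counter(z for z in zips if z is not None)

# Hypothetical items in the shape yielded by parse() above
sample = [
    {'school': 'School A', 'address': 'Storgata 1, 0155 Oslo'},
    {'school': 'School B', 'address': 'Kirkegata 20 0155 Oslo'},
    {'school': 'School C', 'address': 'Strandveien 5, 5003 Bergen'},
    {'school': None, 'address': ''},  # e.g. a header row
]
counts = schools_per_zip(sample)
```

The resulting counter can then be joined with the zip-code-to-coordinate mapping you already have. If the real address cells are formatted differently, the regular expression will need adjusting.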
Answered By - Bogdan Veliscu