Friday, June 3, 2022

[FIXED] Scrapy Extract Dynamic Table Data from Datasource directly

Labels: datatable, dynamic, scrapy

Issue

Using Scrapy, I want to extract the data shown in a dynamic table on a webpage. Because the table is populated dynamically, an XPath query on the response for the tbody tag returns no data:

In [1]: response.xpath('//table/tbody').getall()
Out[1]: ['<tbody></tbody>']

An XPath query for the table tag itself, on the other hand, already returns all the data, and in structured form:

In [2]: response.xpath('//table').getall()
Out[2]: ['<table class="table icms-dt rs_preserve" cellspacing="0" width="100%" id="publikation" data-webpack-module="datatables" data-entity-type="publikation" data-entities="{&quot;emptyColumns&quot;:[&quot;privatKategorie&quot;,&quot;_thumbnail&quot;],&quot;data&quot;:[{&quot;name&quot;:&quot;&lt;a href=\\&quot;\\/_rte\\/publikation\\/35897\\&quot;&gt;Nutzungsbedingungen&lt;\\/a&gt;&quot;,&quot;name-sort&quot;:&quot;nutzungsbedingungen&quot;,&quot;herausgeber&quot;:&quot;Informatikdienst&quot;,&quot;herausgeber-sort&quot;:&quot;informatikdienst&quot;,&quot;datum&quot;:&quot;16.12.2010&quot;,&quot;datum-sort&quot;:&quot;2010-12-16&quot;,&quot;kategorieId&quot;:&quot;publikation&quot;,&quot;kategorieId-sort&quot;:&quot;publikation&quot;,&quot;privatKategorie&quot;:&quot;&quot;,&quot;privatKategorie-sort&quot;:&quot;&quot;,&quot;_thumbnail&quot;:&quot;&quot;,&quot;_downloadBtn

I want to extract the table data in a structured way, e.g. by row and column. Is there a way to do this, with BeautifulSoup for instance? Any ideas and help are highly appreciated.

The table can be examined in the Scrapy shell as follows:

scrapy shell "rapperswil-jona.ch/publikationen"

Solution

Here you go:

import json

# "response" is the object provided by the Scrapy shell
raw_data = response.xpath('//table/@data-entities').get()
data = json.loads(raw_data)

The data is in the table's data-entities attribute. The XPath above extracts the attribute value, which is returned as a string.

This string is JSON, so it can be converted to a dict using json.loads().
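The conversion step can be tried outside the Scrapy shell with a minimal sketch like the one below. The sample string is a shortened, hand-trimmed excerpt of the attribute from the question, not the full value. In the page source the attribute is HTML-escaped (&quot; instead of "); the Scrapy selector already returns it unescaped, so html.unescape() is only there to make this standalone demo behave like the selector output.

```python
import html
import json

# Shortened sample of the data-entities attribute as it appears in the page
# source (HTML-escaped). Trimmed by hand for illustration; the real attribute
# has more keys per entry.
escaped_attr = (
    '{&quot;emptyColumns&quot;:[&quot;privatKategorie&quot;,&quot;_thumbnail&quot;],'
    '&quot;data&quot;:[{&quot;name-sort&quot;:&quot;nutzungsbedingungen&quot;,'
    '&quot;herausgeber&quot;:&quot;Informatikdienst&quot;,'
    '&quot;datum&quot;:&quot;16.12.2010&quot;,'
    '&quot;datum-sort&quot;:&quot;2010-12-16&quot;}]}'
)

# Stand-in for response.xpath('//table/@data-entities').get(),
# which returns the attribute value with entities already decoded.
raw_data = html.unescape(escaped_attr)

data = json.loads(raw_data)

print(type(data).__name__)          # dict
print(sorted(data.keys()))          # ['data', 'emptyColumns']
print(data["data"][0]["datum-sort"])  # 2010-12-16
```

If .get() returns None (for example because no table with that attribute was found), json.loads() raises a TypeError, so a check for None before parsing may be worthwhile in a real spider.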

Expanding on this, the actual table data is under the key data. Accessing it gives you a list of dicts. You can loop over it, export it to CSV, or process it further as you wish:

for item in data['data']:
    print(item['name-sort'])
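For the CSV route, one possible sketch uses only the standard library. The rows list stands in for data['data'] and contains a single entry built from the values visible in the question; the choice of columns, the strip_tags helper, and the in-memory buffer are assumptions for illustration, not part of the original answer. The name field holds an HTML anchor, so the sketch strips tags with a simple regular expression, which is adequate for short snippets like this but not for arbitrary HTML.

```python
import csv
import io
import re

# Stand-in for data['data']; values taken from the output shown in the question.
rows = [
    {
        "name": '<a href="/_rte/publikation/35897">Nutzungsbedingungen</a>',
        "herausgeber": "Informatikdienst",
        "datum-sort": "2010-12-16",
    },
]


def strip_tags(value):
    """Remove simple HTML tags (hypothetical helper, not a full HTML parser)."""
    return re.sub(r"<[^>]+>", "", value).strip()


# Assumed subset of keys to export; adjust to the columns you need.
columns = ["name", "herausgeber", "datum-sort"]

# An in-memory buffer keeps the demo self-contained. For a real file, use
# open("publikationen.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
writer.writeheader()
for item in rows:
    record = {key: strip_tags(str(item.get(key, ""))) for key in columns}
    writer.writerow(record)

csv_text = buffer.getvalue()
print(csv_text)
```

Inside a spider, yielding each item as a dict and running Scrapy with its built-in feed export would be an alternative to writing the CSV by hand.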

Answered By - Upendra

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
