Issue
I am fairly new to Python and I am trying to get some data from a website. When I run the code below, the values from the page are printed with apostrophes (single quotes), which is not valid JSON.
The output looks something like this:
[{'companyId': 1,
'companyPhotoId': 9120,
'description': 'Pracovní '
'prostory',
'fileId': '4ec99adf-f89b-481d-8f6d-3d2f49b1f1f1',
'isThumbHorizontal': False,
'order': 1,
'thumbnailFileId': 'e00c3c9c-55d3-4bad-bd5a-d485bfab2986'},
{'companyId': 1,
'companyPhotoId': 9121,
'description': 'mDevcamp 2018',
'fileId': '089dfef5-5c89-4e56-ad49-c6458d258a3f',
'isThumbHorizontal': False,
'order': 2,
'thumbnailFileId': '411cbd66-dbb4-4385-8ae9-cc89f8787346'},
{'companyId': 1,
'companyPhotoId': 9122,
'description': 'Kancl 2018',
'fileId': 'fcdadaeb-3960-45be-b575-0a0be34a73bc',
'isThumbHorizontal': True,
'order': 3,
'thumbnailFileId': '7cd162e9-1d18-4629-b685-9b4246637fef'}]
import json
from pprint import pprint

import scrapy


class Project1SpiderSpider(scrapy.Spider):
    name = 'project1-spider'
    allowed_domains = ['somewebsite']
    start_urls = ['somewebsite'.format(i + 1) for i in range(2000)]

    def parse(self, response):
        results = json.loads(response.body)
        pprint(results)
I need to get it in a format like this:
[{"companyId": 1,
"companyPhotoId": 9120,
"description": "Pracovní "
"prostory",
"fileId": "4ec99adf-f89b-481d-8f6d-3d2f49b1f1f1",
"isThumbHorizontal": False,
"order": 1,
"thumbnailFileId": "e00c3c9c-55d3-4bad-bd5a-d485bfab2986"},
{"companyId": 1,
"companyPhotoId": 9121,
"description": "mDevcamp 2018",
"fileId": "089dfef5-5c89-4e56-ad49-c6458d258a3f",
"isThumbHorizontal": False,
"order": 2,
"thumbnailFileId": "411cbd66-dbb4-4385-8ae9-cc89f8787346"},
{"companyId": 1,
"companyPhotoId": 9122,
"description": "Kancl 2018",
"fileId": "fcdadaeb-3960-45be-b575-0a0be34a73bc",
"isThumbHorizontal": True,
"order": 3,
"thumbnailFileId": "7cd162e9-1d18-4629-b685-9b4246637fef"}]
Could you please show me how the code should look instead?
Thank you very much.
Solution
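When you call json.loads(response.body), the JSON string is converted into a Python object. You see single quotes because you are printing that Python object, and pprint shows Python's own representation of it (single-quoted strings, True/False), not JSON.
To get the result you want, either print the original JSON text with print(response.text) (response.body holds the raw bytes), or, if you want it nicely formatted, convert the Python object back into a JSON string with indentation: print(json.dumps(results, indent=2)).
Note that valid JSON writes booleans as true/false, so the output will show those and not Python's True/False.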
def parse(self, response):
    # Get a Python object
    results = json.loads(response.body)
    # Pretty-print it as JSON
    print(json.dumps(results, indent=2))
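The difference between the two representations can be seen without Scrapy. The following is only a minimal sketch: the `raw` string is a made-up sample modelled on one record from the question, not the real response. The `ensure_ascii=False` argument is an optional extra that keeps characters such as "í" readable, where the default would escape them as `\u00ed`.

```python
import json

# Hypothetical sample standing in for the response text (an assumption,
# shortened from the record shown in the question).
raw = '[{"companyId": 1, "description": "Pracovní prostory", "isThumbHorizontal": false}]'

# JSON text -> Python list of dicts
records = json.loads(raw)

# Python's own representation: single quotes and False (not valid JSON)
as_repr = repr(records)

# Back to JSON text: double quotes and false, indented for readability
as_json = json.dumps(records, indent=2, ensure_ascii=False)

print(as_repr)
print(as_json)
```

Because `as_json` is a plain string containing valid JSON, it can also be written to a file or passed to any other JSON parser.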
Answered By - Yosua