Issue
This is an example of one of the JSON objects given to me by the API. There are 100 of these.
[{"id": "133248644",
"associations": {"deals": {"results": [{"id": "2762673039",
"type": "line_item_to_deal"}]}},
"properties": {
"createdate": "2020-08-06T15:05:23.253Z",
"description": null,
"hs_lastmodifieddate": "2020-08-06T15:05:23.253Z",
"hs_object_id": "133248644",
"name": "test product",
"price": "100"},
"createdAt": "2020-08-06T15:05:23.253Z",
"updatedAt": "2020-08-06T15:05:23.253Z",
"archived": false}]
I want to create a pandas dataframe that has a column for id as well as all of the properties associated with it, in addition to the id nested under "associations". Essentially I want to remove the properties from being nested under properties and the id from being nested under associations (as well as renamed). How would I go about this?
Here is a reproducable example of my attempt to solve the problem:
import json
import pandas as pd
response = """[{"id": "133248644",
"associations": {"deals": {"results": [{"id": "2762673039",
"type": "line_item_to_deal"}]}},
"properties": {
"createdate": "2020-08-06T15:05:23.253Z",
"description": null,
"hs_lastmodifieddate": "2020-08-06T15:05:23.253Z",
"hs_object_id": "133248644",
"name": "test product",
"price": "100"},
"createdAt": "2020-08-06T15:05:23.253Z",
"updatedAt": "2020-08-06T15:05:23.253Z",
"archived": false},
{"id": "133345685",
"associations": {"deals": {"results": [{"id": "2762673038",
"type": "line_item_to_deal"}]}},
"properties": {
"createdate":
"2020-08-06T18:29:06.773Z",
"description": null,
"hs_lastmodifieddate": "2020-08-06T18:29:06.773Z",
"hs_object_id": "133345685",
"name": "TEST PRODUCT 2",
"price": "2222"},
"createdAt": "2020-08-06T18:29:06.773Z",
"updatedAt": "2020-08-06T18:29:06.773Z",
"archived": false}]"""
data = json.loads(response)
data_flat = [dict(id=x["id"], **x["properties"]) for x in data]
And this is a better solution, but still isn't quite perfect.
data_flat = [dict(lineid=x["id"],dealid=x["associations"]["deals"]["results"][0]["id"], **x["properties"]) for x in data]
Finally, this is very useful but still requires me to extract the id from the associations column in kind of a convoluted way.
normal_data = pd.normalize_data(data)
Solution
- Dealing with
list
s of nestdict
s is convoluted. There is not a readable one-liner for extracting the data. - Read
data
withpandas.json_normalize
associations.deals.results
is alist
ofdict
s, usepandas.DataFrame.explode
to separate eachdict
in thelist
to a separate row- Use
.json_normalize
on'associations.deals.results'
to convert thedict
s to columns. pandas.DataFrame.join
df
to the normalize columns.id
already exists in the dataframe soid
in thedict
will get a right suffix, buttype
does not require a suffix, since it doesn't exist indf
.
- Use
pandas.DataFrame.rename
to rename any desired columns. - Tested in
python 3.10
,pandas 1.4.3
import pandas as pd
import json
# convert response from a string to a list of dicts
data = json.loads(response)
# create a pandas dataframe
df = pd.json_normalize(data)
# associations.deals.results is a list of dicts, explode them
df = df.explode('associations.deals.results', ignore_index=True)
# normalize the dicts in associations.deals.results and join them back to df
df = df.join(pd.json_normalize(df.pop('associations.deals.results')), rsuffix='.associations.deals.results')
# display(df)
id createdAt updatedAt archived properties.createdate properties.description properties.hs_lastmodifieddate properties.hs_object_id properties.name properties.price id.associations.deals.results type
0 133248644 2020-08-06T15:05:23.253Z 2020-08-06T15:05:23.253Z False 2020-08-06T15:05:23.253Z None 2020-08-06T15:05:23.253Z 133248644 test product 100 2762673039 line_item_to_deal
1 133345685 2020-08-06T18:29:06.773Z 2020-08-06T18:29:06.773Z False 2020-08-06T18:29:06.773Z None 2020-08-06T18:29:06.773Z 133345685 TEST PRODUCT 2 2222 2762673038 line_item_to_deal
response
response = """[{"id": "133248644",
"associations": {"deals": {"results": [{"id": "2762673039",
"type": "line_item_to_deal"}]}},
"properties": {
"createdate": "2020-08-06T15:05:23.253Z",
"description": null,
"hs_lastmodifieddate": "2020-08-06T15:05:23.253Z",
"hs_object_id": "133248644",
"name": "test product",
"price": "100"},
"createdAt": "2020-08-06T15:05:23.253Z",
"updatedAt": "2020-08-06T15:05:23.253Z",
"archived": false},
{"id": "133345685",
"associations": {"deals": {"results": [{"id": "2762673038",
"type": "line_item_to_deal"}]}},
"properties": {
"createdate":
"2020-08-06T18:29:06.773Z",
"description": null,
"hs_lastmodifieddate": "2020-08-06T18:29:06.773Z",
"hs_object_id": "133345685",
"name": "TEST PRODUCT 2",
"price": "2222"},
"createdAt": "2020-08-06T18:29:06.773Z",
"updatedAt": "2020-08-06T18:29:06.773Z",
"archived": false}]"""
Answered By - Trenton McKinney
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.