Issue
So I wanted to flatten nested json data in this pandas data frame as additional columns. I am using requests to get json data from an api and create a pandas data frame.
import requests import pandas as pd import json
url = 'https://api.openaq.org/v2/locations?coordinates=38.7437396,-9.2302436&radius=5000'
api_request = requests.get(url)
api_content = api_request.content
api_json = json.loads(api_content)
df = pd.json_normalize(api_json)
df
This outputs the following:
results meta.name meta.license meta.website meta.page meta.limit meta.found
0 [{'id': 8209, 'city': 'Lisboa', 'name': 'PT031... openaq-api CC BY 4.0d api.openaq.org 1 100 4
so, I did
df = pd.json_normalize(api_json, record_path=["results"])
Which gives a nicer result:
id city name entity country sources isMobile isAnalysis parameters sensorType lastUpdated firstUpdated measurements bounds manufacturers coordinates.latitude coordinates.longitude
0 8209 Lisboa PT03100 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25414, 'unit': 'µg/m³', 'count': 10005... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T11:00:00+00:00 246216 None None 38.748056 -9.202500
1 8211 Lisboa PT03084 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25397, 'unit': 'µg/m³', 'count': 93281... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T10:00:00+00:00 239733 None None 38.754167 -9.230833
2 7199 Lisboa PT03087 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 24800, 'unit': 'µg/m³', 'count': 97702... reference grade 2022-08-27T16:00:00+00:00 2017-09-22T13:00:00+00:00 235512 None None 38.705000 -9.210278
3 7767 Lisboa PT03082 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25393, 'unit': 'µg/m³', 'count': 83437... reference grade 2022-08-27T16:00:00+00:00 2018-02-27T01:00:00+00:00 304693 None None 38.738889 -9.207500
However, the 'sources' and 'parameter' columns are still in json format. How can I unpack these into new columns?
Also, can I for instance just pull the 'url' key out of the 'sources' json data and make it its own column?
Solution
You can do like this. Although there are so many columns you might want to prune things up a bit before merging into main df.
df_sources = pd.json_normalize(df.sources.explode())
df_params = pd.json_normalize(df.parameters.explode())
df = pd.concat([df, df_sources, df_params], axis=1)
del df['sources'], df['parameters']
Answered By - Firelord
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.