Issue
I have Pandas dataframe created from a list of dictionaries that looks as follows:
import pandas as pd
raw_data = [{'type': 'message',
'subtype': 'bot',
'text': 'Direction: North\nDestination: New York\nNew Customer: true\n',
'timestamp': '1650907136.554649',
'bot_id': 'ABC'},
{'type': 'message',
'subtype': 'bot',
'text': 'Direction: South\nDestination: Miami\nNew Customer: false\n',
'timestamp': '1650907098.340029',
'bot_id': 'DEF'},
{'type': 'message',
'subtype': 'bot',
'text': 'Direction: East\nDestination: Los Angeles\nNew Customer: true\n',
'timestamp': '1650906935.705729',
'bot_id': 'XYZ'}]
df = pd.DataFrame(raw_data)
df
type subtype text timestamp bot_id
0 message bot Direction: North\nDestination: New York\nNew C... 1650907136.554649 ABC
1 message bot Direction: South\nDestination: Miami\nNew Cust... 1650907098.340029 DEF
2 message bot Direction: East\nDestination: Los Angeles\nNew... 1650906935.705729 XYZ
I use the following code to parse the text
column and create a Pandas dataframe consisting of the following:
df[['direction', 'destination', 'new_customer', 'foo']] = df['text'].str.split(pat='\n', expand=True)
type subtype text timestamp bot_id direction destination new_customer foo
0 message bot Direction: North\nDestination: New York\nNew C... 1650907136.554649 ABC Direction: North Destination: New York New Customer: true
1 message bot Direction: South\nDestination: Miami\nNew Cust... 1650907098.340029 DEF Direction: South Destination: Miami New Customer: false
2 message bot Direction: East\nDestination: Los Angeles\nNew... 1650906935.705729 XYZ Direction: East Destination: Los Angeles New Customer: true
I then strip some unnecessary detail from the columns:
df['direction'] = df['direction'].str.replace('Direction: ', '')
df['destination'] = df['destination'].str.replace('Destination: ', '')
df['new_customer'] = df['new_customer'].str.replace('New Customer: ', '')
This gives:
type subtype text timestamp bot_id direction destination new_customer foo
0 message bot Direction: North\nDestination: New York\nNew C... 1650907136.554649 ABC North New York true
1 message bot Direction: South\nDestination: Miami\nNew Cust... 1650907098.340029 DEF South Miami false
2 message bot Direction: East\nDestination: Los Angeles\nNew... 1650906935.705729 XYZ East Los Angeles true
So far, so good. Now, I have the following (more complicated) list of nested dictionaries:
raw_data_2 = [{'type': 'message',
'subtype': 'bot_message',
'text': "This content can't be displayed.",
'timestamp': '1650905606.755969',
'username': 'admin',
'bot_id': 'BPD4K3SJW',
'blocks': [{'type': 'section',
'block_id': 'BJNTn',
'text': {'type': 'mrkdwn',
'text': 'You have a new message.',
'verbatim': False}},
{'type': 'section',
'block_id': 'WPn/l',
'text': {'type': 'mrkdwn',
'text': '*Heard By*\nFriend',
'verbatim': False}},
{'type': 'section',
'block_id': '5yp',
'text': {'type': 'mrkdwn',
'text': '*Which Direction? *\nNorth',
'verbatim': False}},
{'type': 'section',
'block_id': 'fKEpF',
'text': {'type': 'mrkdwn',
'text': '*Which Destination*\nNew York',
'verbatim': False}},
{'type': 'section',
'block_id': 'qjAH',
'text': {'type': 'mrkdwn',
'text': '*New Customer:*\Yes',
'verbatim': False}},
{'type': 'section',
'block_id': 'IBr',
'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]},
{'type': 'message',
'subtype': 'bot_message',
'text': "This content can't be displayed.",
'timestamp': '1650899428.077709',
'username': 'admin',
'bot_id': 'BPD4K3SJW',
'blocks': [{'type': 'section',
'block_id': 'Smd',
'text': {'type': 'mrkdwn',
'text': 'You have a new message.',
'verbatim': False}},
{'type': 'section',
'block_id': '6YaLt',
'text': {'type': 'mrkdwn',
'text': '*Heard By*\nOnline Search',
'verbatim': False}},
{'type': 'section',
'block_id': 'w3o',
'text': {'type': 'mrkdwn',
'text': '*Which Direction: *\nNorth',
'verbatim': False}},
{'type': 'section',
'block_id': 'PTQ',
'text': {'type': 'mrkdwn',
'text': '*Which Destination? *\nMiami',
'verbatim': False}},
{'type': 'section',
'block_id': 'JCfSP',
'text': {'type': 'mrkdwn',
'text': '*New Customer? *\nNo',
'verbatim': False}},
{'type': 'section',
'block_id': 'RJOA',
'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]}]
I put this into a dataframe df2
as follows:
df2 = pd.DataFrame(raw_data_2)
type subtype text timestamp username bot_id blocks
0 message bot_message This content can't be displayed. 1650905606.755969 admin BPD4K3SJW [{'type': 'section', 'block_id': 'BJNTn', 'tex...
1 message bot_message This content can't be displayed. 1650899428.077709 admin BPD4K3SJW [{'type': 'section', 'block_id': 'Smd', 'text'...
How would I parse the blocks
column of df2
to achieve the following dataframe?
type subtype text timestamp username bot_id heard_by direction destination new_customer
message bot_message This content can't be displayed 1650905606.755969 admin BPD4K3SJW Friend North New York Yes
message bot_message This content can't be displayed 1650899428.077709 admin BPD4K3SJW Online North Miami No
Thanks!
EDIT
The solution provided by @onyambu works perfectly on raw_data_2
, but not on the following raw_data_3
(which includes another block):
raw_data_3 = [{'type': 'message',
'subtype': 'bot_message',
'text': "This content can't be displayed.",
'timestamp': '1650905606.755969',
'username': 'admin',
'bot_id': 'BPD4K3SJW',
'blocks': [{'type': 'section',
'block_id': 'BJNTn',
'text': {'type': 'mrkdwn',
'text': 'You have a new message.',
'verbatim': False}},
{'type': 'section',
'block_id': 'WPn/l',
'text': {'type': 'mrkdwn',
'text': '*Heard By*\nFriend',
'verbatim': False}},
{'type': 'section',
'block_id': '5yp',
'text': {'type': 'mrkdwn',
'text': '*Which Direction? *\nNorth',
'verbatim': False}},
{'type': 'section',
'block_id': 'fKEpF',
'text': {'type': 'mrkdwn',
'text': '*Which Destination*\nNew York',
'verbatim': False}},
{'type': 'section',
'block_id': 'qjAH',
'text': {'type': 'mrkdwn',
'text': '*New Customer:*\Yes',
'verbatim': False}},
{'type': 'actions',
'block_id': 'yt4',
'elements': [{'type': 'button',
'action_id': '+bc',
'text': {'type': 'plain_text', 'text': 'View results', 'emoji': True},
'url': 'www.example.com'}]},
{'type': 'section',
'block_id': 'IBr',
'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]},
{'type': 'message',
'subtype': 'bot_message',
'text': "This content can't be displayed.",
'timestamp': '1650899428.077709',
'username': 'admin',
'bot_id': 'BPD4K3SJW',
'blocks': [{'type': 'section',
'block_id': 'Smd',
'text': {'type': 'mrkdwn',
'text': 'You have a new message.',
'verbatim': False}},
{'type': 'section',
'block_id': '6YaLt',
'text': {'type': 'mrkdwn',
'text': '*Heard By*\nOnline Search',
'verbatim': False}},
{'type': 'section',
'block_id': 'w3o',
'text': {'type': 'mrkdwn',
'text': '*Which Direction: *\nNorth',
'verbatim': False}},
{'type': 'section',
'block_id': 'PTQ',
'text': {'type': 'mrkdwn',
'text': '*Which Destination? *\nMiami',
'verbatim': False}},
{'type': 'section',
'block_id': 'JCfSP',
'text': {'type': 'mrkdwn',
'text': '*New Customer? *\nNo',
'verbatim': False}},
{'type': 'actions',
'block_id': 'yt4',
'elements': [{'type': 'button',
'action_id': '+bc',
'text': {'type': 'plain_text', 'text': 'View results', 'emoji': True},
'url': 'www.example.com'}]},
{'type': 'section',
'block_id': 'RJOA',
'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]}]
It's throwing a key error:
KeyError: 'text'
Solution
import re
d_new = (pd.DataFrame([[re.sub(".*[*]\\W+", "", val['text']['text'])
for val in dat['blocks']] for dat in raw_data_2]).
drop([0, 5], axis = 1))
d_new.columns = ['heard_by', 'direction','destination', 'new_customer']
d_new
heard_by direction destination new_customer
0 Friend North New York Yes
1 Online Search North Miami No
You can then append this to your original data
Answered By - onyambu
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.