Issue
I have a Pandas DataFrame with the following column called "image_versions2.candidates":
df_myposts['image_versions2.candidates']
That gives me:
0 [{'width': 750, 'height': 498, 'url': 'https:/XXX'}]
1 NaN
2 [{'width': 750, 'height': 498, 'url': 'https:/YYY'}]
3 [{'width': 750, 'height': 498, 'url': 'https:/ZZZ'}]
I'm trying to extract the URL into a new column, called for example 'image_url'.
I can extract a single URL with the following code:
df_myposts['image_versions2.candidates'][0][0]['url']
'https:/XXX'
But for the second row I get the following error, caused by the NaN value:
df_myposts['image_versions2.candidates'][1][0]['url']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-64-3f0532195cb7> in <module>
----> 1 df_myposts['image_versions2.candidates'][1][0]['url']
TypeError: 'float' object is not subscriptable
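The NaN in that column is a float, and a float cannot be indexed with [0], which is exactly what the traceback reports. A minimal sketch of a per-cell guard, assuming every non-missing cell is a list of dicts with a 'url' key (first_url is a hypothetical helper name, not part of pandas):

```python
import math

# The NaN shown for a missing cell is a plain float, so cell[0] on it raises
# "TypeError: 'float' object is not subscriptable".
missing = float("nan")
print(type(missing).__name__)  # float
print(math.isnan(missing))     # True


def first_url(cell):
    """Hypothetical helper: return cell[0]['url'] for a non-empty list, else None."""
    # Check the type of the cell itself before indexing into it.
    if isinstance(cell, list) and cell:
        return cell[0].get("url")
    return None


print(first_url([{"width": 750, "height": 498, "url": "https:/XXX"}]))  # https:/XXX
print(first_url(missing))  # None
```

Applied to the whole column, something like df_myposts['image_versions2.candidates'].apply(first_url) would return one value per row, so the index (and therefore the id column) stays aligned.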
I tried a loop with an if condition, but I get similar error messages:
for i in df_myposts['image_versions2.candidates']:
    # fails: i[0] raises TypeError when i is NaN, and type() never equals the string 'list'
    if type(i[0]) == 'list':
        ...
What would be the best way to do this without dropping the NaN rows? I have another column with the ID, so I want to keep the id <-> url relationship. Thanks
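The loop idea can work if it tests the cell itself with isinstance instead of indexing first, and if it appends a placeholder for missing rows so the result stays the same length as the DataFrame. A minimal sketch; the sample data and the "id" values are hypothetical stand-ins for the question's columns:

```python
import pandas as pd

# Hypothetical sample shaped like the question's column (NaN = no candidates).
df_myposts = pd.DataFrame({
    "id": [101, 102, 103, 104],  # hypothetical ids
    "image_versions2.candidates": [
        [{"width": 750, "height": 498, "url": "https:/XXX"}],
        float("nan"),
        [{"width": 750, "height": 498, "url": "https:/YYY"}],
        [{"width": 750, "height": 498, "url": "https:/ZZZ"}],
    ],
})

urls = []
for cell in df_myposts["image_versions2.candidates"]:
    # Check the cell, not cell[0]: indexing the NaN float is what raised the TypeError.
    if isinstance(cell, list) and cell:
        urls.append(cell[0]["url"])
    else:
        urls.append(None)  # keep one entry per row so id <-> url stays aligned

df_myposts["image_url"] = urls
print(df_myposts[["id", "image_url"]])
```

No rows are dropped: the row with NaN simply gets a missing image_url next to its id.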
Solution
Use:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [[{'width': 750, 'height': 498, 'url': 'https:/XXX'}], [{'width': 750, 'height': 498, 'url': 'https:/YYY'}], None]})
# df.dropna(inplace=True)  # this would drop the rows with null values
# to preserve rows with NaN, first replace NaN values with a scalar/dict value
df.fillna('null', inplace=True)
# list of URLs for list cells; the 'null' placeholder becomes ['null']
df['c'] = df['b'].apply(lambda x: [y['url'] for y in x] if isinstance(x, list) else ['null'])
df['c'] = df['c'].apply(lambda x: x[0])  # keep only the first URL from each list
# Output:
   a                                                  b           c
0  1  [{'width': 750, 'height': 498, 'url': 'https:/...  https:/XXX
1  2  [{'width': 750, 'height': 498, 'url': 'https:/...  https:/YYY
2  3                                               null        null
Answered By - amanb
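As an alternative to the fillna/apply approach above, pandas' .str accessor also works element-wise on lists and dicts: .str[0] takes the first element of each list and .str.get('url') reads a dict key, and missing cells come back as NaN instead of raising. A sketch on the same sample data, assuming a pandas version whose Series.str.get accepts dict keys (as documented for recent versions):

```python
import pandas as pd

# Same sample as the answer above, with None for the missing row.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [
        [{"width": 750, "height": 498, "url": "https:/XXX"}],
        [{"width": 750, "height": 498, "url": "https:/YYY"}],
        None,
    ],
})

# .str[0] picks the first dict from each list; .str.get("url") then reads its
# "url" key. Missing cells propagate as NaN rather than raising TypeError.
df["c"] = df["b"].str[0].str.get("url")
print(df[["a", "c"]])
```

Because nothing is filled in beforehand, column b keeps its original missing value and c holds a real NaN; df['c'].fillna('null') can be applied afterwards if a placeholder string is preferred.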