Thursday, August 18, 2022

[FIXED] Extract 'url' value from Pandas Series

August 18, 2022 pandas, python No comments

Issue

I have a Pandas DataFrame with the following column called "image_versions2.candidates":

df_myposts['image_versions2.candidates']

That gives me:

0      [{'width': 750, 'height': 498, 'url': 'https:/XXX'}]
1                                                    NaN
2      [{'width': 750, 'height': 498, 'url': 'https:/YYY'}]
3      [{'width': 750, 'height': 498, 'url': 'https:/ZZZ'}]

I'm trying to extract the url into a new column called, for example, 'image_url'.

I can extract a single URL with the following code:

df_myposts['image_versions2.candidates'][0][0]['url']

'https:/XXX'

But for the second row it gives me the following error because of the NaN value:

df_myposts['image_versions2.candidates'][1][0]['url']

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-3f0532195cb7> in <module>
----> 1 df_myposts['image_versions2.candidates'][1][0]['url']

TypeError: 'float' object is not subscriptable
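
The error arises because pandas represents the missing entry as NaN, which is an ordinary Python float, so indexing it with [0] fails. A minimal sketch that reproduces this without the DataFrame (the name `cell` is made up for illustration):

```python
import math

cell = float('nan')  # stand-in for the missing value in row 1

print(type(cell).__name__)  # the missing value is a plain float
print(math.isnan(cell))     # and it is NaN

try:
    cell[0]  # same operation as [1][0] in the failing line
except TypeError as exc:
    message = str(exc)
    print(message)
```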

I have tried a loop with an if condition, but I get similar error messages:

for i in df_myposts['image_versions2.candidates']:
    if type(i[0]) == 'list':
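
The check above probably cannot succeed for two reasons: type(...) returns a type object, which never equals the string 'list', and i[0] is evaluated before the comparison, so the NaN row raises the same TypeError. Below is a sketch of a loop that tests the cell itself with isinstance, assuming every non-missing cell is a non-empty list of dicts with a 'url' key. The sample Series `candidates` and the name `image_urls` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df_myposts['image_versions2.candidates']
candidates = pd.Series([
    [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
    np.nan,
    [{'width': 750, 'height': 498, 'url': 'https:/YYY'}],
])

image_urls = []
for cell in candidates:
    # Test the cell itself, not cell[0], so NaN is never indexed
    if isinstance(cell, list) and cell:
        image_urls.append(cell[0]['url'])
    else:
        image_urls.append(None)  # keep the row, mark the url as missing

print(image_urls)
```

Because the list keeps one entry per row, it can be assigned back as a new column without losing the id <-> url alignment.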

What would be the best way to do this without dropping the NaN rows? I have another column with the id, so I want to keep the id <-> url relation. Thanks

Solution

Use:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[[{'width': 750, 'height': 498, 'url': 'https:/XXX'}], [{'width': 750, 'height': 498, 'url': 'https:/YYY'}], None]})
# df.dropna(inplace=True)  # this would drop the rows with null values
# To preserve rows with NaN, first replace the missing values in 'b' with a placeholder
df['b'] = df['b'].fillna('null')
# Take the url from the first dict when the cell is a list; otherwise keep the placeholder
df['c'] = df['b'].apply(lambda x: x[0]['url'] if isinstance(x, list) else 'null')

#Output:
    a                        b                                   c
0   1   [{'width': 750, 'height': 498, 'url': 'https:/...   https:/XXX
1   2   [{'width': 750, 'height': 498, 'url': 'https:/...   https:/YYY
2   3                       null                                null
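
As an alternative, here is a sketch that leaves the missing row as NaN instead of the string 'null' and does not modify the original column. It relies on the .str accessor, which can also index into lists and dicts stored in an object column. The column name 'image_url' comes from the question; the sample frame and the other column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'candidates': [
        [{'width': 750, 'height': 498, 'url': 'https:/XXX'}],
        np.nan,
        [{'width': 750, 'height': 498, 'url': 'https:/ZZZ'}],
    ],
})

# .str[0] takes the first element of each list; .str.get('url') looks up
# the key in each resulting dict. Missing cells pass through as NaN.
df['image_url'] = df['candidates'].str[0].str.get('url')

print(df[['id', 'image_url']])
```

Keeping NaN rather than a placeholder string means later calls such as isna() or dropna() still recognise the row as missing.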

Answered By - amanb

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
