Issue
I am trying to remove 'United States' from the lists stored in the Organizations column, for every row of a large dataset.
My df looks something like this:
ID | Organizations |
---|---|
1 | ['education', 'health', 'United States', 'facebook'] |
2 | ['health', 'Airlines', 'WHO', 'United States'] |
...
I want my output to look like this:
ID | Organizations |
---|---|
1 | ['education', 'health', 'facebook'] |
2 | ['health', 'Airlines', 'WHO'] |
The code I tried:
df=df['organizations'].remove("United States")
gave me the following error:
AttributeError: 'Series' object has no attribute 'remove'
Solution
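You need to loop over the rows here, for example with apply. Note that list.remove modifies each list in place and returns None, so the result must not be assigned back to the column:
df['Organizations'].apply(lambda l: l.remove('United States'))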
Or a list comprehension:
df['Organizations'] = [[x for x in l if x != 'United States'] for l in df['Organizations']]
Output:
ID Organizations
0 1 [education, health, facebook]
1 2 [health, Airlines, WHO]
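Note that the first approach raises a ValueError for any row whose list does not contain 'United States', and it only removes the first occurrence in each list. The list comprehension has neither limitation.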
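If you prefer apply over a list comprehension, one way to avoid the ValueError is to return a new filtered list instead of calling remove. This is only a minimal sketch: the helper name drop_value and the third row (which has no 'United States') are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data; the third row is a hypothetical case without 'United States'.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Organizations': [
        ['education', 'health', 'United States', 'facebook'],
        ['health', 'Airlines', 'WHO', 'United States'],
        ['WHO', 'health'],
    ],
})

def drop_value(items, value='United States'):
    """Return a new list without any occurrence of `value` (hypothetical helper)."""
    return [x for x in items if x != value]

# apply returns a new Series here, so the result is assigned back to the column.
df['Organizations'] = df['Organizations'].apply(drop_value)
print(df)
```

Because the helper builds a new list, rows without 'United States' pass through unchanged rather than raising an error.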
Handling NaNs:
df['Organizations'] = [[x for x in l if x != 'United States']
                       if isinstance(l, list) else l
                       for l in df['Organizations']]
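To see the isinstance guard at work, here is a small sketch on a frame where one row holds NaN instead of a list. The sample data is an assumption made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the second row has a missing value instead of a list.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Organizations': [
        ['education', 'health', 'United States', 'facebook'],
        np.nan,
        ['health', 'Airlines', 'WHO', 'United States'],
    ],
})

# Filter only real lists; anything else (such as NaN) is passed through as-is.
df['Organizations'] = [
    [x for x in l if x != 'United States'] if isinstance(l, list) else l
    for l in df['Organizations']
]
print(df)
```

Without the isinstance check, iterating over the NaN entry would raise a TypeError, since a float is not iterable.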
Used input:
import pandas as pd

df = pd.DataFrame({'ID': [1, 2],
                   'Organizations': [['education', 'health', 'United States', 'facebook'],
                                     ['health', 'Airlines', 'WHO', 'United States']]})
Answered By - mozway