Issue
I have a data frame and I want to remove some rows if their value is not equal to some values that I have stored in a list.
So I have a list variable stating the values of objects I want to keep:
allowed_values = ["value1", "value2", "value3"]
And I am attempting to remove rows from my dataframe if a certain column does not contain one of the allowed_values. At first I was using a for loop and if statement like this:
for index, row in df.iterrows():
    if row["Type"] not in allowed_values:
        pass  # drop the row
I was about to find out how to do this, but then I found out about the .loc() method and thought it would be better to use that.
So using the .loc() method, I can do something like this to only keep objects that have a Type value equal to value1:
df = df.loc[df["Type"] == "value1"]
But I want to keep all objects that have a Type in the allowed_values list. I tried to do this:
df = df.loc[df["Type"] in allowed_values]
but this gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would expect this to still work, as using the in operator or the not in operator still results in a boolean, so I'm not sure why the .loc() method doesn't like these operators.
What exactly is wrong with using the in or not in operators in the .loc() method, and how can I create a logical statement that will drop rows if their Type value is not in the allowed_values list?
EDIT: I found this question asking about the same error I got, and the answer was that you need to use bitwise operators only (e.g. ==, !=, &, |, etc.); not and in are not bitwise operators and require something called "truth-values". So I think the only way to get the functionality I want is to just have a lengthy bitwise logical expression, something like:
df = df.loc[(df["Type"] == "value1") | (df["Type"] == "value2") | (df["Type"] == "value3")]
Is there no other way to check that each value is in the allowed_values list? This would make my code a lot neater (I have more than 3 values in the list, so this is a lengthy line).
Solution
Try this:
import pandas as pd
allowed_values = ['White', 'Green', 'Red']
df = pd.DataFrame({'color': ['White', 'Black', 'Green', 'White']})
df = df[df['color'].isin(allowed_values)]
df
   color
0  White
2  Green
3  White
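As for why the original attempt raised an error, here is a minimal sketch using the question's column name and list (the sample rows are made up for illustration). Python's in operator on a list compares the left-hand side to each element with == and then needs a single True or False. When the left-hand side is a Series, == returns another Series, and pandas refuses to reduce that to one boolean. isin performs the membership test element by element instead.

```python
import pandas as pd

allowed_values = ["value1", "value2", "value3"]
# Hypothetical sample data, not from the question.
df = pd.DataFrame({"Type": ["value1", "other", "value3"]})

# `in` needs one truth value, but comparing a Series with == yields a Series,
# so pandas raises the "truth value of a Series is ambiguous" ValueError.
try:
    df["Type"] in allowed_values
    raised = False
except ValueError:
    raised = True

# isin returns a boolean Series with one entry per row, which .loc accepts.
mask = df["Type"].isin(allowed_values)
kept = df.loc[mask]
print(raised)
print(kept)
```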
If you must use .loc then you can use:
df = df.loc[df['color'].isin(allowed_values)]
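The question also mentions not in. A possible sketch of that case, reusing the example data above: the ~ operator inverts a boolean mask, so it can stand in for not in. The variable names kept, dropped and kept_reindexed are only illustrative, and the reset_index call is optional, for when you want the remaining rows renumbered from 0.

```python
import pandas as pd

allowed_values = ["White", "Green", "Red"]
df = pd.DataFrame({"color": ["White", "Black", "Green", "White"]})

mask = df["color"].isin(allowed_values)

kept = df.loc[mask]       # rows whose color is in allowed_values
dropped = df.loc[~mask]   # rows whose color is not in allowed_values

# Filtering keeps the original index labels (0, 2, 3 here);
# reset_index(drop=True) renumbers them if that matters to you.
kept_reindexed = kept.reset_index(drop=True)
print(kept_reindexed)
```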
Answered By - gtomer