Issue
I have a dataframe with two columns, field and value. I am performing some checks on each field's value. For field a, I need to check that its corresponding value is always a list type, and store the result in the status column.
Below is the code:
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like
data = {
    "field": ["a", "b", "c"],
    "value": [[1, "na", -99], 20, 80],
}
df = pd.DataFrame(data)
print("Initial DF")
print(f"{df=}")
condlist = [df["field"] == "a", df["field"] == "b", df["field"] == "c"]
choicelist = [
    df["value"].apply(is_list_like).any(),
    df["value"].isin([10, 20, 30, 40]),
    df["value"].between(50, 100),
]
df["status"] = np.select(condlist, choicelist, False)
print("After check DF")
print(f"{df=}")
But I am getting this error:
df["value"].between(50, 100),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>=' not supported between instances of 'list' and 'int'
What am I missing?
Solution
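Every entry of choicelist is computed over the whole column before np.select runs, so between is also applied to the list stored for field a, which raises the TypeError. You can convert all non-numeric values in df["value"] to NaN by applying pd.to_numeric with errors='coerce' to the column. That way between won't throw an error and status will be updated correctly.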
choicelist = [
    df["value"].apply(is_list_like).any(),
    df["value"].isin([10, 20, 30, 40]),
    pd.to_numeric(df["value"], errors='coerce').between(50, 100),
]
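For reference, here is a minimal end-to-end sketch using the sample data from the question. It departs from the snippet above in one place, and that change is an assumption about the intent rather than part of the original answer: the list check is done per row (without .any()), so each row is judged on its own value instead of on whether any row in the column holds a list. The numeric_value name is only illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame({
    "field": ["a", "b", "c"],
    "value": [[1, "na", -99], 20, 80],
})

# Non-numeric entries (here, the list in row "a") become NaN instead of raising.
numeric_value = pd.to_numeric(df["value"], errors="coerce")

condlist = [df["field"] == "a", df["field"] == "b", df["field"] == "c"]
choicelist = [
    # Assumption: per-row check, so .any() is dropped.
    df["value"].apply(is_list_like),
    numeric_value.isin([10, 20, 30, 40]),
    # Comparisons against NaN are False, so the list row cannot break this check.
    numeric_value.between(50, 100),
]

df["status"] = np.select(condlist, choicelist, default=False)
print(df)
```

With this sample data, every row should end up with status True: row a holds a list, 20 is in the allowed set, and 80 lies between 50 and 100.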
However, it may be worth rethinking the data structure, if possible, and storing only one data type per column.
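One way to picture that suggestion is sketched below. The column names list_value and numeric_value are hypothetical, and the split is only one possible layout, not something the original answer prescribes.

```python
import pandas as pd
from pandas.api.types import is_list_like

raw = pd.DataFrame({
    "field": ["a", "b", "c"],
    "value": [[1, "na", -99], 20, 80],
})

# Hypothetical layout: list entries and numeric entries live in separate columns.
list_value = [v if is_list_like(v) else None for v in raw["value"]]
numeric_value = [None if is_list_like(v) else v for v in raw["value"]]

tidy = pd.DataFrame({
    "field": raw["field"],
    "list_value": list_value,
    # Missing entries become NaN, so the column has a single numeric dtype.
    "numeric_value": pd.to_numeric(pd.Series(numeric_value), errors="coerce"),
})

# Numeric checks now run on a purely numeric column and cannot hit a list.
in_range = tidy["numeric_value"].between(50, 100)
print(tidy.assign(in_range=in_range))
```

With the types separated, each check only ever sees values of the type it expects, so no coercion is needed at check time.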
Answered By - Maria K