Tuesday, January 9, 2024

[FIXED] check value in column is list type

January 09, 2024 dataframe, pandas, python

Issue

I have a DataFrame with two columns, field and value, and I run a different check on each field's value. For field a, I need to check that its corresponding value is always a list, and I store the result of each check in a status column.

Here is the code:

import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

data = {
    "field": ["a", "b", "c"],
    "value": [[1, "na", -99], 20, 80],
}

df = pd.DataFrame(data)

print("Initial DF")
print(f"{df=}")

condlist = [df["field"] == "a", df["field"] == "b", df["field"] == "c"]

choicelist = [
    df["value"].apply(is_list_like).any(),
    df["value"].isin([10, 20, 30, 40]),
    df["value"].between(50, 100),
]

df["status"] = np.select(condlist, choicelist, False)

print("After check DF")
print(f"{df=}")

But I get this error:

 df["value"].between(50, 100),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '>=' not supported between instances of 'list' and 'int'

What am I missing?

Solution

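The error occurs because every expression in choicelist is evaluated over the whole value column before np.select picks a result for each row. So between(50, 100) also runs on the list in row "a", even though that row only ever uses the first choice. You can avoid this by converting all non-numeric values in df["value"] to NaN with pd.to_numeric(..., errors='coerce'). That way between won't throw an error, and status will be updated correctly.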

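choicelist = [
    df["value"].apply(is_list_like),  # per-row check; .any() would collapse it to one bool shared by every "a" row
    df["value"].isin([10, 20, 30, 40]),
    # Non-numeric entries (such as the list) become NaN, so between() no longer compares a list with an int
    pd.to_numeric(df["value"], errors='coerce').between(50, 100),
]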
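For reference, here is a minimal end-to-end sketch of the corrected script, assuming the same sample data as in the question. It first reproduces the root cause (between() on the raw object column raises TypeError before np.select ever runs), then applies the coercion. It checks the list type per row rather than with .any(), so each "a" row gets its own result.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame(
    {
        "field": ["a", "b", "c"],
        "value": [[1, "na", -99], 20, 80],
    }
)

# Root cause: this choice is computed for every row, including the list in row "a",
# and comparing a list with an int raises TypeError.
try:
    df["value"].between(50, 100)
    original_raised = False
except TypeError:
    original_raised = True

condlist = [df["field"] == "a", df["field"] == "b", df["field"] == "c"]

# Non-numeric entries (the list) are coerced to NaN, so between() works on every row.
numeric = pd.to_numeric(df["value"], errors="coerce")

choicelist = [
    df["value"].apply(is_list_like),     # per-row list check, used for field "a"
    df["value"].isin([10, 20, 30, 40]),  # allowed values, used for field "b"
    numeric.between(50, 100),            # range check, used for field "c"
]

# Rows matching no condition fall back to the default, False.
df["status"] = np.select(condlist, choicelist, False)
print(df)
```

Coercing to NaN is safe here because NaN compares False in between(), and np.select only reads the coerced values for rows whose field is "c".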

However, if possible, it may be worth restructuring the data so that each column stores only one data type.
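A minimal sketch of that restructuring, assuming the list-valued and numeric entries can be split into two separate columns. The column names list_value and num_value are hypothetical and not part of the original data. Because num_value is a plain float column, between() never sees a list and no coercion is needed.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

# Hypothetical layout: list_value holds lists (or None), num_value holds numbers (or NaN).
df = pd.DataFrame(
    {
        "field": ["a", "b", "c"],
        "list_value": [[1, "na", -99], None, None],
        "num_value": [np.nan, 20, 80],
    }
)

condlist = [df["field"] == "a", df["field"] == "b", df["field"] == "c"]
choicelist = [
    df["list_value"].apply(is_list_like),    # None is not list-like, so it yields False
    df["num_value"].isin([10, 20, 30, 40]),  # float column, so a plain numeric membership test
    df["num_value"].between(50, 100),        # no list/int comparison can occur here
]
df["status"] = np.select(condlist, choicelist, False)
print(df)
```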

Answered By - Maria K

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
