Friday, August 26, 2022

[FIXED] New DataFrame column that contains IDs where value is outside bounds?

Issue

I have the following DataFrame:

import pandas as pd

data: dict[str, list[int]] = {
    "x1": [5, 6, 7, 8, 9],
    "min1": [3, 3, 3, 3, 3],
    "max1": [8, 8, 8, 8, 8],
    "x2": [0, 1, 2, 3, 4],
    "min2": [2, 2, 2, 2, 2],
    "max2": [7, 7, 7, 7, 7],
    "x3": [7, 6, 7, 6, 7],
    "min3": [1, 1, 1, 1, 1],
    "max3": [6, 6, 6, 6, 6],
}
n: int = 3  # number of xi
df: pd.DataFrame = pd.DataFrame(data=data)
print(df)

Output

   x1  min1  max1  x2  min2  max2  x3  min3  max3
0   5     3     8   0     2     7   7     1     6
1   6     3     8   1     2     7   6     1     6
2   7     3     8   2     2     7   7     1     6
3   8     3     8   3     2     7   6     1     6
4   9     3     8   4     2     7   7     1     6

I would like to add a new column alert to df that contains the IDs i where xi < mini or xi > maxi.

Expected result

   x1  min1  max1  x2  min2  max2  x3  min3  max3   alert
0   5     3     8   0     2     7   7     1     6   "2,3"
1   6     3     8   1     2     7   6     1     6     "2"
2   7     3     8   2     2     7   7     1     6     "3"
3   8     3     8   3     2     7   6     1     6      ""
4   9     3     8   4     2     7   7     1     6   "1,3"

I looked at this answer but could not understand how to apply it to my problem.
Below is my working implementation that I wish to improve.

def f(row: pd.Series) -> str:
    alert: str = ""
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert += f"{k}"
    return ",".join(list(alert))

df["alert"] = df.apply(f, axis=1)

Solution

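Given that you want the output as strings, your approach isn't bad. I would just suggest making alert a list rather than a string, since ",".join(list(alert)) splits the string into single characters and would break any ID of 10 or more (for example, 10 would come out as "1,0"):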

def f(row: pd.Series) -> str:
    alert: list = []
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert.append(f"{k}")

    return ",".join(alert)
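As a quick check, the list-based version can be run end to end on the sample data. The sketch below is an illustration rather than part of the original answer: it passes n as an argument instead of relying on a global, and the name alert_ids is hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "x1": [5, 6, 7, 8, 9], "min1": [3] * 5, "max1": [8] * 5,
    "x2": [0, 1, 2, 3, 4], "min2": [2] * 5, "max2": [7] * 5,
    "x3": [7, 6, 7, 6, 7], "min3": [1] * 5, "max3": [6] * 5,
})


def alert_ids(row: pd.Series, n: int) -> str:
    """Return the comma-separated IDs k whose x{k} lies outside [min{k}, max{k}].

    Hypothetical helper: same logic as f above, with n passed explicitly.
    """
    alert = []
    for k in range(1, n + 1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert.append(str(k))
    return ",".join(alert)


# Extra keyword arguments given to apply are forwarded to the function.
df["alert"] = df.apply(alert_ids, axis=1, n=3)
print(df["alert"].tolist())  # ['2,3', '2', '3', '', '1,3']
```

The printed list matches the alert column in the expected result from the question.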

For a vectorized, slightly fancier alternative, you can do:

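# Assumes the x, min and max columns appear in the same order (x1, x2, ... / min1, min2, ...).
xs = df.filter(regex='^x')
mins = df.filter(like='min').to_numpy()
maxes = df.filter(like='max').to_numpy()

# True where a value falls outside its [min, max] bounds.
mask = (xs < mins) | (xs > maxes)
# The dot product concatenates ",<id>" for each True cell; then strip the leading comma.
# regex=True is required in pandas >= 2.0, where str.replace treats the pattern literally by default.
df['alert'] = (mask @ xs.columns.str.replace('x', ',')).str.replace('^,', '', regex=True)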
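If the matrix-product trick feels too opaque, the same mask can be turned into strings with plain NumPy indexing. The following is a minimal sketch and not the answerer's method: it assumes the columns are named x{k}, min{k} and max{k} for k = 1..n as in the question, and it selects them by name so the result does not depend on column order. The variable names ids and out_of_bounds are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "x1": [5, 6, 7, 8, 9], "min1": [3] * 5, "max1": [8] * 5,
    "x2": [0, 1, 2, 3, 4], "min2": [2] * 5, "max2": [7] * 5,
    "x3": [7, 6, 7, 6, 7], "min3": [1] * 5, "max3": [6] * 5,
})
n = 3  # number of x/min/max triples, as in the question

# Select each group of columns explicitly by name so the k-th columns line up.
ks = range(1, n + 1)
xs = df[[f"x{k}" for k in ks]].to_numpy()
mins = df[[f"min{k}" for k in ks]].to_numpy()
maxes = df[[f"max{k}" for k in ks]].to_numpy()

# Boolean array of shape (rows, n): True where a value is outside its bounds.
out_of_bounds = (xs < mins) | (xs > maxes)

# For each row, keep the IDs flagged True and join them with commas.
ids = np.array([str(k) for k in ks])
df["alert"] = [",".join(ids[row]) for row in out_of_bounds]
print(df["alert"].tolist())  # ['2,3', '2', '3', '', '1,3']
```

The comparison is still done in one vectorized step; only the final string building loops over rows, which also keeps multi-digit IDs intact.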

Answered By - Quang Hoang

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
