Issue
I have the following DataFrame:
import pandas as pd

data: dict[str, list[int]] = {
    "x1": [5, 6, 7, 8, 9],
    "min1": [3, 3, 3, 3, 3],
    "max1": [8, 8, 8, 8, 8],
    "x2": [0, 1, 2, 3, 4],
    "min2": [2, 2, 2, 2, 2],
    "max2": [7, 7, 7, 7, 7],
    "x3": [7, 6, 7, 6, 7],
    "min3": [1, 1, 1, 1, 1],
    "max3": [6, 6, 6, 6, 6],
}
n: int = 3  # number of xi
df: pd.DataFrame = pd.DataFrame(data=data)
print(df)
Output
   x1  min1  max1  x2  min2  max2  x3  min3  max3
0   5     3     8   0     2     7   7     1     6
1   6     3     8   1     2     7   6     1     6
2   7     3     8   2     2     7   7     1     6
3   8     3     8   3     2     7   6     1     6
4   9     3     8   4     2     7   7     1     6
I would like to add a new column alert to df that contains the IDs i where x_i < min_i or x_i > max_i.
Expected result
   x1  min1  max1  x2  min2  max2  x3  min3  max3  alert
0   5     3     8   0     2     7   7     1     6  "2,3"
1   6     3     8   1     2     7   6     1     6    "2"
2   7     3     8   2     2     7   7     1     6    "3"
3   8     3     8   3     2     7   6     1     6     ""
4   9     3     8   4     2     7   7     1     6  "1,3"
I looked at this answer but could not understand how to apply it to my problem.
Below is my working implementation that I wish to improve.
def f(row: pd.Series) -> str:
    alert: str = ""
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert += f"{k}"
    return ",".join(list(alert))

df["alert"] = df.apply(f, axis=1)
Solution
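Actually, given that your output is strings, your approach isn't too bad. I would just suggest making alert a list, not a string, so that an ID with more than one digit (n >= 10) is not split into separate characters by list(alert):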
def f(row: pd.Series) -> str:
    alert: list = []
    for k in range(1, n+1):
        if row[f"x{k}"] < row[f"min{k}"] or row[f"x{k}"] > row[f"max{k}"]:
            alert.append(f"{k}")
    return ",".join(alert)
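To illustrate why the list matters, here is a small standalone sketch. The IDs 10 and 12 are made up for the demonstration (the question only has n = 3, where both variants give the same result); it compares accumulating into a string with accumulating into a list.

```python
# Hypothetical flagged IDs, chosen to have two digits each.
flagged = [10, 12]

as_string = ""
as_list = []
for k in flagged:
    as_string += f"{k}"      # original approach: one growing string
    as_list.append(f"{k}")   # suggested approach: one list entry per ID

# list("1012") yields single characters, so the IDs are torn apart.
joined_from_string = ",".join(list(as_string))
joined_from_list = ",".join(as_list)

print(joined_from_string)  # 1,0,1,2
print(joined_from_list)    # 10,12
```

With single-digit IDs the two results coincide, which is why the original function works for n = 3.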
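A fancier, vectorized alternative:
xs = df.filter(regex='^x')
mins = df.filter(like='min').to_numpy()
maxes = df.filter(like='max').to_numpy()
# True where a value falls outside its [min, max] range
mask = (xs < mins) | (xs > maxes)
# The boolean-by-string dot product concatenates ',k' for every flagged column k.
# regex=True is needed so that '^,' strips the leading comma (pandas >= 2.0 defaults to regex=False).
df['alert'] = (mask @ xs.columns.str.replace('x', ',')).str.replace('^,', '', regex=True)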
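If the matrix-product trick feels too opaque, a more explicit variant is possible. The following is only a sketch, not part of the original answer: it assumes the columns are named exactly x{k}, min{k}, max{k} for k = 1..n, builds the same boolean mask with NumPy, and then joins the flagged IDs row by row in plain Python.

```python
import pandas as pd

data = {
    "x1": [5, 6, 7, 8, 9], "min1": [3] * 5, "max1": [8] * 5,
    "x2": [0, 1, 2, 3, 4], "min2": [2] * 5, "max2": [7] * 5,
    "x3": [7, 6, 7, 6, 7], "min3": [1] * 5, "max3": [6] * 5,
}
df = pd.DataFrame(data)
n = 3  # number of xi

# Select the columns explicitly by name (assumed naming scheme).
ids = [str(k) for k in range(1, n + 1)]
xs = df[[f"x{k}" for k in ids]].to_numpy()
mins = df[[f"min{k}" for k in ids]].to_numpy()
maxes = df[[f"max{k}" for k in ids]].to_numpy()

# mask[r, c] is True when x{c+1} in row r lies outside its bounds.
mask = (xs < mins) | (xs > maxes)

# Join the IDs of the flagged columns for each row.
df["alert"] = [
    ",".join(i for i, hit in zip(ids, row) if hit)
    for row in mask
]

print(df["alert"].tolist())  # ['2,3', '2', '3', '', '1,3']
```

The comparison is still vectorized; only the string joining happens in a Python loop, which avoids relying on how pandas handles a dot product between booleans and strings.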
Answered By - Quang Hoang