Issue
I have a dataframe with some duplicate rows (duplicated on two columns, t1 and t2). For each set of duplicates I want to keep only one row: the one with the lowest value of a score calculated from three other columns, n, m and c.
import pandas as pd
df = pd.DataFrame({
    "t1": [1, 1, 1, 1, 1, 1],
    "t2": [1, 2, 2, 3, 4, 4],
    "x": [1.01, 0.66, 1.01, 0.45, 0.89, 0.64],
    "y": [0.23, 0.31, 0.06, 1.12, 0.70, 0.60],
    "z": [0.06, 1.07, 0.12, 0.20, 0.62, 0.68],
    "n": [6, 6, 7, 6, 7, 7],
    "m": [0.21, 1.19, 0.81, 1.18, 0.28, 0.67],
    "c": [64.4, 64.4, 63.2, 65.6, 63.2, 63.2]
})
Rows 1 and 2 (by index) are duplicates, as are rows 4 and 5. With w computed as
w = (12/df['n'])*0.4 + (df['m']/0.35)*0.2 + (df['c']/150)*0.4
I want to keep, for each set of duplicates, the row with the lowest w (see the result below).
I can drop the unwanted rows with the following code, which gives me the final df I am after.
# adding a column with temporary values
df['w'] = (12/df['n'])*0.4 + (df['m']/0.35)*0.2 + (df['c']/150)*0.4
# create a df with the duplicated rows
dfd = df[df.duplicated(['t1', 't2'], keep=False) == True]
# initializing a list with rows (indexes) to drop
rows_to_drop = []
# groupby returns a group (g) and df (dfg)
for g, dfg in df.groupby(['t1', 't2']):
    # only groups with two or more rows
    if len(dfg) > 1:
        # get the index of the row with highest w, the one to drop
        idx = dfg[dfg['w'] == dfg['w'].max()].index
        rows_to_drop.append(idx[0])
# drop the rows
df = df.drop(index=rows_to_drop)
However, the code feels cumbersome. For instance, I'm adding a temporary column, w, just to hold the value to compare.
I would appreciate suggestions on how to improve this.
Solution
You can use groupby.idxmin on w (the Series computed in the question, without adding it to df):
out = df.loc[w.groupby([df['t1'], df['t2']]).idxmin()]
Output:
   t1  t2     x     y     z  n     m     c
0   1   1  1.01  0.23  0.06  6  0.21  64.4
2   1   2  1.01  0.06  0.12  7  0.81  63.2
3   1   3  0.45  1.12  0.20  6  1.18  65.6
4   1   4  0.89  0.70  0.62  7  0.28  63.2
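For reference, here is a minimal self-contained sketch of the same idea, so it can be run as-is. It assumes the sample data from the question, abridged to the columns that matter here (x, y and z are left out because they do not affect w); the name keep is only illustrative.

```python
import pandas as pd

# Sample data from the question, without x, y, z (they do not enter w)
df = pd.DataFrame({
    "t1": [1, 1, 1, 1, 1, 1],
    "t2": [1, 2, 2, 3, 4, 4],
    "n": [6, 6, 7, 6, 7, 7],
    "m": [0.21, 1.19, 0.81, 1.18, 0.28, 0.67],
    "c": [64.4, 64.4, 63.2, 65.6, 63.2, 63.2],
})

# w is a standalone Series sharing df's index; df itself gets no extra column
w = (12 / df["n"]) * 0.4 + (df["m"] / 0.35) * 0.2 + (df["c"] / 150) * 0.4

# For each (t1, t2) group, idxmin gives the index label of the smallest w
keep = w.groupby([df["t1"], df["t2"]]).idxmin()

# Select exactly those rows from the original frame
out = df.loc[keep]
print(out)
```

Because idxmin returns one label per group, this keeps exactly one row per (t1, t2) pair, however many duplicates a group contains.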
Or, if you also want to have w in the output:
df['w'] = w
out = df.loc[df.groupby(['t1', 't2'])['w'].idxmin()]
Output:
   t1  t2     x     y     z  n     m     c         w
0   1   1  1.01  0.23  0.06  6  0.21  64.4  1.091733
2   1   2  1.01  0.06  0.12  7  0.81  63.2  1.317105
3   1   3  0.45  1.12  0.20  6  1.18  65.6  1.649219
4   1   4  0.89  0.70  0.62  7  0.28  63.2  1.014248
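As a possible alternative (not part of the answer above, just a different technique that should give the same rows for this data): sort by w and let drop_duplicates keep the first row of each (t1, t2) pair. This sketch again assumes the question's sample data, abridged to the relevant columns; w only exists as a temporary column inside the chain.

```python
import pandas as pd

# Sample data from the question, without x, y, z (they do not enter w)
df = pd.DataFrame({
    "t1": [1, 1, 1, 1, 1, 1],
    "t2": [1, 2, 2, 3, 4, 4],
    "n": [6, 6, 7, 6, 7, 7],
    "m": [0.21, 1.19, 0.81, 1.18, 0.28, 0.67],
    "c": [64.4, 64.4, 63.2, 65.6, 63.2, 63.2],
})

out = (
    # assign returns a copy, so the original df is left without a w column
    df.assign(w=(12 / df["n"]) * 0.4 + (df["m"] / 0.35) * 0.2 + (df["c"] / 150) * 0.4)
      # smallest w first, so the first row seen per (t1, t2) is the one to keep
      .sort_values("w")
      .drop_duplicates(["t1", "t2"], keep="first")
      # remove the helper column and restore the original row order
      .drop(columns="w")
      .sort_index()
)
print(out)
```

If two duplicates share exactly the same w, which of them survives may differ between this variant and idxmin, so the idxmin version is the safer default when ties matter.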
Answered By - mozway