Issue
I have a dataframe with duplicate values in each row. I would like to keep only the values that are not duplicated, in new columns named Num1 through Num5. For instance:
import numpy as np
import pandas as pd
df1 = pd.DataFrame([[1,1,2,4,5,6,7,7],
[2,5,6,7,22,23,34,7],
[3,3,5,6,7,45,46,7],
[4,6,7,14,29,32,33,7],
[5,6,7,13,23,33,35,7],
[6,1,6,7,8,9,10,7],
[7,0,2,5,7,10,30,7]],
columns = ['Row_Num', 'Num1','Num2','Num3','Num4','Num5','Num6','Num7'])
I would like the result to look like this:
result = pd.DataFrame([[1,1,2,4,5,6],
[2,5,6,22,23,34],
[3,3,5,6,45,46],
[4,6,14,29,32,33],
[5,6,13,23,33,35],
[6,1,6,8,9,10],
[7,0,2,5,10,30]],
columns = ['Row_Num', 'Num1','Num2','Num3','Num4','Num5'])
Solution
To avoid transposing the input dataframe, you can iterate with iterrows instead of items (as in my other answer). Combined with drop_duplicates, as @Timeless proposed, this is a bit faster than the other answers currently posted:
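# keep=False removes every copy of a repeated value (e.g. both 7s), not only the extra ones
remaining_values = {row_num: values.drop_duplicates(keep=False).values
                    for row_num, values in df1.set_index('Row_Num').iterrows()}
# one row per Row_Num; the positional columns 0..4 become Num1..Num5
result = pd.DataFrame(remaining_values).T.rename(columns=lambda i: f"Num{i+1}")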
print(result)
Output:
   Num1  Num2  Num3  Num4  Num5
1     1     2     4     5     6
2     5     6    22    23    34
3     3     5     6    45    46
4     6    14    29    32    33
5     6    13    23    33    35
6     1     6     8     9    10
7     0     2     5    10    30
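The printed result keeps Row_Num as the index rather than as a regular column. If you need exactly the layout from the question, one option (a sketch added here, not part of the original answer) is to name the index with rename_axis and move it back into a column with reset_index:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 1, 2, 4, 5, 6, 7, 7],
                    [2, 5, 6, 7, 22, 23, 34, 7],
                    [3, 3, 5, 6, 7, 45, 46, 7],
                    [4, 6, 7, 14, 29, 32, 33, 7],
                    [5, 6, 7, 13, 23, 33, 35, 7],
                    [6, 1, 6, 7, 8, 9, 10, 7],
                    [7, 0, 2, 5, 7, 10, 30, 7]],
                   columns=['Row_Num', 'Num1', 'Num2', 'Num3', 'Num4', 'Num5', 'Num6', 'Num7'])

# keep=False drops every copy of a repeated value (here the two 7s in each row)
remaining_values = {row_num: values.drop_duplicates(keep=False).values
                    for row_num, values in df1.set_index('Row_Num').iterrows()}

result = (pd.DataFrame(remaining_values).T
          .rename(columns=lambda i: f"Num{i+1}")
          # name the index and turn it back into a regular Row_Num column
          .rename_axis('Row_Num')
          .reset_index())
print(result)
```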
As others have pointed out, the last step only works if every row ends up with the same number of non-duplicated values; otherwise pd.DataFrame raises a ValueError because the arrays have different lengths.
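If rows can have different numbers of non-duplicated values, one possible workaround (a sketch, not from the original answer; the helper name unique_per_row is hypothetical) is to keep each row's survivors as a Series renumbered from 0. Building a DataFrame from a dict of Series aligns them on that index and pads the shorter rows with NaN, at the cost of the Num columns becoming floats:

```python
import pandas as pd


def unique_per_row(df, id_col="Row_Num"):
    """Keep only the values that occur exactly once in each row (hypothetical helper)."""
    remaining = {
        # reset_index(drop=True) renumbers the survivors 0, 1, 2, ... so they line up by position
        row_id: values.drop_duplicates(keep=False).reset_index(drop=True)
        for row_id, values in df.set_index(id_col).iterrows()
    }
    # A dict of Series is aligned on the index, so shorter rows are padded with NaN
    result = pd.DataFrame(remaining).T
    result = result.rename(columns=lambda i: f"Num{i + 1}")
    # Turn the row ids back into a regular column
    return result.rename_axis(id_col).reset_index()


# Row 1 has a repeated 2 and row 2 has no repeats, so the rows end up with different lengths
ragged = pd.DataFrame([[1, 1, 2, 2, 3],
                       [2, 4, 5, 6, 7]],
                      columns=["Row_Num", "Num1", "Num2", "Num3", "Num4"])
print(unique_per_row(ragged))
```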
Answered By - Bill