Saturday, December 9, 2023

[FIXED] How to return more than 1 of the same values in each row

December 09, 2023 dataframe, jupyter-notebook, list, python, spyder No comments

Issue

I have a data frame with the same values in some of the rows. I would like to return the row number and values that has more than 1 of the same value starting from Num1 to Num7 in df.
Also would like to return the row number and same value. Either or I'm okay with.

For example

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,1,2,4,5,6,7,7],
            [2,5,6,7,22,23,34,48],
            [3,3,5,6,7,45,46,48],
            [4,6,7,14,29,32,6,29], # duplicate 6 and 29
            [5,6,7,13,23,33,35,7], # duplicate 7 but nothing else
            [6,1,6,7,8,9,10,8],
            [7,0,2,5,7,19,7,5]], # duplicate 7,5 
            columns = ['Row_Num', 'Num1','Num2','Num3','Num4','Num5','Num6','Num7'])

First result

df2 = pd.DataFrame([[4,6,7,14,29,32], # Return 6 and 29
                   [7,0,2,5,7,19]], # Return 7,5 
            columns = ['Row_Num', 'Num1','Num2','Num3','Num4','Num5'])

Second result

df3 = pd.DataFrame([[4,6,29], # Return 6 and 29
                   [7,7,5]], # Return 7,5 
            columns = ['Row_Num', 'Num1','Num2'])

Solution

IIUC you can use pd.duplicated with a some data manipulation to get your result:

df = df.set_index('Row_Num') # set index
df_duplicated = df.transform(lambda x: x.duplicated(), axis=1) # returns if rows have duplicates

# First result where 
res1 = df[df_duplicated.sum(axis=1) >= 2][~df_duplicated[df_duplicated.sum(axis=1) >= 2]].dropna(axis=1)

# second result
res2 = df[df_duplicated.sum(axis=1) >= 2][df_duplicated[df_duplicated.sum(axis=1) >= 2]].dropna(axis=1)

Output:

result1

         Num1  Num2  Num3  Num4  Num5
Row_Num                              
4           6     7    14    29    32
7           0     2     5     7    19

result2:

         Num6  Num7
Row_Num            
4           6    29
7           7     5

To exactly match your output just reset_index and rename column names for the second result.

Answered By - Suraj Shourie

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Saturday, December 9, 2023

[FIXED] How to return more than 1 of the same values in each row

Issue

For example

First result

Second result

Solution

0 comments:

Post a Comment

Popular Posts

Labels