Issue
I have a problem with showing every matching value in the same row as my lookup value. For example, my first table:
some_primary_key |
---|
unique_value_1 |
unique_value_2 |
unique_value_3 |
other table:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1 |
unique_value_1 | some_value_2 |
unique_value_2 | some_value_3 |
unique_value_2 | some_value_4 |
unique_value_3 | some_value_5 |
unique_value_3 | some_value_6 |
and finally I'd like to have this:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1, some_value_2 |
unique_value_2 | some_value_3, some_value_4 |
unique_value_3 | some_value_5, some_value_6 |
Should I use a list comprehension iterating through the df items and build a list of lists of matching values? Any ideas?
Answer:
Here is my sample solution:
import pandas as pd

data = {'some_primary_key': ['unique_value_1',
                             'unique_value_2',
                             'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data=data)

list_of_values = []
for item in df['some_primary_key']:
    filtered_values = df[df['some_primary_key'] == item]
    list_of_values.append(','.join(x for x in filtered_values['values']))
df['values'] = list_of_values
df = df.drop_duplicates()
print(df)
Any other, neater solutions? :)
Solution
What you are looking for is `groupby`. After grouping over `some_primary_key`, you can apply any custom function with `transform`. You can try this:
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)
Then `df` will have the concatenated `values` for each `some_primary_key`, which leaves duplicated rows. Therefore, just remove them:
df = df.drop_duplicates()
Output:
some_primary_key values
0 unique_value_1 some_value_1,some_value_2
1 unique_value_2 some_value_3,some_value_4
2 unique_value_3 some_value_5,some_value_6
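Putting the answer's steps together, here is a runnable end-to-end sketch using the sample data from the question:

```python
import pandas as pd

# Sample data from the question; each key appears twice
data = {'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data)

# Per group: sort, de-duplicate, then join into one comma-separated string.
# transform broadcasts the result back to every row of the group.
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)

# Every row in a group is now identical, so keep one per key
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```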
PS: In `concat_func`, the `sort_values()` and `unique()` methods are applied to give a nicer view and to prevent the same value from appearing twice in a row. Otherwise, if `df` is:
some_primary_key values
0 unique_value_1 some_value_1
1 unique_value_1 some_value_1
the output will be:
some_primary_key values
0 unique_value_1 some_value_1,some_value_1
If that is the desired output, just use the following `concat_func`:
concat_func = lambda x: ','.join(map(str, x))
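As a side note (not part of the original answer), the same result can also be reached in one step with `agg` instead of `transform` + `drop_duplicates`, since aggregation already produces one row per key; a minimal sketch on the question's sample data:

```python
import pandas as pd

data = {'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data)

# Pre-sort so each group's values are joined in ascending order,
# then aggregate each group directly into one comma-separated string
out = (df.sort_values('values')
         .groupby('some_primary_key')['values']
         .agg(','.join)
         .reset_index())
print(out)
```

This avoids recomputing the joined string for every row of a group, which `transform` does.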
Answered By - Ersel Er