Issue
I have a problem with showing every matching value in the same row as my lookup value. For example, my first table:
some_primary_key |
---|
unique_value_1 |
unique_value_2 |
unique_value_3 |
other table:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1 |
unique_value_1 | some_value_2 |
unique_value_2 | some_value_3 |
unique_value_2 | some_value_4 |
unique_value_3 | some_value_5 |
unique_value_3 | some_value_6 |
and finally I'd like to have this:
some_primary_key | values |
---|---|
unique_value_1 | some_value_1, some_value_2 |
unique_value_2 | some_value_3, some_value_4 |
unique_value_3 | some_value_5, some_value_6 |
Should I use a list comprehension iterating through the df items and build a list of lists of matching values? Any ideas?
Answer:
Here is my sample solution:
import pandas as pd

data = {'some_primary_key': ['unique_value_1',
                             'unique_value_2',
                             'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data=data)

list_of_values = []
for item in df['some_primary_key']:
    filtered_values = df[df['some_primary_key'] == item]
    list_of_values.append(','.join(x for x in filtered_values['values']))
df['values'] = list_of_values
df = df.drop_duplicates()
print(df)
Any other, neater solutions? :)
Solution
What you are looking for is `groupby`. After grouping over `some_primary_key`, you can apply any custom function with `transform`. You can try this:
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)
Then `df` will have the concatenated `values` for each `some_primary_key`, which leaves duplicated rows. Therefore, just remove them:
df = df.drop_duplicates()
Output:
some_primary_key values
0 unique_value_1 some_value_1,some_value_2
1 unique_value_2 some_value_3,some_value_4
2 unique_value_3 some_value_5,some_value_6
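Putting the answer's steps together, here is a runnable end-to-end sketch using the sample data from the question:

```python
import pandas as pd

# Sample data from the question; each key appears twice
data = {'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data)

# Per group: sort, de-duplicate, then join into one comma-separated string.
# transform broadcasts the result back to every row of the group.
concat_func = lambda x: ','.join(map(str, x.sort_values(ascending=True).unique()))
df['values'] = df.groupby(['some_primary_key'])['values'].transform(concat_func)

# Every row in a group is now identical, so keep one per key
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```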
PS: In `concat_func`, the `sort_values()` and `unique()` methods are applied to give a nicer view and to prevent the same value from appearing twice in a row. Otherwise, if `df` is:
some_primary_key values
0 unique_value_1 some_value_1
1 unique_value_1 some_value_1
the output will be:
some_primary_key values
0 unique_value_1 some_value_1,some_value_1
If that is the desired output, just use the following `concat_func`:
concat_func = lambda x: ','.join(map(str, x))
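As a side note (not part of the original answer), the same result can also be reached in one step with `agg` instead of `transform` + `drop_duplicates`, since aggregation already produces one row per key; a minimal sketch on the question's sample data:

```python
import pandas as pd

data = {'some_primary_key': ['unique_value_1', 'unique_value_2', 'unique_value_3'] * 2,
        'values': ['some_value_1', 'some_value_3', 'some_value_5',
                   'some_value_2', 'some_value_4', 'some_value_6']}
df = pd.DataFrame(data)

# Pre-sort so each group's values are joined in ascending order,
# then aggregate each group directly into one comma-separated string
out = (df.sort_values('values')
         .groupby('some_primary_key')['values']
         .agg(','.join)
         .reset_index())
print(out)
```

This avoids recomputing the joined string for every row of a group, which `transform` does.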
Answered By - Ersel Er