Tuesday, November 21, 2023

[FIXED] Is there anyway to merge two datasets that with two similar key column but not the same key column?

November 21, 2023 pandas, python, r, tidyverse No comments

Issue

I have two datasets as follows:

have lunch	have dinner
set a	set 1
set b	set 2
...	...

variable	variable_value	value
have lunch	set a	0.2
have lunch	set b	0.5
have dinner	set 1	1.5
have dinner	set 2	1.7
...	...	...

My expected outcome:

have lunch	have lunch value	have dinner	have dinner value
set a	0.2	set 1	1.5
set b	0.5	set 2	1.7
...	...	...	...

Is there anyway to merge these two datasets in python/r?

I can think of one solution in R:

library(tidyverse)
library(magrittr)
b %<>% pivot_wider(names_from = variable, values_from = value)

left_join(a,b)

Solution

You can melt, merge, then pivot:

out = (a
  .melt(ignore_index=False, value_name='variable_value')
  .reset_index()
  .merge(b, how='left')
  .pivot(index='index', columns='variable')
)

Output:

         variable_value                  value           
variable    have dinner have lunch have dinner have lunch
index                                                    
0                 set 1      set a         1.5        0.2
1                 set 2      set b         1.7        0.5

For a formatting closer to what you have, further reprocess the column names:

out = (a.melt(ignore_index=False, value_name='variable_value')
  .reset_index()
  .merge(b, how='left')
  .pivot(index='index', columns='variable')
  .sort_index(level=1, axis=1)
  .rename_axis(index=None)
)

out.columns = out.columns.map(lambda x: f'{x[1]}{" value" if x[0] == "variable_value" else ""}')

Output:

   have dinner have dinner value  have lunch have lunch value
0          1.5             set 1         0.2            set a
1          1.7             set 2         0.5            set b

alternative

Or, using a loop and concat:

out = pd.concat([a[[col]].merge(b.loc[b['variable'].eq(col),
                                      ['variable_value', 'value']]
                                 .rename(columns={'variable_value': col,
                                                  'value': f'{col} value'
                                                 }),
                               how='left'
                               )
                 for col in a], axis=1)

Output:

  have lunch  have lunch value have dinner  have dinner value
0      set a               0.2       set 1                1.5
1      set b               0.5       set 2                1.7

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, November 21, 2023

[FIXED] Is there anyway to merge two datasets that with two similar key column but not the same key column?

Issue

Solution

alternative

0 comments:

Post a Comment

Popular Posts

Labels