Issue
I have two dataframes. In DF_1 there is about 800 repeating key values. In DF_2 100 unique key values and 20 columns with data values.
I want to create third dataframe where each matching value splitted by number of matches.
for example
Table 1
id |
---|
A |
B |
A |
C |
A |
A |
C |
Table 2
id | Col1 | Col2 | Col3 |
---|---|---|---|
A | 400 | 100 | 20 |
B | 200 | 800 | |
C | 600 | 800 |
Output
id | Col1 | Col2 | Col3 |
---|---|---|---|
A | 100 | 25 | 5 |
B | 200 | 800 | |
A | 100 | 25 | 5 |
C | 300 | 400 | |
A | 100 | 25 | 5 |
A | 100 | 25 | 5 |
C | 300 | 400 |
How to achieve this?
Solution
You can merge
after standardizing df2
with the value_counts
of df1['id']
:
out = (df1.reset_index()
.merge(df2.set_index('id')
.div(df1['id'].value_counts(), axis=0),
on='id', how='left')
.set_index('index').reindex(df1.index)
)
Output:
id Col1 Col2 Col3
0 A 100.0 25.0 5.0
1 B 200.0 NaN 800.0
2 A 100.0 25.0 5.0
3 C 300.0 400.0 NaN
4 A 100.0 25.0 5.0
5 A 100.0 25.0 5.0
6 C 300.0 400.0 NaN
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.