Issue
I have to timeseries dataframes:
from pandas import Timestamp
df1 = pd.DataFrame(index=[Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:02:00')], columns = ['Peter', 'Adam'], data =[['A', 'C'], ['C', 'D']])
df2 = pd.DataFrame(index=[Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:01:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00'), Timestamp('2023-12-22 00:02:00')], columns=['Level', 'Score'], data=[['A', 1], ['B', 2], ['C', 3], ['D', 4], ['A', 5], ['B', 6], ['C', 7], ['D', 8]])
>>> df1
Peter Adam
2023-12-22 00:01:00 A C
2023-12-22 00:01:00 C D
>>> df2
Level Score
2023-12-22 00:01:00 A 1
2023-12-22 00:01:00 B 2
2023-12-22 00:01:00 C 3
2023-12-22 00:01:00 D 4
2023-12-22 00:02:00 A 5
2023-12-22 00:02:00 B 6
2023-12-22 00:02:00 C 7
2023-12-22 00:02:00 D 8
I wish to get
Peter Adam
2023-12-22 00:01:00 1 3
2023-12-22 00:01:00 7 8
I can do using for loop, either go through by columns of df1 with apply or something
I feel there is a transform I could do to achieve the values very fast.
Anyone has any idea?
Solution
A possible option using Indexing-lookup :
def lookup(df, name):
idx, cols = pd.factorize(df[name])
return df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
inter = df1.join(df2.pivot(columns="Level", values="Score"))
out = (inter.assign(**{n: inter.pipe(lookup, n)
for n in df1.columns})[df1.columns])
NB : Your MRE seems broken. The second index of df1
should be 2023-12-22 00:02:00
.
Output :
print(out)
Peter Adam
2023-12-22 00:01:00 1 3
2023-12-22 00:02:00 7 8
[2 rows x 2 columns]
Answered By - Timeless
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.