Issue
I have a dict like below, it represents one horse race. There are many races in the dataset grouped by raceId:
data_orig = {
'meetingId': [178515] * 6,
'raceId': [879507] * 6,
'horseId': [90001, 90002, 90003, 90004, 90005, 90006],
'position': [1, 2, 3, 4, 5, 6],
'weight': [51, 52, 53, 54, 55, 56],
}
I want to add every row's horse specific data to every row. The result should look like this:
data_new = {
'meetingId': [178515] * 6,
'raceId': [879507] * 6,
'horseId_a':[90001, 90002, 90003, 90004, 90005, 90006],
'position_a':[1, 2, 3, 4, 5, 6],
'weight_a':[51, 52, 53, 54, 55, 56],
'horseId_b':[90002, 90003, 90004, 90005, 90006, 90001],
'position_b':[2, 3, 4, 5, 6, 1],
'weight_b':[52, 53, 54, 55, 56, 51],
'horseId_c':[90003, 90004, 90005, 90006, 90001, 90002],
'position_c':[3, 4, 5, 6, 1, 2],
'weight_c':[53, 54, 55, 56, 51, 52],
'horseId_d':[90004, 90005, 90006, 90001, 90002, 90003],
'position_d':[4, 5, 6, 1, 2, 3],
'weight_d':[54, 55, 56, 51, 52, 53],
'horseId_e':[90005, 90006, 90001, 90002, 90003, 90004],
'position_e':[5, 6, 1, 2, 3, 4],
'weight_e':[55, 56, 51, 52, 53, 54,],
'horseId_f':[90006, 90001, 90002, 90003, 90004, 90005],
'position_f':[6, 1, 2, 3, 4, 5],
'weight_f':[56, 51, 52, 53, 54, 55],
}
I have tried below, which transposes the dataframe.
data_orig_df = pd.DataFrame(data_orig)
new_df = pd.DataFrame()
for index, row_i in data_orig_df.iterrows():
horseId = row_i['horseId']
row_new = row_i.copy()
for index, row_j in race_df.iterrows():
if row_j['horseId']:
continue
row_new = pd.merge(row_new, row_j[getHorseSpecificCols()], suffixes=('', row_j['position']))
new_df = pd.concat([new_df, row_new], axis=1)
Thanks for your help.
Solution
You can use numpy to easily roll/index the values:
def roll(g):
a = g.to_numpy()
x = np.arange(len(a))
return pd.DataFrame(a[((x[:,None] + x)%len(a)).ravel()].reshape(len(a), -1),
index=g.index,
columns=[f'{c}_{i+1}' for i in x for c in g.columns])
cols = ['meetingId', 'raceId']
out = (data_orig_df.groupby(cols)
.apply(lambda g: roll(g.drop(columns=cols)))
.reset_index(cols)
)
Output:
meetingId raceId horseId_1 position_1 weight_1 horseId_2 position_2 weight_2 horseId_3 position_3 weight_3 horseId_4 position_4 weight_4 horseId_5 position_5 weight_5 horseId_6 position_6 weight_6
0 178515 879507 90001 1 51 90002 2 52 90003 3 53 90004 4 54 90005 5 55 90006 6 56
1 178515 879507 90002 2 52 90003 3 53 90004 4 54 90005 5 55 90006 6 56 90001 1 51
2 178515 879507 90003 3 53 90004 4 54 90005 5 55 90006 6 56 90001 1 51 90002 2 52
3 178515 879507 90004 4 54 90005 5 55 90006 6 56 90001 1 51 90002 2 52 90003 3 53
4 178515 879507 90005 5 55 90006 6 56 90001 1 51 90002 2 52 90003 3 53 90004 4 54
5 178515 879507 90006 6 56 90001 1 51 90002 2 52 90003 3 53 90004 4 54 90005 5 55
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.