Issue
Consider the following data frame:
import pandas as pd
import random
characteristics = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
N = int(1e5)
random_characteristics = [random.choice(characteristics) for i in range(N)]
df = pd.DataFrame(data={'character': random_characteristics})
df
--------------------------------------------------------------------
character
0 A
1 G
2 E
3 G
4 D
... ...
99995 E
99996 G
99997 A
99998 D
99999 B
100000 rows × 1 columns
--------------------------------------------------------------------
Now, my goal is to create a new column characteristics_shifted that is shifted according to the list characteristics, where the number of shifts can be specified by the user.
For instance, if you specify shift = 1, each character is shifted by one position. If the character equals H, it cannot be shifted by one and therefore remains the same. If shift = 2 and the character equals B, I want to get D. If, instead, the character equals G, I want to get H. The same holds for negative shifts, but in the other direction, with A as the lower limit.
Example:
character  characteristics_shifted (shift=1)  characteristics_shifted (shift=-2)
A          B                                  A
G          H                                  E
E          F                                  C
G          H                                  E
D          E                                  B
H          H                                  F
H          H                                  F
A          B                                  A
E          F                                  C
C          D                                  A
F          G                                  D
Note: My data frame contains around 21 million rows. It does not contain NaN values.
Solution
You can build a lookup Series that maps each character to itself, shift it by the desired number of positions, and pass the shifted Series to map:
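c = pd.Series(characteristics, index=characteristics)  # lookup: each character maps to itself
shifts = [1, -2]
for s in shifts:
    # shift the lookup by s positions; ffill/bfill replace the NaNs created
    # at the ends with H and A respectively, so edge characters are clamped
    df[f'shift={s}'] = df['character'].map(c.shift(-s).ffill().bfill())
print(df)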
output:
character shift=1 shift=-2
0 G H E
1 G H E
2 A B A
3 E F C
4 H H F
... ... ... ...
99995 G H E
99996 E F C
99997 E F C
99998 C D A
99999 D E B
[100000 rows x 3 columns]
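To see why the ffill().bfill() pair implements the "remains the same at the edge" rule, it helps to print the lookup Series that map actually receives. A small sketch, reusing the names c and characteristics from the answer:

```python
import pandas as pd

characteristics = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
c = pd.Series(characteristics, index=characteristics)

# shift(-s) moves every value s positions up (s > 0) or down (s < 0),
# so the entry at index X holds the character s steps away from X.
# Positions pushed past either end become NaN:
#   ffill() fills trailing NaNs with the last character ('H'),
#   bfill() fills leading NaNs with the first character ('A').
lookup_plus1 = c.shift(-1).ffill().bfill()   # s = 1
lookup_minus2 = c.shift(2).ffill().bfill()   # s = -2

print(lookup_plus1.to_dict())
# {'A': 'B', 'B': 'C', 'C': 'D', 'D': 'E', 'E': 'F', 'F': 'G', 'G': 'H', 'H': 'H'}
print(lookup_minus2.to_dict())
# {'A': 'A', 'B': 'A', 'C': 'A', 'D': 'B', 'E': 'C', 'F': 'D', 'G': 'E', 'H': 'F'}
```

Because the lookup has only 8 entries, building it is essentially free; the cost of the answer is dominated by map over the rows.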
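Optimization
If you need to create many shifted columns, this variant will be faster (thanks @MichaelSzczesny for pointing this out!), because the column is converted to a categorical once and each map then only translates the 8 categories instead of every row: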
cat = df['character'].astype('category')  # convert once; keeps df's index for alignment
c = pd.Series(pd.Categorical(characteristics), index=characteristics)
shifts = [1, -2]
for s in shifts:
    # map on a categorical Series only has to map its categories
    df[f'shift={s}'] = cat.map(c.shift(-s).ffill().bfill())
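A further variant, not part of the original answer: since the rule is plain integer arithmetic on positions, one can take categorical codes with the category order fixed to characteristics, add the shift, clip the result to the valid range, and index back into the list with NumPy. This is a sketch assuming every value in the column appears in characteristics (a value outside it would get code -1 and silently produce a wrong result); its speed on the full 21-million-row frame is untested.

```python
import random

import numpy as np
import pandas as pd

characteristics = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
random.seed(0)
df = pd.DataFrame({'character': [random.choice(characteristics) for _ in range(1000)]})

# Fix the category order to `characteristics`, so each code is a list position.
# Assumption: no value outside `characteristics` (those would get code -1).
codes = pd.Categorical(df['character'], categories=characteristics).codes
labels = np.array(characteristics, dtype=object)

for s in [1, -2]:
    # Shift the positions, then clamp them to [0, len - 1] ("remains the same" at the edges).
    shifted = np.clip(codes + s, 0, len(characteristics) - 1)
    df[f'shift={s}'] = labels[shifted]

print(df.head())
```

The design choice here is to avoid hashing strings per row: only the categorical conversion touches the strings, and the shift itself is a vectorized add and clip on small integers.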
Answered By - mozway