Issue
I have a dataframe that has columns like these:
Date temp_data holiday
01.01.2000 10000 0
02.01.2000 0 1
03.01.2000 0 1
04.01.2000 0 1
05.01.2000 0 1
06.01.2000 23000 0
..
..
..
30.01.2000 200 0
31.01.2000 0 1
01.02.2000 0 1
02.02.2000 2500 0
holiday = 0 when data is present, indicating a working day
holiday = 1 when no data is present, indicating a non-working day
I am trying to derive two new columns, pre_long_holiday and post_long_holiday (shown as pre_long_hol and post_long_hol below).
The dataframe should look like this:
Date temp_data holiday pre_long_hol post_long_hol
01.01.2000 10000 0 1 0
02.01.2000 0 1 0 0
03.01.2000 0 1 0 0
04.01.2000 0 1 0 0
05.01.2000 0 1 0 0
06.01.2000 23000 0 0 1
07.01.2000 2000 0 1 0
08.01.2000 0 1 0 0
09.01.2000 0 1 0 0
10.01.2000 0 1 0 0
11.01.2000 1000 0 0 1
..
..
..
30.01.2000 200 0 0 0
31.01.2000 0 1 0 0
01.02.2000 0 1 0 0
02.02.2000 2500 0 0 0
A long holiday is a run of 3 or more consecutive holidays. The pre and post columns have a 1 on the working day immediately before and immediately after the holiday period.
Can anyone help me with this?
The data I have is a continuous time series.
Solution
If you need to set only a single 1 before and after each long holiday, use Series.rolling with sum and test the shifted values:
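N = 3  # minimum number of consecutive holidays that counts as a long holiday
m = df['holiday'].eq(0)  # True on working days
s = df['holiday'].rolling(N).sum()  # holidays in the N-row window ending at each row
# working day immediately followed by N holidays
df['pre_long_hol'] = (s.shift(-N).ge(N) & m).astype(int)
# working day immediately preceded by N holidays
df['post_long_hol'] = (s.shift().ge(N) & m).astype(int)
print(df)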
Date temp_data holiday pre_long_hol post_long_hol
0 01.01.2000 10000 0 1 0
1 02.01.2000 0 1 0 0
2 03.01.2000 0 1 0 0
3 04.01.2000 0 1 0 0
4 05.01.2000 0 1 0 0
5 06.01.2000 23000 0 0 1
6 07.01.2000 2000 0 1 0
7 08.01.2000 0 1 0 0
8 09.01.2000 0 1 0 0
9 10.01.2000 0 1 0 0
10 11.01.2000 1000 0 0 1
11 30.01.2000 200 0 0 0
12 31.01.2000 0 1 0 0
13 01.02.2000 0 1 0 0
14 02.02.2000 2500 0 0 0
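For readers who want to run the approach end to end, here is a minimal self-contained sketch. It rebuilds only the rows printed above, and wraps the same rolling logic in a helper named flag_long_holidays, which is a hypothetical name chosen for illustration and not part of pandas or of the answer.

```python
import pandas as pd

# Sample rows copied from the output above (only the rows shown there).
df = pd.DataFrame({
    "Date": ["01.01.2000", "02.01.2000", "03.01.2000", "04.01.2000",
             "05.01.2000", "06.01.2000", "07.01.2000", "08.01.2000",
             "09.01.2000", "10.01.2000", "11.01.2000", "30.01.2000",
             "31.01.2000", "01.02.2000", "02.02.2000"],
    "temp_data": [10000, 0, 0, 0, 0, 23000, 2000, 0, 0, 0, 1000,
                  200, 0, 0, 2500],
    "holiday": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0],
})


def flag_long_holidays(frame, n=3):
    """Hypothetical helper: flag working days adjacent to >= n holidays."""
    out = frame.copy()
    working = out["holiday"].eq(0)
    # Number of holidays in the n-row window ending at each row.
    window = out["holiday"].rolling(n).sum()
    # Shift back by n: the window covering the n rows after the current one.
    out["pre_long_hol"] = (window.shift(-n).ge(n) & working).astype(int)
    # Shift forward by 1: the window covering the n rows before the current one.
    out["post_long_hol"] = (window.shift().ge(n) & working).astype(int)
    return out


result = flag_long_holidays(df)
print(result)
```

Note that the rolling window works on row positions, not on calendar dates, so it assumes the rows form a continuous daily series as stated in the question.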
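EDIT: To add the lengths of the consecutive runs of 0 and 1, use a helper Series created by comparing the column with its shifted values and taking the cumulative sum, then map the run sizes with Series.map and Series.value_counts, and finally set the non-matching rows to 0 with Series.mask: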
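# new run id every time the holiday value changes
s = df['holiday'].ne(df['holiday'].shift()).cumsum()
# length of the run each row belongs to
count = s.map(s.value_counts())
# keep the run length only on rows of the matching type, 0 elsewhere
df['non-working day'] = count.mask(df['holiday'].eq(0), 0)
df['working day'] = count.mask(df['holiday'].eq(1), 0)
print(df)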
Date temp_data holiday pre_long_hol post_long_hol \
0 01.01.2000 10000 0 1 0
1 02.01.2000 0 1 0 0
2 03.01.2000 0 1 0 0
3 04.01.2000 0 1 0 0
4 05.01.2000 0 1 0 0
5 06.01.2000 23000 0 0 1
6 07.01.2000 2000 0 1 0
7 08.01.2000 0 1 0 0
8 09.01.2000 0 1 0 0
9 10.01.2000 0 1 0 0
10 11.01.2000 1000 0 0 1
11 30.01.2000 200 0 0 0
12 31.01.2000 0 1 0 0
13 01.02.2000 0 1 0 0
14 02.02.2000 2500 0 0 0
non-working day working day
0 0 1
1 4 0
2 4 0
3 4 0
4 4 0
5 0 2
6 0 2
7 3 0
8 3 0
9 3 0
10 0 2
11 0 2
12 2 0
13 2 0
14 0 1
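As an addition that is not part of the answer itself, the sketch below reruns the run-length step on the holiday column alone and then shows one possible alternative: deriving the pre/post flags from the run lengths instead of from a rolling sum. The variable names (run_id, run_len, next_hol_run, prev_hol_run) are chosen here for illustration. On this sample it appears to give the same flags as the rolling approach.

```python
import pandas as pd

# Holiday column copied from the sample output above.
df = pd.DataFrame({"holiday": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

# Label each run of equal values with its own id.
run_id = df["holiday"].ne(df["holiday"].shift()).cumsum()
# Map every row to the size of the run it belongs to.
run_len = run_id.map(run_id.value_counts())

df["non-working day"] = run_len.mask(df["holiday"].eq(0), 0)
df["working day"] = run_len.mask(df["holiday"].eq(1), 0)

# Alternative (assumption, not from the answer): use the run lengths directly.
N = 3
working = df["holiday"].eq(0)
# Length of the holiday run starting on the next row (0 if the next row is a working day).
next_hol_run = df["non-working day"].shift(-1, fill_value=0)
# Length of the holiday run ending on the previous row (0 if it was a working day).
prev_hol_run = df["non-working day"].shift(1, fill_value=0)
df["pre_long_hol"] = (working & next_hol_run.ge(N)).astype(int)
df["post_long_hol"] = (working & prev_hol_run.ge(N)).astype(int)

print(df)
```

The run-length version makes the threshold easy to inspect, since the length of each holiday period is available as a column before the flags are computed.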
Answered By - jezrael