Issue
I have a following dataframe where I have one another list of index position based on some condition so just want to create the new dataframe based on the index position and check some condition on that.
df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
re_ind = list(np.where(df['diff_in_min'] >= 300))
re_ind = [np.array([85, 512], dtype='int64')]
Just I want to create another dataframe based on re_ind position, ex:
first_df = df[0:85]
another_df = [85:512]
last_df = [512:]
and each dataframe I want to check one condition
count = 0
temp_df = df[:re_ind[0]]
if temp_df['diff_in_min'].sum() > 500:
count += 1
temp_df = df[re_ind[0]:re_ind[1]]
if temp_df['diff_in_min'].sum() > 500:
count += 1
if temp_df = df[re_ind[1]:]
if temp_df['diff_in_min'].sum() > 500:
count += 1
How can I do that using for loop with creating new data frame using existing dataframe?
Solution
From sample data for groups created by df['diff_in_min'] >= 300)
add cumulative sum, then aggregate sum
, compare for another condition and count True
s by sum
:
s = (df['diff_in_min'] >= 300).cumsum()
out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print (out)
2
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.