Issue
I have a dataframe that looks like:
| Datetime | Rainfall | Flow |
| ----------------- | ------------ | --------- |
| 3/19/2018 12:05 | 1 | 5.85 |
| 3/19/2018 12:10 | 2 | 4.47 |
| 3/19/2018 12:15 | 0 | (BLANK) |
| 3/19/2018 12:20 | 0 | 2.62 |
...
| 3/19/2018 13:00 | 1 | 5.85 |
...
It is time series data on a 5-minute interval for rainfall and flow, and my objective is to convert it to hourly data. The data has blanks in either the flow or the rainfall column. If there is a blank in either of these columns, I want to delete all the rows for that hour (I only want to keep hours that have a full hour's worth of data).
For example, in the table above I would delete all of the data for 12:00 - 12:55.
So far I have converted the data to hourly, but I realized I likely need to delete the rows of any hour that contains a blank before resampling to 1H:
rain_hourly = rain.set_index('Date & Time').resample('1h').sum()
flow_hourly = flow.set_index('Date & Time').resample('1h').mean()
df_hourly = rain_hourly.merge(flow_hourly, how='left', on='Date & Time')
Any help is greatly appreciated!
Solution
You can do something like this:
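import numpy as np
# Treat empty strings in 'Flow' as missing values
df_hourly['Flow'] = df_hourly['Flow'].replace('', np.nan)
# Keep only the rows where 'Flow' is present
df_hourly = df_hourly.loc[df_hourly['Flow'].notna(), :].copy()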
Using replace should do the job, I think. To handle more than one column, pass a list of column names instead of a single one.
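The snippet above drops individual rows with a blank. To drop every row of an hour that contains a blank, as the question asks, one possible approach is to count the valid readings per hour and keep only the hours with all twelve 5-minute values in both columns. The following is a minimal sketch, not the answer's own code. It assumes a single dataframe with the columns `Datetime`, `Rainfall` and `Flow` from the table in the question; the function name `hourly_from_complete_hours` and the `expected` parameter are hypothetical.

```python
import pandas as pd

def hourly_from_complete_hours(df, time_col="Datetime", expected=12):
    """Aggregate to hourly values, using only hours with no missing readings.

    Assumes columns named 'Rainfall' and 'Flow' (taken from the question's table)
    and `expected` readings per hour (12 for a 5-minute interval).
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    cols = ["Rainfall", "Flow"]
    # Blanks and other non-numeric entries become NaN
    out[cols] = out[cols].apply(pd.to_numeric, errors="coerce")
    # Label each row with the start of its hour
    hour = out[time_col].dt.floor("h")
    # Number of non-missing readings per hour, broadcast back to every row
    counts = out.groupby(hour)[cols].transform("count")
    # A row survives only if its hour is complete in both columns
    complete = (counts == expected).all(axis=1)
    kept = out.loc[complete]
    # Sum rainfall and average flow, as in the question's resample calls
    return kept.groupby(hour[complete]).agg(
        Rainfall=("Rainfall", "sum"),
        Flow=("Flow", "mean"),
    )
```

Grouping on the floored timestamp instead of calling resample means the dropped hours do not reappear in the result as empty bins. Note that an hour with fewer than 12 rows (for example a first hour that starts at 12:05) is also treated as incomplete under this assumption.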
Answered By - DataSciRookie