Issue
I have a sample dataframe of company A's electricity consumption, shown below:
year-month | company | GWh |
---|---|---|
2017-01 | A | 100 |
2018-02 | A | 110 |
2019-01 | A | 90 |
2019-02 | A | 105 |
2020-01 | A | 117 |
2020-02 | A | 120 |
I would like to remove the data for year 2020 and split the remaining dataframe into two sets:
- Train dataset contains records before year 2019
- Test dataset contains only 2019 records
Solution
Coerce the column to datetime and select as required. This is preferable if you'll need to use the dates for time analysis.
df['year-month'] = pd.to_datetime(df['year-month']).dt.strftime('%Y-%m')
df1 = df[df['year-month'].lt('2019')]
df2 = df[df['year-month'].eq('2019')]
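Following your comment, I believe that's a bug: once the column is formatted as '%Y-%m', no value is exactly equal to '2019', so df2 comes back empty. I would edit it to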
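import pandas as pd
df['year-month'] = pd.to_datetime(df['year-month'])
# Compare on the year only: df1 is everything before 2019, df2 is 2019 itself; 2020 falls in neither
df1 = df[df['year-month'].dt.strftime('%Y').lt('2019')]
df2 = df[df['year-month'].dt.strftime('%Y').eq('2019')]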
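For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample data from the question and does the same split. It compares the integer year from `.dt.year` instead of the formatted string, which is an alternative to the `strftime` comparison above, not what the answer itself uses. The names `year`, `train` and `test` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year-month": ["2017-01", "2018-02", "2019-01", "2019-02", "2020-01", "2020-02"],
    "company": ["A"] * 6,
    "GWh": [100, 110, 90, 105, 117, 120],
})

# Parse the 'YYYY-MM' strings into real datetimes
df["year-month"] = pd.to_datetime(df["year-month"], format="%Y-%m")

# Integer year of each row (alternative to comparing strftime('%Y') strings)
year = df["year-month"].dt.year

train = df[year < 2019]   # records before 2019
test = df[year == 2019]   # only 2019 records; 2020 rows end up in neither set

print(train)
print(test)
```

Because the comparison is on integers, there is no risk of a partial string such as '2019' failing to match '2019-01', which is what went wrong in the first attempt.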
Answered By - wwnde