Issue
I want to drop rows that are before May 1 2018.
First, I converted the source date column to datetime:
df['DATE'] = pd.to_datetime(df['DATE'],format='%d/%m/%Y')
How can I specify 2018-04-30 as the cut-off date?
As a third step, I need to combine this with a couple more conditions and finally remove the matching rows from the raw data:
df2 = df[(df['REFERENCE'].str.contains("ABC")
          & (df['COUNTRY'] == 'countryname')
          & <<how can I mention the cut-off date and the actual date column that has all the dates?>>) == False]
ANSWER - Thanks for all the comments.
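raw_df.drop(raw_df[raw_df['REFERENCE'].str.contains("prefix") & (raw_df['COUNTRY'] == 'countryname') & (raw_df['DATE'].lt('2018-05-01'))].index, inplace=True)
lt('2018-05-01') matches only dates strictly before May 1, so rows dated exactly 2018-05-01 are kept (le would drop them too). Use ge('2018-05-01') instead if you want to match dates on or after the cut-off.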
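The same combined condition can also be written as a boolean mask and negated, which avoids inplace=True. This is only a sketch: the sample rows, the cutoff, to_drop and kept names, and the use of "ABC" as the reference text are assumptions made up for illustration, not data from the question.

```python
import pandas as pd

# Made-up sample data (assumption): dates arrive as day/month/year strings.
raw_df = pd.DataFrame({
    "REFERENCE": ["ABC-1", "ABC-2", "XYZ-3", "ABC-4"],
    "COUNTRY": ["countryname", "countryname", "countryname", "other"],
    "DATE": ["30/04/2018", "01/05/2018", "15/03/2018", "01/01/2018"],
})
raw_df["DATE"] = pd.to_datetime(raw_df["DATE"], format="%d/%m/%Y")

# First date that should be kept.
cutoff = pd.Timestamp("2018-05-01")

# A row is dropped only when all three conditions hold.
to_drop = (
    raw_df["REFERENCE"].str.contains("ABC", regex=False)
    & raw_df["COUNTRY"].eq("countryname")
    & raw_df["DATE"].lt(cutoff)
)

# Negate the mask to keep everything else; raw_df itself is left unchanged.
kept = raw_df[~to_drop]
print(kept)
```

Only the first sample row meets all three conditions, so it is the only one removed. Rows with a different reference or country stay in the result even when their date is before the cut-off.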
Solution
You can use:
df2 = df[df['DATE'].ge('2018-05-01')]
Example input:
DATE
0 2018-05-02
1 2018-05-01
2 2018-04-30
Output:
DATE
0 2018-05-02
1 2018-05-01
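For reference, the example above can be reproduced end to end with something like the following. It is a sketch: the day/month/year input strings are assumed from the format used in the question, and the cutoff name is introduced here for illustration.

```python
import pandas as pd

# Same three dates as the example input, written in the question's
# assumed day/month/year source format.
df = pd.DataFrame({"DATE": ["02/05/2018", "01/05/2018", "30/04/2018"]})
df["DATE"] = pd.to_datetime(df["DATE"], format="%d/%m/%Y")

# Keep rows dated on or after the cut-off.
cutoff = pd.Timestamp("2018-05-01")
df2 = df[df["DATE"].ge(cutoff)]
print(df2)
```

Comparing against a pd.Timestamp rather than a bare string makes the intended type explicit; the string form shown in the answer is parsed by pandas in the same way.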
Answered By - mozway