Issue
I have a DataFrame in Python that has a "date" column (and several other columns).
What I want to do is to split off two DFs based on an arbitrary date in the "date" column. Let this split point be arbitrary_date
.
- All rows with a date before
arbitrary_date
- Any rows with the one next date in the series (which may be
arbitrary_date
or another date up to a week in the future)
To be clear -
- I don't know what the arbitrary date will be until runtime
- I cannot split by index because I have no guarantee than any specific indices go into either DF
- I don't know how many days will elapse between the arbitrary date and the next date
- I don't know how many of the next date (after
arbitrary_date
) will be in the DF
Here's an example to give a bit of realism to the request.
Start with this DF.
date data
20230101 e
20230108 E
20230115 S
20230117 L
20230117 f
20230118 u
20230118 B
20230121 r
20230127 n
20230128 o
If my arbitrary date is 2023-01-17 then I want the two DFs to be
20230101 e
20230108 E
20230115 S
and
20230117 L
20230117 f
But if my arbitrary date is 2023-01-22 then my DFs should be
20230101 e
20230108 E
20230115 S
20230117 L
20230117 f
20230118 u
20230118 B
20230121 r
and
20230127 n
How can I accomplish this dynamic split?
Solution
Try:
def split_df(df, date):
split_date = pd.to_datetime(date)
next_date = df["date"].between(split_date, split_date + pd.Timedelta(days=7))
df1 = df[df["date"] < split_date]
try:
df2 = df[df["date"].eq(df.loc[next_date, "date"].iat[0])]
except IndexError:
df2 = None
return df1, df2
df["date"] = pd.to_datetime(df["date"], format="%Y%M%d")
print(*split_df(df, "2023-01-17"), sep="\n\n")
print()
print(*split_df(df, "2023-01-22"), sep="\n\n")
print()
print(*split_df(df, "2023-02-22"), sep="\n\n")
Prints:
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
date data
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
5 2023-01-18 00:01:00 u
6 2023-01-18 00:01:00 B
7 2023-01-21 00:01:00 r
date data
8 2023-01-27 00:01:00 n
date data
0 2023-01-01 00:01:00 e
1 2023-01-08 00:01:00 E
2 2023-01-15 00:01:00 S
3 2023-01-17 00:01:00 L
4 2023-01-17 00:01:00 f
5 2023-01-18 00:01:00 u
6 2023-01-18 00:01:00 B
7 2023-01-21 00:01:00 r
8 2023-01-27 00:01:00 n
9 2023-01-28 00:01:00 o
None
Answered By - Andrej Kesely
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.