Issue
I have a dataframe with two date columns that I'd like to perform the following operations on:
- Concatenate into a NEW column.
- Get the unique values (no redundant dates).
import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])
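# apply() returns a new frame, so assign the result back to keep the conversion
df[["date1", "date2"]] = df[["date1", "date2"]].apply(lambda x: pd.to_datetime(x).dt.date)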
The challenge I am facing is finding a clear + concise way to do this.
To create one column I am doing this:
df['date_range'] = df[['date1', 'date2']].astype(str).values.tolist()
but I am not sure how to get the unique values from each row?
My desired dataframe looks like this:
date1       date2       date_range
2018-12-08  2018-12-09  2018-12-08,2018-12-09
2020-12-19  2020-12-20  2020-12-19,2020-12-20
2020-12-19  2020-12-19  2020-12-19
2020-10-06  2020-10-12  2020-10-06,2020-10-12
Solution
You can use set() to find the unique elements in each row, and a list comprehension to generate your desired result, joining each row's unique values with a comma. A set is unordered, so sort it before joining to keep the dates in a stable order (ISO-formatted date strings sort chronologically). Something like:
df['date_range'] = [','.join(sorted(set(dates))) for dates in df[['date1', 'date2']].astype(str).values]
Also, you can drop the .tolist() call!
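One thing to watch with a plain set() is that it does not guarantee iteration order, so a row can come out as date2,date1. Here is a minimal sketch of an order-preserving variant, run on the sample data from the question. It swaps set() for dict.fromkeys(), which is not part of the original answer, and assumes Python 3.7+ (where dicts keep insertion order):

```python
import pandas as pd

data = [
    ["2018-12-08", "2018-12-09"],
    ["2020-12-19", "2020-12-20"],
    ["2020-12-19", "2020-12-19"],
    ["2020-10-06", "2020-10-12"],
]
df = pd.DataFrame(data, columns=["date1", "date2"])

# dict.fromkeys() keeps the first occurrence of each value in column order
# (date1 before date2), so duplicates are dropped without reordering the row.
df["date_range"] = [
    ",".join(dict.fromkeys(dates))
    for dates in df[["date1", "date2"]].astype(str).values
]

print(df)
```

This keeps the values in column order rather than sorted order, which may be preferable if date1 is not always the earlier date.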
Answered By - Ethan Singer