Wednesday, January 24, 2024

[FIXED] How to replace part of a Timestamp using .where()?

Labels: dataframe, pandas, python, series, timestamp

Issue

I have a dataframe that contains a few timestamps. I am trying to find certain timestamps that don't meet a condition and compute their new timestamp value based on pieces from both another timestamp and the current timestamp being tested.

import pandas as pd

df = pd.DataFrame(data={
    'col1': [pd.Timestamp(2021, 1, 1, 12), pd.Timestamp(2021, 1, 2, 12), pd.Timestamp(2021, 1, 3, 12)],
    'col2': [pd.Timestamp(2021, 1, 4, 12), pd.Timestamp(2021, 1, 5, 12), pd.Timestamp(2021, 1, 6, 12)],
})
print(df)
#                 col1                col2
# 0 2021-01-01 12:00:00 2021-01-04 12:00:00
# 1 2021-01-02 12:00:00 2021-01-05 12:00:00
# 2 2021-01-03 12:00:00 2021-01-06 12:00:00

I'm trying to do something like this:

testDate = pd.Timestamp(2021, 1, 2, 16)
df['newCol'] = df['col1'].where(
    df['col1'].dt.date <= testDate.date(),
    pd.Timestamp(year=testDate.year, month=testDate.month, day=testDate.day,
                 hour=df['col1'].dt.hour))

I get an error though about ambiguity:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

If I remove the last bit, hour=df['col1'].dt.hour, the code runs, so I know the problem is there. But I don't understand why it complains about truthiness, since that piece of the code isn't testing any condition; it's just assigning. I thought it was because I'm trying to compute the new value from the values being iterated over, but if I try the same process with integers instead of timestamps, it runs just fine:

df = pd.DataFrame(data={'col1': [1,2,3], 'col2': [4,5,6]})
print(df)
#   col1  col2
# 0     1     4
# 1     2     5
# 2     3     6

testInt = 2
df['newCol'] = df['col1'].where(df['col1'] < testInt, df['col1'] + 2)
print(df)
#   col1  col2  newCol
# 0     1     4       1
# 1     2     5       4
# 2     3     6       5

What is the proper way to do what I want to do?

Solution

You have to build the replacement values as a Series first, then pass that Series to .where():

target = (pd.Series(testDate.normalize(), index=df.index)
            + (df['col1'] - df['col1'].dt.normalize()))

df['newCol'] = df['col1'].where(df['col1'] <= testDate, target)

Output:

>>> df
                 col1                col2              newCol
0 2021-01-01 12:00:00 2021-01-04 12:00:00 2021-01-01 12:00:00
1 2021-01-02 12:00:00 2021-01-05 12:00:00 2021-01-02 12:00:00
2 2021-01-03 12:00:00 2021-01-06 12:00:00 2021-01-02 12:00:00

>>> target
0   2021-01-02 12:00:00
1   2021-01-02 12:00:00
2   2021-01-02 12:00:00
dtype: datetime64[ns]
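
To make the construction of target easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate name time_of_day is only for illustration and is not part of the original answer; the data is the question's col1 column.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': pd.to_datetime(['2021-01-01 12:00', '2021-01-02 12:00', '2021-01-03 12:00']),
})
testDate = pd.Timestamp(2021, 1, 2, 16)

# Each row's time of day as a Timedelta: the timestamp minus its own midnight.
time_of_day = df['col1'] - df['col1'].dt.normalize()

# Midnight of the test date plus each row's time of day.
# The scalar Timestamp is broadcast across the whole Series.
target = time_of_day + testDate.normalize()

# Keep col1 where the condition holds, otherwise take the value from target.
df['newCol'] = df['col1'].where(df['col1'] <= testDate, target)
print(df)
```

Because the time of day is kept as a Timedelta, minutes and seconds carry over as well, not just the hour.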

"If I remove the last bit hour=df['col1'].dt.hour, the code will run, so I know it has to do with that, but I don't understand why it's complaining about truthiness since that little piece of the code isn't testing any conditions"

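The problem is not the condition. pd.Timestamp takes several parameters (year, month, day, ...), and each of them expects a scalar value, not a vector such as df['col1'].dt.hour. When it receives a Series, the constructor tries to treat it as a single value, and that is what raises the ValueError. Your integer example works because df['col1'] + 2 is a vectorized operation that returns a Series, which .where() can align row by row.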
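
The scalar-versus-Series difference can be reproduced in isolation. The sketch below also shows one possible vectorized alternative that is not part of the answer above: pd.to_datetime accepts a DataFrame whose columns are named year, month, day (and optionally hour, minute, ...). The names col1, parts and raised are only for illustration, and the exception type is caught broadly because it may differ between pandas versions.

```python
import pandas as pd

col1 = pd.Series(pd.to_datetime(['2021-01-01 12:00', '2021-01-02 12:00', '2021-01-03 12:00']))
testDate = pd.Timestamp(2021, 1, 2, 16)

# All arguments are scalars: this builds a single Timestamp without complaint.
single = pd.Timestamp(year=testDate.year, month=testDate.month, day=testDate.day, hour=12)

# Passing a Series for hour fails, because the constructor expects one value.
raised = False
try:
    pd.Timestamp(year=testDate.year, month=testDate.month, day=testDate.day,
                 hour=col1.dt.hour)
except (ValueError, TypeError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Vectorized alternative: assemble the components column by column.
# The scalar year/month/day values are broadcast to the length of col1.
parts = pd.DataFrame({
    'year': testDate.year,
    'month': testDate.month,
    'day': testDate.day,
    'hour': col1.dt.hour,
})
target = pd.to_datetime(parts)
print(target)
```

Note that this variant only carries over the hour, as in the question's original attempt; add minute and second columns if those need to be preserved too.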

Answered By - Corralien

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
