Issue
I have a dataframe that contains a few timestamps. I am trying to find certain timestamps that don't meet a condition and compute their new timestamp value based on pieces from both another timestamp and the current timestamp being tested.
import pandas as pd

df = pd.DataFrame(data={
    'col1': [pd.Timestamp(2021, 1, 1, 12), pd.Timestamp(2021, 1, 2, 12), pd.Timestamp(2021, 1, 3, 12)],
    'col2': [pd.Timestamp(2021, 1, 4, 12), pd.Timestamp(2021, 1, 5, 12), pd.Timestamp(2021, 1, 6, 12)],
})
print(df)
# col1 col2
# 0 2021-01-01 12:00:00 2021-01-04 12:00:00
# 1 2021-01-02 12:00:00 2021-01-05 12:00:00
# 2 2021-01-03 12:00:00 2021-01-06 12:00:00
I'm trying to do something like this:
testDate = pd.Timestamp(2021, 1, 2, 16)
df['newCol'] = df['col1'].where(df['col1'].dt.date <= testDate.date(), pd.Timestamp(year=testDate.year, month=testDate.month, day=testDate.day, hour=df['col1'].dt.hour))
I get an error though about ambiguity:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
If I remove the last bit, hour=df['col1'].dt.hour, the code will run, so I know it has to do with that. But I don't understand why it's complaining about truthiness, since that little piece of the code isn't testing any conditions; it's just assigning. I thought it was because I'm trying to compute the new value using the values that are being iterated over, but if I try this process using integers instead of timestamps, it runs just fine:
df = pd.DataFrame(data={'col1': [1,2,3], 'col2': [4,5,6]})
print(df)
# col1 col2
# 0 1 4
# 1 2 5
# 2 3 6
testInt = 2
df['newCol'] = df['col1'].where(df['col1'] < testInt, df['col1'] + 2)
print(df)
# col1 col2 newCol
# 0 1 4 1
# 1 2 5 4
# 2 3 6 5
What is the proper way to do what I want to do?
Solution
You have to create the target series first:
target = (pd.Series(testDate.normalize(), index=df.index)
          + (df['col1'] - df['col1'].dt.normalize()))
df['newCol'] = df['col1'].where(df['col1'] <= testDate, target)
Output:
>>> df
col1 col2 newCol
0 2021-01-01 12:00:00 2021-01-04 12:00:00 2021-01-01 12:00:00
1 2021-01-02 12:00:00 2021-01-05 12:00:00 2021-01-02 12:00:00
2 2021-01-03 12:00:00 2021-01-06 12:00:00 2021-01-02 12:00:00
>>> target
0 2021-01-02 12:00:00
1 2021-01-02 12:00:00
2 2021-01-02 12:00:00
dtype: datetime64[ns]
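The same idea can be wrapped in a small helper. This is only a sketch: the function name move_to_date_keep_time and the sample times are made up for illustration, and it compares dates only (as in the question's condition) rather than full timestamps.

```python
import pandas as pd


def move_to_date_keep_time(s: pd.Series, test_date: pd.Timestamp) -> pd.Series:
    """Hypothetical helper: rows dated after test_date are moved onto
    test_date's calendar day while keeping their own time of day."""
    # Time elapsed since midnight for each row (a timedelta Series).
    time_of_day = s - s.dt.normalize()
    # Midnight of the test date plus each row's time of day.
    target = time_of_day + test_date.normalize()
    # Keep rows whose date is on or before the test date, replace the rest.
    keep = s.dt.normalize() <= test_date.normalize()
    return s.where(keep, target)


df = pd.DataFrame({'col1': pd.to_datetime(['2021-01-01 12:00',
                                           '2021-01-02 18:30',
                                           '2021-01-03 09:15'])})
testDate = pd.Timestamp(2021, 1, 2, 16)
df['newCol'] = move_to_date_keep_time(df['col1'], testDate)
print(df)
```

Because the time of day is carried as a timedelta, minutes and seconds survive as well, not just the hour.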
"If I remove the last bit hour=df['col1'].dt.hour, the code will run, so I know it has to do with that, but I don't understand why it's complaining about truthiness since that little piece of the code isn't testing any conditions"
The problem is not the condition. pd.Timestamp takes several parameters (year, month, day, ...), and each of them expects a scalar value, not a vector such as df['col1'].dt.hour. That's why the constructor raises a ValueError exception.
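If you would rather assemble the replacement values from components, as the original attempt did, one possible vectorized route is pd.to_datetime, which accepts a DataFrame of component columns (year, month, day, hour, ...). A minimal sketch, assuming only the hour should be carried over; the name parts is just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col1': [pd.Timestamp(2021, 1, 1, 12),
                            pd.Timestamp(2021, 1, 2, 12),
                            pd.Timestamp(2021, 1, 3, 12)]})
testDate = pd.Timestamp(2021, 1, 2, 16)

# One row of components per row of df; the scalars are broadcast.
parts = pd.DataFrame({'year': testDate.year,
                      'month': testDate.month,
                      'day': testDate.day,
                      'hour': df['col1'].dt.hour},
                     index=df.index)

# to_datetime assembles a datetime Series from the component columns.
target = pd.to_datetime(parts)

# Same date-only condition as in the question.
df['newCol'] = df['col1'].where(
    df['col1'].dt.normalize() <= testDate.normalize(), target)
print(df)
```

Unlike the timedelta approach above, this drops minutes and seconds unless you add matching 'minute' and 'second' columns to parts.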
Answered By - Corralien