Issue
[some example data at the end] I've just started working with PyArrow, so forgive me if I'm missing something obvious here.
I have a project that I'm updating to (hopefully) better handle calculations on money. Mostly, these calculations are multiplying a normal money amount by a percentage, like 9.94 * 0.04, things like that.
I had been using pandas v1.4.x with all the money stored as floats, and I was not consistent with rounding, which caused headaches. In the example above, I would want 9.94 * 0.04 = 0.40, using normal rounding to two digits.
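To make the target behaviour concrete, here is a minimal sketch using only the standard library's decimal module. It assumes "normal rounding" means rounding ties away from zero (ROUND_HALF_UP in decimal's terminology); the variable names are just for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

amount = Decimal("9.94")
rate = Decimal("0.04")

# The exact product has four decimal places.
exact = amount * rate

# Round to two decimal places (cents), ties going away from zero.
rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(exact)    # 0.3976
print(rounded)  # 0.40
```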
I was going to start forcing decimal.Decimal objects in everywhere instead of floats, when I saw that pyarrow has a builtin decimal128 datatype that should work much better with pandas.
So, now I'm getting a lot of the following exception:
pyarrow.lib.ArrowInvalid: Rescaling Decimal128 value would cause data loss
I'm also getting changes to precision that, while not raising exceptions, I don't think I want.
For example, I have a pandas dataframe with a column called 'Pay Rate' with a dtype of pa.decimal128(12,2). When I do df['Pay Rate'] * decimal.Decimal('0.04'), the result is of type pa.decimal128(15,4). I'm assuming it is merging together the precisions of the two things being multiplied in a way that is reasonable but that I don't want. (Note: If I just do df['Pay Rate'] * 0.04, the result is a double[pyarrow] type.)
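The types reported in this question are consistent with the rule that, as far as I can tell, Arrow documents for decimal multiplication: the result precision is p1 + p2 + 1 and the result scale is s1 + s2. The helper below is a hypothetical pure-Python illustration of that arithmetic, not a pyarrow function. It also assumes that Decimal('0.04') is inferred as decimal128(2,2), which is what the observed (15,4) result implies.

```python
def decimal_multiply_type(p1, s1, p2, s2):
    """Return (precision, scale) for the product of two decimal types.

    Hypothetical helper: assumes precision = p1 + p2 + 1 and
    scale = s1 + s2, which matches the types seen in the question.
    """
    return p1 + p2 + 1, s1 + s2


# decimal128(12,2) column times a scalar assumed to be decimal128(2,2)
print(decimal_multiply_type(12, 2, 2, 2))    # (15, 4)

# decimal128(12,2) column times another decimal128(12,2) column
print(decimal_multiply_type(12, 2, 12, 2))   # (25, 4)
```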
I want the end of my transformations here to result in columns that are type decimal128(12,2), and so I'm also then trying df['my_col'] = df['my_col'].astype(pd.ArrowDtype(pa.decimal128(12,2))), and that is then sometimes giving me the error above about data loss.
It makes sense to me that there is data loss, because I am indeed telling it to just drop some decimal places, but what I really want is for it to round first and then drop them.
Is there some switch or function to handle this that I'm missing?
some example data
import pandas as pd
import pyarrow as pa
from decimal import Decimal
data = {'col1': {0: Decimal('39.60'), 1: Decimal('39.60'), 2: Decimal('21.60'), 3: Decimal('7.20'), 4: Decimal('18.00'), 5: Decimal('18.00'), 6: Decimal('72.00'), 7: Decimal('30.60'), 8: Decimal('36.00'), 9: Decimal('41.40')}, 'col2': {0: Decimal('0.98'), 1: Decimal('1.00'), 2: Decimal('0.97'), 3: Decimal('0.46'), 4: Decimal('0.52'), 5: Decimal('1.00'), 6: Decimal('1.00'), 7: Decimal('1.00'), 8: Decimal('1.00'), 9: Decimal('1.00')}}
df = pd.DataFrame(data,dtype=pd.ArrowDtype(pa.decimal128(12, 2)))
df['col3'] = df['col1'] * df['col2']
# df['col3'] has a dtype of decimal128(25,4)
df['col3'].astype(pd.ArrowDtype(pa.decimal128(12, 2)))
# raises pyarrow.lib.ArrowInvalid: Rescaling Decimal128 value would cause data loss
Solution
You can call round before casting:
df['col3'].round(2).astype(pd.ArrowDtype(pa.decimal128(12, 2)))
Answered By - 0x26res