Issue
I'm using pd.cut with the keyword argument duplicates='drop'. However, this gives errors when you combine it with the keyword argument labels.
This is similar to this question, but that one ignores the labels part.
Does not work:
pd.cut(pd.Series([0, 1, 2, 3, 4, 5]), bins=[0, 1, 1, 2])
Works:
pd.cut(pd.Series([0, 1, 2, 3, 4, 5]), bins=[0, 1, 1, 2], duplicates='drop')
Does not work:
pd.cut(pd.Series([0, 1, 2, 3, 4, 5]), bins=[0, 1, 1, 2], duplicates='drop', labels=[0, 1, 1, 2])
Wouldn't we expect it to drop the label corresponding to the duplicate entry?
Solution
No. The cut documentation is pretty clear that duplicates only concerns the bins:
duplicates {default ‘raise’, ‘drop’}, optional
If bin edges are not unique, raise ValueError or drop non-uniques.
Also, in any case there must be exactly one fewer label than there are bin edges, so dropping labels based on the duplicated edges would be ambiguous.
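As a quick check of that counting rule, the sketch below works out how many labels pd.cut will expect once the repeated edge is gone, and confirms that the four-label call from the question is rejected. The variable names (edges, n_labels_needed, mismatch_raised) are only illustrative.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5])
edges = [0, 1, 1, 2]

# After duplicates='drop' the unique edges are [0, 1, 2],
# which define two intervals and therefore need two labels.
n_labels_needed = len(pd.Series(edges).drop_duplicates()) - 1

# Passing four labels for those two intervals raises ValueError.
try:
    pd.cut(s, bins=edges, duplicates='drop', labels=[0, 1, 1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Here n_labels_needed is 2, so a labels list needs exactly two entries to be accepted.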
This works if you have the correct final number of labels:
pd.cut(pd.Series([0, 1, 2, 3, 4, 5]),
       bins=[0, 1, 1, 2], labels=['a', 'b'],
       duplicates='drop'
      )
Or using a weird programmatic alternative:
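import pandas as pd

bins = pd.Series([0, 1, 1, 2])
labels = pd.Series(['a', 'b', 'c'])
# bins.duplicated() flags the repeated edge; [:-1] trims the mask to
# len(labels), and ~ inverts it so that only unflagged positions are kept.
pd.cut(pd.Series([0, 1, 2, 3, 4, 5]),
       bins=[0, 1, 1, 2],
       labels=labels[~bins.duplicated()[:-1]],
       duplicates='drop'
      )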
Output:
0    NaN
1      a
2      b
3    NaN
4    NaN
5    NaN
dtype: category
Categories (2, object): ['a' < 'b']
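In this example the mask above keeps 'a' and 'b' and discards 'c'. A different reading of the same ambiguity is to discard the label of the zero-width interval itself, (1, 1], so that 'c' stays attached to (1, 2]. The following is a minimal sketch of that convention, assuming the caller supplies one label per original interval and that the edges are sorted; cut_dropping_empty_bins is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def cut_dropping_empty_bins(values, edges, labels):
    """Hypothetical helper: drop the label of every zero-width interval."""
    edges = pd.Series(edges)
    labels = list(labels)
    if len(labels) != len(edges) - 1:
        raise ValueError("expected one label per original interval")
    # Interval i spans (edges[i], edges[i + 1]]; with sorted edges it is
    # empty exactly when its right edge repeats an earlier edge.
    is_empty = edges.duplicated().to_numpy()[1:]
    kept = [lab for lab, empty in zip(labels, is_empty) if not empty]
    return pd.cut(values, bins=edges.tolist(), labels=kept, duplicates='drop')

out = cut_dropping_empty_bins(pd.Series([0, 1, 2, 3, 4, 5]),
                              [0, 1, 1, 2], ['a', 'b', 'c'])
```

With this convention the value 1 is labelled 'a' and the value 2 is labelled 'c'. Which convention is right depends on what the labels were meant to describe.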
Answered By - mozway