Issue
I want to create a Series object and specify dtype=np.str_, but it looks like the type is ignored by pandas.
I tried applying the astype method, but I get the same result:
import pandas as pd
import numpy as np
s1 = pd.Series(["t1", "t2"], dtype=np.str_)
print(type(s1[0])) # <class 'str'>
print(type(s1.astype(np.str_)[0])) # <class 'str'>
If I replace it with dtype=np.bytes_, or create the object with pd.Series([np.str_("t1"), np.str_("t2")]), it works as expected:
s2 = pd.Series(["t1", "t2"], dtype=np.bytes_)
s3 = pd.Series([np.str_("t1"), np.str_("t2")])
print(type(s2[0])) # <class 'numpy.bytes_'>
print(type(s3[0])) # <class 'numpy.str_'>
Solution
The most straightforward answer to your question is that pandas
only supports the following text types:
object
pandas.StringDtype()
This is explicitly stated in the pandas user guide, in the section on working with text data.
This is why, if you pass str, it defaults to object:
>>> pd.Series([1, "foo"], dtype=str)
0      1
1    foo
dtype: object
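To make the two supported text dtypes concrete, here is a minimal sketch that builds the same data with each one. It assumes a pandas version that ships StringDtype (1.0 or later); the variable names are only illustrative. In both cases the elements you get back are plain Python str objects, not numpy.str_.

```python
import pandas as pd

# Text stored as generic Python objects
s_obj = pd.Series(["t1", "t2"], dtype=object)

# Text stored with the dedicated string extension dtype
s_str = pd.Series(["t1", "t2"], dtype="string")

print(s_obj.dtype)
print(s_str.dtype)

# Either way, element access returns a built-in str
print(type(s_obj[0]))
print(type(s_str[0]))
```

If you want pandas to treat the column as text rather than arbitrary objects, the "string" dtype is usually the more explicit choice.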
Indeed, if you look in the guts of the pandas wrappers for astype, you see this in the source code:
# in pandas we don't store numpy str dtypes, so convert to object
if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):
    values = np.array(values, dtype=object)
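If you genuinely need numpy.str_ scalars, one possible workaround (not part of the original answer) is to leave pandas and convert to a NumPy array at the point where you need them. This is a sketch that assumes Series.to_numpy accepts a NumPy string dtype for your pandas version, which it does for ordinary object-backed text:

```python
import numpy as np
import pandas as pd

s = pd.Series(["t1", "t2"])

# Inside the Series, elements are plain Python str
print(type(s[0]))

# Converting to a NumPy array with a str dtype gives a fixed-width
# unicode array, whose scalars are numpy.str_
arr = s.to_numpy(dtype=np.str_)
print(arr.dtype.kind)
print(type(arr[0]))
```

The conversion copies the data, so it is best done once at the boundary rather than repeatedly inside a loop.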
Answered By - juanpa.arrivillaga