Issue
I have a dataframe where each entry in one of the columns is a numpy array:
DF
Name Vec
0 Abenakiite-(Ce) [0.0, 0.0, 0.0, 0.0, 0.0, 0.043, 0.0, 0.478, 0...
1 Abernathyite [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2 Abhurite [0.176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.235, 0...
3 Abswurmbachite [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0,...
When I check the data type of each element, the correct data type is returned.
type(DF['Vec'].iloc[1])
numpy.ndarray
I save this into a csv file:
DF.to_csv('.\\file.csv',sep='\t')
Now, when I read the file again,
new_DF=pd.read_csv('.\\file.csv',sep='\t')
and check the datatype of Vec at index 1:
type(new_DF['Vec'].iloc[1])
str
The size of the numpy array is 1x127.
The data type has changed from a numpy array to a string. I can also see some newline characters inside the individual vectors. I think this might be due to a problem when the vector is written to the CSV file, but I don't know how to fix it. Can someone please help?
Thanks!
Solution
In the comments I made a mistake and said dtype instead of converters. What you want is to convert them as you read them using a function. With some dummy variables:
import numpy as np
import pandas as pd

df=pd.DataFrame({'name':['name1','name2'],'Vec':[np.array([1,2]),np.array([3,4])]})
df.to_csv('tmp.csv')

def converter(instr):
    # Drop the surrounding brackets, then parse the space-separated numbers
    return np.fromstring(instr[1:-1],sep=' ')

df1=pd.read_csv('tmp.csv',converters={'Vec':converter})
df1.iloc[0,2]
array([1., 2.])
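For the setup in the question (tab-separated file, vectors of 127 values that wrap over several lines), the same converters idea can be sketched as below. This is only an illustration under a few assumptions: parse_vec is a hypothetical helper name, the data is made up with np.linspace, and an in-memory io.StringIO buffer stands in for the file. The helper splits on any whitespace, so the embedded newlines are harmless, and it also accepts comma-separated text in case the column held lists rather than arrays.

```python
import io

import numpy as np
import pandas as pd


def parse_vec(text):
    """Turn '[0. 0.043\n 0.478]' or '[0.0, 0.043]' into a 1-D float array.

    Hypothetical helper, not part of pandas or numpy.
    """
    body = text.strip().strip('[]')
    # Treat commas like whitespace, then split on any run of whitespace
    # (spaces, tabs and the newlines numpy inserts when it wraps long arrays).
    tokens = body.replace(',', ' ').split()
    return np.array([float(t) for t in tokens], dtype=float)


# Made-up data shaped like the question: one 127-element vector per row.
vec = np.linspace(0.0, 1.0, 127)
df = pd.DataFrame({'Name': ['a', 'b'], 'Vec': [vec, vec * 2]})

# Write and read back through an in-memory buffer instead of a real file.
buf = io.StringIO()
df.to_csv(buf, sep='\t')
buf.seek(0)
new_df = pd.read_csv(buf, sep='\t', converters={'Vec': parse_vec})

print(type(new_df['Vec'].iloc[1]))   # numpy.ndarray again, not str
print(new_df['Vec'].iloc[1].shape)
```

Two caveats with this approach: the file stores the printed form of each array, so values come back rounded to numpy's print precision (8 digits by default), and with the default print options arrays longer than 1000 elements are abbreviated with "...", which this parser would reject with a ValueError. If exact values matter, a binary format such as np.save or a pickle may be a better fit than CSV.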
Answered By - anishtain4