Issue
I need to load rows from text files that contain string representations of 2D arrays, for later use in training a TensorFlow CNN, but I cannot convert the strings into a format TensorFlow accepts. I have tried many combinations of apply, map, and other functions, but I always get some cryptic error. Below is a toy example that is close to working but still throws an error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray)
import tensorflow as tf
import numpy as np
import pandas as pd
from ast import literal_eval
def df_to_dataset(dataframe):
    Y = tf.convert_to_tensor(dataframe['Y'].values)
    X = tf.convert_to_tensor(
        dataframe['X'].apply(literal_eval).apply(np.array).values
    )
    return tf.data.Dataset.from_tensor_slices((X, Y))
data = [[ 1, "[[0,1],[0,1]]" ] , [ 0 , "[[1,0],[1,0]]" ]]
df = pd.DataFrame(data, columns=['Y','X'])
dataset = df_to_dataset(df)
for feature in dataset.take(1):
    print(feature)
Solution
So your dataframe displays as:
In [161]: df
Out[161]:
   Y              X
0  1  [[0,1],[0,1]]
1  0  [[1,0],[1,0]]
Though that doesn't show the string quotes.
In [162]: df['Y'].values
Out[162]: array([1, 0])
The X column is a 1D array of strings, with object dtype:
In [163]: df['X'].values
Out[163]: array(['[[0,1],[0,1]]', '[[1,0],[1,0]]'], dtype=object)
After applying literal_eval, values is an object-dtype array of lists:
In [164]: from ast import literal_eval
In [165]: df['X'].apply(literal_eval)
Out[165]:
0 [[0, 1], [0, 1]]
1 [[1, 0], [1, 0]]
Name: X, dtype: object
In [166]: df['X'].apply(literal_eval).values
Out[166]: array([list([[0, 1], [0, 1]]), list([[1, 0], [1, 0]])], dtype=object)
But if instead we extract it as a list:
In [168]: df['X'].apply(literal_eval).to_list()
Out[168]: [[[0, 1], [0, 1]], [[1, 0], [1, 0]]]
We can easily turn that into an array:
In [169]: np.array(_)
Out[169]:
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])
Going back to the object array, we can combine its elements into a single numeric array using np.stack:
In [170]: np.stack(df['X'].apply(literal_eval).values)
Out[170]:
array([[[0, 1],
        [0, 1]],

       [[1, 0],
        [1, 0]]])
stack is like concatenate or vstack, except that it adds a dimension, acting more like np.array.
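To make the difference concrete, here is a small sketch using two arrays with the same values as the toy data. The names a, b, stacked, joined and vstacked are only illustrative and do not come from the question.

```python
import numpy as np

a = np.array([[0, 1], [0, 1]])
b = np.array([[1, 0], [1, 0]])

stacked = np.stack([a, b])        # adds a new leading axis
joined = np.concatenate([a, b])   # joins along the existing axis 0
vstacked = np.vstack([a, b])      # same result as concatenate for 2D inputs

print(stacked.shape)   # (2, 2, 2)
print(joined.shape)    # (4, 2)
print(vstacked.shape)  # (4, 2)
```

Only the stacked result keeps one 2x2 sample per row of the dataframe, which is the layout from_tensor_slices needs.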
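Now the TensorFlow conversion should work, because np.stack returns a regular numeric array of shape (2, 2, 2) rather than an object-dtype array.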
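Putting that together, one way to rewrite the function from the question might look like the sketch below. It assumes every string in X parses to a nested list of the same shape; np.stack raises an error if the shapes differ.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from ast import literal_eval

def df_to_dataset(dataframe):
    # The labels are already a plain integer array.
    Y = tf.convert_to_tensor(dataframe['Y'].values)
    # Parse each string into nested lists, then stack them into one
    # numeric array of shape (n_rows, 2, 2) instead of an object array.
    X = tf.convert_to_tensor(
        np.stack(dataframe['X'].apply(literal_eval).values)
    )
    return tf.data.Dataset.from_tensor_slices((X, Y))

data = [[1, "[[0,1],[0,1]]"], [0, "[[1,0],[1,0]]"]]
df = pd.DataFrame(data, columns=['Y', 'X'])
dataset = df_to_dataset(df)

# Each element is an (x, y) pair: a 2x2 tensor and its scalar label.
for x, y in dataset.take(1):
    print(x.numpy(), y.numpy())
```

The second apply(np.array) from the original code is dropped here because np.stack converts each nested list itself.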
Your second apply only changes the array of lists into an object-dtype array of arrays:
In [174]: df['X'].apply(literal_eval).apply(np.array).values
Out[174]:
array([array([[0, 1],
              [0, 1]]), array([[1, 0],
                               [1, 0]])], dtype=object)
np.stack works on that as well.
Answered By - hpaulj