Issue
My data is in CSV format and contains genome (FASTA) sequences; each sequence is stored as a single string. To pass this data to a CNN model, I convert each string into an array of floats, e.g. "AACTG,...,AAC.." becomes [[0.25,0.25,0.50,1.00,0.75],....,[0.25,0.25,0.50.....]]. The converted data looks like the output below (see # data show 2). However, when I run tf.convert_to_tensor(train_data), it fails with the error: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray). The data has to be a tensor before I can pass it to the CNN model, but I don't understand why the conversion fails. How can I fix this?
# data show 2
array([array([0.25, 0.5 , 0.5 , ..., 0.75, 0.25, 0.25]),
array([0.25, 0.75, 0.25, ..., 1. , 0.5 , 0.5 ]), ...,
array([0.25, 1. , 1. , ..., 0.25, 0.25, 0.25])], dtype=object)
# end of data show
[Image: DataFrame of my genome/FASTA sequences]
Below are the functions used to encode the data.
import re
import numpy as np
from sklearn.preprocessing import LabelEncoder

def string_to_array(my_string):
    # lowercase the sequence and replace anything other than a/c/g/t with 'z'
    my_string = my_string.lower()
    my_string = re.sub('[^acgt]', 'z', my_string)
    my_array = list(my_string)
    return my_array

# create a label encoder with 'acgtz' alphabet
label_encoder = LabelEncoder()
label_encoder.fit(['a','c','g','t','z'])

def ordinal_encoder(my_array):
    integer_encoded = label_encoder.transform(my_array)
    float_encoded = integer_encoded.astype(float)
    float_encoded[float_encoded == 0] = 0.25 # A
    float_encoded[float_encoded == 1] = 0.50 # C
    float_encoded[float_encoded == 2] = 0.75 # G
    float_encoded[float_encoded == 3] = 1.00 # T
    float_encoded[float_encoded == 4] = 0.00 # anything else, z
    return float_encoded

def conversion(tdf):
    data = []
    # iterate over the values directly; tdf['seq'].iloc[i] with i taken from
    # tdf.index only works when the index is the default 0..n-1 range
    for val in tdf['seq']:
        data.append(ordinal_encoder(string_to_array(val)))
    return data

train_data = conversion(df) # calling the function
train_data = np.asarray(train_data)
Solution
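The problem is probably the dtype of your NumPy array: train_data has dtype=object (an array of arrays), which TensorFlow cannot convert.
Casting it to float32 should fix the problem: tf.convert_to_tensor(train_data.astype(np.float32))
Note that this cast only works if every sequence has the same length, so that the data forms a regular 2-D array; sequences of different lengths need to be padded or truncated to a common length first.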
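If the sequences differ in length, which the dtype=object output in the question suggests, the data has to be made rectangular before it can become a tensor. Below is a minimal sketch of one way to do that with plain NumPy. The helper name pad_sequences_to_array, its max_len and pad_value parameters, and the toy input are all hypothetical and only for illustration; padding with 0.0 is an assumption that reuses the question's code for "anything else".

```python
import numpy as np

def pad_sequences_to_array(encoded, max_len=None, pad_value=0.0):
    """Stack variable-length 1-D float arrays into one (n, max_len) float32 array.

    Hypothetical helper: shorter sequences are right-padded with pad_value,
    longer ones are truncated to max_len.
    """
    if max_len is None:
        # default to the longest sequence so nothing is truncated
        max_len = max(len(seq) for seq in encoded)
    out = np.full((len(encoded), max_len), pad_value, dtype=np.float32)
    for row, seq in enumerate(encoded):
        trimmed = np.asarray(seq, dtype=np.float32)[:max_len]
        out[row, :len(trimmed)] = trimmed
    return out

# Toy stand-in for the list returned by conversion(df):
# three encoded sequences of unequal length.
encoded = [
    np.array([0.25, 0.25, 0.50, 1.00, 0.75]),
    np.array([0.25, 0.75]),
    np.array([1.00, 0.50, 0.25]),
]

train_data = pad_sequences_to_array(encoded)
print(train_data.dtype, train_data.shape)  # float32 (3, 5)
```

The result is a regular 2-D float32 array rather than an object array, so tf.convert_to_tensor(train_data) should accept it. Whether padding or truncating is appropriate depends on the data and the model, so treat the choice of max_len as something to decide per dataset.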
Answered By - kacpo1