Thursday, October 28, 2021

[FIXED] Failed to convert a NumPy array (the whole sequence is a string) to a Tensor in genome sequence classification for CNN?

Labels: conv-neural-network, deep-learning, numpy, python, tensorflow

Issue

My data is in CSV format and contains FASTA/genome sequences, where each whole sequence is a single string. To pass this data into a CNN model, I convert each string into an array of floats, e.g. "AACTG,...,AAC.." becomes [[0.25,0.25,0.50,1.00,0.75],....,[0.25,0.25,0.50.....]]. The converted data, however, looks like the output under "# data show 2" below.

When I run tf.convert_to_tensor(train_data), it fails with: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray). The data has to be a tensor before it can be passed to the CNN model, but I don't understand why the conversion fails. How can I fix it?

# data show 2
array([array([0.25, 0.5 , 0.5 , ..., 0.75, 0.25, 0.25]),
array([0.25, 0.75, 0.25, ..., 1.  , 0.5 , 0.5 ]), ...,
array([0.25, 1.  , 1.  , ..., 0.25, 0.25, 0.25])], dtype=object)
# end of data show

The DataFrame holding my genome/FASTA sequences has a 'seq' column with one sequence string per row. [Image of the DataFrame not available]

Below are the functions used to encode the data.

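import re

import numpy as np
from sklearn.preprocessing import LabelEncoder

def string_to_array(my_string):
    # Lowercase the sequence and replace anything that is not a, c, g or t with 'z'
    my_string = my_string.lower()
    my_string = re.sub('[^acgt]', 'z', my_string)
    my_array = list(my_string)
    return my_array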

# create a label encoder with 'acgtz' alphabet
label_encoder = LabelEncoder()
label_encoder.fit(['a','c','g','t','z'])

def ordinal_encoder(my_array):
    integer_encoded = label_encoder.transform(my_array)
    float_encoded = integer_encoded.astype(float)
    float_encoded[float_encoded == 0] = 0.25 # A
    float_encoded[float_encoded == 1] = 0.50 # C
    float_encoded[float_encoded == 2] = 0.75 # G
    float_encoded[float_encoded == 3] = 1.00 # T
    float_encoded[float_encoded == 4] = 0.00 # anything else, z
    return float_encoded

def conversion(tdf):
    # Encode every sequence in the 'seq' column. Iterating over the values
    # avoids mixing index labels with positional .iloc lookups.
    data = []
    for seq in tdf['seq']:
        data.append(ordinal_encoder(string_to_array(seq)))
    return data

train_data = conversion(df) # calling the function
train_data = np.asarray(train_data)

Solution

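The problem is probably the dtype of your NumPy array: dtype=object means NumPy is holding a 1-D array of separate array objects rather than one numeric 2-D array, and TensorFlow cannot convert that.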
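
To see why the dtype matters, the following minimal sketch (with made-up values, not the asker's data) builds one array from equal-length rows and one from rows of different lengths. Only the first becomes a numeric 2-D array; the second stays a 1-D array of objects, which is the form shown in the question.

```python
import numpy as np

# Hypothetical encoded sequences, for illustration only.
equal_rows = [np.array([0.25, 0.5, 0.5]), np.array([0.25, 0.75, 1.0])]
ragged_rows = [np.array([0.25, 0.5, 0.5]), np.array([0.25, 0.75])]

# Equal lengths: NumPy builds a regular 2-D float array.
regular = np.asarray(equal_rows)

# Unequal lengths: the rows can only be held as separate objects.
# dtype=object is requested explicitly because recent NumPy versions
# refuse to build ragged arrays implicitly.
ragged = np.empty(len(ragged_rows), dtype=object)
for i, row in enumerate(ragged_rows):
    ragged[i] = row

print(regular.dtype, regular.shape)
print(ragged.dtype, ragged.shape)
print({len(row) for row in ragged})  # more than one length means ragged data
```

Checking the set of row lengths on the real train_data is a quick way to confirm whether unequal sequence lengths are the cause.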

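If every encoded sequence has the same length, casting the array to float32 should fix the problem: tf.convert_to_tensor(train_data.astype(np.float32)). If the sequences differ in length, which the dtype=object output in the question suggests, the cast itself fails with a "setting an array element with a sequence" error, and the sequences need to be padded or truncated to a common length first.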
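
If the sequences differ in length, one common approach is to pad them to a common length before converting. The sketch below is an assumption about how that could be done and is not part of the original answer: pad_sequences_to_array is a hypothetical helper, and padding with 0.0 reuses the value the question's encoder assigns to unknown bases.

```python
import numpy as np

def pad_sequences_to_array(rows, max_len=None, pad_value=0.0):
    """Hypothetical helper: right-pad (or truncate) 1-D arrays to one length."""
    if max_len is None:
        max_len = max(len(row) for row in rows)
    # Start from an array filled with the padding value, then copy each row in.
    out = np.full((len(rows), max_len), pad_value, dtype=np.float32)
    for i, row in enumerate(rows):
        trimmed = np.asarray(row, dtype=np.float32)[:max_len]
        out[i, :len(trimmed)] = trimmed
    return out

# Made-up rows of unequal length, for illustration only.
rows = [np.array([0.25, 0.5, 0.5]), np.array([0.25, 0.75])]
padded = pad_sequences_to_array(rows)
print(padded.dtype, padded.shape)
```

The result is a regular float32 array, so tf.convert_to_tensor(padded) should accept it. For a 1-D convolution layer the array usually also needs a trailing channel axis, e.g. padded[..., np.newaxis].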

Answered By - kacpo1

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
