Tuesday, November 14, 2023

[FIXED] Python - Tensorflow: How to map a function to a dataset properly

Tags: dataset, dictionary, function, python, tensorflow

Issue

I'm working through a machine learning curriculum and am having trouble with the following code.

import tensorflow as tf
import tensorflow_datasets as tfds

(data), info = tfds.load("iris", with_info=True, split="train")
print(info.splits)

data = data.shuffle(150)
train_data = data.take(120)
test_data = data.skip(120)

def preprocess(dataset):
    
    def _preprocess_img(image, label):
        label = tf.one_hot(label, depth=3)
        return image, label
            
    dataset = dataset.map(_preprocess_img)
    return dataset.batch(32).prefetch(tf.data.experimental.AUTOTUNE)

train_data = preprocess(train_data)
test_data = preprocess(test_data)

This is only a snippet, but it should cover the problem area. I get this error message: TypeError: outer_factory.<locals>.inner_factory.<locals>.tf___preprocess_img() missing 1 required positional argument: 'label'

I wasn't able to solve it. Does anyone have an idea what went wrong here? The function does expect the label argument, but the same pattern seems to work in other examples I have seen. Is the dataset perhaps not being unpacked the way I expect?

I tried changing the function being mapped and inspected the elements of the dataset, but neither gave me the right insight. I also looked for other examples, but I can't see anything wrong with this particular code.

Solution

With default parameters, tfds.load returns a dataset whose elements are dictionaries. Look:

import tensorflow_datasets as tfds

data = tfds.load("iris", split="train")

next(iter(data))

{
    'features': <tf.Tensor: shape=(4,), dtype=float32, 
        numpy=array([5.1, 3.4, 1.5, 0.2], dtype=float32)>, 
    'label': <tf.Tensor: shape=(), dtype=int64, numpy=0>
}

Each element is a single object (one dictionary), but your preprocessing function expects two arguments. You need to either handle the dictionary format in your preprocessing function or load the data in another format. To get (features, label) tuples and use your function as is, pass as_supervised=True to tfds.load.
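For the first option, a minimal sketch of a dictionary-aware preprocessing function might look like the following. To keep it runnable without a download, it assumes a small in-memory stand-in for the iris dataset with the same two keys; the sample values and the names preprocess_dict and _preprocess_example are illustrative only, not part of the original code.

```python
import tensorflow as tf

# Stand-in for tfds.load("iris", split="train"): an in-memory dataset whose
# elements are dictionaries with the keys "features" and "label".
# The values are illustrative, not the real dataset.
data = tf.data.Dataset.from_tensor_slices({
    "features": [[5.1, 3.4, 1.5, 0.2],
                 [7.7, 3.0, 6.1, 2.3],
                 [5.7, 2.8, 4.5, 1.3],
                 [6.8, 3.2, 5.9, 2.3]],
    "label": [0, 2, 1, 2],
})


def preprocess_dict(dataset, batch_size=4):
    def _preprocess_example(example):
        # Each element arrives as ONE dictionary argument, so the function
        # takes a single parameter and indexes into it by key.
        label = tf.one_hot(example["label"], depth=3)
        return example["features"], label

    return dataset.map(_preprocess_example).batch(batch_size)


features, labels = next(iter(preprocess_dict(data)))
print(features.shape, labels.shape)
```

The mapped function returns a (features, label) tuple, so everything after the map sees the same structure that as_supervised=True would have produced.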

Simplified working example (without changing your preprocessing function):

import tensorflow_datasets as tfds
import tensorflow as tf

data = tfds.load("iris", split="train", as_supervised=True) 


def preprocess(dataset):
    def _preprocess_img(image, label):
        label = tf.one_hot(label, depth=3)
        return image, label

    dataset = dataset.map(_preprocess_img)
    return dataset.batch(4)


train_data = preprocess(data)

print(next(iter(train_data)))

(<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[5.1, 3.4, 1.5, 0.2],
       [7.7, 3. , 6.1, 2.3],
       [5.7, 2.8, 4.5, 1.3],
       [6.8, 3.2, 5.9, 2.3]], dtype=float32)>, 
<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>)
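If it is unclear which structure a dataset yields, its element_spec property can be checked before calling map. The sketch below assumes two small in-memory stand-ins (dict_ds and tuple_ds are hypothetical names) rather than the real tfds datasets, so it only illustrates the difference between the two layouts.

```python
import tensorflow as tf

# Illustrative sample values, not the real iris data.
features = [[5.1, 3.4, 1.5, 0.2], [7.7, 3.0, 6.1, 2.3]]
labels = [0, 2]

# Dictionary-structured elements, like the default tfds.load output.
dict_ds = tf.data.Dataset.from_tensor_slices(
    {"features": features, "label": labels}
)

# Tuple-structured elements, like the as_supervised=True output.
tuple_ds = tf.data.Dataset.from_tensor_slices((features, labels))

# element_spec describes the structure of one element without iterating.
print(dict_ds.element_spec)   # a dict of TensorSpec objects
print(tuple_ds.element_spec)  # a tuple of TensorSpec objects

# map() unpacks tuple elements into separate positional arguments,
# so a two-parameter function works only on the tuple-structured dataset.
one_hot_ds = tuple_ds.map(lambda x, y: (x, tf.one_hot(y, depth=3)))
```

A dictionary spec means the mapped function receives one argument; a tuple spec means it receives one argument per tuple component.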

Answered By - Nicolas Gervais

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
