Issue
I've been following this guide, trying to learn how to create a POS tagger using Keras. I'm using Python 3.9, and I have TensorFlow 2.10 installed with CUDA Toolkit 11.2 and cuDNN 8.2, as this was the last configuration supported natively on Windows 10. I'm training on an NVIDIA GeForce RTX 2070 SUPER with 8 GB of VRAM, and I have 64 GB of RAM in my PC.
The training data consists of typical (token, POS-tag) tuples, grouped into a list per sentence:
[[("hello", "INTJ"), ("world", "NOUN"), ("!", "PUNCT")], [("oh", "INTJ"), ("hi", "INTJ")], ...]
These are then split into test, validation and training sets, before being vectorised using DictVectorizer from sklearn and one-hot encoded, as per the guide I'm following.
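For context, the preprocessing looks roughly like this (a minimal sketch; `sentences` holds the tuples shown above, and the guide's real feature extraction is richer than a single 'word' feature):

from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# Flatten the sentences into per-token feature dicts and tag labels.
features = [{'word': token} for sentence in sentences for token, _ in sentence]
labels = [tag for sentence in sentences for _, tag in sentence]

vectorizer = DictVectorizer(sparse=True)  # yields a scipy sparse matrix
X = vectorizer.fit_transform(features)

label_encoder = LabelEncoder()
y = to_categorical(label_encoder.fit_transform(labels))  # one-hot tags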
I've written the following function to construct a model:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

def construct_model(input_dim, hidden_neurons, output_dim):
    pos_model = Sequential([
        Dense(hidden_neurons, input_dim=input_dim),
        Activation('relu'),
        Dropout(0.2),
        Dense(hidden_neurons),
        Activation('relu'),
        Dropout(0.2),
        Dense(output_dim, activation='softmax')
    ])
    pos_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return pos_model
Then I load my data and fit the model using this code:
from keras.wrappers.scikit_learn import KerasClassifier

if __name__ == "__main__":
    X_train = processed_data[0]
    y_train = processed_data[1]
    X_val = processed_data[2]
    y_val = processed_data[3]
    X_test = processed_data[4]
    y_test = processed_data[5]

    model_params = {
        'build_fn': construct_model,
        'input_dim': X_train.shape[1],
        'hidden_neurons': 512,
        'output_dim': y_train.shape[1],
        'epochs': 5,
        'batch_size': 256,
        'verbose': 1,
        'validation_data': (X_val, y_val),
        'shuffle': True
    }

    classifier = KerasClassifier(**model_params)
    pos_model = classifier.fit(X_train, y_train)
Any time I try to fit the model, I get an error with a long traceback:
2023-12-25 09:44:36.255452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5973 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5
2023-12-25 09:57:57.132922: W tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 73.90GiB (rounded to 79346949120)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2023-12-25 09:57:57.134352: I tensorflow/core/common_runtime/bfc_allocator.cc:1033] BFCAllocator dump for GPU_0_bfc
2023-12-25 09:57:57.134940: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (256): Total Chunks: 12, Chunks in use: 12. 3.0KiB allocated for chunks. 3.0KiB in use in bin. 120B client-requested in use in bin.
2023-12-25 09:57:57.135083: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.135206: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2023-12-25 09:57:57.135492: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (2048): Total Chunks: 2, Chunks in use: 2. 4.0KiB allocated for chunks. 4.0KiB in use in bin. 4.0KiB client-requested in use in bin.
2023-12-25 09:57:57.135686: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.136594: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.136830: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.137034: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (32768): Total Chunks: 1, Chunks in use: 1. 34.0KiB allocated for chunks. 34.0KiB in use in bin. 34.0KiB client-requested in use in bin.
2023-12-25 09:57:57.137230: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (65536): Total Chunks: 1, Chunks in use: 0. 67.8KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.137483: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.137682: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.137801: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.138025: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (1048576): Total Chunks: 2, Chunks in use: 1. 2.90MiB allocated for chunks. 1.00MiB in use in bin. 1.00MiB client-requested in use in bin.
2023-12-25 09:57:57.138253: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.138490: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.138743: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.138971: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.139212: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.139371: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.139585: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (134217728): Total Chunks: 1, Chunks in use: 1. 205.92MiB allocated for chunks. 205.92MiB in use in bin. 205.92MiB client-requested in use in bin.
2023-12-25 09:57:57.139717: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Bin (268435456): Total Chunks: 2, Chunks in use: 0. 5.63GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-12-25 09:57:57.140005: I tensorflow/core/common_runtime/bfc_allocator.cc:1056] Bin for 73.90GiB was 256.00MiB, Chunk State:
2023-12-25 09:57:57.141115: I tensorflow/core/common_runtime/bfc_allocator.cc:1062] Size: 408.84MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 1.00MiB | Requested Size: 1.00MiB | in_use: 1 | bin_num: -1, next: Size: 205.92MiB | Requested Size: 205.92MiB | in_use: 1 | bin_num: -1
2023-12-25 09:57:57.141309: I tensorflow/core/common_runtime/bfc_allocator.cc:1062] Size: 5.23GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 205.92MiB | Requested Size: 205.92MiB | in_use: 1 | bin_num: -1
2023-12-25 09:57:57.141420: I tensorflow/core/common_runtime/bfc_allocator.cc:1069] Next region of size 6263144448
2023-12-25 09:57:57.141763: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000000 of size 256 next 1
2023-12-25 09:57:57.141835: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000100 of size 1280 next 2
2023-12-25 09:57:57.141920: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000600 of size 256 next 3
2023-12-25 09:57:57.141987: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000700 of size 256 next 4
2023-12-25 09:57:57.142074: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000800 of size 256 next 6
2023-12-25 09:57:57.142135: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d000900 of size 2048 next 7
2023-12-25 09:57:57.142194: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001100 of size 256 next 5
2023-12-25 09:57:57.142252: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001200 of size 256 next 8
2023-12-25 09:57:57.142581: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001300 of size 2048 next 12
2023-12-25 09:57:57.142678: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001b00 of size 256 next 13
2023-12-25 09:57:57.142789: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001c00 of size 256 next 11
2023-12-25 09:57:57.142910: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001d00 of size 256 next 17
2023-12-25 09:57:57.143002: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001e00 of size 256 next 18
2023-12-25 09:57:57.143142: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d001f00 of size 256 next 14
2023-12-25 09:57:57.143262: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d002000 of size 256 next 19
2023-12-25 09:57:57.143450: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] Free at 130d002100 of size 69376 next 20
2023-12-25 09:57:57.143583: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d013000 of size 34816 next 21
2023-12-25 09:57:57.143682: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] Free at 130d01b800 of size 1990144 next 15
2023-12-25 09:57:57.143840: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 130d201600 of size 1048576 next 16
2023-12-25 09:57:57.143986: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] Free at 130d301600 of size 428696832 next 9
2023-12-25 09:57:57.144123: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] InUse at 1326bd7b00 of size 215922688 next 10
2023-12-25 09:57:57.144276: I tensorflow/core/common_runtime/bfc_allocator.cc:1089] Free at 13339c3300 of size 5615373568 next 18446744073709551615
2023-12-25 09:57:57.144457: I tensorflow/core/common_runtime/bfc_allocator.cc:1094] Summary of in-use Chunks by size:
2023-12-25 09:57:57.144711: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 12 Chunks of size 256 totalling 3.0KiB
2023-12-25 09:57:57.144796: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 1 Chunks of size 1280 totalling 1.2KiB
2023-12-25 09:57:57.144876: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 2 Chunks of size 2048 totalling 4.0KiB
2023-12-25 09:57:57.144950: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 1 Chunks of size 34816 totalling 34.0KiB
2023-12-25 09:57:57.145023: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 1 Chunks of size 1048576 totalling 1.00MiB
2023-12-25 09:57:57.145094: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] 1 Chunks of size 215922688 totalling 205.92MiB
2023-12-25 09:57:57.145168: I tensorflow/core/common_runtime/bfc_allocator.cc:1101] Sum Total of in-use chunks: 206.96MiB
2023-12-25 09:57:57.145231: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] total_region_allocated_bytes_: 6263144448 memory_limit_: 6263144448 available bytes: 0 curr_region_allocation_bytes_: 12526288896
2023-12-25 09:57:57.145763: I tensorflow/core/common_runtime/bfc_allocator.cc:1109] Stats:
Limit: 6263144448
InUse: 217014528
MaxInUse: 647770624
NumAllocs: 33
MaxAllocSize: 215922688
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-12-25 09:57:57.146178: W tensorflow/core/common_runtime/bfc_allocator.cc:491] *_____*****_________________________________________________________________________________________
Traceback (most recent call last):
File "C:\Users\admd9\PycharmProjects\codalab-sigtyp2024\train_pos_tagger.py", line 104, in <module>
pos_model = classifier.fit(X_train, y_train)
File "C:\Users\admd9\anaconda3\envs\tf_codalab_sharedtask\lib\site-packages\keras\wrappers\scikit_learn.py", line 248, in fit
return super().fit(x, y, **kwargs)
File "C:\Users\admd9\anaconda3\envs\tf_codalab_sharedtask\lib\site-packages\keras\wrappers\scikit_learn.py", line 175, in fit
history = self.model.fit(x, y, **fit_args)
File "C:\Users\admd9\anaconda3\envs\tf_codalab_sharedtask\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\admd9\anaconda3\envs\tf_codalab_sharedtask\lib\site-packages\tensorflow\python\framework\constant_op.py", line 102, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Apparently this error has to do with how memory is allocated, and may be the result of the validation set being passed to the GPU all at once. The failed request is for 73.90 GiB (79,346,949,120 bytes, i.e. roughly 19.8 billion float32 values), far larger than the model itself, which suggests an entire dense data array is being converted to a single GPU tensor by the _EagerConst op.
I've looked for solutions online, and most suggest reducing the batch size. I tried reducing the batch size all the way down to 2, but this didn't help.
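As a sanity check, you can estimate how much memory each array would need once converted to a dense float32 tensor (a minimal sketch, assuming the arrays expose a NumPy-style shape):

import numpy as np

def dense_size_gib(a):
    # Bytes for a dense float32 tensor of the same shape, in GiB.
    return int(np.prod(a.shape)) * 4 / 2**30

for name, a in [('X_train', X_train), ('X_val', X_val), ('X_test', X_test)]:
    print(f'{name}: shape {a.shape} -> {dense_size_gib(a):.2f} GiB as float32')

If any of these comes out near the 73.90 GiB figure, the batch size is not the problem: the allocation happens when the whole array is turned into a tensor, before batching.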
I've also tried inserting the following code block, which I found here, after loading my data to let TensorFlow grow its GPU memory allocation on demand:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
This didn't fix the problem either.
Finally, someone in this thread suggested calling gc.collect() in a loop to free RAM after each iteration, but I'm not training in a loop like the user who asked that question, so I don't see how I could apply this here.
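For reference, the pattern suggested there looks roughly like this (a hypothetical sketch with an outer cross-validation loop; `n_folds` and the data variables are placeholders, and the clear_session() call is a common companion rather than part of that suggestion):

import gc
import tensorflow as tf

for fold in range(n_folds):  # hypothetical outer loop
    model = construct_model(input_dim, hidden_neurons, output_dim)
    model.fit(X_train, y_train, epochs=5, batch_size=256)
    tf.keras.backend.clear_session()  # free the old graph and weights
    gc.collect()                      # reclaim Python-side memory

Since my script builds and fits the model only once, there is nothing to collect between iterations.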
How can I solve this issue and train my model?
Solution
It seems that the whole dataset is being loaded into GPU memory at once. I suggest implementing a generator to load your data in batches; this avoids passing the entire dataset to the GPU. You can find some examples in this thread.
EDIT:
Here is the code from that example.
from tensorflow.keras.utils import Sequence
import numpy as np

class DataGenerator(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        # Return a single batch; only this slice is sent to the GPU at a time.
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y

train_gen = DataGenerator(X_train, y_train, 32)
test_gen = DataGenerator(X_test, y_test, 32)

history = model.fit(train_gen,
                    epochs=6,
                    validation_data=test_gen)
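One caveat: if your matrices come straight from DictVectorizer they are scipy sparse, for which len() raises an error and for which Keras expects dense batches. A small variant handles this (a sketch, assuming scipy sparse inputs):

import numpy as np
from scipy import sparse

class SparseDataGenerator(DataGenerator):
    def __len__(self):
        # shape[0] works for both NumPy arrays and scipy sparse matrices.
        return int(np.ceil(self.x.shape[0] / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        if sparse.issparse(batch_x):
            batch_x = batch_x.toarray()  # densify only this one batch
        return batch_x, batch_y

This way only one batch at a time is densified and copied to the GPU.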
EDIT 2:
Here is the code adapted to your script, using plain Keras with the scikit-learn KerasClassifier wrapper removed:
if __name__ == "__main__":
    X_train = processed_data[0]
    y_train = processed_data[1]
    X_val = processed_data[2]
    y_val = processed_data[3]
    X_test = processed_data[4]
    y_test = processed_data[5]

    model_params = {
        'input_dim': X_train.shape[1],
        'hidden_neurons': 512,
        'output_dim': y_train.shape[1],
        'epochs': 5,
        'batch_size': 256,
        'verbose': 1,
        'shuffle': True
    }

    train_gen = DataGenerator(
        X_train,
        y_train,
        model_params['batch_size']
    )
    # Validate during training on the validation set, not the test set.
    val_gen = DataGenerator(
        X_val,
        y_val,
        model_params['batch_size']
    )

    model = construct_model(
        input_dim=model_params['input_dim'],
        hidden_neurons=model_params['hidden_neurons'],
        output_dim=model_params['output_dim']
    )

    history = model.fit(
        train_gen,
        epochs=model_params['epochs'],
        verbose=model_params['verbose'],
        shuffle=model_params['shuffle'],
        validation_data=val_gen
    )
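After training, the held-out test set can be evaluated with the same generator pattern (a short usage sketch, reusing the DataGenerator defined above):

test_gen = DataGenerator(X_test, y_test, model_params['batch_size'])
loss, accuracy = model.evaluate(test_gen, verbose=0)
print(f'Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}')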
Answered By - Daniel Perez Efremova