Issue
Simply put, the input is an image and the output is text (feature extraction). I want to use separate encoder and decoder models for handwriting recognition, but TrOCR raises an error because the input image size differs between the models. How can I modify the model config, or normalize the input image, so that it fits both models?
from transformers import (
TrOCRConfig,
TrOCRProcessor,
TrOCRForCausalLM,
ViTConfig,
ViTModel,
VisionEncoderDecoderModel,
)
import requests
import cv2
from PIL import Image
# TrOCR is a decoder model and should be used within a VisionEncoderDecoderModel
# init vision2text model with random weights
encoder = ViTModel(ViTConfig())
decoder = TrOCRForCausalLM(TrOCRConfig())
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)
# If you want to start from the pretrained model, load the checkpoint with `VisionEncoderDecoderModel`
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
# model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
tokenizer = processor.feature_extractor
# load image from the IAM dataset
url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# normalize image to 224 x 224
image_0 = cv2.imread('/content/mm.png')
pixel_values = processor(image_0, return_tensors="pt").pixel_values
text = "industry, ' Mr. Brown commented icily. ' Let us have a"
# training
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
model.config.encoder.image_size = 224
# model.config.image_size = 384
labels = processor.tokenizer(text, return_tensors="pt").input_ids
outputs = model(pixel_values, labels=labels)
loss = outputs.loss
round(loss.item(), 2)
# inference
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
generated_text
And this is the error I got:

/usr/local/lib/python3.7/dist-packages/transformers/models/vit/modeling_vit.py in forward(self, pixel_values, interpolate_pos_encoding)
    171         if height != self.image_size[0] or width != self.image_size[1]:
    172             raise ValueError(
--> 173                 f"Input image size ({height}*{width}) doesn't match model"
    174                 f" ({self.image_size[0]}*{self.image_size[1]})."
    175             )

ValueError: Input image size (384*384) doesn't match model (224*224).
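For reference on where the numbers in the traceback come from: TrOCRProcessor for microsoft/trocr-base-handwritten resizes inputs to 384 x 384, while a default ViTConfig() expects 224 x 224. A minimal sketch of one way to make the two agree, assuming you keep the randomly initialized encoder, is to build the ViT config with the processor's resolution (the 384 below is taken from the error message above, not from any checkpoint):

from transformers import (
    TrOCRConfig,
    TrOCRForCausalLM,
    TrOCRProcessor,
    ViTConfig,
    ViTModel,
    VisionEncoderDecoderModel,
)

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# sketch: give the randomly initialized encoder the same resolution
# that the processor produces (384, per the error message above)
encoder = ViTModel(ViTConfig(image_size=384))
decoder = TrOCRForCausalLM(TrOCRConfig())
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)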
Solution
I think the way you've worded your question doesn't line up with the example you've given. Firstly, the example array you've given is 3D, not 2D. You can do
>>> arr.shape
(1, 2, 3)
>>> arr.ndim
3
Presumably this is a mistake, and you want your array to be 2D, so you would do
arr = np.array([[5., 2., -5.], [4., 3., 1.]])
instead.
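With that change, the same checks now give:

>>> arr.shape
(2, 3)
>>> arr.ndim
2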
Secondly, if a and b are values such that any element lying between them should be set to the value c (rather than a and b being indices), then the np.where function is great for this.
import numpy as np

def overwrite_interval(arr, a, b, c):
    # indices of elements whose values lie in the closed interval [a, b]
    inds = np.where((arr >= a) * (arr <= b))
    arr[inds] = c
    return arr
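As a quick usage sketch applied to the 2D array from above (the bounds 1. and 4. and the replacement 0. are just illustrative values, not from the question):

>>> arr = np.array([[5., 2., -5.], [4., 3., 1.]])
>>> overwrite_interval(arr, 1., 4., 0.)
array([[ 5.,  0., -5.],
       [ 0.,  0.,  0.]])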
np.where returns a tuple, so sometimes it can be easier to work with boolean arrays directly, in which case the function would look like this:
def overwrite_interval(arr, a, b, c):
    # boolean mask of elements whose values lie in [a, b]
    inds = (arr >= a) * (arr <= b)
    arr[inds] = c
    return arr
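To see the distinction being made here, np.where hands back a tuple of index arrays, while the comparison itself is a plain boolean mask (same illustrative bounds as above, on a fresh copy of the example array):

>>> arr = np.array([[5., 2., -5.], [4., 3., 1.]])
>>> mask = (arr >= 1.) * (arr <= 4.)
>>> mask
array([[False,  True, False],
       [ True,  True,  True]])
>>> np.where(mask)
(array([0, 1, 1, 1]), array([1, 0, 1, 2]))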
Does this work for you, and is this your intended meaning? Note that the solution I've provided would work as is if you still meant for the initial array to be a 3D array.
Answered By - Steven Thomas