Issue
Simply put, the input is an image and the output is text (feature extraction). I want to use separate encoder and decoder models for handwriting recognition, but TrOCR raises an error because the input image size differs between the models. How can I modify the model config, or normalize the input image, so that it fits both models?
from transformers import (
TrOCRConfig,
TrOCRProcessor,
TrOCRForCausalLM,
ViTConfig,
ViTModel,
VisionEncoderDecoderModel,
)
import requests
import cv2
from PIL import Image
# TrOCR is a decoder model and should be used within a VisionEncoderDecoderModel
# init vision2text model with random weights
encoder = ViTModel(ViTConfig())
decoder = TrOCRForCausalLM(TrOCRConfig())
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)
# If you want to start from the pretrained model, load the checkpoint with `VisionEncoderDecoderModel`
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
# model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
tokenizer = processor.feature_extractor
# load image from the IAM dataset
url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# normalize image to 224 x 224
image_0 = cv2.imread('/content/mm.png')
pixel_values = processor(image_0, return_tensors="pt").pixel_values
text = "industry, ' Mr. Brown commented icily. ' Let us have a"
# training
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
model.config.encoder.image_size = 224
# model.config.image_size = 384
labels = processor.tokenizer(text, return_tensors="pt").input_ids
outputs = model(pixel_values, labels=labels)
loss = outputs.loss
round(loss.item(), 2)
# inference
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
generated_text
And this is the error I got:

/usr/local/lib/python3.7/dist-packages/transformers/models/vit/modeling_vit.py in forward(self, pixel_values, interpolate_pos_encoding)
    171         if height != self.image_size[0] or width != self.image_size[1]:
    172             raise ValueError(
--> 173                 f"Input image size ({height}*{width}) doesn't match model"
    174                 f" ({self.image_size[0]}*{self.image_size[1]})."
    175             )

ValueError: Input image size (384*384) doesn't match model (224*224).
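For reference on where the numbers in the traceback come from: TrOCRProcessor for microsoft/trocr-base-handwritten resizes inputs to 384 x 384, while a default ViTConfig() expects 224 x 224. A minimal sketch of one way to make the two agree, assuming you keep the randomly initialized encoder, is to build the ViT config with the processor's resolution (the 384 below is taken from the error message above, not from any checkpoint):

from transformers import (
    TrOCRConfig,
    TrOCRForCausalLM,
    TrOCRProcessor,
    ViTConfig,
    ViTModel,
    VisionEncoderDecoderModel,
)

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# sketch: give the randomly initialized encoder the same resolution
# that the processor produces (384, per the error message above)
encoder = ViTModel(ViTConfig(image_size=384))
decoder = TrOCRForCausalLM(TrOCRConfig())
model = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)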
Solution
I think the way you've worded your question doesn't line up with the example you've given. Firstly, the example array you've given is 3D, not 2D. You can do
>>> arr.shape
(1, 2, 3)
>>> arr.ndim
3
Presumably this is a mistake, and you want your array to be 2D, so you would do
arr = np.array([[5., 2., -5.], [4., 3., 1.]])
instead.
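With that change, the same checks now give:

>>> arr.shape
(2, 3)
>>> arr.ndim
2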
Secondly, if a and b are values such that any element lying between them should be set to the value c (rather than a and b being indices), then the np.where function is great for this.
import numpy as np

def overwrite_interval(arr, a, b, c):
    # indices of elements whose values lie in the closed interval [a, b]
    inds = np.where((arr >= a) * (arr <= b))
    arr[inds] = c
    return arr
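As a quick usage sketch applied to the 2D array from above (the bounds 1. and 4. and the replacement 0. are just illustrative values, not from the question):

>>> arr = np.array([[5., 2., -5.], [4., 3., 1.]])
>>> overwrite_interval(arr, 1., 4., 0.)
array([[ 5.,  0., -5.],
       [ 0.,  0.,  0.]])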
np.where returns a tuple, so sometimes it can be easier to work with boolean arrays directly, in which case the function would look like this:
def overwrite_interval(arr, a, b, c):
    # boolean mask of elements whose values lie in [a, b]
    inds = (arr >= a) * (arr <= b)
    arr[inds] = c
    return arr
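To see the distinction being made here, np.where hands back a tuple of index arrays, while the comparison itself is a plain boolean mask (same illustrative bounds as above, on a fresh copy of the example array):

>>> arr = np.array([[5., 2., -5.], [4., 3., 1.]])
>>> mask = (arr >= 1.) * (arr <= 4.)
>>> mask
array([[False,  True, False],
       [ True,  True,  True]])
>>> np.where(mask)
(array([0, 1, 1, 1]), array([1, 0, 1, 2]))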
Does this work for you, and is this your intended meaning? Note that the solution I've provided would work as is if you still meant for the initial array to be a 3D array.
Answered By - Steven Thomas