Friday, November 24, 2023

[FIXED] Value (brightness) extraction using Intel Image Classification

November 24, 2023 machine-learning, numpy, python, scikit-learn, tensorflow

Issue

I'm trying to build a machine learning model that extracts the value (V, brightness) channel of an image to decide whether the image is bright or dark.

I used the "Intel Image Classification" dataset from Kaggle.

import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# read each image and convert it to the HSV color space to extract the value (V) channel
def extract_brightness(image_path):
    image = cv2.imread(image_path)
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    brightness = hsv_image[:,:,2]  # V (value) channel = brightness
    return brightness.flatten()

# image file directory
directory = r'D:\intel_image_classification\seg_train\seg_train/buildings'  # raw string so backslashes are not read as escapes

image_files = []
# iterate over every file in the directory
for filename in os.listdir(directory):
    if filename.endswith(('.jpg', '.png', '.jpeg')):
        # only .jpg, .png and .jpeg files are added to the list
        file_path = os.path.join(directory, filename)
        image_files.append(file_path)



# extract the V (brightness) channel from each image file
brightness_data = []
for file in image_files:
    brightness = extract_brightness(file)
    brightness_data.append(brightness)
binarybright_data = brightness_data.copy()
# change brightness to bright=1 dark=0 for model training
for i in range(len(binarybright_data)):
    if binarybright_data[i].all() < 127.5: #<------ First problem
        binarybright_data[i] = 0
    else:
        binarybright_data[i] = 1


# data
X = np.array(brightness_data, dtype=object)
y = np.array(binarybright_data)  # label of images (0: dark, 1: bright)

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# choose machine learning model
model = LogisticRegression()
model.fit(X_train, y_train)

# prediction using test data
y_pred = model.predict(X_test)

# accuracy evaluation
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
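The extract_brightness function keeps only the V (value) channel of HSV. As a rough illustration, assuming OpenCV's documented definition V = max(R, G, B) for 8-bit images, the same channel can be computed with NumPy alone on a small synthetic array (the pixel values are made up for the example):

```python
import numpy as np

# A tiny synthetic 2x2 "image" in OpenCV's BGR channel order (uint8, 0-255).
bgr = np.array(
    [[[10, 20, 30], [200, 100, 50]],
     [[0, 0, 0], [255, 255, 255]]],
    dtype=np.uint8,
)

# For 8-bit input, OpenCV's BGR->HSV conversion sets V = max(B, G, R),
# so the value (brightness) channel can be computed without cv2 here.
value = bgr.max(axis=2)

# Same layout as extract_brightness(): one number per pixel, flattened.
brightness = value.flatten()
print(brightness)         # [ 30 200   0 255]
print(brightness.mean())  # 121.25
```

On a real file, cv2.cvtColor(image, cv2.COLOR_BGR2HSV)[:, :, 2] should give the same values as image.max(axis=2).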

First Problem

I used 127.5 (half of 255) as the threshold for labeling each image as bright or dark.

I know this can be inaccurate, but there are about 14k images in the training set alone, so I couldn't label them all by hand.

I would be grateful for suggestions on other ways to label the images.
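If hand-labeling is not an option, one alternative (not from the original post, and only an assumption about what counts as "bright") is to let the data choose the cutoff, for example the median of the per-image mean brightness, so the two classes end up roughly balanced. The helper label_by_threshold and the sample values are hypothetical:

```python
import numpy as np

def label_by_threshold(per_image_means, threshold=None):
    """Return 1 (bright) / 0 (dark) labels plus the threshold used.

    Hypothetical helper. If no threshold is given, the median of the
    per-image means is used (an assumed alternative to a fixed 127.5),
    so roughly half of the images land in each class.
    """
    means = np.asarray(per_image_means, dtype=np.float64)
    if threshold is None:
        threshold = np.median(means)
    # Same rule as the question: below the threshold -> 0, otherwise -> 1.
    return (means >= threshold).astype(int), threshold

# Hypothetical per-image mean brightness values (one scalar per image).
means = [40.0, 90.0, 110.0, 160.0, 220.0]

fixed_labels, _ = label_by_threshold(means, threshold=127.5)
median_labels, t = label_by_threshold(means)
print(fixed_labels)      # [0 0 0 1 1]
print(t, median_labels)  # 110.0 [0 0 1 1 1]
```

Whatever rule is used, the labels are still derived from brightness statistics alone, so they reflect the chosen cutoff rather than human judgment.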

for i in range(len(binarybright_data)):
    if binarybright_data[i].all() < 127.5: #<------ First problem
        binarybright_data[i] = 0
    else:
        binarybright_data[i] = 1

If I leave off the '.all()' call at the end, I get this error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't understand why this error comes up, since the data consists of numbers, not booleans.
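The behavior can be reproduced on a tiny made-up array. Comparing a NumPy array with a number works element by element, so the result is an array of booleans rather than one boolean, and .all() answers a different question than the one intended:

```python
import numpy as np

# A flattened "image" with three pixel brightness values (made-up data).
pixels = np.array([100, 200, 250], dtype=np.uint8)

# Comparing an array with a number is element-wise: one bool per pixel.
mask = pixels < 127.5
print(mask)  # [ True False False]

# `if` needs a single True/False, so an array of several bools is ambiguous.
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# .all() reduces the *pixels* to one bool (True if no pixel is zero).
# Whether that bool is True (1) or False (0), it is below 127.5, so the
# test is always True and every image would be labeled dark (0).
print(pixels.all() < 127.5)  # True

# Comparing the mean brightness gives one meaningful bool per image.
print(pixels.mean() < 127.5)  # False (mean is about 183.3)
```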

Second Problem

If I use the '.all()' method, these two errors come up instead:

TypeError: only size-1 arrays can be converted to Python scalars

ValueError: setting an array element with a sequence.

Thanks in advance for your help.

Hope you have a great day.

Solution

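There are several issues with the approach you have taken to the problem. I went through your code and the Intel data.

For your first question, each element brightness_data[i] is a NumPy array holding one brightness value per pixel, so brightness_data[i] < 127.5 is evaluated element by element and produces an array of booleans. An if statement needs a single True or False, which is why NumPy asks for any() or all(). Adding .all() does not help: it reduces the pixels themselves to one boolean (True when no pixel is zero), and both True and False are less than 127.5, so every image ends up labeled 0. To label by overall brightness, compare the mean instead: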

np.mean(brightness_data[i]) < 127.5
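Putting that condition into the labeling step, a minimal sketch could build the labels as a separate array rather than overwriting a copy of brightness_data. The function make_labels and the stand-in arrays are hypothetical:

```python
import numpy as np

def make_labels(brightness_data, threshold=127.5):
    """Label each flattened V-channel array: 0 = dark, 1 = bright.

    Hypothetical helper. np.mean() reduces every image to one scalar,
    so each comparison yields a single bool and no ValueError is raised.
    """
    return np.array(
        [0 if np.mean(pixels) < threshold else 1 for pixels in brightness_data]
    )

# Hypothetical stand-ins for the flattened V channels of three images.
brightness_data = [
    np.full(150 * 150, 40, dtype=np.uint8),        # uniformly dark
    np.full(150 * 150, 200, dtype=np.uint8),       # uniformly bright
    np.array([0, 255, 255, 255], dtype=np.uint8),  # mean 191.25 -> bright
]
y = make_labels(brightness_data)
print(y)  # [0 1 1]
```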
The second problem arises because not all pictures are the same size. While most images are 150x150, some do not follow this shape; for example, 5358.jpg is 150x124. Once such an image is appended, brightness_data no longer contains rows of equal length, so np.array(brightness_data, dtype=object) produces a 1-D array of arrays instead of a 2-D matrix, and the conversion to floats inside model.fit fails with the two errors you see. To avoid this, you can simply skip those images from training to start with.
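The shape problem can be illustrated with zero-filled stand-in arrays (hypothetical data, sized like the 150x150 and 150x124 images mentioned above). The sketch shows the failing float conversion and the skip-by-size fix, assuming every usable image has exactly 150 * 150 pixels:

```python
import numpy as np

# Hypothetical flattened V channels: two 150x150 images and one 150x124 image.
full = np.zeros(150 * 150, dtype=np.uint8)
short = np.zeros(150 * 124, dtype=np.uint8)
brightness_data = [full, full.copy(), short]

# With dtype=object and rows of different lengths, NumPy builds a 1-D array
# whose items are whole arrays; converting that to floats then fails.
X_ragged = np.array(brightness_data, dtype=object)
print(X_ragged.shape)  # (3,)
try:
    np.asarray(X_ragged, dtype=np.float64)
    conversion_failed = False
except (TypeError, ValueError) as err:
    conversion_failed = True
    print(type(err).__name__, err)

# Fix suggested in the answer: keep only images with the expected size.
EXPECTED = 150 * 150  # assumption: 150x150 images, one V value per pixel
kept = [v for v in brightness_data if v.size == EXPECTED]
X = np.stack(kept).astype(np.float64)
print(X.shape)  # (2, 22500)
```

Resizing every image to 150x150 with cv2.resize(image, (150, 150)) before extracting the V channel is another option, if keeping all images matters more than avoiding interpolation.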

Answered By - SKPS

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
