Issue
I am training a TensorFlow Keras Sequential model on 20+ GB of text-based categorical data stored in a PostgreSQL database, and I need to pass class weights to the model. Here is what I am doing:
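from sklearn.utils.class_weight import compute_class_weight
class_weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)
class_weights = dict(enumerate(class_weights))  # Keras expects {class_index: weight}; assumes labels are 0..n_classes-1
model.fit(x, y, epochs=100, batch_size=32, class_weight=class_weights, validation_split=0.2, callbacks=[early_stopping])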
Since I can't load the whole dataset into memory, I figured I could use the model's fit_generator method.
However, how can I calculate the class weights on this data? sklearn does not provide a special function for this case; is it even the right tool?
I thought of computing the weights on multiple random samples, but is there a better approach that uses the whole dataset?
Solution
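You can use generators and still compute the class weights over the whole training set, because the weights only depend on how many samples each class has, not on the samples themselves.
Say you have a generator like this: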
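from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1.0 / 255)
train_generator = train_datagen.flow_from_directory(
    'train_directory',
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)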
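Then the class weights for the training set can be computed from the generator's classes attribute, which holds the integer class index of every file (no images are loaded for this):
import numpy as np
from sklearn.utils import class_weight

weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes)
class_weights = dict(enumerate(weights))  # Keras expects {class_index: weight}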
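If your generator does not expose a classes attribute like the one above (for example a custom generator reading from a database), one option is to make a single counting pass over the labels only. The sketch below is a hypothetical helper, `class_weights_from_batches`, assuming labels arrive either as integer indices or as one-hot rows (as with class_mode="categorical"). It accumulates per-class counts with `np.bincount` and applies the same formula sklearn documents for 'balanced': n_samples / (n_classes * count).

```python
import numpy as np


def class_weights_from_batches(batches, num_classes):
    """Hypothetical helper: count labels batch by batch, then return
    sklearn-style 'balanced' weights as a Keras-ready dict."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for y_batch in batches:
        y_batch = np.asarray(y_batch)
        # One-hot labels (class_mode="categorical") -> integer class indices
        if y_batch.ndim == 2:
            y_batch = y_batch.argmax(axis=1)
        counts += np.bincount(y_batch.astype(np.int64), minlength=num_classes)
    total = counts.sum()
    # 'balanced': n_samples / (n_classes * count); skip empty classes to
    # avoid dividing by zero
    return {i: total / (num_classes * c) for i, c in enumerate(counts) if c > 0}


# Usage: two small batches of integer labels
weights = class_weights_from_batches([[0, 1, 1], [2, 1, 0]], num_classes=3)
print(weights)
```

A Keras generator usually loops forever, so in practice you would pass only the label part of a finite number of batches (e.g. `len(train_generator)` of them). This costs one extra pass over the data, which is why a cheap COUNT query is preferable when the labels live in a database.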
[EDIT 1] Since you mentioned PostgreSQL in the comments, I am adding a prototype answer for that case here.
First, fetch the count for each class with a separate PostgreSQL query and use those counts to compute the class weights manually. The basic logic: the least frequent class gets weight 1, and every other class gets min_count / class_count, which is < 1 in proportion to how much more frequent that class is than the rarest one.
For example, with 3 classes A, B, C having 100, 200 and 150 samples, the class weights become {A: 1, B: 0.5, C: 0.67}.
Let's compute this after fetching the counts from PostgreSQL.
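The arithmetic of that example can be checked in a few lines. This sketch also computes the sklearn 'balanced' weights (n_samples / (n_classes * count)) used in the question, to show that the two schemes only differ by a constant factor; for these counts the factor is 450 / (3 * 100) = 1.5, so the relative weighting of the classes is the same and only the overall scale of the loss changes.

```python
counts = {"A": 100, "B": 200, "C": 150}

# Scheme from the answer: rarest class gets 1, others get min_count / count
min_count = min(counts.values())
relative = {cls: min_count / n for cls, n in counts.items()}

# sklearn's 'balanced' scheme: n_samples / (n_classes * count)
total = sum(counts.values())
balanced = {cls: total / (len(counts) * n) for cls, n in counts.items()}

for cls in counts:
    print(cls, round(relative[cls], 2), round(balanced[cls], 2))
```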
[Query]
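# cur is a DB-API cursor, e.g. psycopg2.connect(...).cursor()
# "my_table" and "class" are placeholders for your table and label column
cur.execute("SELECT class, count(*) FROM my_table GROUP BY class ORDER BY 2")
rows = cur.fetchall()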
The query returns rows as (class name, count) tuples; ORDER BY 2 sorts them by the count column, from the least to the most frequent class.
The following code then builds the class weights dictionary:
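class_weights = {}
min_count = rows[0][1]  # count of the rarest class (rows are sorted by count)
for class_name, count in rows:
    # dividing the smallest count by the current count gives the weight,
    # so the rarest class gets 1 and all other classes get < 1
    class_weights[class_name] = min_count / count
# Keras expects integer class indices as keys, so map class names
# to the indices used when encoding your labels before calling fit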
Answered By - venkata krishnan