Issue
I have built a model with Keras in R. I want to do multitask regression with some shared layers, followed by a series of fully connected layers per task and a final fully connected layer of size 1, which gives each task's prediction.
Now let's assume I have three outputs: Y1, Y2, and Y3. I would like the outputs Y1 and Y2 to sum to 100, while each output keeps its own loss function (I want to apply weights to the observations).
The model works well when I do not add the constraint that Y1 + Y2 = 100, but I cannot make it work with the constraint. I have tried using a softmax layer, but it returns 1 for each output.
I provide the graph and some sample code below. This is really an implementation problem, because I think it is possible (and might even be easy with softmax).
library(keras)

# shared convolutional base, built with the functional API
input <- layer_input(shape = c(NULL, 3, 6, 6))
base.model <- input %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), input_shape = c(NULL, 3, 6, 6), padding = "same", data_format = "channels_first") %>%
  layer_activation("relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 20, kernel_size = c(2, 2), padding = "same", activation = "relu") %>%
  layer_dropout(0.4) %>%
  layer_flatten()
# task-specific head for Y1
Y1 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y1")
# task-specific head for Y2
Y2 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y2")
# task-specific head for Y3
Y3 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y3")
base.model <- keras_model(input, list(Y1, Y2, Y3)) %>%
  compile(
    loss = "mean_squared_error",
    optimizer = "adam",
    loss_weights = list(Y1 = 1.0, Y2 = 1.0, Y3 = 1.0)
  )
history <- base.model %>% fit(
  x = Xtrain,
  y = list(Y1 = Ytrain.y1, Y2 = Ytrain.y2, Y3 = Ytrain.y3),
  epochs = 500, batch_size = 500,
  sample_weight = list(Y1 = data$weigth.y1[sp_new], Y2 = data$weigth.y2[sp_new], Y3 = data$weigth.y3[sp_new]),
  validation_split = 0.2
)
The general idea can be summarized with a figure: https://www.dropbox.com/s/ueclq42of46ifig/graph%20CNN.JPG?dl=0
Now, if I try using a softmax layer, I do:
soft.l <- layer_dense(units = 1, activation = "softmax")
Y11 <- Y1 %>% soft.l %>% layer_dense(units = 1, name = "Y11", trainable = T)
Y22 <- Y2 %>% soft.l %>% layer_dense(units = 1, name = "Y22", trainable = T)
Then it becomes:
base.model <- keras_model(input, list(Y11, Y22, Y3)) %>%
  compile(
    loss = "mean_squared_error",
    optimizer = "adam",
    loss_weights = list(Y11 = 1.0, Y22 = 1.0, Y3 = 1.0)
  )
history <- base.model %>% fit(
  x = Xtrain,
  y = list(Y11 = Ytrain.y1, Y22 = Ytrain.y2, Y3 = Ytrain.y3),
  epochs = 500, batch_size = 500,
  sample_weight = list(Y11 = data$weigth.y1[sp_new], Y22 = data$weigth.y2[sp_new], Y3 = data$weigth.y3[sp_new]),
  validation_split = 0.2
)
(base.model %>% predict(Xtest))[[1]] + (base.model %>% predict(Xtest))[[2]]
The problem is that the predicted Y11 + Y22 does not sum to 1. What did I do wrong?
Solution
I share the answer here, as it may help others. The solution is to concatenate Y1 and Y2 into a single output and apply a softmax activation over the pair, which forces the two values to sum to 1.
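To see why the attempt in the question fails: softmax normalizes a vector so that its entries sum to 1, so applied over a single unit it is always exactly 1, whatever the input. A standalone illustration (the softmax helper below is only for demonstration, not part of the model):
# softmax of a single value is always 1; over two values it yields two
# proportions that sum to 1
softmax <- function(x) exp(x) / sum(exp(x))
softmax(2.7)           # 1
softmax(c(2.7, 0.3))   # 0.917 0.083
The full model then becomes: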
# same first part as before
input <- layer_input(shape = c(NULL, 3, 6, 6))
base.model <- input %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), input_shape = c(NULL, 3, 6, 6), padding = "same", data_format = "channels_first") %>%
  layer_activation("relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 20, kernel_size = c(2, 2), padding = "same", activation = "relu") %>%
  layer_dropout(0.4) %>%
  layer_flatten()
# task-specific head for Y1
Y1 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y1")
# task-specific head for Y2
Y2 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y2")
# task-specific head for Y3
Y3 <- base.model %>%
  layer_dense(units = 40) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 50) %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 1, name = "Y3")
## NEW
# concatenate Y1 and Y2 and apply a softmax so the two values sum to 1
combined <- layer_concatenate(c(Y1, Y2)) %>% layer_activation_softmax(name = "combined")
base.model <- keras_model(input, list(combined, Y3)) %>%
  compile(
    loss = "mean_squared_error",
    optimizer = "adam",
    loss_weights = list(combined = c(1.0, 1.0), Y3 = 1.0)
  )
history <- base.model %>% fit(
  x = Xtrain,
  y = list(combined = cbind(Ytrain.y1, Ytrain.y2), Y3 = Ytrain.y3),
  epochs = 500, batch_size = 500,
  sample_weight = list(combined = cbind(data$weigth.y1[sp_new], data$weigth.y2[sp_new]), Y3 = data$weigth.y3[sp_new]),
  validation_split = 0.2
)
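A quick check of the constraint (a minimal sketch, assuming Xtest is the held-out predictor array from the question): the first element of the prediction list is the softmax output for the (Y1, Y2) pair, so each of its rows should sum to about 1. If Y1 and Y2 are originally on a 0-100 scale, the targets presumably have to be divided by 100 before fitting, and the predictions multiplied by 100 afterwards.
# element 1 of the prediction list is the (Y1, Y2) softmax matrix, element 2 is Y3
preds <- base.model %>% predict(Xtest)
rowSums(preds[[1]])   # each row sums to ~1 thanks to the softmax
preds[[1]] * 100      # back to the 0-100 scale, assuming the targets were divided by 100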
Answered By - Alexandre Wadoux