Issue
I'm trying to train a model to predict design patterns from web pages, using the coordinates of bounding rects for a set of element groupings. Patterns look like this:
[[elementId, width, height, x, y]]
so given [[elementId, width, height]], my target would be the [[x, y]].
Concretely:
[[5, 1.0, 1.0], [4, 1.0, 1.0], [2, 175.0, 65.0], [2, 1.0, 1.0], [4, 1.0, 1.0]]
->
[[0.0, 0.0], [0.0, 10.0], [3.0, 0.0], [0.0, 68.0], [0.0, 10.0]]
[[2, 14.0, 14.0], [2, 14.0, 14.0], [2, 14.0, 14.0]]
->
[[0.0, 3.0], [0.0, 3.0], [0.0, 3.0]]
Patterns vary in size, so I've padded them with [[0, 0, 0]]. I currently have about 15k of them, but can get more.
I was told that seq2seq with attention is the right model for this job. I started with https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/ and got horrendous results.
Every seq2seq example I can find (searching for Keras or PyTorch) is for translation, which is categorical, and I'm struggling to find a good regression-based example.
So my questions are:
Is this the right model (encoder/decoder LSTM) for what I'm trying to do?
If so, are there any examples?
Solution
Seq2seq/LSTM models are used when the input and output are variable-length sequences.
Here each input element has size 3 and each output element has size 2 (at least in the given examples), so you can use a simple one- or two-hidden-layer feed-forward model with an L2 or L1 loss (for regression). Any optimizer (SGD/Adam) should be fine; Adam tends to work well in practice.
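To make the suggestion concrete, here is a minimal sketch of such a feed-forward regressor written in plain NumPy so the mechanics are visible without a framework; in practice you would build the equivalent model in Keras or PyTorch with an MSE loss and Adam. The layer sizes, learning rate, and toy data below are illustrative assumptions, not values from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each row is one element [elementId, width, height] -> [x, y].
X = rng.random((256, 3))
true_W = rng.random((3, 2))
Y = X @ true_W  # a simple linear target, just so the sketch has something to fit

# One hidden layer of 32 ReLU units, linear output for regression.
W1 = rng.normal(0, 0.1, (3, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 2)); b2 = np.zeros(2)

lr = 0.5
losses = []
for step in range(500):
    # Forward pass.
    h = np.maximum(0, X @ W1 + b1)      # ReLU hidden layer
    pred = h @ W2 + b2                  # linear output
    err = pred - Y
    loss = (err ** 2).mean()            # L2 / MSE loss
    losses.append(loss)

    # Backward pass (full-batch gradient descent for brevity).
    dpred = 2 * err / err.size
    dW2 = h.T @ dpred; db2 = dpred.sum(0)
    dh = dpred @ W2.T
    dh[h <= 0] = 0                      # ReLU gradient
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Padded [[0, 0, 0]] rows can simply be fed through the same network with [[0, 0]] targets, or masked out of the loss.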
Also, I don't think you should use the coordinates as they are; scale them so that the largest coordinate is 1, which puts the input/output range between 0 and 1. As an added advantage, this intuitively helps the model generalize across different screen sizes.
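The scaling step can be sketched as follows; the sample coordinates are illustrative, and you would keep the scale factor around to map predictions back to pixel coordinates at inference time.

```python
import numpy as np

# Example target coordinates from one pattern (illustrative values).
coords = np.array([[0.0, 0.0], [0.0, 10.0], [3.0, 0.0], [0.0, 68.0]])

scale = coords.max()       # largest coordinate across the dataset
scaled = coords / scale    # every value now lies in [0, 1]

# Invert after prediction to recover pixel coordinates.
restored = scaled * scale
```

Use a single scale factor computed over the whole training set (not per pattern), so that relative distances between patterns stay comparable.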
Answered By - Umang Gupta