Issue
I am trying to build a PyTorch module that operates on inputs with dynamic dimension sizes.
import torch
from random import randint
from torch.nn import Linear, BatchNorm1d, ReLU, Dropout, Sequential
batch_size = 2
embed_size = 128
fc_1 = Sequential(
    Sequential(
        Linear(1, 64),
        BatchNorm1d(64),
        ReLU(),
        Dropout(0.1),
    ),
    Linear(64, embed_size),
    Linear(embed_size, embed_size),
)
# Secondary Features
p = torch.randn(batch_size).unsqueeze(1) # torch.Size([2, 1])
q = torch.randn(batch_size).unsqueeze(1) # torch.Size([2, 1])
r = torch.randn(batch_size).unsqueeze(1) # torch.Size([2, 1])
s = torch.randn(batch_size).unsqueeze(1) # torch.Size([2, 1])
secondary = torch.cat([ # torch.Size([8, 1])
p, q, r, s
], dim=0)
# Random Dimension Size
x = randint(2, 400) # 239
# Primary Features
a = torch.rand(batch_size, embed_size) # torch.Size([2, 128])
b = fc_1(secondary) # torch.Size([8, 128])
c = torch.rand(x, embed_size) # torch.Size([239, 128])
How do I collapse all the information from a, b, and c into a variable y so that its size is (batch_size, embed_size)?
I am trying to do regression analysis, so it would be important not to lose any information in the process of collapsing them. Obviously, torch.cat alone is not possible. Any method that uses learnable layers to collapse them is fine.
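For concreteness, a quick check (my own illustration, not part of the original question) of why plain concatenation cannot give a (batch_size, embed_size) result:
print(torch.cat([a, b, c], dim=0).shape)  # torch.Size([2 + 8 + x, 128]), e.g. [249, 128] -- batch structure lost
# torch.cat([a, b, c], dim=1) raises a RuntimeError, since the row counts
# (2, 8 and x) do not match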
Solution
Given the differing dimensions, especially the dynamic size of c, simple concatenation won't work here. A common approach in such scenarios is to use an attention mechanism, which can handle varying input sizes and aggregate information in a learnable manner. Attention can weigh different parts of the input differently, allowing the model to focus on the more informative parts.
The following code implements a simple attention mechanism. The idea is to compute an attention score for each row of a, b, and c, and then use these scores to weight and sum the rows.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionAggregator(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        self.embed_size = embed_size
        # learnable query vector and key projection
        self.query = nn.Parameter(torch.randn(embed_size))
        self.key = nn.Linear(embed_size, embed_size)

    def forward(self, a, b, c):
        # concatenate all inputs along dim 0 for computing attention
        combined = torch.cat([a, b, c], dim=0)  # (N, embed_size), N = 2 + 8 + x
        # compute keys
        keys = self.key(combined)
        # compute scaled dot-product attention scores against the query
        attention_scores = torch.matmul(keys, self.query) / (self.embed_size ** 0.5)
        attention_weights = F.softmax(attention_scores, dim=0).unsqueeze(-1)
        # apply attention weights
        weighted = combined * attention_weights
        # aggregate information into a single vector of shape (embed_size,)
        aggregated = weighted.sum(dim=0)
        return aggregated

embed_size = 128  # your example

# instantiate the attention aggregator
attention_aggregator = AttentionAggregator(embed_size)

# aggregate the information (a, b, c already defined/computed in your example)
y = attention_aggregator(a, b, c)  # torch.Size([128])
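Note that AttentionAggregator returns a single vector of shape (embed_size,): the weighted sum runs over all 2 + 8 + x rows at once. If you need the (batch_size, embed_size) output asked for in the question, one option is cross-attention, where each row of a acts as a query over all rows of a, b, and c. The sketch below is a minimal illustration of that idea (the class name CrossAttentionAggregator and the choice of a as the query side are my own assumptions, not part of the original answer):

class CrossAttentionAggregator(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        self.embed_size = embed_size
        # learnable projections for queries, keys and values
        self.query = nn.Linear(embed_size, embed_size)
        self.key = nn.Linear(embed_size, embed_size)
        self.value = nn.Linear(embed_size, embed_size)

    def forward(self, a, b, c):
        combined = torch.cat([a, b, c], dim=0)       # (N, embed_size), N = 2 + 8 + x
        q = self.query(a)                            # (batch_size, embed_size)
        k = self.key(combined)                       # (N, embed_size)
        v = self.value(combined)                     # (N, embed_size)
        # scaled dot-product attention: one score per (query row, input row) pair
        scores = q @ k.T / (self.embed_size ** 0.5)  # (batch_size, N)
        weights = F.softmax(scores, dim=-1)
        return weights @ v                           # (batch_size, embed_size)

cross_aggregator = CrossAttentionAggregator(embed_size)
y = cross_aggregator(a, b, c)  # torch.Size([2, 128])

In either variant the aggregator contains learnable parameters, so it has to be trained jointly with the rest of the regression model rather than used as a fixed pooling step.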
Answered By - inverted_index