Issue
An example of a simple neural network in PyTorch can be found at https://visualstudiomagazine.com/articles/2020/10/14/pytorch-define-network.aspx
import torch as T

class Net(T.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hid1 = T.nn.Linear(4, 8)  # 4-(8-8)-1
        self.hid2 = T.nn.Linear(8, 8)
        self.oupt = T.nn.Linear(8, 1)

        T.nn.init.xavier_uniform_(self.hid1.weight)
        T.nn.init.zeros_(self.hid1.bias)
        T.nn.init.xavier_uniform_(self.hid2.weight)
        T.nn.init.zeros_(self.hid2.bias)
        T.nn.init.xavier_uniform_(self.oupt.weight)
        T.nn.init.zeros_(self.oupt.bias)

    def forward(self, x):
        z = T.tanh(self.hid1(x))
        z = T.tanh(self.hid2(z))
        z = T.sigmoid(self.oupt(z))
        return z
A distinctive feature of the above is that the layers are stored as fields within the Net object (as they need to be, in the sense that they contain the weights, which need to be remembered across training epochs), but the activation functors such as tanh are re-created on every call to forward. The author says:
The most common structure for a binary classification network is to define the network layers and their associated weights and biases in the __init__() method, and the input-output computations in the forward() method.
Fair enough. On the other hand, perhaps it would be marginally faster to store the functors rather than re-create them on every call to forward. On the third hand, it's unlikely to make any measurable difference, which means it might end up being a matter of code style.
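For concreteness, the "stored" variant would look roughly like this (my own sketch, not code from the article, using PyTorch's built-in nn.Tanh and nn.Sigmoid modules kept as fields):

import torch as T

class NetStored(T.nn.Module):
    def __init__(self):
        super(NetStored, self).__init__()
        self.hid1 = T.nn.Linear(4, 8)  # 4-(8-8)-1
        self.hid2 = T.nn.Linear(8, 8)
        self.oupt = T.nn.Linear(8, 1)
        # activation modules created once and stored as fields
        self.tanh = T.nn.Tanh()
        self.sigmoid = T.nn.Sigmoid()
        # (weight/bias initialization omitted for brevity)

    def forward(self, x):
        z = self.tanh(self.hid1(x))
        z = self.tanh(self.hid2(z))
        z = self.sigmoid(self.oupt(z))
        return z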
Is the above indeed the most common way to do it? Does either way have any technical advantage, or is it just a matter of style?
Solution
On "storing" functors
The snippet is not "re-creating" anything: calling torch.tanh(x) is literally just calling the function tanh exported by the torch package with argument x; no functor object is being constructed on each call.
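To see the distinction, compare the plain function torch.tanh with the module class torch.nn.Tanh, an instance of which is what you would store as a field if you preferred that style (a small illustrative sketch):

import torch

x = torch.randn(3)

# plain function: nothing is constructed, the function is simply called
y1 = torch.tanh(x)

# module: an object is constructed once and can be stored and reused
tanh_module = torch.nn.Tanh()
y2 = tanh_module(x)

assert torch.allclose(y1, y2)  # identical results either way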
Other ways of doing it
I think the snippet is a fair example for small neural blocks that are use-and-forget, or that are simply not meant to be parameterizable. Depending on your intentions there are of course alternatives, but you'd have to weigh for yourself whether the added complexity offers any value.
- activation functions as strings: allow selecting an activation function from a fixed set
from typing import Literal
import torch
from torch import Tensor

class Model(torch.nn.Module):
    def __init__(..., activation_function: Literal['tanh'] | Literal['relu']):
        ...
        if activation_function == 'tanh':
            self.activation_function = torch.tanh
        elif activation_function == 'relu':
            self.activation_function = torch.relu
        else:
            raise ValueError(f'activation function {activation_function} not allowed, use tanh or relu.')

    def forward(...) -> Tensor:
        output = ...
        return self.activation_function(output)
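which would then be constructed like (keeping the elided arguments as ...):

model = Model(..., activation_function='tanh')
Model(..., activation_function='gelu')  # raises ValueError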
- activation functions as callables: use arbitrary modules or functions as activations
from typing import Callable
import torch
from torch import Tensor

class Model(torch.nn.Module):
    def __init__(..., activation_function: torch.nn.Module | Callable[[Tensor], Tensor]):
        self.activation_function = activation_function

    def forward(...) -> Tensor:
        output = ...
        return self.activation_function(output)
which would for instance work like
def cube(x: Tensor) -> Tensor: return x**3
cubic_model = Model(..., activation_function=cube)
The key difference between the above examples and your snippet is that the former are transparent and adjustable with respect to the activation used: you can inspect the activation function (i.e. model.activation_function) and change it (before or after initialization), whereas in the original snippet it is invisible and baked into the model's functionality (to replicate the model with a different function, you'd need to define it from scratch).
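For example, with the callable variant above (a sketch; the swap works because the activation is stored as an ordinary attribute):

model = Model(..., activation_function=torch.tanh)
print(model.activation_function)         # inspect which activation is in use
model.activation_function = torch.relu   # swap it after initialization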
Overall, I think the best way to go is to create small, locally tunable blocks that are as parametric as you need them to be, and wrap them into bigger blocks that make generalizations over the contained parameters. For instance, if your big model consists of 5 linear layers, you could make a single activation-parametric wrapper for one layer (including dropouts, layer norms, whatever), and then another wrapper for a flow of N such layers, which asks once which activation function to initialize its children with. In other words, generalize and parameterize when you anticipate this will save you from extra effort and copy-pasting code in the future, but don't overdo it or you'll end up far away from your original specifications and needs.
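As a rough sketch of that idea (the names LinearBlock and MLP are placeholders of my own, not anything standard):

import torch
from torch import Tensor
from typing import Callable

class LinearBlock(torch.nn.Module):
    # one linear layer plus its activation (dropout, layer norm, etc. could go here too)
    def __init__(self, in_features: int, out_features: int,
                 activation_function: Callable[[Tensor], Tensor]):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        self.activation_function = activation_function

    def forward(self, x: Tensor) -> Tensor:
        return self.activation_function(self.linear(x))

class MLP(torch.nn.Module):
    # a flow of N blocks; asks once which activation to hand to its children
    def __init__(self, sizes: list[int],
                 activation_function: Callable[[Tensor], Tensor]):
        super().__init__()
        self.blocks = torch.nn.ModuleList(
            LinearBlock(i, o, activation_function)
            for i, o in zip(sizes, sizes[1:]))

    def forward(self, x: Tensor) -> Tensor:
        for block in self.blocks:
            x = block(x)
        return x

model = MLP([4, 8, 8, 1], activation_function=torch.tanh)  # e.g. a 4-8-8-1 stack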
PS: I don't know whether calling activation functions "functors" is justifiable.
Answered By - KonstantinosKokos