All you need is a good init
Paper • 1511.06422 • Published
Note: The FitNet-4 architecture is much more difficult to optimize, and thus we focus on it in the experiments presented in this section. We have explored initializations with different activation functions in very deep networks: specifically ReLU, hyperbolic tangent, sigmoid, maxout, and VLReLU – very leaky ReLU (Graham (2014c)) – a variant of leaky ReLU (Maas et al. (2013)) with a large negative slope of 0.333 instead of the originally proposed 0.01, which is popular in Kaggle competitions.
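As a quick illustration of the VLReLU activation mentioned in the note, here is a minimal Python sketch: VLReLU is just a leaky ReLU whose negative slope is set to 0.333 rather than the 0.01 proposed by Maas et al. The function names and the NumPy-based implementation are illustrative, not taken from the paper.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # Leaky ReLU (Maas et al., 2013): pass positive inputs through,
    # scale negative inputs by a small slope.
    return np.where(x >= 0, x, negative_slope * x)

def vlrelu(x):
    # VLReLU (Graham, 2014): same form, but with a much larger
    # negative slope of 0.333.
    return leaky_relu(x, negative_slope=0.333)

x = np.linspace(-3.0, 3.0, 7)
print(vlrelu(x))  # negative inputs are scaled by 0.333 instead of 0.01
```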