All you need is a good init
Paper: arXiv:1511.06422
Note: Latest repo: https://github.com/ducha-aiki/lsuv

First, fill the weights with Gaussian noise with unit variance. Second, decompose them into an orthonormal basis with a QR or SVD decomposition and replace the weights with one of the resulting components. The LSUV (layer-sequential unit-variance) process then estimates the output variance of each convolution and inner-product layer and scales the weights so that this variance equals one. The influence of the chosen mini-batch size on the estimated variance is negligible over a wide range; see the Appendix of the paper.
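The steps above can be sketched in NumPy for a small stack of inner-product (fully connected) layers. This is a minimal illustration under stated assumptions, not the code from the linked repository: the function names `orthonormal_init` and `lsuv_init`, the tolerance `tol`, the iteration cap `max_iter`, and the ReLU nonlinearity between layers are all choices made for this example, and convolution layers are left out.

```python
import numpy as np


def orthonormal_init(fan_out, fan_in, rng):
    """Draw a unit-variance Gaussian matrix and replace it with an orthonormal factor."""
    a = rng.standard_normal((fan_out, fan_in))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    # With k = min(fan_out, fan_in): u is (fan_out, k) and vt is (k, fan_in).
    # Exactly one of them (or both, if square) has the requested shape.
    return u if u.shape == (fan_out, fan_in) else vt


def lsuv_init(layer_sizes, batch, tol=0.01, max_iter=10, seed=0):
    """Hypothetical helper: layer-by-layer rescaling to unit output variance.

    layer_sizes: e.g. [32, 64, 16] for a 32 -> 64 -> 16 network.
    batch: array of shape (n_samples, layer_sizes[0]) used to estimate variances.
    tol, max_iter: assumed stopping parameters for this sketch.
    """
    rng = np.random.default_rng(seed)
    weights = []
    x = batch
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = orthonormal_init(fan_out, fan_in, rng)
        for _ in range(max_iter):
            var = (x @ w.T).var()  # output variance of this layer on the batch
            if abs(var - 1.0) < tol:
                break
            w = w / np.sqrt(var)  # rescale so the output variance moves to one
        weights.append(w)
        # Assumption: ReLU follows each layer; its output feeds the next layer.
        x = np.maximum(x @ w.T, 0.0)
    return weights


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    batch = data_rng.standard_normal((256, 32))
    for i, w in enumerate(lsuv_init([32, 64, 16], batch)):
        print(f"layer {i}: weight shape {w.shape}")
```

Because each layer here is linear in its weights, a single division by the square root of the measured variance already brings the output variance to one; the loop is kept so the same structure carries over to layers where one step may not be enough.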