I am a Senior Research Scientist at Isomorphic Labs,
an Alphabet subsidiary led by Demis Hassabis, where I work on deep learning models of the interactions
between drug and protein molecules. Previously, I was a Research Scientist at Samsung and a PhD student
at the Bayesian Methods Research Group under the
supervision of Dmitry Vetrov.
LaMa uses convolutions in the Fourier domain to generalize to high (around 2k) resolutions, despite being trained on 256x256 images. It achieves strong performance even in challenging scenarios, e.g., the completion of periodic structures.
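As an illustration, here is a minimal sketch of a Fourier-domain convolution in PyTorch; it is not the exact fast Fourier convolution block from LaMa, just the core idea that a pointwise convolution on the image spectrum has a global, resolution-independent receptive field.

```python
import torch
import torch.nn as nn


class SpectralConv(nn.Module):
    """Toy Fourier-domain convolution: a 1x1 convolution applied to the
    real and imaginary parts of the image spectrum. Since every spectral
    coefficient depends on the whole image, the receptive field is global
    at any resolution."""

    def __init__(self, channels):
        super().__init__()
        # operates on concatenated real/imag parts, hence 2 * channels
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):
        _, _, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")          # complex, (b, c, h, w//2 + 1)
        spec = torch.cat([spec.real, spec.imag], dim=1)  # real, (b, 2c, h, w//2 + 1)
        spec = torch.relu(self.conv(spec))
        real, imag = spec.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


x = torch.randn(1, 8, 512, 512)  # a resolution unseen during training
print(SpectralConv(8)(x).shape)  # torch.Size([1, 8, 512, 512])
```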
SparseVD works in practice and was used for network sparsification in leading IT companies. However, later studies showed that carefully applied pruning-based methods can produce better results.
Training deep models with noise is known to be hard and unstable; this is less of an issue for SparseVD. All variances are initialized with small values and do not change much during training. Since using small variances does not hurt performance, SparseVD might be considered a fancy regulariser with (almost) no noise.
The sparse solution is just a local optimum, as better values of the ELBO can be achieved with a less flexible variational posterior q(w_ij) = N(w_ij | 0, σ²_ij).
We show that variational dropout trains highly sparsified deep neural networks, with the sparsity pattern learned jointly with the weights during training.
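A minimal sketch of a sparse variational dropout layer, under the parameterization q(w_ij) = N(w_ij | θ_ij, σ²_ij); the local reparameterization trick, the KL approximation constants, and the log α threshold follow the paper, but the initialization values and the layer interface here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseVDLinear(nn.Module):
    """Linear layer with a fully factorized Gaussian posterior
    q(w_ij) = N(w_ij | theta_ij, sigma_ij^2) over the weights."""

    def __init__(self, in_features, out_features, threshold=3.0):
        super().__init__()
        self.theta = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.threshold = threshold  # weights with log alpha above this are dropped

    def log_alpha(self):
        # per-weight dropout rate: alpha_ij = sigma_ij^2 / theta_ij^2
        return self.log_sigma2 - torch.log(self.theta ** 2 + 1e-8)

    def kl(self):
        # approximation of KL(q || log-uniform prior) from the SparseVD paper
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha()
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()

    def forward(self, x):
        if self.training:
            # local reparameterization: sample pre-activations, not weights
            mean = F.linear(x, self.theta, self.bias)
            var = F.linear(x ** 2, self.log_sigma2.exp()) + 1e-8
            return mean + var.sqrt() * torch.randn_like(mean)
        # at test time, prune weights with large learned dropout rates
        mask = (self.log_alpha() < self.threshold).float()
        return F.linear(x, self.theta * mask, self.bias)
```

The training objective would be the usual data term plus the sum of kl() terms over layers, which is what drives most log α values to large numbers and the corresponding weights to zero.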
The work shows that: i) a simple ensemble of independently trained networks performs significantly better than recent techniques; ii) simple test-time augmentation applied to a conventional network outperforms low-parameter ensembles (e.g., Dropout) and also improves all ensembles for free; iii) comparisons of the uncertainty estimation ability of algorithms are often done incorrectly in the literature.
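A minimal sketch of points i) and ii): average predicted probabilities over independently trained networks and over a few augmented copies of each input. The horizontal-flip augmentation and the number of augmentations are placeholder choices.

```python
import torch


@torch.no_grad()
def ensemble_tta_predict(models, x, n_aug=5):
    """Average softmax predictions over an ensemble of independently trained
    models and over simple test-time augmentations (here random horizontal
    flips as a stand-in for a real augmentation policy)."""
    probs = []
    for model in models:
        model.eval()
        for _ in range(n_aug):
            flip = bool(torch.rand(()) < 0.5)  # placeholder augmentation policy
            x_aug = torch.flip(x, dims=[-1]) if flip else x
            probs.append(torch.softmax(model(x_aug), dim=-1))
    return torch.stack(probs).mean(dim=0)
```

With n_aug=1 this reduces to a plain deep ensemble; with a single model it is plain test-time augmentation.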
The deep weight prior is a generative model over the kernels of convolutional neural networks that acts as a prior distribution when training on new datasets.
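A rough sketch of the idea rather than the paper's implementation: a density model over kernel slices supplies a regularizer for the convolutional kernels of a new network. Here a standard normal stands in for the learned deep prior, and `kernel_prior_penalty` is a hypothetical helper.

```python
import torch
import torch.nn as nn
import torch.distributions as D

# Stand-in for a learned prior over 3x3 kernel slices; in the deep weight prior
# this would be a generative model trained on kernels from source datasets.
kernel_prior = D.Independent(D.Normal(torch.zeros(9), torch.ones(9)), 1)


def kernel_prior_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """Negative log-probability of each (out, in) kernel slice under the prior,
    added to the loss when training on a new dataset."""
    k = conv.weight.reshape(-1, conv.kernel_size[0] * conv.kernel_size[1])
    return -kernel_prior.log_prob(k).sum()


conv = nn.Conv2d(16, 32, kernel_size=3)
print(kernel_prior_penalty(conv))  # scalar term to add to the task loss
```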
It is possible to learn a zero-centered Gaussian distribution over the weights of a neural network by learning only variances, and it works surprisingly well.
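A minimal sketch of such a variance-only layer, assuming the simplest parameterization w = σ · ε with ε ~ N(0, 1); the initialization value and interface are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VarianceLinear(nn.Module):
    """Linear layer with a zero-centered Gaussian over the weights,
    q(w_ij) = N(w_ij | 0, sigma_ij^2): only the log-variances are learned."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # every forward pass uses a fresh weight sample w = sigma * eps
        w = self.log_sigma.exp() * torch.randn_like(self.log_sigma)
        return F.linear(x, w, self.bias)


layer = VarianceLinear(20, 10)
print(layer(torch.randn(4, 20)).shape)  # torch.Size([4, 10])
```

Predictions are averaged over several weight samples at test time.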
We employ a semi-conditional normalizing flow architecture that allows efficient training of normalizing flows when only a few labeled data points are available.
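A toy sketch of the semi-supervised objective with a class-conditional flow, not the architecture from the paper: a single class-conditioned affine transformation plays the role of the flow, labeled examples maximize log p(x | y), and unlabeled examples maximize log p(x) by marginalizing over classes.

```python
import math

import torch
import torch.nn as nn


class ConditionalAffineFlow(nn.Module):
    """Toy conditional flow: x = z * exp(s_y) + t_y with a standard normal base,
    so log p(x | y) = log N((x - t_y) * exp(-s_y)) - sum(s_y)."""

    def __init__(self, dim, n_classes):
        super().__init__()
        self.n_classes = n_classes
        self.s = nn.Parameter(torch.zeros(n_classes, dim))
        self.t = nn.Parameter(torch.zeros(n_classes, dim))

    def log_prob(self, x, y):
        z = (x - self.t[y]) * torch.exp(-self.s[y])
        log_base = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
        return log_base - self.s[y].sum(-1)

    def log_prob_unlabeled(self, x):
        # marginalize over classes with a uniform p(y)
        lps = torch.stack([
            self.log_prob(x, torch.full((x.shape[0],), c, dtype=torch.long))
            for c in range(self.n_classes)
        ])
        return torch.logsumexp(lps, dim=0) - math.log(self.n_classes)


flow = ConditionalAffineFlow(dim=2, n_classes=3)
x_lab, y_lab, x_unlab = torch.randn(5, 2), torch.randint(0, 3, (5,)), torch.randn(8, 2)
loss = -flow.log_prob(x_lab, y_lab).mean() - flow.log_prob_unlabeled(x_unlab).mean()
```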