Quantization Aware Deep Learning

Date(s):

Location:
Atkinson Hall, Room 4004

Speaker(s):
Prof. Dr. Max Welling
Research chair in Machine Learning at the University of Amsterdam, and VP of Technologies at Qualcomm

Abstract:

Deep learning applications are increasingly moving to the edge. Running high-demand AI applications on small form factors puts very strict constraints on, among other things, the power efficiency of the underlying machine learning algorithms. We are thus witnessing a flurry of research activity on new ways to squeeze as much predictive power (a.k.a. intelligence) as possible out of every joule of available energy. For deep learning, one way to achieve this is to quantize the weights and activations of a neural network to the minimum number of bits required to make accurate predictions, sometimes even a single bit. However, learning with highly quantized weights and activations is very difficult because gradients do not exist or are poorly approximated. Also, post-hoc quantization leads to a very high loss in accuracy. I will discuss a new way to achieve “quantization aware deep learning”. This means that we train the network in the cloud using high-precision compute, but in such a way that quantization after training leads to only a very small loss in accuracy. Our hammer is probabilistic deep learning, which uses the probability of choosing a particular discrete value as a differentiable quantity amenable to back-propagation. We also include terms that encourage the weights and activations to cluster around the allowed (quantized) values. Experiments show that our method allows one to train highly quantized models without much loss in accuracy and improves on the current state of the art on this task.
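
To make the idea in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration only and not necessarily the method presented in the talk. The softmax over squared distances, the `temperature` parameter, the ternary grid, and all function names are assumptions introduced for this example. It shows how the probability of assigning a weight to each allowed value yields a "soft" quantized weight that varies smoothly with the underlying real-valued weight, and how a simple penalty can pull weights toward the allowed values.

```python
import math

def grid_probabilities(w, grid, temperature=0.1):
    """Probability of assigning weight w to each grid value.

    Assumption for illustration: a softmax over negative squared distances.
    """
    logits = [-(w - g) ** 2 / temperature for g in grid]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_quantize(w, grid, temperature=0.1):
    """Expected grid value under the assignment probabilities (smooth in w)."""
    probs = grid_probabilities(w, grid, temperature)
    return sum(p * g for p, g in zip(probs, grid))

def hard_quantize(w, grid):
    """Nearest grid value; this is what would be deployed after training."""
    return min(grid, key=lambda g: abs(w - g))

def clustering_penalty(w, grid):
    """Hypothetical regularizer: squared distance to the nearest grid value."""
    return min((w - g) ** 2 for g in grid)

def numerical_grad(f, w, eps=1e-5):
    """Central finite difference, used here only to show the gradient is nonzero."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

grid = [-1.0, 0.0, 1.0]  # assumed ternary weight grid
w = 0.4                  # an example real-valued weight

soft = soft_quantize(w, grid)
hard = hard_quantize(w, grid)
grad = numerical_grad(lambda x: soft_quantize(x, grid), w)

print("soft quantized value:", soft)
print("hard quantized value:", hard)
print("d(soft)/dw:", grad)
print("clustering penalty:", clustering_penalty(w, grid))
```

In this sketch, lowering `temperature` moves the soft value toward the hard (nearest-value) quantization, while the gradient with respect to `w` stays well defined at any positive temperature, which is what makes such a relaxation usable with back-propagation.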


Speaker Bio:
Prof. Dr. Max Welling is a research chair in Machine Learning at the University of Amsterdam and VP of Technologies at Qualcomm. He has a secondary appointment as a senior fellow at the Canadian Institute for Advanced Research (CIFAR). He is a co-founder of Scyfer BV, a university spin-off in deep learning that was acquired by Qualcomm in summer 2017. In the past, he held postdoctoral positions at Caltech (’98-’00), UCL (’00-’01) and the University of Toronto (’01-’03). He received his PhD in ’98 under the supervision of Nobel laureate Prof. G. 't Hooft. Max Welling served as associate editor in chief of IEEE TPAMI (impact factor 4.8) from 2011 to 2015. He has served on the board of the NIPS Foundation since 2015 (NIPS is the largest conference in machine learning) and was program chair and general chair of NIPS in 2013 and 2014, respectively. He was also program chair of AISTATS in 2009 and ECCV in 2016, and general chair of MIDL 2018. He has served on the editorial boards of JMLR and JML and was an associate editor for Neurocomputing, JCGS and TPAMI. He has received multiple grants from Google, Facebook, Yahoo, NSF, NIH, NWO and ONR-MURI, among them an NSF CAREER grant in 2005. He received the ECCV Koenderink Prize in 2010 and best paper awards at ICML and ICLR. Welling is on the board of the Data Science Research Center in Amsterdam, directs the Amsterdam Machine Learning Lab (AMLAB), and co-directs the Qualcomm-UvA deep learning lab (QUVA) and the Bosch-UvA Deep Learning lab (DELTA). He has over 200 scientific publications in machine learning, computer vision, statistics and physics.