John Wilmes

Thursday, January 11, 2018
Location: Goldsmith 317
John Wilmes, Georgia Tech
The Complexity of Learning Neural Networks

Abstract: The empirical successes of "deep learning" currently lack rigorous theoretical explanation. A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently.In joint works with Le Song, Santosh Vempala, and Bo Xie, we give comprehensive upper and lower bounds. For inputs drawn from any logconcave distribution, we construct a family of functions that is hard to learn in the sense that any statistical query algorithm (including all known variants of stochastic gradient descent with any loss function, for any model architecture) needs an exponential number of queries even using tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable as neural networks using any activation function from a large family (including sigmoid and ReLU) with a small (sublinear in dimension) number of units in the single hidden layer, and whose output is a sum gate. Interestingly, when the data is generated by neural network with unbiased sigmoid gates, the problem becomes tractable. We give a surprisingly general analysis of gradient descent in this case, with a polynomial upper bound on sample and time complexity for approximately low-degree concepts.