We welcome you to participate in the 9th NIPS Workshop on Optimization for Machine Learning, to be held at: Barcelona, Spain on Dec 10th, 2016. Location: Room 112
- Maryam Fazel (University of Washington, US)
- Sebastien Bubeck (Microsoft Research, US)
- Nicolas Boumal (Princeton University, US)
- Ohad Shamir (Weizmann Institute of Science , Israel)
Title: Online Optimization, Smoothing, and Worst-case Competitive Ratio
Abstract: In Online Optimization, the data in an optimization problem is revealed over time and at each step a decision variable needs to be set without knowing the future data. This setup covers online resource allocation, from classical inventory problems to the Adwords problem popular in online advertising. In this talk, we prove bounds on the competitive ratio of two primal-dual algorithms for a broad class of online convex optimization problems. We give a sufficient condition on the objective function that guarantees a constant worst-case competitive ratio for monotone functions. We show how smoothing the objective can improve the competitive ratio of these algorithms, and for separable functions, we show that the optimal smoothing can be derived by solving a convex optimization problem. This result allows us to directly optimize the competitive ratio bound over a class of smoothing functions, and hence design effective smoothing customized for a given cost function.
Title: Kernel-based Methods for Bandit Convex Optimization
Abstract: A lot of progress has been made in recent years on extending classical multi-armed bandit strategies to very large set of actions. A particularly challenging setting is the one where the action set is continuous and the underlying cost function is convex, this is the so-called bandit convex optimization (BCO) problem. I will tell the story of BCO and explain some of the new ideas that we recently developed to solve it. I will focus on three new ideas from our recent work http://arxiv.org/abs/1607.03084 with Yin Tat Lee and Ronen Eldan: (i) a new connection between kernel methods and the popular multiplicative weights strategy; (ii) a new connection between kernel methods and one of Erdos’ favorite mathematical object, the Bernoulli convolution, and (iii) a new adaptive (and increasing!) learning rate for multiplicative weights. These ideas could be of broader interest in learning/algorithm’s design.
Title: Semidefinite Programs with a Dash of Smoothness: Why and When the Low-Rank Approach Works
Abstract: Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via low-rank, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. In this presentation, we show that the Burer-Monteiro formulation of SDPs in a certain class almost never has any spurious local optima, that is: the non-convexity of the low-rank formulation is benign (even saddles are strict). This class of SDPs covers applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. The crucial assumption we make is that the low-rank problem lives on a manifold. Then, theory and algorithms from optimization on manifolds can be used.Optimization on manifolds is about minimizing a cost function over a smooth manifold, such as spheres, low-rank matrices, orthonormal frames, rotations, etc. We will present the basic framework as well as parts of the more general convergence theory, including recent complexity results. (Toolbox: http://www.manopt.org) Select parts are joint work with P.-A. Absil, A. Bandeira, C. Cartis and V. Voroninski.
Title: Oracle Complexity of Second-Order Methods for Finite-Sum Problems
Abstract: Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in *second-order* methods, which rely on both gradients and Hessians. In principle, second-order methods can require much fewer iterations than first-order methods, and hold the promise for more efficient algorithms. Although computing and manipulating Hessians is prohibitive for high-dimensional problems in general, the Hessians of individual functions in finite-sum problems can often be efficiently computed, e.g. because they possess a low-rank structure. Can second-order information indeed be used to solve such problems more efficiently? In this talk, I'll provide evidence that the answer -- perhaps surprisingly -- is negative, at least in terms of worst-case guarantees. However, I'll also discuss what additional assumptions and algorithmic approaches might potentially circumvent this negative result. Joint work with Yossi Arjevani.