OPT2020
We welcome you to participate in the 12th OPT Workshop on Optimization for Machine Learning. This year's OPT workshop will be run as a virtual event together with NeurIPS. We particularly encourage submissions in the area of adaptive stochastic methods and generalization performance.
We are looking forward to an exciting OPT 2020!
Accepted Papers
- Poster Sessions will be held on our gather.town.
- The poster ID (in brackets) will help you locate the poster in gather.town.
- Spotlight Talks: A 10-minute video is available on-demand on the NeurIPS website to registered users. The authors of spotlight presentations will be available for questions during a live Q&A session. Please watch the video before the session (or at the beginning of the session). You can also talk to the authors in gather.town.
- Posters: Posted all day in a virtual room on gather.town. Authors are encouraged to present their posters during the slots allocated below, but you may also find them at other times during the day in the poster room.
- (some posters have not yet been submitted on CMT)
Student Paper Award
It is our pleasure to announce the best student papers at OPT 2020. This award recognizes excellent contributions authored primarily by a student and presented by the student in one of the spotlight sessions.
- (35) Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems [Poster] — Preetum Nakkiran (Harvard)
- (37) Constraint-Based Regularization of Neural Networks [Poster] — Benedict Leimkuhler (University of Edinburgh); Timothée Pouchon (University of Edinburgh); Tiffany Vlaar (University of Edinburgh); Amos Storkey (U Edinburgh)
- (44) When Does Preconditioning Help or Hurt Generalization? [Poster] — Shun-ichi Amari (RIKEN); Jimmy Ba (University of Toronto); Roger Grosse (University of Toronto); Xuechen Li (Google); Atsushi Nitanda (The University of Tokyo / RIKEN / JST PRESTO); Taiji Suzuki (The University of Tokyo / RIKEN); Denny Wu (University of Toronto & Vector Institute); Ji Xu (Columbia University)
- (82) A Study of Condition Numbers for First-Order Optimization [Poster] — Charles Guille-Escuret (U. Montreal, Mila); Baptiste Goujaud (Mila); Manuela Girotti (Mila, Université de Montréal, Concordia University); Ioannis Mitliagkas (Mila, University of Montreal)
Poster Session 1
Spotlight Presentations (Q&A in Session 1)
- (31) Distributed Proximal Splitting Algorithms with Rates and Acceleration [Poster] — Laurent Condat (KAUST); Peter Richtarik (KAUST)
- (37) Constraint-Based Regularization of Neural Networks [Poster] — Benedict Leimkuhler (University of Edinburgh); Timothée Pouchon (University of Edinburgh); Tiffany Vlaar (University of Edinburgh); Amos Storkey (U Edinburgh)
- (47) Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions? [Poster] — Ohad Shamir (Weizmann Institute of Science)
- (74) Employing No Regret Learners for Pure Exploration in Linear Bandits [Poster] — Mohammadi Zaki (Indian Institute of Science); Avinash Mohan (Technion Israel Institute of Technology); Aditya Gopalan (Indian Institute of Science (IISc), Bangalore)
- (76) PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization [Poster] — Zhize Li (King Abdullah University of Science and Technology (KAUST)); Hongyan Bao (KAUST); Xiangliang Zhang (King Abdullah University of Science and Technology, Saudi Arabia); Peter Richtarik (KAUST)
Spotlight Presentations (Q&A in Session 2)
- (9) DDPNOpt: Differential Dynamic Programming Neural Optimizer [Poster] — Guan-Horng Liu (Georgia Institute of Technology); Tianrong Chen (Georgia Institute of Technology); Evangelos Theodorou (Georgia Institute of Technology)
- (50) Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [Poster] — Samuel Horváth (KAUST); Lihua Lei (Stanford University); Peter Richtarik (KAUST); Michael Jordan (Berkeley)
Additional Poster Presentations in Session 1
- (10) Adaptive Learning of the Optimal Batch Size of SGD [Poster] — Motasem Alfarra (KAUST); Slavomír Hanzely (KAUST); Alyazeed Basyoni (KAUST); Bernard Ghanem (KAUST); Peter Richtarik (KAUST)
- (14) On Stochastic Sign Descent Methods [Poster] — Mher Safaryan (KAUST); Peter Richtarik (KAUST)
- (15) Primal-Dual Sequential Subspace Optimization for Saddle-point Problems [Poster] — Yoni Choukroun (Toga Networks); Michael Zibulevsky (Technion); Pavel Kisilev (Toga Networks)
- (17) ProbAct: A Probabilistic Activation Function for Deep Neural Network [Poster] — Kumar Shridhar (ETH Zurich); Purvanshi Mehta (University of Rochester)
- (22) Least-squares regressions via randomized Hessians [Poster] — Nabil Kahale (ESCP Business School)
- (30) On The Convergence of First Order Methods for Quasar-Convex Optimization [Poster] — Jikai Jin (Peking University)
- (42) Efficient robust optimal transport: formulations and algorithms [Poster] — Pratik Jawanpuria (Microsoft); N. T. V. Satya Dev (Vayve Technologies); Bamdev Mishra (Microsoft)
- (43) Adaptive Hessian-free optimization for training neural networks [Poster] — Tetsuya Motokawa (University of Tsukuba); Taro Tezuka (University of Tsukuba)
- (45) Shallow Physics Informed Neural Networks Using Levenberg-Marquardt Optimization [Poster] — Gaurav Yadav (Indian Institute of Technology, Madras); Balaji Srinivasan (IIT Madras)
- (46) Flexible Structured Graphical LASSO with Latent Variables [Poster] — Kazuki Koyama (NTT Communications); Keisuke Kiritoshi (NTT Communications); Tomomi Okawachi (NTT Communications); Tomonori Izumitani (NTT Communications)
- (48) Measuring optimization performance of stochastic gradient descent via neural networks with threshold activation [Poster] — Junyoung Kim (Department of Industrial Engineering, Seoul National University); Kyungsik Lee (Department of Industrial Engineering, Seoul National University)
- (49) Incremental Methods for Weakly Convex Optimization [Poster] — Xiao Li (The Chinese University of Hong Kong, Shenzhen); Zhihui Zhu (University of Denver); Anthony Man-Cho So (The Chinese University of Hong Kong); Jason Lee (Princeton)
- (51) A Variant of Gradient Descent Algorithm Based on Gradient Averaging [Poster] — Saugata Purkayastha (Assam Don Bosco University); Sukannya Purkayastha (IIT Kharagpur)
- (54) Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms [Poster] — Adil Salim (KAUST); Laurent Condat (KAUST); Konstantin Mishchenko (KAUST); Peter Richtarik (KAUST)
- (55) Efficient Optimized Spike Encoding for Spiking Neural Networks [Poster] — Dighanchal Banerjee (Tata Consultancy Services); Sounak Dey (Tata Consultancy Services Ltd.); Arun M. George (TCS Research & Innovation); Arijit Mukherjee (TCS Innovation Labs)
- (59) Error Compensated Proximal SGD and RDA [Poster] — Xun Qian (King Abdullah University of Science and Technology); Hanze Dong (HKUST); Peter Richtarik (KAUST); Tong Zhang (The Hong Kong University of Science and Technology)
- (60) Error Compensated Distributed SGD can be Accelerated [Poster] — Xun Qian (King Abdullah University of Science and Technology); Peter Richtarik (KAUST); Tong Zhang (The Hong Kong University of Science and Technology)
- (61) Error Compensated Loopless SVRG for Distributed Optimization [Poster] — Xun Qian (King Abdullah University of Science and Technology); Hanze Dong (HKUST); Peter Richtarik (KAUST); Tong Zhang (The Hong Kong University of Science and Technology)
- (62) An approximate gradient based hyper-parameter optimization in a neural network architecture [Poster] — Lakshman Mahto (Indian Institute of Information Technology Dharwad ); Arun Chauhan (Indian Institute of Information Technology Dharwad)
- (72) Kernel Distributionally Robust Optimization: A Generalization Theorem [Poster] — Jia-Jie Zhu (MPI for Intelligent Systems, Tübingen); Wittawat Jitkrittum (Google); Moritz Diehl (University of Freiburg); Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)
- (84) Global Convergence Rate of Gradient Flow for Asymmetric Matrix Factorization [Poster] — Tian Ye (Tsinghua University); Simon Du (University of Washington)
- (86) Riemannian optimization on the simplex of positive definite matrices [Poster] — Bamdev Mishra (Microsoft); Hiroyuki Kasai (WASEDA University); Pratik Jawanpuria (Microsoft)
- (97) Nonconvex Robust Synchronization of Rotations [Poster] — Huikang Liu (The Chinese University of Hong Kong); Zengde Deng (The Chinese University of Hong Kong); Xiao Li (The Chinese University of Hong Kong, Shenzhen); Shixiang Chen (Texas A&M University); Anthony Man-Cho So (The Chinese University of Hong Kong)
Poster Session 2
Spotlight Presentations (Q&A in Session 2)
- (32) Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search) [Poster] — Sharan Vaswani (Mila, Université de Montréal); Issam Hadj Laradji (Element AI); Frederik Kunstner (University of British Columbia); Si Yi Meng (University of British Columbia); Mark Schmidt (University of British Columbia); Simon Lacoste-Julien (Mila, Université de Montréal)
- (33) How to make your optimizer generalize better [Poster] — Sharan Vaswani (Mila, Université de Montréal); Reza Babanezhad (Samsung); Jose Gallego-Posada (Mila, Université de Montréal); Aaron Mishkin (University of British Columbia); Simon Lacoste-Julien (Mila, Université de Montréal); Nicolas Le Roux (Google)
- (88) Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence [Poster] — Nicolas Loizou (Mila, Université de Montréal); Sharan Vaswani (Mila, Université de Montréal); Issam Hadj Laradji (Element AI); Simon Lacoste-Julien (Mila, Université de Montréal)
Spotlight Presentations (Q&A in Session 3)
- (19) Variance Reduction on Adaptive Stochastic Mirror Descent [Poster] — Wenjie Li (Purdue University); Zhanyu Wang (Purdue University); Yichen Zhang (New York University); Guang Cheng (Purdue University)
- (35) Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems [Poster] — Preetum Nakkiran (Harvard)
- (65) Incremental Greedy BFGS: An Incremental Quasi-Newton Method with Explicit Superlinear Rate [Poster] — Zhan Gao (University of Pennsylvania); Alec Koppel (U.S. Army Research Laboratory); Alejandro Ribeiro (University of Pennsylvania)
Additional Poster Presentations in Session 2
- (7) A termination criterion for stochastic gradient descent for binary classification [Poster] — Sina Baghal (University of Waterloo); Courtney Paquette (Ohio State); Stephen Vavasis (University of Waterloo)
- (11) Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate [Poster] — Jingfeng Wu (Johns Hopkins University); Difan Zou (UCLA); Vladimir Braverman (Johns Hopkins University); Quanquan Gu (University of California, Los Angeles)
- (13) A Homotopy Algorithm for Optimal Transport [Poster] — Roozbeh Yousefzadeh (Yale University)
- (18) Reduced-Memory Kalman Based Stochastic Gradient Descent [Poster] — Jinyi Wang (University of Wisconsin-Madison); Vivak Patel (University of Wisconsin-Madison)
- (26) SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions [Poster] — Jing Wang (New York University); Anna Choromanska (NYU)
- (27) Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates [Poster] — Cong Xie (UIUC); Sanmi Koyejo (Illinois/Google); Indranil Gupta (UIUC); Haibin Lin (Amazon Web Services)
- (34) Stochastic mirror descent for fast distributed optimization and federated learning [Poster] — Anastasia Borovykh (Imperial College); Nikolas Kantas (Imperial College London); Panos Parpas (Imperial College London); Grigorios Pavliotis (Imperial College London)
- (36) Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization [Poster] — Stanislaw Jastrzebski (New York University); Devansh Arpit (Salesforce Research); Oliver Astrand; Giancarlo Kerg (MILA); Huan Wang (Salesforce Research); Caiming Xiong (Salesforce Research); Richard Socher (Salesforce); Kyunghyun Cho (New York University); Krzysztof Geras (NYU)
- (38) Variance Reduced Stochastic Proximal Algorithm for AUC Maximization [Poster] — Soham Dan (University of Pennsylvania); Dushyant Sahoo (University of Pennsylvania)
- (40) Efficient Designs of SLOPE Penalty Sequences in Finite Dimension [Poster] — Yiliang Zhang (University of Pennsylvania); Zhiqi Bu (UPenn)
- (41) Fair and Interpretable Decision Rules for Binary Classification [Poster] — Connor Lawless (Cornell University); Oktay Gunluk (Cornell University)
- (52) Quasi-Newton’s method in the class gradient defined high-curvature subspace [Poster] — Mark Tuddenham (University of Southampton); Adam Prugel-Bennett (University of Southampton); Jonathon Hare (University of Southampton)
- (53) A Decentralized Proximal Point-type Method for Non-convex Non-concave Saddle Point Problems [Poster] — Weijie Liu (Zhejiang University); Aryan Mokhtari (UT Austin); Asuman Ozdaglar (MIT); Sarath Pattathil (Massachusetts Institute of Technology); Zebang Shen (University of Pennsylvania); Nenggan Zheng (Zhejiang University)
- (56) Non-Negative Matrix Factorization Meets Time-Inhomogeneous Markov Chains [Poster] — Ievgen Redko (Laboratoire Hubert Curien); Amaury Habrard (University of St-Etienne, H. Curien Lab.); Marc Sebban (Jean Monnet University)
- (57) Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability [Poster] — Jeremy Cohen (Carnegie Mellon University); Simran Kaur (Carnegie Mellon University); Yuanzhi Li (CMU); Zico Kolter (Carnegie Mellon University); Ameet Talwalkar (CMU)
- (58) Local Gradient Aggregation for Decentralized Learning from Non-IID Data [Poster] — Yasaman Esfandiari (Iowa State University); Sin Yong Tan (Iowa State University); Zhanhong Jiang (Johnson Controls International); Aditya Balu (Iowa State University); Chinmay Hegde (New York University); Soumik Sarkar (Iowa State University)
- (63) Two-Level K-FAC Preconditioning for Deep Learning [Poster] — Nikolaos Tselepidis (ETH); Jonas Kohler (ETHZ); Antonio Orvieto (ETH Zurich)
- (64) Adaptive Gradient Tracking In Stochastic Optimization [Poster] — Zhanhong Jiang (Johnson Controls International); Xian Yeow Lee (Iowa State University); Sin Yong Tan (Iowa State University); Aditya Balu (Iowa State University); Young M Lee (Johnson Controls International); Chinmay Hegde (New York University); Soumik Sarkar (Iowa State University)
- (68) Generalised Perceptron Learning [Poster] — Xiaoyu Wang (University of Cambridge); Martin Benning (Queen Mary University of London)
- (69) Identifying Efficient Sub-networks using Mixed Integer Programming [Poster] — Mostafa ElAraby (Universite de Montreal); Guy Wolf (Université de Montréal); Margarida Carvalho (Université de Montréal)
- (70) A Unifying View on Implicit Bias in Training Linear Neural Networks [Poster] — Chulhee Yun (MIT); Shankar Krishnan (Google); Hossein Mobahi (Google Research)
- (78) Second-order optimization for tensors with fixed tensor-train rank [Poster] — Michael Psenka (Princeton University); Nicolas Boumal (EPFL)
- (89) SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation [Poster] — Robert Gower (Telecom Paris Tech); Othmane Sebbouh (ENS Paris); Nicolas Loizou (Mila, Université de Montréal)
Poster Session 3
Spotlight Presentations (Q&A in Session 3)
- (44) When Does Preconditioning Help or Hurt Generalization? [Poster] — Shun-ichi Amari (RIKEN); Jimmy Ba (University of Toronto); Roger Grosse (University of Toronto); Xuechen Li (Google); Atsushi Nitanda (The University of Tokyo / RIKEN / JST PRESTO); Taiji Suzuki (The University of Tokyo / RIKEN); Denny Wu (University of Toronto & Vector Institute); Ji Xu (Columbia University)
- (71) TenIPS: Inverse Propensity Sampling for Tensor Completion [Poster] — Chengrun Yang (Cornell University); Lijun Ding (Cornell University); Ziyang Wu (Cornell University); Madeleine Udell (Cornell University)
Spotlight Presentations (Q&A in Session 4)
- (39) Convex Programs for Global Optimization of Convolutional Neural Networks in Polynomial-Time [Poster] — Tolga Ergen (Stanford University); Mert Pilanci (Stanford)
- (67) On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [Poster] — Dongruo Zhou (UCLA); Jinghui Chen (UCLA); Yuan Cao (UCLA); Yiqi Tang (OSU); Ziyan Yang (University of Virginia); Quanquan Gu (University of California, Los Angeles)
- (80) Stochastic Damped L-BFGS with Controlled Norm of the Hessian Approximation [Poster] — Sanae Lotfi (Polytechnique Montréal); Tiphaine Bonniot (Polytechnique Montréal); Dominique Orban (Polytechnique Montréal); Andrea Lodi (Polytechnique Montréal)
- (82) A Study of Condition Numbers for First-Order Optimization [Poster] — Charles Guille-Escuret (U. Montreal, Mila); Baptiste Goujaud (Mila); Manuela Girotti (Mila, Université de Montréal, Concordia University); Ioannis Mitliagkas (Mila, University of Montreal)
- (94) Affine-Invariant Analysis of Frank-Wolfe on Strongly Convex Sets [Poster] — Thomas Kerdreux (INRIA/ ENS); Lewis Liu (Mila & DIRO); Simon Lacoste-Julien (Mila, Université de Montréal); Damien Scieur (Samsung - SAIT AI Lab Montreal)
Additional Poster Presentations in Session 3
- (12) On Regularization of Gradient Descent, Layer Imbalance and Flat Minima [Poster] — Boris Ginsburg (NVIDIA)
- (20) Online nonnegative CP tensor factorization for Markovian data [Poster] — Christopher Strohmeier (UCLA); Hanbaek Lyu (UCLA); Deanna Needell (UCLA)
- (21) CADA: Communication-Adaptive Distributed Adam [Poster] — Tianyi Chen (Rensselaer Polytechnic Institute); Ziye Guo (Rensselaer Polytechnic Institute); Yuejiao Sun (University of California, Los Angeles); Wotao Yin (University of California, Los Angeles)
- (24) Efficient Hyperparameter Tuning with Dynamic Accuracy Derivative-Free Optimization [Poster] — Matthias Ehrhardt (University of Bath); Lindon Roberts (Australian National University)
- (25) Data augmentation as stochastic optimization [Poster] — Boris Hanin (Texas A&M); Yi Sun (The University of Chicago)
- (28) Asynchronous Federated Optimization [Poster] — Cong Xie (UIUC); Sanmi Koyejo (Illinois/Google); Indranil Gupta (UIUC)
- (29) Hessian Inverse Approximation as Covariance for Random Perturbation in Black-Box Problems [Poster] — Jingyi Zhu (Alibaba Group)
- (66) Retrospective Approximation for Smooth Stochastic Optimization [Poster] — David Newton (Purdue University); Raghu Bollapragada (The University of Texas at Austin); Raghu Pasupathy (Purdue University); Nung Kwan Yip (Purdue University)
- (75) Heuristic Prototype Selection for Regression [Poster] — Debraj Basu (Adobe Inc.); Deepak Pai (Adobe); Joshua Sweetkind-Singer (Adobe)
- (77) Decoupled Greedy Learning of Graph Neural Networks [Poster] — Yewen Wang (UCLA); Jian Tang (HEC Montreal & MILA); Yizhou Sun (UCLA); Guy Wolf (Université de Montréal)
- (79) A FISTA-type average curvature accelerated composite gradient method for nonconvex optimization problems [Poster] — Jiaming Liang (Georgia Institute of Technology); Renato Monteiro (Georgia Institute of Technology)
- (81) Asymptotic Analysis of Sparse Group LASSO via Approximate Message Passing Algorithm [Poster] — Kan Chen (University of Pennsylvania); Shiyun Xu (University of Pennsylvania); Zhiqi Bu (UPenn)
- (83) On Monotonic Linear Interpolation of Neural Network Parameters [Poster] — James Lucas (University of Toronto); Juhan Bae (University of Toronto); Michael Zhang (University of Toronto); Richard Zemel (University of Toronto); Jimmy Ba (University of Toronto); Roger Grosse (University of Toronto)
- (85) Escaping Saddle Points with Compressed SGD [Poster] — Dmitrii Avdiukhin (Indiana University, Bloomington); Grigory Yaroslavtsev (Indiana University, Bloomington)
- (87) Learning To Combine Quasi-Newton Methods [Poster] — Maojia Li (Rochester Institute of Technology); Wotao Yin (University of California, Los Angeles); Jialin Liu (University of California, Los Angeles (UCLA))
- (90) kFW: A Frank-Wolfe style algorithm with stronger subproblem oracles [Poster] — Lijun Ding (Cornell University); Jicong Fan (The Chinese University of Hong Kong (Shenzhen)); Madeleine Udell (Cornell University)
- (91) Trade-offs of Local SGD at Scale: An Empirical Study [Poster] — Jose Javier Gonzalez Ortiz (MIT); Jonathan Frankle (MIT); Ari Morcos (Facebook AI Research); Nicolas Ballas (Facebook FAIR); Mike Rabbat (Facebook FAIR)
- (92) Infinite-Dimensional Game Optimization via Variational Transport [Poster] — Lewis Liu (Mila & DIRO); Yufeng Zhang (Northwestern University); Zhuoran Yang (Princeton); Reza Babanezhad (Samsung); Zhaoran Wang (Northwestern U)
- (93) LEAD: Least-Action Dynamics for Min-Max Optimization [Poster] — Reyhane Askari Hemmat (Mila, Université de Montréal); Amartya Mitra (University of California, Riverside); Guillaume Lajoie (Mila, Université de Montréal); Ioannis Mitliagkas (Mila, University of Montreal)
- (95) Generalization of Quasi-Newton Methods: Application to Robust Symmetric Multisecant Updates [Poster] — Damien Scieur (Samsung - SAIT AI Lab Montreal); Lewis Liu (Mila & DIRO); Thomas Pumir (Princeton); Nicolas Boumal (EPFL)
- (96) One-Point Gradient Estimators for Zeroth-Order Stochastic Gradient Langevin Dynamics [Poster] — Lewis Liu (Mila & DIRO); Zhaoran Wang (Northwestern U)
- (98) A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks [Poster] — Zhiqi Bu (UPenn); Shiyun Xu (University of Pennsylvania); Kan Chen (University of Pennsylvania)
- (100) Deep Residual Partitioning [Poster] — Neal Lawton (University of Southern California); Greg Ver Steeg (USC Information Sciences Institute); Aram Galstyan (USC Information Sciences Institute)
- (101) Optimal Nonsmooth Frank-Wolfe method for Stochastic Regret Minimization [Poster] — Kiran Thekumparampil (University of Illinois at Urbana-Champaign); Prateek Jain (Microsoft Research); Praneeth Netrapalli (Microsoft Research); Sewoong Oh (University of Washington)