OPT2021
We welcome you to participate in the 13th International (Virtual) OPT Workshop on Optimization for Machine Learning, held as part of the NeurIPS 2021 conference. This year we particularly encourage (but do not limit) submissions in the area of Beyond Worst-case Complexity.
We are looking forward to an exciting OPT 2021!
Accepted Papers
- Poster Sessions will be held on our gather.town.
- Links to the Poster Sessions: (room 1) (room 2)
- (Some links do not work yet, as the corresponding papers have not yet been submitted on CMT.)
Poster Session 1 - (room 1) 8:30-10:00
- Gaussian Graphical Models as an Ensemble Method for Distributed Gaussian Processes — Hamed Jalali (University of Tuebingen); Gjergji Kasneci (University of Tuebingen)
- DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning — Robert Hönig (University of Cambridge); Yiren Zhao (University of Cambridge); Robert Mullins (University of Cambridge)
- Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training — Maximus Mutschler (University of Tübingen); Andreas Zell (University of Tübingen)
- COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization — Manuel Madeira (Instituto Superior Técnico); Renato Negrinho (Carnegie Mellon University); João Xavier (Institute for Systems and Robotics (ISR/IST), LARSyS, Instituto Superior Técnico, Univ Lisboa); Pedro Aguiar (Instituto Superior Técnico)
- Random-reshuffled SARAH does not need full gradient computations — Aleksandr Beznosikov (Moscow Institute of Physics and Technology); Martin Takac (Mohamed bin Zayed University of Artificial Intelligence)
- Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes — Abdurakhmon Sadiev (Moscow Institute of Physics and Technology); Ekaterina Borodich (MIPT); Darina Dvinskikh (Weierstrass Institute for Applied Analysis and Stochastics); Aleksandr Beznosikov (Moscow Institute of Physics and Technology); Alexander Gasnikov (Moscow Institute of Physics and Technology)
- Shifted Compression Framework: Generalizations and Improvements — Egor Shulgin (KAUST); Peter Richtarik (KAUST)
- Faking Interpolation Until You Make It — Alasdair J Paren (University of Oxford); Rudra Poudel (Toshiba Research); M. Pawan Kumar (University of Oxford)
- Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds — Pascal M Esser (Technical University of Munich); Frank Nielsen (Sony CS Labs Inc.)
- Spherical Perspective on Learning with Normalization Layers — Simon W Roburin (valeo.ai/imagine ENPC); Yann de Mont-Marin (Inria); Andrei Bursuc (valeo.ai); Renaud Marlet (École des Ponts ParisTech); Patrick Pérez (Valeo.ai); Mathieu Aubry (École des Ponts ParisTech)
- Adaptive Optimization with Examplewise Gradients — Julius Kunze (University College London); James Townsend (University College London); David Barber (University College London)
- On the Relation between Distributionally Robust Optimization and Data Curation — Agnieszka Słowik (Department of Computer Science and Technology University of Cambridge); Leon Bottou (Facebook)
- Fast, Exact Subsampled Natural Gradients and First-Order KFAC — Frederik Benzing (ETH Zurich)
- Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation — Futong Liu (EPFL); Tao Lin (EPFL); Martin Jaggi (EPFL)
- Community-based Layerwise Distributed Training of Graph Convolutional Networks — Hongyi Li (Xidian University); Junxiang Wang (Emory University); Yongchao Wang (Xidian University); Yue Cheng (George Mason University); Liang Zhao (Emory University)
- A New Scheme for Boosting with an Average Margin Distribution Oracle — Ryotaro Mitsuboshi (Kyushu University); Kohei Hatano (Kyushu University/RIKEN AIP); Eiji Takimoto (Kyushu University)
- Better Linear Rates for SGD with Data Shuffling — Grigory Malinovsky (KAUST); Alibek Sailanbayev (KAUST); Peter Richtarik (KAUST)
- Structured Low-Rank Tensor Learning — Jayadev Naram (IIT Hyderabad); Tanmay K Sinha (IIT Hyderabad); Pawan Kumar (IIIT Hyderabad)
- ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method — Zhize Li (KAUST)
- EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback — Peter Richtarik (KAUST); Igor Sokolov (KAUST); Ilyas Fatkhullin (Technical University of Munich); Eduard Gorbunov (Moscow Institute of Physics and Technology); Zhize Li (KAUST)
- On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics — Grigory Malinovsky (KAUST); Konstantin Mishchenko (Inria); Peter Richtarik (KAUST)
Poster Session 2 - (room 2) 14:30-16:00
- Barzilai and Borwein conjugate gradient method equipped with a non-monotone line search technique — Sajad Fathi Hafshejani (University of Lethbridge); Daya Gaur (University of Lethbridge); Shahadat Hossain (University of Lethbridge); Robert Benkoczi (University of Lethbridge)
- Optimum-statistical Collaboration Towards Efficient Black-box Optimization — Wenjie Li (Purdue University); Chi-Hua Wang (Purdue University); Guang Cheng (Purdue University)
- Integer Programming Approaches To Subspace Clustering With Missing Data — Akhilesh Soni (University of Wisconsin-Madison); Jeff Linderoth (University of Wisconsin-Madison); Jim Luedtke (University of Wisconsin-Madison); Daniel L Pimentel-Alarcon (University of Wisconsin-Madison)
- On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging — Chris Junchi Li (University of California, Berkeley); Yaodong Yu (University of California, Berkeley); Nicolas Loizou (Mila, Université de Montréal); Gauthier Gidel (Mila, Université de Montréal); Yi Ma (University of California, Berkeley); Nicolas Le Roux (Microsoft); Michael Jordan (University of California, Berkeley)
- Stochastic Learning Equation using Monotone Increasing Resolution of Quantization — Jinwuk Seok (ETRI); Jeong-Si Kim (ETRI)
- Sign-RIP: A Robust Restricted Isometry Property for Low-rank Matrix Recovery — Jianhao Ma (University of Michigan); Salar Fattahi (University of Michigan)
- A Novel Convergence Analysis for Algorithms of the Adam Family — Zhishuai Guo (The University of Iowa); Yi Xu (Alibaba Group); Wotao Yin (Alibaba US, DAMO Academy); Rong Jin (Alibaba Group); Tianbao Yang (University of Iowa)
- Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks — Suhaas Bhat (Harvard University); Jeffery Kline (American Family Insurance); Glenn M Fung (American Family Insurance)
- Towards Robust and Automatic Hyper-Parameter Tuning — Mathieu Tuli (University of Toronto); Mahdi S. Hosseini (University of New Brunswick); Konstantinos N Plataniotis (University of Toronto)
- The Geometric Occam's Razor Implicit in Deep Learning — Benoit Dherin (Google); Michael Munn (Google); David Barrett (DeepMind)
- Escaping Local Minima With Stochastic Noise — Harsh Vardhan (UCSD); Sebastian Stich (CISPA)
- Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective — Neha S Wadia (University of California, Berkeley); Michael Jordan (University of California, Berkeley); Michael Muehlebach (Max Planck Institute for Intelligent Systems)
- High Probability Step Size Lower Bound for Adaptive Stochastic Optimization — Billy Jin (Cornell University); Katya Scheinberg (Cornell University); Miaolan Xie (Cornell University)
- Stochastic Polyak Stepsize with a Moving Target — Robert M Gower (Telecom Paris Tech); Aaron Defazio (Facebook FAIR); Mike Rabbat (Facebook FAIR)
- Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations — Tatjana Chavdarova (University of California, Berkeley); Michael Jordan (University of California, Berkeley); Manolis Zampetakis (MIT)
- Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent — Sharan Vaswani (University of Alberta); Benjamin Dubois-Taine (Université Paris-Saclay); Reza Babanezhad (Samsung)
- A Stochastic Momentum Method for Min-max Bilevel Optimization — Quanqi Hu (University of Iowa); Bokun Wang (The University of Iowa); Tianbao Yang (University of Iowa)
- Deep Neural Networks pruning via the Structured Perspective Regularization — Matteo Cacciola (Polytechnique Montreal); Andrea Lodi (Polytechnique Montréal); Antonio Frangioni (University of Pisa); Xinlin Li (Huawei Noah's Ark Lab)
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization — Difan Zou (University of California, Los Angeles); Yuan Cao (The University of Hong Kong); Yuanzhi Li (CMU); Quanquan Gu (University of California, Los Angeles)
- Efficient Calibration of Multi-Agent Market Simulators from Time Series with Bayesian Optimization — Yuanlu Bai (Columbia University); Svitlana Vyetrenko (J. P. Morgan Chase); Henry Lam (Columbia University); Tucker Balch (JP Morgan)
- DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization — Boyue Li (Carnegie Mellon University); Zhize Li (KAUST); Yuejie Chi (Carnegie Mellon University)
- Faster Perturbed Stochastic Gradient Methods for Finding Local Minima — Zixiang Chen (University of California, Los Angeles); Dongruo Zhou (University of California, Los Angeles); Quanquan Gu (University of California, Los Angeles)
- Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence — Wenhao Zhan (Princeton University); Shicong Cen (CMU); Baihe Huang (Peking University); Yuxin Chen (Princeton University); Jason Lee (Princeton University); Yuejie Chi (Carnegie Mellon University)
- Adam vs. SGD: Closing the generalization gap on image classification — Aman Gupta (LinkedIn Corporation); Rohan Ramanath (LinkedIn Corporation); Jun Shi (LinkedIn Corporation); S. Sathiya Keerthi (LinkedIn Corporation)
- Simulated Annealing for Neural Architecture Search — Shentong Mo (Carnegie Mellon University); Jingfei Xia (Carnegie Mellon University); Pinxu Ren (Carnegie Mellon University)
- Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers — Jacques Chen (University of British Columbia); Frederik Kunstner (University of British Columbia); Mark Schmidt (University of British Columbia)
- Acceleration and Stability of the Stochastic Proximal Point Algorithm — Junhyung Lyle Kim (Rice University); Panos Toulis (Chicago Booth School of Business); Anastasios Kyrillidis (Rice University)
- Faster Quasi-Newton Methods for Linear Composition Problems — Betty Shea (University of British Columbia); Mark Schmidt (University of British Columbia)