OPT2022
We welcome you to participate in the 14th International OPT Workshop on Optimization for Machine Learning, to be held as a part of the NeurIPS 2022 conference.
This year we particularly encourage (but do not limit) submissions in the area of Reliable Optimization Methods for ML.
We are looking forward to an exciting in-person OPT!
Schedule
The schedule is available on the NeurIPS virtual platform.
Time | Speaker | Title

Session 1 (Moderator: Courtney Paquette)

8:50am-9:00am | Organizers | Welcome Remarks
9:00am-9:30am | Katya Scheinberg (Cornell) | Stochastic Oracles and Where to Find Them | [abstract]

Continuous optimization is a mature field that has recently undergone major expansion and change. One of the key new directions is the development of methods that do not require exact information about the objective function. Nevertheless, the majority of these methods, from stochastic gradient descent to "zeroth-order" methods, use some kind of approximate first-order information. We will introduce a general definition of a stochastic oracle and show how this definition applies in a variety of familiar settings, including simple stochastic gradients via sampling and traditional and randomized finite-difference methods, as well as in more specialized settings such as robust gradient estimation. We will also give an overview of several stochastic methods and of how the general definition extends to the oracles they use.
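The abstract names two familiar constructions of a stochastic first-order oracle: a sampled (mini-batch) gradient and a randomized finite-difference estimate built from function values. Below is a minimal sketch of both, assuming a toy least-squares objective; the objective, function names, and batch size are placeholders for illustration, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective: f(x) = 0.5 * mean_i (a_i^T x - b_i)^2
A = rng.normal(size=(200, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.1 * rng.normal(size=200)

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def full_gradient(x):
    return A.T @ (A @ x - b) / len(b)

def sampled_gradient_oracle(x, batch_size=16):
    """Stochastic oracle 1: mini-batch sampling of the finite sum."""
    idx = rng.choice(len(b), size=batch_size, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch_size

def randomized_finite_difference_oracle(func, x, h=1e-4):
    """Stochastic oracle 2: zeroth-order estimate from function values
    along a random Gaussian direction."""
    u = rng.normal(size=x.shape)
    return (func(x + h * u) - func(x)) / h * u

x = np.zeros(10)
print("sampled oracle error:",
      np.linalg.norm(sampled_gradient_oracle(x) - full_gradient(x)))
print("finite-difference oracle error:",
      np.linalg.norm(randomized_finite_difference_oracle(f, x) - full_gradient(x)))
```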
9:30am-10:00am | Contributed talks | [papers]
- Tian Li: Differentially Private Adaptive Optimization with Delayed Preconditioners
- Guy Kornowski: On the Complexity of Finding Small Subgradients in Nonsmooth Optimization
10:00am-11:00am | Poster Session 1 | [posters]

Session 2 (Moderator: Quanquan Gu)
11:00am-11:30am | Contributed talks | [papers]
- Aaron Defazio: Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
- Jiajin Li: Nonsmooth Composite Nonconvex-Concave Minimax Optimization
11:30am-12:00pm | Niao He (ETH Zurich) | Simple Fixes for Adaptive Gradient Methods for Nonconvex Min-Max Optimization | [abstract]

Adaptive gradient methods such as AdaGrad and Adam have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner and are successful in nonconvex minimization. When it comes to nonconvex minimax optimization, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. In fact, even for a quadratic example, the naive combination of Gradient Descent Ascent with any existing adaptive stepsizes is proven to diverge if the initial primal-dual stepsize ratio is not carefully chosen. We introduce two simple fixes for these adaptive methods, allowing automatic adaptation to the time-scale separation necessary for fast convergence. The resulting algorithms are fully parameter-agnostic and achieve near-optimal complexities in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems, without a priori knowledge about problem-specific parameters. This is based on joint work with Junchi Yang and Xiang Li.
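As a toy way to see the role of the primal-dual stepsize ratio mentioned in the abstract, the sketch below runs gradient descent ascent with AdaGrad-style stepsizes on an illustrative quadratic min-max objective and lets you vary how much slower the primal stepsize is than the dual one. The objective, the "rescale the primal stepsize" knob, and all constants are assumptions made for illustration; they are not the quadratic counterexample or the two fixes from the talk.

```python
import numpy as np

L = 5.0  # coupling strength; the toy objective is strongly concave in y

def grad(x, y):
    # Toy quadratic min-max objective: f(x, y) = L*x*y - 0.5*y**2
    return L * y, L * x - y   # (df/dx, df/dy)

def adaptive_gda(ratio, steps=5000, lr=1.0):
    """Gradient descent ascent with AdaGrad-style stepsizes.

    `ratio` rescales the primal (x) stepsize relative to the dual (y) one;
    ratio = 1 is the naive combination, while a small ratio mimics the kind
    of primal/dual time-scale separation discussed in the abstract."""
    x, y = 1.0, 1.0
    gx2 = gy2 = 1e-12                        # accumulated squared gradients
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx2, gy2 = gx2 + gx**2, gy2 + gy**2
        x -= ratio * lr / np.sqrt(gx2) * gx  # descent step on x
        y += lr / np.sqrt(gy2) * gy          # ascent step on y
    return np.hypot(x, y)                    # distance to the saddle (0, 0)

for r in (1.0, 0.1, 0.01):
    print(f"primal/dual stepsize ratio {r:>4}: final distance {adaptive_gda(r):.3e}")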
12:00pm-02:00pm | Lunch

Session 3 (Moderator: Cristóbal Guzmán)
02:00pm-02:30pm | Zico Kolter (CMU) | Adapt like you train: How optimization at training time affects model finetuning and adaptation | [abstract]

With the growing use of large-scale machine learning models pretrained on massive datasets, it is becoming increasingly important to understand how we can efficiently adapt these models to downstream tasks at test time. In this talk, I will discuss our recent work that highlights an important but often overlooked factor in this process: specifically, we have found in several cases that the loss function used to train the model has important implications as to the best way to finetune or adapt the model. I will highlight two specific examples of this phenomenon: 1) illustrating that using contrastive loss outperforms alternatives for fine-tuning contrastively-pretrained vision-language models; and 2) showing how we can leverage the convex conjugate of the training loss to perform label-free test-time adaptation. I will end by highlighting open questions and directions for this work.
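The abstract does not spell out the conjugate-based procedure, so the sketch below shows only one concrete instance of label-free test-time adaptation driven by the training loss: when the model was trained with cross-entropy, the conjugate-based objective reduces to minimizing the softmax entropy of predictions on unlabeled test inputs (as in the related conjugate pseudo-label work). The model, the data, the choice of which parameters to adapt, and the hyperparameters are all placeholders, not the setup from the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder "pretrained" classifier and an unlabeled test batch.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
test_batch = torch.randn(128, 32)

# Adapt only a small set of parameters at test time; here, for simplicity,
# the last linear layer (related work typically adapts normalization layers).
for p in model.parameters():
    p.requires_grad_(False)
adapt_params = list(model[-1].parameters())
for p in adapt_params:
    p.requires_grad_(True)
opt = torch.optim.SGD(adapt_params, lr=1e-3)

for step in range(10):
    logits = model(test_batch)
    probs = F.softmax(logits, dim=1)
    # Label-free objective: average softmax entropy of the predictions,
    # which is what the conjugate-based objective reduces to for a model
    # trained with cross-entropy.
    entropy = -(probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
    print(f"step {step}: mean prediction entropy = {entropy.item():.4f}")
```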
02:30pm-03:15pm | Contributed talks | [papers]
- Fangshuo Liao: Strong Lottery Ticket Hypothesis with ε-Perturbation
- Vishwak Srinivasan: Sufficient conditions for non-asymptotic convergence of Riemannian optimization methods
- Zhiyuan Li: How Does Sharpness-Aware Minimization Minimize Sharpness?
03:15pm-03:45pm | Aaron Sidford (Stanford) | Efficiently Minimizing the Maximum Loss | [abstract]

In this talk I will discuss recent advances in the fundamental robust optimization problem of minimizing the maximum of a finite number of convex loss functions. In particular, I will show how to develop stochastic methods for approximately solving this problem with a near-optimal number of gradient queries. Along the way, I will cover several optimization techniques of broader utility, including accelerated methods for using ball-optimization oracles and stochastic bias-reduced gradient methods.

This talk is based on joint work with Hilal Asi, Yair Carmon, Arun Jambulapati, and Yujia Jin, including https://arxiv.org/abs/2105.01778 and https://arxiv.org/abs/2106.09481.
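The problem in the abstract is to minimize, over x, the maximum of n convex losses f_i(x). To make the setup concrete, the sketch below runs a plain subgradient method on that max function for a small synthetic instance; it is a baseline only, not the accelerated ball-oracle or bias-reduced stochastic methods from the talk, and the absolute-residual losses are an assumption. A subgradient of the max is obtained by differentiating any loss that attains the maximum at the current point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny synthetic instance: n convex losses f_i(x) = |a_i^T x - b_i|.
n, d = 50, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def max_loss(x):
    return np.max(np.abs(A @ x - b))

def subgradient_of_max(x):
    # Subgradient of max_i f_i(x): the (sub)gradient of a loss attaining the max.
    residuals = A @ x - b
    i = np.argmax(np.abs(residuals))
    return np.sign(residuals[i]) * A[i]

# Baseline subgradient method with a decaying stepsize.
x = np.zeros(d)
best = max_loss(x)
for t in range(1, 5001):
    x -= 0.5 / np.sqrt(t) * subgradient_of_max(x)
    best = min(best, max_loss(x))

print("best max-loss found:", best)
```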
03:45pm-03:50pm | Courtney Paquette | Closing Remarks
03:50pm-04:50pm | Poster Session 2 | [posters]