OPT2022
We welcome you to participate in the 14th International OPT Workshop on Optimization for Machine Learning, to be held as a part of the NeurIPS 2022 conference.
This year we particularly encourage (but do not limit) submissions in the area of Reliable Optimization Methods for ML.
We are looking forward to an exciting in-person OPT!
Schedule
The schedule is available on the NeurIPS virtual platform.
Time | Speaker | Title

Session 1 (Moderator: Courtney Paquette)

8:50am-9:00am | Organizers | Welcome Remarks
9:00am-9:30am | Katya Scheinberg (Cornell) | Stochastic Oracles and Where to Find Them | [abstract]

Continuous optimization is a mature field that has recently undergone major expansion and change. One of the key new directions is the development of methods that do not require exact information about the objective function. Nevertheless, the majority of these methods, from stochastic gradient descent to "zeroth-order" methods, use some kind of approximate first-order information. We will introduce a general definition of a stochastic oracle and show how this definition applies in a variety of familiar settings, including simple stochastic gradients via sampling and traditional and randomized finite-difference methods, as well as in more specialized settings such as robust gradient estimation. We will also give an overview of several stochastic methods and of how the general definition extends to the oracles they use.
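The abstract names two familiar constructions of a stochastic first-order oracle: a sampled (mini-batch) gradient and a randomized finite-difference estimate built from function values. Below is a minimal sketch of both, assuming a toy least-squares objective; the objective, function names, and batch size are placeholders for illustration, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective: f(x) = 0.5 * mean_i (a_i^T x - b_i)^2
A = rng.normal(size=(200, 10))
x_true = rng.normal(size=10)
b = A @ x_true + 0.1 * rng.normal(size=200)

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def full_gradient(x):
    return A.T @ (A @ x - b) / len(b)

def sampled_gradient_oracle(x, batch_size=16):
    """Stochastic oracle 1: mini-batch sampling of the finite sum."""
    idx = rng.choice(len(b), size=batch_size, replace=False)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / batch_size

def randomized_finite_difference_oracle(func, x, h=1e-4):
    """Stochastic oracle 2: zeroth-order estimate from function values
    along a random Gaussian direction."""
    u = rng.normal(size=x.shape)
    return (func(x + h * u) - func(x)) / h * u

x = np.zeros(10)
print("sampled oracle error:",
      np.linalg.norm(sampled_gradient_oracle(x) - full_gradient(x)))
print("finite-difference oracle error:",
      np.linalg.norm(randomized_finite_difference_oracle(f, x) - full_gradient(x)))
```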
9:30am-10:00am | Contributed talks | [papers]
- Tian Li: Differentially Private Adaptive Optimization with Delayed Preconditioners
- Guy Kornowski: On the Complexity of Finding Small Subgradients in Nonsmooth Optimization
10:00am-11:00am | Poster Session 1 | [posters]

Session 2 (Moderator: Quanquan Gu)
11:00am-11:30am | Contributed talks | [papers]
- Aaron Defazio: Parameter Free Dual Averaging: Optimizing Lipschitz Functions in a Single Pass
- Jiajin Li: Nonsmooth Composite Nonconvex-Concave Minimax Optimization
11:30am-12:00pm | Niao He (ETH Zurich) | Simple Fixes for Adaptive Gradient Methods for Nonconvex Min-Max Optimization | [abstract]

Adaptive gradient methods such as AdaGrad and Adam have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner and are successful in nonconvex minimization. When it comes to nonconvex minimax optimization, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. In fact, even for a quadratic example, the naive combination of Gradient Descent Ascent with any existing adaptive stepsizes is proven to diverge if the initial primal-dual stepsize ratio is not carefully chosen. We introduce two simple fixes for these adaptive methods, allowing automatic adaptation to the time-scale separation necessary for fast convergence. The resulting algorithms are fully parameter-agnostic and achieve near-optimal complexities in deterministic and stochastic settings of nonconvex-strongly-concave minimax problems, without a priori knowledge about problem-specific parameters. This is based on joint work with Junchi Yang and Xiang Li.
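As a toy way to see the role of the primal-dual stepsize ratio mentioned in the abstract, the sketch below runs gradient descent ascent with AdaGrad-style stepsizes on an illustrative quadratic min-max objective and lets you vary how much slower the primal stepsize is than the dual one. The objective, the "rescale the primal stepsize" knob, and all constants are assumptions made for illustration; they are not the quadratic counterexample or the two fixes from the talk.

```python
import numpy as np

L = 5.0  # coupling strength; the toy objective is strongly concave in y

def grad(x, y):
    # Toy quadratic min-max objective: f(x, y) = L*x*y - 0.5*y**2
    return L * y, L * x - y   # (df/dx, df/dy)

def adaptive_gda(ratio, steps=5000, lr=1.0):
    """Gradient descent ascent with AdaGrad-style stepsizes.

    `ratio` rescales the primal (x) stepsize relative to the dual (y) one;
    ratio = 1 is the naive combination, while a small ratio mimics the kind
    of primal/dual time-scale separation discussed in the abstract."""
    x, y = 1.0, 1.0
    gx2 = gy2 = 1e-12                        # accumulated squared gradients
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx2, gy2 = gx2 + gx**2, gy2 + gy**2
        x -= ratio * lr / np.sqrt(gx2) * gx  # descent step on x
        y += lr / np.sqrt(gy2) * gy          # ascent step on y
    return np.hypot(x, y)                    # distance to the saddle (0, 0)

for r in (1.0, 0.1, 0.01):
    print(f"primal/dual stepsize ratio {r:>4}: final distance {adaptive_gda(r):.3e}")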
12:00pm-02:00pm | Lunch

Session 3 (Moderator: Cristóbal Guzmán)
02:00pm-02:30pm | Zico Kolter (CMU) | Adapt like you train: How optimization at training time affects model finetuning and adaptation | [abstract]

With the growing use of large-scale machine learning models pretrained on massive datasets, it is becoming increasingly important to understand how we can efficiently adapt these models to downstream tasks at test time. In this talk, I will discuss our recent work that highlights an important but often overlooked factor in this process: specifically, we have found in several cases that the loss function used to train the model has important implications as to the best way to finetune or adapt the model. I will highlight two specific examples of this phenomenon: 1) illustrating that using contrastive loss outperforms alternatives for fine-tuning contrastively-pretrained vision-language models; and 2) showing how we can leverage the convex conjugate of the training loss to perform label-free test-time adaptation. I will end by highlighting open questions and directions for this work.
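The abstract does not spell out the conjugate-based procedure, so the sketch below shows only one concrete instance of label-free test-time adaptation driven by the training loss: when the model was trained with cross-entropy, the conjugate-based objective reduces to minimizing the softmax entropy of predictions on unlabeled test inputs (as in the related conjugate pseudo-label work). The model, the data, the choice of which parameters to adapt, and the hyperparameters are all placeholders, not the setup from the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Placeholder "pretrained" classifier and an unlabeled test batch.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
test_batch = torch.randn(128, 32)

# Adapt only a small set of parameters at test time; here, for simplicity,
# the last linear layer (related work typically adapts normalization layers).
for p in model.parameters():
    p.requires_grad_(False)
adapt_params = list(model[-1].parameters())
for p in adapt_params:
    p.requires_grad_(True)
opt = torch.optim.SGD(adapt_params, lr=1e-3)

for step in range(10):
    logits = model(test_batch)
    probs = F.softmax(logits, dim=1)
    # Label-free objective: average softmax entropy of the predictions,
    # which is what the conjugate-based objective reduces to for a model
    # trained with cross-entropy.
    entropy = -(probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()
    print(f"step {step}: mean prediction entropy = {entropy.item():.4f}")
```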
02:30pm-03:15pm | Contributed talks | [papers]
- Fangshuo Liao: Strong Lottery Ticket Hypothesis with ε-Perturbation
- Vishwak Srinivasan: Sufficient conditions for non-asymptotic convergence of Riemannian optimization methods
- Zhiyuan Li: How Does Sharpness-Aware Minimization Minimize Sharpness?
03:15pm-03:45pm | Aaron Sidford (Stanford) | Efficiently Minimizing the Maximum Loss | [abstract]

In this talk I will discuss recent advances in the fundamental robust optimization problem of minimizing the maximum of a finite number of convex loss functions. In particular, I will show how to develop stochastic methods for approximately solving this problem with a near-optimal number of gradient queries. Along the way, I will cover several optimization techniques of broader utility, including accelerated methods for using ball-optimization oracles and stochastic bias-reduced gradient methods.

This talk is based on joint work with Hilal Asi, Yair Carmon, Arun Jambulapati, and Yujia Jin, including https://arxiv.org/abs/2105.01778 and https://arxiv.org/abs/2106.09481.
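The problem in the abstract is to minimize, over x, the maximum of n convex losses f_i(x). To make the setup concrete, the sketch below runs a plain subgradient method on that max function for a small synthetic instance; it is a baseline only, not the accelerated ball-oracle or bias-reduced stochastic methods from the talk, and the absolute-residual losses are an assumption. A subgradient of the max is obtained by differentiating any loss that attains the maximum at the current point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny synthetic instance: n convex losses f_i(x) = |a_i^T x - b_i|.
n, d = 50, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def max_loss(x):
    return np.max(np.abs(A @ x - b))

def subgradient_of_max(x):
    # Subgradient of max_i f_i(x): the (sub)gradient of a loss attaining the max.
    residuals = A @ x - b
    i = np.argmax(np.abs(residuals))
    return np.sign(residuals[i]) * A[i]

# Baseline subgradient method with a decaying stepsize.
x = np.zeros(d)
best = max_loss(x)
for t in range(1, 5001):
    x -= 0.5 / np.sqrt(t) * subgradient_of_max(x)
    best = min(best, max_loss(x))

print("best max-loss found:", best)
```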
03:45pm-03:50pm | Courtney Paquette | Closing Remarks
03:50pm-04:50pm | Poster Session 2 | [posters]