6:15am-6:55am | Welcome social event in gather.town (lounge).
| Log in and say hello! |
| Session 1 (Moderator: Sebastian Stich, co-moderator: Martin Takáč)
|
6:55am-7:00am | Organizers
| Opening Remarks to Session 1 |
7:00am-7:30am | Shai Shalev-Shwartz (Hebrew University of Jerusalem) (video stream)
| Deep Learning: Success, Failure, and the Border between them | [abstract] |
Deep learning has revolutionized computer vision, speech recognition, natural language understanding, and more. However, from the theoretical perspective we know that training neural networks is computationally hard, and one can construct distributions on which deep learning fails. The talk will propose several parameters of distributions that can move them from being easy to train to being hard to train.
|
7:30am-8:00am | Martin Jaggi (EPFL) (video stream)
| Learning with Strange Gradients | [abstract] |
Gradient methods form the foundation of current machine learning. A vast literature covers the use of stochastic gradients as simple unbiased estimators of the full gradient of our objective. In this talk, we discuss four applications motivated by practical machine learning in which this key assumption is violated, and show new ways to cope with gradients that are only loosely related to the original objective. We demonstrate that algorithms with rigorous convergence guarantees can still be obtained in such settings, for
1) federated learning on heterogeneous data,
2) personalized collaborative learning,
3) masked training of neural networks with partial gradients,
4) learning with malicious participants, in the sense of Byzantine robust training.
|
8:00am-8:30am | Contributed talks (live stream)
|
Oral (10min):
- Futong Liu (EPFL): Understanding Memorization from the Perspective of Optimization via Efficient Influence Estimation
Spotlights (5min):
- Abdurakhmon Sadiev (MIPT): Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes
- Frederik Benzing (ETH Zurich): Fast, Exact Subsampled Natural Gradients and First-Order KFAC
- Simon Roburin (valeo.ai / imagine ENPC): Spherical Perspective on Learning with Normalization Layers
| [papers] |
8:30am-9:55am | Poster Session 1 and Break
| Join the poster session on gather.town (room 1).
Authors will present their work.
| [posters] |
| Session 2 (Moderator: Katya Scheinberg, co-moderator: Courtney Paquette)
|
9:55am-10:00am | Organizers
| Opening Remarks |
10:00am-10:30am | Coralia Cartis (University of Oxford) (video stream)
| The global optimization of functions with low effective dimension - better than worst-case? | [abstract] |
We investigate the unconstrained and constrained global optimization of functions with low effective dimension, which are constant along an (unknown) linear subspace and only vary over the effective (complement) subspace. Known also as multi-ridge or active-subspace objectives, these functions appear frequently in applications, such as those involving complex engineering simulations, overly parametrised models, and more. We aim to implicitly explore the intrinsic low dimensionality of the constrained/unconstrained landscape using (feasible) random embeddings, in order to understand and improve the scalability of algorithms for the global optimization of these special-structure problems. We show that for these functions, the convergence and complexity of a random embedding algorithmic framework can potentially be exponentially better than if no special structure were present and, under certain assumptions on the problems, are independent of the ambient dimension. Our numerical experiments on functions with low effective dimension illustrate the improved scalability of the framework when coupled with state-of-the-art global, and even local, optimization solvers for the subproblems. Our analysis uses tools from random matrix theory, stochastic optimization and conic integral geometry.
Authors: Coralia Cartis, Estelle Massart and Adilet Otemissov (Mathematical Institute, University of Oxford and Alan Turing Institute, London)
Papers:
https://arxiv.org/pdf/2003.09673.pdf,
https://arxiv.org/pdf/2009.10446.pdf,
https://arxiv.org/pdf/2107.12102.pdf
|
10:30am-11:00am | Cristóbal Guzmán (University of Twente) (video stream)
| Non-Euclidean Differentially Private Stochastic Convex Optimization | [abstract] |
Ensuring privacy of users' data in machine learning models has become a crucial requirement in multiple domains. In this respect, differential privacy (DP) is the gold standard, due to its general and rigorous privacy guarantees, as well as its high composability. For the particular case of stochastic convex optimization (SCO), recent efforts have established optimal rates for the excess risk under differential privacy in Euclidean setups. These bounds suffer a polynomial degradation of accuracy with respect to the dimension, which limits their applicability in high-dimensional settings. In this talk, I will present nearly dimension-independent rates on the excess risk for DP-SCO in the $\ell_1$ setup, as well as an investigation of more general $\ell_p$ setups, where $1\leq p\leq \infty$. Based on joint work with Raef Bassily and Anupama Nandi.
|
11:00am-11:30am | Contributed talks (live stream)
|
Oral (10min):
- Junchi Li (UC Berkeley): On the convergence of stochastic extragradient for bilinear games using restarted iteration averaging
Spotlights (5min):
- Jeffery Kline (American Family Insurance): Farkas' Theorem of the Alternative for Prior Knowledge in Deep Networks
- Lyle Kim (Rice University): Acceleration and Stability of the Stochastic Proximal Point Algorithm
- Pascal Esser (Technical University of Munich): Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds
| [papers] |
11:30am-12:55pm | Break
| Join the gather.town lounge or the poster rooms
(room 1)
(room 2) for discussions with other participants.
| [posters] |
| Session 3 (Moderator: Oliver Hinder, co-moderator: Courtney Paquette)
|
12:55pm-1:00pm | Organizers
| Opening Remarks |
1:00pm-1:30pm | Damek Davis (Cornell University) (video stream)
| Avoiding saddle points in nonsmooth optimization | [abstract] |
We introduce a geometrically transparent strict saddle property for nonsmooth functions. When present, this property guarantees that simple randomly initialized proximal algorithms on weakly convex problems converge only to local minimizers. We argue that the strict saddle property may be a realistic assumption in applications since it provably holds for generic semi-algebraic optimization problems. Finally, we close the talk with an extension of the result to "perturbed" subgradient methods.
|
1:30pm-2:00pm | Jelena Diakonikolas (University of Wisconsin-Madison) (video stream)
| Faster Empirical Risk Minimization | [abstract] |
Empirical Risk Minimization (ERM) problems are central to machine learning, and their efficient optimization has been studied from different perspectives, often taking advantage of the finite-sum structure present in typical problem formulations. In particular, tight oracle complexity bounds have been established under fairly general assumptions about the loss functions. In this talk, I will present a rather surprising and general result that takes advantage of the separability of nonsmooth convex loss functions with efficiently computable proximal operators, such as the hinge loss and the sum of absolute errors, to obtain an algorithm that exhibits significantly lower complexity than what is predicted by the lower bounds for general nonsmooth convex losses. I will then talk about how this result can be further improved for problems that can be stated in a form close to a linear program, and discuss how these results relate to robust empirical risk minimization. The talk is based on joint results with Chaobing Song, Eric Lin, and Stephen Wright.
|
2:00pm-2:30pm | Contributed talks (live stream)
|
Oral (10min):
- Wenhao Zhan (Princeton University): Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence
Spotlights (5min):
- Akhilesh Soni (University of Wisconsin-Madison): Integer Programming Approaches To Subspace Clustering With Missing Data
- Boyue Li (Carnegie Mellon University): DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization
- Grigory Malinovsky (KAUST): Better Linear Rates for SGD with Data Shuffling
| [papers] |
2:30pm-3:55pm | Poster Session 2 and Break
| Join the poster session on gather.town (room 2).
Authors will present their work.
| [posters] |
| Session 4 (Moderator: Quanquan Gu, co-moderator: Oliver Hinder)
|
3:55pm-4:00pm | Organizers
| Opening Remarks |
4:00pm-4:30pm | Yinyu Ye (Stanford University) (video stream)
| Online Learning via Linear Programming | [abstract] |
A natural optimization model that formulates many online resource allocation and decision-making problems is online linear programming (OLP), where the constraint matrix, along with the objective coefficients and decision variables, is revealed and decided column by column sequentially. We review near-optimal algorithms and theories for solving this surprisingly general class of online problems under the assumption of random order of arrival and/or i.i.d. distributions of the input data. Then we present a few recent applications of the model/algorithm, including a fast online algorithm as a pre-solver for solving large-scale offline (binary) LPs, an interior-point online algorithm to address “fairness” in resource allocation, and a provable logarithmic regret bound for the Bandits with Knapsacks (BwK) problem.
|
4:30pm-5:00pm | Michael Mahoney (UC Berkeley) (video stream)
| Putting Randomized Matrix Algorithms in LAPACK, and Connections with Second-order Stochastic Optimization | [abstract] |
LAPACK (Linear Algebra PACKage) is a widely used, high-quality software library for numerical linear algebra that provides routines for solving systems of linear equations, linear least squares, eigenvalue problems, singular value decomposition, and matrix factorizations such as the LU, QR, Cholesky and Schur decompositions. Randomized Numerical Linear Algebra (RandNLA) is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems. In addition to providing some of the best linear algebra algorithms (in terms of worst-case theory, numerical implementations, and non-trivial implicit statistical properties and machine learning implementations), RandNLA techniques are the basis for the best stochastic second-order optimization algorithms (such as SubSampled Newton's methods, Iterative Hessian Sketch, and Newton-LESS). The time has come to put RandNLA methods into the next generation of LAPACK, and we have begun to do that. We will present our high-level plan to introduce RandNLA algorithms into LAPACK. The RandLAPACK library will implement state-of-the-art randomized algorithms for problems such as low-rank approximation and least squares, but its full scope will be larger than linear algebra per se. It will include certain higher-level primitives in optimization and machine learning that require judicious use of RandNLA. We will describe the building blocks and the modular design that will help RandLAPACK evolve with the field of RandNLA (as well as the needs of machine learning, scientific computing, and other users) over time, together with connections to and implications for optimization in machine learning. Joint work with Riley Murray, Jim Demmel, and others.
|
5:00pm-5:30pm | Contributed talks (live stream)
|
Oral (10min):
- Agnieszka Słowik (University of Cambridge): On the Relation between Distributionally Robust Optimization and Data Curation
Spotlights (5min):
- Jacques Chen (University of British Columbia): Heavy-tailed noise does not explain the gap between SGD and Adam on Transformers
- Neha Wadia (UC Berkeley): Optimization with Adaptive Step Size Selection from a Dynamical Systems Perspective
- Difan Zou (UC Los Angeles): Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
| [papers] |
| Closing Remarks
|
5:30pm-5:35pm | Organizers
| Closing Remarks |