OPT2024
We welcome you to participate in the 16th International OPT Workshop on Optimization for Machine Learning, to be held as part of the NeurIPS 2024 conference. This year we particularly encourage (but do not limit) submissions with a focus on "scaling up optimization".
We are looking forward to an exciting OPT!
Accepted Papers
Accepted Papers (oral)
- On the Inherent Privacy of Two Point Zeroth Order Projected Gradient Descent — Devansh Gupta (University of Southern California), Meisam Razaviyayn (University of Southern California), Vatsal Sharan (University of Southern California)
- The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization — Matan Schliserman (Tel Aviv University), Uri Sherman (Tel Aviv University), Tomer Koren (Tel Aviv University)
- SOAP: Improving and Stabilizing Shampoo using Adam — Nikhil Vyas (Harvard University), Depen Morwani (Harvard University), Rosie Zhao (Harvard University), Itai Shapira (Harvard University), David Brandfonbrener (Harvard University), Lucas Janson (Harvard University), Sham M. Kakade (Harvard University)
- $\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers — Benjamin Thérien (Université de Montréal), Charles-Étienne Joseph (Université de Montréal), Boris Knyazev (Samsung), Edouard Oyallon (CNRS), Irina Rish (University of Montreal), Eugene Belilovsky (Concordia University, Montreal)
- MindFlayer: Efficient Asynchronous Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times — Arto Maranjyan (King Abdullah University of Science and Technology), Omar Shaikh Omar (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Provable non-accelerations of the heavy-ball method — Baptiste Goujaud (École polytechnique), Adrien Taylor (Inria), Aymeric Dieuleveut (École polytechnique)
Accepted Papers (poster)
- Fast decentralized gradient tracking for federated learning with local updates: From mini to minimax optimization — Chris Junchi Li (University of California Berkeley)
- Batch size invariant Adam — Xi Wang (University of Massachusetts, Amherst), Laurence Aitchison (University of Bristol)
- Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time — Yingyu Liang (University of Hong Kong), Zhizhou Sha (Tsinghua University), Zhenmei Shi (Google), Zhao Song (University of California, Berkeley), Yufa Zhou (University of Pennsylvania)
- Second-Order Forward-Mode Automatic Differentiation for Optimization — Adam D. Cobb (SRI International), Atilim Gunes Baydin (University of Oxford), Barak A. Pearlmutter (Maynooth University), Susmit Jha (SRI International)
- An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration — Eshwar Ram Arunachaleswaran (University of Pennsylvania), Natalie Collina (University of Pennsylvania), Aaron Roth (Amazon), Mirah Shi (University of Pennsylvania)
- Stochastic Quasi-Variational Inequalities: Convergence Analysis Beyond Strong Monotonicity — Zeinab Alizadeh (University of Arizona), Afrooz Jalilzadeh (University of Arizona)
- On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization — Guy Kornowski (Weizmann Institute of Science), Swati Padmanabhan (Massachusetts Institute of Technology), Ohad Shamir (Weizmann Institute)
- Nonmonotone Line Searches Operate at the Edge of Stability — Curtis Fox (University of British Columbia), Leonardo Galli (Ludwig-Maximilians-Universität München), Mark Schmidt (University of Alberta), Holger Rauhut (Ludwig-Maximilians-Universität München)
- Dual Feature Reduction for the Sparse-Group Lasso and its Adaptive Variant — Fabio Feser (Imperial College London), Marina Evangelou (Imperial College London)
- Multimodal Federated Learning with Model Personalization — Ratun Rahman (University of Alabama at Huntsville), Dinh C. Nguyen (University of Alabama at Huntsville)
- DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction — Xinwei Zhang (University of Southern California), Zhiqi Bu (Amazon), Borja Balle (DeepMind), Mingyi Hong (Amazon), Meisam Razaviyayn (University of Southern California), Vahab Mirrokni (Google Research)
- A Stochastic Algorithm for Sinkhorn Distance-Regularized Distributionally Robust Optimization — Yufeng Yang (Texas A&M University - College Station), Yi Zhou (Texas A&M University - College Station), Zhaosong Lu (University of Minnesota - Twin Cities)
- DADA: Dual Averaging with Distance Adaptation — Mohammad Moshtaghifar (Sharif University of Technology), Anton Rodomanov (CISPA), Daniil Vankov (Arizona State University), Sebastian U Stich (CISPA Helmholtz Center for Information Security)
- Multi Objective Regionalized Bayesian Optimization via Entropy Search — Thomas James (Digital University Kerala), Sinnu Thomas (Digital University Kerala (formerly IIITMK))
- High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems — Andrew Nicholas Cheng (Harvard University), Kiwon Lee (McGill University), Courtney Paquette (McGill University)
- Applications of fractional calculus in learned optimization — Teodor Alexandru Szente (Institute of Mathematics of the Romanian Academy), James Harrison (Google), Mihai Zanfir (Newton), Cristian Sminchisescu (Lund University)
- Efficient Levenberg-Marquardt for SLAM — Amir Belder (Technion), Refael Vivanti (Facebook)
- Estimating Vote Choice in U.S. Elections with Approximate Poisson-Binomial Logistic Regression — Nic Fishman (University of Oxford), Evan Rosenman (Claremont McKenna College)
- Online Nonconvex Bilevel Optimization with Bregman Divergences — Jason Bohne (Bloomberg), David S Rosenberg (Bloomberg), Gary Kazantsev (Columbia University), Pawel Polak (State University of New York at Stony Brook)
- Hierarchical Simplicity Bias of Neural Networks — Zhehang Du (The Wharton School, University of Pennsylvania)
- The Crucial Role of Samplers in Online Direct Preference Optimization — Ruizhe Shi (Tsinghua University), Runlong Zhou (Department of Computer Science, University of Washington), Simon Shaolei Du (University of Washington)
- AdEMAMix: Better and Faster Training with Older Gradients — Matteo Pagliardini (Swiss Federal Institute of Technology Lausanne), Pierre Ablin (Apple), David Grangier (Apple)
- Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior — Anming Gu (Boston University), Edward Chien (Boston University), Kristjan Greenewald (MIT-IBM Watson AI Lab, IBM Research)
- Nonlinear tomographic reconstruction via nonsmooth optimization — Vasileios Charisopoulos (University of Chicago), Rebecca Willett (University of Chicago)
- Aligned Multi-Objective Optimization — Yonathan Efroni (Meta), Daniel R. Jiang (Meta), Ben Kretzu (Technion - Israel Institute of Technology), Jalaj Bhandari (Facebook), Zheqing Zhu (Meta (Facebook) AI), Karen Ullrich (Meta AI)
- A fast and efficient randomized quasi-Newton method — Danny Duan (University of Wisconsin - Madison), Hanbaek Lyu (University of Wisconsin, Madison)
- Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback — Jing Dong (The Chinese University of Hong Kong, Shenzhen), Baoxiang Wang (The Chinese University of Hong Kong, Shenzhen), Yaoliang Yu (University of Waterloo)
- Spurious Stationarity and Hardness Results for Mirror Descent — He Chen (The Chinese University of Hong Kong), Jiajin Li (University of British Columbia), Anthony Man-Cho So (The Chinese University of Hong Kong)
- Linear Attention Sequence Parallelism — Weigao Sun (Shanghai Artificial Intelligence Laboratory), Zhen Qin (TapTap), Dong Li (Shanghai AI Lab), Xuyang Shen (Shanghai AI Lab), Yu Qiao (Shanghai Artificial Intelligence Laboratory), Yiran Zhong (Shanghai AI Lab)
- On the Hypomonotone Class of Variational Inequalities — Khaled Alomar (Universität des Saarlandes), Tatjana Chavdarova (University of California, Berkeley)
- Distributionally Robust Linear Regression With Block Lewis Weights — Naren Sarayu Manoj (Toyota Technological Institute at Chicago), Kumar Kshitij Patel (Toyota Technological Institute at Chicago)
- On the Crucial Role of Initialization for Matrix Factorization — Bingcong Li (Department of Computer Science, ETHZ - ETH Zurich), Liang Zhang (Department of Computer Science, ETHZ - ETH Zurich), Aryan Mokhtari (University of Texas, Austin), Niao He (Swiss Federal Institute of Technology)
- On the Convergence of DP-SGD with Adaptive Clipping — Egor Shulgin (KAUST), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Memory-Efficient Large Language Model (LLM) Training and Fine-Tuning via Gradient Subspace Tracking — Sahar Rajabi (University of Waterloo), Sirisha Rambhatla (University of Waterloo)
- WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average — Louis Fournier (Sorbonne Université), Adel Nabli (Sorbonne Université), Masih Aminbeidokhti (École de technologie supérieure, Université du Québec), Marco Pedersoli (École de technologie supérieure, Université du Québec), Eugene Belilovsky (Concordia University, Montreal), Edouard Oyallon (CNRS)
- Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks — Louis Fournier (Sorbonne Université), Edouard Oyallon (CNRS)
- Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models — Zeman Li (University of Southern California), Xinwei Zhang (University of Southern California), Peilin Zhong (Google), Yuan Deng (Google Research), Meisam Razaviyayn (University of Southern California), Vahab Mirrokni (Google Research)
- Intuitive Analysis of the Quantization based Optimization: From establishing a SDE to Quantum Mechanical Perspective — Jinwuk Seok (Artificial Intelligence Research Lab.), Changsik Cho (ETRI)
- Glocal Smoothness: Line Search can really help! — Curtis Fox (University of British Columbia), Mark Schmidt (University of Alberta)
- Don't Be So Positive: Negative Step Sizes in Second-Order Methods — Betty Shea (University of British Columbia), Mark Schmidt (University of Alberta)
- From Gradient Clipping to Normalization for Heavy Tailed SGD — Florian Hübler (ETH Zurich), Ilyas Fatkhullin (ETHZ - ETH Zurich), Niao He (Swiss Federal Institute of Technology)
- ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training — Adel Nabli (Sorbonne Université), Louis Fournier (Sorbonne Université), Pierre Erbacher (Sorbonne Université), Louis Serrano (Sorbonne Université), Eugene Belilovsky (Concordia University, Montreal), Edouard Oyallon (CNRS)
- A Unified Convergence Theory for Large Language Model Efficient Fine-tuning — Zhanhong Jiang (Iowa State University), Nastaran Saadati (Iowa State University), Aditya Balu (Iowa State University), Minh Pham (New York University), Joshua Russell Waite (Iowa State University), Nasla Saleem (Iowa State University), Chinmay Hegde (New York University), Soumik Sarkar (Iowa State University)
- Remove Symmetries to Control Model Expressivity and Improve Optimization — Liu Ziyin (Massachusetts Institute of Technology), Yizhou Xu (EPFL - EPF Lausanne), Isaac L. Chuang (Massachusetts Institute of Technology)
- Communication-Efficient Loss Minimization over Heterogeneous Data with Federated Hierarchical Ensemble Aggregation via Distillation — Sayantan Chowdhury (University of Toronto), Ben Liang (University of Toronto), Ali Tizghadam (University of Toronto), Ilijc Albanese (TELUS Communications)
- Deconstructing What Makes a Good Optimizer for Language Models — Rosie Zhao (Harvard University), Depen Morwani (Harvard University), David Brandfonbrener (Harvard University), Nikhil Vyas (Harvard University), Sham M. Kakade (Harvard University)
- Scalable Second-Order Optimization Algorithms for Minimizing Low-rank Functions — Edward Tansley (University of Oxford), Coralia Cartis (University of Oxford)
- Role of Parametrization in Learning Dynamics of Recurrent Neural Networks — Adwait Datar (Technische Universität Hamburg), Chinmay Datar (Technische Universität München), Zahra Monfared (Heidelberg University), Felix Dietrich (Technical University Munich)
- Stochastic Proximal Point Methods for Monotone Inclusions under Expected Similarity — Abdurakhmon Sadiev (King Abdullah University of Science and Technology), Laurent Condat (KAUST), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Dimensionality Reduction Techniques for Global Bayesian Optimisation — Luo Long (University of Oxford), Coralia Cartis (University of Oxford), Paz Fink Shustin (University of Oxford)
- Graph Neural Networks for Hyperparameter Inference in Ising Solvers — Edward Jiang (University of Waterloo), Sam Reifenstein (University of California, Santa Cruz), Milin Doppalapudi (University of California, Santa Cruz), Timothee Leleu (NTT Research)
- Discrete-Continuous Variational Optimization with Local Gradients — Jonathan H Warrell (NEC), Francesco Alesiani (NEC), Cameron Smith (Harvard University), Anja Mösch (Nagasaki University), Martin Renqiang Min (NEC Laboratories America)
- Neural Entropic Multimarginal Optimal Transport — Dor Tsur (Ben-Gurion University of the Negev), Ziv Goldfeld (Cornell University), Kristjan Greenewald (MIT-IBM Watson AI Lab, IBM Research), Haim H. Permuter (Ben Gurion University of the Negev)
- Solving hidden monotone variational inequalities with surrogate losses — Ryan D'Orazio (University of Montreal), Danilo Vucetic (Mila - Quebec Artificial Intelligence Institute), Zichu Liu (Mila - Quebec Artificial Intelligence Institute), Junhyung Lyle Kim (Rice University), Ioannis Mitliagkas (Athena Research Center), Gauthier Gidel (University of Montreal)
- Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition — Robert Joseph George (California Institute of Technology), David Pitt (California Institute of Technology), Jiawei Zhao (California Institute of Technology), Jean Kossaifi (NVIDIA AI), Cheng Luo (Microsoft), Yuandong Tian (Meta AI (FAIR)), Anima Anandkumar (California Institute of Technology)
- Revisiting the Initial Steps in Adaptive Gradient Descent Optimization — Abulikemu Abuduweili (Carnegie Mellon University), Changliu Liu (Carnegie Mellon University)
- Learning Morphisms with Gauss-Newton Approximation for Growing Networks — Neal Gregory Lawton (CapitalOne), Aram Galstyan (Information Sciences Institute), Greg Ver Steeg (University of California, Riverside)
- On the Convergence of FedProx with Extrapolation and Inexact Prox — Hanmin Li (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- SICNN: Sparsity-induced Input Convex Neural Network for Optimal Transport — Peter Chen (Columbia University), Yue Xie (University of Hong Kong), Qingpeng Zhang (City University of Hong Kong)
- Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks — Shikai Qiu (New York University), Atish Agarwala (Google), Jeffrey Pennington (Google), Lechao Xiao (Google DeepMind)
- Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training — Hiroki Naganuma (University of Montreal, Mila), Xinzhi Zhang (University of Washington), Man-Chung Yue (The University of Hong Kong), Ioannis Mitliagkas (Athena Research Center), Russell J. Hewett (Microsoft Research), Philipp Andre Witte (Microsoft), Yin Tat Lee (University of Washington)
- BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks — Amrutha Varshini Ramesh (University of British Columbia), Vignesh Ganapathiraman (Yahoo), Issam H. Laradji (ServiceNow), Mark Schmidt (University of Alberta)
- Consensus Based Optimization Accelerates Gradient Descent — Anagha Satish (California Institute of Technology), Ricardo Baptista (Department of Computing + Mathematical Sciences, California Institute of Technology), Franca Hoffmann (California Institute of Technology)
- Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs — Reza Asad (Simon Fraser University), Reza Babanezhad Harikandeh (Samsung), Issam H. Laradji (ServiceNow), Nicolas Le Roux (Microsoft), Sharan Vaswani (Simon Fraser University)
- Policy Optimization for Strictly Batch Imitation Learning — Rishabh Agrawal (University of Southern California), Nathan Dahlin (State University of New York at Albany), Rahul Jain (Google DeepMind), Ashutosh Nayyar (University of Southern California)
- Understanding Adam Requires Better Rotation Dependent Assumptions — Tianyue H. Zhang (Mila, Université de Montréal), Lucas Maes (Mila), Alexia Jolicoeur-Martineau (Samsung - SAIT AI Lab, Montreal), Ioannis Mitliagkas (Athena Research Center), Damien Scieur (Samsung), Simon Lacoste-Julien (University of Montreal), Charles Guille-Escuret (Mila)
- Structured Regularization on the SPD Manifold — Andrew Nicholas Cheng (Harvard University), Melanie Weber (Harvard University)
- In the Search for Optimal Portfolios of Counterstrategies in the Large Imperfect Information Games — Karolina Drabent (Czech Technical University in Prague), David Milec (Czech Technical University in Prague), Ondrej Kubicek (Czech Technical University in Prague), Viliam Lisý (Czech Technical University in Prague)
- Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials — August Y Chen (Department of Computer Science, Cornell University), Ayush Sekhari (Massachusetts Institute of Technology), Karthik Sridharan (Cornell University)
- Tight Lower Bounds and Improved Convergence in Performative Prediction — Pedram Khorsandi (Université de Montréal), Rushil Gupta (Université de Montréal), Mehrnaz Mofakhami (Université de Montréal), Simon Lacoste-Julien (University of Montreal), Gauthier Gidel (University of Montreal)
- Memory Efficient Adaptive Stochastic Optimization via Subset-Norm — Thien Hang Nguyen (Northeastern University), Huy Nguyen (Northeastern University)
- Old Optimizer, New Norm: An Anthology — Jeremy Bernstein (Massachusetts Institute of Technology), Laker Newhouse (Massachusetts Institute of Technology)
- Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts — Ashwinee Panda (University of Maryland, College Park), Vatsal Baherwani (University of Maryland, College Park), Zain Sarwar (University of Chicago), Benjamin Thérien (Université de Montréal), Stephen Rawls (CapitalOne), Sambit Sahu (CapitalOne), Supriyo Chakraborty (Capital One), Tom Goldstein (University of Maryland, College Park)
- How Does Critical Batch Size Scale in Pre-training? — Hanlin Zhang (Harvard University), Depen Morwani (Harvard University), Nikhil Vyas (Harvard University), Jingfeng Wu (University of California, Berkeley), Difan Zou (University of Hong Kong), Udaya Ghai (Amazon), Dean Foster (Amazon), Sham M. Kakade (Harvard University)
- Local Curvature Descent: Squeezing More Curvature out of Standard and Polyak Gradient Descent — Peter Richtárik (King Abdullah University of Science and Technology (KAUST)), Simone Maria Giancola (King Abdullah University of Science and Technology), Dymitr Lubczyk (University of Amsterdam), Robin Yadav (University of British Columbia)
- Improving Deep Learning Speed and Performance through Synaptic Neural Balance — Antonios Alexos (University of California, Irvine), Ian Domingo (University of California, Irvine), Pierre Baldi (University of California, Irvine)
- Dueling in the Dark: An Efficient and Optimal Mirror Descent Approach for Online Optimization with Adversarial Preferences — Aadirupa Saha (Apple), Yonathan Efroni (Meta), Barry-John Theobald (Apple)
- Differentially Private Random Block Coordinate Descent — Arto Maranjyan (King Abdullah University of Science and Technology), Abdurakhmon Sadiev (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Adaptive Partitioning Schemes for Black-Box Optimization — Raja Sunkara (Missouri University of Science and Technology), Ardhendu Tripathy (Missouri University of Science and Technology)
- Aggregating Data for Optimal and Private Learning — Sushant Agarwal (University of Waterloo), Yukti Makhija (Google DeepMind), Rishi Saket (Google), Aravindan Raghuveer (Google)
- Incentivizing Truthful Collaboration in Heterogeneous Federated Learning — Dimitar Chakarov (Toyota Technological Institute at Chicago), Nikita Tsoy (Sofia University), Kristian Minchev (Sofia University "St. Kliment Ohridski"), Nikola Konstantinov (INSAIT, Sofia University)
- Optimizing Attention — Hanno Ackermann (Qualcomm), Hong Cai (Qualcomm AI Research), Markus Nagel (Qualcomm AI Research), Leyla Mirvakhabova (Qualcomm AI Research), Farhad G. Zanjani (Qualcomm AI Research), Fatih Porikli (Qualcomm)
- Amplitude Modulated Riemannian Optimization for QAP — Timothee Leleu (NTT Research), Aron Vizkeleti (University of Notre Dame), Sam Reifenstein (University of California, Santa Cruz)
- Normalization Matters for Optimization Performance on Graph Neural Networks — Alan Milligan (University of British Columbia), Frederik Kunstner (INRIA), Hamed Shirzad (University of British Columbia), Mark Schmidt (University of Alberta), Danica J. Sutherland (University of British Columbia)
- u-$\mu$P: The Unit-Scaled Maximal Update Parametrization — Charlie Blake (Graphcore), Constantin Eichenberg (Aleph Alpha), Josef Dean (University of Warwick), Lukas Balles (Aleph Alpha), Luke Yuri Prince (Graphcore), Björn Deiseroth (Technische Universität Darmstadt), Andres Felipe Cruz-Salinas (Cohere), Carlo Luschi (Graphcore), Samuel Weinbach (Aleph Alpha GmbH), Douglas Orr (Graphcore)
- LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression — Laurent Condat (KAUST), Arto Maranjyan (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Optimal Transport for Probabilistic Circuits — Adrian Ciotinga (Arizona State University), YooJung Choi (Arizona State University)
- Extra-Gradient and Optimistic Gradient Descent Converge in Iterates Faster than $O(1/\sqrt{T})$ in All Monotone Lipschitz Variational Inequalities — Kimon Antonakopoulos (EPFL - EPF Lausanne)
- Communication-efficient Algorithms Under Generalized Smoothness Assumptions — Sarit Khirirat (King Abdullah University of Science and Technology), Abdurakhmon Sadiev (King Abdullah University of Science and Technology), Artem Riabinin (King Abdullah University of Science and Technology), Eduard Gorbunov (Mohamed bin Zayed University of Artificial Intelligence), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Weak to Strong Learning from Aggregate Labels — Yukti Makhija (Google DeepMind), Rishi Saket (Google)
- A Continuous Variable Optimization method for the Quadratic Assignment Problem — Aron Vizkeleti (University of Notre Dame), Timothee Leleu (NTT Research)
- Neural Networks with Complex-Valued Weights Have No Spurious Local Minima — Xingtu Liu (Simon Fraser University)
- SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Nonconvex Cross-Device Federated Learning — Avetik Karagulyan (Centrale Supélec), Egor Shulgin (KAUST), Abdurakhmon Sadiev (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- A theoretical study of the $(L_0,L_1)$-smoothness condition in deep learning — Y Cooper (University of Notre Dame)
- Connections between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging — Depen Morwani (Harvard University), Nikhil Vyas (Harvard University), Hanlin Zhang (Harvard University), Sham M. Kakade (Harvard University)
- Statistical Inference in Latent Convex Objectives with Stream Data — Rohan Chauhan (University of California, Irvine), Emmanouil-Vasileios Vlatakis-Gkaragkounis (University of Wisconsin - Madison), Michael I. Jordan (University of California, Berkeley)
- Modularity aided consistent attributed graph clustering via coarsening — Samarth Bhatia (Indian Institute of Science, Bangalore), Yukti Makhija (Google DeepMind), Manoj Kumar (Indian Institute of Technology Delhi), Sandeep Kumar (Indian Institute of Technology Delhi)
- Simple and Scalable Federated Learning with Uncertainty via Improved Variational Online Newton — Shivam Pal (Indian Institute of Technology, Kanpur), Aishwarya Gupta (Indian Institute of Technology, Kanpur), Saqib Sarwar (Indian Institute of Technology, Kanpur), Piyush Rai (IIT Kanpur)
- Lion's sign noise can make training more stable — Simon Elistratov (Lomonosov Moscow State University), Andrey Podivilov (St. Petersburg State University), Timofei Iuzhakov (Constructor University), Dmitry Vetrov (Constructor University)
- Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion — Max McGuinness (University of Cambridge), Eirik Fladmark (University of Cambridge), Francisco Vargas (University of Cambridge)
- Personalized Federated Learning via Low-Rank Matrix Factorization — Ali Dadras (Umeå University), Sebastian U Stich (CISPA Helmholtz Center for Information Security), Alp Yurtsever (Umeå University)