OPT2023
We welcome you to participate in the 15th International OPT Workshop on Optimization for Machine Learning, held as part of the NeurIPS 2023 conference. This year we particularly encourage (but do not limit ourselves to) submissions inspired by the spirit of "Optimization in the Wild".
We are looking forward to an exciting OPT!
Accepted Papers
Accepted Papers (oral)
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization — Guy Kornowski (Weizmann Institute of Science), Ohad Shamir (Weizmann Institute)
- Dueling Optimization with a Monotone Adversary — Avrim Blum (Toyota Technological Institute at Chicago), Meghal Gupta (University of California, Berkeley), Gene Li (Toyota Technological Institute at Chicago), Naren Sarayu Manoj (EPFL - EPF Lausanne), Aadirupa Saha (Apple), Yuanyuan Yang (University of Washington)
- Escaping mediocrity: how two-layer networks learn hard generalized linear models — Luca Arnaboldi (EPFL - EPF Lausanne), Florent Krzakala (Swiss Federal Institute of Technology Lausanne), Bruno Loureiro (École Normale Supérieure de Paris), Ludovic Stephan (EPFL - EPF Lausanne)
- High-Dimensional Unbiased Prediction for Sequential Decision Making — Georgy Noarov (School of Engineering and Applied Science, University of Pennsylvania), Ramya Ramalingam (University of Pennsylvania), Aaron Roth (Amazon), Stephan Xie (University of Pennsylvania)
- Last Iterate Convergence of Popov Method for Non-monotone Stochastic Variational Inequalities — Daniil Vankov (Arizona State University), Angelia Nedich (Arizona State University), Lalitha Sankar (Arizona State University)
- Practical Principled Policy Optimization for Finite MDPs — Michael Lu (Simon Fraser University), Matin Aghaei (Simon Fraser University), Anant Raj (INRIA), Sharan Vaswani (Simon Fraser University)
Accepted Papers (poster)
- K-Spin Ising Model for Combinatorial Optimizations over Graphs: A Reinforcement Learning Approach — Xiao-Yang Liu (Columbia University), Ming Zhu (Institute of Automation, Chinese Academy of Sciences)
- GUC: Unsupervised non-parametric Global Clustering and Anomaly Detection — Chris Solomou (University of York)
- Information-Theoretic Trust Regions for Stochastic Gradient-Based Optimization — Philipp Dahlinger (Karlsruhe Institute of Technology), Philipp Becker (FZI Forschungszentrum Informatik), Maximilian Hüttenrauch (Karlsruhe Institute of Technology), Gerhard Neumann (Karlsruhe Institute of Technology)
- Enhancing the Misreport Network for Optimal Auction Design — Haiying Wu (Tianjin University), Shuyuan You (Tianjin University), Zhiqiang Zhuang (Tianjin University), Kewen Wang (Griffith University), Zhe Wang (Griffith University)
- Accelerating Inexact HyperGradient Descent for Bilevel Optimization — Haikuo Yang (Fudan University), Luo Luo (Fudan University), Chris Junchi Li (University of California Berkeley), Michael Jordan (University of California, Berkeley), Maryam Fazel (University of Washington, Seattle)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training — Egor Shulgin (KAUST), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Revisiting Random Weight Perturbation for Efficiently Improving Generalization — Tao Li (Shanghai Jiao Tong University), Weihao Yan (Shanghai Jiao Tong University), Qinghua Tao (ESAT, Department of Electrical Engineering, KU Leuven), Zehao Lei (Shanghai Jiao Tong University), Yingwen Wu (Shanghai Jiao Tong University), Kun Fang (Shanghai Jiao Tong University), Mingzhen He (Shanghai Jiao Tong University), Xiaolin Huang (Shanghai Jiao Tong University, Tsinghua University)
- Exploring Modern Evolution Strategies in Portfolio Optimization — Ramin Hasani (Massachusetts Institute of Technology), Etan A Ehsanfar (Stevens Institute of Technology), Greg A Banis, Rusty Bealer, Amir Soroush Ahmadi
- Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties — David Martínez-Rubio (Zuse Institute Berlin), Christophe Roux (Zuse Institute Berlin), Christopher Criscitiello (EPFL - EPF Lausanne), Sebastian Pokutta (Zuse Institute Berlin)
- The Sharp Power Law of Local Search on Expanders — Simina Branzei (Purdue University), Davin Choo (National University of Singapore), Nicholas Recker (Purdue University)
- Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization — Hanmin Li (King Abdullah University of Science and Technology), Avetik Karagulyan (King Abdullah University of Science and Technology), Peter Richtárik (King Abdullah University of Science and Technology (KAUST))
- Regret Bounds for Optimistic Follow The Leader: Applications in Portfolio Selection and Linear Regression — Sudeep Raja Putta (Columbia University), Shipra Agrawal (Columbia University)
- Non-Uniform Sampling and Adaptive Optimizers in Deep Learning — Thibault Lahire (Dassault Aviation)
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation — Eric Zelikman (Stanford University), Eliana Lorch (University of Oxford), Lester Mackey (Microsoft Research New England), Adam Tauman Kalai (Microsoft)
- Adaptive Quasi-Newton and Anderson Acceleration Framework with Explicit Global (Accelerated) Convergence Rates — Damien Scieur (Samsung SAIL and Mila)
- On Optimization Formulations of Finite Horizon MDPs — Rajat Vadiraj Dwaraknath (Stanford University), Lexing Ying (Stanford University)
- A Predicting Clipping Asynchronous Stochastic Gradient Descent Method in Distributed Learning — Haoxiang Wang (Tsinghua University), Zhanhong Jiang (Iowa State University), Chao Liu (Tsinghua University), Soumik Sarkar (Iowa State University), Dongxiang Jiang (Tsinghua University), Young M Lee (Johnson Controls)
- Average-Constrained Policy Optimization — Akhil Agnihotri (University of Southern California), Rahul Jain (University of Southern California), Haipeng Luo (University of Southern California)
- How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization — Nuoya Xiong (Tsinghua University), Lijun Ding (University of Wisconsin - Madison), Simon Shaolei Du (University of Washington)
- Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs — Zihan Zhou (IIIS, Tsinghua University), Honghao Wei (Washington State University), Lei Ying (University of Michigan, Ann Arbor)
- Stochastic Optimization under Hidden Convexity — Ilyas Fatkhullin (ETHZ - ETH Zurich), Niao He (Swiss Federal Institute of Technology), Yifan Hu (EPFL - EPF Lausanne)
- On the Parallel Complexity of Multilevel Monte Carlo in Stochastic Gradient Descent — Kei Ishikawa (Tokyo Institute of Technology)
- Fair Minimum Representation Clustering — Connor Lawless (Cornell University), Oktay Gunluk (Cornell University)
- FaDE: Fast DARTS Estimator on Hierarchical NAS Spaces — Simon Neumeyer (Universität Passau), Julian Stier (University of Passau), Michael Granitzer (Universität Passau)
- Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC for Large Neural Nets — Wu Lin (Vector Institute), Felix Dangel (Vector Institute, Toronto), Runa Eschenhagen (University of Cambridge), Kirill Neklyudov (Vector Institute), Agustinus Kristiadi (Vector Institute), Richard E Turner (University of Cambridge), Alireza Makhzani (Vector Institute)
- Level Set Teleportation: the Good, the Bad, and the Ugly — Aaron Mishkin (Computer Science Department, Stanford University), Alberto Bietti (Flatiron Institute), Robert M. Gower (Flatiron Institute)
- Unnormalized Density Estimation with Root Sobolev Norm Regularization — Mark Kozdoba (Technion), Binyamin Perets (Technion - Israel Institute of Technology), Shie Mannor (Technion)
- Adagrad Promotes Diffuse Solutions In Overparameterized Regimes — Andrew Rambidis (McGill University), Jiayi Wang (McGill University)
- MSL: An Adaptive Momentum-based Stochastic Line-search Framework — Chen Fan (University of British Columbia), Sharan Vaswani (Simon Fraser University), Christos Thrampoulidis (University of British Columbia), Mark Schmidt (University of Alberta)
- SGD batch saturation for training wide neural networks — Chaoyue Liu (University of California, San Diego), Dmitriy Drusvyatskiy (University of Washington, Seattle), Mikhail Belkin (University of California, San Diego), Damek Davis (Cornell University), Yian Ma (University of California, San Diego)
- Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem — Robin Yadav (University of British Columbia), Frederik Kunstner (University of British Columbia), Mark Schmidt (University of Alberta), Alberto Bietti (Flatiron Institute)
- DynaLay: An Introspective Approach to Dynamic Layer Selection for Deep Networks — Mrinal Mathur (Georgia State University), Sergey Plis (Georgia State University)
- How to Guess a Gradient — Utkarsh Singhal (University of California Berkeley), Brian Cheung (Massachusetts Institute of Technology), Kartik Chandra (Massachusetts Institute of Technology), Jonathan Ragan-Kelley (Massachusetts Institute of Technology), Joshua B. Tenenbaum (Massachusetts Institute of Technology), Tomaso A Poggio (Massachusetts Institute of Technology), Stella X. Yu (University of Michigan - Ann Arbor)
- Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach — Adriano Fazzone (CENTAI Institute), Yanhao Wang (East China Normal University), Francesco Bonchi (CENTAI Institute)
- Global CFR: Meta-Learning in Self-Play Regret Minimization — David Sychrovský (Charles University Prague), Michal Sustr (Czech Technical University), Michael Bowling (Department of Computing Science, University of Alberta), Martin Schmid (Google)
- (Un)certainty selection methods for Active Learning on Label Distributions — James Spann (University of Rochester), Pratik Sanjay Bongale (Rochester Institute of Technology), Christopher M Homan (Rochester Institute of Technology)
- Cup Curriculum: Curriculum Learning on Model Capacity — Luca Scharr (Rheinische Friedrich-Wilhelms Universität Bonn), Vanessa Toborek (Rheinische Friedrich-Wilhelms Universität Bonn)
- Optimal Transport for Kernel Gaussian Mixture Models — Jung Hun Oh (Memorial Sloan Kettering Cancer Center), Rena Elkin (Memorial Sloan Kettering Cancer Center), Anish Kumar Simhal (Memorial Sloan Kettering Cancer Center), Jiening Zhu (Stony Brook University), Joseph O Deasy (Memorial Sloan Kettering Cancer Center), Allen Tannenbaum (Stony Brook University)
- On the Convergence of Local SGD Under Third-Order Smoothness and Hessian Similarity — Ali Zindari (CISPA Helmholtz Center for Information Security), Ruichen Luo (University of Hong Kong), Sebastian U Stich (CISPA Helmholtz Center for Information Security)
- Testing Approximate Stationarity Concepts for Piecewise Affine Functions and Extensions — Lai Tian (The Chinese University of Hong Kong), Anthony Man-Cho So (The Chinese University of Hong Kong)
- Generalisable Agents for Neural Network Optimisation — Kale-ab Tessera (University of Edinburgh), Callum Rhys Tilbury (InstaDeep), Sasha Abramowitz (InstaDeep), Ruan John de Kock (InstaDeep), Omayma Mahjoub (InstaDeep), Benjamin Rosman (University of the Witwatersrand), Sara Hooker (Cohere For AI), Arnu Pretorius (InstaDeep)
- Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation — Krishna C Kalagarla (University of Southern California), Rahul Jain (University of Southern California), Pierluigi Nuzzo (University of Southern California)
- An alternative approach to train neural networks using monotone variational inequality — Chen Xu (Georgia Institute of Technology), Xiuyuan Cheng (Duke University), Yao Xie (Georgia Institute of Technology)
- Nesterov Meets Robust Multitask Learning Twice — Yifan Kang (Clemson University), Kai Liu (Clemson University)
- Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches — Michal Derezinski (University of Michigan - Ann Arbor)
- Adam through a Second-Order Lens — Ross M Clarke (University of Cambridge), Baiyu Su (University of Cambridge), José Miguel Hernández-Lobato (University of Cambridge)
- Large-scale Non-convex Stochastic Constrained Distributionally Robust Optimization — Qi Zhang (State University of New York at Buffalo), Yi Zhou (University of Utah), Ashley Prater-Bennette (Air Force Research Laboratory), Lixin Shen (Syracuse University), Shaofeng Zou (State University of New York, Buffalo)
- On the convergence of warped proximal iterations for solving nonmonotone inclusions and applications — Dimitri Papadimitriou (Math. and Computational Optimization Inst.), Bang Công Vu
- Multi-head CLIP: Improving CLIP with Diverse Representations and Flat Minima — Mo Zhou (Duke University), Xiong Zhou (Amazon), Li Erran Li (Amazon), Stefano Ermon (Stanford University), Rong Ge (Duke University)
- New Horizons in Parameter Regularization: A Constraint Approach — Jörg K.H. Franke (Universität Freiburg), Michael Hefenbrock (RevoAI), Gregor Koehler (German Cancer Research Center (DKFZ)), Frank Hutter (University of Freiburg & Bosch)
- Variance Reduced Model Based Methods: New rates and adaptive step sizes — Robert M. Gower (Flatiron Institute), Frederik Kunstner (University of British Columbia), Mark Schmidt (University of Alberta)
- Learning Multiobjective Program Through Online Learning — Chaosheng Dong (University of Pittsburgh), Yijia Wang (University of Pittsburgh), Bo Zeng (University of Pittsburgh)
- Riemannian Optimization for Euclidean Distance Geometry — Chandler Mack Smith (Tufts University), Samuel P. Lichtenberg (Tufts University), HanQin Cai (University of Central Florida), Abiy Tasissa (Tufts University)
- Oracle Efficient Algorithms for Groupwise Regret — Krishna Acharya (Georgia Institute of Technology), Eshwar Ram Arunachaleswaran (University of Pennsylvania), Juba Ziani (Georgia Institute of Technology), Aaron Roth (Amazon), Sampath Kannan (University of Pennsylvania)
- Reducing Predict and Optimize to Convex Feasibility — Saurabh kumar Mishra (Simon Fraser University), Sharan Vaswani (Simon Fraser University)
- Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness — Sruthi Gorantla (Indian Institute of Science), Eshaan Bhansali (University of Wisconsin - Madison), Amit Deshpande (Microsoft Research), Anand Louis (Indian Institute of Science)
- Efficient Learning in Polyhedral Games via Best Response Oracles — Darshan Chakrabarti (Columbia University), Gabriele Farina (Massachusetts Institute of Technology), Christian Kroer (Columbia University)
- Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning — Mohamed Elsayed (University of Alberta), A. Rupam Mahmood (University of Alberta)
- A novel analysis of gradient descent under directional smoothness — Aaron Mishkin (Computer Science Department, Stanford University), Ahmed Khaled (Princeton University), Aaron Defazio (Facebook), Robert M. Gower (Flatiron Institute)
- $f$-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization — Sina Baharlouei (University of Southern California), Shivam Patel (Indian Institute of Technology Bombay), Meisam Razaviyayn (University of Southern California)
- Follow the flow: Proximal flow inspired multi-step methods — Yushen Huang (State University of New York at Stony Brook), Yifan Sun (State University of New York, Stony Brook)
- Noise Stability Optimization For Flat Minima With Tight Rates — Haotian Ju (Northeastern University), Dongyue Li (Northeastern University), Hongyang R Zhang (Computer Science, Northeastern University)
- Surrogate Minimization: An Optimization Algorithm for Training Large Neural Networks with Model Parallelism — Reza Asad (Simon Fraser University), Reza Babanezhad Harikandeh (Samsung), Issam H. Laradji (ServiceNow), Nicolas Le Roux (Microsoft), Sharan Vaswani (Simon Fraser University)
- Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm — Peiyuan Zhang (Yale University), Jingzhao Zhang (Tsinghua University), Suvrit Sra (Massachusetts Institute of Technology)
- Diversity-adjusted adaptive step size — Parham Yazdkhasti (CISPA Helmholtz Center for Information Security), Xiaowen Jiang (CISPA Helmholtz Center for Information Security), Sebastian U Stich (CISPA Helmholtz Center for Information Security)
- Stochastic FISTA Step Search Algorithm for Convex Optimization — Trang H. Tran (Cornell University), Lam M. Nguyen (IBM Research, Thomas J. Watson Research Center), Katya Scheinberg (Cornell University)
- From 6235149080811616882909238708 to 29: Vanilla Thompson Sampling Revisited — Bingshan Hu (University of Alberta), Tianyue H. Zhang (Mila, Université de Montréal)
- Almost multisecant BFGS quasi-Newton method — Mokhwa Lee (State University of New York, Stony Brook), Yifan Sun (State University of New York, Stony Brook)
- Online Covariance Matrix Estimation in Stochastic Inexact Newton Methods — Wei Kuang (University of Chicago), Sen Na (University of California, Berkeley), Mihai Anitescu (University of Chicago)
- Accelerated gradient descent: A guaranteed bound for a heuristic restart strategy — Walaa Moursi (University of Waterloo), Stephen A. Vavasis (University of Waterloo), Viktor Pavlovic (University of Waterloo)
- Noise-adaptive (Accelerated) Stochastic Heavy-Ball Momentum — Anh Quang Dang (Computing Science, Simon Fraser University), Reza Babanezhad Harikandeh (Samsung), Sharan Vaswani (Simon Fraser University)
- The Expressive Power of Low-Rank Adaptation — Yuchen Zeng (University of Wisconsin, Madison), Kangwook Lee (University of Wisconsin, Madison)
- On the Interplay Between Stepsize Tuning and Progressive Sharpening — Vincent Roulet (Google), Atish Agarwala (Google), Fabian Pedregosa (Google)
- Greedy Newton: Newton's Method with Exact Line Search — Betty Shea (University of British Columbia), Mark Schmidt (University of Alberta)
- Decentralized Learning Dynamics in the Gossip Model — John Lazarsfeld (Yale University), Dan Alistarh (Institute of Science and Technology Austria)
- Pruning Neural Networks with Velocity-Constrained Optimization — Donghyun Oh (Pohang University of Science and Technology), Jinseok Chung (Pohang University of Science and Technology), Namhoon Lee (Pohang University of Science and Technology)
- Federated Learning with Convex Global and Local Constraints — Chuan He (University of Minnesota - Twin Cities), Le Peng (University of Minnesota, Minneapolis), Ju Sun (University of Minnesota, Twin Cities)
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression — Xuheng Li (University of California, Los Angeles), Yihe Deng (University of California, Los Angeles), Jingfeng Wu (University of California, Berkeley), Dongruo Zhou (Indiana University), Quanquan Gu (University of California, Los Angeles)
- Contrastive Predict-and-Search for Mixed Integer Linear Programs — Taoan Huang (University of Southern California), Aaron M Ferber (University of Southern California), Arman Zharmagambetov (Meta AI (FAIR)), Yuandong Tian (Meta AI (FAIR)), Bistra Dilkina (University of Southern California)
- Feature Selection in Generalized Linear models via the Lasso: To Scale or Not to Scale? — Anant Mathur (University of New South Wales), Sarat Babu Moka (University of New South Wales), Zdravko Botev (University of New South Wales)
- Noise Injection Irons Out Local Minima and Saddle Points — Konstantin Mishchenko (Samsung), Sebastian U Stich (CISPA Helmholtz Center for Information Security)
- Understanding the Role of Optimization in Double Descent — Chris Yuhao Liu (University of California, Santa Cruz), Jeffrey Flanigan (University of California, Santa Cruz)
- On the Synergy Between Label Noise and Learning Rate Annealing in Neural Network Training — Stanley Wei (Princeton University), Tongzheng Ren (University of Texas, Austin), Simon Shaolei Du (University of Washington)
- Bandit-Driven Batch Selection for Robust Learning under Label Noise — Michal Lisicki (University of Guelph), Mihai Nica (University of Guelph), Graham W. Taylor (University of Guelph)
- DIRECT Optimisation with Bayesian Insights: Assessing Reliability Under Fixed Computational Budgets — Fu Wang (University of Exeter), Zeyu Fu (University of Exeter), Xiaowei Huang (University of Liverpool), Wenjie Ruan (University of Exeter)
- Continually Adapting Optimizers Improve Meta-Generalization — Wenyi Wang (King Abdullah University of Science and Technology), Louis Kirsch (IDSIA), Francesco Faccio (The Swiss AI Lab IDSIA), Mingchen Zhuge (King Abdullah University of Science and Technology), Jürgen Schmidhuber (King Abdullah University of Science and Technology)
- Parameter-Agnostic Optimization under Relaxed Smoothness — Florian Hübler (ETH Zurich), Junchi YANG (ETHZ - ETH Zurich), Xiang Li (ETHZ - ETH Zurich), Niao He (Swiss Federal Institute of Technology)
- Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle — Dániel Rácz (Institute for Computer Science and Control, Eötvös Loránd Research Network), Mihaly Petreczky (CNRS), Balint Daroczy (Université catholique de Louvain), András Csertán (Eötvös Loránd University)