We welcome you to participate in the 7th NIPS Workshop on Optimization for Machine Learning, to be held at:
Montreal, Quebec, Canada, Dec 12th, 2014
Room 513e,f on Level 5
|8:30-9:30||Amir Beck||On the Minimization over Sparse Symmetric Sets||[abstract] [slides]||[video]|
We consider the problem of minimizing a smooth function over the intersection of a closed convex set and a sparsity constraint. We begin by showing how the orthogonal projection operator can be computed efficiently when the underlying convex set has certain symmetry properties. A hierarchy of optimality conditions for the problem is then established, and the theoretical superiority of coordinate-wise type conditions is demonstrated. Finally, based on the derived optimality conditions, we present several coordinate descent type algorithms for solving the sparsity constrained problem, and illustrate their performance on several numerical examples. This is joint work with Nadav Hallak.
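As a concrete illustration (our own, not from the talk): in the simplest symmetric case, where the convex set is all of R^n, the projection onto the sparsity constraint {z : ||z||_0 ≤ s} is hard thresholding, and when the set is the nonnegative orthant the projection clips at zero first. A minimal NumPy sketch, with function names of our own choosing:

```python
import numpy as np

def project_sparse(x, s):
    """Euclidean projection onto {z : ||z||_0 <= s}:
    keep the s largest-magnitude entries, zero the rest (hard thresholding)."""
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]  # indices of the s largest |x_i|
    z[keep] = x[keep]
    return z

def project_sparse_nonneg(x, s):
    """Euclidean projection onto {z >= 0 : ||z||_0 <= s}:
    clip at zero first, then keep the s largest entries."""
    return project_sparse(np.maximum(x, 0.0), s)
```

For general symmetric convex sets the talk's point is that the projection similarly decomposes into a support-selection step plus a projection onto the convex set restricted to that support; the sketch above only covers the two canonical cases.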
|9:30-11:30||POSTER SESSION (including coffee break 10:00 - 10:30)|
|11:30-12:30||Jean Bernard Lasserre||The moment-LP and moment-SOS approaches in polynomial optimization||[abstract] [slides]||[video]|
For a certain class of optimization problems with positivity constraints, powerful positivity certificates from Real Algebraic Geometry allow one to express difficult positivity constraints in a way that can be exploited for efficient numerical computation. One defines a hierarchy of convex relaxations which provides a monotone sequence of bounds that eventually tends to the solution of the original unrelaxed problem. We discuss the relative merits and drawbacks of both hierarchies of LP- and SDP-relaxations and their impact not only in optimization (and particularly combinatorial optimization) but also in many areas for solving instances of the so-called Generalized Problem of Moments (GMP) with polynomial data (of which Global Polynomial Optimization is in fact the simplest instance). (link to full abstract)
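For reference, and in standard notation rather than the talk's own, the moment relaxation of order d for global polynomial minimization f* = min{p(x) : x ∈ K}, with K = {x : g_j(x) ≥ 0, j = 1, …, m}, can be sketched as follows (in this minimization form the bounds approach f* from below; the maximization form of the GMP yields a monotone sequence from above):

```latex
% Order-d moment relaxation (Lasserre hierarchy), an SDP:
% y = (y_\alpha) is a truncated pseudo-moment sequence, L_y the Riesz
% functional, M_d(y) the moment matrix, M_{d-d_j}(g_j\,y) the localizing
% matrices, with d_j = \lceil \deg g_j / 2 \rceil.
\rho_d \;=\; \min_{y}\; L_y(p)
\quad \text{s.t.} \quad
y_0 = 1,\qquad
M_d(y) \succeq 0,\qquad
M_{d-d_j}(g_j\, y) \succeq 0,\;\; j = 1,\dots,m,
```

with ρ_d monotonically nondecreasing in d and, under an Archimedean condition on K, ρ_d → f* as d → ∞; the LP hierarchy discussed in the talk replaces the semidefinite conditions with linear ones.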
|12:30-15:00||lunch break (+ Vowpal Wabbit tutorial by John Langford, 1:30-2:45pm, same room but independent of OPT)||[video]|
|15:00-15:20||Ohad Shamir||A Stochastic PCA Algorithm with an Exponential Convergence Rate||[slides]||[video]|
|15:20-15:40||Mark Schmidt, Ann Clifton and Anoop Sarkar||Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields||[slides]||[video]|
|15:40-16:00||Amir Ali Ahmadi, Dmitry Malioutov and Ronny Luss||Robust minimum volume ellipsoids and higher-order polynomial level sets||[slides]||[video]|
|16:00-17:00||POSTER SESSION (and coffee break)|
|17:00-18:00||Yoshua Bengio||Optimization challenges for deep learning||[abstract] [slides]||[video]|
Optimizing deep and/or recurrent networks is a fundamentally hard challenge because deep representations are obtained as the composition of many non-linearities, making the objective function a highly non-linear function of the parameters, which in turn makes it difficult for standard gradient-based methods to find good solutions. For deeper networks trying to learn tasks that involve the composition of subtasks, gradient-based optimization can get stuck in very poor solutions, even though we know that very good solutions exist. A benchmark task illustrating this problem is presented, and we then ask why it occurs. We present results suggesting that the main culprit is not local minima but rather saddle points surrounded by flat regions of the landscape. Finally, we discuss potential solutions to this challenge, either based on new gradient-based optimization approaches aimed at dealing with saddle points, or by completely abandoning gradients and exploring approaches that generalize gradient propagation to target propagation, with the potential to handle the optimization of highly non-linear functions which may even involve discrete computation (e.g. neurons with discrete outputs).
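A minimal numerical sketch (our own toy example, not from the talk) of why flat saddle regions slow gradient methods down: on f(x, y) = x² − y², which has a saddle at the origin, an iterate started almost on the stable manifold sees its gradient collapse toward zero near the saddle before finally escaping along the −y² direction:

```python
import numpy as np

# f(x, y) = x**2 - y**2: a minimum along x, a maximum along y,
# hence a saddle point at the origin.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1.0, 1e-6])   # start almost on the stable manifold y = 0
eta = 0.05                  # step size
grad_norms, values = [], []
for _ in range(200):
    grad_norms.append(np.linalg.norm(grad(p)))
    values.append(p[0] ** 2 - p[1] ** 2)
    p = p - eta * grad(p)

# Near the saddle the gradient norm is tiny (progress stalls, f ~ 0),
# yet the saddle is not a minimum: the iterate eventually escapes and
# f drops well below the saddle value f(0, 0) = 0.
print(min(grad_norms), values[-1])
```

In high-dimensional networks the escape directions can be far harder to find than in this 2-D toy, which is why the talk considers optimization methods designed specifically around saddle points.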