TURING OPTIMIZATION CLUB

Upcoming meeting:
TBA

Past and Upcoming Events

Feb 10, 2020, 11:30
Estelle Massart, University of Oxford. Title: A quotient geometry on the manifold of fixed-rank positive-semidefinite matrices

Riemannian optimization aims to design optimization algorithms for constrained problems in which the constraints require the variables to lie on a Riemannian manifold. Classical examples of Riemannian manifolds include the set of orthogonal matrices, the set of subspaces of a given dimension (the Grassmann manifold), and the set of fixed-rank matrices.

After a quick introduction to Riemannian optimization, and more specifically Riemannian gradient descent (RGD), we will present the tools needed to run RGD on the manifold of fixed-rank positive-semidefinite matrices, seen as a quotient of the set of full-rank rectangular matrices by the orthogonal group. We will also present recent results about related geometrical tools on that manifold. This manifold is particularly relevant when dealing with low-rank approximations of large positive-(semi)definite matrices.
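
As a toy illustration of the quotient viewpoint (a minimal sketch, not the speaker's code): writing X = Y Y^T with Y of full column rank, where Y is only defined up to an orthogonal factor Y -> YQ, plain gradient descent in the factor Y corresponds to Riemannian gradient descent on the quotient manifold under the Euclidean metric on the total space. The test problem, dimensions and step size below are invented for the example.

    import numpy as np

    # Illustrative only: low-rank PSD approximation min_X ||X - A||_F^2 with X = Y Y^T.
    rng = np.random.default_rng(0)
    n, k = 50, 3

    B = rng.standard_normal((n, k))
    A = B @ B.T + 1e-3 * np.eye(n)            # target PSD matrix, approximately rank k

    Y = rng.standard_normal((n, k))           # representative of a point [Y] on the quotient
    eta = 1e-4                                # conservative fixed step size (assumed)

    for _ in range(5000):
        G = 4 * (Y @ Y.T - A) @ Y             # horizontal lift of the Riemannian gradient
        Y = Y - eta * G                       # retraction: move in the (flat) total space

    print("relative error:", np.linalg.norm(Y @ Y.T - A) / np.linalg.norm(A))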

Nov 11, 2019, 12:30
Shenglong Zhou, University of Southampton. Title: Bilevel Optimisation with Applications to Hyper-parameter Tuning in Machine Learning

Hyper-parameters frequently appear in methods such as Support Vector Machines (SVM), kernel approaches and regularised statistical learning, yet their importance is often overlooked in practice. The most commonly used and widely accepted method for selecting these hyper-parameters (also known as model selection) is cross validation, which becomes inefficient and expensive when the number or dimension of the hyper-parameters is relatively large. Interestingly, when the hyper-parameters are treated as variables, the idea of cross validation allows us to cast model selection as a bilevel optimisation problem. Bilevel optimisation has generated a rich body of theoretical results over the past several decades, but few numerical tools that can be used in practice. In this talk, we will briefly introduce three ways to solve a general bilevel optimisation problem via semi-smooth Newton-type methods, and then apply them to selecting hyper-parameters in Lasso and SVM problems.
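
A toy illustration of the bilevel structure of hyper-parameter selection by validation (ridge regression with a single regularisation parameter; the outer problem is solved here by a crude grid search rather than the semi-smooth Newton methods of the talk; all data and names are made up for the example):

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 80, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    X_tr, y_tr = X[:60], y[:60]          # training split: defines the lower-level problem
    X_va, y_va = X[60:], y[60:]          # validation split: defines the upper-level problem

    def inner_solution(lam):
        # Lower level: w(lam) = argmin_w ||X_tr w - y_tr||^2 + lam ||w||^2 (closed form)
        return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

    def outer_objective(lam):
        # Upper level: validation loss evaluated at the lower-level solution
        w = inner_solution(lam)
        return np.mean((X_va @ w - y_va) ** 2)

    lams = np.logspace(-4, 2, 50)
    best = min(lams, key=outer_objective)     # crude outer solve by grid search
    print("selected regularisation parameter:", best)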

Mar 12, 2019, 10:30
Dominic Richards, University of Oxford. Title: Optimal Statistical Rates for Non-parametric Decentralised Regression with Distributed Gradient Descent

Due to bandwidth limitations, privacy concerns or network instability, it is often necessary to fit statistical models on data sets stored across multiple computers without a central server to coordinate computation and disseminate information (i.e., without a star topology). This has motivated the investigation of decentralised methods, which solve the problem more robustly by not relying on a single computer. In this talk we investigate the statistical performance of a simple synchronous decentralised iterative method, Distributed Gradient Descent, in the homogeneous distributed non-parametric regression setting, i.e., when the computers hold samples from the same distribution. By utilising the concentration of quantities held by individual computers, we show that there are a number of settings in which computers can save on computational and communication costs without any loss of statistical accuracy. Provided the computers hold sufficiently many samples with respect to the network topology, we show that Distributed Gradient Descent achieves optimal statistical rates with the same number of iterations as the centralised algorithm. (Joint work with P. Rebeschini.)
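
A minimal numerical sketch of the kind of method analysed in the talk: synchronous Distributed Gradient Descent on a ring of computers, each holding local least-squares data drawn from the same distribution. The topology, step size and data are invented for the example, and with a constant step size the iterates only reach a neighbourhood of the joint solution.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n_local, d = 8, 30, 5                  # computers, samples per computer, dimension
    w_true = rng.standard_normal(d)

    # Homogeneous setting: every computer's samples come from the same distribution.
    Xs = [rng.standard_normal((n_local, d)) for _ in range(m)]
    ys = [Xi @ w_true + 0.1 * rng.standard_normal(n_local) for Xi in Xs]

    # Doubly stochastic gossip matrix for a ring: average with both neighbours.
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 0.5
        W[i, (i - 1) % m] = 0.25
        W[i, (i + 1) % m] = 0.25

    w = np.zeros((m, d))                      # one local iterate per computer
    eta = 0.05
    for _ in range(2000):
        grads = np.stack([Xs[i].T @ (Xs[i] @ w[i] - ys[i]) / n_local for i in range(m)])
        w = W @ w - eta * grads               # one round of gossip + one local gradient step

    print("max error across computers:", np.max(np.linalg.norm(w - w_true, axis=1)))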

Feb 12, 2019, 10:30
Erlend S. Riis, University of Cambridge. Title: A geometric integration approach to nonsmooth, nonconvex optimisation

Discrete gradient methods are popular numerical methods from geometric integration for solving systems of ODEs. They are well known for preserving structures of the continuous system, such as energy dissipation or conservation. The preservation of dissipation makes discrete gradient methods interesting for optimisation problems. In this talk, we consider a derivative-free discrete gradient applied to dissipative ODEs such as gradient flow, thereby obtaining optimisation schemes that are implementable in a black-box setting while retaining favourable properties of gradient flow. We give a theoretical analysis of these schemes in the nonsmooth, nonconvex setting, and conclude with numerical results for the bilevel optimisation of regularisation parameters in image processing.
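
For reference, a discrete gradient of V is any map \bar{\nabla} V satisfying the mean-value identity below; applying it to the gradient flow x' = -\nabla V(x) gives an (implicit) scheme that inherits the dissipation of the flow exactly. This is standard background rather than the talk's specific derivative-free construction.

    V(y) - V(x) = \bar{\nabla} V(x, y)^\top (y - x) \quad \text{for all } x, y,
    x_{k+1} = x_k - \tau_k \, \bar{\nabla} V(x_k, x_{k+1}),
    V(x_{k+1}) - V(x_k) = -\tau_k \, \|\bar{\nabla} V(x_k, x_{k+1})\|^2 \le 0.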

Jan 29, 2019, 10:30
Giuseppe Ughi, University of Oxford. Title: Derivative Free Optimisation for Generating Adversarial Examples

Neural network algorithms have achieved unprecedented performance in image recognition over the past decade. However, their deployment in real-world applications, such as self-driving cars, raises the question of whether it is safe to rely on them.

We generally associate the robustness of these algorithms with how easy it is to generate an adversarial example: a tiny perturbation of an image that causes it to be misclassified by a neural network which classifies the original image correctly. In this talk, we consider the generation of adversarial examples when the architecture of the target neural network is unknown, which leads us to solve the problem via derivative-free optimisation methods. We will introduce the generation of adversarial images through a model-based method and compare the results with those achieved in previous years.
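
A toy sketch of a black-box (derivative-free) attack in the spirit of the talk: simple random search under an l_inf budget against a fixed linear "classifier" standing in for an unknown neural network. The model, budget and search scheme are invented for the example and are not the model-based method of the talk.

    import numpy as np

    rng = np.random.default_rng(3)
    d, n_classes = 28 * 28, 10
    W = 0.01 * rng.standard_normal((n_classes, d))    # stand-in black-box model

    def scores(x):
        return W @ x                                   # only query access is assumed

    x0 = rng.random(d)                                 # "image" with pixels in [0, 1]
    label = int(np.argmax(scores(x0)))
    eps = 0.05                                         # l_inf perturbation budget

    def margin(x):
        s = scores(x)
        return s[label] - np.max(np.delete(s, label))  # negative => misclassified

    x = x0.copy()
    for _ in range(3000):
        idx = rng.integers(0, d, size=20)              # perturb a random block of pixels
        cand = x.copy()
        cand[idx] = np.clip(x0[idx] + eps * rng.choice([-1.0, 1.0], size=idx.size), 0.0, 1.0)
        if margin(cand) < margin(x):                   # greedy accept, using queries only
            x = cand
        if margin(x) < 0:
            break

    print("adversarial example found:", margin(x) < 0,
          "| l_inf distortion:", round(float(np.max(np.abs(x - x0))), 3))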

Nov 21, 2018, 10:30
Amartya Sanyal, University of Oxford and The Alan Turing Institute. Title: Intriguing Properties of Learned Representations

A key feature of neural networks, particularly deep convolutional neural networks, is their ability to learn useful representations from data. The very last layer of a neural network is then simply a linear model trained on these learned representations. Despite their numerous applications in other tasks such as classification, retrieval and clustering (i.e., transfer learning), little work has been published that investigates the structure of these representations, or indeed whether structure can be imposed on them during the training process.

In this paper, we study the effective dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that, when trained on CIFAR10 or CIFAR100, the learned representations exhibit a fairly low-rank structure. We propose a modification to the training procedure that further encourages low-rank structure in the learned activations. Empirically, we show that this has implications for robustness to adversarial examples and for compression.
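
A sketch of the kind of diagnostic the abstract refers to: measuring the effective dimensionality of a batch of representations through their singular-value spectrum. A synthetic low-rank-plus-noise matrix stands in for actual ResNet/VGG activations, and the 99% energy threshold is an arbitrary choice for the example.

    import numpy as np

    rng = np.random.default_rng(4)
    n_samples, feat_dim, true_rank = 1000, 512, 20

    # Stand-in for a matrix of learned representations (rows = examples).
    Z = rng.standard_normal((n_samples, true_rank)) @ rng.standard_normal((true_rank, feat_dim))
    Z += 0.05 * rng.standard_normal((n_samples, feat_dim))     # small full-rank noise

    s = np.linalg.svd(Z - Z.mean(axis=0), compute_uv=False)    # singular values
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    eff_dim = int(np.searchsorted(energy, 0.99)) + 1           # ranks needed for 99% energy

    print(f"feature dimension: {feat_dim}, effective dimension (99% energy): {eff_dim}")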

Nov 7, 2018, 10:30
Stéphane Chrétien, National Physical Laboratory. Title: Who's afraid of the incremental gradient method?

Jun 25, 2018, 14:00
Stephen Wright, University of Wisconsin-Madison. Title: Practical Nonconvex Optimization Algorithms with Complexity Guarantees

Several widely used paradigms in data analysis can be formulated as smooth nonconvex optimization problems with many variables. Classical algorithms for this problem include L-BFGS, nonlinear conjugate gradient, and Newton-conjugate gradient ("Newton-CG"). Alternative approaches have been proposed more recently, motivated more by favorable worst-case complexity guarantees than by practical performance. After surveying the data science context, we present a variant of Newton-CG for which worst-case complexity guarantees can be proved. We present computational results for a variety of algorithms on several applications from data science, including robust statistics and matrix and tensor completion. This talk is based in part on joint research with Clement Royer and Michael O'Neill.
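
A minimal sketch of the classical Newton-CG idea (inexact Newton directions from truncated conjugate gradients, a simple safeguard when negative curvature is met, and an Armijo backtracking line search), applied to the two-dimensional Rosenbrock function. This is the textbook variant, not the complexity-guaranteed algorithm of the talk.

    import numpy as np

    def f(x):
        return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

    def grad(x):
        return np.array([-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
                         200.0 * (x[1] - x[0] ** 2)])

    def hess(x):
        return np.array([[1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                         [-400.0 * x[0], 200.0]])

    def newton_cg_direction(H, g, tol=1e-8, max_iter=20):
        # Truncated CG on H d = -g; bail out if (near-)negative curvature appears.
        d, r = np.zeros_like(g), g.copy()
        p = -r
        for _ in range(max_iter):
            Hp = H @ p
            curv = p @ Hp
            if curv <= 1e-12 * (p @ p):
                return d if np.any(d) else -g      # fall back to steepest descent
            alpha = (r @ r) / curv
            d, r_new = d + alpha * p, r + alpha * Hp
            if np.linalg.norm(r_new) < tol:
                return d
            p = -r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return d

    x = np.array([-1.2, 1.0])
    for k in range(100):
        g = grad(x)
        if np.linalg.norm(g) < 1e-8:
            break
        d = newton_cg_direction(hess(x), g)
        t = 1.0
        while t > 1e-12 and f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5                               # Armijo backtracking
        x = x + t * d

    print("minimiser found:", x, "after", k, "iterations")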

Mar 21, 2018, 11:00
Jingwei Liang, University of Cambridge. Title: Identifiability of Forward-Backward-type Splitting Methods

The Forward-Backward splitting method, a.k.a. proximal gradient descent, is one of the most widely used algorithms in the literature. Over the decades, many variants of the method have been proposed for different application scenarios, such as the accelerated FISTA scheme for image processing and the stochastic Forward-Backward method for learning tasks. In this talk, I will discuss the local properties of the class of Forward-Backward splitting methods and show that, under mild conditions, these schemes can identify the local geometry associated with the solution. The benefits of this local geometry will be highlighted through computational complexity and first-/higher-order acceleration.
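
A minimal sketch of Forward-Backward splitting (ISTA) on a Lasso problem, illustrating the identification phenomenon the talk refers to: after finitely many iterations the support of the iterates stops changing. Problem sizes and the regularisation level are invented for the example.

    import numpy as np

    rng = np.random.default_rng(5)
    n, d, k = 60, 100, 5
    A = rng.standard_normal((n, d)) / np.sqrt(n)
    x_true = np.zeros(d)
    x_true[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
    b = A @ x_true + 0.01 * rng.standard_normal(n)

    lam = 0.1 * np.max(np.abs(A.T @ b))         # regularisation level (assumed)
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2     # step size = 1 / Lipschitz constant

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    x, supports = np.zeros(d), []
    for _ in range(2000):
        x = soft_threshold(x - gamma * A.T @ (A @ x - b), gamma * lam)   # FB step
        supports.append(frozenset(np.flatnonzero(np.abs(x) > 1e-10)))

    changes = [i for i in range(1, len(supports)) if supports[i] != supports[i - 1]]
    print("last support change at iteration:", changes[-1] if changes else 0,
          "| final support size:", len(supports[-1]))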

Mar 6, 2018, 12:00
Coralia Cartis, University of Oxford and The Alan Turing Institute. Title: Complexity of Nonconvex Optimization: The Story So Far

Establishing the global rate of convergence of standard algorithms, or their global evaluation complexity, for nonconvex smooth optimization problems is a natural but challenging aspect of algorithm analysis. In the last decade, substantial progress has been made, and continues to be made, in this direction. We review some of the key developments, illustrating the crucial role played by cubic regularisation methods, a credible alternative to linesearch and trust-region methods.

We then focus on two recent results. Firstly, we consider a general/new class of adaptive regularization methods that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size. We investigate the worst-case evaluation complexity of these algorithms when the level of sufficient smoothness of the objective and its derivatives may be unknown or may even be absent, and we find that the methods accurately reflect in their complexity the (unknown) degree of smoothness of the objective and its derivatives, and satisfy increasingly better bounds with the order of the derivatives available. We then focus on the antipodal context, when no derivatives of the objective are available and the local models in adaptive cubic regularisation methods are constructed, for example, by sampling function values. We show that the evaluation complexity of the ensuing methods does not change, in the order of the accuracy, from that of their deterministic counterparts, increasing only by a constant factor which depends on the probability of the sampled models being occasionally sufficiently accurate.
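
For orientation, the regularised Taylor models referred to above take the generic form below (standard notation from this literature; the results described in the talk are more general):

    m_k(s) = T_p(x_k, s) + \frac{\sigma_k}{r} \|s\|^r,
    \quad T_p(x_k, s) = f(x_k) + \sum_{j=1}^{p} \frac{1}{j!} \nabla^j f(x_k)[s]^j,

with the classical adaptive cubic regularisation case corresponding to p = 2, r = 3,

    m_k(s) = f(x_k) + \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top \nabla^2 f(x_k)\, s + \tfrac{\sigma_k}{3} \|s\|^3,

for which O(\epsilon^{-3/2}) evaluations suffice to reach a point with \|\nabla f(x_k)\| \le \epsilon.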

Time permitting, we will also discuss the stochastic case when even function evaluations may be inaccurate. This work is joint with Nick Gould (Rutherford Appleton Laboratory, UK), Philippe Toint (University of Namur, Belgium) and Katya Scheinberg (Lehigh University, USA).

Nov 28, 2017, 13:00
Adilet Otemissov, University of Oxford and The Alan Turing Institute. Title: Dimensionality reduction techniques for global optimization

(Joint work with Coralia Cartis.) The problem of finding the most extreme value of a function, also known as global optimization, is a challenging task. The difficulty is associated with the exponential increase in computational time for a linear increase in the dimension, known as the "curse of dimensionality". In this talk, we demonstrate that such challenges can be overcome for functions with low effective dimensionality, i.e. functions which are constant along certain linear subspaces. Such functions often arise in applications, for example in hyper-parameter optimization for neural networks, heuristic algorithms for combinatorial optimization problems and complex engineering simulations.

We propose the use of random subspace embeddings within a(ny) global minimisation algorithm, extending the approach in Wang et al. (2013). We introduce two techniques (REGO and AREGO) that transform the high-dimensional optimization problem into a low-dimensional one. REGO is formulated by adding constraints in the low-dimensional embedded space, and AREGO by including constraints in the original space. Using results from random matrix theory and conic integral geometry, we derive probabilistic bounds on the success of the dimension-reduced problems. Finally, we present encouraging numerical results.
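
A toy sketch of the random-embedding idea (in the spirit of REGO, not the exact formulation with its added constraints): a 100-dimensional objective whose effective dimensionality is 2 is minimised by handing the reduced problem min_y f(Ay), with A a random Gaussian embedding, to an off-the-shelf global solver. The objective, dimensions, bounds and solver below are invented for the example.

    import numpy as np
    from scipy.optimize import differential_evolution

    rng = np.random.default_rng(6)
    D, d_eff, d_emb = 100, 2, 3

    B = rng.standard_normal((d_eff, D)) / np.sqrt(D)   # hidden effective subspace

    def f(x):
        # Himmelblau's function composed with B: constant along the null space of B.
        u = B @ x
        return (u[0] ** 2 + u[1] - 11.0) ** 2 + (u[0] + u[1] ** 2 - 7.0) ** 2

    A = rng.standard_normal((D, d_emb))                # random subspace embedding

    # Reduced problem: the global solver now works with 3 variables instead of 100.
    # (Success relies on the embedded subspace reaching a global minimiser, which
    # holds with high probability when d_emb >= d_eff.)
    res = differential_evolution(lambda y: f(A @ y), bounds=[(-30.0, 30.0)] * d_emb, seed=0)
    print("minimum value found in the embedded space:", res.fun)   # global minimum is 0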

Nov 14, 2017, 13:00
Rudy Bunel, University of Oxford. Title: Formal Verification of Piecewise Linear Neural Networks: A Comparative Study

The success of deep learning and its potential use in many important safety-critical applications has motivated research on formal verification of neural network (NN) models. Despite the reputation of learned NN models for behaving as black boxes, and the theoretical hardness of proving their properties, researchers have been successful in verifying some classes of models by exploiting their piecewise linear structure. This talk will present current strategies for performing formal verification of such models, as well as newly developed methods.
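
As a small, self-contained example of exploiting piecewise-linear structure, here is interval bound propagation, one of the simpler (sound but incomplete) techniques for bounding the outputs of a ReLU network over an input box; the talk's methods are not necessarily these, and the network and perturbation radius below are made up.

    import numpy as np

    rng = np.random.default_rng(8)
    # A tiny random ReLU network: 4 inputs -> 16 hidden units -> 3 outputs.
    layers = [(0.5 * rng.standard_normal((16, 4)), 0.1 * rng.standard_normal(16)),
              (0.5 * rng.standard_normal((3, 16)), 0.1 * rng.standard_normal(3))]

    def forward(x):
        for i, (W, b) in enumerate(layers):
            x = W @ x + b
            if i < len(layers) - 1:
                x = np.maximum(x, 0.0)                 # ReLU on hidden layers
        return x

    def propagate_box(lower, upper):
        # Push an axis-aligned box through each affine layer and ReLU.
        for i, (W, b) in enumerate(layers):
            centre, radius = (lower + upper) / 2, (upper - lower) / 2
            centre, radius = W @ centre + b, np.abs(W) @ radius
            lower, upper = centre - radius, centre + radius
            if i < len(layers) - 1:
                lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
        return lower, upper

    x0 = np.array([0.2, -0.1, 0.4, 0.0])
    eps = 0.01
    pred = int(np.argmax(forward(x0)))
    lo, up = propagate_box(x0 - eps, x0 + eps)
    # The prediction is certified over the whole box if the lower bound of the
    # predicted logit beats the upper bounds of all other logits.
    print("predicted class:", pred,
          "| certified over the eps-box:", bool(lo[pred] > np.max(np.delete(up, pred))))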

Oct 31, 2017, 10:00
Gianluca Detommaso, University of Bath and The Alan Turing Institute. Title: Stein variational Quasi-Newton algorithm: sampling by sequential transport of particles

In many statistical applications and real-world situations, it is of fundamental importance to be able to quantify the uncertainty associated with estimates of interest. However, whenever the underlying probability distribution is intractable or unknown, this cannot be done directly and sampling algorithms are typically needed. A recently introduced sampling algorithm is Stein variational gradient descent [Q. Liu, D. Wang, 2016], in which a cloud of particles is sequentially transported towards the target distribution. This is accomplished by a functional gradient descent which minimises the Kullback–Leibler divergence between the current distribution of the particles and the target one.

In collaboration with Dr A. Spantini (MIT) and T. Cui (Monash, AU), we are currently working on accelerating this algorithm. From a transport-maps perspective, we exploit second-order information to replace gradient descent with Quasi-Newton algorithms, with potentially large accelerations in convergence. Furthermore, we replace the simple kernel used in the original algorithm with more sophisticated ones that better represent the interaction between the particles and accelerate their spread along the support of the target distribution.
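
A minimal sketch of the original Stein variational gradient descent update (Liu & Wang, 2016) with an RBF kernel and the usual median-heuristic bandwidth, targeting a two-dimensional Gaussian; the Quasi-Newton and kernel improvements discussed in the talk are not included, and the target, step size and particle count are invented for the example.

    import numpy as np

    rng = np.random.default_rng(7)
    n_particles, d = 100, 2
    mu = np.array([1.0, -1.0])
    cov_inv = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.5]]))

    def grad_log_p(X):
        # Score of the Gaussian target, one row per particle.
        return -(X - mu) @ cov_inv

    X = 3.0 * rng.standard_normal((n_particles, d))        # initial particle cloud
    step = 0.1

    for _ in range(1000):
        diff = X[:, None, :] - X[None, :, :]               # pairwise x_i - x_j
        sq = np.sum(diff ** 2, axis=-1)
        h = np.median(sq) / np.log(n_particles + 1.0)      # median-heuristic bandwidth
        K = np.exp(-sq / h)                                # RBF kernel matrix
        # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
        phi = (K @ grad_log_p(X) + 2.0 * (diff * K[..., None]).sum(axis=1) / h) / n_particles
        X = X + step * phi                                 # transport the whole cloud

    print("particle mean:", X.mean(axis=0), "| target mean:", mu)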

Oct 18, 2017, 09:00
Pankaj Pansari, University of Oxford and The Alan Turing Institute. Title: Optimal Submodular Extensions for Marginal Estimation

In this talk, I will briefly summarize our recent work on using submodular functions for variational inference. We show a connection between submodular extensions of energy functions and LP relaxations of the MAP estimation problem. This enables us to find worst-case optimal submodular extensions for Potts and hierarchical Potts models. Importantly, it allows us to use a fast Gaussian-filtering algorithm to operationalize our approach for dense CRFs, providing the first practical method to compute an upper bound on the log-partition function for such models.
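
As a small, self-contained reminder of the key property underlying this line of work, the snippet below brute-force checks the diminishing-returns (submodularity) condition for a graph-cut function on a random weighted graph; it is purely illustrative and unrelated to the talk's filtering algorithm.

    import itertools
    import numpy as np

    rng = np.random.default_rng(9)
    n = 6
    W = np.triu(rng.random((n, n)), 1)
    W = W + W.T                                           # symmetric edge weights

    def cut(S):
        S = set(S)
        return sum(W[i, j] for i in S for j in range(n) if j not in S)

    def is_submodular(f, ground):
        # f(A + e) - f(A) >= f(B + e) - f(B) for all A subset of B and e outside B.
        ground = set(ground)
        for r_b in range(n + 1):
            for B in itertools.combinations(sorted(ground), r_b):
                B = set(B)
                for r_a in range(len(B) + 1):
                    for A in itertools.combinations(sorted(B), r_a):
                        A = set(A)
                        for e in ground - B:
                            if f(A | {e}) - f(A) < f(B | {e}) - f(B) - 1e-12:
                                return False
        return True

    print("graph-cut function is submodular:", is_submodular(cut, range(n)))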

Address

The Alan Turing Institute
96 Euston Road, London, NW1 2DB

Contacts

Email: toc_head@turing.ac.uk       Phone: +44 (0) 7378 173765

Organizers:

Coralia Cartis
Florentin Goyens
Adilet Otemissov