October 26, 2020



Determinantal Point Processes are useful point processes for subsampling. They come in two variants: fixed size and varying size, with the latter being more tractable but somewhat less practical. We show that the two variants have essentially the same marginals in large ground sets, and give numerically stable algorithms for computing inclusion probabilities.
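As a rough illustration of how the two variants relate, the sketch below computes inclusion probabilities in two ways. The first is for a fixed-size k-DPP, using elementary symmetric polynomials of the eigenvalues of a kernel L. The second is for a varying-size L-ensemble whose scale α is tuned so that its expected size equals k. The assumptions are ours: the function names are hypothetical and the kernel is arbitrary. The naive recursion used here can under- or overflow on large ground sets, which is the problem the numerically stable algorithms mentioned above address, so treat it as suitable for small examples only.

```python
import numpy as np

def elem_sym_poly(lams, k):
    """e_0, ..., e_k of the values in lams (naive recursion; unstable for large inputs)."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for lam in lams:
        for j in range(k, 0, -1):  # go downwards so each lam enters each product at most once
            e[j] += lam * e[j - 1]
    return e

def kdpp_inclusion(L, k):
    """P(i in Y) for a fixed-size k-DPP: sum_n v_n(i)^2 * lam_n * e_{k-1}(lam_{-n}) / e_k(lam)."""
    lams, V = np.linalg.eigh(L)  # columns of V are orthonormal eigenvectors
    e_k = elem_sym_poly(lams, k)[k]
    w = np.array([lams[n] * elem_sym_poly(np.delete(lams, n), k - 1)[k - 1] / e_k
                  for n in range(len(lams))])
    return (V ** 2) @ w

def lensemble_inclusion(L, alpha):
    """diag(K) for the L-ensemble with kernel alpha*L, where K = alpha L (alpha L + I)^{-1}."""
    lams, V = np.linalg.eigh(L)
    return (V ** 2) @ (alpha * lams / (1.0 + alpha * lams))

def match_expected_size(L, k, lo=1e-8, hi=1e8, n_iter=200):
    """Find alpha with expected L-ensemble size sum_n alpha*lam_n / (1 + alpha*lam_n) = k."""
    lams = np.linalg.eigvalsh(L)
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)  # bisection on a log scale
        if np.sum(mid * lams / (1.0 + mid * lams)) < k:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((30, 30))
    L = A @ A.T / 30  # arbitrary positive-definite kernel, for illustration only
    k = 5
    p_fixed = kdpp_inclusion(L, k)
    p_var = lensemble_inclusion(L, match_expected_size(L, k))
    print(p_fixed.sum(), p_var.sum())  # both sum to (approximately) k
    print(np.max(np.abs(p_fixed - p_var)))
```

Both vectors sum to k by construction. Comparing them on growing ground sets is one way to look at the agreement of the marginals described above. The sketch makes no claim about the size of the gap for any particular kernel.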


How to separate different odorants in a certain class of electronic noses.

Coresets are small weighted subsets that provide a (provably) good summary of a large dataset. We show how to use DPPs to build coresets. 
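The blurb does not spell out the construction. One common way to turn any random subset with known inclusion probabilities π_i into a weighted summary is to weight each selected point by 1/π_i (the Horvitz–Thompson estimator). This makes the weighted sum an unbiased estimate of the full-data sum. The sketch below assumes that weighting and an L-ensemble DPP with marginal kernel K = L(L + I)^{-1}. Rather than sampling, it checks unbiasedness exactly by enumerating every subset of a tiny ground set, using P(Y = S) = det(L_S) / det(L + I). The function names are hypothetical, and this is not claimed to be the paper's exact construction.

```python
import itertools
import numpy as np

def horvitz_thompson_weights(L):
    """Weights 1 / P(i in Y), with inclusion probabilities diag(L (L + I)^{-1})."""
    K = L @ np.linalg.inv(L + np.eye(len(L)))
    return 1.0 / np.diag(K)

def coreset_estimate(S, f, weights):
    """Weighted sum over the selected subset S, estimating sum_i f[i]."""
    S = list(S)  # a list, so NumPy treats it as fancy indexing rather than a multi-index
    return float(np.sum(weights[S] * f[S]))

def expected_estimate(L, f):
    """E[estimate] under P(Y = S) = det(L_S) / det(L + I), by enumerating all subsets S."""
    n = len(L)
    weights = horvitz_thompson_weights(L)
    Z = np.linalg.det(L + np.eye(n))
    total = 0.0
    for r in range(1, n + 1):  # the empty set contributes an estimate of 0
        for S in itertools.combinations(range(n), r):
            prob = np.linalg.det(L[np.ix_(S, S)]) / Z
            total += prob * coreset_estimate(S, f, weights)
    return total
```

For any positive-definite L, `expected_estimate(L, f)` should match `f.sum()` up to rounding. Enumeration is exponential in n, so this is a correctness check only. In practice one would draw a single DPP sample and use `coreset_estimate` on it.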


Assume a signal that lives on the nodes of a graph, and suppose you cannot measure the signal at every node. How do you pick nodes so as to be able to reconstruct the full signal? We suggest using Determinantal Point Processes for that task. See also (in French):
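One simple way to make this concrete assumes the signal is exactly band-limited, meaning it lies in the span of the first m eigenvectors of the graph Laplacian. Under that assumption, we draw m nodes from the projection DPP whose kernel is U Uᵀ, where U holds those eigenvectors, and then reconstruct the signal by least squares. The cycle graph, the sampler (the standard sequential algorithm for projection DPPs) and all function names are our own illustrative choices. Real signals are rarely exactly band-limited, and noisy measurements call for regularised reconstruction; this sketch ignores both.

```python
import numpy as np

def ring_laplacian(n):
    """Combinatorial Laplacian of the n-node cycle graph."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def sample_projection_dpp(U, rng):
    """Sample from the projection DPP with kernel U U^T (U: n x m, orthonormal columns)."""
    V = U.copy()
    chosen = []
    for _ in range(U.shape[1]):
        p = np.sum(V ** 2, axis=1)  # P(next node = i) is proportional to the squared row norm
        i = rng.choice(len(p), p=p / p.sum())
        chosen.append(int(i))
        # restrict span(V) to vectors vanishing at node i, then re-orthonormalise
        j = np.argmax(np.abs(V[i]))
        V = V - np.outer(V[:, j] / V[i, j], V[i])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return chosen

def reconstruct(U, nodes, values):
    """Least-squares fit of a band-limited signal (in span(U)) from its values at `nodes`."""
    coef, *_ = np.linalg.lstsq(U[nodes], values, rcond=None)
    return U @ coef

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 30, 5  # m = 5 keeps whole eigenspaces of the ring (its nonzero eigenvalues come in pairs)
    _, U_all = np.linalg.eigh(ring_laplacian(n))
    U = U_all[:, :m]
    signal = U @ rng.standard_normal(m)
    nodes = sample_projection_dpp(U, rng)
    print(sorted(nodes), np.max(np.abs(reconstruct(U, nodes, signal[nodes]) - signal)))
```

A projection DPP always returns m distinct nodes and never selects a set on which U is singular. In this noise-free, exactly band-limited setting the least-squares reconstruction therefore recovers the signal up to rounding.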

EP (expectation propagation) is a popular method for variational inference that can be remarkably effective, despite the fact that there’s very little theory supporting it. Our main contribution is to show that EP is asymptotically exact, meaning that as the number of datapoints goes to infinity, you’re guaranteed to recover the exact Gaussian posterior. This turns out to be quite hard to prove, and we introduce some new theoretical tools that help in analysing EP formally, including a simpler algorithm that’s asymptotically equivalent to EP (aEP).
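To make the object of study concrete, here is a minimal one-dimensional EP sketch. It is our own toy setup, not code from the paper: a scalar parameter θ with a Gaussian prior and logistic likelihood terms, where the moments of each tilted distribution are computed by quadrature on a fixed grid. Comparing its output with the exact posterior moments (also computed on the grid) for increasing n is one informal way to watch the approximation improve. The model, prior variance, grid and function names are all assumptions.

```python
import numpy as np

GRID = np.linspace(-15.0, 15.0, 6001)  # quadrature grid for the scalar parameter theta

def log_sigmoid(z):
    return -np.logaddexp(0.0, -z)

def grid_moments(logp):
    """Mean and variance of the density proportional to exp(logp) on GRID."""
    w = np.exp(logp - logp.max())
    w /= w.sum()
    m = np.sum(w * GRID)
    return m, np.sum(w * (GRID - m) ** 2)

def ep_logistic_1d(x, y, prior_var=10.0, n_sweeps=10):
    """EP for theta ~ N(0, prior_var) with sites sigmoid(y_i * x_i * theta), y_i in {-1, +1}.

    Gaussian factors are kept in natural parameters: precision tau, precision*mean nu.
    """
    n = len(x)
    tau_site, nu_site = np.zeros(n), np.zeros(n)
    tau, nu = 1.0 / prior_var, 0.0  # the global approximation starts at the prior
    for _ in range(n_sweeps):
        for i in range(n):
            # cavity: remove site i (tau_cav stays positive here because logistic sites are log-concave)
            tau_cav, nu_cav = tau - tau_site[i], nu - nu_site[i]
            m_cav = nu_cav / tau_cav
            # tilted distribution: cavity times the exact likelihood term
            logp = -0.5 * tau_cav * (GRID - m_cav) ** 2 + log_sigmoid(y[i] * x[i] * GRID)
            m, v = grid_moments(logp)
            # moment matching gives the new global Gaussian; the updated site is the difference
            tau_site[i], nu_site[i] = 1.0 / v - tau_cav, m / v - nu_cav
            tau, nu = tau_cav + tau_site[i], nu_cav + nu_site[i]
    return nu / tau, 1.0 / tau  # EP posterior mean and variance

def exact_moments(x, y, prior_var=10.0):
    """Exact posterior mean and variance by quadrature on the same grid."""
    logp = -0.5 * GRID ** 2 / prior_var
    for xi, yi in zip(x, y):
        logp = logp + log_sigmoid(yi * xi * GRID)
    return grid_moments(logp)
```

A one-dimensional grid makes every step exact up to quadrature error, which isolates the EP approximation itself. In higher dimensions the tilted moments would need another method, for example Laplace approximations or Monte Carlo.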


The olfactory system faces the problem of having to detect specific odorants that are never present in isolation, but rather in a complex, ever-changing olfactory soup. How does the brain do it?

What to do when your analysis depends on a certain distance function, but that distance function is not uniquely defined? Look at how distance patterns change as the distance function varies. Software package at


We prove that EP is remarkably accurate (under strong assumptions), in the sense that the approximation given by EP converges very fast to the optimal approximation as the dataset grows.

A follow-up to our JASA paper on EP-ABC, to appear in the Handbook of Approximate Bayesian Computation, edited by S. Sisson, Y. Fan, and M. Beaumont. We explain how to parallelise the algorithm effectively, and we illustrate it with an application to spatial extremes.

In inference for unnormalised statistical models, you have a likelihood function whose normalisation constant is too hard to compute (for example, an Ising model). It’s an important class of models in machine learning, computer vision and statistics. We show that there is a principled way of treating the missing normalisation constant as a parameter to estimate, via a connection to point process estimation. Gutmann & Hyvärinen’s “noise-contrastive estimation” can be viewed as a practical approximation of that technique.
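One way to write the idea down uses our own notation, not necessarily the paper's. Let f_θ be the unnormalised density, let Z(θ) = ∫ f_θ(y) dy be its normalising constant, and let ℓ(θ) = Σ_i log{f_θ(y_i)/Z(θ)} be the ordinary log-likelihood. Treat an extra parameter ν as a free stand-in for −log Z(θ) in a Poisson point-process style objective. The objective is concave in ν, so the stationary point below is a maximum. Profiling out ν recovers ν = −log Z(θ) and the log-likelihood up to a constant:

```latex
\begin{aligned}
\mathcal{M}(\theta,\nu) &= \sum_{i=1}^{n}\bigl\{\log f_\theta(y_i) + \nu\bigr\} - n\,e^{\nu}\!\int f_\theta(y)\,\mathrm{d}y
  = \sum_{i=1}^{n}\log f_\theta(y_i) + n\nu - n\,e^{\nu} Z(\theta),\\
\frac{\partial \mathcal{M}}{\partial \nu} &= n - n\,e^{\nu} Z(\theta) = 0
  \quad\Longrightarrow\quad \nu^{\star}(\theta) = -\log Z(\theta),\\
\mathcal{M}\bigl(\theta,\nu^{\star}(\theta)\bigr) &= \sum_{i=1}^{n}\log f_\theta(y_i) - n\log Z(\theta) - n
  = \ell(\theta) - n .
\end{aligned}
```

The integral is still intractable, but in this form it enters the objective linearly rather than inside a logarithm. That makes it a natural target for approximation, for example by Monte Carlo.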


Eye movements in visual scenes cluster at small scales (fixations have nearer neighbours than chance would predict). Why? Because of sequential dependencies between successive fixations.
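A quick way to quantify “nearer neighbours than chance would predict” is the Clark–Evans ratio. It divides the mean nearest-neighbour distance by 1/(2√λ), which is the expected value under complete spatial randomness with intensity λ. Values below 1 suggest clustering and values above 1 suggest regularity. This statistic is our illustrative choice (it ignores edge effects), not necessarily the analysis used in the paper. Fixations are assumed to be given as (x, y) pairs observed in a window of known area.

```python
import math

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour (brute force, O(n^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
        total += nearest
    return total / len(points)

def clark_evans_ratio(points, area):
    """Observed mean NN distance over its expectation 1 / (2 sqrt(density)) under
    complete spatial randomness. No edge correction is applied."""
    density = len(points) / area
    return mean_nn_distance(points) / (0.5 / math.sqrt(density))
```

For example, a 10 × 10 square lattice with unit spacing in a window of area 100 gives a ratio of exactly 2 (regular), while tight clusters of points give values well below 1.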

How to speed up inference for Gaussian process models over sets of related functions (e.g. the latent rate of spike trains over repeated trials). 
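Here is a small sketch of the kind of structure that can be exploited, under assumptions of our own. Suppose m functions are observed at the same n points. Each is modelled as f_j = g + h_j, with a shared component g (covariance A) and independent individual components h_j (covariance B, which here also absorbs Gaussian observation noise). The joint covariance is then J ⊗ A + I ⊗ B, with J the all-ones m × m matrix. A linear solve with it reduces to two n × n solves instead of one mn × mn solve. Non-Gaussian likelihoods such as spike counts would need more machinery on top of this, and the function name is hypothetical.

```python
import numpy as np

def fam_solve(A, B, Y):
    """Solve (J kron A + I kron B) vec(X) = vec(Y) for X, with J the m x m all-ones matrix.

    Y is n x m (column j = function j at the n shared points); vec stacks columns.
    Equivalent matrix equation: A X J + B X = Y.
    """
    ybar = Y.mean(axis=1, keepdims=True)       # n x 1: average over the m functions
    shared = np.linalg.solve(Y.shape[1] * A + B, ybar)  # component along the all-ones direction
    indiv = np.linalg.solve(B, Y - ybar)       # component orthogonal to it
    return indiv + shared                      # broadcasts `shared` across the m columns
```

The split works because J has eigenvalue m on the all-ones direction and 0 on its orthogonal complement. The cost is therefore O(n³) regardless of m, compared with O((mn)³) for the dense solve.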

Likelihood-free inference is what you end up doing when you have a model whose likelihood function is very hard or impossible to compute. We show that Thomas Minka’s expectation-propagation algorithm can be wonderfully effective in a likelihood-free context, given a few modifications. Using pseudo-likelihood techniques and EP-ABC you could estimate essentially any kind of model.


Statistical tools for the analysis of eye movement data. Also, an attempt at a user-friendly introduction to spatial point processes.


Minor comment on a Read Paper of the RSS.


How complex are the mechanisms the visual system uses to evaluate its uncertainty? The simplest strategy would be to follow an obvious cue to visual uncertainty, like contrast. We find that people do something more complicated than that.


How do we know when to trust our visual sense? That is, how do we know when we are getting reliable information out of our visual system? This paper looks at the issue from a Bayesian point of view. We set up a visual task with a well-defined objective uncertainty: for every stimulus we show subjects, we have a measure of how much information the stimulus actually contains. We show that observers’ subjective uncertainty correlates with the objective uncertainty in the task. We describe and compare two simple computational models that explain how subjective and objective uncertainty could be linked.

PLoS Comp Bio has done a rather terrible job with the layout on that one (you’d think they could do a little better considering the $2,200 they charge for publication), so I’ve made an alternative PDF with better-looking equations, figures that are actually centred on the page, and the Supplementary Information (in which a couple of minor mistakes and typos have been fixed).

In classification image experiments (and related techniques like Bubbles), you show subjects random stimuli and kindly ask them to categorise these stimuli: for example, you might show them random faces to be classified as male or female. The hope is to characterise which parts of the stimuli the subject uses in their judgement. This is usually done by assuming that the subject’s behaviour is roughly linear in stimulus space, so that characterising the observer boils down to running a regression that identifies which dimensions of the stimulus influence the subject’s responses. If you describe stimuli as a set of pixels, then there are usually far too many dimensions to estimate anything reliably. In this paper we suggest that using sparse priors in the right basis yields much better estimates of the observer’s strategy than traditional techniques. This amounts to assuming that most dimensions are irrelevant to how subjects classify stimuli, so that we can focus on the dimensions that matter.
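Here is a minimal sketch of the idea under simplifying assumptions of our own. The observer is simulated as a noisy linear classifier. The template is sparse directly in the pixel basis; the point about choosing the right basis is skipped, and in practice a change of basis would be applied to the stimuli first. The sparse prior is a Laplace prior, whose MAP estimate is L1-penalised logistic regression, fitted here by proximal gradient descent (ISTA). All names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def sparse_classification_image(X, s, lam, n_iter=500):
    """L1-penalised logistic regression by ISTA (no intercept: stimuli assumed zero-mean).

    X: trials x pixels stimulus matrix; s: responses coded as +1 / -1.
    Minimises mean(log(1 + exp(-s * (X @ w)))) + lam * ||w||_1.
    """
    n, d = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the loss gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        z = s * (X @ w)
        sig_neg = 0.5 * (1.0 - np.tanh(z / 2.0))  # sigmoid(-z), computed stably
        grad = -X.T @ (s * sig_neg) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 64
    w_true = np.zeros(d)
    w_true[[5, 20, 40]] = [2.0, -2.0, 2.0]  # hypothetical sparse observer template
    X = rng.standard_normal((n, d))         # white-noise stimuli
    s = np.sign(X @ w_true + rng.standard_normal(n))
    w_hat = sparse_classification_image(X, s, lam=0.02)
    print(np.flatnonzero(w_hat))            # indices of pixels with nonzero weight
```

The soft-thresholding step sets most irrelevant pixel weights exactly to zero. This is the practical effect of assuming that most stimulus dimensions do not matter to the observer.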