Bernhard Schölkopf:  

Simple Explanations for Complex Things - Statistical Learning Theory as a Model for Science


Statistical learning theory is a novel field at the intersection of statistics and artificial intelligence research. It studies the following scenario: assume you are given a number of training observations, (X1,Y1),(X2,Y2),..., of two observables X and Y. Assume, moreover, that X and Y are interrelated by an unknown dependency; mathematically, this is modelled by a probability distribution P(X,Y). The simplest form of statistical learning tries to infer a function f which, given values of X, allows the prediction of values of Y such that the error is small on average. Clearly, any function that generally has a small error should in particular have small errors on the training observations. In this sense, it is necessary to be consistent with (i.e. to "explain") the training observations. However, the main insight of statistical learning theory, founded in the 1960s in Russia by Vapnik and Chervonenkis, is that this is not sufficient: in order to generalize to new observations, one should not only explain the training observations, but explain them with a model which is simple in a sense that can be stated mathematically. This result, which can be thought of as a formalization of aspects of the Popperian philosophy of science, can be understood intuitively as follows: using a sufficiently complex model, it is possible to explain any set of observations. Conversely, if it turns out to be possible to explain a large set of training observations using a simple model, then it is very likely that the explanation has captured aspects of the underlying dependency.
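The contrast between explaining the training observations and generalizing to new ones can be illustrated with a small numerical experiment. The following is only a sketch under stated assumptions, not part of the theory itself: the dependency between X and Y is assumed to be a noisy straight line, the "simple" model is a least-squares line, and the "complex" model is a memorizer that returns the Y of the nearest stored X. All function names and parameter values are hypothetical choices made for the illustration.

```python
import random


def make_data(n, rng, noise=0.5):
    """Draw n pairs (x, y) from an assumed toy dependency: y = 2x + 1 + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def fit_line(train):
    """Simple model: least-squares straight line with two free parameters."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(train):
    """Complex model: stores every observation and returns the Y of the nearest stored X."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def mean_squared_error(f, data):
    """Average squared prediction error of f on a list of (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)          # fixed seed so the experiment is repeatable
train = make_data(100, rng)     # training observations
test = make_data(1000, rng)     # new observations from the same P(X,Y)

line = fit_line(train)
memo = fit_memorizer(train)

for name, f in (("line", line), ("memorizer", memo)):
    print(name,
          "training error:", mean_squared_error(f, train),
          "test error:", mean_squared_error(f, test))
```

The memorizer reproduces every training observation exactly, so it "explains" the training set perfectly, yet it also reproduces the noise and therefore tends to predict new observations worse than the straight line does. The line cannot fit the training observations exactly, but the little it can express is enough to capture the assumed dependency.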

We can think of this process of learning f from observations as a simple model of science, f being a "law of nature" inferred from the observations. Of course, science does not only study laws of nature that can be cast into the simple form of functional dependencies. Often, physical laws are formulated as differential equations. The procedure of inferring a physical law from observations, however, is not fundamentally different from the above. If this is true, then what does this teach us about the nature of a physical law?

Physical laws have to be simple insofar as they are inferred from a limited number of observations. This is a methodological feature of the scientific way of approaching reality, and not necessarily a reflection of an underlying reality being simple. Conversely, if there are dependencies in nature that are very complex, then we might not be able to identify them at all from limited data, and the phenomena might appear random to us. The question of why there seem to be so many natural laws that are simple in a sense that coincides with our notions of simplicity, i.e. why we are able to identify structure in the world, has yet to be answered.

12 February 2000