Bernhard Schölkopf:

Simple Explanations for Complex Things - Statistical Learning Theory as a Model for Science

Statistical learning theory is a novel field at the intersection of
statistics and artificial intelligence research. It studies the following
scenario: assume you are given a number of training observations *(X1, Y1), (X2, Y2), ...*
of two observables *X* and *Y*. Assume, moreover, that *X*
and *Y* are interrelated by an unknown dependency - mathematically,
this is modelled by a probability distribution *P(X,Y)*. The simplest
form of statistical learning tries to infer a function *f* which,
given values of *X*, allows the prediction of values of *Y*
such that on average, the error is small. Clearly, any function that
will generally have a small error should in particular have small errors
on the training observations. In this sense, it is *necessary*
to be consistent with (i.e. to "explain") the training observations.
However, the main insight of statistical learning theory, founded in
the 1960s in Russia by Vapnik and Chervonenkis, is that this is not
*sufficient:* in order to generalize to new observations, one should
not only explain the training observations, but explain them with a
model which is *simple* in a sense that can be stated mathematically.
Intuitively, this result, which can be thought of as a formalization
of aspects of the Popperian philosophy of science, can be understood
as follows: using a sufficiently complex model, it is possible to explain
any set of observations. Conversely, if it turns out to be possible
to explain a large set of training observations using a simple model,
then it is very likely that the explanation has captured aspects
of the underlying dependency.
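The claim that a sufficiently complex model can explain any set of observations can be made concrete with a toy sketch. Everything below is an illustrative assumption, not taken from the essay: the "complex model" is a polynomial with as many coefficients as there are observations (evaluated by Lagrange interpolation), the "simple model" is a constant prediction, and the labels *Y* are coin flips independent of *X*, so there is no dependency to capture at all.

```python
import random

# Toy setting (assumption for illustration): labels Y are fair coin flips,
# independent of X, so there is no underlying dependency to learn.

def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 through all (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(f, xs, ys):
    """Mean squared error of predictor f on the observations (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
n = 10
xs = [i / (n - 1) for i in range(n)]          # distinct training inputs in [0, 1]
ys = [random.choice([0.0, 1.0]) for _ in xs]  # training labels: pure noise

def complex_model(x):
    # As many free parameters as observations: it "explains" any labels exactly.
    return lagrange_predict(xs, ys, x)

train_mean = sum(ys) / n

def simple_model(x):
    # A single free parameter: predict the average training label everywhere.
    return train_mean

# Fresh observations drawn from the same structureless distribution.
test_xs = [random.random() for _ in range(1000)]
test_ys = [random.choice([0.0, 1.0]) for _ in test_xs]

for name, f in [("complex", complex_model), ("simple", simple_model)]:
    print(name, "train MSE:", mse(f, xs, ys), "test MSE:", mse(f, test_xs, test_ys))
```

The complex model reaches zero training error whatever labels are drawn, yet on new data its expected error is no smaller than the 0.25 obtained by always predicting 0.5, since there was nothing to capture. Its perfect fit therefore carries no information about the dependency; the simple model cannot fit the noise, and only a simple model that did fit many observations would be evidence of real structure.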

We can think of this process of learning *f* from observations
as a simple model of science, *f* being a "law of nature" inferred
from the observations. Of course, science studies not only laws of nature
that can be cast into the simple form of functional dependencies. Often,
physical laws are formulated as differential equations. The procedure
of inferring a physical law from observations, however, is not fundamentally
different from the above. If this is true, then what does this teach
us about the nature of a physical law?

Physical laws have to be simple insofar as they are inferred from
a limited number of observations. This is a methodological feature of
the scientific way of approaching reality, and not necessarily a reflection
of an underlying reality being simple. Conversely, if there are dependencies
in nature that are very complex, then we might not be able to identify
them at all from limited data, and the phenomena might appear random
to us. The question of why there seem to be so many natural laws that are
simple in a sense that coincides with our notions of simplicity, i.e.
why we are able to identify structure in the world, has yet to be answered.

12 February 2000