The speaker at the 2003 Inaugural Kolmogorov Lecture was Professor Ray Solomonoff, Principal Scientist at Oxbridge Research, Massachusetts, and one of the worlds foremost researchers in the field of Kolmogorov Complexity. Professor Solomonoff first discovered the Universal Probability Distribution in 1960, and used it as the basis of a general theory of Inductive Inference and Scientific Discovery. Five years later, A. Kolmogorov discovered Kolmogorov Complexity. Though it is very closely related to the universal distribution, Kolmogorov's initial interest was in defining randomness and in complexity itself - he was surprised to learn of the earlier application of this idea to inductive inference.
Solomonoff's lecture was entitled: "A General System for Incremental Learning". He discussed important properties of the Universal Distribution, and how we can use it to create a program for machine learning of great power and intelligence. An abstract of this talk is included below and a more complete manuscript on the subject- "Progress in Incremental Machine Learning" - is available at Professor Solomonoff's website. The lecture itself was published in the Computer Journal.
|Solomonoff Lecturing||Kolomogorov Medal Presention|
We will describe recent developments in a system for machine learning that we've been working on for some time. The system is designed to learn to solve two kinds of problems. Most, but not all problems in science and engineering are of these two kinds.
The first kind is function inversion. These are the P and NP problems of computational complexity theory. They include theorem proving, solution of equations, symbolic integration, etc.
The second kind of problem is time limited optimization. Inductive inference, surface reconstruction, and image restoration are a few examples of this kind of problem. Designing an automobile in 6 months satisfying certain specifications and having minimal cost, is another.
In the following discussion, we will be using the term "probability" in a special sense: i.e. the estimate given by the best probabilistic model that we can find in the available time.
Our system starts out with a small set of Problem Solving Techniques (PST's) and a simple General Conditional Probability Distribution (GCPD). When the system is given a problem, the description of this problem is the "Condition" for the GCPD. Its output is a probability distribution on PST's - the likelihood that each of them will be the best way to solve the problem. It uses these PST's and their associated probabilities to solve the first problem.
Next, it executes its update algorithm: The PST's are modified, new ones may be added, some may be deleted. The GCPD is modified. These changes attempt to incorporate into the system, information obtained in solving recent problems, and prepare it to solve harder problems.
The next problem presentation and the system's solution is followed by the corresponding update and so on. After the nth update, the system is usually able to work problems more difficult than the nth. Giving the system a suitable training sequence of problems of progressive difficulty, makes it possible for the system to eventually solve very hard problems.
One critical component in the system is the initial set of PST's. These will be Levin's Universal Search Algorithm (Lsearch). Simple Lsearch is only practical for very small problems, since its solution time is exponential in problem size. In the update phase, we modify the probability distribution that is used to guide Lsearch. This effectively reduces the sizes of more difficult problems, so that Lsearch becomes feasible for them.
The update algorithm is another critical component. It has been designed to facilitate "transfer learning", so that the system can utilize any similarities between problems in the same or disparate domains. It has been possible to express this algorithm as an inductive inference problem of a type the system normally solves - so we can simply ask the system to update itself as one of its regular problems.
Perhaps the most critical component of all is the training sequence - To devise a sequence of problems so that each of them challenges the system to devise new concepts to solve it, yet keeping the challenges within the capacity of the machine. For PST's that use simple Lsearch, this is not hard to do. It is always possible to get a good estimate of the time needed for the system to find a known solution to a problem. For certain modifications of Lsearch this is not possible and training sequence design may become as difficult as teaching people!
The system's generality invites comparison with Genetic Programming. We will discuss differences in the two approaches to optimization problems, and suggest a way in which Genetic Programming might be made a part of the present system.
Another way to look at the system is to think of it as a "Wrapper" system - a system that modifies its parameters in view of its experience with earlier problem solving. The present system is an extreme Wrapper system in that - through the magic of Universal Search - it is possible for the entire system to effectively replace itself by one that is more efficient.