Algorithmic probability
In algorithmic information theory, algorithmic (Solomonoff) probability is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s.[1] It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses this prior in Bayes' rule for prediction.[2]
In the mathematical formalism used, the observations have the form of finite binary strings, and the universal prior is a probability distribution over the set of finite binary strings. The prior is universal in the Turing-computability sense, i.e. it assigns nonzero probability to every finite binary string. It is not computable, but it can be approximated from below.[3]
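In standard notation (following, e.g., Li and Vitányi, 2008), with U a universal prefix machine and |p| the length of a program p in bits, the prior assigned to a string x is

```latex
\[ m(x) \;=\; \sum_{p \,:\; U(p) = x} 2^{-|p|} \]
```

where the sum ranges over all programs on which U halts with output exactly x. Each additional bit of program length halves a program's contribution, so strings with short descriptions receive high prior probability.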
Overview
Algorithmic probability deals with the following questions: Given a body of data about some phenomenon that we want to understand, how can we select the most probable hypothesis of how it was caused from among all possible hypotheses, and how can we evaluate the different hypotheses? How can we predict future data, and how can we measure the likelihood of that prediction being the right one?
Four principal inspirations for Solomonoff's algorithmic probability were: Occam's razor; Epicurus' principle of multiple explanations; modern computing theory (e.g. the use of a universal Turing machine); and Bayes' rule for prediction.[4]
Occam's razor and Epicurus' principle are essentially two different nonmathematical approximations of the universal prior.
Occam's razor means 'among the theories that are consistent with the observed phenomena, one should select the simplest theory'.[5]
Epicurus' principle of multiple explanations proposes that if more than one theory is consistent with the observations, keep all such theories.[6]
At the heart of the universal prior is an abstract model of a computer, such as a universal Turing machine.[7] Any abstract computer will do, as long as it is Turing-complete, i.e. every finite binary string has at least one program that will compute it on the abstract computer.
The abstract computer is used to give precise meaning to the phrase "simple explanation". In the formalism used, explanations, or theories of phenomena, are computer programs that generate observation strings when run on the abstract computer. A simple explanation is a short computer program; a complex explanation is a long computer program. Simple explanations are more likely, so a high-probability observation string is one generated by a short computer program, or perhaps by any of a large number of slightly longer computer programs. A low-probability observation string is one that can only be generated by a long computer program.
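The following toy sketch illustrates this weighting. It uses Python source text as a stand-in for binary programs on a universal machine, which is a simplification: Solomonoff's construction uses binary programs on an abstract computer, and the weight function below is an illustrative assumption, not part of the formalism.

```python
# Toy illustration: weight an "explanation" (a program) by 2^(-length).
# Python source text stands in for binary programs on a universal machine.

def weight(program_source: str) -> float:
    """Return 2^(-|p|), with |p| taken as the length of the source in bits."""
    return 2.0 ** (-8 * len(program_source))  # assume 8 bits per character

# Two explanations of the same 100-character observation "0101...01":
short_program = "'01' * 50"     # a short generator (9 characters)
long_program = repr("01" * 50)  # the string spelled out literally (102 characters)

assert eval(short_program) == eval(long_program)  # same observation either way

print(weight(short_program))  # 2^-72: dominates the observation's total weight
print(weight(long_program))   # 2^-816: negligible by comparison
```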
These ideas can be made precise, and the probabilities used to construct a prior probability distribution for the given observation. Solomonoff's main reason for inventing this prior was so that it could be used in Bayes' rule when the actual prior is unknown, enabling prediction under uncertainty. The prior then predicts the most likely continuation of the observation, and provides a measure of how likely that continuation will be.
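In a standard formulation of this predictor, with M(x) denoting the universal probability of observing a string that begins with x, the probability that the observed string x continues with bit b is the conditional

```latex
\[ P(b \mid x) \;=\; \frac{M(xb)}{M(x)} \]
```

where xb denotes x with the bit b appended.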
Although the universal probability of an observation (and its extension) is incomputable, there is a computer algorithm, Levin search,[8] which, when run for longer and longer periods of time, will generate a sequence of approximations that converges to the universal probability distribution.
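The sketch below illustrates the flavor of such an anytime approximation: enumerate programs up to a length bound, run each one, and add up the weights of those whose output begins with the observed string. For the sake of a runnable example it substitutes a deliberately simplified, non-universal toy machine for the universal computer, so it demonstrates the approximation scheme rather than Levin search itself.

```python
from itertools import product

def run(program, max_bits):
    """Toy, non-universal machine used only for illustration: a program whose
    first bit is 0 outputs its remaining bits literally; a program whose first
    bit is 1 outputs its remaining bits repeated, truncated to max_bits.
    A real construction would use a universal (Turing-complete) machine."""
    if not program:
        return ""
    head, body = program[0], program[1:]
    bits = "".join(map(str, body))
    if head == 0:
        return bits[:max_bits]
    return (bits * max_bits)[:max_bits]

def approx_probability(q, max_len):
    """Lower bound on the probability of observing a string that begins with q:
    sum 2^(-|p|) over all programs p of length at most max_len whose output
    starts with q. Raising max_len (and, on a real machine, a step bound)
    yields approximations that converge to the true value from below."""
    total = 0.0
    for length in range(1, max_len + 1):
        for program in product((0, 1), repeat=length):  # all binary programs
            if run(program, len(q)).startswith(q):
                total += 2.0 ** (-length)
    return total

# The periodic string gathers more weight than the irregular one, because
# short "repeat" programs generate it in addition to its literal spelling.
print(approx_probability("010101", max_len=8))
print(approx_probability("011010", max_len=8))
```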
Solomonoff proved this distribution to be machine-invariant within a constant factor (called the invariance theorem).[9]
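In symbols, a standard statement of the invariance theorem reads: for any two universal machines U and V there is a constant c, depending only on U and V and not on x, such that for all strings x

```latex
\[ 2^{-c}\, m_V(x) \;\le\; m_U(x) \;\le\; 2^{c}\, m_V(x) \]
```

so changing the reference machine shifts every probability by at most a fixed constant factor.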
Solomonoff invented the concept of algorithmic probability with its associated invariance theorem around 1960,[10] publishing a report on it: "A Preliminary Report on a General Theory of Inductive Inference."[11] He clarified these ideas more fully in 1964 with "A Formal Theory of Inductive Inference," Part I[12] and Part II.[13]
Further discussion
Solomonoff described a universal computer with a randomly generated input program. The program computes some possibly infinite output. The universal probability distribution is the probability distribution on all possible output strings with random input.[14]
The algorithmic probability of any given finite output prefix q is the sum of the probabilities of the programs that compute something starting with q. Certain long objects with short programs have high probability.
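In the notation used earlier, this sum can be written (in a standard formulation) as

```latex
\[ P(q) \;=\; \sum_{p \,:\; U(p) = q*} 2^{-|p|} \]
```

where U(p) = q* means that the output of U on input p begins with q; unlike the halting-program sum given in the introduction, programs here need only produce an output that starts with q. Since each program bit is generated by a fair coin flip, a program of length |p| occurs with probability 2^{-|p|}.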
Algorithmic probability is the main ingredient of Solomonoff's theory of inductive inference, the theory of prediction based on observations. It was invented with the goal of using it for machine learning: given a sequence of symbols, which one will come next? Solomonoff's theory provides an answer that is optimal in a certain sense, although it is incomputable. Unlike, for example, Karl Popper's informal inductive inference theory, Solomonoff's is mathematically rigorous.
Algorithmic probability is closely related to the concept of Kolmogorov complexity. Kolmogorov's introduction of complexity was motivated by information theory and problems in randomness, while Solomonoff introduced algorithmic complexity for a different reason: inductive reasoning. Solomonoff invented a single universal prior probability that can be substituted for each actual prior probability in Bayes' rule, with Kolmogorov complexity emerging as a side product.[15]
Solomonoff's enumerable measure is universal in a certain powerful sense, but the computation time can be infinite. One way of dealing with this issue is a variant of Leonid Levin's search algorithm,[16] which limits the time spent computing the success of possible programs, with shorter programs given more time. Other methods of limiting the search space include training sequences.
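One standard way to express this length/time trade-off is Levin's time-bounded complexity, which charges a program for its running time as well as its length:

```latex
\[ Kt(x) \;=\; \min_{p} \left\{\, |p| + \log_2 t(p) \;:\; U(p) = x \text{ in } t(p) \text{ steps} \,\right\} \]
```

Equivalently, the search allots each candidate program p a share of the total computation time proportional to 2^{-|p|}, so shorter programs are run for longer before being abandoned.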
Key people
- Ray Solomonoff
- Andrey Kolmogorov
- Leonid Levin
See also
- Solomonoff's theory of inductive inference
- Algorithmic information theory
- Bayesian inference
- Inductive inference
- Inductive probability
- Kolmogorov complexity
- Universal Turing machine
- Information-based complexity
References
- ↑ Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma. (Nov. 1960 revision of the Feb. 4, 1960 report).
- ↑ Li, M. and Vitanyi, P., An Introduction to Kolmogorov Complexity and Its Applications, 3rd Edition, Springer Science and Business Media, N.Y., 2008
- ↑ Hutter, M., Scholarpedia, 2(8):2572, 2007.
- ↑ Li and Vitanyi, 2008, p. 347.
- ↑ Li and Vitanyi, 2008, p. 341.
- ↑ Li and Vitanyi, 2008, p. 339.
- ↑ Hutter, M., "Algorithmic Information Theory", Scholarpedia, 2(3):2519.
- ↑ Levin, L.A., "Universal Search Problems", in Problemy Peredaci Informacii 9, pp. 115–116, 1973
- ↑ Solomonoff, R., "Complexity-Based Induction Systems: Comparisons and Convergence Theorems," IEEE Trans. on Information Theory, Vol. IT-24, No. 4, pp. 422-432, July 1978
- ↑ Solomonoff, R., "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, Vol. 55, No. 1, pp. 73-88, August 1997.
- ↑ Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma. (Nov. 1960 revision of the Feb. 4, 1960 report).
- ↑ Solomonoff, R., "A Formal Theory of Inductive Inference, Part I". Information and Control, Vol 7, No. 1 pp 1-22, March 1964.
- ↑ Solomonoff, R., "A Formal Theory of Inductive Inference, Part II" Information and Control, Vol 7, No. 2 pp 224–254, June 1964.
- ↑ Solomonoff, R., "The Kolmogorov Lecture: The Universal Distribution and Machine Learning" The Computer Journal, Vol 46, No. 6 p 598, 2003.
- ↑ Gács, P. and Vitányi, P., "In Memoriam Raymond J. Solomonoff", IEEE Information Theory Society Newsletter, Vol. 61, No. 1, March 2011, p 11.
- ↑ Levin, L.A., "Universal Search Problems", in Problemy Peredaci Informacii 9, pp. 115–116, 1973
Sources
- Li, M. and Vitanyi, P., An Introduction to Kolmogorov Complexity and Its Applications, 3rd Edition, Springer Science and Business Media, N.Y., 2008
Further reading
- Rathmanner, S. and Hutter, M., "A Philosophical Treatise of Universal Induction", Entropy 2011, 13, 1076–1136: a very clear philosophical and mathematical analysis of Solomonoff's theory of inductive inference.