I postulate that human or other intelligent agents function or should function as follows. They store all sensory observations as they come - the data is holy. At any time, given some agent's current coding capabilities, part of the data is compressible by a short and hopefully fast program / description / explanation / world model. In the agent's subjective eyes, such data is more regular and more "beautiful" than other data. It is well-known that knowledge of regularity and repeatability may improve the agent's ability to plan actions leading to external rewards. In absence of such rewards, however, known beauty is boring. Then "interestingness" becomes the first derivative of subjective beauty: as the learning agent improves its compression algorithm, formerly apparently random data parts become subjectively more regular and beautiful. Such progress in compressibility is measured and maximized by the curiosity drive: create action sequences that extend the observation history and yield previously unknown / unpredictable but quickly learnable algorithmic regularity. We discuss how all of the above can be naturally implemented on computers, through an extension of passive unsupervised learning to the case of active data selection: we reward a general reinforcement learner (with access to the adaptive compressor) for actions that improve the subjective compressibility of the growing data. An unusually large breakthrough in compressibility deserves the name "discovery". The "creativity" of artists, dancers, musicians, pure mathematicians can be viewed as a by-product of this principle. Several qualitative examples support this hypothesis.
Simple Algorithmic Principles of Discovery, Subjective Beauty, Selective Attention, Curiosity & Creativity
A human lifetime lasts about 3 × 10^9 seconds. The human brain has roughly 10^10 neurons, each with 10^4 synapses on average. Assuming each synapse can store no more than 3 bits, there is still enough capacity to store the lifelong sensory input stream arriving at a rate of roughly 10^5 bits/s, comparable to the demands of a movie with reasonable resolution. The storage capacity of affordable technical systems will soon exceed this value.
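The estimates above can be checked by simply multiplying out the figures from the text; the following back-of-envelope sketch does exactly that:

```python
# Back-of-envelope check of the storage estimate: brain capacity
# versus the total lifelong sensory input stream.
lifetime_seconds = 3e9            # ~95 years
neurons = 1e10
synapses_per_neuron = 1e4
bits_per_synapse = 3              # assumed upper bound from the text

brain_capacity_bits = neurons * synapses_per_neuron * bits_per_synapse
sensory_rate_bits_per_s = 1e5     # rough movie-like input rate
lifetime_stream_bits = sensory_rate_bits_per_s * lifetime_seconds

# Both come out at 3 * 10^14 bits: capacity matches the stream.
assert brain_capacity_bits == lifetime_stream_bits == 3e14
```

Both quantities come out at 3 × 10^14 bits, which is why the text can claim the full input stream fits.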
Hence, it is not unrealistic to consider a mortal agent that interacts with an environment and has the means to store the entire history of sensory inputs, which partly depends on its actions. This data anchors all it will ever know about itself and its role in the world. In this sense, the data is ‘holy.’ What should the agent do with the data? How should it learn from it? Which actions should it execute to influence future data? Some of the sensory inputs reflect external rewards. At any given time, the agent’s goal is to maximize the remaining reward or reinforcement to be received before it dies. In realistic settings, though, external rewards are rare. In the absence of such rewards from teachers and the like, what should be the agent’s motivation? Answer: it should spend some time on unsupervised learning, figuring out how the world works, hoping this knowledge will later prove useful for gaining external rewards.
Traditional unsupervised learning is about finding regularities: by clustering the data, by encoding it through a factorial code [2,14] with statistically independent components, or by predicting parts of it from other parts. All of these may be viewed as special cases of data compression. For example, where there are clusters, a data point can be efficiently encoded by its cluster center plus relatively few bits for the deviation from the center. Where there is data redundancy, a non-redundant factorial code [14] will be more compact than the raw data. Where there is predictability, compression can be achieved by assigning short codes to events that are predictable with high probability [3]. Generally speaking, a major goal of traditional unsupervised learning is to improve the compression of the observed data: to discover a program that computes and thus explains the history (and hopefully does so quickly) but is clearly shorter than the shortest previously known program of this kind.
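The cluster-center argument can be made concrete with a small sketch. The toy data, the fixed-resolution code, and the pre-computed centers below are my own illustrative assumptions, not part of the original:

```python
import math

def bits_needed(max_abs, resolution=0.5):
    """Bits to encode one value in [-max_abs, max_abs] at fixed resolution."""
    levels = 2 * int(max_abs / resolution) + 1
    return math.ceil(math.log2(levels))

# Toy 1-D data forming two tight clusters (hypothetical values).
data = [10.0, 10.5, 11.0, 99.0, 99.5, 100.0]
centers = [10.5, 99.5]   # assume the cluster centers are already known

# Raw code: every point pays for the full data range.
raw_bits = bits_needed(max(abs(x) for x in data)) * len(data)

# Cluster code: 1 bit for the cluster index, plus a few bits for the
# small deviation from the nearest center.
deviations = [min(abs(x - c) for c in centers) for x in data]
clustered_bits = (1 + bits_needed(max(deviations))) * len(data)

assert clustered_bits < raw_bits   # 18 vs 54 bits for this toy data
```

Because the deviations span a far smaller range than the raw values, the per-point residual code is much shorter, which is exactly the compression gain the text attributes to clustering.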
According to our complexity-based theory of beauty [15,17,25], the agent’s currently achieved compression performance corresponds to subjectively perceived beauty: among several sub-patterns classified as ‘comparable’ by a given observer, the subjectively most beautiful is the one with the simplest (shortest) description, given the observer’s particular method for encoding and memorizing it. For example, mathematicians find beauty in a simple proof with a short description in the formal language they are using. Others like geometrically simple, aesthetically pleasing, low-complexity drawings of various objects [15,17].
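One crude way to operationalize "shortest description given the observer's particular encoding method" is to let an off-the-shelf compressor stand in for the observer. This is only an illustrative proxy of my own, not the theory's definition:

```python
import zlib
import random

# Assumption: zlib's compressed length stands in for the observer's
# "shortest known description length" of a pattern.
random.seed(0)
regular = b"0123456789" * 50                          # simple, periodic
noisy = bytes(random.randrange(256) for _ in range(500))  # pseudo-random

len_regular = len(zlib.compress(regular, 9))
len_noisy = len(zlib.compress(noisy, 9))

# Under this observer's coding scheme, the periodic pattern has the far
# shorter description, i.e. it is the subjectively simpler one.
assert len_regular < len_noisy
assert len_regular < len(regular)
```

The comparison is observer-relative by construction: a different compressor (a different "method for encoding and memorizing") can rank the same patterns differently, matching the subjectivity the text emphasizes.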
Traditional unsupervised learning is not enough, though: it just analyzes and encodes the data but does not choose it. We have to extend it along the dimension of active action selection, since our unsupervised learner must also choose the actions that influence the observed data, just like a scientist chooses his experiments, a baby its toys, an artist his colors, a dancer his moves, or any attentive system its next sensory input.
Which data should the agent select by executing appropriate actions? Which are the interesting sensory inputs that deserve to be targets of its curiosity? I postulate [25] that in the absence of external rewards or punishment the answer is: Those that yield progress in data compression. What does this mean? New data observed by the learning agent may initially look rather random and incompressible and hard to explain. A good learner, however, will improve its compression algorithm over time, using some application-dependent learning algorithm, making parts of the data history subjectively more compressible, more explainable, more regular and more ‘beautiful.’ A beautiful thing is interesting only as long as it is new, that is, as long as the algorithmic regularity that makes it simple has not yet been fully assimilated by the adaptive observer who is still learning to compress the data better. So the agent’s goal should be: create action sequences that extend the observation history and yield previously unknown / unpredictable but quickly learnable algorithmic regularity or compressibility. To rephrase this principle in an informal way: maximize the first derivative of subjective beauty.
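A minimal sketch of this principle, assuming a simple order-0 adaptive model as the agent's compressor; the class and reward function below are hypothetical illustrations, not the paper's implementation:

```python
import math
from collections import Counter

class AdaptiveCompressor:
    """Order-0 adaptive model: the code length of a symbol is -log2 of
    its Laplace-smoothed estimated probability under the counts so far."""
    def __init__(self, alphabet_size=256):
        self.counts = Counter()
        self.total = 0
        self.k = alphabet_size

    def code_length(self, seq):
        # Bits needed to encode seq under the current (frozen) model.
        return sum(-math.log2((self.counts[s] + 1) / (self.total + self.k))
                   for s in seq)

    def learn(self, seq):
        self.counts.update(seq)
        self.total += len(seq)

def curiosity_reward(compressor, history):
    """Intrinsic reward = compression progress: bits saved on the history
    by one learning step (the 'first derivative of subjective beauty')."""
    before = compressor.code_length(history)
    compressor.learn(history)
    after = compressor.code_length(history)
    return before - after

comp = AdaptiveCompressor()
regular = b"abababababababab"          # learnable regularity
r1 = curiosity_reward(comp, regular)   # novel pattern: large progress
r2 = curiosity_reward(comp, regular)   # already assimilated: small progress
assert r1 > r2 > 0
```

The same data yields a large intrinsic reward while its regularity is still being assimilated and a shrinking one afterwards: a once-beautiful pattern becomes boring exactly as the text describes.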
An unusually large compression breakthrough deserves the name discovery. How can we motivate a reinforcement learning agent to make discoveries? Clearly, we cannot simply reward it for executing actions that just yield a compressible but boring history. For example, a vision-based agent that always stays in the dark will experience an extremely compressible, soon totally predictable history of unchanging visual inputs.
…(Full text truncated)…