Subjective functions

Reading time: 5 minutes

📝 Original Info

  • Title: Subjective functions
  • ArXiv ID: 2512.15948
  • Date: 2025-12-17
  • Authors: Samuel J. Gershman

📝 Abstract

Where do objective functions come from? How do we select what goals to pursue? Human intelligence is adept at synthesizing new objective functions on the fly. How does this work, and can we endow artificial systems with the same ability? This paper proposes an approach to answering these questions, starting with the concept of a subjective function, a higher-order objective function that is endogenous to the agent (i.e., defined with respect to the agent's features, rather than an external task). Expected prediction error is studied as a concrete example of a subjective function. This proposal has many connections to ideas in psychology, neuroscience, and machine learning.

💡 Deep Analysis

Figure 1

📄 Full Content

Subjective functions

Samuel J. Gershman
Department of Psychology and Center for Brain Science
Kempner Institute for the Study of Natural and Artificial Intelligence
Harvard University

1 Introduction

Objective functions are central to all learning systems (both natural and artificial). The way we distinguish learning from other kinds of dynamics is the fact that learning produces (at least in expectation or asymptotically) an improvement in performance as measured by an objective function.[1] Many different objective functions have been proposed, and it's not clear that all intelligence can be subsumed by a single "ultimate" objective, such as reproductive fitness.[2] Perhaps the problem is that the quest for a single objective function is misguided. An important characteristic of human-like intelligence may be the ability to synthesize objective functions.

This only kicks the can down the road, of course. What principle disciplines the choice of objective function? Wouldn't any such principle constitute a higher-order objective function? If so, then we would be back to where we started, the quest for a single objective function.

A true subjective function (rather than a higher-order objective function) is endogenous to the agent. To understand what this means, consider a typical way to define an objective function: stipulate some reward or supervision signal, then score an agent based on how well it maximizes expected reward or minimizes expected error. These signals are exogenous to the agent in the sense that their definitions do not depend on any feature of the agent; they can be applied uniformly to any agent. In contrast, a subjective function is endogenous to the agent in the sense that the definition of the signal that the agent is optimizing depends on features of the agent.

This note describes a subjective function that can be used to design an agent capable of open-ended learning. It then discusses how it connects to observations from psychology and neuroscience, as well as related ideas in machine learning.

[1] For example, passive wear and tear degrades the function of living organisms and robots over time, but this is not learning, because it cannot be understood in terms of performance improvement over time.
[2] Even the reasonable argument that all forms of biological intelligence arose from natural selection is not very helpful for elucidating the underlying principles that give rise to intelligent behavior.

2 Preliminaries

We model the world as a Markov decision process (MDP) consisting of the following components:

• A state space S.
• An action space A.
• A transition distribution T(s′|s, a).
• The agent chooses actions according to a policy π(a|s).

Importantly, we do not assume a fixed reward function. Instead, we allow the agent to choose its own reward function. For concreteness, we will study the case where the reward function is parametrized by a specific goal state g ∈ S:

$$
R_g(s) = \mathbb{I}[s = g]. \qquad (1)
$$

Thus, the reward is 1 only when the agent has reached the goal state. It is relatively straightforward to extend this setup (e.g., to reward functions that are linear in some feature space), but the goal-based framework is appealingly simple and applicable to many environments that are natural for humans.
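To make the goal-conditioned setup concrete, here is a minimal, self-contained Python sketch (not from the paper; the tabular MDP, random transition matrix, and names are illustrative assumptions): a transition distribution T(s′|s, a), a policy π(a|s), and a goal-parametrized indicator reward R_g(s) = 𝕀[s = g].

```python
import numpy as np

# Minimal sketch (illustrative, not from the paper): a tabular MDP with a
# goal-parametrized indicator reward R_g(s) = 1[s == g].

n_states, n_actions = 5, 2
rng = np.random.default_rng(0)

# Transition distribution T(s'|s, a): shape (S, A, S), each T[s, a] sums to 1.
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# A policy pi(a|s): here a uniform-random policy for illustration.
policy = np.full((n_states, n_actions), 1.0 / n_actions)

def reward(s, g):
    """Goal-parametrized reward: 1 only when the agent is at goal state g."""
    return float(s == g)

# No fixed reward function is assumed: the agent can pick its own goal g,
# which induces a different reward function R_g over the same MDP.
g = 3
print([reward(s, g) for s in range(n_states)])  # [0.0, 0.0, 0.0, 1.0, 0.0]
```

Choosing a different g yields a different R_g over the same state, action, and transition structure, which is the sense in which the agent selects its own (goal-based) reward function.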
3 Design principles

Principle 1: agents select policies that maximize expected prediction error

A standard assumption in reinforcement learning (RL) theory is that agents seek to maximize expected discounted future reward, or value:

$$
V^\pi_g(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t R_g(s_t) \,\middle|\, s_0 = s, \pi, g\right] = R_g(s) + \gamma \sum_a \pi(a|s) \sum_{s'} T(s'|s, a)\, V^\pi_g(s'), \qquad (2)
$$

where t indexes time and γ ∈ [0, 1) is a discount factor governing how the agent values long-term reward. The second equality is the Bellman equation. We instead adopt the assumption that the agent seeks to maximize expected prediction error (EPE), which replaces the reward with the temporal difference (TD) prediction error δ_t:

$$
\pi^*_g(\cdot|s) = \operatorname*{argmax}_{\pi(\cdot|s)} U^\pi_g(s), \qquad (3)
$$

$$
U^\pi_g(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t \delta_t \,\middle|\, s_0 = s, \pi, g\right], \qquad (4)
$$

$$
\delta_t = R_g(s_t) + \gamma \hat{V}^\pi_g(s_{t+1}) - \hat{V}^\pi_g(s_t), \qquad (5)
$$

where $\hat{V}^\pi_g$ is the agent's estimate of the value function. Intuitively, the EPE measures a form of goal progress, because $\delta_t \approx \dot{V}^\pi_g$ prior to goal attainment (i.e., the prediction ...
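As a rough illustration of Principle 1 (an assumed tabular setup, not the paper's implementation), the sketch below approximates $\hat{V}^\pi_g$ with a few Bellman backups as in equation (2), then Monte Carlo estimates the EPE objective $U^\pi_g(s)$ of equation (4) as a discounted sum of the TD prediction errors in equation (5). The rollout routine, parameter values, and names are hypothetical.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's implementation): approximate
# V^pi_g with a few Bellman backups, equation (2), then Monte Carlo estimate
# the EPE objective U^pi_g(s), equation (4), as a discounted sum of TD
# prediction errors delta_t, equation (5).

n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T(s'|s, a)
policy = np.full((n_states, n_actions), 1.0 / n_actions)          # pi(a|s)
g = 3
R = (np.arange(n_states) == g).astype(float)                      # R_g(s) = 1[s == g]

# A few policy-evaluation sweeps of the Bellman equation (2), so V_hat is a
# rough, mid-learning estimate (the paper defines delta_t with respect to the
# agent's own estimate of the value function).
V_hat = np.zeros(n_states)
for _ in range(5):
    V_hat = R + gamma * np.einsum("sa,sap,p->s", policy, T, V_hat)

def rollout_epe(s0, horizon=200):
    """Monte Carlo estimate of U^pi_g(s0): discounted sum of TD errors."""
    s, epe = s0, 0.0
    for t in range(horizon):
        a = rng.choice(n_actions, p=policy[s])
        s_next = rng.choice(n_states, p=T[s, a])
        delta = R[s] + gamma * V_hat[s_next] - V_hat[s]  # TD error, equation (5)
        epe += (gamma ** t) * delta
        s = s_next
    return epe

# Average over rollouts to estimate U^pi_g at a starting state s0 = 0.
print(np.mean([rollout_epe(0) for _ in range(200)]))
```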

📸 Image Gallery

cover.png

Reference

This content is AI-processed based on open access ArXiv data.
