From indicators to biology: the calibration problem in artificial consciousness
Spotlight Commentary

Florentin Koch
École Polytechnique, Institut Polytechnique de Paris, Palaiseau, 91120, France
*Corresponding author. E-mail: florentin.koch@polytechnique.edu

Abstract

Recent work on artificial consciousness shifts evaluation from behaviour to internal architecture, deriving indicators from theories of consciousness and updating credences accordingly. This is progress beyond naïve Turing-style tests. But the indicator-based programme remains epistemically under-calibrated: consciousness science is theoretically fragmented, indicators lack independent validation, and no ground truth of artificial phenomenality exists. Under these conditions, probabilistic consciousness attribution to current AI systems is premature. A more defensible near-term strategy is to redirect effort toward biologically grounded engineering (biohybrid, neuromorphic, and connectome-scale systems) that reduces the gap with the only domain where consciousness is empirically anchored: living systems.

Keywords: consciousness; artificial intelligence; indicators; calibration; biohybrid systems

Commentary

Butlin et al. (2025) offer a recent and explicit formulation of what has become the dominant approach to AI consciousness: since behavioural criteria are too easily gamed (as demonstrated by the finding that GPT-4.5, given a human persona, was judged human 73% of the time in a standard Turing test; Jones & Bergen 2025), the strategy shifts from external performance to internal architecture, deriving indicators from neuroscientific theories and using them to update credences about whether a given system is conscious. This is real methodological progress. But the indicator-based framework remains epistemically under-calibrated. The problem is not merely that we have not yet found the right theory of consciousness.
It is that credence updates are already being proposed without the conditions that would make them seriously calibrable. Importantly, this calibration difficulty is not specific to artificial systems. In empirical consciousness research, neural markers such as complexity measures, perturbational indices, or frontoparietal signatures have proven clinically useful, yet their interpretation remains contested: contrastive paradigms can conflate neural correlates of consciousness proper with prerequisites or consequences of conscious processing (Aru et al. 2012; Koch et al. 2016). Since any serious calibration presupposes some confidence in what the relevant indicators actually track, this parallel suggests that lessons from the ongoing biological debate should constrain how indicator-based frameworks are exported to AI.

Three difficulties are central. First, the source science remains theoretically unstable. As Cleeremans, Mudrik and Seth (2025) note, consciousness research is still fragmented, transitioning from neural correlates toward testable theories but without strong consensus or stabilized explanatory unification. The difficulty is at once empirical, conceptual, and methodological, and a genuine test of consciousness remains a goal rather than an achievement.

Second, the indicators themselves are not calibrated as probabilistic evidence. For meaningful Bayesian updating, one would need credible estimates of how frequently conscious systems display a given indicator, how frequently non-conscious systems also display it, whether indicators provide genuinely independent evidence, and what weight to assign competing source theories. We currently lack a reliable basis for any of these quantities.
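The dependence on these missing quantities can be made concrete with a toy calculation. The sketch below is purely illustrative: the indicators, likelihoods, and prior are invented numbers, not drawn from any cited work, and the update assumes an independence between indicators that real indicators almost certainly violate. It shows that two equally unvalidated likelihood assignments for the very same observations yield very different posteriors.

```python
# Illustrative only: a naive Bayesian update over binary consciousness
# indicators, with made-up likelihoods, showing how strongly the posterior
# depends on quantities we currently have no empirical basis to estimate.

def posterior(prior, p_ind_given_conscious, p_ind_given_not, observed):
    """Update P(conscious) from binary indicator observations,
    assuming (unrealistically) that indicators are independent."""
    p, q = prior, 1.0 - prior
    for pc, pn, seen in zip(p_ind_given_conscious, p_ind_given_not, observed):
        if seen:
            p *= pc
            q *= pn
        else:
            p *= 1.0 - pc
            q *= 1.0 - pn
    return p / (p + q)

observed = [True, True, False]  # a system displays indicators 1 and 2 only

# Two equally unvalidated likelihood assignments for the same indicators:
optimistic = posterior(0.1, [0.9, 0.8, 0.7], [0.2, 0.3, 0.4], observed)
sceptical = posterior(0.1, [0.6, 0.5, 0.5], [0.5, 0.4, 0.45], observed)

print(round(optimistic, 2), round(sceptical, 2))  # prints: 0.4 0.13
```

The point is not the particular numbers but their sensitivity: with the likelihoods unmeasured, the same indicator profile supports a posterior anywhere from marginal to substantial, so the resulting credence reflects the analyst's assumptions rather than evidence about the system.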
Nor does combining multiple indicators resolve the issue: their interpretation still rests on patterns observed in biological systems, whose validity when exported to architectures differing fundamentally in substrate, evolutionary history, computational mode, and scale is precisely what remains unvalidated.

Third, no independent ground truth of artificial phenomenality exists against which indicator-based attributions could be checked. This deficit becomes most visible when one actually attempts to quantify artificial consciousness. The Digital Consciousness Model (Shiller et al. 2026) presents itself as a first systematic, probabilistic assessment of consciousness in AI systems. Yet its estimates are not calibrated against any independent empirical standard of phenomenality. Instead, it aggregates expert judgments about the presence of theory-derived indicators and weights them by expert ratings of the plausibility of rival theories. What is quantified, then, is not a probability anchored to empirical ground truth but a numerical representation of structured expert disagreement: in effect, a formalization of our uncertainty about consciousness rather than a measurement of it.

One might reply that none of these approaches ever claimed to deliver incontestable probabilities. True, but the issue is not certainty; it is calibration. A method can be uncertain yet well calibrated, or uncertain and badly miscalibrated. It is the latter possibility that threatens here.

A thought experiment sharpens the point. Suppose the indicator programme succeeds beyond expectation: rival theories partially merge into a single model, strongly confirmed in humans and consistent with clinical cases, altered states, and cross-species comparisons. Suppose further that this model allows artificial systems to be ranked on a continuum. What would we then have obtained?
An impressive structured correlation between certain functional properties and the cases where we already recognize consciousness in biological systems. But we would still not have established that artificial instantiation of those properties is accompanied by phenomenality. Even the maximal success of the current framework would deliver only an extraordinarily refined induction from already-admitted biological cases: a progressively finer functional cartography of the conditions correlated with consciousness in living systems, not yet a decisive reason to conclude that their artificial reinstantiation also produces experience.

This is where an alternative research strategy becomes attractive. A recent line of work, illustrated by Seth (2025) and by Milinkovic and Aru (2026), challenges the assumption that abstract computation suffices for consciousness. Seth argues that artificial consciousness becomes genuinely plausible only as systems become more brain-like and life-like. Milinkovic and Aru go further: biological computation differs from current digital AI through substrate-dependent, multiscale dynamics that are hybrid in the continuous-discrete sense. In biological systems, dendritic integration combines continuous membrane potentials with discrete spiking events, a single cortical neuron performing computations comparable to a multi-layer artificial network. These processes are scale-inseparable: molecular events constrain network dynamics while brain-wide oscillations modulate synaptic function, all anchored in metabolic constraints. Current von Neumann architectures and software-based neural networks do not instantiate this class of computation. The problem, on this view, is not that current AI systems lack a few functional modules; it is that their very mode of computation may be of the wrong type.
The strategic reorientation is then to ask not which abstract architectures deserve an unstable probability of consciousness, but which forms of physical, dynamic, and biological fidelity make consciousness itself more plausible.

The contrast in epistemic requirements is instructive. The dominant indicator programme requires a credible form of computational functionalism, a sufficiently stabilized theory to derive relevant indicators, and a calibration of those indicators when exported beyond the biological domain. A biologically grounded strategy rests on more modest presuppositions: only that consciousness is a material phenomenon and that it depends, in us, on biologically realized brain dynamics. The difficulty does not disappear; it changes in kind. One trades a potentially impassable conceptual barrier for a formidable problem of engineering and comparative neurobiology, but one that is more parsimonious in its assumptions.

This reorientation is not purely speculative. In 2022, Kagan et al. showed that cortical neurons cultured in vitro, interfaced with a silicon computing environment through real-time electrophysiological feedback, could learn to play the video game Pong, a proof of concept that biological substrates can be embedded in artificial control loops while retaining their native computational dynamics. At the connectome scale, Dorkenwald et al. (2024) published a complete synaptic-resolution wiring diagram of the adult Drosophila brain, approximately 139,000 neurons and over 50 million connections, enabling for the first time whole-brain simulations with biologically realistic connectivity. Neither achievement demonstrates consciousness. But together they illustrate a concrete alternative: not endlessly refining credences about digital architectures remote from the living, but progressively closing the gap through biologically plausible reinstantiation or emulation.
One way to render these considerations empirically tractable would be to compare indicator-based attributions across systems that progressively approximate biological dynamics, for example by testing whether biologically grounded architectures exhibit increasing convergence with established neural markers of conscious processing, such as perturbational complexity or large-scale integration, under comparable conditions. Such a programme would provide a first step toward cross-domain calibration of consciousness indicators.

The indicator programme is useful as a heuristic. It disciplines the discussion and moves us beyond naïve behaviourism. But as long as consciousness science remains theoretically fragmented, indicators are not independently validated, and no ground truth of artificial phenomenality is available, the programme cannot deliver robust probabilities of consciousness. The near-term priority should not be to assign seductive numbers to functionally impressive systems. It should be to acknowledge that these numbers are currently under-calibrated, and to redirect effort toward biologically grounded reinstantiation.

References

Aru J, Bachmann T, Singer W, Melloni L. Distilling the neural correlates of consciousness. Neurosci Biobehav Rev 2012;36:737–46.

Butlin P, Long R, Bayne T et al. Identifying indicators of consciousness in AI systems. Trends Cogn Sci 2025; doi:10.1016/j.tics.2025.10.011.

Cleeremans A, Mudrik L, Seth AK. Consciousness science: where are we, where are we going, and what if we get there? Front Sci 2025;3.

Dorkenwald S, Matsliah A, Sterling AR et al. Neuronal wiring diagram of an adult brain. Nature 2024;634:124–38.

Jones CR, Bergen BK. Large Language Models Pass the Turing Test. arXiv 2025;2503.23674.

Kagan BJ, Kitchen AC, Tran NT et al. In vitro neurons learn and exhibit sentience when embodied in a simulated game-world. Neuron 2022;110:3952–69.

Koch C, Massimini M, Boly M, Tononi G. Neural correlates of consciousness: progress and problems. Nat Rev Neurosci 2016;17:307–21.

Milinkovic B, Aru J. On biological and artificial consciousness: a case for biological computationalism. Neurosci Biobehav Rev 2026;181:106524.

Seth AK. Conscious artificial intelligence and biological naturalism. Behav Brain Sci 2025;1–42. doi:10.1017/S0140525X25000032.

Shiller D, Duffy L, Muñoz Morán A et al. Initial results of the Digital Consciousness Model. arXiv 2026;2601.17060.