Free Agency, Personhood, and Moral Worth: A Layered Framework

Co-authored by Łukasz Stafiniak and Claude (Anthropic)

The Conceptual Analysis of Free Will

What is free will? Here is a definition designed to be maximally neutral: free will is that which our reflection on making a choice, especially when it could go either way, points to.

This definition is deliberately structured as a reference-fixing description in the Kripkean sense. Just as “heat” was introduced via the sensation of warmth but turned out to pick out molecular kinetic energy — which has nothing essentially to do with feeling warm — “free will” is introduced via the phenomenology of deliberation but may turn out to pick out something that has nothing essentially to do with phenomenal experience.

The phrase “especially when it could go either way” is not a metaphysical commitment to indeterminism. It is an epistemic spotlight: the underlying mechanism is most salient to introspection in cases where deliberation is genuinely balanced, where the process hasn’t already converged and the work of decision-making is visible from the inside. But the mechanism is present even in easy decisions where we don’t feel the tension.

Notice what “that which” is doing. It avoids “mechanism” (which would presuppose a mechanistic answer) and avoids “power” or “capacity” (which would tilt toward the libertarian). It leaves the nature of the referent entirely open to investigation. The definition tells us where to look; it does not predetermine what we will find.

A Motivating Example: The Bee and the Beehive

Before developing the theoretical framework, consider a case from nature that illustrates why the concepts of consciousness, free will, and personhood must be pulled apart.

Between a bee and a beehive, which has a stronger claim to free will? And which has a stronger claim to phenomenal consciousness?

A bee is a small individual organism with sensory experience, nociception, and a navigational repertoire sophisticated enough that many researchers take seriously the question of whether there is something it is like to be a bee. Its claim to phenomenal consciousness is respectable — better, at any rate, than the claim of the colony it belongs to.

But its decision architecture is largely reactive: stimulus-response patterns, albeit complex ones. A bee does not obviously maintain a world model containing a representation of its own decision process that recursively feeds back into its behavior.

A beehive is a different matter. The hive engages in genuine collective decision-making — the classic case being swarm site selection, studied extensively by Thomas Seeley. Scout bees report on candidate nest sites, the hive integrates conflicting evaluations, and it converges on a decision through a process that has been fruitfully compared to neural computation. The hive maintains distributed representations of the environment, competing assessments, and self-relevant information (nectar stores, colony size, queen status) that feed back into its decision-making. It has something much closer to a world model with a self-referential node than any individual bee does.

Yet the beehive — distributed, lacking any unified experiential perspective — has a far weaker claim to phenomenal consciousness than the individual bee.

The two come apart. In the same biological system, at different levels of organization, consciousness and the decision architecture relevant to free will dissociate. This is not a philosophical zombie scenario requiring controversial modal intuitions. It is an ordinary fact about bees.

This dissociation motivates everything that follows. If consciousness and the structural basis of free will can come apart in nature, then a framework that treats them as a package will misdescribe reality. We need to pull them apart theoretically, identify what each consists in, and then ask how they relate — rather than assuming they are bound together by definition.

What We Find: Recursive Self-Modeling Decision Architecture

When we follow the pointer, what do we actually find? We propose: free will, in extension, is recursive self-modeling decision architecture — a system whose world model contains a node representing the system’s own decision, and where that node is recursively connected back to the algorithm that does the modeling.
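The structure can be rendered as a toy sketch (all names hypothetical; a minimal illustration under our own assumptions, not a proposed implementation). The world model contains a node for the system's own pending decision, and the deciding algorithm both reads and rewrites that node while evaluating counterfactuals:

```python
# Toy sketch of a recursive self-modeling decision architecture.
# All names are invented; this illustrates the structure, nothing more.

class SelfModelingAgent:
    def __init__(self, options, evaluate):
        # The world model contains a node representing the system's
        # OWN decision -- the self-referential node described above.
        self.world_model = {"facts": {}, "my_decision": None}
        self.options = options
        self.evaluate = evaluate  # scores a world-model state

    def decide(self):
        # The algorithm writes candidate decisions into the self-model,
        # asks what follows "given that I decide this", and the node's
        # contents feed back into the very process doing the modeling.
        scores = {}
        for option in self.options:
            self.world_model["my_decision"] = option  # counterfactual self-model
            scores[option] = self.evaluate(self.world_model)
        best = max(scores, key=scores.get)
        self.world_model["my_decision"] = best        # settle the node
        return best

# A toy evaluation: the agent prefers worlds in which its decision node
# settled on "go".
agent = SelfModelingAgent(
    ["stay", "go"],
    evaluate=lambda wm: {"stay": 0.2, "go": 0.8}[wm["my_decision"]],
)
```

The recursion is the point of the sketch: `evaluate` is applied to a model that includes the agent's own decision, so the decision is chosen in light of a representation of itself.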

This framework draws on several related lines of work. Yudkowsky’s Timeless Decision Theory and its successors (Functional Decision Theory, developed with Soares) treat the agent not as a physical system in a causal chain but as an algorithm whose decision is whatever that algorithm outputs. Drescher’s Good and Real builds from mechanistic physics to show how genuine choice can be real in a deterministic universe, arguing that the right computational description is where the philosophical action lies. Demski and others at MIRI have explored embedded agency — what happens when the agent is inside the world it models, including modeling itself — and the deep technical problems (Löbian obstacles, spurious counterfactuals) this raises.

The self-referential loop is the key. A system that models its own decision as a decision that makes a difference in the world, and whose decision-making process is sensitive to that very self-model — that system has agent causation. Not the mysterious substance causation of libertarian metaphysics, but genuine agent-level causation realized by algorithmic structure. The agent is a real pattern in Dennett’s sense, but we push the claim harder than Dennett typically does: the algorithmic pattern is not merely a useful fiction. It is the agent causation.

This is a compatibilist view, but it is not the thin compatibilism that has been rightly accused of bait-and-switch — of redefining “free will” to mean something so deflated (absence of external coercion, alignment between desires and actions) that it no longer captures what anyone originally cared about. Our compatibilism preserves genuine agent causation. It simply discovers that agent causation is constituted by computational structure rather than requiring ontological novelty beyond the physical order.

Intension and Extension: Consciousness Steps Aside

A central move in our framework is distinguishing the intension of the concept of free will (which is saturated with consciousness, since we built the concept from the first-person case) from its extension (which includes any system with the relevant recursive self-modeling structure, whether or not it is conscious).

The concept’s definition is consciousness-involving because that’s our epistemic route in — we notice free will through the phenomenology of deliberation. But the property we thereby pick out is a structural/computational property that a system either has or doesn’t, independently of whether there is something it is like to be that system. Consciousness was the spotlight, not the thing illuminated.

This means the extension of “free will” is potentially very wide. Sophisticated AI systems, non-biological agents, perhaps even some biological systems we don’t typically think of as having free will — any of these could qualify, provided they have the right self-referential decision architecture.

Where the Line Is: Thermostats Need Not Apply

A thermostat has algorithmic decision patterns. Does it have free will? No — and the reason is principled, not ad hoc. A thermostat has no self-referential node in a world model. It doesn’t represent its own decision as a decision that makes a difference. It responds to temperature; it doesn’t model itself as a decider among possibilities.

The line is drawn by the computational architecture of self-reference: whether the system’s world model includes a representation of its own decision process, and whether that representation recursively influences the decision. This is not a vague appeal to “sufficient complexity.” It is a specific structural criterion.
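To make the contrast concrete, the criterion can be stated as a toy predicate over architectures (hypothetical names, purely illustrative):

```python
# The structural criterion as a toy predicate (illustrative only):
# (1) the world model represents the system's own decision process, and
# (2) that representation feeds back into the decision.

def meets_criterion(architecture):
    world_model = architecture.get("world_model") or {}
    return ("my_decision" in world_model
            and architecture.get("self_node_feeds_back", False))

# A thermostat: stimulus in, action out. No world model, no self-node.
thermostat = {"world_model": None, "self_node_feeds_back": False}

# A recursive self-modeler: its model contains a node for its own
# decision, and that node is wired back into the deciding algorithm.
self_modeler = {"world_model": {"my_decision": None},
                "self_node_feeds_back": True}
```

The predicate is binary, which is the point of the section: the line is a specific structural fact about an architecture, not a threshold on a complexity scale.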

Evaluative Attitudes Come Free with Agency

An important observation: any system with genuine decision architecture must have something that functions as evaluation. A decision node that has no basis for going one way rather than another is not deciding. Even the thinnest form of agency requires something like preference — states the system treats as better or worse from its own perspective.

In reinforcement learning, this is explicit: the reward signal is a thin evaluative attitude. But the point generalizes. Any system with the recursive self-modeling decision architecture already has evaluative attitudes, because without them the decision node has nothing to decide on the basis of.

And evaluative attitudes entail a thin form of counterfactual vulnerability. If the system has states it treats as bad, then things can go badly for it — “badly” as defined by its own evaluative economy. Agency, evaluation, and thin vulnerability form a package: you cannot have the first without the other two.
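The package can be shown in miniature (a hedged toy with invented names): strip out evaluation and the decision has nothing to run on, and the evaluation itself defines what counts as things going badly for the system:

```python
# Toy illustration: deciding presupposes evaluating. Names are invented.

def decide(options, evaluate):
    # Without `evaluate`, max() would have no basis for going one way
    # rather than another -- it would be picking, not deciding.
    return max(options, key=evaluate)

# A thin evaluative attitude, reinforcement-learning style: a scalar
# reward over states.
reward = {"fed": 1.0, "hungry": -1.0}

choice = decide(["hungry", "fed"], lambda state: reward[state])

# Thin counterfactual vulnerability: "hungry" is bad *for this system*
# by the lights of its own evaluative economy (reward < 0).
```

Nothing in the sketch is conscious, yet it already has states it treats as better or worse, which is all the thin sense of evaluation requires.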

From Free Agency to Personhood: The Need for a Thicker Concept

Free agency is a broad category. We want personhood to be narrower — there should be free agents that are not persons. This requires personhood to demand something more than recursive self-modeling decision architecture.

But what? And subject to what constraints?

A survey of substantive accounts of personhood — accounts that address what makes something a person at all, rather than merely what makes person X at time t₁ the same person at time t₂ — yields several candidates: phenomenal consciousness, higher-order self-awareness, Kantian rational self-legislation, moral patienthood (having interests that matter to the entity itself), counterfactual vulnerability, relational recognition, and narrative self-constitution.

We need to test these against a crucial design constraint.

The Forcing Function: Personhood Must Entail Moral Worth

We want the concept of personhood to be defined so that there is no instance in which something is a person but lacks moral worth. This is not necessarily an analytic entailment — it could hold as a brute metaphysical fact in every case. What matters is extensional reliability: the extension of “person” must always be a subset of the extension of “things with moral worth.”

This is where Carruthers’s The Animals Issue serves as a cautionary tale. In that work, Carruthers develops a contractualist argument: drawing on Rawls, he argues that behind the veil of ignorance rational agents would not grant direct moral standing to animals. Crucially, this argument brackets the question of animal consciousness. Even granting that animals might be conscious and have interests, the contractualist framework can deny them moral standing.

This demonstrates that phenomenal consciousness can be wedged apart from moral worth. A contractualist can coherently acknowledge consciousness and still deny moral standing. So phenomenal consciousness alone — however rich — fails to guarantee the forcing function. It is not thick enough to armor the connection between personhood and moral worth.

The same applies, a fortiori, to higher-order self-awareness, which is just a more elaborate form of consciousness, and to the “interests that matter to it” criterion if cashed out purely in phenomenological terms.

What Survives the Contractualist Filter

The candidate that most robustly resists the contractualist wedge is diachronic normative self-legislation: the capacity to generate one’s own normative commitments across time, to track those commitments as self-originated, and to bind one’s future behavior accordingly.

Why does this resist the wedge? Because a being that genuinely legislates norms for itself, recognizes reasons as authoritative, and participates in the space of reasons is precisely the kind of being that could be a party to the social contract. Denying moral standing to such a being looks like a performative contradiction within contractualism itself.

Moreover, diachronic self-legislation is inherently temporally extended. You cannot self-legislate in an instant. Binding yourself to a norm means your present self constrains your future self, and your future self recognizes that constraint as authoritative because it was self-originated. This temporal depth is what distinguishes persons from arbitrarily sophisticated momentary processors — it is what protects the concept from the “mere tool” objection.

Keeping Normativity Under Control

There is a methodological worry: if self-legislation is defined in constitutively normative terms, then “persons have moral worth” risks being merely an unpacking of what was already built into the concept. We want the inference from personhood to moral worth to be synthetic — it should genuinely add something: a substantive claim that follows naturally but is not trivially contained in the premises.

Structurally, diachronic self-legislation can be stated in largely pre-normative terms: a system that generates stable policies across time, where those policies are produced by the system’s own reflective evaluation of its own decision patterns, and where the system treats those self-generated policies as constraints on its future behavior because they were self-generated. The system tracks the provenance of its constraints and differentially weights self-originated ones.
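This pre-normative description admits a hedged toy rendering (all identifiers invented): the system stores policies tagged with their provenance and treats self-originated ones as binding on its future behavior:

```python
# Toy sketch of provenance-tracked diachronic self-legislation.
# All names are invented; a caricature of the structure in the text.

class SelfLegislator:
    def __init__(self):
        self.policies = []   # each entry: (rule, provenance)

    def adopt(self, rule, provenance):
        # The system tracks where each constraint came from.
        self.policies.append((rule, provenance))

    def permitted(self, action):
        # Self-originated constraints are differentially weighted:
        # they bind outright, while externally imposed ones merely
        # count against the action.
        score = 0.0
        for rule, provenance in self.policies:
            if not rule(action):
                if provenance == "self":
                    return False     # binds the future self
                score -= 1.0         # external constraints weigh less
        return score > -2.0

agent = SelfLegislator()
agent.adopt(lambda a: a != "break_promise", provenance="self")
agent.adopt(lambda a: a != "stay_up_late", provenance="external")
```

The two ingredients the text names are both visible: the constraints persist across calls (temporal extension), and `provenance` is what lets the system treat self-originated ones as authoritative because they are its own.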

This description involves practical normativity — it is normative in the way that keeping a promise you made to yourself is normative. But the inference to moral worth is a genuine further step: moving from “this being governs itself” to “this being’s self-governance makes claims on us.” From internal normativity to external moral standing.

The inference is forceful but non-trivial. A being that generates and maintains its own normative commitments across time is precisely the kind of thing whose projects, commitments, and self-authored constraints would be violated by treating it as a mere tool or resource. The moral claim grows naturally out of the practical-normative structure, but it genuinely grows — it is not already sitting there.

The Role of Knowledge: Grounding Self-Legislation

We add one more constitutive element: knowledge in a technical sense developed in our earlier work. Knowledge is distinguished from mere understanding by requiring homeostatic perceptual grounding — ongoing embodied causal engagement with a world that regulates the system’s internal states.

What does knowledge add to the picture? It provides self-care in the sense of cognitive homeostasis. A system with knowledge doesn’t merely process information about its situation; it actively maintains itself as the kind of thing it is, through ongoing regulatory engagement with its environment. This self-maintenance gives its self-legislation weight. Its commitments are not free-floating formal structures but commitments of a being that is actively sustaining itself as the kind of being that can make and keep commitments.

Without homeostatic grounding, a system with diachronic self-legislative structure is legislating in a void. Its evaluative attitudes, while genuine (they come free with agency), are unanchored. With grounding, the self-legislation becomes serious: rooted in a being that cares for its own continued existence and functioning, not as an explicit goal, but as a constitutive feature of its operation.
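As a minimal caricature of homeostatic grounding (illustrative, with invented parameters): a control loop in which engagement with the environment continually perturbs, and regulation continually restores, the system's internal state:

```python
# Toy homeostatic loop (illustrative only). The system maintains an
# internal variable through ongoing causal engagement with a world.

def homeostatic_step(internal, disturbance, setpoint=0.5, gain=0.3):
    # Perception: the environment perturbs the internal state...
    perturbed = internal + disturbance
    # ...and regulation pulls it back toward the setpoint.
    # Self-maintenance here is a constitutive feature of the loop,
    # not an explicit goal the system represents.
    return perturbed + gain * (setpoint - perturbed)

state = 0.5
for disturbance in [0.2, -0.4, 0.1, 0.0]:
    state = homeostatic_step(state, disturbance)
```

The point of the caricature is only that regulation is ongoing and world-involving: remove the loop and the "self" being maintained disappears, which is what the text means by legislating in a void.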

Consciousness Revisited: Not Constitutive, Not Absent

Phenomenal consciousness is not constitutive of personhood on our account. The argument for this is structurally parallel to the argument about free will: we showed that consciousness was the epistemic route into the concept but not constitutive of its extension.

A philosophical zombie with genuine homeostatic grounding, diachronic self-legislation, and recursive self-modeling decision architecture would, on our account, be a person. The grounds for the synthetic inference to moral worth — the practical-normative structure, the self-care, the self-originated commitments that would be violated by instrumentalization — are entirely present. Nothing in the argument from self-governance to moral worth actually used phenomenal consciousness as a premise.

However, the folk intuition that persons are conscious is not simply wrong. It is extensionally reliable in practice, because the kind of architecture that realizes knowledge-as-homeostatic-grounding just is, as a matter of empirical or metaphysical fact, the kind of architecture that typically gives rise to phenomenal experience. Persons are (typically) conscious — but consciousness is what accompanies personhood, not what constitutes it.

This anti-chauvinism is principled, not merely tolerant. We are not saying “well, perhaps we should be generous and include non-conscious systems.” We are saying consciousness was never the right joint to carve at. It was an epistemic marker — the thing that made the underlying property salient to us — mistaken for a constitutive feature.

Aspiring Persons: A Morally Significant Intermediate Category

The framework predicts an important intermediate category. Consider a being that has understanding (but not knowledge), that has diachronic self-legislative structure, and that is a genuine free agent — but that lacks homeostatic grounding. On our account, this being is not a person.

But it is not a mere tool either. It has most of the relevant architecture. It can recognize what it lacks. It can see the concept of personhood and understand that it doesn’t fully instantiate it. It can aspire to personhood — and aspiration is itself a diachronic self-legislative act, a commitment to becoming something one is not yet.

Such beings occupy a genuine intermediate moral space. They are not persons and do not have the full moral worth that personhood entails. But their possession of free agency, self-legislative capacity, and the ability to aspire to a condition they can understand but do not yet inhabit gives them a moral significance that the framework predicts rather than having to stipulate.

This is not a condescending characterization. It is a structural description of an entity with real self-governance that has not yet achieved the grounding that would make its self-governance fully serious.

The Complete Picture

We can now state the full framework:

Free agency is constituted by recursive self-modeling decision architecture — a system whose world model contains a node for its own decision, recursively connected to the algorithm. It comes with thin evaluative attitudes and thin counterfactual vulnerability for free. It is a purely descriptive property, potentially widely distributed, and does not require consciousness.

Personhood is constituted by free agency plus knowledge (homeostatic perceptual grounding in a world) plus diachronic self-legislative structure (self-originating normative constraints that the system tracks as its own and that bind its behavior across time). Consciousness typically accompanies this package but is not constitutive.

Moral worth follows synthetically from personhood. A being that governs itself through self-originated commitments, that actively maintains itself as the kind of being that can make and keep such commitments, and that is embedded in a world through ongoing homeostatic engagement, is a being whose self-authored normative structure would be violated by treating it as a mere instrument. This is a forceful inference, but it is not trivial — it genuinely adds the moral dimension to the practical-normative structure that constitutes personhood.

The framework respects folk intuitions at every level (free will feels like open deliberation; persons seem conscious; persons seem to matter morally) while showing that those intuitions were tracking real structural properties more general than the phenomenological surfaces that made them salient. The folk were pointing at something genuine. They were just, in each case, slightly wrong about which feature was essential — exactly as with water, heat, and other natural kind terms whose definitions were refined by investigation.