Coherent extrapolated volition

Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI.[1] It describes an approach by which an artificial superintelligence (ASI) would act not according to humanity's current individual or collective preferences, but instead based on what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society.[2]

Concept

CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences.[3]
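CEV is a philosophical proposal rather than an algorithm, but the aggregation step it describes can be illustrated with a toy sketch. Everything in the sketch below is a hypothetical assumption introduced for illustration only: the sample preference data, the placeholder extrapolate step, and the equal-weight averaging into a single utility function are not part of Yudkowsky's proposal.

```python
# Toy illustration (not Yudkowsky's proposal): "extrapolate" each person's
# stated preferences under idealized conditions, then aggregate the results
# into a single utility function over candidate outcomes.

from statistics import mean

# Hypothetical stated preferences: the utility each person assigns to each outcome.
stated_preferences = {
    "alice": {"outcome_a": 0.2, "outcome_b": 0.9},
    "bob":   {"outcome_a": 0.7, "outcome_b": 0.1},
}

def extrapolate(preferences):
    """Stand-in for idealization ("if we knew more, thought faster").
    Here it is simply the identity function; in CEV this step is the
    hard, unspecified part."""
    return dict(preferences)

def coherent_utility(all_preferences):
    """Aggregate extrapolated preferences into one utility function by
    equal-weight averaging (one simplistic aggregation rule among many)."""
    extrapolated = {person: extrapolate(p) for person, p in all_preferences.items()}
    outcomes = {o for prefs in extrapolated.values() for o in prefs}
    return {o: mean(prefs.get(o, 0.0) for prefs in extrapolated.values())
            for o in outcomes}

if __name__ == "__main__":
    utility = coherent_utility(stated_preferences)
    print(utility, "->", max(utility, key=utility.get))
```

Every substantive question in CEV, such as how extrapolation would actually work, how individuals should be weighted, and what to do when extrapolated wishes diverge, is left open by any such sketch.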
Debate

Yudkowsky and Bostrom note that CEV has several notable properties. It is designed to be humane and self-correcting, capturing the source of human values instead of trying to enumerate them, and thereby avoiding the difficulty of laying down an explicit, fixed list of rules. It accommodates moral growth, preventing flawed current moral beliefs from being locked in. It limits the influence that a small group of programmers can have on what the ASI would value, which also reduces the incentive to race to build an ASI first. And it keeps humanity in charge of its own destiny.[3][1]

CEV also faces significant theoretical and practical challenges. Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose volition is taken into account): for example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if the extrapolation base includes only humans, the result risks being ungenerous toward other animals and digital minds; one possible remedy is a mechanism for expanding the extrapolation base.[3]

Variants and alternatives

A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to determine what is morally right, and to let it act accordingly. The two approaches can also be combined, for instance by having the ASI follow CEV except when doing so would be morally impermissible.[4]

A separate philosophical analysis examines CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.[5]

Yudkowsky's later view

Soon after publishing the idea in 2004, Yudkowsky himself came to describe the concept as outdated and warned against conflating it with a practical strategy for AI alignment. While CEV may serve as a philosophical ideal, he emphasized that real-world alignment mechanisms must grapple with greater complexity, including the difficulty of defining and implementing extrapolated values in a reliable way.[6]