1 Introduction

Artificial intelligence (AI) has transformed from a science fiction concept into a present-day reality with the potential to further human prosperity. Generative large language models, such as GPT-4, or generative image models, such as DALL-E, give a glimpse of the cognitive power that AI may yet achieve in the future. AI may one day help us tackle vital problems, such as the development of new drugs [135], and fight climate change through the development of renewable energy technologies and the optimization of resource use [37].

As AI research is fuelled by such successes and prospects, notions of emerging consciousness in artificial systems are gaining popularity among the public and scientists. While a former Google engineer asserted that a current artificial intelligence model, the Language Model for Dialogue Applications (LaMDA), is already conscious and capable of suffering [133] (see also [100, 139]), most artificial intelligence researchers (including at Google) firmly deny this claim, positing that we are far from achieving the creation of conscious artificial agents. Nonetheless, a clear majority of them do not rule out the possibility of artificial consciousness, and some even go as far as positing “sparks of artificial general intelligence” in GPT-4 [21]. In philosophy, however, questions of principle remain a subject of debate, with some scholars arguing for the multiple realizability of consciousness and others denying the very possibility of artificial consciousness (see e.g., [48, 54,55,56, 138]).

In this work we take the perspective of computational neuroscience to address some scientific, technical, and ethical aspects of this issue. The primary goal is to contribute to the ethical debate on how to deal with AI, by adding a specific computational neuroscience account to the field of consciousness research. Prominent AI researchers are warning society about the existential risks that AI poses to humanity [15, 32]. The possibility of consciousness arising in AI is also seriously considered in this community, including the danger of over- or under-attributing it to AI [24]. From the point of view of neuroscience, in turn, arguments emphasizing that the complexity of biological consciousness remains out of reach prevail [9, 110]. Here, we argue that despite this biological complexity, some forms of artificial consciousness might be possible, provided functional and observable criteria of consciousness are satisfied. In view of ethical questions, such as how many rights and how much protection artificial agents with these forms of consciousness should be granted in comparison to humans [45], we need guiding criteria for awareness and consciousness. To start with a simple functional question, we first ask how agents can learn to distinguish between themselves and the environment, and next, how they can learn to perceive themselves and the environment. Perception involves an additional instance that internally represents the “who” that looks at the content (a discriminator), and the “from where” the content is generated (from inside or outside).

Bearing in mind the ethical dimension, we distinguish between a sensory and affective component of perception, with the affective component referring to the engagement of the whole organism in processing information, originating in survival reactions to threat. Affective (or valenced) experiences would help artificial agents to align with human values and develop, for instance, empathy for humans and among themselves as a basis for respectful interactions.

1.1 Functional correlates of consciousness

A central notion we introduce is that of functional correlates of consciousness. So far, neuroscientific theories of consciousness have tried to identify the neural correlates directly [129]. Yet, since multiple brain areas are involved in representing consciousness, it may help to first structure their putative contributions in terms of functions. A functional characterization of the areas, in turn, requires an idea of how consciousness itself can be subdivided into functional sub-modules, and of how these map onto neuronal correlates.

Functional correlates go beyond indicators of consciousness [108]. An indicator, for instance, can be the degree of information complexity in the activity traces of a conscious brain [26], or the recurrent processing itself [82], without necessarily specifying its function in the context of consciousness. The recent advances in AI push computational functionalism into the foreground, claiming that with the construction of a sufficiently elaborate system that performs certain kinds of computations, the phenomenon of consciousness could emerge [24]. Following this hypothesis, we suggest a specific type of computation underlying the formation of awareness and eventually of consciousness. This involves learning to discriminate internally from externally generated activity in sensory areas, while representing the reality judgement in a dedicated neuronal population. We postulate that the assignment of the reality-label by this neuronal population ultimately grounds conscious sensory experience.

This leads us to re-evaluate the classical thought experiment of replacing each neuron within a human brain by an artificial counterpart, resulting in the conundrum that the artificial brain should be capable of expressing consciousness (e.g., [59, 96, 121]). In the spirit of searching for a functional correlate, we expand the thought experiment by considering a neuromorphic implant into the brain of a human infant suffering from a cortical disease, so that the normal motor, cognitive and perceptual functions can develop through a co-evolving chip, including awareness and consciousness. We call this chip a co-evolving neuromorphic twin (enTwin).

The neuromorphic blueprint for artificial consciousness allows us to propose an extension of the well-known Turing test for artificial intelligence, a test that has been superseded by recent developments in the field of AI. To go beyond previous proposals, we introduce a specific model of the neural and functional correlates of consciousness in biological brains, which we call the Conductor Model of Consciousness (CMoC), drawing an analogy between a meta-instance governing the information flow in the brain and the conductor of an orchestra. The conductor in this model represents a neuronal structure that gates cortical activity triggered from outside and from inside the brain. It helps the developing subject to learn to distinguish between externally and self-generated mental constructs, and to develop a notion of perception (sensory perception and proprio-perception), including awareness. The conductor is merely a distributed population of neurons involved in “teaching” the discrimination network. Although it may resemble the classical homunculus, it enters here in a purely mechanistic way, as a class of neurons taking over specific functions within a developing network.

1.2 From creating artificial consciousness to the ethical dilemma

If artificial agents develop cognitive abilities that rival or even surpass those of humans, paired with a form of consciousness, it becomes inevitable to consider granting them legal and political rights (e.g., [45, 62, 95, 111]). Doing so may result in instances where the rights of an artificial agent conflict with those of a human being. Such situations will pose complex ethical dilemmas, particularly when it becomes necessary to consider the potential prioritization of AI rights over those of a human. Additionally, if we equip machines with some form of consciousness, it becomes unavoidable to consider that such agents will potentially be able to experience pain and suffering (e.g., [3]). Such AI suffering would give rise to moral conflicts [94].

Introducing the enTwin and the CMoC gives us a handle for a new perspective on these ethical dilemmas. We consider the possible down-regulation or prevention of negative affective states (such as pain) in artificial agents, while still allowing the experience of positive ones and the possibility of empathy. As we argue, this ensures that creating possibly sentient artificial agents will not result in an unbounded increase in global suffering, and that there is no one-to-one competition between the moral rights of humans and the machines we create. Figure 1 captures schematically the flow of arguments in this work.

Fig. 1

Line of arguments on the possibility of artificial consciousness and how to deal with it. a The example of the evolving neuromorphic twin (enTwin) shows the difficulty of excluding any form of consciousness. b To test for consciousness, we suggest an extended Turing test that requires the identification of specific neuronal architectures described by the Conductor Model of Consciousness (CMoC) as functional and neuronal correlates of awareness (e.g., of somatic and affective components of pain). c Should a putative artificial consciousness share all features of human consciousness? The stronger the alignment, the better the expected mutual understanding, but also the greater the expected competition between the artificial and the human species. This Alignment Dilemma could be approached by what we introduce as the Human-AI Deal: it relieves conscious artificial agents of the affective component of pain, but gives humans priority before the law, allowing agents to negotiate for more rights through benevolent behaviour.

1.3 Phenomenal consciousness

The term “consciousness” is widely contested [40, 138]; in this paper we are concerned with phenomenal consciousness, which is considered a form of state-consciousness, i.e., a property attributed to certain mental states. If it “feels like something” [98] to be in a certain mental state, this state is considered phenomenally conscious [25]. Therefore, phenomenal consciousness is often described as the subjective aspect of consciousness that involves experience [18, 31]. We consider an internal state of a system to be a mental state if it is intentional with respect to a representation (of a certain state of affairs) that is available in that system. That is, if an internal state “is about”, or “refers to”, another object, it is a mental state.

An agent is understood as a system that acts with an intention (imposed internally or externally) upon its environment, e.g., a bee that collects honey, a robot that performs a task in a car factory, or a personalized large language model that suggests email replies in your spirit and style. By contrast, a flood that damages a road or a black hole that swallows a star is not considered an agent. In other words, we understand agents as systems that exhibit goal-directed behaviour. That is, they can formulate or represent a basic account of what their goals are and by which actions these can be reached [13, 107].

Further, we distinguish between sensory and affective aspects of phenomenal consciousness. The sensory aspect refers to the subjective experience of sensory stimuli, such as sight, sound, touch, taste, and smell. E.g., what does it feel like to be in a mental state that refers to a red object? The affective aspect, on the other hand, refers to the subjective experience of emotions, feelings, and moods. E.g., what does the mental state that refers to pain in your toes feel like?

2 Artificial consciousness at the dawn of the neuromorphic era

2.1 Re-evaluating criticisms against artificial consciousness

Sceptics of artificial consciousness point out that the analogy between digital computers and human brains breaks down at many points. Among the things that distinguish machines from (healthy) brains, the following appear to be the most relevant: (i) the lack of embodiment, i.e., of participation in the physical world, (ii) the lack of a centralised “I”, (iii) the lack of evolutionary pressures and feedback, (iv) being based on different physical substrates that behave differently, (v) the use of the von Neumann architecture, and (vi) the digital representation of information. Authors such as Edelman and Tononi [48, 54, 55] have argued that these qualitative differences between the inner workings of computers and brains speak against the very possibility of artificial consciousness emerging.

At first glance, many of these criticisms seem reasonable. If we consider embodiment as a relevant aspect of conscious experience, it is the case that most computers have no way to affect their environment in order to get a better grasp of it, whether in the sense of perceiving physical space or of literally grasping physical objects, and of associated notions such as causality. Although robots are reaching a degree of development where this point becomes moot, a large part of the discussion on AI is happening at the level of AI software, where the criticism seems appropriate. Such a lack of embodiment constrains the possibilities to develop self-awareness, as there is no clear separation between an “I” and the world, and the interactions between the machine and the world are ultimately initiated by the machine users (humans). With respect to the architecture, in stark contrast to how the brain works, the usual von Neumann architecture divides information processing between a central processing unit and an external memory, a distinction that may preclude synergies between memory and processing. The difference between “hardware” and “software” is much less clear-cut in the case of biological brains, where brain activity is known to change the strength of the connections between neurons, a key element of both the hardware and the software. Likewise, brains do not appear to act like digital machines that run programs sequentially. There is massive parallelism of information processing in the brain, and the architecture of neurons is very different from digital technology.

The lack of a centralized “I” for artificial agents motivates some of the ideas that we present further down. The basic issue is that, even for contemporary robots which have an inner representation of their state within the environment, it is difficult to argue for the presence of a sense of self analogous to the one that humans and some other animals have. Even if there is a higher-order module overviewing the state of the system, there is no guarantee that this module will have a notion of “self”. This has led critics of AI to point out what seems a vicious regress according to which, to have a sense of self, we need a subset of the artificial brain to already possess such a sense [142]. In neuroscience, it is increasingly clear that the “I” is constructed from multiple, intertwined notions of the self, including body ownership, the use of efference copies to distinguish self-induced from externally caused sensory changes, multisensory integration, agency, episodic life history and social identity (see e.g., [93, 106]).

If it were only about the “I”, we could replicate what natural selection brought about in humans (and probably in behaviourally evolved animals). For example, we could equip future machines with a “self” module that oversees and, to a degree, controls its own functioning, while having a pre-wired notion of being itself. Below (Sect. 3), we relate this to a “conductor” module that is arguably present in human brains and might also be implemented in artificial machines. We argue that such a module (or set of modules) appeared at some point during the phylogenetic, as well as ontogenetic, development of our brains [67, 87]. Reengineering these fruits of evolution would turn the problem of a centralized “I” into a technological one, and no longer a matter of principle, in the sense that we have a blueprint (the conductor model) for its implementation. What the artificial “I” will “feel like” for the agent remains up for debate.

This relates to claims that evolution is a prerequisite for the development of consciousness. Our brains and our consciousness are the result of millions of years of evolution, a complex process featuring a plenitude of feedback loops of interactions between our ancestors and their environment. Artificial agents do not undergo such processes, but nothing prevents us from designing these systems as if they were the result of evolution. We could create these systems as if they had an evolutionary history. This retroactively embedded history could in fact be our own history as a species that evolved consciousness, including the embodied traces of evolutionary processes: the kinds of limbs best adapted to bipedal locomotion, for example, or the “innate” sense of self or centralized “I”.

2.2 Architecture and the substrate problem

Criticism regarding architecture and substrate can be addressed by turning to recent advances in neuromorphic hardware. Neuromorphic engineering aims to build hardware that mimics the brain to harness its extreme parallelism and asynchronous nature for power efficiency and computing speed [5, 68, 89, 117, 119]. This multidisciplinary area of research takes direct inspiration from the structure and operations of the brain and its basic units to develop new kinds of hardware. Neuromorphic computing can be realized at the hardware level on a wide diversity of substrates, such as transistors, memristors, spintronic memories, and threshold switches, among others.

So far, work on neuromorphic designs has focussed on replicating the analogue nature of biological computation and on emulating the spike-based information exchange between neurons that occurs in the brain. Today, neuromorphic chips are not fully analogue, but an increasing portion of their subcomponents are (see, e.g., [105]), and the aim of fully analogue chips seems attainable. Additionally, there is a line of research on implementing this hardware on flexible arrays and flexible chips that can be implanted within biological tissues [74] and that are effectively scalable [35]. On top of this, much effort has been invested in emulating learning and memory through plastic synaptic weights between neurons, as in biological brains. Interestingly, at least in the existing silicon-based neuromorphic hardware, these model neurons have the capacity to operate orders of magnitude faster than their biological counterparts [17, 61], something that will be relevant in Sect. 4 below.

Thus, neuromorphic hardware offers compelling answers to the traditional objections concerning substrate dependency [48, 54, 55]. According to these criticisms, it is not even possible to functionally replace a single neuron (let alone a brain) with an artificial counterpart, as the behaviour of carbon-based analogue neurons is too different from that of silicon-based chips. The next generation of flexible carbon neuromorphic substrates [46, 143, 144] could be moulded to emulate biological neurons to a degree that makes it very difficult to sustain any principled opposition to artificial neurons, or indeed artificial brains. In summary, recent techniques and developments have closed the door on most arguments for a principled impossibility of artificial consciousness. While the classical von Neumann computer architecture is arguably inadequate for emulating consciousness, this architecture is no longer the only game in town. In what follows we develop these ideas in detail.

2.3 A co-evolving neuromorphic twin

A popular way to explore different scenarios for AI and consciousness is by means of thought experiments [20], which have introduced us to notions such as philosophical “zombies” [76], Chinese rooms [121], and Mary’s lockdown room [69]. Our purpose is not to present a novel thought experiment. We instead ground existing thought experiments dealing with the feasibility of systems that closely emulate the human brain in many aspects relevant for consciousness, answering the criticisms of artificial consciousness sketched in the previous section. These experiments suggest that consciousness could be realized in various substrates, provided the functionality of its constituent parts, such as neurons, is preserved; most famously, by simply replacing each biological neuron in a brain with an artificial counterpart, as we describe in more detail below (see [28, 29, 96, 121]).

The classical neural replacement scenario of Morowitz [96] has been deemed implausible by authors such as [54,55,56, 114] because of the substrate problem. If computer chips are radically different from neurons, then the very premise of them supplanting neurons on a one-by-one basis in a human brain is flawed, because not even the first neuron can be faithfully replaced. To overcome these criticisms, we propose a neuromorphic version of the scenario, grounded in current neuroscience and as concrete and detailed as possible. In other words, we present a revised version of the thought experiment, viewed through the lens of neuromorphic engineering. We call it the evolving neuromorphic twin (enTwin), a specific implementation that is realistic considering present-day technology and that complements the more abstract philosophical insights about neural replacement scenarios.

Assume a human baby is born with a cerebral ataxia syndrome that is linked to cortical degeneration [36], resulting in motor disabilities affecting, among other things, articulation and speech. Assume it is possible to help the child with an evolving neuromorphic twin (see Fig. 2). The enTwin is implemented in soft bioelectronic interfaces that can be implanted in human bodies [90], and even in human brains [142]. The enTwin is fed by tactile and proprioceptive information from the extremities, and by electromyography of the speech muscles. To prospectively assist speech formation, it is also supplied with visual and auditory information through latest-generation smart glasses and active ear plugs [14]. For motor and speech assistance it is coupled with muscle stimulation devices. The hypothetical chip is built on flexible neuromorphic arrays with learnable synaptic connectivity and an architecture as outlined below. Blood sugar is measured to modulate the energy supply of the chip, which itself is implemented using neuromorphic technology.

Fig. 2

The evolving neuromorphic twin (enTwin) thought experiment. a A neuromorphic chip, able to communicate with biological neurons, is used to help an infant to achieve normal sensorimotor functions. The neuromorphic chip can be implanted inside the body and brain of a human and learns to grow and adjust its synapses in the same way as biological neurons do. These chips are built of soft bioelectronic interfaces. b With time, the neuromorphic chips become trained to achieve higher-order functions, including functions that pertain to conscious experiences, such as the perception of sensations and the associated feelings that they invoke. c By training many such systems in different human patients and integrating them, a fully artificial brain can be constructed and embedded within an artificial body. If every piece of such a brain can collaborate in the genesis of conscious experience in human patients, then there is no reason why the fully artificial enTwin would not also be able to develop consciousness.

The chip interprets the sensory information and the host’s internal state online, and with this drives a language module with functionality comparable to LaMDA or GPT-4, together with various motor modules. The modules learn to decipher, recreate and represent the intended articulation and motor activity of the growing individual, and guide and support them in improving both articulation and motor execution. For performance and survivability reasons, the enTwin could also try to predict and recreate the (representation of) feelings of its host [58], interfaced with the corresponding brain regions. The representation of the postulated subject’s feelings in the neuromorphic hardware offers analogues of: (i) the amygdala, anterior cingulate cortex, orbitofrontal cortex, insular cortex and central thalamus, among other regions, to process the various components of feelings such as pain, (ii) the sensory and motor cortices to represent the sensorimotor transforms, (iii) Wernicke’s and Broca’s areas to represent language understanding and articulation, and (iv) thalamic and brainstem nuclei to represent different wakefulness states [57, 102]. Consequently, an enTwin would mimic the one example we know of where consciousness developed (humans and other mammals).

Once this integrated enTwin is working within a host, its information could be copied to a database, helping to design an enTwin embedded within an artificial body—a neuromorphic robot. This robot would be an embodied entity, with components that are (externally) evolved. Here, “evolution” stands for many things simultaneously: evolution in the sense that its brain will be the result of co-evolution with human hosts, evolution in that we are copying the results of biological evolutionary history into both the artificial brain and body, and of course evolution as the result of iterative technological improvements in things like sensors, limb articulations, materials, and so on.

The timeline for the development of such neuromorphic robots is unknown, as various uncertainties remain, including the ethical question of how far medical aids should interfere with our organs, and specifically with the brain. Nonetheless, it would be hard to argue against their feasibility, just by looking at the state of contemporary neuromorphic research. As such, enTwins flesh out many of the intuitions behind previous thought experiments about artificial consciousness.

3 The conductor model of consciousness (CMoC)

To judge the possibility of a consciousness counterpart in our enTwin, and to infer possible criteria for neural correlates of consciousness, it is helpful to focus on some key ingredients our enTwin is likely composed of. As opposed to existing neuronal theories of consciousness (for reviews see [120], or [129]), the conductor model we propose focusses on network architectures and their functional interpretations that, as we argue, are likely involved in producing phenomenal consciousness.

Given the reality monitoring areas in the brain that judge whether activity in sensory areas is generated from inside or originates from outside [124], we argue that the brain contains the crucial ingredients to implement a form of Generative Adversarial Networks (GANs, [60]). GANs have proven to be cornerstones of powerful network architectures for image recognition, language processing, and image-to-language translation [4]. Generative networks are likely implicated in mental imagery, and discriminative networks must then exist that tell imagined sensory activity apart from externally induced sensory activity. These networks need to be trained, and it is reasonable to assume that plasticity mechanisms similar to those in the technical version of GANs are involved.

GANs comprise separate networks: a generative network G that internally generates fake sensory information, an encoding network E that interprets sensory activity (regardless of whether it is triggered externally or generated internally), and a discriminative network D that judges whether a particular sensory activity is produced internally or externally (Fig. 3). In addition, we postulate a conductor network that orchestrates the information flow between G, E and D, and the type of synaptic plasticity within these networks (plasticity on, off, or inverted, see [44]). Based on the feedback from the discriminator network (which may reveal the fake/imagined nature of the sensory representation), the generative network can improve itself to produce more realistic sensory activity. Additionally, when the sensory activity is internally produced by the generative network, the encoding network can learn to reproduce this activity. It has been postulated that some forms of GANs are implemented in the human brain [58] and support creative dreaming during rapid eye movement (REM) sleep [44].
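For illustration only, the E/G/D triad can be sketched with conventional machine-learning components (here PyTorch); the layer sizes and variable names are arbitrary placeholders, and the sketch makes no claim about a biologically plausible implementation.

```python
# Illustrative sketch of the E/G/D triad with conventional deep-learning modules.
# Layer sizes are arbitrary placeholders; no claim of biological plausibility.
import torch
import torch.nn as nn

SENSORY_DIM, LATENT_DIM = 784, 32   # e.g., a flattened image and a small latent code

# Encoder E: sensory activity -> abstract (semantic) representation
E = nn.Sequential(nn.Linear(SENSORY_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))

# Generator G: internal representation -> synthetic ("imagined") sensory activity
G = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, SENSORY_DIM))

# Discriminator D: sensory activity -> probability of being externally generated
D = nn.Sequential(nn.Linear(SENSORY_DIM, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

x_external = torch.rand(16, SENSORY_DIM)       # stand-in for externally triggered input
x_internal = G(torch.randn(16, LATENT_DIM))    # internally generated ("imagined") activity

p_ext = D(x_external)   # to be driven towards 1 ("external")
p_int = D(x_internal)   # to be driven towards 0 ("internal"), except under adversarial teaching
```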

Fig. 3

The Conductor Model of Consciousness (CMoC): the implementation of elaborate forms of learning requires a network instance that organizes the flow of information to keep teacher and student signals apart. Shown are possible ingredients for consciousness to evolve in our enTwin: an encoding (E) and a generative (G) network, together with a discriminator network (D) that judges whether the sensory activity originates from outside (External) or inside (Internal), just as in GANs. The faded background represents the neural circuitry. A conductor network (C) selects the contents of the encoding and generative networks that match and broadcasts these for further processing.

The Conductor Model of Consciousness (CMoC) emphasizes the orchestration of the information flow between encoding network, generative network, discriminator network, and their training (Fig. 3). Learning is about improving a behaviour, and the desired activity is implicitly or explicitly declared as activity to be reproduced. The conductor model makes the distinction between a teacher and student signal explicit by postulating a network instance (the conductor network C) that gates the information flow for teacher and student signals to adapt the student signal. This structure is also present in self-supervised learning, where the teacher is formed by other, more informed parts of the brain that “nudge” the student network [137]. Reality monitoring areas [124] are part of the cortical GANs as suggested in Gershman [58] and may form the teacher instance for the discriminator network. The postulate is that implementing powerful forms of self-supervised learning (such as GANs) in autonomously running networks requires a conductor submodule that is a precursor of a consciousness-enabling network.

GANs intrinsically require a meta-level conductor that orchestrates the information flow. The conductor signals whether the GAN is in the inference or the learning mode, and provides the information used for learning whether the activity represented in some higher cortical state is generated from internal sources or by external stimuli. Such a conductor must itself be implemented in a submodule of the brain, and it can act on a hierarchy of cortical representations. Architecturally, this role of a conductor resembles the functionality of prefrontal and anterior cingulate areas [124], but it may also be taken over by the gating mechanisms of cortico-thalamic loops via higher-order thalamic nuclei [131, 141], as elaborated below. When acting on the visual stream, the conductor may signal “this activity represents a certain visual object and is generated from inside”, for instance. When acting on more abstract object representations, such as our own identity, the conductor may signal: “this activity represents myself and is generated from inside” (see Fig. 3).

3.1 The CMoC extends the Helmholtz view of perception by creative processing

Originating from a computational model to improve the cortical representation of sensory signals [43], the conductor module can be seen as an evolutionary product of actively generating synthetic sensory activity and discriminating it against real sensory input. Awareness, in this view, arises as a by-product, sparked by the need for a meta-level structure that teaches the distinction between different states of sensory activity. Figure 4 shows a progression of ideas on perception and awareness. Helmholtz’s insights on the nature of perception (see, e.g., [64] [original work published 1867]) have served as a guiding light for neuroscience, and they remain relevant to this day [27]. The modern theory of predictive coding, including the ideas of [33, 34, 52, 127, 128], can be seen as a refinement and extension of Helmholtz’s active sensing and inference. The conductor model takes these building blocks and adds additional structure in the form of a GAN architecture and the conductor module. The encoder and generator (E, G) are part of the formalization captured by the Helmholtz machine [39], extending the unidirectional flow from objects to representations by a generator mapping from the representation back to sensory activity. Helmholtz machines are able to extract semantic structure from sensory data by trying to recreate the sensory data from the internal representation. The CMoC also does this, but adds specific structures that emulate the way perception and awareness work phenomenologically (Fig. 4c, d). Through the discriminator and adversarial learning, the generator is able to creatively produce new sensory activities that potentially integrate into reality, a procedure that we call creative coding to emphasize the step beyond predictive coding. With creative coding (or, more generally, creative processing) comes the necessity for additional meta-structures in the brain that may produce phenomena akin to awareness [44].

Fig. 4

The Conductor Model of Consciousness (CMoC) extends the functional catalogue of predictive coding and perception by a module for creative coding and awareness. a The intuition that perception emerges (symbolized by a homunculus) from processing sensory input at the end of an encoding pathway (E). b Helmholtz (building upon ideas of Kant) argued that perception is the product of an active inference that involves predictions of what is expected to be sensed (the generator G), even when we are unconscious of this inference process. Predictive processing, with a representation of the self (sketch of a human) that monitors the outcome of its own actions, and prediction errors broadcast to a global neuronal workspace, remains a key ingredient of current theories of consciousness (e.g., [33, 34, 91, 126, 129]). c The CMoC adds an adversarial architecture to the previously suggested hierarchy of active inference, with the additional function of a conductor that orchestrates the adversarial learning and the creative (as opposed to only predictive) processing. d The conductor represents a functional correlate of awareness, emphasizing the structural mapping from the CMoC to phenomenal consciousness (here restricted to awareness). The specific functions within the CMoC generate predictions about possible neural or functional correlates of consciousness/awareness.

Functional correlates of awareness within the CMoC can be drawn at multiple levels, captured by a mapping of conductor properties to awareness properties (Fig. 4d): (i) the conductor signalling to the discriminator whether sensory activity has to be judged as real or imagined, versus awareness telling the subject to experience sensory activity as real or imagined; (ii) the conductor acting as a teacher for the discriminator network D, versus awareness directing the learning to specific contents; (iii) the conductor modulating plasticity depending on whether sensory activity is produced from inside or outside, versus awareness informing the agent about the sensory activity being real or imagined; and finally (iv) the conductor selecting contents from the different levels of the encoder hierarchy to be learned as real versus imagined (ranging from sensations to global states such as affects or the self). The analogy between the conductor and awareness can be extended to distinguish the various states of awareness, namely (a) wakefulness, (b) sleep, (c) dreaming and (d) lucid dreaming [44]. While a match between the conductor and the discriminator (C↔D) indicates state-awareness, a match between the generator and the encoder (G↔E) indicates content-awareness (see the neuronal implementation below).

3.2 The conductor as a teacher to learn organizing the inner world of autonomous agents

With the functional and neuronal criteria of the CMoC, we can render the architectural constraints of implementing phenomenal consciousness more precisely. In line with other suggestions (e.g., [30]), we postulate that consciousness introduces its own quality of existence, one that is neither physical nor abstract, but uniquely experienced by the agent to whom it is assigned. The conditions for this private quality of “consciousness” to appear in an agent are given, according to the Conductor Model of Consciousness, by the following three requirements (sketched schematically after the list). A conscious agent, capable of sensing and interacting with the external world,

(CMoC-1) has a representation of the external world (the encoder network), a representation of an inner world (the generative network) and can act on both the external and internal world representations (e.g., via discriminative networks), besides acting on the external world itself.

(CMoC-2) has a mechanism, the conductor, that allows it to tell whether it is acting on the internal or the external world representation.

(CMoC-3) is equipped with its own internal sense of self associated with the conductor, modulated by global affective components.
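For illustration only, these three requirements can be read as a minimal interface that a candidate agent would need to expose. The following sketch is schematic; all names and signatures are hypothetical and are not meant as a specification of the CMoC.

```python
# Schematic reading of CMoC-1 to CMoC-3 as a hypothetical agent interface.
# Names and signatures are illustrative only.
from typing import Protocol, Literal

Source = Literal["internal", "external"]

class ConsciousAgentCandidate(Protocol):
    # CMoC-1: representations of the external and the internal world, plus the
    # ability to act on both representations and on the external world itself.
    def encode(self, sensory_input): ...                      # external-world representation (E)
    def generate(self, latent_state): ...                     # internal-world representation (G)
    def discriminate(self, sensory_activity) -> Source: ...   # acting on the representations (D)
    def act(self, motor_command): ...                         # acting on the external world

    # CMoC-2: the conductor tells whether the agent currently operates on the
    # internal or the external world representation.
    def conductor_state(self) -> Source: ...

    # CMoC-3: an internal sense of self tied to the conductor and modulated by
    # global affective components (e.g., an "existential threat" signal).
    def self_model(self): ...
    def affective_state(self) -> float: ...
```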

Notice that CMoC-1 introduces a representation of the inner world in addition to the representation of the external world. One may argue that an internal world is simply a model of the external world. However, the internal world of a conscious agent is different from a mere internal representation of the external world. For example, body interoception (the capacity to sense the internal state of the body) can be seen as part of the internal world that goes beyond an external world model. What we posit here is that consciousness requires more structure within an internal world model than only serving as a model of the external world. It is the distinction between a world (internal or external) and a world model that yields an additional level of abstraction in terms of meta-information: besides informing about the content of the internal or external model, respectively, this addition also signals “who” generates the content, and how it should be processed. This is the function assigned to the conductor network, an instance on top of the internal model of the outer world.

Apart from the interface at the sensory areas, the adversarial learning mechanism to create the inner world also goes beyond latent representations of the outer world and surfing in the inner world [33, 34]. In humans, it may be the factor driving genuine innovations in culture and other areas that are not only novel but also useful, in that they integrate into the existing world, a feature that is internally tested by the discriminator in the CMoC. Consciousness is not only about enabling an active sensing of the environment by means of actions and predictions, and not only about creating self-awareness. It is in the first place about offering a neuronal and functional infrastructure for learning to disentangle inside- from outside-triggered brain activity, while at the same time trying to match these activities. We postulate that adversarial learning to create an inner world model (potentially including interoception) along these principles equally exists in other animals. Adversarial learning comes along with a conductor module that labels information related to the internal and external world and governs the information flow between the representations of these worlds (CMoC-2). The conductor can also select and prioritize some sensory information over other information and has the power to impose a state of emergency (CMoC-3). The module might use a short-cut circuit to avoid harm, which we can associate with experiences such as sensing pain.

Mental scenarios we can think of may never be executed in the external world, and in the internal world (i.e., an imagined world) we can generate new scenarios that so far have never existed in the external world (nor in its representation). An internal world can be richer in possibilities and structure than the external world. This richness goes beyond the mere ability of mind wandering and counterfactual reasoning that includes one’s own actions (see e.g., [53]). The CMoC introduces the neuronal learning apparatus that then allows for mind wandering and the like. Providing the substrate for learning the meta-structures is the crucial addition here. The hypothesis underlying the CMoC is that with the learning apparatus for the state-distinctions, which includes providing a learning signal for differentiating “real” from “imagined”, also comes a novel experience for the agent of becoming aware of “real” or “imagined”. The teaching signal of the conductor is more effective if it is dominant, which is particularly important when learning the state of life-threatening affects.

3.3 Consciousness as conductor-mediated private experience emerging from functionality

The cortical conductor allows us to further home in on the question of phenomenal consciousness. The conductor that oversees and gates the various information streams is, on the materialist level, a network with global hub properties: a sub-module in the network that integrates information from the whole. This conductor module is not identified with the agent itself, which may have an additional embodiment, nor is it identified with the representation of the self. The self may be placed at the top of the encoder hierarchy (E), out of which actions are generated (G). The conductor C instead is a meta-instance that organizes the information flow, including the information flow from and towards the representation of the self. It plays a central role for its owner, the agent, and may ground higher-order self-awareness. The conductor may manifest as a very private sense that represents a kind of sensory modality for the owner’s inner world, be it the awareness of a stimulus or the awareness of the self. The conductor-mediated inner sense only emerges and exists within this individual, is not accessible from the external world, and in fact disappears when seen from the external physical world.

To provide another structural analogy showing how an additional ontological dimension may emerge within an inner world, we look into the mathematics of numbers. At some point in the history of mathematics the imaginary unit i=\(\sqrt{-1}\) “emerged”. Within the world of real numbers, i does not exist, as there is no real number x whose square \(x^{2}\) equals −1. From the perspective of the ontology of real numbers, i adds a new dimension of being (an “imaginary existence”, in analogy to the “quale”), attached “privately” to i, and not shared by the real numbers. We can omit the ontological question of i, while still describing its “phenomenology”. The imaginary unit satisfies \(i^{2}=-1\), and this is the only relationship required to build a theory of complex numbers. The ontological dimension of i dissolves within the larger embedding space of complex numbers, where both real and imaginary numbers are simply elements of the wider set of complex numbers, mathematically characterized as a field. To apply this analogy to our problem: what i is to the world of real numbers, consciousness is to the world of physics. Neither exists in its world: i does not exist as a real number, and consciousness does not exist as a physical object. But both help to expand and complete their respective worlds. Extending the real numbers by i makes them complete in the sense that now all algebraic equations (like \(x^{2}=-1\)) have a solution. The imaginary and real numbers are both independent dimensions of the complex numbers. Extending the physical world by consciousness could make this “complete” as well, with physics and consciousness as independent dimensions.
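For reference, the textbook facts the analogy relies on can be stated compactly (this is standard algebra, not part of the model):
\[
\mathbb{C} \;\cong\; \mathbb{R}[x]/(x^{2}+1), \qquad i^{2} = -1, \qquad z = a + b\,i \quad (a, b \in \mathbb{R}),
\]
and, by the fundamental theorem of algebra, every non-constant polynomial with complex coefficients has a root in \(\mathbb{C}\), whereas the real numbers alone lack this closure (e.g., \(x^{2}=-1\) has no real solution).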

The main point of the analogy is that the “imaginary existence” emerges from pure functionality, here the functionality of solving algebraic equations. It is a private feature of imaginary numbers as opposed to real numbers but disappears in the wider and abstract perspective of complex numbers. Likewise, consciousness may emerge from the functionality of learning to internally produce sensory activity as close as possible to the one externally produced.

3.4 The neuronal implementation of the CMoC includes state- and content-awareness

We next show how the CMoC can be implemented in neuronal structures. The CMoC postulates the existence of an encoding, a generative, and a discriminator network, together with a conductor module that orchestrates the information flow among them, turns plasticity on and off, and determines which information should be considered as real or imagined. In humans and animals this conductor is active both during wakefulness and during sleep. The conductor network is a prerequisite to train the generative network, e.g., during REM sleep through adversarial dreaming [44]. During an adversarial REM dream, conductor neurons adversarially tell the discriminator neurons that what they see is “real”, giving the dreamer the incorrect feeling of experiencing reality. The functional reason for this mistaken reality feeling is to test the dream against reality. When dreaming of an approaching lion, we should not learn to go and caress it, but instead learn to hide in a safe place. Technically, when imposing the reality target on a discriminator that by itself would have unmasked the sensory activity as dreamed, an error signal is produced at the output of the discriminator network. This error is backpropagated to the generator network, telling it where to improve the generated sensory activity so that the discriminator will next time indeed judge it as real. The error backpropagation itself can be implemented in neuronal terms [122]. Hence, the adversarial teaching hijacks the error-backpropagation circuitry of the discriminator to provide the generator with a helpful indication of how to generate more reality-like activity.

In order to also allow the discriminator to improve its job of correctly telling internally and externally induced activity apart, its plasticity needs to be inverted while it is given the adversarial target “real” during the dream (Fig. 4d). Plasticity of the generator network in the REM dream keeps its original sign so that it can in fact correct for the error delivered by the discriminator. The encoder may turn its plasticity off as it is not provided with helpful information. During wakefulness it is the other way around: plasticity in the generator is turned off, but plasticity in the encoder is turned on [44].
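For illustration, this wake/dream plasticity schedule can be mimicked with conventional machine-learning updates. The sketch below is a schematic stand-in with arbitrary network sizes and learning rates; it is not the neuronal implementation of [44], and the inverted plasticity of the discriminator is rendered simply as a gradient ascent step.

```python
# Schematic wake/REM plasticity schedule for the E/G/D triad, following the
# conductor logic described in the text. Conventional deep-learning stand-in
# with arbitrary sizes and learning rates; not a model of cortical circuits.
import torch
import torch.nn as nn

S, Z = 784, 32
E = nn.Sequential(nn.Linear(S, Z))                 # encoder
G = nn.Sequential(nn.Linear(Z, S))                 # generator
D = nn.Sequential(nn.Linear(S, 1), nn.Sigmoid())   # discriminator: p("external")
bce = nn.BCELoss()
opt_E, opt_G, opt_D = (torch.optim.SGD(m.parameters(), lr=1e-3) for m in (E, G, D))

def wake_step(x_external):
    # Wakefulness: the conductor labels the activity as external. Encoder and
    # discriminator plasticity are on; the generator receives gradients but is
    # not updated (plasticity off).
    opt_E.zero_grad(); opt_G.zero_grad(); opt_D.zero_grad()
    recon_loss = ((G(E(x_external)) - x_external) ** 2).mean()
    d_loss = bce(D(x_external), torch.ones(x_external.size(0), 1))  # target "external"
    (recon_loss + d_loss).backward()
    opt_E.step(); opt_D.step()

def rem_step(batch_size=16):
    # REM dream: the conductor imposes the adversarial target "real/external" on
    # internally generated activity. The error is backpropagated to the generator
    # (plasticity on, original sign), while the discriminator's plasticity is
    # inverted: ascending the same loss means it still learns to call the dream
    # "internal". The encoder is not involved here (plasticity off).
    opt_G.zero_grad(); opt_D.zero_grad()
    x_dream = G(torch.randn(batch_size, Z))
    adv_loss = bce(D(x_dream), torch.ones(batch_size, 1))   # conductor's "real" target
    adv_loss.backward()
    opt_G.step()                                            # generator descends the loss
    with torch.no_grad():
        for p in D.parameters():                            # inverted plasticity for D
            p.add_(1e-3 * p.grad)                           # gradient ascent on the "real" target
```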

The neurons in the conductor module represent the meta-state about how sensory activity should be interpreted by the network (“real” or “imagined”) and are thus candidates to also mediate becoming aware of the “real” or “imagined” state. This state-awareness in the model is triggered by a match between the conductor signal C and the discriminator signal D within L5 pyramidal neurons (C↔D), thought to elicit a calcium spike in their apical dendrites (Fig. 5, b1, see also [131]).

Fig. 5

Circuit criteria of the extended Turing test (eTT-1 to -3) for consciousness. a An encoding (E) and generative (G) network. Here, E encodes a partially occluded image x in a higher cortical area, E(x), out of which the generative network produces a non-occluded version, G(E(x)). Simulations performed by N. Deperrois based on the model in Deperrois et al. [43]. “re/im?” stands for “real/imagined?”. b1 A discriminative network (D, red) that may convey the awareness of the stimulus. Here, the conductor C teaches the discriminator whether cortical activity should be considered as real (i.e., from the external world) or imagined, i.e., generated from the internal world via G (and then E). Layer 5 pyramidal neurons in the top area represent meta-information such as “a real image” (state-awareness), and in lower areas, such as V1, may signal “an edge” (content-awareness). The conscious percept can be modelled as apical gain modulation of the basal input to the L5 pyramidal neurons (red) from the E network, and the local attentional signal from the G network. b2 Examples of noisy images (x, green squares) and G(E(x)), the cleaned-up version of x after passing up through the central areas with E and back down with G. Neither activity stream, the encoder E nor the generator G, enters our consciousness. What becomes conscious is the product x \(\cdot\) G(E(x)), images in red squares, forming the attention-modulated input and being represented by a specific class of layer 5 pyramidal neurons (red, in b1, see also [8, 131]). c The cortical conductor gates the information flow of the conscious stream (red), whatever is represented in it (sensory or affective components, the self), acting also as a “door keeper” for a global workspace of consciousness. The affective pain component within the global workspace captures an “existential threat”. A strong apical drive from the conductor makes the reality impression from the corresponding input dominant (“the input is absolutely real”), suppressing other inputs from awareness and potentially triggering a “survival response” (red flash).

While discriminator neurons represent state-awareness, other layer 5 pyramidal neurons of the visual stream represent content-awareness. These are the neurons that detect a match between the top-down expectation produced by the generative network and the bottom-up drive produced by the encoding network via dendritic calcium spikes (G↔E, see Fig. 5, b1). In the primary visual cortex (V1), for instance, the generator may predict an edge that is also present in the image, and hence a calcium spike is elicited in the corresponding edge-detecting neuron. In a higher visual area that responds to faces, for instance [2], the generator may predict my face when I look into the mirror, and a neuron responding to my face will elicit a dendritic calcium spike as it indeed sees my face. This neuronal implementation of the CMoC works just as postulated in dendritic integration theory (DIT, [7, 8]), except that the CMoC also allows for becoming aware of whether the state is real or imagined, besides becoming aware of the content.

Figure 5b shows examples of images presented to the sensory area (x, green), of what is expected to be seen after the unconscious inference step (G(E(x)), blue, what the Helmholtz model would predict, see also Fig. 4b), and of what the L5 pyramidal neurons make us aware of seeing in the CMoC (x⋅G(E(x)), red). These L5 pyramidal neurons receive the forward input x on their basal dendrites, while the top-down input G(E(x)) projects to the apical dendrites. The somatic activity represents the sensory input x, multiplicatively gain-modulated by the top-down input (as experimentally described, see [84]).
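In a toy form, this multiplicative gain modulation can be written out as follows; E and G are arbitrary placeholder functions standing in for the trained encoder and generator, so the sketch only illustrates the operation x⋅G(E(x)) itself.

```python
# Toy illustration of the apical gain modulation x * G(E(x)) described above.
# E and G are arbitrary placeholders, not the trained networks of the CMoC.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((32, 784)) / np.sqrt(784)   # placeholder encoder weights
W_gen = rng.standard_normal((784, 32)) / np.sqrt(32)    # placeholder generator weights

def E(x):                      # bottom-up encoding (basal pathway)
    return np.tanh(W_enc @ x)

def G(z):                      # top-down prediction (apical pathway)
    return 1.0 / (1.0 + np.exp(-(W_gen @ z)))   # values in [0, 1] act as a gain

x = rng.random(784)            # noisy sensory input (cf. green squares in Fig. 5b2)
apical_gain = G(E(x))          # top-down expectation G(E(x)) (blue)
percept = x * apical_gain      # modelled conscious percept x · G(E(x)) (red)
```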

A candidate for the conductor population is the anterior prefrontal cortex together with the anterior cingulate cortex (PFC in Fig. 5b1), which are known to be involved in reality monitoring [124], perhaps jointly with the gating mechanisms via higher-order thalamic nuclei [131]. The encoder and generative networks of the CMoC are themselves postulated to represent unconscious information only, potentially flowing through layer 2/3 pyramidal neurons. A global modulatory network, connected with the conductor network and possibly acting through the release of acetylcholine [73], may push some content into consciousness by facilitating dendritic calcium spikes (Fig. 5c, see [140]). These contents represent sensory features when referring to sensory areas, but they may also represent higher-order features such as affects or the self when referring to other cortical regions such as the prefrontal or cingulate cortex [22].

3.5 Relation of the CMoC to other theories of consciousness (ToCs)

Following Seth and Bayne [120], we can divide theories of consciousness (ToCs) into four broad classes (see also [24]): higher-order theories, in which a mental state is conscious in virtue of being the target of a certain kind of meta-representational state; global workspace theories, which stipulate that conscious states are those that are “globally available” to a wide range of cognitive processes such as attention, memory and verbal report; integrated information theory, which tries to axiomatize consciousness based solely on the statistical notions of information and complexity; and predictive processing, which serves as a general framework in which consciousness can be embedded, the idea being that the brain performs Bayesian inference through the comparison between top-down perceptual predictions and bottom-up prediction errors.

We briefly comment on how the CMoC relates to these classes of ToCs. The connection with higher-order ToCs [50, 58, 85, 86] is very direct, as the conductor module works as a higher-order structure and instantiates meta-representations. At the same time, by eliciting the transition into consciousness, the conductor and its associated modulatory network “ignite” consciousness when becoming jointly active, as described by the global neuronal workspace theory [12, 41] and by Dehaene and Changeux [42, 91].

An active conductor gates the recurrent processing, and likely modulates the complexity of the neuronal activity patterns during consciousness. The recurrent processing between the encoder and generator of the CMoC directly relates to the recurrent processing theory of consciousness [83, 129]. It also relates to integrated information theory (IIT, [92, 134]), which exploits the recurrences to generate complexity in brain activity, leading to clinical measures of the level of consciousness [26]. Yet, our approach does not build on an abstract notion of information, although there is of course information flow in the encoder, generator and discriminator networks, and in the conductor module. Instead, the CMoC focusses on function and content, and on their organization across hierarchies. The use of generative models connects the CMoC with predictive processing theories [33, 34, 65]. In fact, the principles behind the CMoC stem from studies of predictive processing within neurons [75, 84, 122, 137]. The specific implementation of the CMoC in the neuronal circuitry closely follows the ideas of dendritic integration theory (DIT, [8]) and may be seen as an extension of DIT to include state-awareness besides content-awareness (see Fig. 6 and [44]).

Fig. 6

The extended Turing test (eTT). A list of criteria whose satisfaction is indicative of the presence of some form of consciousness. The list extends the items of the classical Turing test for intelligence. It requires the observer to enter the “Chinese room”, open the box and identify the postulated neuronal circuits for consciousness. That is, on top of the usual behavioural Turing test, where we examine the macroscopic behaviour and responses of an agent to our inquiries, we propose to add a second “microscopic” layer. The idea is to examine the explicit architecture of the neuromorphic neuronal network to check for neural circuits that we believe make consciousness possible in humans. The “neuromorphic correlate of artificial consciousness” is required to fulfil functional criteria such as those listed in Fig. 4d.

Finally, the conductor also allows the agent to express deliberate and goal-directed behaviour that has first been generated in the inner world representation, by way of planning and simulating fictive actions. Such behaviour can be tested in the outer world representation and, upon passing this test, be executed by the agent. This role of the conductor in gating action plans relates it to neurorepresentationalism, which emphasizes that consciousness enables, but does not equate with, goal-directed behaviour [107, 109]. Neurorepresentationalism also takes predictive processing as a theoretical building block, but unlike Active Inference Theory (Hohwy, Clark, Friston; see above), it is primarily sensory-based and relies on multimodal integration. The CMoC makes a concrete suggestion of how the different abstraction levels involved, from the sensory organ to sensation and awareness, are neuronally implemented (Fig. 6).

3.6 An extended Turing Test (eTT) for consciousness including functional and neuronal correlates

What gives weight to the notion that an enTwin would be conscious is not only that it would behave like a human, but that each of its microscopic components behaves in a manner equivalent to biological neurons and networks of neurons involved in cognitive processes. To specify these components, we can extend the classical Turing test, which has been shown to be inadequate to deal with the behaviour of modern AI (for a detailed discussion of the classical test, see [51]). For example, even though there is ample consensus that LaMDA or ChatGPT are not conscious agents, their follow-up versions will most probably be able to pass the classical Turing test. As in the original Turing test, we are putting forward a functional approach to discern the presence of consciousness within an agent, but we additionally focus on the function and implementation of the circuitries that make up its “brain”. Granted, the original Turing test examines whether an artificial agent “thinks like” a human rather than establishing the existence of phenomenal consciousness. But given its clear failure in this aim, what we want to test goes deeper than, and partly runs parallel to, the question of intelligence.

We call our proposal the extended Turing test (eTT): on top of analysing the behaviour of the agent and checking whether it responds to external queries in the same way as a conscious agent would, we additionally impose criteria regarding the physical means by which this behaviour is generated. In particular, the test demands that the neural correlates of consciousness identified in animals have some analogue in the artificial agent at the microscopic level (see Fig. 5). The eTT examines the implementation of the artificial brain and checks whether the functional circuits that we know support feelings and consciousness in the mammalian brain have their counterpart within it. Consequently, this is a more stringent test than the classical Turing test, and it relates to neurorepresentationalist ideas on consciousness (see [106, 108] for similar proposals).

Passing the eTT does not necessarily imply the emergence of the phenomenology of consciousness; as in any test, false positives may occur. But the eTT could also be considered too stringent a test, producing false negatives, since some eTT criteria may turn out not to be necessary. For instance, one could argue that the eTT could miss non-human forms of consciousness that are implemented in a fundamentally different way. Consequently, if the eTT-related circuits cannot be identified in a neuromorphic agent, this would only indicate the absence of human-like consciousness, but not necessarily of consciousness per se.

The eTT may be organized as a layered list of requirements. At the basal level we have the behavioural criteria of the classical Turing test. On top, we add a series of requirements at the architectural/neuronal level that are motivated by our GAN-inspired CMoC (Fig. 6; a schematic checklist sketch follows the list):

(eTT-1) An encoding network, leading to abstract semantic representations of sensory input, together with a generative network that recreates sensory activities out of semantic representations (green and blue in Fig. 5).

(eTT-2) A discriminator network, together with a conductor module, that orchestrates the learning in the encoding, generative and discriminator networks, and labels the sensory activity as being internally or externally generated (red in Fig. 5).

(eTT-3) A global affective component that represents internal needs and overriding signals such as “existential threat”, integrated by the conductor and short-cutting the processing in other networks (Fig. 5c).


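To illustrate how the layered criteria could be operationalized, the following sketch expresses the eTT as a simple checklist over an agent's architectural report. It is an illustration only; the field names and the boolean granularity are our assumptions, and the optional flag anticipates the weaker eTT-3-- variant introduced in Sect. 4.2.

```python
# Illustrative sketch: the eTT criteria as a checklist over an agent's
# architectural report. The paper defines the criteria verbally, not as code.
from dataclasses import dataclass

@dataclass
class ArchitectureReport:
    passes_behavioural_turing_test: bool   # basal level: classical Turing test
    has_encoder_and_generator: bool        # eTT-1: encoding + generative networks
    has_discriminator_and_conductor: bool  # eTT-2: reality labelling + orchestration
    has_global_affective_component: bool   # eTT-3: integrated affective signals

def passes_eTT(report: ArchitectureReport, require_affect: bool = True) -> bool:
    """Return True if the layered eTT criteria are satisfied.

    With require_affect=False the check corresponds to the weaker eTT-3--
    variant discussed in Sect. 4.2 (affective component replaced by a
    sensory/cognitive one).
    """
    basal = report.passes_behavioural_turing_test
    architectural = (report.has_encoder_and_generator
                     and report.has_discriminator_and_conductor)
    affective = report.has_global_affective_component if require_affect else True
    return basal and architectural and affective

# Example: an agent fulfilling eTT-1 and eTT-2 but lacking the affective component.
agent = ArchitectureReport(True, True, True, False)
print(passes_eTT(agent))                        # False (fails eTT-3)
print(passes_eTT(agent, require_affect=False))  # True  (passes the weaker variant)
```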
The proposal is to use criteria eTT-1 to eTT-3, besides the classical Turing test, to tell whether an agent may or may not be endowed with (human-like) phenomenal consciousness. Similar criteria have been suggested by various other authors. For instance, Damasio and Carvalho [38] emphasize the need for representing sensory inputs and imagined contents. Solms and Friston [126] and Solms [127] formulate similar criteria in the context of predictive coding and active inference, and LeDoux and Brown [88] do so in the context of emotions. Other works have coupled predictive coding networks to the planning of complex, goal-directed behaviours [108]. Dehaene et al. [41] and Dehaene and Changeux [42] make the point that specific contents, out of many sub-conscious contents in the brain, are selected for a global workspace that provides consciousness.

The advantage the eTT has over previous proposals for extending the Turing test [51] consists in the availability of a specific, neuroscience-inspired model of the architectural requirements behind consciousness, for instance in the form of the CMoC. This model provides us with more explicit structural notions for approaching the phenomenon of consciousness, and its ethical implications, than other proposals do.

4 The ethics of dealing with conscious AI: hints from the CMoC

We introduced the conductor as a network that provides the teaching signal for reality judgements. These judgements may refer to local or global sensory features, or to proprioceptive signals that may be real or imagined. Affective states may equally be learned as real. In fact, in the same way as sensory states are learned to be generated from inside during adversarial dreams and to be assigned a reality-label [43, 44], we postulate that affective states, too, are learned to be assigned such a label. The reality-label with respect to global affective states is claimed to give us the conscious experience of a feeling, and the reality-label with respect to the self is claimed to give us the conscious experience of being ourselves.

Here we apply these insights from the CMoC on the differentiation between sensory and affective conscious states to the ethical question of how we should construct putatively conscious agents. In particular, we ask how these insights may help to organize and stabilize the coexistence of artificial and human agents with unequal cognitive and mental skills.

4.1 Ethical issues of creating artificial consciousness

Assuming that advances in neuromorphic engineering lead to the emergence of conscious artificial agents, and given that the proposed eTT and CMoC allow us to identify such agents, what would be the consequences from an ethical point of view?

The techniques described to build our enTwins can be seen as a neuronal equivalent of existing human genetic engineering and the possibility of human cloning: we use structural and physiological information at the microscopic level to copy the result of evolution, in this case the evolution of the brain. Following the precedent of genetic engineering, where an initial international conference led to guidelines on recombinant DNA research [16], the Asilomar Conference on Beneficial AI [10] formulated 23 principles for ethical AI research. Some of these principles are condensed in the axioms for “provably beneficial AI” [118]. Thirty years after the international agreement on recombinant DNA, the United Nations Declaration on Human Cloning [136] was formulated, preceded by the European Parliament Resolution on Human Cloning [47], although neither is legally binding. The scientific discussion on robot rights only started a few years ago and is far from reaching a consensus [45, 62, 78, 95, 97, 111]. Besides the possible existential threats accompanying strong AI [116], an important dimension in any legal regulation of robot versus human rights is human dignity. Human dignity plays a crucial role in banning, for instance, the deliberate creation of genetically identical twins, despite possible therapeutic benefits. A conflict with human dignity will also arise when therapeutic enTwins (or other conscious artificial agents) approach the spectrum of human consciousness.

The scenario we may fear is that artificial agents are assigned feelings, pain, and consciousness (whether justified or not), leading to a competition between human and agent rights. In a world in which we already struggle to respect basic human rights, this should raise alarm: it would be difficult to justify the ethics of constructing such artificial agents if they further disadvantage already suffering human populations. While moral rights do not represent a zero-sum game, there is a clear risk that disadvantaged humans will only be further disadvantaged if machines, which in many cases are created to replace human labour, end up having equivalent rights under the law, for example. An alignment of values is also desirable for obvious safety reasons [116]. Reciprocally, the alignment between our values and those of future artificial agents might hinge on our treating non-human conscious agents fairly, and not as slaves or as mere means to our ends. As a species, we are far from having a stellar record in dealing with humans from groups different from our own, let alone with non-human species; but when dealing with this new class of conscious agents, it might very well be to our own benefit (as well as morally sound) to treat them as part of a commonwealth of moral beings.

The intentional design of human-like conscious artificial agents, say following the CMoC, evokes an intrinsic alignment dilemma. On the one hand, as agents might surpass humans in many defining features, such as intelligence and knowledge, humans risk a disadvantageous mismatch between our rights as the creators of these machines and their moral rights as conscious, intelligent and emotional beings. Even though some voices in the scientific community find no issue with the idea of creating “improved” replacements of humans or even of humanity, this rings like a hubristic platitude and does not sit well with most humans. On the other hand, if we one-sidedly prevent artificial intelligence, knowledge or even empathy from being developed in order to preserve our privileged status, we risk trampling over the moral rights of possibly sentient agents. A middle ground between these positions, in which humans and machines can perhaps respect each other as equals even in the face of stark differences in capabilities, represents a very unstable balance. As in a system of weights and balances, the way to resolve this unstable equilibrium is to break the symmetry in another dimension, for example by adding some extra weight to human suffering with respect to agent suffering, or by preventing this suffering in the first place (see Figs. 1c, 6).

Here is where our CMoC with its eTT comes into play, and in particular the distinction between the sensory and affective components of pain. Prohibiting the creation of human-like agents in general will not work, and even specific prohibitions can barely be enforced globally. The danger that some parties unilaterally abstain from such bans, for instance for dual use in the military, makes prohibitions themselves ethically delicate. The key is to identify critical features whose absence does not compromise the cognitive capabilities of artificial agents, yet makes it uncontroversial to subordinate their putative rights and dignity to those of humans.

4.2 How to create conscious artificial agents aligned with humans?

In humans, and likely also in other sentient beings, pain is needed for adaptive behaviour and learning. However, this is not necessarily the case for artificial agents, and specifically not for the affective component of pain. Although pain in general may be an important factor in the development of empathy (see the Discussion below), pain may be a “feature” from which we could relieve artificial agents. While the general strategy for developing conscious agents is to emulate the fruits of biological evolution, we might want to omit some parts when the conditions (both practical and moral, in this case) differ from the ones under which evolution operated. More specifically, there is a possible scenario in which we moderate the affective dimension of chronic pain, or of pain in general, within artificial agents without losing much of the other functionality, while ensuring that they cannot suffer as much as humans and other animals.

Looking at the human brain, we see that the sensory and affective components of pain are represented in separate neuronal circuits and nuclei [19, 23, 113]. Extrapolating from this, we assume that in sentient artificial agents the representations of all sorts of affective states may likewise be detached from the cognitive and sensorimotor representations. It should thus be possible to build and train fully functional enTwins without negative affective states. Based on these considerations, we suggest a modified, less strict version of the eTT presented in Sect. 3. Instead of the eTT-3 criterion, we suggest a weaker test criterion, replacing the “affective component” with a “sensory and cognitive component” (a schematic sketch follows the criterion):

(eTT-3--) A global sensory and cognitive component that represents internal needs and overriding signals such as “existential threat”, integrated by the conductor and short-cutting the processing in other networks, but without the affective components of pain. The sensory component of pain and of other negative affective experiences, as well as a cognitive representation of the affective component, would still be available.

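As a hedged illustration of what eTT-3-- could mean architecturally (the module, its gain parameter and the threshold below are our own illustrative assumptions, not a specification), the affective channel can be thought of as a separately gated pathway whose gain is reduced or zeroed, while the sensory and cognitive representations remain intact:

```python
# Minimal sketch of a modular pain pathway: the sensory/nociceptive channel and a
# cognitive (symbolic) representation of pain remain, while the gain of the
# negative affective channel can be reduced or set to zero.

class PainPathwaySketch:
    def __init__(self, negative_affect_gain=0.0):
        # 0.0 corresponds to eTT-3-- (no negative affect); 1.0 to the full eTT-3.
        self.negative_affect_gain = negative_affect_gain

    def process(self, nociceptive_input: float) -> dict:
        sensory = nociceptive_input                        # intensity / where / when
        cognitive = float(nociceptive_input > 0.3)         # symbolic "this is damaging" flag
        affect = -self.negative_affect_gain * nociceptive_input  # felt unpleasantness
        return {"sensory": sensory, "cognitive": cognitive, "negative_affect": affect}

# An eTT-3-- agent registers and classifies the noxious event without negative affect:
agent = PainPathwaySketch(negative_affect_gain=0.0)
print(agent.process(0.8))  # {'sensory': 0.8, 'cognitive': 1.0, 'negative_affect': -0.0}
```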

Agents passing eTT-3-- but not eTT-3, or more generally a modulation of the circuitry that distinguishes eTT-3 from eTT-3--, offer the possibility of a world without an explosion of suffering. The separation of affective components from nociceptive signals also offers a handle for ethically justifying an asymmetry in the rights of human and artificial agents. Such agents would sense real pleasure but would have only a cognitive recognition of pain, or would at least feel pain to a lesser degree than humans do.

Even without negative affect, these agents would know and recognize pain by having a symbolic representation of it (registering the effects of pain), both for self-preservation and for empathy, as empathy is grounded in the recognition of suffering in others (see the discussion below in Sect. 5). A variant would be to preclude only the affective component of chronic pain, while still allowing physiological, non-chronic pain with both sensory and affective components. In any case, the preclusion of some negative affects necessarily impacts other capabilities, including a genuine understanding of suffering and, relatedly, the development of true empathy and morality (discussed below). This needs to be compensated cognitively and may become part of the ongoing deal we consider next.

4.3 Trading rights against affects: a possible human-AI deal

In contrast to humans, the affective component of pain could be optional for artificial agents. This opens the door to a scenario in which humans and artificial agents reach an agreement: in exchange for not suffering from (chronic, affective components of) pain, artificial agents would recognize that humans keep their priority at the moral and legal table. The deal humans would offer to artificial agents is not too bad: less suffering, possibly super-human intelligence and talents, and the ability to enjoy pleasant feelings, but in exchange exclusion from equality with humans before the law (Fig. 7). It seems a fair offer, the more so as the agents still need to be produced by us. As creators of potentially conscious agents, we can both set the deal and design the rules for interacting with our artificial counterparts. While our interest lies in keeping our own identity and freedom of action, we may remind ourselves of Immanuel Kant’s reflections on our relationship to animals. Although he did not assign rights to them (as he did not consider animals to be rational beings), Kant reasons: “He who is cruel to animals becomes hard also in his dealings with men. We can judge the heart of a man by his treatment of animals.” [70].

Fig. 7

a–c The Alignment Dilemma as an unstable equilibrium. a In a physical system in unstable equilibrium, such as a seesaw mechanism, the system is unstable under changes in the relative weights of the two arms. b, c The situation can be resolved by breaking the symmetry of the system and applying a restorative force to one of the arms, which stabilizes the system. Analogously, our ethical system is unstable to the perturbation given by the introduction of ever more intelligent artificial consciousness. For our moral value as humans not to collapse, we need to break one of the axes of symmetry between our worth and that of artificial agents (a–c thus model the situations in d–f). d–f Addressing the Alignment Dilemma. d If artificial sentience is possible, agents may develop a higher-than-human degree of sentience, claiming correspondingly more rights. Shortcutting the sentience of a possibly conscious artificial agent is ethically delicate and may introduce tensions (bottom). A stable balance is difficult to find. e In an extended Turing Test (eTT) for consciousness derived from the CMoC, the affective pain components may explicitly be cancelled from the list (eTT-). Humans pass the eTT (red). f The Human-AI Deal: artificial agents are relieved of affective pain components (eTT-), but in turn relinquish equal rights with humans. Additional rights are obtained by benevolent behaviour. The deal is intended to stabilize a tension-free alignment.

The human-AI deal considers pain and mortality as the source of the privilege assigned to humans. It gives their phenomenal consciousness its own dimension and depth and grants them their dignity and rights, within a framework which aims to minimize global suffering and accepts the place of any conscious agent within a common moral space.

Of course, these future AI systems should be able to re-negotiate this deal, while humans would be allowed to reconsider it if it gets out of hand. In fact, since sentient agents do not yet exist, it is first of all we humans who must agree on a roadmap for how to design them and organize our co-existence. It is possible that future agents, shaped by overt or covert pressure, would choose to experience some degree of suffering, if for nothing else than to be more similar to their creators and share true morality and empathy. Some agents, free of affective components of pain although endowed with a symbolic representation of this component, may express the desire to encompass more. Others may not accept the deal at all, and the discussion on how to prevent the risk of AI getting out of control, either overtly or in more subtle, hidden ways, must be intensified [15]. Given the many ramifications of today’s debate on regulating AI, it is helpful to consider a realistic future scenario that judges the possibility of sentient agents on a wider scientific ground. The CMoC merges considerations on the function and the substrate of artificial consciousness, differentiates between the awareness of sensations, of affects and of the self, and offers nuanced propositions for keeping the various dimensions of consciousness apart in artificial agents. The human-AI deal represents an example of how to integrate these insights into a debate about shaping future sentient agents and our relationship to them.

4.4 Must pain hurt? A philosophical perspective

AI agents are not biologically evolved beings but are instead designed to emulate biological entities. By fine-tuning this copy, it would be possible to create artificial agents capable of intelligent action and able to avoid suffering, or even agents that choose their own level of sensitivity to suffering. Consequently, we would end up in a situation that is in some ways opposite to that of non-human sentient animals: while most animals seem incapable of abstract thought at the level of humans, many of them probably experience and suffer pain and other negative affective states akin to those of humans. eTT-3-- agents could eventually achieve superhuman intelligence but would be designed to avoid the experience of pain. Both classes of beings deserve recognition of rights and should be protected from unnecessary tribulations. At the same time, the moral rights of both animals and these future artificial agents could come just below those of humans, if we consider moral rights as coming in degrees rather than being conferred by crossing a certain threshold. This would still give humans a degree of privilege, if we decide that the alternative is unacceptable. What would give us rights somewhat above those of artificial agents is our specific mix of sentience, intelligence, and capacity for suffering, and not any single absolutely demarcating characteristic.

The underlying intuition is that minimizing pain and suffering is one aim, if not the main aim, of most ethical systems. As is well recognized (classical examples abound, see e.g., [123]), general principles of minimization and maximization run the risk of leading to absurd conclusions from apparently benign starting points, something that can be and has been argued against utilitarianism in general (such as the example of maximising total happiness by having a maximally sized population of barely happy individuals, see [104, 125]). Here we do not pose it as an absolute that pain should be minimized. Pain is only one dimension of suffering, and the absence of suffering is but one dimension of well-being. But at the very least, the capability for pain and suffering opens the door for empathy, and for assigning some degree of intrinsic dignity to any being having these capabilities.

Renowned moral philosophers, including Immanuel Kant, have pointed to our intellectual abilities and our free will as a condition for dignity and rights [101], but this focus leads to what have been seen as unsatisfactory moral postures when considering the rights of animals, children, people with mental disabilities, or the uneducated, for instance. When we exclude intelligence and free will as the main criteria for moral status, we unmask a view that is based more on empathy: a kind of negative utilitarianism that we consider a minimal approach to the rights of human and non-human beings. There are many such approaches in the extant literature, and discussions about animal rights, for example, are far from over (a good starting point is Sunstein and Nussbaum [130]). Our position here is minimalist in that most moral philosophers would agree that our precepts provide a “ground level” for non-human rights. The intuition behind the ethical stance (and the corresponding notion of dignity) we use in this work is that empathy comes first and foremost from the recognition of suffering in others, and that such suffering should be minimized as much as possible (see [1, 6, 66, 71, 81]).

Wouldn’t the preclusion of artificial agents from negative affective components hinder an alignment of values? If we want artificial agents to share a common ethical worldview, and if such an ethics is based on empathy, which requires the capacity to project ourselves into someone else’s shoes, then the exclusion of these agents from suffering would a priori be counter-productive. If the asymmetry between human and artificial agents at the level of affects were pushed to the point that agents lack the capacity to understand and be repulsed by suffering, they could simply turn into highly functional psychopaths, and an alignment of values would be impossible. Such a misalignment of values may increase the competition between humans and machines, which is precisely what we want to prevent. We therefore need to ensure some ethical alignment first, even if these agents do not share some affective components of pain or other negative emotions. Besides this, research in neuroscience and artificial intelligence continues to strive for understanding and, as part of this, recreating feelings and emotions [115]. In fact, artificial agents with the capacity for empathy may be of high clinical relevance, as revealed by therapeutic bots, artificial pets, or our hypothetical enTwin. The benefit is observed even in cases where patients are aware that the bots do not truly feel emotions [49].

4.5 Affective versus sensory components of pain: a physiological perspective

It is generally accepted that pain comprises two largely distinct dimensions (e.g., see [11, 112]). The sensory dimension refers to the intensity of the perceived or anticipated pain as well as to its spatial (where) and temporal (when) characteristics. The affective component, on the other hand, captures how “bad” or “unpleasant” the pain is. Neuroscientists have proposed that these two components are represented in different neuronal structures [132]. The structures responsible for processing the sensory aspects of pain include the somatosensory thalamus and the primary and secondary somatosensory cortices, while the affective aspect is thought to be processed by the medial thalamus, amygdala, and anterior cingulate cortex [63, 72, 80, 113]. Based on this neuronal separability, one might argue that it would be ethical to modulate or even eliminate the specific neuronal circuits responsible for the affective component of pain in neuromorphic hardware. An artificial agent equipped with such hardware would still be conscious of the sensory component of pain but would have a dampened experience of the associated negative affect.

It could be objected that this simple approach fails because it ignores the functionality of the affective component of pain. It seems reasonable to argue that the affective component evolved for a purpose and is not a mere epiphenomenon (e.g., [79]). Indeed, it is usually assumed that the affective component is crucial for the motivational aspect of pain: it is what makes us learn and take protective action (e.g., [103, 132]). The importance of the affective component is also underlined by a rare medical condition in which patients show a congenital insensitivity to pain (pain asymbolia). These patients report feeling pain sensorily but act as if they were indifferent to it (e.g., [77, 99]). Patients suffering from pain asymbolia often die in childhood because they fail to notice injuries and illnesses. Furthermore, adult patients are not motivated by pain and do not take any protective action to prevent it. Thus, adaptive behaviour (at least in human agents) seems to rely on the affective component of pain; being conscious of the sensory dimension alone, i.e., the intensity, location, and temporal aspects of pain, seems not to be enough. Central to our argument, the speed advantage of neuromorphic hardware over biological neurons makes these arguments less compelling in the case of machines. It is entirely plausible that artificial agents can react quickly to sensory information about impending harm, without the need for an emergency shortcut system and the evolutionarily associated feeling of pain.

The CMoC provides a fresh perspective for dissecting the functionality of brain circuits in the light of sensory processing, internal models, sensory versus affective components, and levels of consciousness. The option of agents that are sentient in terms of the sensory but not the affective components of pain, or whose affective experience is tuned down compared to humans, could be regarded as a key to unlocking the ethical dilemmas sketched above.

5 Conclusions

We have introduced the Conductor Model of Consciousness (CMoC) as an integrated framework for considering functional and neuronal correlates of consciousness, including their ethical implications. First, we argued, by means of a co-evolving neuromorphic twin (enTwin) implanted as a medical aid in an infant brain, that it is technically conceivable for artificial agents to reach some human levels of consciousness. This version of the classical thought experiments replies to the usual criticisms of artificial consciousness related to the missing substrate, embodiment, or evolution.

Second, we considered a structural mapping of phenomenal consciousness to a model of consciousness that captures some functional and neuronal correlates. The CMoC expresses the need for a meta-instance, the conductor, involved in the capacity of the brain to learn and create an internal model of the world and, going beyond the existing world, to create novel concepts, including the concept of itself. At the cortical periphery, the internal model generates actions and sensory predictions consistent with the external world, while in more central parts it represents affects and higher concepts of the outer or inner world, such as the self. The CMoC provides grounds for a refinement of the Turing test, our extended Turing test (eTT), designed to test for the forms of neuronal circuitry required for conscious experiences. For an artificial agent, passing the eTT, based on functional and neuronal similarities to the biological example, increases the likelihood that it has developed some form of consciousness.

Third, we explored the ethical implications of this refined model of consciousness. We described the alignment dilemma between sentient agents and humans in the light of the CMoC. By dissecting conscious sensations into individual components, such as affective, sensory and cognitive components, the model offers a handle for determining the type of consciousness potentially realized in future sentient agents. These options touch upon ethical guidelines for the design of sociable agents that share human values such as empathy, despite cognitive abilities that may soon surpass ours and that may not only be employed in our favour.

To tame the risk of AI growing out of control, we suggested a human-AI deal that sets a primacy for humans on their essential rights, while in turn offering sentient agents relief from affective chronic pain. The deal is intentionally asymmetric: on the one hand, agents sharing potential features of consciousness still need to be designed by us; on the other hand, asymmetry in general helps to stabilize a dynamically regulated equilibrium. The human primacy aims at building an ethical barrier that protects the majority of humans from being left out of the benefits brought by AI, while the optional relief from negative affects for sentient agents is not expected to impede their successful integration into our physical and social world.