Versions of self are articulated through pre-defined terms, states Judith Butler: ‘We form ourselves within the vocabularies that we did not choose’, suggesting that to develop experiences outside of established patterns, ‘we have to reject those vocabularies, or actively develop new ones…’ (Butler).

Extended Mind Theory is a key concept in Philosophy of Cognition addressing the role of social and material structures in the construction of mind, where mind extends beyond the boundaries of skull and skin. Ideas from Extended Mind Theory re-contextualised in relation to moving image, in my view, facilitate the active development of a vocabulary of the liminal zones between mind, body and world relations, which adapt to, and expand, ideas of voice. The term “recruitment” describes the process by which an agent appropriates available materials or equipment – a walking stick, for example – to extend and restructure their engagement with the world. Such recruitment induces a reconfiguration of experience. In relation to voice, I propose, this amounts to a reconfiguration of established conventions of address and the accepted (power) dynamics contained within them.

Embodied cognition concentrates on the complex ways in which bodily skills and action inform perception and thought. The incorporation of new ‘equipment’ into ‘the thinking and acting systems that we identify as our minds and bodies’ thus expands ‘the negotiability of our own embodiment’ (Clark 32). These external systems or objects extend the agent’s literal reach, as in the example of a (walking) stick, and equally extend the agent’s opportunities, as in the well-cited example of Otto, a dementia sufferer whose trusted notebook of directions enables his visit to MoMA in New York[i] (Clark and Chalmers). Embodiment in this view is a negotiable experience and open to change; it is a fluid and potentially constantly evaluative state of being, inhabiting the liminal zones between mind, body and world. In what circumstances could voice function as such an extension?

The contact point or threshold between agent and extension, where the performance-relevant interaction can be reliably defined, is the ‘interface’. The case for cognitive extension relies on the special features of the flow of information across those thresholds, and the resultant properties of the ‘new systemic wholes’ created through the newly incorporated bodily and sensory enhancement (Clark 33). The speaking subject who is also a hearing subject simultaneously affects herself in a feedback loop, transmitting and receiving all at once (Kane 192), as voice is both delivered and received by the agent in the moment of speech. In this context, I suggest that the ‘new systemic wholes’ brought about by voice occur where voice is both subject and object. Voice has the dual capacity to present the agent in the moment of speech (subject-voice) and to represent the agent (object-voice), where object-voice is the voice heard by the speaking agent in their moment of speech. In other words, voice can be recruited as object; voice originates from the agent and can be an extension for that agent. Voice itself is a threshold between subject and object, agent and extension, and we could describe the liminal territory enacted between the subject-voice and its position as object-voice as the process of voice. This process I consider a form of embodied extension.

The Present Voice

Voice is positioned in the dynamics of movement, time, force, space and intentionality or directionality (Stern 9). These ‘vitality forms’ enable us to understand the nuances and meanings beyond ‘official’ language and in relation to their implicit, contextual and/or culturally specific readings. The shifting formal and temporal qualities of voice in spoken address, heard in variations of pitch, tone, volume, speed and directionality, make up the foundations of voice training for actors and public speakers, who learn to share emotion and meaning with their audience through methods of delivery, beyond the content of what is actually said. In my moving image work I avoid these intentional forms of vitality to offer a passive-dynamic vocality for the audience. In other words, I do not wish to deliver more than the content of what is said; there is no intention to emote or convince. Working to avoid these pervasive techniques, I practise a flattened tone of voice with reduced forms of vitality to deliver spoken testimonies of subjective knowledge collected from archive and autobiographical materials: diaries and journals, documented correspondences and interviews I conduct. In my view this offers a more ambiguous and liminal context for the listener, in the sense that it does not aim to inhabit or share any particular state of being. This approach also merges the visual and vocal fields so that they share an equal position, rather than placing a voice ‘over’ the images: voice is part of the fabric of the film, its texture belongs to the images and it is not a didactic presence with a singular authoritative position. A voice can be positioned ‘over’ images through volume, but this is more frequently and pervasively done through certainty of voice.
As part of this approach of passive vocality, I instead engage environmental sounds to harness auditory description and vitality dynamics, with the aim of sharing embodied engagement through the environment and thus positioning the viewer on a shared ground with the speaker/voice. One such example is found in my work A Desire For Organic Order (2015), in which my at times breathless speaking voice relays a presentness and physicality to what is said. It was recorded as I walked the grounds of the Royal Botanic Garden Edinburgh, where birds, distant voices, cars and planes overhead can all be heard, delineating an area of nature within a city environment. These are details that my professional sound-mixer suggested be removed, concerned that they might be heard as distractions and interpreted as errors because they impede the clarity of a voice over image. The sound and voice share an embodied expression of the site and my own bodily position journeying within it. This was recorded with two different types of microphone: a set of binaurals and a stereo voice mic. The binaural microphones take the shape of small headphones, which are placed inside the ear to record the exact dynamic properties of sound in relation to the listening agent’s bodily position and perspective within the acoustic environment. Although this method is not commonly used for recording voice, I used its proximity to my throat to capture detail of my internal noises, swallowing and breathing, as I walked and talked, and conceptually to record my object-voice from the position of my ears; in other words, to document how I ordinarily hear my own spoken voice returned in the moment of speech. The second microphone was a hand-held stereo recording device carried in front of my body to record my spoken voice as it would sound to an ‘other’. These dual recordings play simultaneously on the soundtrack, creating a third voice that inhabits the between.

The Re-presented Voice

We make auditory inferences as viewers watching a talking head on TV whilst their voice emanates from the location of the speakers. The sensory integration is described as ‘drag’ on each of the senses to arrive at a satisfying ‘between of measures of precision and imprecision’ (Hohwy 131). This articulates the physical liminality inhabited by the viewer stationed between the TV image and the speakers’ projected field, as well as the viewer’s sensory liminality present in the moment between the essentially conflicting visual and auditory sensory inputs.

Ideas of extension connected with the occupation of these liminal or suspended spaces, for embodied viewing/listening and for the on-camera embodied subject, are those of near and far space for the body. Holding a stick works to alter the brain’s understanding of near space (within reach) and far space (outside of reach): ‘simply holding a stick causes a remapping of far space to near space. In effect the brain, at least for some purposes, treats the stick as though it were a part of the body’ (Berti and Frassinetti quoted in Clark 38). By expanding near space, the agent could be described as taking up more space in the world. In relation to voice, we could consider that in everyday situations we know the capabilities of our voice, its range and audibility, to an extent, and therefore our reach in common circumstances. When we speak we expect to be understood, and this is broadly unchanging within our local environment, yet the circumstances of our speech (background noise or echo, one-to-one or group conversation) are in constant flux, and the voice must respond accordingly: to be heard, to be understood, to take account of the proximity of the hearer, etc. A vocal example of ‘holding a stick’, I suggest, is the reverberating, echoing voice. By calling out in a cavernous space and hearing our own temporally delayed, suspended voice repeated and returned to us as an independent object, we are present in a situation where we experience an embodied extension beyond our bodies’ natural reach. Echo ‘takes the moment of a sound and repeats it, expanding beyond the original event, and yet also returning it, as other to itself’ (LaBelle 169). This articulates the duality of the voice, where the distance created by echo illuminates the thresholds between subject and object that voice inhabits.
In this example, the original release of the voice is an embodied presentation of self extended by repetition, while the returned voice, a representation of self, takes the role of ‘the stick’, an independent object expanding the reach of the body, which through our embodied recognition and sense of ownership of it, we could imagine the brain treating as part of the body. This is an opportunity for the brain to remap far space to fall within near space, thus expanding what is considered near space.

Address to camera could perhaps be considered a version of the echoing site. Just as calling out in a cavernous space results in an echoed voice, a suspended version of self that could be said to expand near space by extending the distance between the release of the subject-voice and its return as object-voice, so for people in the public eye, who frequently hear their words returned to them via television, printed press or radio, address to camera performs a long-delayed echo. As this becomes embedded into the agent’s patterns of behaviour and action, it provides a sense of extension into the public realm, expanding their vocal sense of near space. The embodied agent in this context experiences a sense of taking up, or perhaps requiring, more space in their environment.

The returning voice of the echo is ‘an iteration whose reverberations expand to pry open a space between the “original” and its rearticulation … to put into motion an uncertain trajectory, where orientation may also tend toward disorientation; where singularity may give way to multiplicity’ (LaBelle 170). This articulates well the suspended-self produced through the echoed voice: a version of self that people in the public eye not uncommonly suggest is ‘not the real me’. The experienced and authoritative voice, or version of self, harnessed through these forms of direct address to camera, also doubles as the exposing voice: ‘the voice as authority is one part of the story. On the other hand it is also true that the sender of the voice, the bearer of vocal emission, is someone who exposes himself, and thus becomes exposed to the effects of power which not only lie in the privilege of emitting the voice, but pertain to the listener. The subject is exposed to the power of the other…’ (Dolar 80). This dual state can often be witnessed in politicians and other powerful individuals aiming to straddle two opposing versions of address: that of authoritarian leader and that of fallible human. The address between agent and camera inhabits an intimate dynamic: in the moment of speech the voice is absorbed and simultaneously returned to the speaking agent by the mute, unflinching lens. This physical dynamic could be compared to that cultivated in analysis, between analysand and analyst, where the intended hearer of what is said is the speaking agent who hears him/herself through the silent listener. In address to camera, the imagined audience is the intended hearer, present at the time of address but only in the mind of the speaker, and in this way the speaker is talking to him/herself as other. The camera facilitates hearing oneself back through a silent, listening other as the speaker accesses an imagined audience via the camera.
Address to camera in this light is an interface that facilitates ‘performance-relevant interaction’ where a new agent-world circuit is produced through the special features of the flow of information across thresholds: a looped dialogue between the speaking agent and their self as other.


[i] ‘Inga hears from a friend that there is an exhibition at the Museum of Modern Art, and decides to go see it. She thinks for a moment and recalls that the museum is on 53rd Street, so she walks to 53rd Street and goes into the museum. Otto suffers from Alzheimer’s disease. [He] carries a notebook around with him everywhere he goes. When he learns new information, he writes it down. … Otto hears about the exhibition at the Museum of Modern Art, and decides to go see it. He consults the notebook, which says that the museum is on 53rd Street, so he walks to 53rd Street and goes into the museum. […] Otto believed the museum was on 53rd Street even before consulting his notebook. … the notebook plays for Otto the same role that memory plays for Inga. …it just happens that this information lies beyond the skin.’ (Clark and Chalmers 12-13).

