Nadia Guerouaou recently defended her thesis in Cognitive Neuroscience. Her doctorate focused on the use of computer technologies to transform the emotions conveyed by speech during social interactions, and on the impact of these transformations on the inferences individuals make about their interlocutors.
She addressed this issue both from a societal point of view (an experimental ethics study of moral acceptability in the general population) and through questions of interest to cognitive neuroscience and therapy (a study of the potential use of these filters for the treatment of traumatic memories in psychiatry). She is Principal Investigator of the TraumacoustiK clinical trial and Scientific Director of the TraumaVoice clinical trial.
She also went on a research visit to Tokyo to examine the relationship between researchers and artists, as well as the ethical issues surrounding the synthetic self (androids, virtual reality, etc.). This work was funded by the Japan Society for the Promotion of Science and the CNRS.
Your work focuses on the voice in social interaction. Can you tell us what role it plays?
During our daily social interactions, we constantly, and very often unknowingly, make inferences about the state of our interlocutor that influence our behavior towards them. From a person's voice alone, we can deduce their emotional state, discern social attitudes such as benevolence, or detect doubt or confidence about what they are saying, all without having any clear idea of what we are basing these inferences on. The voice is therefore a crucial vehicle of information about other people's mental states, and in particular their emotions. Keenly aware of this, the fields of affective computing and human-computer interaction have made it a major object of study. The idea is to enable artificial intelligence systems (AIS) to convey as many emotional nuances as possible, in order to "enhance" the user experience. This is what prompted OpenAI to develop "Sky", a particularly emotionally expressive voice for its chatbot.
Speaking of AI and voice, your thesis focuses on a computerized voice transformation technology you call a "voice filter". What is it and why this topic?
For some time now, our social interactions have been marked by two major societal changes. The first is the emergence of technologies that allow us to control, by computer, facial and vocal expressions that were once "natural" and associated with emotional states. We are all familiar with face "filters", for example, which let us sport an artificial smile and are increasingly popular on social networks. What I called "voice filters" in my thesis, which let us parameterize the emotions displayed in our voice, are still little known. However, we recently learned that SoftBank (a Japanese company) plans to market an AI-based technology called "Emotion canceling Voice conversation", which would enable call centers to erase the anger in the voices of disgruntled customers in real time! Other companies in Europe are working on such emotional voice filters, which reinforces my belief that this technology is about to leave the research labs.
The second change is the increasing digitization of our interactions (from video meetings to telemedicine), which creates a context conducive to the deployment of these tools for the computerized parameterization of the self. My thesis therefore addressed their potential to transform the socio-emotional inferences we make about our interlocutors (their "anthropotechnical potential", to use a philosophical term), depending, among other things, on their social acceptability.
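To give a concrete, if simplified, idea of what such a filter does acoustically, here is a minimal Python sketch of a toy "smile" filter. It is my own illustration, not the software used in the thesis or by SoftBank: it merely raises pitch slightly and brightens the timbre, two cues loosely associated with smiled speech, and the file names and parameter values are hypothetical.

# Illustrative sketch only (not the thesis software or SoftBank's
# product): a toy "smile" voice filter that slightly raises pitch
# and brightens the timbre. Assumes librosa, scipy and soundfile
# are installed; file names are hypothetical.
import librosa
import soundfile as sf
from scipy.signal import butter, sosfilt

def smile_filter(y, sr, pitch_steps=0.5, brightness_db=3.0):
    # Raise pitch by a fraction of a semitone
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)
    # Emphasize energy above ~2 kHz to brighten the timbre
    sos = butter(2, 2000, btype="highpass", fs=sr, output="sos")
    highs = sosfilt(sos, y)
    gain = 10 ** (brightness_db / 20) - 1
    return y + gain * highs

y, sr = librosa.load("voice.wav", sr=None)  # hypothetical input file
sf.write("voice_smiled.wav", smile_filter(y, sr), sr)

A real filter of the kind described above would of course operate on streaming audio with much subtler spectral transformations, but the principle, parameterizing emotional cues independently of the speaker's actual state, is the same.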
On the subject of acceptability, do you think that individuals will accept these AIS that transform their voice?
This is precisely the question I asked myself when I began my thesis in 2020. To answer it, I conducted an experimental ethics study assessing the moral acceptability of various scenarios involving the use of these filters, imagined somewhat in the spirit of the series Black Mirror. One of these scenarios was precisely the filter SoftBank now proposes for erasing anger in call centers! Quite surprisingly, the results showed a high degree of acceptability of these voice-filtering situations, even when the voice transformation was hidden from the interlocutor. This obviously raises a number of ethical and societal questions, not least in relation to deepfakes.
Interdisciplinarity occupies a central place in your work, particularly through the articulation of neuroscience and philosophy to address digital issues. What does this approach bring you?
The question of anthropotechnics is usually dealt with by philosophers of technology, and it has deeply infused my work. In this respect, it seems to me that cognitive neuroscience, thanks to its methodology and theoretical foundations, can contribute greatly to this reflection. For example, I proposed using Lisa Feldman Barrett's predictive-inference model and constructivist theory of emotions to examine potential transformations in the processes of sociability.
Can you briefly summarize how these neuroscientific theories helped you think about the effects of using voice filters?
Neuroscience tells us that our inferences about the emotional states of our interlocutors rest on an internal model of the world, built from our beliefs and past experiences. We must bear in mind that emotions are culturally constructed: a voice inflection or a facial expression is not an emotion in itself, but becomes one through the meaning our cognition attributes to it according to the "rules" of a given culture. This association, inscribed in our internal model, shapes our perception of other people's states. Contrary to common belief, our brain is far from a passive receiver of information from the environment: it actively constructs our perception of the emotions in our interlocutors' voices, on the basis of beliefs learned from observing these associations.
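This predictive view can be written down very simply. The sketch below is my own toy illustration, not a model from the thesis: it treats the perceived emotion as a Bayesian combination of a prior (the internal model, learned from past associations) and the likelihood of an acoustic cue. All the numbers are invented for the example.

# Toy illustration (invented numbers, not a thesis model) of emotion
# perception as predictive inference: prior beliefs combined with the
# likelihood of an acoustic cue via Bayes' rule.
import numpy as np

emotions = ["joy", "neutral", "anger"]

# Prior belief about the speaker's state, learned from past experience
prior = np.array([0.5, 0.4, 0.1])

# Likelihood of hearing a "smiling" voice under each emotional state,
# as encoded in the listener's culturally learned internal model
likelihood = np.array([0.8, 0.3, 0.05])

# Bayes' rule: posterior is proportional to prior times likelihood
posterior = prior * likelihood
posterior /= posterior.sum()

for emotion, p in zip(emotions, posterior):
    print(f"P({emotion} | smiling voice) = {p:.2f}")

If filters make smiling cues ubiquitous regardless of the speaker's actual state, the learned likelihoods flatten and the cue loses its diagnostic value, which is one way of formalizing the hypotheses that follow.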
What might become of this internal model in an environment where an artificial smile can be displayed on a face or in a voice, even when we are completely depressed? This could change the way we use these cues to infer the emotions of our interlocutors. Several hypotheses, grounded in these theoretical frameworks, can then be considered: the emergence of new expressions specific to the digital world, the appearance of a monoculture of emotional expressions, or even a growing difficulty in reading the emotions of others...
All these hypotheses make it possible to consider, in advance of deployment, the various ethical issues raised by this technology.
Could you give us a few examples?
To be brief, let me mention the possible influence of these technologies on our social norms. If moral values can influence the use of technologies, we also know that this use can in turn influence our moral landscape; this is one of the "soft impacts" of technologies that I think we do not talk about enough. In a society that promotes the control of self-presentation, of which A. R. Hochschild's "emotional labor" is a case in point, it is easy to imagine that the expression of certain emotions, once we have the means to control them, could become completely unacceptable. Following Foucault's work, I see these tools as veritable techniques of the self, whose socio-political stakes of control and power obviously need to be considered; this is what my future research will focus on.
Human Technology Foundation, July 9, 2024
Further readings
Barrett, L. F. (2012). "Emotions are real". Emotion, 12(3), 413-429.
Casilli, A. (2010). Les liaisons numériques. Vers une nouvelle sociabilité. Paris: Éditions du Seuil.
Guerouaou, N., Vaiva, G., and Aucouturier, J.-J. (2021). "The shallow of your smile: the ethics of expressive vocal deep-fakes". Philosophical Transactions of the Royal Society B: Biological Sciences, 377(1841), 20210083.
Guerouaou, N. (2022). "Rendre sa voix plus souriante : deepfakes et filtres vocaux émotionnels". AOC, July 6, 2022. https://aoc.media/analyse/2022/07/05/rendre-sa-voix-plus-souriante-deepfakes-et-filtres-vocaux-emotionnels/
Hochschild, A. R., Fournet-Fayas, S., and Thome, C. (2017). Le prix des sentiments : au cœur du travail émotionnel. La Découverte.