Big data, open data, raw data, data-driven... The term "data" is omnipresent in debates on digital technologies. We would like to question here the false self-evidence that this seemingly neutral term covers.
One of the main difficulties in measuring the effects of digital technologies probably lies in the extreme difficulty of defining the concept of "data" itself, a term that suggests a kind of naturalness at the expense of the processes and actors, often hidden and erased, that gave birth to it.
Day to day, there is a very strong relationship between scientific and technological production and the act of writing in its many forms. Interest in the act of writing within the edifices of science and technology saw new life in the 1980s. We thought an ethnography, so to speak, of the work associated with scientific and technological production would aid us in shedding light on the mystery surrounding the concept of “data”.
Following in the footsteps of Derrida, these works shifted attention away from semantic content alone to reveal the active and material aspects of its production. As a result, scientific and technological literature cannot be thought of as a mere vessel, neutral and transparent, but rather as participating, through its composition and organization, in the creation of knowledge.
At the very heart of the issue, which also includes the idea of data in the digital world, is the matter of knowing how scientists write down their findings to reflect reality. Simply put, it is impossible to isolate knowledge of any kind, even when reducing it to a unidimensional “datum”, without paying equal attention to the material factors that allowed it to be inscribed. This especially applies to methods of displaying data, by which phenomena originating in a “natural” setting are made visible. The reality of these phenomena is progressively translated into inscriptions that can be seen and interpreted, either as written text or graphical representations (map, diagram, table, etc.). In science, data has “written” properties that make it coherent and allow it to be spread.
By expanding the idea of text to include the broader idea of graphical “inscriptions” (e.g. drawings, graphs, recorded numbers, points), one must look not just at the human authors of scientific literature – researchers or technicians – but also at the “inscription devices”, which contribute a denser description of reality. In a laboratory, from the sensor recording a variable to the publication of a paper in a scientific journal or the result of an algorithm, there is always an extensive chain of reading and inscribing. As is the case today with digital data, one must be able to trace back its long, written history to understand the role of writing, its process, and the skills required to create it.
To follow what science and technology studies have been teaching since the 1980s, it is vital to be aware of the fact that the digital world is above all, perhaps even more so than laboratories, a writing environment filled with recording instruments (those ubiquitous sensors and devices) coordinated by an immense field of writing (scripts, codes, protocols). Today, this dynamic is invisible and creates a massive “black box” effect. This opacity contaminates every level of society and is beginning to produce widespread damage to public trust in these intrusive methods.
Ethnologists observing activity in the laboratory have pointed out another aspect vital to our purposes: the invisible, hidden effort and skill needed when recording on a large scale as well as when organizing and translating reality. To go from one inscription to another, it is crucial to examine all of these “negligible” tasks done by whole hordes of data workers as a means of shining a light on all of the “little hands” doing the “grunt work”.
The positivist and reductionist roots of the digital age can generally be traced back to the 1840s and the ubiquitous, normalized, and mechanized written communication of bureaucracies still in their infancy. Between the 19th and 20th centuries, public institutions and businesses had a wide array of writing technology and infrastructure. Then, just as the civil service developed a public system of population statistics, businesses experienced a management revolution by rethinking the market, which was endowed with writing technologies for measuring, calculating, and sequencing. Even before the digital age, data became a commodity for the public and private sectors as a key component for coordinating all types of exchange.
At this stage, it is necessary to point out the political dimension of writing processes by examining the value placed on the various stages of data production. The mechanized normalization of producing writing and data within bureaucratic organizations is primarily centered on the political principle of efficiency. By valuing efficiency, data work and data workers are, often imperceptibly, relegated to the shadows cast by algorithms and machines. This is spectacularly true when it comes to the discourse around and perception of artificial intelligence, in which almost no credit is given to the colossal efforts made by AI “handlers” and the microwork done by people in the shadows. In fact, at the heart of even the most seemingly autonomous processes, there is still a proportion of essential work done by people on the edges and in the interstices of the network, including maintaining the machines used to produce and spread data.
There are in fact many cases that show that, although producing masses of data may look simple, automatic, “mindless”, and valueless, a closer look reveals a density and complexity worthy of our full attention. Data workers, a term that includes more or less anyone who has contributed even queries to a search engine, are part of a massive “back-officing” of the world. Micro tasks, coordinated by ever more monopolistic platforms, are the harbingers of a post-capitalism with social externalities so negative and violent that even a sovereign state will struggle to counteract them – if it is not struggling already.
Of the many varieties of data, it is “raw data” that should receive our full attention here, because the historical status of data has changed along with its ubiquity in any discussion of digital technology. The massive liberation and increased speed of data distribution has become an easy stand-in for transparency, innovation, democracy, and efficiency, to the point of standing at the forefront of the vast solutionism movement so typical of discourse and agendas in the digital age.
In this worldwide technophile movement, there is one term that catches the eye because it is found within the positivist aspirations of champions of so-called “open” systems, free software, the bureaucratic virtue of transparency, access to information, and Anglo-Saxon accountability. The term in question is “open data” and its corollary of “raw” or “unmodified” data.
This previously unknown data entity, appearing first in 2007 at the meeting in Sebastopol that laid the foundations of open data and having since been amplified by the biggest names in the digital transformation, would go on to create “raw data” as a new type of information, one that does not refer to files created by bureaucratic administration, nor to statistics. Instead, it refers to a type of information that is a more fundamental precursor to the usual categories – with no further definition. It refers to something that is “already there”, something that pre-dates any type of write-up and that would be easy and straightforward to “liberate”. It is a theory of information that runs counter to what this paper has already pointed out: the real and material significance of data production, processing, and distribution.
The idea of “raw data” aims to dematerialize the concept of data, or even to naturalize it by granting it the status of a raw material and commodity. This neopositivist ideology, however, does not bear out in reality and, in our opinion, contributes to an oversimplification of digital data, especially when it does not measure the political aspect of its social fabrication in the positive and collective sense of the word. In doing so, this conception of data, which we deem false, has major consequences when it suggests to the public that any gap in the use of so-called “raw” administrative data feeds a binary opposition between transparency and opacity. Indeed, the perfect datum, innate and discovered naturally by “platformized”, crowdsourced programs, does not exist and runs counter to the reality of the discrete, complex mechanisms that create it. We must therefore abandon any realist position and admit that data is not an informatic entity that already exists and just needs to be disseminated (or “liberated”), but the provisional result of a delicate process of creation. As Latour once wrote, we must admit that data is always something that has been obtained.
Encouraging people to change their individual or collective behavior while maximizing cost effectiveness (be that cost financial or political) is the ultimate goal of any human government. Whether it is the public authority acting as part of a biopolitical vision to protect and develop the population (health care policy, security, ecology, etc.) or a business, whose reason for existence is managerial, productive, or commercial efficacy, human organizations all seek progressivist “change”.
Until recently, this mission consisted of devising a variety of top-down incentives that were always limited by the risk of excessive repression or authoritarianism (costly and counterproductive), long lag times (between decision making, implementation, and measuring the effects), or even time- and space-limited effectiveness.
Over the past decade, however, the aforementioned increase in data work, social psychology, management, and the digital platformization of social relationships have given rise to a general theory of gentler influence now known as “nudge”. While it is not yet a household term, we believe it will become a central topic in the coming months and years. As was the case for marketing and advertising, a democratization of these behavioral techniques is essential.
The founding work on this trend, written by Richard Thaler and Cass Sunstein, was published in 2008. In the introduction to their book, the authors explain that economism and its notion of the rational consumer is pure fiction. Instead, the many real examples of social dysfunction – obesity, debt, and lack of social security are examples given by the authors that this paper will discuss – give credence to the notion that the idealist perception of rational human behavior has failed. Homo economicus, whom Thaler and Sunstein cleverly nickname the “Econ”, makes judgement errors every day, revealing flawed reasoning and multiplying biases. Two areas of behavior define the power of nudge: inertia and the possibility of using it to design choice architecture. For example, in a self-service cafeteria, it is very easy to direct people’s choices simply by displaying the food in a certain way.
This simple method of incentives through choice architecture, like the artful displaying of wares that has been done since the dawn of trade, can be drastically upscaled with digital-age trading platformization. Very quickly – by the late aughts – nudge theory was being studied for its uses in policy. In highly liberalized American and British politics, nudge theory provided a theoretical third way between state interventionism and ultra liberalism. When it came to health care and high debt, supporters of nudging claimed to have found a solution for counteracting behavior considered to be antisocial while maintaining a lack of state intervention. Anglo-Saxon public policy has always erred on the side of “laissez-faire”, a liberal policy whose hypothesis postulates that everyone can be their own entrepreneur and consistently make the right decision motivated by self-defense and their own calculated self-interest. Laissez-faire also rejects any type of coercion, in keeping with liberal doctrine, whose goal is to limit state meddling in individual lives as much as possible.
Thaler and Sunstein go on to counter these two postulates but do so using an original approach: for them, it is necessary to accept that people are fallible and to consider instead that, given the complex set of choices available when it comes to buying health insurance in America, for example, help is indispensable to navigate what is on offer. Moreover, this type of state paternalism is not the same as coercion; skillfully employing choice architecture, interface design, and optimized data processing could make the external nudges on choice imperceptible to users. To this end, Thaler and Sunstein coin an almost Orwellian oxymoron: “libertarian paternalism”.
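The mechanism described above – steering outcomes through defaults while leaving every option formally open – can be sketched in a few lines of code. This is a purely illustrative toy, not drawn from Thaler and Sunstein or any real system; all names (`PlanChoice`, `enroll_user`, the plan labels) are hypothetical.

```python
# Toy "choice architecture": the default option is pre-selected, so user
# inertia steers the outcome without removing any freedom of choice.
# All names here are invented for illustration.

from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanChoice:
    options: tuple   # every option remains freely available
    default: str     # the "nudge": what inertia selects


def enroll_user(choice: PlanChoice, explicit_pick: Optional[str] = None) -> str:
    """Return the user's plan: their explicit pick if valid, else the default."""
    if explicit_pick is not None and explicit_pick in choice.options:
        return explicit_pick
    # Inertia: doing nothing yields the architect's preferred option.
    return choice.default


insurance = PlanChoice(options=("none", "basic", "full"), default="basic")
print(enroll_user(insurance))          # inert user -> "basic"
print(enroll_user(insurance, "none"))  # freedom preserved -> "none"
```

The point of the sketch is that the architect's influence lives entirely in the `default` field: nothing is forbidden, yet the distribution of outcomes shifts towards whatever inaction produces.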
Choice architecture, which seeks to do no more and no less than improve the lives of users of public services without their knowing, is naturally underpinned by an eminently political vision of social relations. It is therefore symptomatic that the primary areas of application for nudge (obesity, debt, lack of insurance) are typical stigmas of poverty. By postulating a hypothesis based solely on the behavioral origins of these negative social traits, however, the authors completely erase the essential social and political dimensions of these inequalities. Their book and their approach make no mention of the collective political responsibility for these issues or their remedy.
Although the success of nudge in the late 2000s was linked to both David Cameron’s conservatism and the health-care debate under Barack Obama, pairing it with the data turn and artificial intelligence may well have an enormous impact on globalized choice architecture policy, especially for digital platforms and players that are deliberately trying to achieve or maintain a monopoly in information processing.
Of course, when it comes to guiding students to make healthy choices in the cafeteria, reducing speeding with automatic radars that use smiley faces instead of words, or helping people to not forget to renew their insurance policy, everyone can agree on a general application of this type of choice architecture and “libertarian paternalism”.
The road to hell, though, is paved with good intentions, and below are two well-documented examples of the ethically questionable use of these social neuroengineering techniques.
One of the most promising applications of nudge is in the relationship between humans and robots, in the form of an extension of choice architecture and interfaces. Building emotional relationships with robots has become a field of research unto itself, one that starts by detecting, then classifying, and ultimately modeling emotions using verbal and non-verbal cues. From commercially available voice assistants to the potential for robot carers of the sick or disabled, nudge theory has a wide field of application to study and improve certain groups’ empathy towards machines. Some research teams focus on modeling and implementing language-related social skills such as politeness, humor, or irony: they identify and interpret behavioral cues from human users that indicate social and emotional interaction, and then use humor to engage the user in a long-term relationship with a Nao robot. Chatbots’ and anthropomorphic robots’ ability to detect, interpret, and simulate emotions – made possible by deep learning and AI – means that they can emotionally profile people in real time.
Once an emotional state has been determined, the machine can calculate the target (increasing emotional well-being in the elderly, for example) and use conversational nudges, such as humor, to establish a dialogue with the person and guide them towards this behavioral objective. Emotional attachment to the machine is itself the behavioral objective in this case.
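The loop just described – detect an emotional state, compare it with a behavioral target, select a conversational tactic – can be reduced to a schematic sketch. This is entirely hypothetical and is not taken from any cited system; the state labels, the `NUDGES` mapping, and `pick_nudge` are invented for illustration.

```python
# Hypothetical sketch of a conversational-nudge selector: given a detected
# emotional state and a target state, choose a tactic (or none, if the
# target is already reached). All names and labels are invented.

from typing import Optional

# Invented mapping from a detected state to a conversational tactic.
NUDGES = {"sad": "tell_joke", "bored": "ask_question", "content": "affirm"}


def pick_nudge(detected_state: str, target: str = "content") -> Optional[str]:
    """Return a conversational tactic, or None once the target state is reached."""
    if detected_state == target:
        return None  # behavioral objective met: no nudge needed
    # Unknown states fall back to a neutral tactic that keeps the dialogue open.
    return NUDGES.get(detected_state, "ask_question")
```

However crude, the sketch makes the structure visible: the "choice architecture" is the mapping itself, and the person being profiled never sees it.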
However, these influencing methods drift very quickly into conditioning, especially with young children. An example of this is when a voice assistant teaches a child to be polite to it using libertarian paternalism, which translates here to a spoken message of reward when the child addresses it politely (“please”, “thank you”).
The acceleration and scale at which nudge is being implemented through digital means, paired with artificial intelligence, have recently attracted interest in one particular area of the public sector: elections. A textbook exercise of choice and democratic free will, voting presents itself as a natural area of application for nudge. Ever since Barack Obama’s 2008 campaign, voter data has been a strategic pillar. Profiling and mapping were prime drivers of his massive and famously successful canvassing campaign.
Eight years later, with the development of nudge theory and the aid of advances in processing Big Data, the 2016 American presidential campaign employed “big nudge” or “hyper nudge”. Voters were openly profiled by their fears (immigration, gun control), which, according to campaign advisor Roger Stone, are the most powerful drivers. Targeting these demographics by their location in key counties in swing states, coupled with massive social-media campaigns containing bald-faced lies to trigger fear in those voters, produced a choice architecture of unprecedented efficacy and precision.
Invisible data work and the development of nudge theory underline the importance of an age-old school of thought that is becoming ever more relevant. This philosophical concept has been a driving force behind technical progressivism since the 18th century and seems to us to be worth revisiting to have an overall view of the issues raised when digital technology intrudes in all aspects of individual and collective lives.
This powerful philosophy is called “reductionism”. Notwithstanding its sizeable role in scientific and technical efficacy, our opinion is that it cannot be allowed to extend to the dogma of “reducing” human complexity down to data, even masses of it, that can be modeled and manipulated.
Reductionism is a pivotal concept in Cartesian materialism, which aims to simplify existing phenomena down to as many elementary components as necessary. In materialist doctrine, all that exists is matter and physics is the fundamental science. The resultant analytical method, essential for scientific processes, has demonstrated its full worth and has been endorsed by the biggest names in science. However, the absolutist take on this principle, by which only that which is composed of matter, physical phenomena, or anything that falls under measurable “data” should be considered to exist, is of course questionable.
Indeed, as the title of Pablo Jensen’s book states, society does not fit into equations. Physicism, by which physics is the general model that explains the material world, cannot be epistemologically transposed onto issues that are sociological, anthropological, or especially political in nature. Rationalism’s key ideas, such as reproducibility and predictability, clash with the complexity of human relations. Statistics, on the other hand, has always been presented as a rational science of social affairs, one that could experience an epiphany thanks to the digital turn. Political excesses in social engineering result in the blatantly misguided belief that any social issue (work, health, violence) can be modeled by isolating and simplifying it. The repeated – even consistent – failures in economic forecasting underline how excluding certain effects to simplify models, or confusing statistical correlation with causal proof, causes the reductionist approach to social phenomena to fail.
There are two opposing prediction models: extrapolating the past and modeling. The former is only relevant for the near future; the latter collapses as the number of parameters grows. To explain the epistemological limits of the social sciences, Jensen says that there are four essential factors that make it qualitatively more difficult to simulate society than matter: human heterogeneity, the lack of any kind of stability, the numerous relationships to consider both in time and space, and the way people respond to having their activity modeled.
The element to remember here is that the fictional representation of models, which has of course proven to be effective in the physical or natural sciences, is incapable of predicting social behavior. More broadly, modelling and social engineering, despite their rationalist dressing, play a part in the political conception of societies: one that believes that society can be externally modeled and simplified. This ideology of reifying human relations assumes that only action from the outside can “change society” and that the creativity and pluralism of the people involved is not enough. In reality, it is a depoliticized, even dehumanized, view of social relations.