Computer-generated imagery used to be a fairly well-defined object, compartmentalized from the indexical images of celluloid-based cinematography and even live-action videography. CGI would appear as clearly recognizable logos in television advertisements, as ostentatiously flashy graphics in blockbuster movies, or as experimental objects on display in demo reels and the like. In such forms, CGI has long been combinable with non-computational imagery, but a clear juxtaposition between the two was generally in play—at least from a technical and production-oriented point of view, if not necessarily from the perspective of the end products’ viewers, who were ideally both fascinated by the technical marvel and, paradoxically, supposed to be convinced of the (pro-filmic and/or diegetic) reality of the computer-generated objects. Today, however, whatever remained of these distinctions between CG and non-CG imagery has been thoroughly dismantled, as virtually all images are filtered through computational systems, with the effect that we can hardly speak any longer of non-computational imagery at all. We might protest that a selfie I take with my smartphone is still an image of me, and thus far different from the algorithmic models of earlier CGI, so that the distinction remains. But there is a generative dimension at play in all sorts of computational image capture and playback processes (including “smart” preprocessing, motion prediction and motion smoothing, upscaling, and so on). Where does the non-computational image end and CGI begin? In fact, I contend, all images today are computational in a significant, generative sense, which is to say that virtually all imagery today is CGI. The borders have been thoroughly blurred, and this has important ramifications for our perceptual and embodied relations to the world—with startling political consequences as well.

Machine learning–enabled face-swapping videos, so-called DeepFakes, will serve here as an example. DeepFakes are all about blurring borders, both between CGI and non-CGI and between very real bodies and their images. As a result, DeepFakes pose significant challenges to conventional modes of viewing; indeed, the use of machine learning algorithms in these videos’ production complicates not only traditional forms of moving-image media but also deeply anchored phenomenological categories and structures. By paying close attention to the exchange of energies around these videos, especially the investment of energy on the part of the viewer struggling to discern the provenance and veracity of such images, we discover a mode of viewing that both recalls pre-cinematic forms of fascination and relocates them in a decisively post-cinematic field, thus leveraging a shift in the correlative potentials or modes of intentionality open to viewers. This media-historically anchored transformation, which recalls what Alexander Galloway has recently identified as the shift from a photographic to a computational “contract” of visuality, depends on a partial undoing of constituted subjectivity;1 the human perceiver, as we shall see, no longer stands clearly opposite the image object but instead interfaces with the spectacle at a pre-subjective level that approximates the nonhuman processing of visual information known as machine vision.
While the depth referenced in the name “deep fake” is that of “deep learning,” the aesthetic engagement with these videos implicates an intervention in the depths of embodied sensibility—at the level of what Merleau-Ponty has called the body’s “inner diaphragm” that, “[p]rior to stimuli and sensory contents, […] determines, infinitely more than they do, what our reflexes and perceptions will be able to aim at in the world, the area of our possible operations, the scope of our life.”2 While the overt visual thematics of these videos is often highly gendered (their most prominent examples being so-called “involuntary synthetic pornography” targeting mostly women), viewers are also subject to affective syntheses and pre-subjective blurrings that, beyond the level of representation, open their bodies to what Hortense Spillers refers to as fleshly “ungenderings”3 and re-typifications with far-reaching consequences for both race and gender.

How does this work? First, let us note that DeepFake videos are a species of what I have called “discorrelated images,”4 in that they trade crucially on the incommensurable scales and temporalities of computational processing and its ability to defy capture as the object of human perception (or the “fundamental correlation between noesis and noema,” as Husserl puts it).5 To be sure, DeepFakes still present to us something that is recognizable as an image. But in them, perception has become something of a by-product, a precipitate form or supplement to the invisible operations that occur in and through them. We can get a glimpse of such discorrelation by noticing how such images fail to conform or settle into stable forms or patterns, how they resist their own condensation into integral perceptual objects, for example in the way that they blur figure/ground distinctions. The article widely credited with making the DeepFake phenomenon known to a wider public in December 2017 notes with regard to a fake porn video featuring the likeness of Gal Gadot: “a box occasionally appeared around her face where the original image peeks through, and her mouth and eyes don’t quite line up to the words the actress is saying—but if you squint a little and suspend your belief, it might as well be Gadot.”6 There’s something telling about the formulation, which hinges the success of the DeepFake not on a suspension of disbelief (a suppression of active resistance) but on a suspension of belief (seemingly, a more casual form of affirmation), whereby the flickering reversals of figure and ground, or of subject and object, are flattened out into a smooth indifference.

In this regard, DeepFake videos are worth comparing to another type of multistable image: the digital lens flare, which is both to-be-looked-at (as a virtuosic display of technical achievement) and to-be-overlooked (after all, the height of such images’ technical achievement is reached when they can appear as transparently naturalized simulations of a physical camera’s optical properties).7 The tension between opacity and transparency, or objecthood and invisibility, is never fully resolved, thus undermining a clear distinction between diegetic and medial or material levels of reality. Is the virtual camera that registers the simulated lens flare to be seen as part of the world represented on screen, or as part of the machinery responsible for revealing it to us? The answer, it seems, must be both. And in this, such images embody something like what Neil Harris termed the “operational aesthetic” that characterized nineteenth-century science and technology expos, magic shows, and early cinema alike; in these contexts, spectatorial attention oscillated between the surface phenomenon, the visual spectacle of a machine or a magician in motion, and the hidden operations that made the spectacle possible.8 It was such a dual or split attention that powered early film as a “cinema of attractions,” where viewers came to see the Cinématographe in action, as much as or more than they came to see images of workers leaving the factory or a train arriving at the station.9 And it is in light of this operational aesthetic that spectators found themselves focusing on the wind rustling in the trees or the waves lapping at the rocks—phenomena supposedly marginal to the main objects of visual interest.10 DeepFakes also trade essentially on an operational aesthetic, or a dispersal of attention between visual surface and the algorithmic operation of machine learning. 
However, the post-cinematic processes to whose operation DeepFakes refer our attention fundamentally transform the operational aesthetic, relocating it from the oscillations of attention that we see in the cinema to a deep, pre-attentional level that computation taps into with its microtemporal speed.

Consider the way digital glitches undo figure/ground distinctions. Whereas the cinematic image offered viewers opportunities to shift their attention from one figure to another and from these figures to the ground of the screen and projector enabling them, the digital glitch refuses to settle into the role either of figure or of ground. It is, simply, both: it stands out, figurally, as the pixely appearance of the substratal ground itself. Even more fundamentally, though, it points to the inadequacy, which is not to say dispensability, of human perception and attention with respect to algorithmic processing. While the glitch’s visual appearance effects a deformation of the spatial categories of figure and ground, it does so on the basis of a temporal mismatch between human perception and algorithmic processing. The latter, operating at a scale measured in nanoseconds, by far outstrips the window of perception and subjectivity, so that by the time the subject shows up to perceive the glitch, the “object” (so to speak) has already acted upon our presubjective sensibilities and moved on. This is why glitches, compression artifacts, and other discorrelated images are not even bound to appear to us as visual phenomena in the first place in order to exert a material force on us.11 Another way to account for this is to say that the visually-subjectively delineated distinction between figure and ground itself depends on the deeper ground of presubjective embodiment, and it is the latter that defines for us our spatial situations and temporal potentialities. DeepFakes, like other images produced by discorrelative technologies, are able to dis-integrate coherent spatial forms so radically because they undercut the temporal window within which visual perception occurs. The operation at the heart of their operational aesthetic is itself an operationalization of the flesh, prior to its delineation into subjective and objective forms of corporeality.
The seamfulness of DeepFakes—their occasional glitchy appearance or just the threat or presentiment that they might announce themselves as such—points to our fleshly imbrication with technical images today, which is to say: to the recoding not only of aesthetic form but of embodied aesthesis itself.12

In other words: especially and as long as they still routinely fail to cohere as seamless suturings of viewing subjects together with visible objects, but instead retain their potential to fall apart at the seams and thus still require a suspension of belief, DeepFake videos are capable of calling attention to the ways that attention itself is bypassed, providing aesthetic form to the substratal interface between contemporary technics and embodied aesthesis. To be clear, and lest there be any mistake about it, I in no way wish to celebrate DeepFakes as a liberating media technology, the way that the disruption of narrative by cinematic self-reflexivity was sometimes celebrated as opening a space where structuring ideologies gave way to an experience of materiality and the dissolution of the subject-positions inscribed and interpellated by the apparatus. No amount of glitchy seamfulness will undo the gendered violence inflicted, mostly upon women, in involuntary synthetic pornography. Not only that, but the pleasure taken by viewers in consuming this violence seems to depend, at least in part, precisely on the failure or incompleteness of the spectacle: what such viewers desire is not to be tricked into actually believing that it is Gal Gadot or their ex-girlfriend that they are seeing on the screen, but precisely that it is a fake likeness or simulation, still open to glitches, upon which the operational aesthetic depends.

Nevertheless, we should not look away from the paradoxical opening signaled by these viewers’ suspension of belief. The fact that they have to “squint a little” to complete the gendered fantasy of domination also means that they have to compromise, at least to a certain degree or for a short duration, their subjective mastery of the visual object, that they have to abdicate their own subjective ownership of their bodies as the bearers of experience. Though it is hard to believe that any trace of conscious awareness of it remains, much less that viewers will be reformed or repent as a result of the experience, it seems reasonable to believe that viewers of DeepFake videos must experience at least an inkling of their own undoing as their de-subjectivized vision interfaces with the ahuman operation of machine vision.

What I am saying, then, and I am trying to be careful about how I say it, is that DeepFake videos open the door, experientially, to a highly problematic but multistable space in which our predictive technologies participate in processes of subjectivation by outpacing us, anticipating us, and intervening materially in the pre-personal realm of the flesh, out of which subjectivized and socially “typified” bodies emerge. It is here that a re-engineering of correlative potentials is made possible, where tactility is captured by the new visuality and the new materiality of machine-learning–enhanced computer-generated images. The late Sartre, writing in the Critique of Dialectical Reason, defined commodities and the built environment in terms of the “practico-inert,” in light of the ways that “worked matter” stored past human praxis but condensed it into inert physical form.13 Around these objects, increasingly standardized through industrial capitalism’s serialized production processes, are arrayed alienated and impotent social collectives of interchangeable, fungible subjects. Compellingly, feminist philosopher Iris Marion Young takes Sartre’s argument as the basis for rethinking gender as a non-essentialist formation, a nascent collectivity, that is imposed on bodies materially, through architecture, clothing, and gender-specific objects that serve to enforce patriarchy and heterosexism. The practico-inert, in other words, participated in the gendered typification of the body, as bodies were “positioned,” “oriented,” and entrained with new “routine practices and habits,”14 thus reorganizing the social substrates around which gender is configured and imposed on the body.
We could extend the argument to racialization processes as well, and Spillers’s notion of the ways that (un)gendering and racialization are bound up with one another, and with the “tortures and instruments of captivity”—“the calculated work of iron, whips, chains, knives, the canine patrol, the bullet”—suggests just this extension.15

However, if it was difficult to perceive these embodied and social standardization processes in an industrial-cinematic lifeworld, then it is all the more difficult in our post-cinematic one. For the worked matter at issue now is a microscopically worked matter, operating microtemporally and predictively, well in advance of subjective regard or resistance; the standardization and typification processes I just mentioned are more fine-grained, more “personalized” or targeted than was previously possible. Moreover, the neural nets at the heart of DeepFakes’ production are black-boxed entities that are neither directly programmable nor transparent to retrospective analysis. Operating without direct human control or insight, they have been trained on large data sets to produce outputs that statistically resemble their inputs, for example reproducing stylistic traits or “typical” bodily motions. As Hannes Bajohr writes, “repetition is in the very nature of neural nets”16; and it is by way of this repetition that DeepFakes discipline and typify bodies—both those on screen and those in front of the monitor.

That DeepFakes nevertheless provide a glimpse, however fleeting, of these processes is thus no small feat; it points us to an important margin of multistability, where the new visuality—i.e. the new generalization of all imagery, even all reality, as a kind of CGI—might be felt as the powerful force that it is. That is, the flattening of subjectivity, the suspension of belief and depersonalization of vision in DeepFake videos, provides limited aesthetic access to the contemporary “ungendering” of the flesh that marks a preliminary step in the computational intensification of racialized and gendered subjectivation. Clearly, this is a truly insidious aesthetics of the flesh, and one that must be combatted vehemently. However, it suggests the possibility that alternative aesthetic options might exist or be forged, that it might still be possible to seize the multistable margin, to reverse engineer the algorithms of statistical correlation and control, and to appropriate post-cinematic media in order to recode our fleshly mediality for a less awful world.17

Endnotes

  1. Alexander R. Galloway, Uncomputable: Play and Politics in the Long Digital Age (New York: Verso, 2021).
  2. Maurice Merleau-Ponty, Phenomenology of Perception, trans. Colin Smith (New York: Routledge, 2002), 92.
  3. Hortense Spillers, “Mama’s Baby, Papa’s Maybe: An American Grammar Book,” in Black, White, and in Color: Essays on American Literature and Culture (Chicago: University of Chicago Press, 2003), 203–229; 207.
  4. Shane Denson, Discorrelated Images (Durham: Duke University Press, 2020).
  5. Edmund Husserl, Ideas: General Introduction to Pure Phenomenology, trans. W. R. Boyce Gibson (New York: Routledge, 2012), 192.
  6. Samantha Cole, “AI-Assisted Fake Porn Is Here and We’re All Fucked,” Motherboard, December 11, 2017.
  7. I have written about the phenomenological paradoxes inherent in CGI lens flares in Discorrelated Images, especially 27–30.
  8. Neil Harris, Humbug: The Art of P. T. Barnum (Chicago: University of Chicago Press, 1973), 59–89.
  9. Tom Gunning, “The Cinema of Attraction: Early Film, Its Spectator and the Avant-Garde,” Wide Angle vol. 8, nos. 3–4 (1986): 63–70.
  10. As recounted, in 1896, by Maxim Gorky. See Maxim Gorky, “The Kingdom of Shadows,” in Jay Leyda, Kino: A History of the Russian and Soviet Film (London: Allen and Unwin, 1960), 407–409. See also Jordan Schonig’s video essay, “The ‘Wind in the Trees’ from Early Cinema to Pixar,” Vimeo, September 15, 2020.
  11. This is a central argument in Discorrelated Images. See also Mark B. N. Hansen, “Algorithmic Sensibility: Reflections on the Post-perceptual Image,” in Shane Denson and Julia Leyda, eds., Post-Cinema: Theorizing 21st-Century Film (Falmer: REFRAME Books, 2016), 785–816.
  12. I introduced the concept of “seamfulness” in Chapter 4 of Discorrelated Images.
  13. Jean-Paul Sartre, Critique of Dialectical Reason, vol. 1, Theory of Practical Ensembles, new ed., trans. Alan Sheridan-Smith (London: Verso, 2004).
  14. Iris Marion Young, “Gender as Seriality: Thinking about Women as a Social Collective,” Signs: Journal of Women in Culture and Society vol. 19, no. 3 (1994): 713–738; 730, 724.
  15. Spillers, “Mama’s Baby,” 206, 207.
  16. Hannes Bajohr, “Algorithmic Empathy: Toward a Critique of Aesthetic AI,” Configurations, no. 30 (2022): 203–231; 219.
  17. Compare Galloway on recoding the black box. Galloway, Uncomputable, 215–245.

About The Author

Shane Denson is Associate Professor of Film and Media Studies and, by Courtesy, of German Studies and of Communication at Stanford University. He also serves as Director of the PhD Program in Modern Thought & Literature. His research interests span a variety of media and historical periods, including phenomenological and media-philosophical approaches to film, digital media, and serialized popular forms. He is the author of Discorrelated Images (Duke University Press, 2020) and Postnaturalism: Frankenstein, Film, and the Anthropotechnical Interface (Transcript-Verlag, 2014) and co-editor of several collections: Transnational Perspectives on Graphic Narratives (Bloomsbury, 2013), Digital Seriality (special issue of Eludamos: Journal for Computer Game Culture, 2014), and Post-Cinema: Theorizing 21st-Century Film (REFRAME Books, 2016). See shanedenson.com for more information.
