The Detective’s Gaze

I’ve inaugurated a new video series, on detective games. The inaugural video is an extended version of this old conference presentation, buffed up with new examples and more extensive sources. The second video will be arriving shortly—I knew I’d be super busy as soon as all three of my current jobs kicked in, so I planned ahead and worked on two videos simultaneously during the summer months, both of which I’m hoping to get out the door in September.

Script below the jump.

Before I get to video games, I want to take a little tour through prior media.

Let’s say you’re writing a detective novel. You are describing an important clue. Let’s say it’s a shard of glass from a perfume bottle. You have to be careful while you’re doing it. You don’t want to give too much away. As mystery writer Marie Rodell writes in her how-to manual Mystery Fiction: Theory and Technique, you should strive to describe it “as the detective sees and feels it; its color, shape, condition of wear and, when possible, the way in which it has become a fragment.” You should give an accurate and detailed description of what the detective observes, while giving your reader some room to maneuver, to draw their own conclusions rather than having a particular interpretation foisted upon them. This requires a degree of precision that the English language isn’t always well-equipped for, and that must be carefully learned and applied. “A whiff of perfume must be characterized if it is to mean anything to the reader, who cannot smell it from the page: it must be pungent, or sickeningly sweet; it may be rose or jasmine or carnation; or, more subtly, it may evoke certain reactions in the detective, which are always the same whenever he smells that particular perfume.”

Rodell is consistent here: clues should be described clearly and precisely, with the reader being given a chance to observe everything the detective observes. You’ll find rules similar to this whenever someone tries to define the concept of “fair play” in detective fiction. S. S. Van Dine insists that “All clues must be plainly stated and described.” Ronald A. Knox forbids the detective from noticing anything that isn’t “instantly produced for inspection by the reader.”

These rules seem straightforward, but in practice things get a bit twisty. Sometimes a detective has a wildly different experience of a clue from what a layperson would. Take a passage from Arthur Conan Doyle’s Sherlock Holmes series, from the story “A Case of Identity”:

‘Quite an interesting study, that maiden,’ [Sherlock] observed. ‘I found her more interesting than her little problem, which, by the way, is rather a trite one. …

‘You appeared to read a good deal upon her which was quite invisible to me,’ I remarked.

‘Not invisible but unnoticed, Watson. You did not know where to look, and so you missed all that was important. I can never bring you to realize the importance of sleeves, the suggestiveness of thumb-nails, or the great issues that may hang from a boot-lace.’

Sherlock then goes on at length about the things he understood just by inspecting the woman’s hands and sleeves. In a passage like this, we’re supposed to recognize Watson as our perceptual equal, and be in awe of Sherlock’s observational skills. Conan Doyle walks a tightrope in the Sherlock stories: Sherlock’s powers of observation are important for the storytelling, because they provide necessary insight into certain elements of the crime. But it also means that Conan Doyle can’t narrate the stories entirely from the perspective of Watson: Sherlock must interject at regular intervals, providing the reader with the insights gathered from his peculiar methods of observation. It’s not enough to stick to describing the “color, shape, and condition of wear” of objects, as Rodell would have it. We also need flashes of Sherlock’s interiority, privileged access to his uncanny powers of perception, association, and recall.

As we move from literature to cinema, some of the difficulties of detective fiction melt away entirely, just by switching media. Filmmakers don’t have to worry about describing objects carefully and precisely, because we can see them for ourselves. Filmmakers established a visual vocabulary relatively quickly, one that could strike a balance between objectivity and interiority. A detective’s attention to a particular detail could be simulated by means of a close-up. This draws the viewer’s attention in to what the detective is observing, but then once we see it we just … see it, presented as a straightforward photographic image.

Over the course of decades, new approaches to interiority in detective films emerged. In the 1940s, Hollywood’s “film noir” adaptations of the hardboiled detective novels of Raymond Chandler, Dashiell Hammett, and Mickey Spillane often featured voice-over narration, with detectives filling the movie’s soundtrack with delicious purple prose. The function of this narration was usually to provide a window into these detectives’ cynical worldviews, as a way of establishing mood and tone. But every now and then they’d explain the significance of a clue, as well, giving viewers insight they would otherwise lack.

In the 1960s you start to see bolder uses of cinematic visuals to communicate subjective understandings of clues, particularly on the part of international directors whose work straddled the line between genre cinema and art cinema. In Kurosawa Akira’s High and Low, law enforcement personnel hide special heat-released dyes in the fabric of an attache case being handed off to a kidnapper. When the kidnapper incinerates the case after collecting the money, a plume of colored smoke announces his location to law enforcement, and the film—which has been entirely black and white until now—announces the importance of this clue by introducing a dash of color. Suzuki Seijun uses a similar technique in Youth of the Beast, calling attention to the narrative importance of flowers in this crime saga by selectively coloring them.

By the 1990s, visual depictions of detectives’ interiority were growing even more experimental. Two successful adaptations of Thomas Harris novels (1986’s Manhunter and 1991’s The Silence of the Lambs) launched the figure of the FBI profiler into the public imagination. Traditional detective narratives had long had norms in place against detectives having “unaccountable intuition” or “supernatural abilities,” but “criminal profiling” was such a slippery concept that it effectively wrote a blank check, opening the gates to detective figures who stopped just short of being psychic.

In Twin Peaks, agent Dale Cooper gained inspiration on the Laura Palmer case by consulting his dreams.

Twin Peaks wasn’t long for network TV screens, but a few years later Frank Black and Sam Waters made their debuts on FOX and NBC, respectively. Both profilers solved weekly crimes with the help of flashily-edited visions … visions that proved especially valuable when examining crime scenes, or bodies. Other characters are often stunned by these profilers’ perceptive abilities. Sometimes they question their deductive abilities head-on. When the question arises, the shows offer hand-wavy explanations, paying lip service to “instincts” honed by years of experience. Supernatural explanations are assiduously avoided. It’s undeniable, though, that this type of detection-by-montage provides a shortcut, relieving screenwriters from doing the hard work of actually explaining which clues lead detectives to a given conclusion, and how. These characters’ crime-solving techniques are just vaguely-defined enough for the shows to go utterly wild.

The heyday of the profiler procedural ended with the ’90s. But you still see more recent shows playing around with this visual idiom, such as Hannibal, which brings a prestige-television budget to its hallucinatory imagery. Twin Peaks, Millennium, Profiler, and Hannibal all offer up stylistically adventurous visual depictions of psychic interiority during the process of detective work that you just didn’t get in, say, the 1940s-era Sherlock Holmes adaptations.

But, eventually, television would get around to depicting Sherlock Holmes’ interiority, as well. Oh, boy, did it ever.

Steven Moffat’s BBC Sherlock series debuted in 2010. And its visual representations of things like Sherlock’s observational prowess, his mind palace, the uncanny memory of blackmailer Charles Magnussen, as well as, um, what it’s like to be a dog, represent the culmination of a visual trend that I like to call “GUIness.” In GUIness, insight is visualized in a text-heavy manner, similar to cinematic depictions of computerized augmented reality: all clean, and white, and sans-serif.

The GUIness of Sherlock is emblematic of a sharp aesthetic break in depictions of expert cognition and perception between the 1990s and the 2000s. As production pipelines became entirely digital, and compositing software such as After Effects and Shake found a regular place in editors’ workflows, the “grubby 16mm experimental film” aesthetic of the expressionistic visions of Millennium and Profiler faded out of favor to make room for digital compositing’s new affordances.

Sometimes this type of compositing was used to depict futuristic human-computer interfaces. In 2002, Steven Spielberg’s Minority Report imagined a future in which crimefighting is fundamentally changed not just by precognitive psychics, but by advanced AR tech. The image of Tom Cruise using gestures to rewind, fast-forward, and re-arrange video clips projected onto a glass surface has had a lasting effect on how futuristic GUIs are imagined in cinema and television. But this connection between gesture, insight, and visual augmentation wasn’t limited to literal depictions of futuristic interfaces. Ron Howard’s A Beautiful Mind, from the year before, uses similar compositing in a purely metaphorical way, as a way of illustrating mathematician John Nash’s advanced pattern-recognition abilities, initially a boon to him and later a hindrance when they feed into his growing schizophrenia.

GUIness is an easy style to parody. In fact, it was parodied quite concisely in 2009’s The Hangover, released the year before Sherlock debuted on BBC. But Sherlock used the style earnestly, as a 21st century metaphoric rendering of Sherlock’s unique detecting expertise. Subsequent productions have used the style earnestly, as well. The Good Doctor, which premiered in 2017, frequently uses the style as a sort of “autistic-vision,” offering a visual metaphor for autistic protagonist Shaun Murphy’s powers of close attention and encyclopedic recall.

I find Sherlock to be kitschy, and I find The Good Doctor to be downright problematic. But I also begrudgingly credit them with realizing the need for a distinct visual language for depicting the detective’s gaze. Since its inception, cinema has allowed us to see a crime scene from the position of a detective. But it’s harder for us to see it how the detective sees it, shot through with all of their relevant expertise. How do you visualize encyclopedic knowledge? How do you visualize a hunch? What do these things look like? The first several decades of depictions of detective work onscreen punted this question. Sherlock, for all its faults, at least tries its hand at depicting Holmes’ unique understanding of where to look and what to notice. I can’t completely discount that ambition.

And … it’s highly relevant to the visual language of videogames’ depiction of detective work.

Remember everything I said about the benefits of cinematic depictions of detective work: the way many of the difficulties of detective literature melted away, because we could see the clues ourselves, saving authors from the pitfalls of written descriptions? Well, early on, video games had none of that advantage. The first detective games to emerge were text-parser adventure games. And even those that had graphical components (such as Adventure International’s Apple II game The Curse of Crowley Manor, released in 1981) were hamstrung by the lack of visual detail afforded by the technology of the time.

Here’s a hallway, as rendered in The Curse of Crowley Manor. There’s really nothing to it. It’s just some walls, a floor, and a ceiling. In order to properly inspect it, we need to hit the return key, which brings us to the text screen. By typing the command “look hall,” we’re given a description and … what? A small statue is here? Where?

Well, it’s not visually represented in the picture. That would be too complicated. Instead it gets added to this list of ironically-named “visible items” in the game’s text UI. “Visible items” are things the game is calling our attention to, assigning importance to, highlighting for potential interaction—whether or not they’re actually visibly represented in the game’s illustrations.

So we’re stuck with this distinct dual access to the game’s world. There are the visual illustrations of the world, which are kind of superfluous: theoretically, you could play the game without them. Then there’s the detail offered by the text screen and its list of “visible items.” This text screen is how we actually interact with the game’s world, so this is where all the actual important description of its contents lies. See, now I’m learning all sorts of things about this statue, which has never once been visually represented.

There’s a split between functions here. The pictures are pretty (or, at least, “pretty” by the standards of the Apple II). But the text-based UI is functional. And its specific function is that of focalization: it directs our attention to the actual pertinent details of the environment. When we slip into the text parser, use the “look” command, and check out the visible items, we’re entering into sort of a “detective’s gaze.”

Adventure games got better at visually representing details as they moved out of the text parser era and into the point-and-click era. But this sort of dual access, in which a layer of text served as a necessary complement to visual representations, continued well into the point-and-click era. A good example is the “what is” verb function of LucasArts games such as Maniac Mansion. Enable it, and your cursor is transformed into an identification tool, allowing you to hover the mouse over various objects populating the scene and get a textual description of them, before you interact with them. It’s a weirdly clairvoyant cursor, too: there are several times in the game where you’re stuck in a dark room, and you can use the “what is” function to find the light switch. But while you have the “what is” function activated, you can identify every single object in the room, despite it being pitch-black.

As games moved into the 3D era, this adventure-game tendency of labeling morphed into the modern practice of using vision modes. I’m using the term “vision mode” here to refer to any alternate rendering of the game’s 3D environment, in which standard textures and lighting are abandoned, or at least augmented, with a new rendering of the scene that highlights interactive objects, makes hidden details visible, and generally clues the player in to pertinent environmental details. An early example of this is Samus Aran’s “scan visor,” introduced in Metroid Prime. When switching the view over to the scan visor, the environment becomes populated by hovering icons. Icons offer up different results when scanned: sometimes her suit computer translates an alien language. Sometimes it adds a journal entry to her collection. Sometimes it provides a hint for getting past a boss, and adds the info to her bestiary. Sometimes it provides hints for environmental puzzles. In general, it bolsters the game’s environmental storytelling, filling out details that might not be immediately apparent just from looking at the game’s scenery, and filling the game with lore and intrigue that its simple loop of exploration, combat, and item-collection otherwise wouldn’t be able to support. By the game’s conclusion, Samus has collected two additional vision modes, each of which renders the game’s environments in yet another way, all of which must be switched between during the game’s final boss battle.

Seven years after the first Metroid Prime game, Batman: Arkham Asylum gave us “detective mode,” still perhaps the most paradigmatic and broadly influential vision mode. Detective mode combined multiple constituent uses of Samus’ visor into one streamlined mode, highly focused on delivering useful visual information to the player and stripping everything else out. It highlights environmental objects that Batman can use, such as vents and gargoyles, while stripping away all other environmental detail, to the point where you can actually see through walls to see enemies, rendered in skeletal form.

As the name implies, “detective mode” also aids Batman in detective work, giving him access to a database of fingerprints and DNA, which he can pull up simply by looking at any piece of evidence and scanning it. As the series progressed, detective mode incorporated the resources of the bat computer in more and more elaborate ways, including staging three-dimensional reconstructions of crimes that could be fast-forwarded and rewound as needed, to help trace the trajectory and pinpoint the current location of things like shrapnel and shell casings.

“Detective mode” solves a basic problem of detective games rendered in 3D space: sometimes it’s tough to get players to locate clues, so developers call our attention to them in whatever way they can, to keep things moving at the desired pace. Even though game technology has come a long way from Curse of Crowley Manor, there’s still a need to focalize player attention, to pull them toward a condensed array of relevant items in the environment. We find a very similar approach to modeling detective work in Heavy Rain, which equips FBI agent Norman Jayden with a futuristic set of AR-enabled glasses. As in the Batman: Arkham games, the act of crime scene investigation in Heavy Rain is abstracted into simply surveying the area for the highlighted objects.

One presumably unintended effect of streamlining crime scene investigation in this way is that it essentially de-skills detectives. The real, hard work of forensic examination and study basically evaporates for the sake of gameplay expediency. Judged by the standards set forward by the Batman: Arkham games and Heavy Rain, something like Condemned: Criminal Origins is relatively slow-paced: FBI agent Ethan Thomas takes out specific forensic tools when prompted, players search the environment for a specific bit of evidence (say, a print, or a bloodstain), photograph the evidence, electronically send it over to Ethan’s lab technician Rosa, and then, after a few seconds, Rosa calls Ethan back with some forensic analysis. This is still an unrealistically expedited depiction of forensic work. But at least it acknowledges this labor as labor. Other games tend to gloss this over. Similarly to Condemned, the 2017 game Get Even makes prominent use of player-character Cole Black’s phone, but he isn’t even calling anybody. His phone is just artificially intelligent, with the ability to analyze DNA traces on objects, simply by being pointed in their direction. Hilariously, this process is simply referred to as “detection,” as if to emphasize that Black is dumb as a box of rocks, possessing no discernible skills, and that all actual detective work in the game is being done by his AI assistant. Siri, whose fingerprints are these? (“No matches in biometric database.”) Sheesh, useless as usual.

And that’s the thing: by whipping up a technological equivalent of the “detective’s gaze,” these games ultimately portray their protagonists as not possessing any unique or noteworthy skills. Rather than Batman being the “world’s greatest detective,” Bruce Wayne has just spent more money than anyone else on AR tech and access to biometric databases that do his work for him.

The fictional explanations for these technologies are ad hoc and to some degree superfluous, so game writers are free to be as hand-wavy as they like. The vision modes in these games are all sufficiently advanced that they are, for lack of a better word, magic. (In fact, some vision modes are quite literally depicted as magic, such as the alchemical eyepiece in the puzzle game The Room.) These technologies serve a mechanical purpose, over and above anything else, and this mechanical imperative overrides any attempt at establishing these games’ characters as uniquely perceptive.

This isn’t always the case. Sometimes, vision modes are explained not on the basis of technology, but on the basis of the player-character’s heightened senses.

So, for instance, Joel in The Last of Us has sensitive hearing, which can be “focused,” resulting in a vision mode that pinpoints the position of enemies through walls. This never gets used for investigation, as those sorts of mechanics aren’t used in The Last of Us, but it’s nevertheless an example of a vision mode motivated by sensory expertise.

Perception utilizes a similar technique to visually illustrate the sensitive hearing of its blind protagonist, Cassie. Since we never play as a sighted character in Perception, the entire game is visually rendered in this style, with sound being substituted for light. Effectively, the entire game takes place in a “vision mode,” presented as Cassie’s dominant way of experiencing the world, rather than an “alternate” option.

In The Legend of Zelda: Twilight Princess, Link occasionally transforms into a wolf. And, when he’s in his wolf form, he’s able to “hone his senses” to pick up on things he can’t see as a human. This includes the ability to pick up on scent trails, which are represented visually while in this mode. These scent trails are put to good use in places like the Arbiter’s Grounds dungeon—for instance, there’s one moment where you have to light braziers in the right order, and the way you figure out that order is by visually following the path of someone who came before you.

Then there’s the area of Snowpeak, where you’re searching for an elusive Yeti. Another character warns you that it is impossible to navigate, and indeed it is, while you’re in human form. You face whiteout conditions, and if you stray too far from a completely invisible path, the screen will eventually fade to white and you’ll be placed back at the beginning. The only way to actually get through this area is to discover that the Yeti likes to eat a certain kind of fish. If you figure out what kind of fish it is, you can fish for it, and then catch its smell while in wolf form. Once you’ve done that, what as a human looked like an undifferentiated snowy wasteland now has a clear scent trail running through it.

I remember really liking this section when I first played through Twilight Princess—I thought it was clever the way the game set up a space that was confusing, meaningless, and unnavigable, and then gave you access to a previously-invisible layer of meaning and guidance once you completed a quest line. Returning to it a decade later when the HD edition came out, I was less impressed, because in the intervening years I had followed many different trails, with the help of many different vision modes.

Now, there are good ways to use vision modes in games. I’m quite fond of the investigation sequences in Batman: Arkham Origins, which I think strike a nice balance between directing players’ attentions, and actually giving them something worthwhile and interesting to do. There are also bad ways to use vision modes. A vision mode can strip all challenge out of an investigation, turn it into rote busywork. A vision mode can also sap all visual variety out of a game: you have to be careful to not make it too useful, or players will never turn it off. At their best, I think vision modes have the capability of providing nuanced characterization of player-characters, by showing some unique aspect of how they exist in and relate to their world. I’m going to devote the rest of this video to an extended case study of a game I think does a decent job of this, whatever its faults.

Sherlock Holmes: Crimes and Punishments was released in 2014, an entry in Frogwares’ long-running series of licensed games based on the Conan Doyle property. I haven’t played all of them (I know the early ones, especially, have a reputation for being both comically and creepily shonky), and I have to admit that picking out Crimes and Punishments as my particular example is somewhat arbitrary. But, here we are.

Crimes and Punishments has a vision mode, but it’s not explained by means of technology, or super-sense shenanigans. Instead, it’s characterized simply as “concentrating Sherlock’s attention” to “find details that others are inclined to overlook.” The game hews very closely to Conan Doyle’s characterization of Sherlock, right down to its explanation of its vision mode. Sherlock doesn’t rely on any special technologies, and his abilities aren’t supernatural. He’s just a man who knows where to look. He knows where to look so well that his behaviors might seem strange to us, and his conclusions fantastical. But beneath it all, he’s just an unusually observant fellow. Functionally, this vision mode is nearly identical to something like Batman’s detective vision. But in framing it in non-technological terms, Frogwares does a real service to the character, adapting his abilities in gameplay rather than explaining them away.

I also can’t help but notice an evolutionary convergence at work here. We’ve come a long way from the UI of The Curse of Crowley Manor, and at this point highly-detailed images can spatially coexist with focalizing text descriptions. The end result doesn’t look entirely dissimilar from BBC’s Sherlock. In both adaptations, hovering text has a central role in visually representing Sherlock’s unique expert gaze, intervening in the visuals so as to give us a better sense of how Sherlock might see and understand the world, filling the gap between our experience and the great detective’s.

This technique is especially pronounced in the sections of the game where players are tasked with visually examining persons of interest in typical Holmesian style. These sections aren’t presented in the same grayscale-with-yellow-highlights vision mode used in crime scenes, but instead ask us to pixel-hunt as we move the camera across highly-detailed character models. Click while the cursor is hovering over the right place (by recognizing the “importance of sleeves” or the “suggestiveness of thumb-nails”) and the game will cycle through Holmesian deductions, establishing a bit of info about a character’s profession, or past. This info can then be used when Sherlock objects to statements offered, where we, as players, are asked to match the correct counter-evidence. The rewards for doing so are miniature exchanges worthy of Conan Doyle.

Well, sometimes they are, anyway. The game can’t really maintain Conan Doyle’s batting average, and so sometimes these sections utterly fail to give us the sense that we’re inhabiting a uniquely perceptive character. Oh, this dude has an “arrogant look” and “disdainful mouth”? Yeah, no shit, Sherlock. I mean, literally. Sherlock … no shit.

So, yeah, the game isn’t perfect. But I think it’s worth studying as an adaptation, one that is quite sincere in its attempts to translate Sherlock’s expert detective’s gaze into workable gameplay mechanics. Much like in the Holmes short stories and novels, we’re given access to all of the clues necessary to deduce the answer to these crimes ourselves. It’s just that sometimes we need things made a bit more explicit for us: we need occasional access to those flashes of Holmesian insight, that perceptual prowess that makes Sherlock Sherlock—and leaves us feeling more like Watson. In order for these stories to “play fair,” we occasionally need to get inside Sherlock’s head.

And the game literalizes this metaphor to absurd effect! We’re not just given access to Sherlock’s interiority; we dive into the very neurons of his great big brain. The UI for drawing connections between clues, and making deductions based upon those connections, is a simulation of what it’s literally like to be inside Sherlock’s head, as his synapses spark away. I actually rather like the way the game builds a mechanic around conflicting interpretations of clues, but the packaging of this UI in terms of neurons is goofy. Delightfully so, but still.

So that’s Sherlock Holmes: Crimes and Punishments. It’s not a perfect game, by any means. It’s got more than its share of jank. But it uses things like vision modes and text augmentations to give us access to the detective’s gaze, in a way that isn’t thoroughly demystified by relying on technology as a crutch.

Don’t worry—we’ll get to some games in this series that are genuinely good, and not just “interesting, with caveats.” So stay tuned. Thanks for watching.

Intermittent Mechanism

Writing and Teaching of Ian Bryce Jones
