
In an Age of High-Definition Digital Audio, Why Do We Still Use Human Stenographers?

Ah, the dreaded questions stenographers and court reporters have had to endure hearing over and over again: “Why are we still hiring stenographers when we could just be recording the proceedings and typing them up later?” and “Isn’t your job going to be obsolete soon?”

For full disclosure, I am now a software engineer. But I worked as a professional freelance stenographer for six years before my transition. Although my main source of income no longer comes from providing live-captioning services, stenography is and always will be a passion of mine. It is such an intelligently and ergonomically designed input system that, whether I’m writing code or prose, it makes my experience on the computer feel fluent and effortless, unhindered compared to having to tap my thoughts out character by character on a normal keyboard. As an engineer, I write code, write PRs, review others’ PRs, and deploy to production all from my steno machine. Thanks to steno, I can send Slack communications at the speed of thought. I have hundreds of shortcuts programmed in, ranging from Docker and rake tasks to git commands, so I can perform pulls, pushes, stashes, pops, checkouts, and countless other functions with one swift motion of the hand (TKRA*EUBG = docker-compose run --rm app rake repo:db:init).

I’m seldom without a steno machine if I have my computer on me. And, yes, I have more than one of these keyboards. My portable one is a 3D-printed mod that uses light-touch mechanical-keyboard switches, perfect for when I’m on the go or working from one of our satellite offices.

With a single chord, KPWRAERBGT (IMPRAERKT), I can import React faster than you can hit “i” on your keyboard. If you aren’t familiar with stenography (or “steno” for short), it’s the system of chorded input, performed on a special keyboard communicating with shorthand-to-text software, that court reporters and captioners use to quickly write down what is said in a courtroom, on TV, in lectures, or at Coachella. Basically, this amounts to a single person being able to catch and write down what a roomful of people is saying, verbatim, in real time. Olympic-level note-taking.
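As a rough illustration of the shortcut idea, the lookup behind a chord can be sketched as a plain dictionary from chord to output text. This is a minimal sketch, not how any particular steno software is implemented: the names CHORD_DICTIONARY and translate are hypothetical, the two chords are the ones quoted in this article, and the exact text the import chord emits is an assumption.

```python
# Hypothetical chord dictionary: one chord (all keys pressed together) maps to
# one piece of output text. Real steno dictionaries are far larger.
CHORD_DICTIONARY = {
    # --rm tells docker-compose to remove the container once the command exits.
    "TKRA*EUBG": "docker-compose run --rm app rake repo:db:init",
    # Assumed output; the article only says this chord "imports React".
    "KPWRAERBGT": 'import React from "react";',
}


def translate(strokes, dictionary=CHORD_DICTIONARY):
    """Translate a sequence of chords into text.

    Chords that are not in the dictionary are passed through unchanged,
    so the writer can see the raw steno and correct it.
    """
    return [dictionary.get(stroke, stroke) for stroke in strokes]


if __name__ == "__main__":
    for line in translate(["KPWRAERBGT", "TKRA*EUBG"]):
        print(line)
```

Each chord is a single motion of the hands, so the cost of emitting a long command is the same as the cost of emitting one letter.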

People normally speak English at a rate of around 160–220 words per minute. A skilled stenographer is able to hold a writing speed in excess of 200 words per minute for hours. To become nationally certified, one must pass a series of speed tests on a shorthand machine, the final of which requires one to transcribe a two-voice Q&A dictation at an unrelenting rate of 225 words per minute at >95% accuracy for five full minutes… in one take. The world record for steno speed is 360 words per minute for one minute, set by Mark Kislingbury.
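To put the certification figures above in concrete terms, the arithmetic can be sketched as follows. This assumes accuracy is scored as a simple fraction of the words dictated, which is a simplification; the actual grading rules may differ, and the function name is hypothetical.

```python
def speed_test_budget(words_per_minute, minutes, accuracy_pct):
    """Return (total words, allowed errors, words per second) for a speed test.

    Assumes accuracy is simply (correct words / total words); real grading
    rules may count errors differently.
    """
    total_words = words_per_minute * minutes
    # Integer arithmetic avoids floating-point rounding surprises.
    allowed_errors = total_words * (100 - accuracy_pct) // 100
    words_per_second = words_per_minute / 60
    return total_words, allowed_errors, words_per_second


if __name__ == "__main__":
    total, errors, per_second = speed_test_budget(225, 5, 95)
    print(f"{total} words dictated, roughly {errors} errors allowed, "
          f"{per_second:.2f} words per second")
```

Under that assumption, the final test leaves room for roughly 56 mistakes across 1,125 words arriving at almost four words per second.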

As for how steno actually works, that deserves its own article entirely. What I’m here to discuss in this piece is why this craft is still integral and relevant to the legal and hearing-accessibility realms. It is currently under siege by numerous unscrupulous wielders of magic microphones who profess, “Digital recording is the future!” and fallaciously claim that simply replacing the stenographer with a microphone will cut costs and produce transcriptions comparable to, if not better than, those taken stenographically by hand. After all, stenographers are humans, too: we make mistakes, we need breaks, we get hungry, and we tire like everyone else. Technology, including a microphone, does not. So much easier and cheaper, right?

Well, this seemingly obvious and viable alternative is problematic for several reasons.

I mean, if you’re fortunate enough to have a smartphone, you have an audio recorder right in your pocket. Go to Best Buy, and you can find recorders with great audio quality at a fairly low price. Memory is also now extremely cheap. But go talk to any lawyer who’s conducted a deposition, and they’ll quickly tell you that stenographers’ fees aren’t cheap and can raise the cost of litigation by several figures depending on the length of the transcript and how many copies are sold. Court reporters typically charge a flat appearance fee and then by the page, so if you have a 300-page transcript and all parties are ordering copies, you might be in for sticker shock. So if we have this much thriftier alternative, doesn’t it just make economic sense to let a recorder do the “work” and commission a bargain-basement transcription service to type it up for you later?

The problem with this proposition is twofold:

First, people vastly underestimate how common background noise, mumbling, and people talking over one another are in normal discourse and how problematic these are for audio recordings and speech-recognition software.

Second, people who have not trained for years to transcribe verbatim are actually really bad at transcribing verbatim.

Team of six normal typists on the left, one professional stenographer on the right.

Assuming you have normal hearing, human ears and their connection to the brain are exquisitely tuned for speech detection and can pick one voice out from the environment, whether it be from a group of speakers or from downright acoustic slurry. If you’ve ever held a conversation in a loud bar or a restaurant, you’ve experienced this uniquely human capacity in action.

In the moment, you may not notice that you’re talking over one another and not enunciating clearly (because you have the inherent bias of knowing what you said). But to a third party, it may be that nothing you or anyone else uttered during that heated exchange was comprehensible. A live stenographer will immediately notice if he or she isn’t getting it and will ask the parties to repeat a response that was too quiet or quick to register. If everyone is talking at once, the stenographer is there to stop the proceeding immediately and request that the parties slow down and go one at a time, so that everything said is properly noted and can be verified as reading correctly on-screen.

Microphones do not know when they are “not getting it” and thus cannot regulate people’s sloppy speech habits. Simply handing off a recording to someone who can hear and type English doesn’t guarantee they will be able to understand everything. They aren’t able to see who is speaking, so it will be much harder for them to correctly attribute each utterance to its speaker. Moreover, unless they are highly trained, their verbatim-transcription skills may not be up to snuff.

Stenographers also perform painstaking research beforehand to ensure they know any tricky terms and acronyms that might come up. Additionally, they will meet each of the speakers, make sure they have the correct spellings of all the names, and identify them in person during the deliberation. Once the proceeding begins, a live stenographer has three distinct advantages over an audio recording that are crucial to an accurate record:

  • Being able to see who is speaking and match the identity of the speaker with their words.
  • Having preloaded vocabulary and names entered into their transcription software’s lexicon.
  • Above all, knowing the context in which these often industry-specific or esoteric terms are being used at the moment they are uttered.

A live person also has recourse to read a speaker’s lips or their body language or to quickly glance at documents or a PowerPoint for additional cues during rapid-fire moments of uncertainty.

Also, based on years of feedback from conference attendees who have complimented the quality of my and others’ live-captioning work, it has become clear to me that stenographers, in general, can pick out words from mumbled strings of syllables far better than the average person can. This is because our ears are trained to catch every word from years of practicing the art of listening and simultaneously converting it to shorthand. Most people, by contrast, don’t realize they’ve missed some parts here and there until they see the words go up on the screen and, if not for the live transcript, would just let it go.

A remote transcriber who is not physically present at the actual proceeding will lack all of this contextual knowledge, which often results in errors, drops, misspellings, or “(inaudible)s” littered in the final transcription. This is all in addition to the fact that the speed of a typist working with audio is comically slow (roughly one hour of transcription time for every 15 minutes of audio) compared to a well-trained, real-time stenographer trailing only a word or two behind at all times. One cough or paper-shuffle can cover up an entire word or phrase in a recording. I’ve transcribed from pure audio; even with steno, it’s an awful slog.
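The turnaround gap is easy to underestimate, so here is a back-of-the-envelope sketch using the rough ratio quoted above (one hour of typing per 15 minutes of audio). The six-hour proceeding is an illustrative assumption, and the function name is hypothetical.

```python
# Rough ratio from the text: 60 minutes of typing per 15 minutes of audio.
TYPING_MINUTES_PER_AUDIO_MINUTE = 60 / 15  # = 4.0


def transcription_hours(audio_hours, ratio=TYPING_MINUTES_PER_AUDIO_MINUTE):
    """Estimate typing time, in hours, needed to transcribe a recording."""
    return audio_hours * ratio


if __name__ == "__main__":
    audio_hours = 6  # illustrative length of a full day of proceedings
    print(f"{audio_hours} hours of audio -> about "
          f"{transcription_hours(audio_hours):.0f} hours of typing")
```

At that ratio, a single day of recorded proceedings turns into roughly three working days of typing, while a real-time stenographer finishes when the speakers do.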

Junk-ass audio leads to junk-ass transcripts. Live stenographers don’t merely transcribe; we act as verbal moderators to ensure every word is captured. Verbal moderation is a key part of the job that people undervalue. It is both frustrating and infuriating when the Dunning-Krugererers out there continually push their seemingly simple but inane solutions to a problem whose complexity they know nothing about.

So this is why relying on a dumb electronic recording device to document legal proceedings, which the law requires to be in writing and which may dictate a person’s future, is an ABOMINABLE idea.

People without experience transcribing dialogue verbatim from a recording don’t understand how difficult, annoying, and downright painful it is to have to constantly rewind over and over trying to parse words from a blob of noise because someone shifted in their seat, grazed their mic with their arm, or talked out of turn. In the worst-case scenario, those words are lost forever.

People are messy in their communication because, as humans, we’re lazy and want to expend the least amount of effort to get the message across. Unless you’re recording in a sound-controlled booth with all parties having the enunciation skills of professional news anchors, a live human being will always have to be there to verbally moderate. Live stenographers document not only every utterance accurately, but also who said it and when. They are able to certify that the resulting transcript is true and correct according to what they heard and witnessed. Microphones paired with remote transcriptionists simply can’t compete with this level of fidelity. And that is why live stenographers still exist and will continue to exist.

Article courtesy of: Stanley Sakai