The State of the Speakularity

Matt coined (or at least first wrote about) “the Speakularity” in 2010: “the moment when automatic speech transcription becomes fast, free and decent.”

Five years and change later, we’re still not exactly there! But we are closer. Like the horizon, the Singularity, or the coming of the Messiah, the Speakularity is always ever-so-slightly in the distance.

I recently reevaluated my rig for transcribing recorded audio and thoroughly reworked it. I feel much happier about this one than about any of my previous setups, which leaned a little too heavily on procrastination and weeping.

Also, I recently read Friend of the Snark Charlie Loyd’s entry on “The Setup” about the tools he uses, and feel correspondingly moved to actually tell people how I do things in the hope that they might add, improve, adopt, critique, be entertained, or otherwise benefit from it. You know, like how the internet used to be!

This setup requires a few pieces of software. Some of them I even paid American money for.

Call Recorder for Skype. Skype is… less than perfect. But it’s common, and you can do app-to-app calls or call an outside phone number. Most of what I do these days is interview sources and contacts on the phone. If you have a landline from which you can easily record incoming audio… do that. The rest of us sinners, we have to do this.

There are a bunch of call recording programs for Skype. There are also ways to rig Skype and your sound card to dump audio into a file. I’ve used Soundflower before. But I like Call Recorder for a few reasons:

  1. I already bought it;
  2. I can set it to record Skype calls automatically;
  3. It can easily split the recorded audio into two files, one for each side of the conversation.

This last part turns out to be important. It gives you a pristine audio file with no trace of your own voice. You don’t have to listen to your own stupid self! Totally worth the price of admission. Or I don’t know, rig Soundflower to do the same thing. I can’t figure it out, but you probably could.

Ok, now I do a rough pass of this separated audio in a voice transcription app. I use an older version of Dragon Dictate. Again, I use this partly because it (kinda) works, but mostly because I have it. It’s like eating what’s in your fridge before you go back out shopping. You can also use YouTube, especially if you don’t care that Google might have a copy of your audio.

You can also use IBM Watson’s speech-to-text API for two cents per minute. This has some advantages in that it’s relatively easy to script. I’ve just started messing with Watson by way of Dan Nguyen’s video transcription project on GitHub. Sometimes Watson works for me and sometimes it times out, which might be a function of my often-iffy Wi-Fi more than anything else. So usually for a first pass I try Dragon instead.

All I want for this quick-and-dirty transcription is a basic idea of what was said. Plus, it’s good to get an auto-transcription of the audio file before you start messing with it, which we’re about to do.

The next piece of software I use is an app called AudioSlicer. AudioSlicer is free but comes with some limitations, like being Mac-only and only working on MP3 files. So I may try another app like WavePad Audio Splitter. Maybe you have a favorite you’d like to share.

The important thing you’re looking for with this app is that it 1) detects silences in an audio file and 2) elegantly splits that file into multiple files, wherever silence is detected.
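If you'd rather script this step than hunt for an app, the idea can be sketched in a few lines of Python. This is only an illustration of silence-based splitting, not how AudioSlicer or WavePad work internally. The function name, the `threshold`, and the `min_silence` values are all made up for the example, and it works on a plain list of sample values rather than a real MP3.

```python
def split_on_silence(samples, threshold=500, min_silence=4):
    """Split a list of integer audio samples wherever a quiet stretch occurs.

    A sample counts as quiet if its absolute value is below `threshold`.
    A run of at least `min_silence` quiet samples closes the current segment.
    Both defaults are placeholder assumptions, not values from any real app.
    Returns a list of (start, end) index pairs, with `end` exclusive.
    """
    segments = []
    start = None      # index where the current segment began, if any
    last_loud = None  # index of the most recent loud sample
    quiet_run = 0     # number of consecutive quiet samples seen so far

    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            last_loud = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence:
                # The pause is long enough: close the segment at the last loud sample.
                segments.append((start, last_loud + 1))
                start = None
                quiet_run = 0

    # Close a segment that runs up to the end of the audio.
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments
```

Note that `min_silence` is counted in samples, so for real audio at 44,100 samples per second, a half-second pause would be about 22,050 samples. Short dips below the threshold (a breath, a hesitation) stay inside the segment; only a long pause cuts it.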

This, in conjunction with splitting your Skype recordings into a you-side and a them-side, is magic. Not only do you not have to listen to yourself talk, but those places where you did talk? They become punctuation for the other person’s audio. You can get audio files broken up into natural units of conversation. This, unsurprisingly, makes for audio files that make good quotes, and are a natural length for you to edit and transcribe in one go.

Now we’re on to the last app: ExpressScribe. This company also makes WavePad Audio Splitter, which makes me think they might work well together. Anyways, this is a genius little free app. It lets you load and save audio files, has a text editor right there, and adjusts speed without changing pitch. Again, it’s far from perfect, but it solves a lot of problems for you.

So you take all those split audio files from AudioSlicer or WavePad or wherever. Sometimes I sort them by size and weed out the smallest ones, which are usually just somebody saying “yup” or “uh-huh,” “ok,” etc. Then you load them into ExpressScribe. I’ve got my quick-and-dirty transcription of the entire interview, which helps guide me to the quotes I’m looking for. When I find those audio files, I run them through the transcriber again by their lonesome. (If I’m using Watson, I probably bulk upload here; with Dragon, you have to do them one at a time.) I pick whichever of the two transcriptions (pre-cut or post-cut) is more accurate, or maybe take pieces of both of them. Then, using ExpressScribe, I do a fine-grained edit of the transcribed text, checking it against the audio.
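The sort-by-size-and-weed step is easy to do by hand in a file browser, but it could also be scripted. Here is a rough sketch, assuming the clips are MP3 files sitting in one folder. The function name and the 20,000-byte cutoff are invented for illustration; the right cutoff depends on your bitrate and on how long a “yup” runs.

```python
from pathlib import Path


def clips_worth_hearing(folder, min_bytes=20_000, pattern="*.mp3"):
    """Return the clips in `folder` that are at least `min_bytes`, largest first.

    `min_bytes` is a guessed cutoff for dropping "yup" and "uh-huh" clips;
    adjust it for your own recordings.
    """
    clips = [p for p in Path(folder).glob(pattern) if p.stat().st_size >= min_bytes]
    return sorted(clips, key=lambda p: p.stat().st_size, reverse=True)
```

Calling `clips_worth_hearing("interview_clips")` (a hypothetical folder name) would give back the bigger files in descending size order, ready to load into the transcription app.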

When I’m done with the transcription (either piece-by-piece or the whole thing), I put the transcribed files into my notes (which I keep in Scrivener). Now I’ve got a bunch of separate quotes that I can deploy anywhere I need them. I’ve got the audio that goes with each note, if I have to finesse it. And I have a transcription of the entire talk, for context.

If I need to, I transcribe my side of the conversation, but most of the time… this is actually unhelpful. I mean, sometimes I say something really smart on a phone call, or I stupidly phrase a question in such a way that you need it to make the answer make sense. But most of the time, even if I say something smart, it’s to try to goad the other person into saying something smarter. The more I can get out of my own way, the better.

So right now, February 2016, that’s how I’m transcribing my phone calls. I’m sure I will relentlessly fine-tune this process, especially when doing so means that I might be able to avoid actually writing or, especially, actually hand-transcribing audio.

What do you use?


Merry Christmas

From M. John Harrison:

A Happy Christmas to everyone else, the best possible Christmas to the fucked up and the nearly done, all the deadbeats and ne’er-do-wells, the metaphysicians, atheists and losers, all the so-called scroungers, all those not in receipt of a Royal pardon, all the thoughtful, intelligent and above all decent people who believe there is such a thing as a society, the readers and the writers, students and philosophers, and — especially — a big shout out to the 32,000 people in the UK who didn’t receive their benefits on Christmas Eve due to “administrative error”.

Over here, we have Calexico’s Green Grows the Holly on repeat. Merry Christmas!


Hide the switch and shut the light


Combing the number line

There’s been a lot of news in the world of primes this year; a breakthrough paper-out-of-nowhere from Yitang Zhang on the distribution of twin primes (like 3 and 5, or 9929 and 9931) kicked off a season of super-productive work by mathematicians all across the world. I won’t attempt to summarize that work here, because I don’t understand it well enough to explain, and because Erica Klarreich has already done it with great vigor and clarity.
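For anyone who wants to poke at twin primes directly, here is a small Python sketch that lists pairs of primes differing by two. It has nothing to do with Zhang's methods; it is plain trial division, and the function names are just made up for the example.

```python
def is_prime(n):
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def twin_primes(limit):
    """Return all pairs (p, p + 2) of primes with p + 2 <= limit."""
    return [(p, p + 2) for p in range(2, limit - 1)
            if is_prime(p) and is_prime(p + 2)]
```

Running `twin_primes(20)` gives the first few pairs, starting with 3 and 5, and `twin_primes(10000)` includes 9929 and 9931.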

Her piece is actually about (at least) two pretty fascinating things:

  • these recent advances in the mathematics of primes, and
  • the contrast between the lone genius model and a more collaborative approach — both of which have proven effective here.

On the collaborative front, doesn’t this sound fun?

For the mathematicians working on this step [of the complicated collaborative process], the ground kept shifting underfoot. Their task changed every time the mathematicians working on the other two steps managed to reduce the number of teeth the comb would require. “The rules of the game were changing on a day-to-day basis,” Sutherland said. “While I was sleeping, people in Europe would post new bounds. Sometimes, I would run downstairs at 2 a.m. with an idea to post.”

More fun than tearing your hair out in your grim shadowed math-cave, for sure.

Finally, it’s worth reading this piece just to learn what the phrase “de facto admissible-comb czar” means.

Link via Trivium, reliably math-y and fascinating.


An image for winter

The fact that this is snow — I mean, really? REALLY? — seems important to me. Like, if you sat beneath a fig tree and meditated on that fact for a few years, you’d probably somehow understand everything.


It’s winter here. There are no mathematically-perfect ice crystals; we don’t get those in Berkeley. But boy is it cold.


“Liking” poems

It’s taken me some effort to learn how to appreciate poetry. I can make broad statements about liking books and music without having to like all books or all music, but with poetry, for whatever reason, it’s been more difficult.

However, as someone who would also say he likes math, opening a book of poems and seeing this has immediate appeal:


This is from R. D. Laing’s Knots, a book of poems about “the patterns of human bondage.” I like this. It gives me that “this looks crazy and I want to understand it” feeling.

That feeling is everywhere in math, though it has nothing to do with liking numbers or concepts: It’s a love for the notation itself, the joy of getting to move towards increasingly strange symbology as you understand more.

And reading Knots feels much the same — up to a point. My fascination thus far is wholly with its notation (e.g. the cryptic use of brackets, the strings of random numbers) and the structures in the book itself (e.g. its syntax — the “knots”). Try this one, for example:

Jack sees that
    Jill does not know
    Jack does not know what
    Jill thinks
    Jack knows.
But Jack can’t see
    why Jill does not know
    that Jack does not know
    what Jill thinks
    he knows.

I’ve been reading these poems for the last few nights and I have yet to get much closer to appreciating the subtleties of what Laing has to say about human psychology. (To me, the poem fragment above is not so much about “knowing what others know” as it is about learning how to parse the sentence in order to read it.) In some sense, I’m hung up solely on the way it’s presented rather than on what it means.

So I’m curious: When is it okay to not want to understand? (Or is it always okay?) Is “understanding” a poem something different from understanding other things?