This is just about my favorite kind of internet writing: the deep nerd-out, gracefully written, but without any real expectation of, or capitulation to, a general-interest audience. This one is about the past, present, and future of digital projection in movie theaters and I totally enjoyed it.
(I think my favorite part is the picture of the Digital Cinema Package in its red plastic carrying case, and the accompanying description of all the crazy DRM surrounding it. You have to get the password over the phone!)
There’s plenty worth writing about in Kevin Kelly’s post on “Techno Life Skills.” Kelly’s point of departure is that learning how to master any specific technology is less important than learning how to adapt to, use, and understand any technology that emerges (or that meets your newly emergent needs).
Here are a few notes about how technology frames us, how we think, and what we can do:
• Tools are metaphors that shape how you think. What embedded assumptions does the new tool make? Does it assume right-handedness, or literacy, or a password, or a place to throw it away? Where the defaults are set can reflect a tool’s bias.
• What do you give up? This one has taken me a long time to learn. The only way to take up a new technology is to reduce an old one already in my life. Twitter must come at the expense of something else I was doing — even if that something was just daydreaming.
• Every new technology will bite back. The more powerful its gifts, the more powerfully it can be abused. Look for its costs.
And a few more about accepting the limits of your own knowledge, and how your ignorance isn’t a defeat:
• Understanding how a technology works is not necessary to use it well. We don’t understand how biology works, but we still use wood well.
• Nobody has any idea of what a new invention will really be good for. To evaluate don’t think, try.
• Find the minimum amount of technology that will maximize your options.
I think these last three observations might be both Kelly’s most powerful and the most true.
Update: I forgot maybe the number-one smart, accept-your-own-ignorance observation, which Alan Jacobs rightly pulled:
• You will be a newbie forever. Get good at the beginner mode: learning new programs, asking dumb questions, making stupid mistakes, soliciting help, and helping others with what you learn (the best way to learn yourself).
Yesterday NiemanLab published some of my musings on the coming “Speakularity” — the moment when automatic speech transcription becomes fast, free and decent.
I probably should have underscored the fact that I don’t see this moment happening in 2011, given that these musings were solicited as part of a NiemanLab series called “Predictions for Journalism 2011.” Instead, I think several things could plausibly converge next year that would bring the Speakularity a lot closer. This is pure hypothesis and conjecture, but I’m putting it out there because I think there’s a small chance that talking about these possibilities publicly might actually make them more likely.
First, let’s take a clear-eyed look at where we are, in the most optimistic scenario. Watch the first minute-and-a-half or so of this video interview with Clay Shirky. Make sure you turn closed-captioning on, and set it to transcribe the audio. Here’s my best rendering of some of Shirky’s comments alongside my best rendering of the auto-caption:
Manual transcript:

Well, they offered this penalty-free checking account to college students for the obvious reason students could run up an overdraft and not suffer. And so they got thousands of customers. And then when the students were spread around during the summer, they reneged on the deal. And so HSBC assumed they could change this policy and have the students not react because the students were just hopelessly dispersed. So a guy named Wes Streeting (sp?) puts up a page on Facebook, which HSBC had not been counting on. And the Facebook site became the source of such a large and prolonged protest among thousands and thousands of people that within a few weeks, HSBC had to back down again. So that was one of the early examples of a managed organization like a bank running into the fact that its users and its customers are not just atomized, disconnected people. They can actually come together and act as a group now, because we’ve got these platforms that allow us to coordinate with one another.

Auto transcript:

will they offer the penalty-free technique at the college students pretty obvious resistance could could %uh run a program not suffer as they got thousands of customers and then when the students were spread around during the summer they were spread over the summer the reneged on the day and to hsbc assumed that they could change this policy and have the students not react because the students were just hopeless experts so again in western parts of the page on face book which hsbc had not been counting on the face book site became the source of such a large and prolonged protest among thousands and thousands of people that within a few weeks hsbc had to back down again so that was one of the early examples are female issue organization like a bank running into the fact that it’s users are not just after its customers are not just adam eyes turned disconnected people they get actually come together and act as a group mail because we’ve got these platforms to laos to coordinate
Cringe-inducing, right? What little punctuation exists is in error (“it’s users”), there’s no capitalization, “atomized” has become “adam eyes,” “platforms that allow us” are now “platforms to laos,” and HSBC is suddenly an example of a “female issue organization,” whatever that means.
Now imagine, for a moment, that you’re a journalist. You click a button to send this video to Google Transcribe, where it appears in an interface somewhat resembling the New York Times’ DebateViewer. Highlight a passage in the text, and it will instantly loop the corresponding section of video, while you type in a more accurate transcription of the passage.
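The correction loop I’m imagining is easy to sketch, assuming the recognizer emits word-level timestamps (most do). Everything here — the function, the sample words, the times — is hypothetical, not a real Google API:

```python
# Hypothetical sketch: map a highlighted passage of an auto-transcript
# back to the span of video that should loop while the user retypes it.

def loop_span(words, first_word, last_word, padding=0.25):
    """Return the (start, end) seconds of video to loop for the words
    from index first_word through last_word, with a little context
    on either side. Each word is (text, start_sec, end_sec)."""
    start = max(0.0, words[first_word][1] - padding)
    end = words[last_word][2] + padding
    return start, end

# Invented word-level timestamps for the opening of the auto-transcript:
words = [
    ("will", 0.00, 0.25),
    ("they", 0.25, 0.50),
    ("offer", 0.50, 1.00),
    ("the", 1.00, 1.25),
    ("penalty-free", 1.25, 2.00),
    ("technique", 2.00, 2.50),
]

# The user highlights "penalty-free technique" to correct it:
print(loop_span(words, 4, 5))
```

That’s the whole trick: the interface never needs to understand the audio, it just needs the recognizer’s timestamps to keep the video and the cursor in sync.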
That advancement alone — quite achievable with existing technology — would speed our ability to transcribe a clip like this quite a bit. And it wouldn’t be much more of an encroachment than Google has already made into the field of automatic transcription. All of this, I suspect, could happen in 2011.
Now allow me a brief tangent. One of the predictions I considered submitting for NiemanLab’s series was that Facebook would unveil a dramatically enhanced Facebook Videos in 2011, integrating video into the core functionality of the site the way Photos have been, instead of making it an application. I suspect this would increase adoption, and we’d see more people getting tagged in videos. And Google might counter by adding social tagging capabilities to YouTube, the way they have with Picasa. This would mean that in some cases, Google would know who appeared in a video, and possibly know who was speaking.
Back to Google. This week, the Google Mobile team announced that they’ve built personalized voice recognition into Android. If you turn it on for your Android device, it’ll learn your voice, improving the accuracy of the software the way dictation programs such as Dragon do now.
Pair these ideas and fast-forward a bit. Google asks YouTube users whether they want to enable personalized voice recognition on videos they’re tagged in. If Google knows you’re speaking in a video, it uses what it knows about your voice to make your part of the transcription more accurate. (And hey, let’s throw in that they’ve enabled social tagging at the transcript level, so it can make educated guesses about who’s saying what in a video.)
A bit further on: Footage for most national news shows is regularly uploaded to YouTube, and this footage tends to feature a familiar blend of voices. If they were somewhat reliably tagged, and Google could begin learning their voices, automatic transcriptions for these shows could become decently accurate out of the box. That gets us to the democratized Daily Show scenario.
This is a bucketload of hypotheticals, and I’m highly pessimistic Google could make its various software layers work together this seamlessly anytime soon, but are you starting to see the path I’m drawing here?
And at this point, I’m talking about fairly mainstream applications. The launch of Google Transcribe alone would be a big step forward for journalists, driving down the costs of transcription for news applications a good amount.
Commenter Patrick at NiemanLab mentioned that the speech recognition industry will do everything in its power to prevent Google from releasing anything like Transcribe anytime soon. I agree, but I think speech transcription might be a smaller industry economically than GPS navigation,* and that didn’t prevent Google from solidly disrupting that universe with Google Navigate.
I’m stepping way out on a limb in all of this, it should be emphasized. I know very little about the technological or market realities of speech recognition. I think I know the news world well enough to know how valuable these things would be, and I think I have a sense of what might be feasible soon. But as Tim said on Twitter, “the Speakularity is a lot like the Singularity in that it’s a kind of ever-retreating target.”
The thing I’m surprised not many people have made hay with is the dystopian part of this vision. The Singularity has its gray goo, and the Speakularity has some pretty sinister implications as well. Does the vision I paint above up the creep factor for anyone?
* To make that guess, I’m extrapolating from the size of the call center recording systems market, which is projected to hit $1.24 billion by 2015. It’s only one segment of the industry, but I suspect it’s a hefty piece (15%? 20%?) of that pie. GPS, on the other hand, is slated to be a $70 billion market by 2013.
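For what it’s worth, here is the footnote’s back-of-the-envelope math made explicit. The 15–20% share is my speculation, as stated above, so these are rough bounds, not real figures:

```python
# Rough bounds implied by the footnote's guess (figures in USD).
call_center_recording = 1.24e9   # projected market by 2015
gps_market = 70e9                # projected market by 2013

# If call-center recording is 15-20% of the whole speech industry,
# the whole pie works out to roughly:
high_estimate = call_center_recording / 0.15   # smaller share -> bigger pie
low_estimate = call_center_recording / 0.20

print(low_estimate / 1e9, high_estimate / 1e9)   # roughly $6-8 billion
print(gps_market / high_estimate)                # GPS still several times bigger
```

Even on the generous end of that guess, GPS navigation dwarfs speech transcription by nearly an order of magnitude, which is the point: Google has disrupted bigger markets than this one.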
Why Jonah Lehrer can’t quit his janky GPS:
The moral is that it doesn’t take much before we start attributing feelings and intentions to a machine. (Sometimes, all it takes is a voice giving us instructions in English.) We are consummate agency detectors, which is why little kids talk to stuffed animals and why I haven’t thrown my GPS unit away. Furthermore, these mistaken perceptions of agency can dramatically change our response to the machine. When we see the device as having a few human attributes, we start treating it like a human, and not like a tool. In the case of my GPS unit, this means that I tolerate failings that I normally wouldn’t. So here’s my advice for designers of mediocre gadgets: Give them voices. Give us an excuse to endow them with agency. Because once we see them as humanesque, and not just as another thing, we’re more likely to develop a fondness for their failings.
I love this bit of de-naturalization from Brian Eno:
On the end of an era: “I think records were just a little bubble through time and those who made a living from them for a while were lucky. There is no reason why anyone should have made so much money from selling records except that everything was right for this period of time. I always knew it would run out sooner or later. It couldn’t last, and now it’s running out. I don’t particularly care that it is and like the way things are going. The record age was just a blip. It was a bit like if you had a source of whale blubber in the 1840s and it could be used as fuel. Before gas came along, if you traded in whale blubber, you were the richest man on Earth. Then gas came along and you’d be stuck with your whale blubber. Sorry mate – history’s moving along. Recorded music equals whale blubber. Eventually, something else will replace it.”
(“De-naturalization” is my favorite new term of art; I’ve heard it from several historians lately. If it’s not obvious, it means taking things that seem natural, inevitable, or just like part of the firmament and revealing them for the wacky, lucky historical accidents that they are. Because everything is.)
It’s been fascinating to me to watch the circulation of this clip of Louie CK on Conan O’Brien, loosely titled “Everything Is Amazing And No One Is Happy.” The appearance is over a year old (if anyone can track down an exact date, please let me know), but I still regularly get links, embeds, and forwards of it. What’s more, each one highlights a different aspect — for some people, the clip is about the importance of thankfulness; for others, it marks generational change; for others, it’s a good riff about the magic of new tech; and for still others, it’s a great attack on spoiled jerks.
Something about this resonates with people. It hit a sweet spot, and it’s on its way to becoming a modern classic.
Consider a few things that are colliding, at this moment, in my brain.
Warm-up number one: The writer Michael C. Milligan is writing a novel in three days. Just as interestingly—maybe even more interestingly—Eli James over at Novelr is live-blogging the process. It starts on Tuesday.
Warm-up number two: If I get to $10,000 over at Kickstarter (I’m $76 away!) I’m going to write an entire short story on my flight to New York on Tuesday.
Warm-up number three: Alain de Botton as Heathrow’s writer-in-residence. You see him stalking the terminal, taking notes.
All together, these set up this sort of writing-as-performance vibe. The text alone is not the thing.
Now, here’s what’s really got me thinking: Google Wave has a playback feature. EtherPad’s got it, too. This takes wiki-style document versioning a step further, or maybe a million steps further. It’s so much more granular! It goes keystroke by keystroke and attaches a time-stamp to each one. It records and recreates not just words and spaces, but confidence and hesitation.
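The mechanism is simple to sketch: store every edit as a timestamped splice, and “playback” is just replaying the splices up to a chosen moment. This is a toy model of the idea, not how Wave or EtherPad actually store revisions:

```python
import time

class PlaybackDocument:
    """Toy model of keystroke-level versioning: every edit is recorded
    as a timestamped splice, so any past state can be reconstructed."""

    def __init__(self):
        self.events = []  # (timestamp, position, inserted_text, deleted_count)

    def edit(self, pos, insert="", delete=0, t=None):
        """Record an edit: delete `delete` chars at `pos`, then insert `insert`."""
        self.events.append((t if t is not None else time.time(), pos, insert, delete))

    def replay(self, until=None):
        """Rebuild the text as it stood at time `until` (None = final state)."""
        text = ""
        for t, pos, insert, delete in self.events:
            if until is not None and t > until:
                break
            text = text[:pos] + insert + text[pos + delete:]
        return text

doc = PlaybackDocument()
doc.edit(0, "Hello wrold", t=0.0)          # the hasty first draft
doc.edit(6, "world", delete=5, t=1.0)      # the revision, a second later
print(doc.replay(until=0.5))               # the document mid-thought
print(doc.replay())                        # the finished text
```

Because each splice carries its own timestamp, the replay preserves not just the edits but their tempo — which is exactly the “confidence and hesitation” part.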
So, skip past the obvious notion of playing back the creation of a standard short story or a novel. That’s fine; it makes me shiver, but it’s fine.
Think instead of a short story written with playback in mind. Written for playback. Typing speed and rhythm are part of the experience. Dramatic deletions are part of the story. The text at 2:20 tells you something about the text at 11:13, and vice versa. What appear at first to be tiny, tentative revisions turn out to be precisely engineered signals. At 5:15 and paragraph five, the author switches a character’s gender, triggering a chain reaction of edits in the preceding grafs, some of which have interesting (and pre-planned?) side effects.
Talk about intertextuality.
I’m sure there are arty precedents (and if you know of any, I’d love to hear about them). But this feels like an interesting moment, simply because these are tools with (potentially) mass audiences. It’s possible that a lot of people are suddenly about to get a bit better at version-scrubbing, at understanding documents in time. And that means—maybe?—an audience for writing as real-time performance.
As I was skimming the list of new MacArthur fellows, one name popped out: Maneesh Agrawala. Strange trails of clicks, via Google and graphics coding forums, have led me several times to Agrawala’s UC Berkeley home page. Sometimes (usually?) the MacArthur picks make you go “huh?”—this one shouldn’t.
His work seems (and this is my description, here) built on two things: One, the realization that pixels are plastic, and that there’s not such a huge gap between 2D and 3D after all. Two, the understanding that we need much better ways to help people understand what they see on computer screens. (And he’s found a way to meaningfully explore both data visualization and user interfaces in the context of number two. Nice.)
I’d love to see more people who make software, in any context, on the MacArthur list in the future. To the degree that software accounts for more and more of our experience of the world, and to the degree hardware is getting to be more like software, this is exactly the spot where giving a super-brain the freedom to just jam for five years could, well, change everything.
I’m pretty sure this is the first Agrawala project I ever came across: I think it’s still the most impressive.
And finally, check it out: He’s even worked on ebooks!
So I was browsing Download.com (as I, you know, sometimes do) and noticed an interesting app. It was #36 or something on the most-downloaded list at the time—right up there next to WinZip and “Download Accelerator Plus.” It was a little program called Athan.
The athan is the call to prayer that you hear in Muslim countries, five times a day. Usually broadcast on tinny loudspeakers, it’s become a cliche of international reporting, an easy atmospheric effect. “Then, the sound of the muezzin calling the faithful to pray—distant, spectral—echoed through the streets.” Something like that.
It sounds like this. I tried to find a video that was more representative of actually hearing an athan in a Muslim city; it’s never so well-recorded, never so in-your-face. It’s more like the emergency sirens that cities rev up here in the U.S. on the first Tuesday of every month (or whatever)—you can hear it everywhere, but it always seems to be coming from somewhere else.
(Here’s something I don’t know: Do mosques in the U.S. or Europe play the athan over loudspeakers? Are they allowed? Probably not, right?)
Now, to be clear, I am a serious atheist. I am not dabbling in Islam. But even so, this app really called out to me (ha!) for two reasons. One, nostalgia. I do remember the athan—distant, spectral—from my time in Dhaka. Two, structure. I’m building my days entirely for myself now, and finding that it’s a challenge to split them into pieces. When does this thing end, and that one begin? It’s arbitrary. So—admittedly this is silly—I thought hey, this works for folks! Let’s give it a spin!
I am 100% glad I downloaded it, if only to see the interface.
Wow. Do you want the athan from Mecca or Medina? How about one from Egypt? They’ve all been sampled. Do you want the dua after the athan? What juristic method will you be using for the asr prayer? (The default is the one preferred by Imams Shafii, Hanbali, and Maliki.)
It might sound like I’m poking fun, but I am absolutely 100% not. One of my favorite intersections—and one of the most underreported—is the one between technology and religion. And an app like this lets you not just read about it, but sort of explore it.
And, come on: 42,305 downloads on Download.com last week! This is significant. This is a piece of culture, a piece of people’s lives.
Weirdly, it is now a part of my life, too. The volume is set really low, so the fajr athan at 5:43 a.m. doesn’t wake me up. I can’t even hear it in the next room. But the athans do play, and they do offer a gentle reminder to pull myself out of my laptop and look around.
And sometimes—this is the fun part—I’ll be listening to my writing soundtrack Pandora station, and the athan will start up, and it will suddenly be the coolest technology/religion remix you’ve ever heard.
(I’m totally on some watchlist now, aren’t I?)