data

A Snarkmarket mini-collaboration: Snarksyllabi

Via Tim’s retweet, I saw that Dan Cohen at the Center for History and New Media at George Mason University released a really interesting dataset today: a million syllabi culled from the web, from 2002–2009.

I think this might make a fun Snarkmarket mini-collaboration. My tender programming chops are such that I can cook up a simple script to parse the data. I’m happy to share the code (and/or collaborate a bit) on GitHub, too, though I’m no pro with version control.

So the real question is: what sort of questions should we ask?

I’m open to anything, but my bias goes towards something slightly wacky, rather than, you know, something of scholarly significance. Let’s reverse-engineer an inquiry by starting with a Slate headline!

I mean, think about it—a syllabus is

  • a course of study,
  • a set of instructions,
  • a statement of values,
  • a collection of related documents,
  • an indirect payment to a bunch of authors,

and more, all in one.

What might we learn from a million of them all together?

Drop an idea, suggestion, meditation or musing in the comments!

Update: Bit of a hiccup with the database, per Dan’s second update here. As soon as the full version with the cached HTML pages is live, we’ll start playing with it. I’m leaning toward something simple and ngrammy to start, per Tim’s comment.
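For a sense of what "simple and ngrammy" could look like, here is a minimal sketch in Python. It assumes each syllabus has already been reduced to a plain-text string; the function names (`ngrams`, `count_ngrams`) and the two sample strings are made up for illustration and have nothing to do with the actual dataset's format.

```python
import re
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams in `text` as a list of tuples."""
    # Lowercase and keep only runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(documents, n=2):
    """Tally n-grams across an iterable of document strings."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc, n))
    return counts


# Hypothetical stand-ins for syllabus text, not real data.
sample_syllabi = [
    "Required reading: Moby Dick. Office hours by appointment.",
    "Office hours are Tuesdays. Required reading is listed below.",
]

bigram_counts = count_ngrams(sample_syllabi, n=2)
for gram, count in bigram_counts.most_common(5):
    print(" ".join(gram), count)
```

Swapping `n=2` for `n=3` gives trigrams. Note that n-grams here run straight across sentence boundaries, which is probably fine for a first pass but worth knowing about.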

 

Now this is a genius grant that I can support

As I was skimming the list of new MacArthur fellows, one name popped out: Maneesh Agrawala. Strange trails of clicks, via Google and graphics coding forums, have led me several times to Agrawala’s UC Berkeley home page. Sometimes (usually?) the MacArthur picks make you go “huh?”—this one shouldn’t.

His work seems (and this is my description, here) built on two things: One, the realization that pixels are plastic, and that there’s not such a huge gap between 2D and 3D after all. Two, the understanding that we need much better ways to help people understand what they see on computer screens. (And he’s found a way to meaningfully explore both data visualization and user interfaces in the context of number two. Nice.)

I’d love to see more people who make software, in any context, on the MacArthur list in the future. To the degree that software accounts for more and more of our experience of the world, and to the degree hardware is getting to be more like software, this is exactly the spot where giving a super-brain the freedom to just jam for five years could, well, change everything.

I’m pretty sure this is the first Agrawala project I ever came across, and I think it’s still the most impressive.

And finally, check it out: He’s even worked on ebooks!

 

This is the most useful graph ever plotted

Wow. Just wow. OkCupid crunches the numbers on first messages, correlating word usage with reply rates. Here’s just one of many—many—graphs they present:

[Image: 20090915_okcupid]

My friends, we now have the data we need to construct, Serpentor-style, the perfect OkCupid profile. He will be a tattooed physics grad student who’s in a veg-metal band and is looking to date a smart, fun, attractive zombie. Right?

(I’m smiling at my screen in self-satisfaction right now, but the truth is, the OkCupid commenters have beaten us to it. Once more: wow.)