Via Tim’s retweet, I saw that Dan Cohen at the Center for History and New Media at George Mason University released a really interesting dataset today: a million syllabi culled from the web, from 2002–2009.
I think this might make a fun Snarkmarket mini-collaboration. My tender programming chops are enough for me to cook up a simple script to parse the data. I’m happy to share the code (and/or collaborate a bit) on GitHub, too, though I’m no pro with version control.
So the real question is: what sort of questions should we ask?
I’m open to anything, but my bias goes towards something slightly wacky, rather than, you know, something of scholarly significance. Let’s reverse-engineer an inquiry by starting with a Slate headline!
I mean, think about it—a syllabus is
- a course of study,
- a set of instructions,
- a statement of values,
- a collection of related documents,
- an indirect payment to a bunch of authors,
and more, all in one.
What might we learn from a million of them all together?
Drop an idea, suggestion, meditation or musing in the comments!
Update: Bit of a hiccup with the database, per Dan’s second update here. As soon as the full version with the cached HTML pages is live, we’ll start playing with it. I’m leaning toward something simple and ngrammy to start, per Tim’s comment.
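For the "simple and ngrammy" starting point, here's a rough sketch of counting word n-grams across a pile of syllabi. It assumes each syllabus arrives as a string of cached HTML. The actual layout of Dan's dataset isn't described here, so the function names and the input format are hypothetical. The tag stripping is deliberately crude: it keeps all text content, including anything inside `<script>` or `<style>` tags.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags (crude: script/style text is kept too)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: turn one cached HTML syllabus into plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any trailing text still in the buffer
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Yield consecutive n-word tuples, e.g. bigrams when n == 2."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(texts, n=2):
    """Count n-grams across an iterable of plain-text documents."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts


# Usage with two made-up snippets standing in for real syllabi:
docs = ["<p>Required reading: <b>Moby Dick</b></p>", "Required reading for week one"]
counts = count_ngrams(html_to_text(d) for d in docs)
print(counts.most_common(3))
```

Counting with a plain `Counter` keeps everything in memory. That's fine for a first pass over a sample, but a full million documents might need something sturdier.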