Micro vs. Macro in a Duel to the Death

Get ready: I am about to compare Wikipedia to Wal-Mart.

Chris Anderson says the magic of Wikipedia (and other internet systems, e.g. Google) is that they work on hugely macro “probabilistic” scales. Think of it like this:

To put it another way, the quality range in Britannica goes from, say, 5 to 9, with an average of 7. Wikipedia goes from 0 to 10, with an average of, say, 5. But given that Wikipedia has ten times as many entries as Britannica, your chances of finding a reasonable entry on the topic you’re looking for are actually higher on Wikipedia. That doesn’t mean that any given entry will be better, only that the overall value of Wikipedia is higher than Britannica when you consider it from this statistical perspective.
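A rough sketch of that arithmetic, with assumptions that are not Anderson's: quality is spread evenly across each range (which happens to reproduce his averages of 7 and 5), a "reasonable" entry scores 6 or better, and Britannica covers one in ten of the topics Wikipedia does. The function name and the threshold are made up for illustration.

```python
def chance_of_reasonable_entry(low, high, coverage, threshold):
    """Chance that a topic has an entry scoring at least `threshold`.

    Assumes quality is uniformly distributed between `low` and `high`
    (an illustrative assumption, not a measured distribution) and that
    `coverage` is the fraction of topics that have any entry at all.
    """
    if threshold <= low:
        p_good = 1.0
    elif threshold >= high:
        p_good = 0.0
    else:
        p_good = (high - threshold) / (high - low)
    return coverage * p_good


# Hypothetical inputs: Britannica covers a tenth of the topics, quality 5 to 9.
britannica = chance_of_reasonable_entry(5, 9, coverage=0.1, threshold=6)
# Wikipedia covers every topic in this toy universe, quality 0 to 10.
wikipedia = chance_of_reasonable_entry(0, 10, coverage=1.0, threshold=6)

print(f"Britannica: {britannica:.3f}  Wikipedia: {wikipedia:.3f}")
```

Under these made-up inputs Wikipedia comes out well ahead, but the result depends entirely on the coverage figure and the shape of the distribution, neither of which the rough numbers pin down.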

Nick Carr replies:

OK, but what are the broader consequences? Might not this statistical optimization of “value” at the macroscale be a recipe for mediocrity at the microscale — the scale, it’s worth remembering, that defines our own individual lives and the culture that surrounds us?

So here goes: This seems analogous to the debate over Wal-Mart.


Wal-Mart saved American shoppers about $250 billion in 2004.¹ That’s huge. (More than enough to cover the cost of the war in Iraq, even! Thanks Wal-Mart!) And yet, Wal-Mart inflicts all sorts of costs on a more human scale.

I hesitate to compare Wikipedia to Wal-Mart in any way, of course, but I really do think the same line is being deployed in defense of both: “The good of the many outweighs the good of the few… or the one.” (cf. Star Trek IV)

And this argument against that logic (a comment on Anderson’s post) is more than old-school hand-wringing, I think:

I agree that Wikipedia as a whole has more total value than Britannica as a whole. It probably does produce more social utility than Britannica, just as the Web + Google produces more utility than a good library + a card catalog.

But no one needs the whole of Wikipedia. They need the article they need, and they need it to be (mostly) right.

It’s actually a smart re-focusing. Pure macro-statistics (like Anderson’s rough numbers up top) cannot really guide us in much, because there’s so much we can’t count.

So instead we need to look at real lives and see how big systems, Wikipedia and Wal-Mart alike, operate at that level. For the record, based on my experience, I think Wikipedia is a joy (though not without room for improvement) and Wal-Mart is depressing (though not without some really smart innovations).

Frankly, I think if someone disagrees with either assessment, the road to a resolution is conversation — not arithmetic. It’s just never enough to say “hey, it works in aggregate” and call it a day.

1. Source: Skeezy press release, but fairly legit-seeming research (PDF). Note that it’s not all direct savings — much of the total comes from the downward pressure on prices that Wal-Mart’s cheaptasticness exerts on other stores.

December 27, 2005 / Uncategorized

9 comments

First of all, that’s Star Trek II. Or I think it also comes up in Star Trek III, but you don’t want to watch that (except for the comedy value of Christopher Lloyd as a Klingon).

Second, those statistics are way too vague. The range or even mean of article quality is not as important as the actual distribution, and I wouldn’t assume it is Gaussian in either case (although more so for Britannica). To my mind, the best general statistic governing Wikipedia is the variance in quality as determined by the popularity of the topic. Topics that have a lot of current-events or pop culture appeal will get a lot of visitors, a lot of edits, and tend towards mediocrity with relatively low variance. A topic that is arcane or difficult will receive fewer edits and will vary more wildly in quality, from extremely good to completely wrong.

The worst articles are ones that many people think they know something about, but in fact few people do. Try anything having to do with color science for an example (“what, my preschool art teacher’s concept of complementary colors isn’t authoritative?!?”). The best articles are often those that only experts will dare edit, but for which there is a large pool of experts. See any mainstream topic in undergraduate level math for examples.

When consulting WP it’s always good to have an idea in the back of your mind of “what is the likelihood that the people who edit this article 1) have Ph.D.s versus 2) are fourteen years old”.

Except, of course, for my favorite articles–Harry Potter, the Final Fantasy video games, and supermodels–in which case, having been edited by fourteen-year-olds may be just fine.

You’re totally right, it was II and III. Star Trek IV was the one with the whales.

I love your General Rule of Wikipedia Usage, Peter. I am totally going to remember that. But also with Craig’s Amendment of course.

Star Trek, eh? I always thought that was John Stuart Mill (http://en.wikipedia.org/wiki/Utilitarianism).

I’d also note that there’s a lot of independent research available that makes the question of Wal-Mart’s *net* contribution to society a lot less dramatic than your citation, Robin, though I acknowledge that doesn’t really affect your analogy, which seems useful.

Have you checked in with the Star Tribune’s “Big Question” experiment, more-or-less blogging a story and using reader questions/reactions as touchstones for the next iteration? I think of it as a kind of moderated wikireporting … I think it holds great promise.

Then again, if you frame this as a discussion of process vs. results instead of micro vs. macro, then Wikipedia and Wal-Mart are diametrically opposed. Wal-Mart delivers terrific results to consumers (decent products at very low prices) by using often reprehensible practices (rock-bottom, unprotected wages for employees, pressuring out local merchants and manufacturers, etc.). Wikipedia, on the other hand, has a new and promising process but with often disappointing results.

The next big difference is that Wikipedia is free, and there’s lots of healthy competition for the kind of information you can get from Wikipedia, whether elsewhere online or at the library. If you live in a small town and can’t buy things online, you can’t say either of those things about Wal-Mart.

The debates around Wal-Mart and Wikipedia, though, both seem to hinge on a sense of a necessary connection between process and results. Wal-Mart defends its processes in terms of its results, claiming that the only way it can deliver goods at the prices it offers is by using its business practices, and that any attempt to alter, regulate, or protest these practices threatens its ability to deliver goods cheaply. Likewise, Wikipedia defends its sometimes shabby results by appealing to its process, that some error is inevitable given how its entries are put together. Critics of Wikipedia take this sense of necessity a step further and argue that some kind of formal editorial or fact-checking process is necessary to preclude the particularly and grievously erroneous and/or inevitably mediocre results.

In both cases, what remains to be shown is that the process can be uncoupled from the results — that Wal-Mart could, if it wished to, deliver its products at prices lower than anyone else and still act responsibly, or that Wikipedia’s process could yield results as rich and reliable as anything offered by an edited encyclopedia. And, surprise! this is what most people usually wind up talking about.

Call me out on it if I’m missing the point, but isn’t this kind of comparing apples and orangutans? We’re talking about the amount of social harm inflicted by Wal-Mart in the context of the value of the articles on Wikipedia. But shouldn’t we instead match up the social harm caused by each, or the value of the goods offered by each? Because I think there are two completely different lines of defense employed for the two parameters (even though I’d call the quality argument a subset of the social harm argument).

In the social harm column, this is where people start making those big macro arguments about how many billions of dollars retail wages drop by vs. how many billions of dollars Americans save. And if we looked at Wal-Mart’s social harm on a micro scale, those macro defenses should still hold up. So for every Wal-Mart worker who has to go on Medicaid, maybe there are five families who can afford clothing for their children. Which is, yes, “the good of the many outweighs the good of the few,” but scaled down to a more human level. We must remember that Wal-Mart not only inflicts costs on the human scale, it also produces benefits.

I don’t know that I’ve ever heard Wikipedia attacked from this broad social harm perspective. First off, we’d need some way of quantifying the social harms and benefits produced by Wikipedia. Accuracy of term papers that cite it? Number of people libelled on it?

More often, I’ve heard Wikipedia attacked and defended for the value of its goods. And there, I don’t think the argument boils down to “the good of the many outweighs the good of the few,” although we’re still in the throes of utilitarianism. But it’s useful to make the Wal-Mart comparison here, because we never see Wal-Mart attacked or defended for the value of its goods.

People seem to widely comprehend that Wal-Mart Huggies ≠ diapers from Neiman Marcus. But for most, the Wal-Mart diapers are the better value. In the same way, for most people, Wikipedia provides a better value than Britannica or Google, on both a macro and a micro scale. On the macro scale, we’ve got all the stuff Chris Anderson talked about. Then on the micro scale, you’ve got the random guy who probably got to Wikipedia by typing something into Google.

Do you remember when you had no idea what something was and you’d just Google it, hoping to find out? And how there was all this hand-wringing about how you can’t trust everything you find on Google? (I hope you remember this, ’cause it was like last month.) And do you remember how that made you seriously want to resurrect the phrase “No duh”? This kerfuffle is the new That. Only I’d say Wikipedia is generally better than Google for this.

Well, you’re right about the benefits of WP for sure. As for costs, I was moved to write by Nick Carr’s contention that such costs do exist, however subtle — you can read lots more in his post. I don’t think he’s right but I do appreciate his urging to move away from this ‘in aggregate’ business to a more realistic appraisal of the experience. Which turns out to be good, but still.

I hadn’t seen that CAP report on Wal-Mart; it reads like they came up with the tagline first (“Wal-Mart: A Progressive Success Story!”) and then filled in the analysis.

And it still deals only in dollars. Any economist will tell you there are all sorts of very real costs and benefits that never get metered out in prices or wages — and it is precisely there that I think Wal-Mart comes out in the red.

Howard, I will check out that Big Question feature! Sounds cool!

Dan says…

I would like to open up the possibility that there are social harms associated with Wikipedia. This is not an argument that I am committed to, but one that seems plausible to me and which is worthy of consideration following Tim’s assertion that Wikipedia’s process is benign (or even good). Tim writes that one of Wal-Mart’s reprehensible means is the pressuring and driving out of business of small merchants and manufacturers. My supposition is that Wikipedia could be viewed in a similar light, although instead of merchants and manufacturers of goods, it would seem to be those who create and sell kinds of information who might be in trouble. What if Wikipedia became a true competitor for Britannica and many of the other specialist encyclopedias? Would it then be guilty of a similar sin?

And what about Wikipedia’s wages? They don’t pay their slaving scholars anything for all of their time and effort! Of course, that is an absurd argument. But it points to the flaw in calling Wikipedia free. Wikipedia is free only if we ignore the huge amount of capital invested by private investors (and perhaps even more so the U.S. government) in sustaining the very physical operation of the internet and all of the money that goes to support those Ph.D.s who we hope are writing the non-supermodel-related entries.

In the end, we say that Wikipedia has a good process because we can claim that its results are owned by everyone — its benefits truly belong to all. Wal-Mart, on the other hand, is said to be reprehensible because it is a corporation with a limited set of investors who profit to an unfair extent due to its exploitation of workers both in the US and abroad. So, in the end, I think the process is less important to these debates than the distribution of benefits and distribution of costs — if we stick with a utilitarian model for justice. So the question must really be posed for Wikipedia: who really benefits, and who pays?

Dan, that’s a good tie-in with a posting on the Digital Humanities Blog, mentioned yesterday at if:book. The author of the post asks a version of your question — “Why would Google and Yahoo be so interested in supporting Wikipedia?”

In the longer term, I think that Google and Yahoo have additional reasons for supporting Wikipedia that have more to do with the methodologies behind complex search and data-mining algorithms, algorithms that need full, free access to fairly reliable (though not necessarily perfect) encyclopedia entries.

Let me provide a brief example that I hope will show the value of having such a free resource when you are trying to scan, sort, and mine enormous corpora of text. Let’s say you have a billion unstructured, untagged, unsorted documents related to the American presidency in the last twenty years. How would you differentiate between documents that were about George H. W. Bush (Sr.) and George W. Bush (Jr.)? This is a tough information retrieval problem because both presidents are often referred to as just “George Bush” or “Bush.” Using data-mining algorithms such as Yahoo’s remarkable Term Extraction service, you could pull out of the Wikipedia entries for the two Bushes the most common words and phrases that were likely to show up in documents about each (e.g., “Berlin Wall” and “Barbara” vs. “September 11” and “Laura”). You would still run into some disambiguation problems (“Saddam Hussein,” “Iraq,” “Dick Cheney” would show up a lot for both), but this method is actually quite a powerful start to document categorization.
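The idea above can be sketched in a few lines. This is a toy illustration only: it does not call Yahoo's Term Extraction service, it works on single words rather than phrases, and the two "entries" are hypothetical stand-ins built from the example terms above, not real Wikipedia text. All function and variable names are invented for the sketch.

```python
import re
from collections import Counter


def terms(text):
    """Lowercased word tokens; a crude stand-in for real term extraction."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distinctive_terms(entry, other_entry):
    """Terms that appear in one reference entry but not in the other."""
    return set(terms(entry)) - set(terms(other_entry))


def classify(document, profiles):
    """Return the label whose distinctive terms occur most often in the
    document, or None when the top two labels tie (still ambiguous)."""
    counts = Counter(terms(document))
    scores = {
        label: sum(counts[t] for t in vocab)
        for label, vocab in profiles.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical stand-ins for the two encyclopedia entries (assumption).
SENIOR_ENTRY = "George Bush Barbara Berlin Wall Iraq Saddam Hussein Dick Cheney"
JUNIOR_ENTRY = "George Bush Laura September 11 Iraq Saddam Hussein Dick Cheney"

profiles = {
    "senior": distinctive_terms(SENIOR_ENTRY, JUNIOR_ENTRY),
    "junior": distinctive_terms(JUNIOR_ENTRY, SENIOR_ENTRY),
}

print(classify("Bush spoke about the fall of the Berlin Wall", profiles))
print(classify("Bush discussed Iraq with Dick Cheney", profiles))
```

Terms shared by both entries ("Iraq," "Dick Cheney") drop out of both profiles, so a document that mentions only those stays unclassified, which is the disambiguation problem noted above.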
