It’s harder than you might think to use Google Ngrams to actually chart trends in cultural history — or do “culturomics,” as the Science article authors would have it — because of well-known problems with the data set.
Here, Matthew Battles tries (on more or less a lark) to see some history play out, Bethany Nowviskie spots a trend (maybe true, maybe false), and Sarah Werner flags the problem.
Aw, man — that fhit Seriously Pucks.
You know what would actually be pretty cool, though? If it were easier to go one level deeper and use Ngrams to do Google Instant Regression. You could graph trends against well-known noise (other s-words misread as f) AND other trends — or instantly find similar graphs.
Let’s say the curve of the graph for the f-word in the 1860s is similar to that for other words and phrases, like “ass”* or “confederacy”*. You could then correlate language with other language, individual words with stock phrases, and even (using language as an index/proxy) extralinguistic cultural trends or historical events.
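To make the "find similar graphs" idea concrete, here is a minimal sketch in Python. It assumes you have already pulled per-year relative frequencies for each term into plain lists, which is not something the Ngram viewer hands you in this form. The function names (`pearson`, `most_similar`) and the numbers are made up for illustration; they are not real Ngram data, and Pearson correlation is just one plausible way to score how alike two curves are.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def most_similar(target, candidates):
    """Rank candidate series by correlation with the target, highest first."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy numbers invented for illustration only; NOT real Ngram frequencies.
target = [1.0, 2.0, 4.0, 3.0, 1.5]
candidates = {
    "rises_and_falls": [2.0, 4.0, 8.0, 6.0, 3.0],  # same shape, twice the scale
    "steady_decline": [5.0, 4.0, 3.0, 2.0, 1.0],
}
ranking = most_similar(target, candidates)
```

Here `ranking` lists the same-shaped curve first and the declining one last. Correlation ignores scale, so a rare word and a common word can still match on shape. It also says nothing about why two curves move together, which is exactly where the known noise in the scans comes back in.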
Single-variable analysis just doesn’t tell you very much, even on a data set as problematic as print/language. You need systematic data, and better comparison and control capacity between variables, before you can start to do real science.
(* For the purposes of this example, set aside the problem of ascribing contemporary historical meanings to these two ambiguous terms.)