Words, meet numbers

Have you seen this thing? So cool.

Google has now digitized 15 million books and this dingus allows you to graph word and phrase frequency over time. Up to five comparisons at once.

So, for example, you can see the handover point when “World War I” replaced “Great War” in literature. Or the exact point “Tiananmen Square” drops out of Chinese books.

It isn’t by a pure word count. It couldn’t be. There were only half a million books published in English before 1900 and a squintillion since then, so any pure word count would make a screaming spike roar up the Twentieth Century. They explain a bit more about how they normalize the data here.
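
For the curious, here is a rough sketch of why that matters. It assumes the normalization is something like "hits for the phrase divided by all the words printed that year", which may not be exactly what Google does. Every number in it is made up, and the names (raw_counts, total_words, relative_frequency) are just for illustration.

```python
# Toy data: all numbers are invented for illustration, not taken from Google.
raw_counts = {1850: 50, 1950: 400, 2000: 900}  # hypothetical hits for one phrase
total_words = {                                 # hypothetical words printed per year
    1850: 1_000_000,
    1950: 20_000_000,
    2000: 90_000_000,
}


def relative_frequency(counts, totals):
    """Return each year's share of all printed words, not the raw hit count."""
    return {
        year: counts[year] / totals[year]
        for year in counts
        if totals.get(year)  # skip years with no (or zero) total
    }


freq = relative_frequency(raw_counts, total_words)
for year in sorted(freq):
    # Print raw count next to normalized share so the two can be compared.
    print(year, raw_counts[year], f"{freq[year]:.6%}")
```

With these invented figures the raw count climbs every year while the phrase's share of all printed words actually falls, which is the spike problem in miniature.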

This would have been a great tool to have during the Michael Bellesiles controversy. (Though I still think the simplest way to debunk him would’ve been counting how many cookbooks had recipes for game. Game means guns.)

Hold on, hold on, hold on. Y’all are going to go look up wirty dords, aren’t you?

Yeah, I know you people like the back of my hand.

Comments


Comment from S. Weasel
Time: December 16, 2010, 11:02 pm

Here’s a New Scientist article about the same thing.

Oh, and Jimmy Wales has the Crazy Eyes:


Comment from Mitchell
Time: December 16, 2010, 11:04 pm

So after 1960 we all became a bunch of potty mouths. Figures.


Comment from Can’t hark my cry
Time: December 17, 2010, 1:00 am

Oh, man. I’ve been a bit uneasy about the Google Books project. . .but I do like this result of it. I foresee a lot of wasted time in my life!


Comment from S. Weasel
Time: December 17, 2010, 1:03 am

Yes indeedy. Google is a rotten intrusive huge evil organization…that builds some of the coolest tools EVER.


Comment from Uncle Badger
Time: December 17, 2010, 1:10 am

Add my name to the list of the Google-worried. That chap Schmidt is dangerously, barking mad.


Comment from Can’t hark my cry
Time: December 17, 2010, 1:16 am

Gee, you guys are way ahead of me–I just don’t like the idea of digitizing copyright material and making it available for free (or for your profit but not the copyright holder’s). And I know the issue was way more complex than that, but I did my first (and, if I have anything to say about it, last ever) Federal jury trial this fall, and a whole lot of really important stuff just whipped past my eyes without ever hitting my brain. . .


Comment from Can’t hark my cry
Time: December 17, 2010, 2:54 am

Totally OT, and it may be too gentle/not critical enough for y’all, but this G&S-loving fashioned-in-the-clay liberal sniggered herself sick over this: http://www.247comedy.com/obama-musical

[Mind you. The all-time-best “Modern Major General” takeoff can be experienced, delightfully, here: http://www.privatehand.com/flash/elements.html]


Comment from Uncle Badger
Time: December 17, 2010, 4:03 am

Sorry, Can’t Hark, IANAL but my take on it is every bit as fundamental as your ‘digitizing copyright material and making it available for free (or for your profit but not the copyright holder’s)’. And no amount of BS from bent lawyers arguing about the meaning of ‘is’ will change that.

It used to be fashionable to castigate Microsoft for unsavoury behaviour. As far as I can see, Google makes Microsoft look like rank amateurs in ‘teh evil’ stakes.

All Microsoft wanted was your money…


Comment from QuasiModo
Time: December 17, 2010, 5:34 am

The curve on the word ‘pussy’ is interesting…it was an innocent word to denote a cat in 1920, became a dirty word towards 1960, then everybody decided they didn’t care by 1982 🙂


Comment from SDN
Time: December 17, 2010, 10:55 am

Point of order, milady: game doesn’t HAVE to mean guns; let’s ask Robin o’ the Hood how many of the King’s deer ended up on his table via gunpowder. Really, now. I have to remind someone in England of that???

However, the broader point is absolutely true.


Comment from surly ermine
Time: December 17, 2010, 1:27 pm

Trying to find a word that went from complete obscurity to rousing popularity in the shortest period of time. So far “ratshit” wins.
What the hell was an “ipod” circa 1920?

did i mention the boss isn’t here today…


Comment from Can’t hark my cry
Time: December 17, 2010, 1:28 pm

Uncle B, it has long seemed to me that every large organization takes on some cast of evil. I truly think it is a function of size. One of the maxims to live by that I learned at the knee of my father, the engineer, was “it all has to do with the surface-to-volume ratio.” I didn’t understand what it meant until I took high school biology; but thereafter, it became a powerful metaphor in thinking about organizations.

Why we distrust government–the biggest organization of all, with NO ONE who can effectively step in and control it when it gets the bit between its teeth. . .

There is more than a bit of truth in Jasper Fforde’s fictional creation The Goliath Corporation.


Comment from surly ermine
Time: December 17, 2010, 1:38 pm

No wait, “batshit crazy” has a helluva spike. Guess that’s a phrase though.

“Muggle” has been around a lot longer than i would have thought.
http://en.wikipedia.org/wiki/Muggle_%28disambiguation%29


Comment from S. Weasel
Time: December 17, 2010, 1:54 pm

Note that if you click the links at the bottom of the graph, it shows you the books your words were found in. And in some cases, there are excerpts (or maybe whole books?).


Comment from Mrs. Peel
Time: December 17, 2010, 3:02 pm

My objection is to the ridiculously long copyright periods. Basically, Disney is super-powerful, so the copyright period continues to extend so that it perpetually remains long enough that Mickey is still under copyright. That means that, for example, 2 of the 8 Anne of Green Gables books are still under copyright, because they’re younger than Steamboat Willie. Which I think is dumb. Is anyone actually making money off Anne at this point?

I think they should have a shorter set period, maybe 20 years after the death of the author, with a form to apply for exceptions/extensions if you can prove that losing the copyright would harm the business (as in the case of Mickey). But if the business ceases to be family-owned, no more copyright.


Comment from S. Weasel
Time: December 17, 2010, 4:28 pm

I totally agree, Mrs P. And there are all sorts of unsavory intellectual property lawyers out there making outrageous claims to improbable ideas. Very frustrating to people trying to work with Zazzle, who will pull stuff out of the marketplace at the slightest complaint.

Don’t blame them. They’re there to sell t-shirts, not fight for principles in court. But it’s a bore.

And good to see you again, btw.


Comment from David Gillies
Time: December 17, 2010, 6:40 pm

It’s a week till Xmas, Weasel peeps, and I pose this conundrum: what the fuck is a Jingle Horse, and how can it simultaneously a) giddy up and b) pick up its feet? Surely the necessity of gathering one’s feet, presumably in a basket of some description, contraindicates the ability to giddy up.


Comment from S. Weasel
Time: December 17, 2010, 7:31 pm

Jingle horse, jingle horse.

No idea, but if it’s out there tonight, its dangly bits will freeze off.


Comment from David Gillies
Time: December 17, 2010, 7:53 pm

It’s been brass monkeys here too, at least by tropical standards. I kid you not, the fact that the ambient temperature dropped to 12.8 °C in the metropolitan area was sufficiently noteworthy to make front page in the newspaper. Yes, you read that right: plus thirteen Celsius is headline news. But when you’re used to 28 °C year round, weather that lets you see your breath is a right shocker, I can tell you.


Comment from Nina from GCP
Time: December 18, 2010, 12:17 am

And I have to admit that it is inconceivable to me that anyone could digitize that many books.


Comment from Mrs. Peel
Time: December 18, 2010, 1:09 am

Sorry I haven’t been by more often, sweas. Got married and all, which has kept me busy.

Btw, did you know that was me tweeting you about your vocabulary? I meant to email and clue you in that I use my meatspace name on twitter, but forgot.


Comment from JeffS
Time: December 19, 2010, 9:28 pm

Note that if you click the links at the bottom of the graph, it shows you the books your words were found in. And in some cases, there are excerpts (or maybe whole books?).

Hmmmmmm….Google needs to tweak their OCR software just a tad……

Back in the olden days, a lower case “s” closely resembled what is now used as a lower case “f”. Which results in this howler (among many) from “The blessing of Judah by Jacob”:

… He made him to fuck honey ** out of the rock. — with fat of lambs — and thou *’ didst drink the pure blood of the grape.” Jacob says, ” His eyes should be red with wine, …

Heh heh heh heh!


Comment from S. Weasel
Time: December 20, 2010, 12:26 pm

Congratulations, Mrs Peel! And thereby I prove how often I visit other people’s blogs these days.

I used to walk my blogroll every day, back when I had a job. Now that I’m unemployed, I don’t have the time.

Weird but true.


Comment from tawny
Time: December 21, 2010, 11:39 pm

Speaking of phrases that come from nowhere, serial killer is another which shot to prominence in the 1980s and has risen since. Also fun to compare ‘cat’ and ‘dog’ (go dogs!)


Comment from S. Weasel
Time: December 22, 2010, 12:40 am

Robert Ressler is usually credited with coining the term “serial killer” in the ’70s — though there is some dispute.


Comment from Mark Matis
Time: December 22, 2010, 1:10 am

How ’bout “cereal killer”?

Or maybe “surreal killer”?
}:-]
