{"id":126,"date":"2008-06-08T21:24:12","date_gmt":"2008-06-09T02:24:12","guid":{"rendered":""},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-30T05:00:00","slug":"","status":"publish","type":"post","link":"https:\/\/lithoguru.com\/life\/?p=126","title":{"rendered":"Lithography Word Recount"},"content":{"rendered":"<p>I was befuddled (rank: 53,829) by my recent experience with <a href=\"http:\/\/wordcount.org\">wordcount.org<\/a> (see my previous post).  It seems that the word \u2018lithography\u2019 is ranked appallingly low in frequency of use, relegating me and my life\u2019s work to the denizens of the perennially unpopular.  But something smelled funny.  I began to think that WordCount was not very good at counting.  Since I have spent a lot of time thinking about how to measure things over the years, I decided to do what I always do when I see a data point I don\u2019t like:  blame the measurement.<\/p>\n<p>I began by looking into the website\u2019s counting method.  From the wordcount.org site:<\/p>\n<p>\u201cWordCount\u2122 is an artistic experiment in the way we use language. It presents the 86,800 most frequently used English words, ranked in order of commonness\u2026 WordCount data currently comes from the British National Corpus, a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent an accurate cross-section of current English usage.\u201d<\/p>\n<p>So WordCount is an art project.  I suppose that doesn\u2019t mean it couldn\u2019t be accurate, though I suspect that accuracy is low on the list of success criteria for most artists.  But what is the British National Corpus?  
I found the official BNC website, and this is what they said:<\/p>\n<p>\u201cThe British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written.\u201d<\/p>\n<p>British English!  That explains a lot.  I thought the word count would relate to real English.  But since lithography was a European invention, and was certainly practiced in England, I\u2019m not sure that this could explain lithography\u2019s unexpected lack of popularity.  True, England doesn\u2019t have a semiconductor industry to speak of, so talk of semiconductor lithography over dinner is probably unlikely.  But still, the frequency of use seemed too low, especially compared to \u2018sciorto\u2019.<\/p>\n<p>I did a little more digging.  Of the 100 million words in the collection, the word \u2018lithography\u2019 is used 47 times.  That\u2019s a pretty small count, even if the sample appears to be large.  100 million words is obviously not enough if you want good statistics at the tail of the distribution.  The other words near \u2018lithography\u2019 on the list \u2013 luqa, calculi, tiverton, kaysone, sciorto, and bullingdon \u2013 were all tied with lithography.  Digging further in the BNC website, I could even find the sources for those 47 word uses.  This is where the fun begins.<\/p>\n<p>Yes, Sciorto is an Italian family name, but Count Romano de Sciorto is a character from a romance novel called <i>Calypso\u2019s Island<\/i>, the source of all 47 occurrences in the BNC.  Talk about skewing the sample.  Here is one example:  \u201cHow ludicrous, after all, to have imagined that the great Count Romano de Sciorto, of Casa Sciorto, of the Citt\u00e0 Notabile, the Noble City, could fall seriously in love with her.\u201d  Riveting.  
Tiverton, while certainly a city in England, is also a character from another romance novel, <i>Hidden Flame<\/i>, from which 19 of its 47 word-use references came.  It seems that romance novels make up a fair part of the 100 million word collection.  Almost every use of Bullingdon occurred on television news and referred to the prison of that name in Oxfordshire, England.  What we have here is a phenomenon called \u2018the sampling sucks\u2019, caused by the lumpiness of an abysmally low sample size for these words.  100 million words seems large, but when you think about all of the words that are written and spoken in English each day, that number starts looking very small.<\/p>\n<p>The bottom line is this:  WordCount is art, and while it definitely has words, it doesn\u2019t do a very good job of counting.  You shouldn\u2019t expect artists to count \u2013 that\u2019s what nerds are for.<\/p>\n<p>By the way, \u2018recount\u2019 is number 29,409 on the list.  I think the wordcount.org folks need to move it a little higher up.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was befuddled (rank: 53,829) by my recent experience with wordcount.org (see my previous post). It seems that the word \u2018lithography\u2019 is ranked appallingly low in frequency of use, relegating me and my life\u2019s work to the denizens of the perennially unpopular. But something smelled funny. 
I began to think that WordCount was not very [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-126","post","type-post","status-publish","format-standard","hentry","category-general"],"_links":{"self":[{"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=\/wp\/v2\/posts\/126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=126"}],"version-history":[{"count":0,"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=\/wp\/v2\/posts\/126\/revisions"}],"wp:attachment":[{"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lithoguru.com\/life\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}