Bunty wrote: I don't know about comfort, but there seems to have been an interesting blip in the use of the word 'internet' centred around the year 1900...
http://books.google.com/ngrams/graph?co ... moothing=3
I ran across similar data a few years back, with an amusing explanation. When the authors investigated some of these "1900 blips", they found that most were an artifact of the fact that a small but significant amount of the world's online material is on "mainframes", i.e., IBM systems. Most of the date-handling libraries on such systems use 1900 as their "epoch". That is, when partial years are used in date fields (to save two bytes), the software prepends "19" to the value. New things stored on such systems are now being dated "12", which some IBM mainframe software will expand to "1912". These date bugs are being patched, but in the dwindling world of IBM mainframe software development the patching tends to be slow, and the bugs will take decades to die out. They'll probably never die out entirely, and will remain an annoying problem for historians looking at documents from the early 21st century.
Note that the "1900 blip" in the occurrence of "internet" runs roughly from 1900 to 1910 or 1911. This is consistent with the above, and represents 2-byte date fields for documents from the years 2000 to 2011, as interpreted by software that automatically adds 1900 to all 2-byte dates. The blip should be expected to extend slowly over the next few years, though it should also gradually dwindle. We don't know when it'll drop below the level of noise.
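To make the mechanism concrete, here is a minimal Python sketch of the two-digit-year expansion described above. The function names and the pivot value of 30 are my own assumptions for illustration; this is not taken from any actual IBM date library.

```python
def expand_prefix_19(yy: str) -> int:
    """Naive expansion: always read a 2-digit year as 19YY."""
    return 1900 + int(yy)


def expand_with_pivot(yy: str, pivot: int = 30) -> int:
    """Windowed expansion: values below the (assumed) pivot are read as 20YY."""
    value = int(yy)
    return (2000 if value < pivot else 1900) + value


# Compare the two readings for a few stored 2-digit years.
for yy in ("00", "11", "12", "99"):
    print(yy, expand_prefix_19(yy), expand_with_pivot(yy))
```

With the naive version, a document stored in 2012 as "12" comes back as 1912, which is the kind of misdating that would produce the blip. A pivot window is one common style of patch, but it only moves the ambiguity to a different year.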
I'd bet that the problem isn't entirely on IBM mainframes, though. After all, Americans continue to insist on writing 2-digit years in dates. I've even seen dates like 07/12/09 in articles that were clearly about the early 20th century, with no way to be sure whether the "09" meant 1909 or 2009. Or maybe it meant 2007-12-09. If I couldn't tell, I doubt that most software would guess the century right in such cases.
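As a rough illustration of how ambiguous a string like 07/12/09 is, the following Python sketch lists every calendar date it could denote. It assumes only three field orders and two candidate centuries; the function name candidate_dates is made up for this example.

```python
from datetime import date
from itertools import product


def candidate_dates(text: str) -> list[date]:
    """Return every valid date a slash-separated 2-digit date could mean."""
    a, b, c = (int(part) for part in text.split("/"))
    # Each entry is (two-digit year, month, day) under one field order.
    orders = [
        (c, a, b),  # MM/DD/YY (US style)
        (c, b, a),  # DD/MM/YY
        (a, b, c),  # YY/MM/DD
    ]
    results = set()
    for (yy, month, day), century in product(orders, (1900, 2000)):
        try:
            results.add(date(century + yy, month, day))
        except ValueError:
            continue  # e.g. month 13: this reading is impossible, skip it
    return sorted(results)


for d in candidate_dates("07/12/09"):
    print(d.isoformat())
```

For 07/12/09 all three field orders are valid, so there are six possible dates spread over two centuries, and nothing in the string itself says which one was meant.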
Google Books also has scattered dates that are simply wrong by random amounts, for no discernible reason.