## What-If 0063: "Google's Datacenters on Punch Cards"

What if there was a forum for discussing these?

Moderators: Moderators General, Prelates, Magistrates

mud8541
Posts: 1
Joined: Wed Sep 18, 2013 5:33 pm UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Cal Engime wrote: Unless of course the intended meaning is "supply the milkmen themselves with milk," in which case we could perhaps write "quis lactabit ipsos gerulos lactis?" ("Who will suckle the milkmen themselves?")

Google Translate came up with "Who milks the milk man?" Interesting. But I think your idea of "supply the milkmen with milk" is more in line with the idea of "watch the watchers".

Regardless, the absurdity of it is delicious.
I fully subscribe to this [url=en.wikipedia.org/wiki/Theories_of_humor#Incongruity_theory]theory[/url].

gmalivuk
GNU Terry Pratchett
Posts: 26739
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

keithl wrote:
nerdsniped wrote:Is it just me, or is some of the math way off here?

First, volume of the punch cards. Wikipedia gives the dimensions of an 80-column IBM punch card as 187.325 x 82.55 x 0.18 millimeters. Multiplying that by 15 x 10^18 bytes, dividing by 80 bytes/card, and converting units, I get 521.9 cubic kilometers. The area of New England is 186,458.8 square kilometers (Wikipedia again), so it would be covered only to a height of 2.8 meters.
I concur.

There is another error in the calculation - 80 column punch cards only stored 1 of 64 symbols per column - the coding was sparse, 6 bits of information per column! The newer 96 column cards (which never really caught on) had denser coding. So, at 60 bytes per card (3-to-4 encoding, which we actually used to send ascii-encoded binary over UUCP) the volume is 695.9 km3, and the depth becomes 3.7 meters.

A character could be 0 (space), 1, 2, or 3 punches. The average for a random character is 2.06 punches per column, 2.75 punches per byte. The chads would be about 0.11 inch by 0.05 inch, 2.79 by 1.27 mm, so the average chad volume per byte (assuming random data) is 1.75 mm3. For 15 exabytes, that is 26 cubic kilometers of chad. Assuming (WAG) 1000kg of fiber per cubic meter, 50% carbon by weight, that is 13 billion tons of carbon. Burning all the chads would add about 9ppm CO2 to the Earth's atmosphere. Burning all the cards would almost double atmospheric CO2.

Imagine telling a 1960's atmospheric scientist that you were about to "burn" 15 exabytes of data. If they believed you, they would kill you to stop you.
The cards are capable of handling more than 3 punches in a column, though, e. g. for binary.

Also, I think your estimate that card stock is as dense as water, and that all the carbon therein would become atmospheric CO2, is a bit high. Also, were 1960s atmospheric scientists especially concerned about CO2?
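For reference, the volume and chad arithmetic being debated here can be re-derived with a short script (all dimensions and punch counts as quoted in the posts; the fiber-density and CO2 figures are not checked here):

```python
# Card dimensions (mm) for an 80-column IBM card, and the 15 EB estimate.
card_l, card_w, card_t = 187.325, 82.55, 0.18
data_bytes = 15e18

# Volume at 80 bytes/card, in km^3 (1 km^3 = 1e18 mm^3).
cards = data_bytes / 80
vol_km3 = cards * card_l * card_w * card_t / 1e18
print(round(vol_km3, 1))                            # ~521.9 km^3

# Depth over New England's area.
new_england_km2 = 186458.8
print(round(vol_km3 / new_england_km2 * 1000, 1))   # ~2.8 m

# With 6-bit-per-column coding (60 bytes/card) the volume grows by 4/3.
vol6_km3 = vol_km3 * 80 / 60
print(round(vol6_km3 / new_england_km2 * 1000, 1))  # ~3.7 m

# Chad volume: 2.75 punches/byte, each chad 2.79 x 1.27 mm x card thickness.
chad_mm3 = 2.79 * 1.27 * card_t
total_chad_km3 = data_bytes * 2.75 * chad_mm3 / 1e18
print(round(total_chad_km3))                        # ~26 km^3 of chad
```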
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

mfb
Posts: 948
Joined: Thu Jan 08, 2009 7:48 pm UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Ekaros wrote:CERN seems to be around 100 PB themselves... So smashing stuff around also produces a lot of data ;D

It would be way more if the experiments did not throw away most of the data (the collisions which are not so interesting).

They record events at a rate of 20 MHz (planned: 40 MHz), but they write to disk at only ~300 (?) to 3000 Hz (LHCb). So we have a reduction by a factor of ~10,000 here - without that reduction, the LHC would need 150 exabyte/year, and would pass one zettabyte within a few years.
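The trigger-reduction arithmetic above works out like this (the 2 kHz write rate is just a stand-in for the quoted ~300-3000 Hz range):

```python
# Event rate vs. write-to-disk rate, per the figures quoted in the post.
record_rate = 20e6                  # Hz, raw collision/event rate
write_rate = 2e3                    # Hz, events actually written to disk
reduction = record_rate / write_rate
print(reduction)                    # 10000.0x reduction

# Without the reduction: 150 EB/year (quoted). With it, the implied
# recorded volume in PB/year:
no_trigger_eb_per_year = 150
recorded_pb_per_year = no_trigger_eb_per_year * 1000 / reduction
print(recorded_pb_per_year)         # 15.0 PB/year
```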

plsander
Posts: 17
Joined: Thu Jul 24, 2008 1:58 pm UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Don't drop that deck!

Protection against a single box being dropped would use a 4-digit sequence number on the cards so that they could be sorted back into order.

Multi-box mixing and sorting would require a larger sequence number, even if we go to packed or alpha-numeric sequence 'numbers'.
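The recovery scheme is simple to sketch. This is purely illustrative (the field position in the last four columns and the card text are assumptions; columns 73-80 were the conventional sequence field):

```python
import random

def restore_deck(shuffled):
    """Sort shuffled 80-character card images by a 4-digit sequence
    number punched into columns 77-80 (0-indexed 76:80)."""
    return sorted(shuffled, key=lambda card: int(card[76:80]))

# Build a small deck: 76 columns of data, 4 columns of sequence number.
deck = [f"{'PAYROLL STEP %d' % n:<76}{n:04d}" for n in range(8)]

dropped = deck[:]
random.shuffle(dropped)             # the box hits the floor
assert restore_deck(dropped) == deck
```

A 4-digit field caps out at 10,000 cards (five 2000-card boxes); mixing more boxes than that needs a wider field, as noted above.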

ConMan
Shepherd's Pie?
Posts: 1690
Joined: Tue Jan 01, 2008 11:56 am UTC
Location: Beacon Alpha

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Presumably, if you used a binary numbering on the punchcards, some numbers would cause the cards to be so full of holes that they'd be at risk of tearing apart like perforated stamp sheets. Would it be worth the increased inefficiency to use a number system like phinary to ensure that there are never too many holes clustered together?
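Standard-form phinary indeed never has two consecutive 1 digits; the integer analogue, the Zeckendorf (Fibonacci-base) representation, has the same no-adjacent-1s property and is easy to sketch. This is purely illustrative, not a claim about any real card encoding:

```python
def zeckendorf(n):
    """Fibonacci-base digits of n, most significant first. By
    Zeckendorf's theorem, no two adjacent digits are both 1, so
    punches in a column never cluster."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs[:-1]):   # greedy: take each Fibonacci number once
        if f <= n:
            digits.append(1)
            n -= f
        else:
            digits.append(0)
    while len(digits) > 1 and digits[0] == 0:   # strip leading zeros
        digits.pop(0)
    return digits

print(zeckendorf(10))   # [1, 0, 0, 1, 0], i.e. 8 + 2
```

The cost is the "increased inefficiency" mentioned above: Fibonacci-base needs about 44% more digits than binary.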
pollywog wrote:
Wikihow wrote:* Smile a lot! Give a gay girl a knowing "Hey, I'm a lesbian too!" smile.
I want to learn this smile, perfect it, and then go around smiling at lesbians and freaking them out.

drewder
Posts: 33
Joined: Wed Oct 19, 2011 11:47 am UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Microsoft (They have a million servers,[1] although no one seems sure why.)

"Not sure why" seems like an unusual dig. If anything, the number sounds low.

Epistemonas
Posts: 12
Joined: Thu Jan 03, 2013 7:58 am UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

In terms of storage capacity, a punch card is 0.57 tweets.

That seems to assume that a punch card column carries the same amount of information as one character of a tweet, which it clearly can’t. An 80×12 punch card can carry 960 bits of information. There are 1,111,998 valid Unicode characters (1,114,112 code points minus 2,048 surrogates and 66 noncharacters), so each character represents about 20.08 bits of information. A 140-character tweet can have nearly 2,812 bits of information. Therefore, a punch card is 0.34 tweets.

Even if tweets were restricted to currently-assigned, non-private-use characters, of which there are 110,117 as of Unicode 6.2, a punch card could still only hold 0.41 tweets.
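The arithmetic above checks out (code-point counts as quoted):

```python
import math

# Valid Unicode characters: all code points minus surrogates and
# noncharacters, as stated above.
valid = 1_114_112 - 2_048 - 66          # 1,111,998
bits_per_char = math.log2(valid)        # ~20.08 bits per character
card_bits = 80 * 12                     # 960 bits on an 80x12 card
tweet_bits = 140 * bits_per_char        # ~2,812 bits per 140-char tweet
print(round(card_bits / tweet_bits, 2)) # 0.34 tweets per card

# Restricted to assigned, non-private-use characters (Unicode 6.2):
assigned = 110_117
print(round(card_bits / (140 * math.log2(assigned)), 2))  # 0.41
```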

Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Epistemonas wrote:
In terms of storage capacity, a punch card is 0.57 tweets.

That seems to assume that a punch card column carries the same amount of information as one character of a tweet, which it clearly can’t. An 80×12 punch card can carry 960 bits of information. There are 1,111,998 valid Unicode characters (1,114,112 code points minus 2,048 surrogates and 66 noncharacters), so each character represents about 20.08 bits of information. A 140-character tweet can have nearly 2,812 bits of information. Therefore, a punch card is 0.34 tweets.

Even if tweets were restricted to currently-assigned, non-private-use characters, of which there are 110,117 as of Unicode 6.2, a punch card could still only hold 0.41 tweets.

I think he meant that it can store at least 0.57 of 99% of all tweets.

gmalivuk
GNU Terry Pratchett
Posts: 26739
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Really, it's probably closer to 0.75 of 99.9% of tweets, considering that tweets are generally in a human language and thus have information densities far lower than the maximum possible.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

Epistemonas
Posts: 12
Joined: Thu Jan 03, 2013 7:58 am UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Klear wrote:
Epistemonas wrote:
In terms of storage capacity, a punch card is 0.57 tweets.

That seems to assume that a punch card column carries the same amount of information as one character of a tweet, which it clearly can’t. An 80×12 punch card can carry 960 bits of information. There are 1,111,998 valid Unicode characters (1,114,112 code points minus 2,048 surrogates and 66 noncharacters), so each character represents about 20.08 bits of information. A 140-character tweet can have nearly 2,812 bits of information. Therefore, a punch card is 0.34 tweets.

Even if tweets were restricted to currently-assigned, non-private-use characters, of which there are 110,117 as of Unicode 6.2, a punch card could still only hold 0.41 tweets.

I think he meant that it can store at least 0.57 of 99% of all tweets.

With the right encoding, a punch card could store 1.00 of a typical tweet—most are less than 140 characters. The statement was about storage capacity, though.

Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

Epistemonas wrote:
Klear wrote:I think he meant that it can store at least 0.57 of 99% of all tweets.

With the right encoding, a punch card could store 1.00 of a typical tweet—most are less than 140 characters. The statement was about storage capacity, though.

I anticipated this objection; that's why I wrote "at least", but I guess it wasn't enough.

keithl
Posts: 658
Joined: Mon Aug 01, 2011 3:46 pm UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

gmalivuk wrote:
keithl wrote:...There is another error in the calculation - 80 column punch cards only stored 1 of 64 symbols per column - the coding was sparse, 6 bits of information per column! The newer 96 column cards (which never really caught on) had denser coding ... Imagine telling a 1960's atmospheric scientist that you were about to "burn" 15 exabytes of data. If they believed you, they would kill you to stop you.
The cards are capable of handling more than 3 punches in a column, though, e. g. for binary.

Also, I think your estimate that card stock is as dense as water, and that all the carbon therein would become atmospheric CO2, is a bit high. Also, were 1960s atmospheric scientists especially concerned about CO2?

Please give an example of an 80 column card punched as you suggest. The 96 column cards indeed could be punched into binary lace, but the handling machinery was a lot more sophisticated and expensive, and thank goodness we went to disk packs and text terminals before those became widespread. The 80 column cards were stouter, and quite heavy; in my early teens I handled a few 2000 card boxes. Cards were pulled through readers and past mechanical contact switches at fairly high speeds and accelerations; a hypothetical card with all columns punched would simply tear in half and jam the reader. You could make such a card (pretty!) by running it through a keypunch multiple times. Submitting one in a card deck and jamming the reader was one way to sabotage a 1960's data center.

Another way was to wet the card and soften it, then go over it with a clothes iron to partly close the holes. Both were popular pranks among the anticorporate counterculture; utility bills were often printed on punch cards. Of course, the suits on the top floor weren't affected by these pranks, just the poor overworked schlubs in the basement machine room. My single mother was one such schlub.

Nope - normal 80 column cards had only 0 to 3 punches, for mechanical reasons. Even then, the readers were finicky and failed a lot; moisture would stick two cards together, lint from the cards would accumulate and jam the gears, etc. They used mechanical contacts because lint plugged up optical paths, and phototransistors weren't available.

Concern about CO2 and atmospheric warming? Demonstrated by Tyndall in 1860, numerically calculated by Arrhenius in 1896. The Mauna Loa CO2 measurements begin in 1958, and show an annual monotonic trend through the 1960's. While warming from atmospheric CO2 became a cause célèbre with Hansen's 1981 congressional testimony, the science was almost a century old. It's a pity most people (deniers and fearmongers alike) still don't understand Arrhenius's reasoning.

gmalivuk
GNU Terry Pratchett
Posts: 26739
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

keithl wrote:
gmalivuk wrote:
keithl wrote:...There is another error in the calculation - 80 column punch cards only stored 1 of 64 symbols per column - the coding was sparse, 6 bits of information per column! The newer 96 column cards (which never really caught on) had denser coding ... Imagine telling a 1960's atmospheric scientist that you were about to "burn" 15 exabytes of data. If they believed you, they would kill you to stop you.
The cards are capable of handling more than 3 punches in a column, though, e. g. for binary.

Also, I think your estimate that card stock is as dense as water, and that all the carbon therein would become atmospheric CO2, is a bit high. Also, were 1960s atmospheric scientists especially concerned about CO2?

Please give an example of an 80 column card punched as you suggest.
Why should I need to? You already admitted that such cards exist, since they could be used to sabotage readers.

I didn't say it would be readable at the same high speed typical of practical card readers at the time, did I?

Concern about CO2 and atmospheric warming? Demonstrated by Tyndall in 1860, numerically calculated by Arrhenius in 1896. The Mauna Loa CO2 measurements begin in 1958, and show an annual monotonic trend through the 1960's. While warming from atmospheric CO2 became a cause célèbre with Hansen's 1981 congressional testimony, the science was almost a century old. It's a pity most people (deniers and fearmongers alike) still don't understand Arrhenius's reasoning.
Yes, I'm aware of how old the underlying physics knowledge is. That's not what I asked.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

PM 2Ring
Posts: 3707
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Sydney, Australia

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

keithl wrote:Please give an example of an 80 column card punched as you suggest.

From http://en.wikipedia.org/wiki/Punched_ca ... cter_codes

For some computer applications, binary formats were used, where each hole represented a single binary digit (or "bit"), every column (or row) was treated as a simple bitfield, and every combination of holes was permitted.

I used such cards on an IBM 2560 Multi-Function Card Machine. You did have to be careful with them, and a card with too many holes punched out could certainly be dangerous, but they worked ok for typical machine code. In fact, the IBM 360/20 that I first learned programming on used such a card as the first step in the bootstrapping process. And I even wrote a little program in machine code that I was able to optimize to make it small enough to fit on one card punched in binary mode. My program duplicated a deck of cards punched in normal EBCDIC coding, adding sequence numbers in columns 73-80 (the standard sequence number field) to the duplicates, and printing the (numbered) card contents to the line printer.

Here's a photo of an IBM 360/20 with 2560 from http://en.wikipedia.org/wiki/IBM_System ... _computers

EDIT

keithl wrote:Nope - normal 80 column cards had only 0 to 3 punches, for mechanical reasons. Even then, the readers were finicky and failed a lot; moisture would stick two cards together, lint from the cards would accumulate and jam the gears, etc. They used mechanical contacts because lint plugged up optical paths, and phototransistors weren't available.

The 2560 MFCM used solar cells, IIRC - they were faster than CdS photoresistors. For additional speed, cards were read sideways - i.e., row by row, rather than column by column.

However, I'm pretty sure the old 029 keypunch used mechanical sensing for card duplication & verification functions. But they could afford to run slower than the card reader attached to the computer.

gormster
Posts: 233
Joined: Mon Jul 23, 2007 6:43 am UTC
Location: Sydney

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

I wonder if Randall took into account the fact that punch cards have 12-bit bytes... so each punch card actually holds 120 (8-bit) bytes of data.
Eddie Izzard wrote:And poetry! Poetry is a lot like music, only less notes and more words.

PM 2Ring
Posts: 3707
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Sydney, Australia

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

gormster wrote:I wonder if Randall took into account the fact that punch cards have 12-bit bytes... so each punch card actually holds 120 (8-bit) bytes of data.

I doubt it; from http://what-if.xkcd.com/63/
A punch card can hold about 80 characters, and a box of cards holds 2000 cards

Oops! I meant to reply to this earlier.
ConMan wrote:Presumably, if you used a binary numbering on the punchcards, some numbers would cause the cards to be so full of holes that they'd be at risk of tearing apart like perforated stamp sheets. Would it be worth the increased inefficiency to use a number system like phinary to ensure that there are never too many holes clustered together?

Maybe, but that'd only reduce clustering in columns and doesn't affect row clustering at all. As you can see in the card image above, the inter-column distance was much smaller than the inter-row distance. So even in non-binary encodings, cards that had repeated characters were in danger of developing weak spots. So we'd really need something like phinary that works in 2D...

Philbert
Posts: 32
Joined: Mon Jan 05, 2009 12:32 pm UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

I wonder if Google really has data centers in Groningen _and_ Eemshaven in the Netherlands. Note that Groningen is not only a town, but also a province. A province that includes Eemshaven:
http://goo.gl/maps/bgE3g
There might be data centers in both locations, or it may be a mistake.

Update: Apparently there are data centers in both locations: http://goo.gl/maps/1wweX

ConMan
Shepherd's Pie?
Posts: 1690
Joined: Tue Jan 01, 2008 11:56 am UTC
Location: Beacon Alpha

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

PM 2Ring wrote:Oops! I meant to reply to this earlier.
ConMan wrote:Presumably, if you used a binary numbering on the punchcards, some numbers would cause the cards to be so full of holes that they'd be at risk of tearing apart like perforated stamp sheets. Would it be worth the increased inefficiency to use a number system like phinary to ensure that there are never too many holes clustered together?

Maybe, but that'd only reduce clustering in columns and doesn't affect row clustering at all. As you can see in the card image above, the inter-column distance was much smaller than the inter-row distance. So even in non-binary encodings, cards that had repeated characters were in danger of developing weak spots. So we'd really need something like phinary that works in 2D...

Hmm. Although it would probably reduce the risk a little bit. To reduce row repetition, you could probably do something like "3 rows of punch card per 2 rows of code, 1st row of code is the XNOR of card rows 1 and 2, 2nd row of code is the XNOR of card rows 2 and 3", which should give the following possibilities to help reduce the number of repeated holes:

Code: Select all

A B | 1 2 3 | 1 2 3
-----+-------+-------
0 0 | 1 0 1 | 0 1 0
0 1 | 1 0 0 | 0 1 1
1 0 | 1 1 0 | 0 0 1
1 1 | 0 0 0 | 1 1 1 (obviously avoid this one)
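As a quick check, the table above is internally consistent if each code pair (A, B) is read off its card-row triple (r1, r2, r3) as A = XNOR(r1, r2), B = XNOR(r2, r3); the two candidate triples for each pair are bitwise complements, so an encoder could always pick whichever has fewer holes:

```python
# Decode a card-row triple back to its (A, B) code pair.
def decode(r1, r2, r3):
    return (1 - (r1 ^ r2), 1 - (r2 ^ r3))   # XNOR of adjacent rows

# The table from the post: code pair -> its two candidate triples.
table = {
    (0, 0): [(1, 0, 1), (0, 1, 0)],
    (0, 1): [(1, 0, 0), (0, 1, 1)],
    (1, 0): [(1, 1, 0), (0, 0, 1)],
    (1, 1): [(0, 0, 0), (1, 1, 1)],   # (1,1,1) is the one to avoid
}
for code, triples in table.items():
    for t in triples:
        assert decode(*t) == code
    # The two encodings are complements of each other.
    assert triples[1] == tuple(1 - b for b in triples[0])
print("table consistent")
```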
pollywog wrote:
Wikihow wrote:* Smile a lot! Give a gay girl a knowing "Hey, I'm a lesbian too!" smile.
I want to learn this smile, perfect it, and then go around smiling at lesbians and freaking them out.

MacOrlando
Posts: 1
Joined: Wed Sep 25, 2013 9:33 am UTC

### Re: What-If 0063: "Google's Datacenters on Punch Cards"

What-If #63 (Google's Datacenters on Punch Cards) quotes the worldwide production of hard disks as 8EByte/year (in Quote [12]).
This is badly off. 8EByte/year is the production of external storage systems.

The worldwide production of hard disks is about ~180 million units per quarter, which adds up to ~700 million units/year (source link denied by the $!*$ forum filter). Assuming about 1 TByte/unit, we have 700 EByte/year of produced hard disks.
Thus Google, owning 10 EByte of hard disks, does not really make a big dent in the worldwide hard disk market.
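The production arithmetic works out as follows (quarterly unit count and the ~1 TByte/unit figure are the post's assumptions):

```python
# Annual hard disk production from the quoted quarterly figure.
units_per_quarter = 180e6
units_per_year = 4 * units_per_quarter            # ~720 million units
bytes_per_unit = 1e12                             # ~1 TByte per drive
production_eb = units_per_year * bytes_per_unit / 1e18
print(production_eb)                              # ~720 EB/year

# Google's estimated holdings as a share of one year's output.
google_share = 10 / production_eb                 # 10 EB estimate
print(round(google_share * 100, 1))               # ~1.4%
```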