murgatroid99 wrote: I really liked this comic, and I wanted to say that ReCAPTCHA actually does something very similar: ReCAPTCHA is the one with 2 words instead of a bunch of random letters. What they do is take two words from a book they are digitizing: one the computer can read and one it cannot, and they don't say which is which. They test whether the human got the readable one right, and if a bunch of people get the same answer for the other one they assume it is the right answer. That's why you sometimes get weird untypeable characters; the computer doesn't know what it says. That way people, including spammers, who answer CAPTCHAs are doing something useful.
Well, yes, when I type in a ReCAPTCHA I am helping digitize a book and that is useful. But TANSTAAFL. Just because the work is distributed in small, unnoticeable chunks doesn't mean it is efficient or that the work comes for free.
The ReCAPTCHA subjects are doing extra work, albeit a small amount of it. We're typing in two words, one of which is totally unnecessary for bot detection, because the bot detector has only one word (the one it knows) to work with to tell if you are a human or a bot. Taking as the unit one "WHOO" (Word-Human OCR Operation, for lack of a better term), each ReCAPTCHA test requires one WHOO more than bot detection alone would need.
To complete the digitization of a text, say a classic manuscript or whatever, you run an automated scan and some computer determines that some words are not confidently OCR'ed. Let's say there are 200 words that go onto the list for human intervention. You'll need at least 200 WHOOs, but more likely some "confidence multiplier" on that, to ensure the text is correctly OCR'ed.
From what I understand of how ReCAPTCHA works, they have a high confidence multiplier, i.e., they give one word to many different people and only accept the result when sufficient consensus exists over many, many people. If the words go to people (paid or volunteer) who are intentional human OCRs, let's say you reach confidence after five people WHOO each word (i.e., their confidence multiplier is 5 and the manuscript gets digitized for 200 x 5 = 1,000 WHOOs of effort). When using ReCAPTCHA subjects, you'll need a lot more people looking at the same word to get the same confidence, because ReCAPTCHA adds noise (which also makes each individual WHOO a little harder) and because ReCAPTCHA subjects are not as careful as intentional human OCRs (or they are lazy and know that there is a 50% chance they make it past the bot detector by always typing "x" for the second word). So, that is several thousand WHOOs to get that same manuscript digitized. Thus, the ReCAPTCHA approach is less efficient, but somehow it is OK because the work is thinly spread and somewhat unnoticeable to many people.
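To put rough numbers on that, here is a minimal Python sketch of the WHOO arithmetic. The 200 flagged words and the multiplier of 5 come from my example above; the ReCAPTCHA multiplier of 15 is purely an assumption for illustration (I don't know the real figure), and the function name is made up.

```python
def total_whoos(flagged_words: int, confidence_multiplier: int) -> int:
    """WHOOs needed when each flagged word is typed by `confidence_multiplier` people."""
    return flagged_words * confidence_multiplier


FLAGGED_WORDS = 200        # words the OCR could not read confidently (from the example)
DEDICATED_MULTIPLIER = 5   # intentional human OCRs (from the example)
RECAPTCHA_MULTIPLIER = 15  # ASSUMED for illustration only; the real value is unknown

dedicated = total_whoos(FLAGGED_WORDS, DEDICATED_MULTIPLIER)
recaptcha = total_whoos(FLAGGED_WORDS, RECAPTCHA_MULTIPLIER)

print("Dedicated human OCRs:", dedicated, "WHOOs")
print("ReCAPTCHA subjects:  ", recaptcha, "WHOOs")
print("Extra WHOOs spread over the public:", recaptcha - dedicated)
```

Swap in whatever multiplier you think is realistic; the point is only that the total scales linearly with it, so any multiplier above 5 means more total human effort for the same manuscript.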
By that logic, we should allow airlines to save on the tedious work of disposing of airplane sewage, so long as they disperse it at 35,000 feet in drops small enough that nobody would notice.
That said, I don't mind helping out the ReCAPTCHA folks now and then and doing a little extra free work.