CAPTCHA stands for
"Completely Automated
Public Turing test to tell
Computers and Humans
Apart." That's not good enough. We need a
"Completely Automated
Public Test to tell
Humans and other Humans
Apart," or CAPTHHA.
Why? Read this ZDNet article: http://blogs.zdnet.com/security/?p=1835
(note: ZDNet menu completely busted in Firefox).
The article talks about the availability of broken CAPTCHAs for
sale. Companies in India (possibly elsewhere, but this article
highlights India) will, for a small fee ($2 for 1000), provide the
means for a computer to bypass the CAPTCHA - thus defeating the
CAPTCHA's sole purpose: keeping automated systems out.
Often CAPTCHAs are used to stop spammers. Hotmail, GMail, etc.
use CAPTCHAs to ensure that new email accounts are requested by
humans, not computers (since computers can and do get 1000s of email
addresses from which to send spam). Spammers can automate the
sign-up; the CAPTCHA is what stops them. And enterprising companies
are making it possible to break them.
I'm going to skip the potential discussions on the ethics /
morality of this. It's happening, let's just look at why and what
can be done about it. I'm also going to pretend that spam is the
main problem being stopped. It's not.
I think the problem, the reason this happens, is that a cultural
understanding of labor costs has caused a fundamental flaw in the
understanding of the problem. That's not to say CAPTCHAs aren't
useful, even needed. CAPTCHAs make abuse harder, or at least add a
cost to it, and that difficulty or cost will
stop some would-be spammers.
But the thinking underlying CAPTCHAs is this:
Tools are cheap - we'll buy them whenever we need them. Labor is
expensive, so we'll buy tools to save us labor costs.
The computer is the tool in this case. The problem that they're
having is that this assumption doesn't hold true globally.
Cement Truck or Wheelbarrows?
I spent many of my early years in Ecuador. I can remember one of
my classrooms was on the top floor that overlooked the walls of the
school. The next-door hospital was expanding and I watched as
people went to and fro pouring cement floors in a new hospital
building.
They did this manually with wheelbarrows.
In North America and Europe we might pour
it directly from a truck. 3 people are involved (2 to hold the
hose, 1 to drive the truck) and 1 large, expensive tool.
However, when labor is cheap, it's cost-effective to get
people using wheelbarrows to move cement around and skip the
truck. You might have 18 people, with 18 cheap tools
(wheelbarrows). In North America, the cost of the additional 15 laborers
is way higher than the cost of the truck - so you buy a truck and
save 15 people and some money.
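To make the trade-off concrete, here is a rough sketch of the arithmetic. The crew sizes (3 and 18) come from the comparison above; the wages, the 8-hour job, and the tool prices are numbers I made up purely for illustration.

```python
def crew_cost(workers, hourly_wage, hours, tool_cost):
    """Total cost of a job: labor plus tools (all figures hypothetical)."""
    return workers * hourly_wage * hours + tool_cost


def cheaper_option(hourly_wage, hours=8):
    """Compare the truck crew with the wheelbarrow crew at a given wage."""
    # Assumed prices: 1000 for the truck, 5 per wheelbarrow.
    truck = crew_cost(workers=3, hourly_wage=hourly_wage,
                      hours=hours, tool_cost=1000)
    barrows = crew_cost(workers=18, hourly_wage=hourly_wage,
                        hours=hours, tool_cost=18 * 5)
    return "truck" if truck < barrows else "wheelbarrows"


print(cheaper_option(hourly_wage=25))  # expensive labor
print(cheaper_option(hourly_wage=1))   # cheap labor
```

With these made-up numbers the answer flips from the truck to the wheelbarrows as the wage drops. Nothing about the job changed; only the price of labor did.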
I think the CAPTCHA approach stems from the same mindset.
CAPTCHAs differentiate between computer and humans and the premise
is that spammers won't have a fleet of people to break them.
They'll try to use a tool (a computer) to break the CAPTCHA. The
premise is based on the same equation: labor = expensive, tools =
cheaper.
So the solution (the CAPTCHA) is specifically aimed at being
unbreakable to computers. But the premise doesn't hold everywhere
and so the solution doesn't work.
What to do?
Aside: Leverage work with reCAPTCHA
I've talked about reCAPTCHA
before: /2007/8/15/recaptcha.
I love its premise.
But, while not solving the problem, it bears an interesting side
effect in this situation.
A reCAPTCHA has 2 words, one known and the other unknown,
and the unknown word is from a scanned book. They gather the
unknown words together and, as people identify a word, they then know
what that word is. They are effectively crowdsourcing the
transcription of books.
With reCAPTCHA, spammers who are paying for the breaking of the
CAPTCHAs would actually help in the effort. They would effectively be
paying people to transcribe these books.
I'm sure that there would be ways of rendering this good work
ineffective, but they may not bother, since those providing the
reading service have no motivation to stop it: they are happy to
break the CAPTCHA, whether it is useful to society or not.
Flip It
But to actually solve the problem (allow only "legitimate" users
into the system), you have to re-evaluate the problem statement.
CAPTCHA, as a test to "tell Computers and Humans Apart," works, but
it doesn't solve the problem.
To stop this, we need to tell legitimate users apart from
illegitimate users. And this is really "by their actions."
You might build a system that would monitor the activity in some
big-brother-esque manner. But I'd suggest a much simpler
approach:
- Skip it: let them spam and just have good spam tools
To have good spam tools you need:
- The spam tool itself (to detect & stop the unwanted
behavior)
- Computing power to run the tool
The
cost of the needed computing power is approaching 0. Amazon Web
Services charges $0.15 for a GB of storage: my first computer cost
well over $0.15 but stored just 40 MB. You can get a computer for
$0.10 per hour with 1.7GB of RAM: my first computer, again, cost
more than that, but had 128 MB of RAM. The cost of storage,
processing, and memory required to stop spam is continually
decreasing.
And we have tools to stop spam. I am
regularly surprised that many people just aren't using good tools.
GMail works fairly well - I regularly have hundreds of emails in my
spam folder, but I've received 6 spam emails in my inbox - for the
lifetime of my GMail usage. At work, I moved our email system to
the open-source SpamAssassin. Open-source means that we can use it
freely. And it works very well (which is what matters).
HAM: Everything Else
The article lists more than just email (which is what the existing
spam tools are targeted at). What about Craigslist, MySpace,
YouTube, and Facebook?
The only trouble spot I see is YouTube. The others are text
based - and spam as text comments or spam as email can be handled
in a similar fashion.
YouTube? That's trickier. If they are doing comment spam, then
that's still text and can be handled by the same tools as
anti-spam. To a computer: text is text.
But what if they want to post to YouTube a pre-built video that
is, essentially, just spam? That's a harder proposition. You could
go after the text of the title (since spam subject lines are usually
give-aways), but that could be spoofed by real humans (we're
working under the proposition that we can get real human labor for
cheap) writing new subject lines just as soon as one is flagged as
spam.
We have a problem: the current state of technology can't
understand video in a meaningful sense (more on this topic in the
future - starting Thursday).
I'm sure that there are other solutions, but let's get real
simple about this. We got into this problem because of our
worldview. Now that we've learned that labor isn't overly expensive
everywhere, we can solve it that way too.
We can set up shops of people that review the videos to
determine if they are spam. Actual human determination of "spam" or
"not spam."
A few points on this:
- You might be able to get people for a lower wage than the
spammers, since the hired people feel like they are helping
societal good.
- You are a big company and you have the resources &
understanding of tools to build better tools to make your reviewers
more productive than the CAPTCHA crackers
There is a potential problem in that it's more work to watch a
video and determine that it is spam than it is to post a video that
is spam.
But you can use crowdsourcing to help
you. The existing "flag" button allows humans watching the video to
tip you off. Then you review those flagged videos more closely. Or,
once flagged by enough users, remove the video until it is reviewed
and is determined to be valid.
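Here is a minimal sketch of that flag-then-review flow. It is not how YouTube actually does it: the `Video` class, the `flag` and `review` functions, and the threshold of 3 are all hypothetical names and numbers I picked for illustration.

```python
from dataclasses import dataclass, field

FLAG_THRESHOLD = 3  # hypothetical: distinct flaggers needed to hide a video


@dataclass
class Video:
    title: str
    flagged_by: set = field(default_factory=set)
    visible: bool = True
    reviewed: bool = False


def flag(video, user_id):
    """Record a flag; hide the video once enough distinct users flag it."""
    if video.reviewed:
        return  # assumption: a human reviewer's verdict overrides new flags
    video.flagged_by.add(user_id)  # a set, so repeat flags count once
    if len(video.flagged_by) >= FLAG_THRESHOLD:
        video.visible = False


def review(video, is_spam):
    """A human reviewer's verdict: restore valid videos, keep spam hidden."""
    video.reviewed = True
    video.visible = not is_spam
    video.flagged_by.clear()
```

Counting distinct users rather than raw clicks means one annoyed viewer can't hide a video alone. Ignoring flags after a review is a design choice in this sketch, not a requirement; a real system would probably allow re-review.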
Such an approach isn't perfect, but if it works well enough,
then the spam won't be as economically interesting. You don't have
to have a perfect system: it just has to be more costly to spam
than the potential value available when spamming.
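The break-even point can be written down as a back-of-the-envelope check. The $2 per 1000 CAPTCHAs is the price quoted in the article; the survival rates and the value per post below are invented for illustration, and I'm assuming the CAPTCHA fee is the spammer's only cost.

```python
# $2 per 1000 solved CAPTCHAs, the price quoted in the article.
COST_PER_POST = 2 / 1000


def spam_is_profitable(cost_per_post, survival_rate, value_per_surviving_post):
    """Spam pays only if the expected value of a post exceeds its cost.

    survival_rate is the fraction of posts that get past the filters and
    reviewers; value_per_surviving_post is what one such post earns.
    Both are hypothetical inputs here.
    """
    expected_value = survival_rate * value_per_surviving_post
    return expected_value > cost_per_post


# Half the spam gets through and each surviving post earns a cent:
print(spam_is_profitable(COST_PER_POST, 0.5, 0.01))  # True
# Only a tenth gets through:
print(spam_is_profitable(COST_PER_POST, 0.1, 0.01))  # False
```

The filter in the second case still lets 10% of the spam through. It's far from perfect, but under these assumptions it's already enough to make the spam a money-loser.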
Summary
I think that spam is going to happen. I also think that global
economics change how we should think about spam - along with other
things. Rather than trying to stop it, you have to change the
outcome. If it's easy to do, it will happen. The less
cost-effective it is, the less it will be done.
Also, your mindset is, by definition, attached to the cultures
that you have experienced. And most of the world is not you and may
be thinking differently about things. You need to see things
differently to see the dangers and opportunities that exist in a
different world.