Is the Turing test passé?

My editor Suzan Troutt sent me an email this morning, pointing out the recent Turing test conducted by the University of Reading in collaboration with the EU RoboLaw project. Their results have been sufficiently debunked by various analysts and comment threads, so I will not spill more ink on it here.

In the novel SuSAn, the title character faces her own Turing test as a kind of trial by fire or rite of passage. Susan’s creator tells her that no machine has ever passed before. My editor worried that the scene has now been rendered obsolete and would need to be rewritten.

Not even remotely. The test in the novel is much harder than the one reported in the news. Most things we call a “Turing test” these days are not precisely what Turing himself proposed; instead we try to follow the spirit of it. I describe elements of the novel’s test below.

  • Embodiment — Turing proposed using something like a teletype to communicate. The idea is to remove all clues that are irrelevant to intelligence itself. As a firm believer in the necessity of embodiment for “true” intelligence, I suggest that the communication go through a robotic avatar. This raises the bar to include body language, the ability to interact with objects, etc.
  • Boundaries — Unlimited time. No pre-specified topics. No boundaries of any kind, except those constructed in the moment by social interaction.
  • Crowdsourcing — In the novel the interviews are public. This may or may not be a good way to do science, but it is efficient: by letting others observe and form an opinion, you get many more judges for a given amount of face time with the avatar. Judges can choose which contestants to vote on, so they will tend to pick those they are more certain about. Some amount of vetting would be necessary to prevent ballot stuffing.
  • Priors — A modern Turing test often involves parallel interviews, where one contestant is a machine and the other is human. Why not make the sample fully random? Let the judge weigh each contestant on its own merits. We can still promise the judge that, on average, half the contestants are human and half are machines.
  • Scoring — Human contestants could pretend to be machines, just like the machines pretend to be human. However, it makes a better control if everyone tries to be human. The results should include how well the humans scored at being human. A machine must score as high as most of the humans to pass the test. This is where the University of Reading results lack credibility.
  • Judging the judges — Keep track of how accurate each judge is. This creates a three-way game in which everyone tries to maximize their score: the judges, the human contestants, and the machines. A judge’s track record could be used to weight the results. If a machine fools a lot of idiots, so what? But if it fools experts in the field, that is an achievement! A rough sketch of how these scores might be combined follows this list.
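
To make the last two points concrete, here is a rough sketch in Python. The data layout and the helper names (judge_accuracy, humanness_scores, machine_passes) are my own invention for illustration, and using the median human score as the passing bar is just one way to read “as high as most of the humans”; a different threshold would work too.

    from collections import defaultdict
    from statistics import median

    # Hypothetical data layout: a verdict is (judge, contestant, guessed_human),
    # and truth maps each contestant to True if that contestant is actually human.

    def judge_accuracy(verdicts, truth):
        # Track record: fraction of correct calls per judge.
        correct = defaultdict(int)
        total = defaultdict(int)
        for judge, contestant, guessed_human in verdicts:
            total[judge] += 1
            if guessed_human == truth[contestant]:
                correct[judge] += 1
        return {j: correct[j] / total[j] for j in total}

    def humanness_scores(verdicts, accuracy):
        # Weighted share of "human" votes each contestant received,
        # where a judge's vote counts in proportion to their track record.
        votes_human = defaultdict(float)
        votes_total = defaultdict(float)
        for judge, contestant, guessed_human in verdicts:
            w = accuracy[judge]
            votes_total[contestant] += w
            if guessed_human:
                votes_human[contestant] += w
        return {c: votes_human[c] / votes_total[c]
                for c in votes_total if votes_total[c] > 0}

    def machine_passes(machine, scores, truth):
        # "As high as most of the humans": at or above the median human score.
        human_scores = [s for c, s in scores.items() if truth[c]]
        return scores[machine] >= median(human_scores)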