Motivation

A while ago I got a bit annoyed with some 'IQ tests'.

(See What is going on here?)

It occurred to me that the main problem with them was that they have very strong practice effects.

I'm not sure that you can talk about measuring something if by measuring it you change it for ever.

Most people seem to think that IQ tests are telling you something about the speed of your brain, and something about your ability at abstract reasoning on novel problems.

I don't think they're telling you anything of the sort. They're telling you whether you've solved problems like this before.

From that point of view the reason for the mysterious 'Flynn Effect' (IQ scores are rocketing over time / our ancestors were morons) seems pretty obvious, and it throws all sorts of beliefs about intelligence differences between classes, races and nations into a new light.

Psychologist friends tell me that in order for a test to be an IQ test, it has to be pretty much immune to practice effects, but I can't see how this can possibly be true. Certainly I got much better at IQ tests by doing a few, and the same is true for my Mum, who went from barely being able to answer the questions to happily knocking off all the puzzles quickly with a couple of lessons from me.

Anyway, I reckon that what an IQ test is measuring is a combination of things:

And I wondered how to get rid of the second two, which aren't that interesting.

Stripping it down

Starting from a real IQ test I stripped out everything that I thought was dodgy, and ended up with a simple symmetry spotting game.

It's got huge practice effects. (Once you've figured it out, there's a speed you can do the puzzles at, and you can get about four times faster with practice.)

But because it can generate puzzles at random, you can practise as much as you like.

So my hope is that it 'benchmarks the brain', by measuring the speed you can do a task which feels like a model of abstract/mathematical thought, once you've learned how to do it well.

And now my question is:

"Are there differences between people, or is everyone equally good at this once they've figured it out?"

Please help me decide by having a go at it: IQ test

Scoring

The scoring of this test is up in the air at the moment. I am having trouble constructing a scoring system that reflects my subjective impression of how well people have done.

I intend to explore various different ways of looking at the data and find a metric that rewards people appropriately for how quickly they can do the various puzzles, and how difficult the puzzles they can do are.

At the moment I'm using a Bayesian method to try to estimate how people's speeds compare to my own speed when I first made these puzzles and practised a bit.

So a score of 100 means that you're roughly the same speed as I was when I calibrated it.

There are currently two different scoring metrics displayed on the home page, where the puzzles are:

The 'Bayesian Score' is an estimate of how fast you are compared to some 'calibrated typical scores'. For example:

90% 40 - 95
50% 45 - 72

This means that the program is 90% confident that your speed is between 40% and 95% of 'standard', and if pushed it would guess, with 50% confidence, that it's somewhere between 45% and 72%.
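For readers who want a feel for how a pair of intervals like this can come out of a handful of solve times, here is a minimal Python sketch. It is not the scoring code the site uses. Everything in it is an assumption made for illustration: the function name credible_intervals, the log-normal model for solve times, the spread sigma, the flat prior over log speed, and the standard_time parameter standing in for a 'calibrated typical' time.

```python
import math

def credible_intervals(times, standard_time, sigma=0.5, levels=(0.9, 0.5)):
    """Grid posterior over relative speed, where 1.0 means 'standard' speed."""
    # Candidate speeds from 10% to 1000% of standard, evenly spaced in log
    # space (so the implied prior is flat in log speed). Assumed, not sourced.
    grid = [10 ** (-1 + 2 * i / 400) for i in range(401)]
    # Assumed model: log(time) ~ Normal(log(standard_time / speed), sigma).
    log_post = [
        sum(-((math.log(t) - math.log(standard_time / s)) ** 2) / (2 * sigma ** 2)
            for t in times)
        for s in grid
    ]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    # Cumulative posterior probability along the grid.
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)

    def quantile(q):
        # First grid point whose cumulative probability reaches q.
        return next((s for s, c in zip(grid, cdf) if c >= q), grid[-1])

    # Central intervals, reported as percentages of standard speed.
    return {
        level: (round(100 * quantile((1 - level) / 2)),
                round(100 * quantile((1 + level) / 2)))
        for level in levels
    }

# Hypothetical data: three solves of a puzzle whose standard time is 6 seconds.
intervals = credible_intervals([10, 12, 9], standard_time=6)
print("90%", intervals[0.9])
print("50%", intervals[0.5])
```

The 50% interval always sits inside the 90% one, and both narrow as more solve times are added, which is the behaviour you would expect from the display above.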

These scores:

eye:1 5 <- (-4-5-3138)
eye:2 6 <- (-6)

are the raw data on how quickly you've done the various puzzles. eye:1 is a grid of nine coloured circles, and eye:2 is a grid of nine pairs of concentric coloured circles, which look like eyes, so I called them 'eye charts'.

In this example, the victim has taken 4, 5 and 3138 seconds to correctly do three different examples of the simplest puzzle type. To try to get a representative score I've taken the median, or middlemost score, so his overall score on the eye:1 charts is 5 seconds.
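To see why the median is a sensible choice for data like this, here is a small Python sketch. The parse_times helper is hypothetical, and it assumes that the dashes in the raw data are separators rather than minus signs, which matches the 4, 5 and 3138 seconds described above.

```python
import re
from statistics import mean, median

def parse_times(raw):
    """Pull solve times in seconds out of a string like '(-4-5-3138)'."""
    # Assumption: the dashes are separators, not minus signs.
    return [int(s) for s in re.findall(r"\d+", raw)]

times = parse_times("(-4-5-3138)")
print(times)          # [4, 5, 3138]
print(median(times))  # 5
print(mean(times))    # 1049
```

One very slow attempt (perhaps someone went off to make a cup of tea) drags the mean up to 1049 seconds, while the median stays at 5.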

What is a good score?

Most of the people who've tried this test so far are either Cambridge Maths graduates, or people who read my Learning Clojure blog about programming in Clojure, a rather advanced programming language. And at least a few are both.

Amongst this very elite group, a score between 40 and 50 is good on the first go. If you score that or above on a first try, well done indeed!

If you then have another go the following day, you should be much better at it, and this effect seems to continue until you eventually level out at a very much higher speed.

The highest speeds so far have been reached by Ramana Kumar (167) and James Burberry (148), who are two of the cleverest people I've ever met.

This gives me hope that my little symmetry-spotting game has at least some relevance to real cunning.

James has been playing with my test since the first prototype, but Ramana has had only four goes. His scores were 51, 67, the rather startling 167, and then finally 94. He's been reluctant to do it a fifth time.

Stereotype Threat

A problem with most types of testing is a thing called 'stereotype threat', which is when you're so busy worrying about whether 'your kind' are good at this sort of thing that your performance drops off.

Using this, people have apparently reversed some of the strongest results in experimental psychology about the differences between men and women. This is amazing when you think about it. Imagine if you told a group of people that women tended to be stronger than men, and then they arm-wrestled, and the women came out on top.

I can imagine exactly how this might work, and to this day I thank the Good Lord that I thought of myself as English when I was growing up. By blood I'm about half-Irish, and the English have the idea that Irish people are stupid. There were lots of jokes on this subject when I was a boy. Luckily I never realised that the English stereotype of Irish stupidity was supposed to apply to me. Of course, it's possible that I didn't notice because I am half-Irish, and we are just a bit dim that way.

I hope that my test will be immune to this sort of thing too, because you can just practise peacefully with no-one watching until you know how good you are at it, and play until you've got a score that you think you can't improve on.

Evidence For and Against

I think it might be measuring something important because:

A lot of my friends have tried it. Most of my friends are very clever, but I've noticed that the ones I've always thought were particularly sharp have been extremely good at this. It seems to be an elite-level mathematician and computer scientist detector. Although of course that could be more to do with that sort of person finding such puzzles interesting.

If I sort the friends who've tried it into my subjective ordering of their cleverness, then the rankings of the test are similar to my subjective rankings. (47 inversions out of a possible 210)

It feels as though it exercises the mind in the same way as doing all sorts of other abstract pattern spotting activities which are traditionally taken as signs of intelligence. Examples would be chess, cryptic crosswords, mental arithmetic, etc.
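The inversion count mentioned above is the number of pairs of people that the two rankings put in opposite orders. A possible 210 pairs corresponds to 21 people, since 21 × 20 / 2 = 210. Here is a minimal Python sketch of the calculation; the function names and the four-person example are made up for illustration, and the conversion to a Kendall-tau-style coefficient assumes there are no ties in either ranking.

```python
from itertools import combinations

def count_inversions(subjective_order, test_order):
    """Count pairs of people that the two rankings order differently."""
    position = {name: i for i, name in enumerate(test_order)}
    ranks = [position[name] for name in subjective_order]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

def max_inversions(n):
    """Number of distinct pairs among n people."""
    return n * (n - 1) // 2

def kendall_tau(inversions, n):
    """+1 for identical rankings, -1 for reversed, about 0 for unrelated."""
    return 1 - 2 * inversions / max_inversions(n)

# Hypothetical four-person example: A/B are swapped, and so are C/D.
print(count_inversions(list("ABCD"), list("BADC")))  # 2

print(max_inversions(21))               # 210
print(round(kendall_tau(47, 21), 2))    # 0.55
```

Two unrelated rankings of 21 people would give about 105 inversions on average, so 47 is a fair way towards agreement, though with this few people it is suggestive rather than conclusive.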

I think it might not be measuring anything important because:

The practice effects are much greater than I thought. I've been playing with it a lot and my score has pretty much doubled since I first wrote it.

I don't seem to slow down at it much if I try to do it with people around me who are talking or on the phone.

That's not true for me when I'm trying to program a computer, or when I'm trying to solve a crossword. But it is true if I'm doing maths or playing chess, both of which I've always been happy to do while in cafes and pubs. That makes me think that programming and crosswords use verbal parts of the brain which get interfered with when they're processing ambient conversation, whereas this test doesn't.