Monday, February 24, 2014

Somebody's been lying

I must have Carole Bayer Sager on the brain at the moment, for the above title is the opening lyric to one of her songs. But today I intend to write about something somewhat less trivial than CBS's love song lyrics: the problem of discovering whether someone is lying when taking a psychological analysis test.

This is something around which the Occupational Psychologist (OP) and I have been circling for some time. We have discussed the issue a few times and have come to the conclusion that when someone gives a false answer (in other words, lies), their response time is longer than when they tell the truth. Whilst this is intuitively true, someone (I forget who) wrote a research paper on this and divided the mental process into three stages. The brain has to analyse and understand the question, then formulate a response; if the response is true, then the brain simply has to retrieve the already existing answer, but if the respondent wishes to lie, then the brain has to work much harder. From a practical point of view, a few years ago I added to our flagship product the ability to measure the response time for every question.
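Measuring this is nothing exotic: note the time when the question appears and subtract it from the time when the answer arrives. A minimal sketch in Python (the product itself isn't written in Python, and the names here are purely illustrative):

```python
import time

def ask_question(question_text, get_answer):
    """Show one question, collect the answer via whatever UI callback is supplied,
    and return the answer together with the response time in seconds."""
    start = time.monotonic()
    answer = get_answer(question_text)          # blocks until the respondent answers
    elapsed = time.monotonic() - start
    return answer, elapsed

# The elapsed value can then be stored per question, per person.
```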

Of course, not all people were born equal. Some people finish the test (400 questions) in 30 minutes, some in 35, some in 40 and some in 50. Sometimes people take a break - either for personal reasons or because they are called for an interview - so one might get the impression that measuring response time is not necessarily so easy. There is also another line of thought which we have noted but have yet to implement seriously - the complexity of questions. Some questions are very simple ("I am very slow in making up my mind") whereas others are more complex ("When a person 'pads' his income tax report so as to get out of some of his taxes, it is just as bad as stealing money from the government"). It takes the brain longer to process the complex questions, so the response time will be longer.

A paper on the subject was presented at an Israeli psychologists' conference in January. Whilst neither of us attended the conference, we have been reading the proceedings on the net. The paper was interesting, but told us things that we already knew. After discussing it, we decided on a new approach to detecting when people are lying.

The first step was to build a control scale, composed of thirty questions for which there is 90%+ agreement in the answers; presumably these are questions whose answers are so obvious that no one has to lie (I should point out at this stage that each question can belong to more than one scale and that a 'correct' answer for one scale may not be the 'correct' answer in another scale). I can then calculate each person's average response time for this scale. I already know each person's average response time for all the other scales, but this absolute figure doesn't interest us; what we want to examine is the average response time relative to the control scale.
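In code, this step is nothing more than averaging the stored times over the thirty control questions for each person. A sketch, again with invented names and data layout (a dictionary mapping question id to seconds taken):

```python
def average_time(response_times, question_ids):
    """Mean response time (seconds) over the given questions for one person.
    response_times maps question id -> seconds taken; unanswered questions are skipped."""
    times = [response_times[q] for q in question_ids if q in response_times]
    return sum(times) / len(times) if times else None

# control_avg = average_time(person_times, CONTROL_SCALE_QUESTIONS)
```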

To give a numerical example, person A has an average response time of 3.5 seconds for the control scale and 5.0 seconds for a given scale; thus the ratio is 1.428. Person B has an average response time of 5.0 seconds for the control scale and 6.2 seconds for the given scale, so her ratio would be 1.24. In other words, person B answers questions relatively faster than person A (although more slowly in absolute terms). In all the time measurements, I ignore cases where the response time is over three minutes; if someone needs to go to the toilet in the middle of the exam, the current question's response time will be exceedingly long, but I won't take it into account.
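The ratio and the three-minute cut-off can then be expressed as follows, reusing the hypothetical average_time helper above:

```python
CUTOFF = 180.0  # seconds; anything longer is treated as a break rather than an answer

def scale_ratio(response_times, scale_questions, control_questions):
    """Ratio of a person's average time on one scale to their control-scale average,
    ignoring any response slower than the cut-off."""
    filtered = {q: t for q, t in response_times.items() if t <= CUTOFF}
    scale_avg = average_time(filtered, scale_questions)
    control_avg = average_time(filtered, control_questions)
    if scale_avg is None or not control_avg:
        return None
    return scale_avg / control_avg

# Person A: 5.0 / 3.5 ≈ 1.43; person B: 6.2 / 5.0 = 1.24
```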

Once we have the ratios for each scale/person, it is a simple matter to calculate the mean and standard deviation of each scale's ratios, and then calculate for each scale/person the 'standard corrected value'. Going back to the above example, person A has a ratio of 1.428 for a given scale; if the mean ratio for that scale is 1.300 and the standard deviation is 0.15, then the value for this person will be 58, which is statistically insignificant (to put it another way, the value is 0.85 standard deviations from the mean). Should someone have a statistically significant deviation (three standard deviations from the mean; with a mean of 1.300 and an SD of 0.15, this would mean a ratio of at least 1.75), then presumably they lied on some of the questions contained within the given scale.
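Assuming the 'standard corrected value' follows the usual T-score convention (mean 50, standard deviation 10) - which is consistent with the figure of 58 above - the final check looks roughly like this:

```python
def standard_corrected_value(ratio, scale_mean, scale_sd):
    """Convert a person's ratio on one scale to a 0-100 value whose mean is 50."""
    z = (ratio - scale_mean) / scale_sd
    return 50 + 10 * z

def possibly_lying(ratio, scale_mean, scale_sd, threshold_sd=3.0):
    """Flag the scale if the ratio lies more than threshold_sd deviations above the mean."""
    return (ratio - scale_mean) / scale_sd > threshold_sd

# Worked example from the text: (1.428 - 1.300) / 0.15 ≈ 0.85, giving a value of about 58.5;
# the three-SD threshold corresponds to a ratio of 1.300 + 3 * 0.15 = 1.75.
```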

I spent several hours implementing all of the above (fortunately this was Saturday, when I can work undisturbed for hours). I calculated the ratios for 4,000 people, 65 scales each; naturally, this took the computer quite some time. Having done so, I then calculated the mean and standard deviation for each scale, then ran a check against a random person. In doing so, I discovered that I had made a mistake when initially calculating the ratios! Possibly I should have debugged the process with a much smaller sample size, but I was fixated on creating a sample large enough for the mean and standard deviation to be correct, and ignored the possibility of using a small sample whose means would admittedly be 'wrong' but which would provide enough data to check my calculations.

Once I figured out what was wrong, I ran the lengthy calculation process again then recalculated the means and standard deviations. Once I had these figures, checking a given person's data (ie calculating the standard corrected value) was fairly quick. Adding the check against this value in the printed report took five minutes.

Obviously, I checked only a few people to see that the resulting values were calculated correctly (checking against my hand-calculated figures and logic). Statistically, someone has to be lying, but depending on the ratio of standard deviation to mean ("how narrow the distribution is"), this may be only one person in one scale. Let us not forget that I calculated 65 scales for 4,000 people - in other words, 260,000 values! - so finding that one value may take some time. I'm not going to bother.

It will be interesting to see how this works for all the future examinees.

Incidentally, I note that the information presented in 'Introduction to Business Research 3' made the statistical analysis relatively painless for me. The OP (and presumably other psychologists) uses the 'standard corrected value' whereas the IBR material uses the z-statistic. In a sense, they're the same; the scv converts the z-statistic to a value between 0 and 100, where 50 is the mean. This makes it easier to understand for people who aren't statistically inclined (and who is?).
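If I read the convention correctly (an assumption on my part, although it matches the 58 in the example above), the relationship is simply scv = 50 + 10z: a z of 0.85 becomes 50 + 10 x 0.85 = 58.5, and the three-standard-deviation criterion corresponds to an scv of 80 or above.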
