Until now, in discussing the inbasket exam, I've only written about how the exam has been implemented and the various programming techniques necessary. Now it's time to stand back a little and look at the larger issues of the exam. But first a little debugging....
This time last week, the exam underwent its baptism of fire and was used with 'real live' examinees. The person running the lab contacted me as he was unable to get the database version of the exam to run, so I made a remote connection and transferred the new dll version of the exam (which had not been completely debugged).
In the evening, I connected remotely to the testing lab again and downloaded three result files for examination. It immediately became clear that there were several errors in the file structure, some of which were easy to correct and some of which were somewhat harder. After I had 'massaged' the text files (and simultaneously corrected the exam program so that it will produce the correct format the next time it is run), they were in a form which could be read into the database and could produce reasonable results.
When I met with the OP the next day, she informed me that four people had undertaken the exam. Where were the fourth person's results? I hadn't seen a fourth text file anywhere, but we found written confirmation that there was a fourth person who had undergone the test (it had seemed possible to me that a fourth person was supposed to have undergone the test but did not). When checking which computer this fourth person had used, I saw that the dll version of the exam had not been installed there. If the person had taken the test using the database version, the results would have gone directly into the database - and I had overwritten the database in the morning as its structure had changed slightly. What could we do?
The answer, of course, is to use the backup (which is made every evening to one of two external hard drives). We found the backed up database file and I was able to ascertain that the fourth person's results were indeed stored within the database. I took this file (taking care to rename it) and then wrote a one-off program to extract the results in the required text format.
When this file was read into the (new) database, a few new bugs became apparent, mainly connected with the handling of instant messages. I wondered why the first three files had not surfaced these problems; checking the answers carefully, it became apparent that there were no results for the IMs in those three result files. I checked the program code and saw that the results should have been written to the file, and even ran through the quick and dirty demo exam in order to confirm that the IM results were written.
It then became clear that the instant messages had not been displayed in the 'real' exam. Via the debugger, I ran the exam with the 'real' inbasket dll and saw that there was a bug in the code which was responsible for finding an instant message to be displayed (the equivalent of an sql query). The bug itself was in the resource file, not in the program code, meaning that the real bug was in the code which outputted the resource file. Like all bugs, hard to find but easy to fix.
So now we have a (hopefully) completely debugged exam. What can we do with it? Unfortunately, most of the analysis of the results would seem to be 'analogue' - text only, analysis by the psychologist, based on what was written, how it was written, to whom it was written, etc. There doesn't seem to be a standard 'key' by which the exam results could be marked. This is in contrast with the accounting exam which I converted. The accounting exam requires the examinee to place several tasks in the correct order (there are various constraints which force the order) but requires only minimal replies, whereas the value of the 'real' exam lies in the replies themselves and not in the order in which the messages were dealt with.
What 'digital' results can I produce from the data? I had considered something like 'average time that message is displayed on screen', as we had noticed that one examinee seemed to take a long time to answer messages whereas another answered very quickly. This idea was discarded because theoretically an examinee could open all the messages in the inbox (thus having several simultaneously open windows) and then decide which to answer first. So the only metric that we currently have is how many times each message has been answered. Most messages should only be answered once, but some might (and some have to) be answered more than once. This metric shows the results in a concise format.
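As a rough sketch of this metric (written in Python rather than in the exam's own code, and assuming a hypothetical event log of (message, action) pairs, which is not the real result file format), the count per message could be computed like this:

```python
from collections import Counter

# Hypothetical event log, in the order the events occurred.
# The message ids and action names are invented for illustration.
events = [
    ("msg1", "open"), ("msg1", "answer"),
    ("msg2", "open"), ("msg2", "close"),
    ("msg3", "open"), ("msg3", "answer"), ("msg3", "answer"),
]

def answer_counts(events, all_messages):
    """Return how many times each message was answered (0 if never)."""
    counts = Counter(mid for mid, action in events if action == "answer")
    # Include every message, so that unanswered ones show up with a zero.
    return {mid: counts.get(mid, 0) for mid in all_messages}

counts = answer_counts(events, ["msg1", "msg2", "msg3"])
print(counts)  # {'msg1': 1, 'msg2': 0, 'msg3': 2}
```

Listing every message, including those with a count of zero, keeps the result concise while still showing which messages were ignored altogether.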
Following on from this, I am considering a metric which will show how many times a message was closed without being answered. I am not convinced of the value of such a metric; just because it will be easy to write does not mean that it has any value. I noticed that one of the examinees was constantly opening and closing messages without answering them, and this metric would give such behaviour a numerical value. I am not qualified to determine whether such a numerical value has any psychological value.
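To show how easy such a metric would be to write, here is a minimal sketch in Python. It assumes a hypothetical event log of (message, action) pairs with the actions "open", "answer" and "close"; neither the log format nor the names come from the real exam.

```python
from collections import Counter

def closed_unanswered(events):
    """Count, per message, how often it was opened and then closed
    with no answer in between."""
    answered_since_open = {}  # message id -> answered since last open?
    result = Counter()
    for mid, action in events:
        if action == "open":
            answered_since_open[mid] = False
        elif action == "answer" and mid in answered_since_open:
            answered_since_open[mid] = True
        elif action == "close" and mid in answered_since_open:
            # Only a close that follows an open is considered.
            if not answered_since_open.pop(mid):
                result[mid] += 1
    return dict(result)

# Hypothetical log: m1 is opened and closed twice before being answered.
events = [
    ("m1", "open"), ("m1", "close"),
    ("m1", "open"), ("m1", "close"),
    ("m2", "open"), ("m2", "answer"), ("m2", "close"),
    ("m1", "open"), ("m1", "answer"), ("m1", "close"),
]
print(closed_unanswered(events))  # {'m1': 2}
```

Because the state is kept per message, the sketch still works if an examinee has several messages open at the same time.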
As I wrote earlier, the successful implementation of the inbasket exam (as a framework) allows us to incorporate exams which are suited to different fields, thus allowing the OP and her staff to enter the recruitment field, either as primary recruiters (which might be considered as diluting the consultancy's core business) or as secondary recruiters (supplying evaluations to an outside recruiting company). I am also considering showing the exam to people at work; we no longer have an HR function within the company and people seem to be hired willy-nilly. Use of the test would allow us to winnow out unsuitable people and concentrate on the more promising applicants (not that there are many jobs being offered, if any at all).