Sunday, October 13, 2024

Correlations and threads

I have a non-specific memory of sitting at my desk when I was a student in London, calculating correlations that I needed for graphing data. A correlation measures how closely two variables move together; if every rise in one is matched by a proportional rise in the other, then there is a perfect (1.0) correlation between the two. The two variables can have no correlation whatsoever, or they can have a negative correlation (one goes up while the other goes down). I don't recall how I learned about correlations - presumably we had a course in statistics during the degree studies, but I have no memory of this whatsoever. Correlation is another arrow in the statistics quiver*.

On Friday, in the context of our flagship database that stores data originating in psychological questionnaires, the OP wanted to know the correlation coefficient between two scales. This is done by first finding all the clients from a given date who have values for both scales, then calculating the mean and (sample) standard deviation for each scale. Then, for each client, one multiplies (the difference between the observed value for one scale and its mean) by (the difference between the observed value for the other scale and its mean) and sums these products; this sum is divided first by the product of the two standard deviations and then by the number of data points (i.e. clients) less one. A good explanation of this can be found here. I wasn't fazed by her request as I knew about correlation.
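The recipe above can be written as a single formula. This is only a restatement of the description, assuming that the standard deviations are the sample ones (dividing by n - 1); the symbols are chosen here for illustration: x_i and y_i are the two scale values for client i, n is the number of clients, and s_x and s_y are the two standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})\,(y_i - \bar{y})}{s_x \, s_y},
\qquad
s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The result r always lies between -1 and 1.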

Enough mathematics. Two years ago, I added a correlation function in the program for clients who had answered the questionnaires twice, in order to see if there was any change in their attitude to life (no change would mean perfect (1.0) correlations between the two values). I took a brief look at what I had written there and shuddered, as the code seemed over-complicated. These days I have temporary files[1], so the task was somewhat simplified. I wanted to write a query that would return the sum of the multiplied differences, but I messed this up and instead used a cursor that would iterate over the population (i.e. clients) and calculate the total line by line. This worked.
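A minimal sketch of that line-by-line total, assuming the two scales have already been loaded into in-memory arrays rather than read through a database cursor. The function name Pearson and the sample values are made up for illustration and are not the code in the program; Mean, StdDev and IsZero come from the standard Math unit (System.Math in newer Delphi versions), where StdDev is the sample standard deviation.

```delphi
program CorrelSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

// Hypothetical helper, not the program's actual code: correlation of two
// equal-length arrays, accumulated one data point at a time.
function Pearson(const X, Y: array of Double): Double;
var
  I, N: Integer;
  MeanX, MeanY, SdX, SdY, SumProd: Double;
begin
  N := Length(X);
  if (N < 2) or (N <> Length(Y)) then
    raise Exception.Create('Need two equal-length arrays of at least two values');

  MeanX := Mean(X);
  MeanY := Mean(Y);
  SdX := StdDev(X);  // sample standard deviation (divides by N - 1)
  SdY := StdDev(Y);
  if IsZero(SdX) or IsZero(SdY) then
    raise Exception.Create('Correlation is undefined when a scale never varies');

  // Sum of (x - mean of x) * (y - mean of y) over all data points
  SumProd := 0;
  for I := 0 to N - 1 do
    SumProd := SumProd + (X[I] - MeanX) * (Y[I] - MeanY);

  // Divide by the product of the standard deviations, then by N - 1
  Result := SumProd / (SdX * SdY) / (N - 1);
end;

begin
  // Made-up values: the second scale rises in step with the first
  Writeln(Pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]):0:3);  // prints 1.000
  // Made-up values: the second scale falls as the first rises
  Writeln(Pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]):0:3);  // prints -1.000
end.
```

The loop is the part that a single SQL query with correctly placed brackets could replace; the two divisions at the end stay the same either way.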

In the afternoon, I took the dog for her walk. As usual, my mind considered[2] the code that I had written in the morning and I realised where my mistake had been in the query (brackets in the wrong place). I then realised that I could use the idea of showing the results in a separate window, an idea that originated in the blog manager and was then transferred to the documentation program. This way, one can run several correlations and see the differences between them (the original version showed the values calculated by the current parameters; using different parameters would cause the earlier values to be 'lost'). Good idea.

As the Yom Kippur fast was approaching, I didn't have any time to implement these changes, but I wrote the ideas down for later. During the fast, with nothing very much to do (I don't listen to music, work on the computer or watch the television; I either read or rest), I considered the ideas for future implementation. I noted to myself that the correlation function requires quite heavy calculating that grows according to the number of data points (a perfect correlation!) and during this time, the program is not responsive. Could I create a separate thread and have the calculation performed there? This would free up the user interface, making it more responsive. Assuming that this is possible, how would the thread tell the main program that it had finished and that the results could be displayed? Could the thread send a message[3] to the main program that would cause a window to open with the newly calculated data?

I have written thread code before with Word[4] and with Excel[5], but in these cases, the database 'work' was done in the main program and the data extracted was passed to the thread. There is no display problem as either Word or Excel opens with the data. I don't know enough about threads to know in advance whether my idea would work.

When the fast had finished and I had drunk and eaten sufficiently, I got to work. First I corrected the correlation code, then I added the use of another temporary file and the separate windows showing results. This worked perfectly. Then I created a new unit that had the interface code of the previous unit, but passed the parameters from this unit on to a thread. All the computation code went into the thread; this also means that queries have to be created manually and their SQL code added in the text. I was initially dubious as to whether the queries would need an SQL connection defined within the thread, but using the connection defined in the data module worked fine.

Once I got all the syntax problems sorted, I added a final line to this unit:

sendmessage (mainhandle, WM_ShowCorrel, rinstance, 0);

This is meant to cause the main program to open up a specific window and pass to it the data contained in the temporary table with instance 'rinstance'. To my surprise, this almost worked correctly the first time; in the main program I was checking the value of 'lparam' instead of 'wparam' (the call above passes 'rinstance' as the third argument, which is wparam). Once this was fixed, all the pieces worked together perfectly.
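The thread-and-message pattern can be sketched as a small stand-alone program. This is only an illustration under stated assumptions, not the code in the program: the names TCorrelThread, TMainForm and FormShow, the value given to WM_ShowCorrel and the instance number 42 are all invented, Sleep stands in for the real queries, and the form is built in code so that no .dfm file is needed. The unit names are the older unscoped ones; newer Delphi versions may need Winapi.Windows, Winapi.Messages, System.SysUtils, System.Classes and Vcl.Forms.

```delphi
program ThreadMsgSketch;

uses
  Windows, Messages, SysUtils, Classes, Forms;

const
  WM_ShowCorrel = WM_USER + 1;  // assumed value; the post does not show it

type
  // Hypothetical worker thread: does the slow work off the main thread
  TCorrelThread = class(TThread)
  private
    FMainHandle: HWND;
    FInstance: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AMainHandle: HWND; AInstance: Integer);
  end;

  // Stand-in for the main form
  TMainForm = class(TForm)
  public
    procedure FormShow(Sender: TObject);
    procedure WMShowCorrel(var Msg: TMessage); message WM_ShowCorrel;
  end;

constructor TCorrelThread.Create(AMainHandle: HWND; AInstance: Integer);
begin
  // In Delphi 6 and later a non-suspended thread only starts running after
  // the constructor has finished, so the fields below are set in time
  inherited Create(False);
  FreeOnTerminate := True;
  FMainHandle := AMainHandle;
  FInstance := AInstance;
end;

procedure TCorrelThread.Execute;
begin
  Sleep(2000);  // stand-in for the real queries and calculations
  // Same shape as the call in the post: the instance travels in wParam
  SendMessage(FMainHandle, WM_ShowCorrel, FInstance, 0);
end;

procedure TMainForm.FormShow(Sender: TObject);
begin
  Caption := 'Calculating...';
  TCorrelThread.Create(Handle, 42);  // 42 is a made-up instance number
end;

procedure TMainForm.WMShowCorrel(var Msg: TMessage);
begin
  // The real program would open a results window here
  Caption := Format('Results ready for instance %d', [Integer(Msg.WParam)]);
end;

var
  MainForm: TMainForm;
begin
  MainForm := TMainForm.CreateNew(nil);
  try
    MainForm.OnShow := MainForm.FormShow;
    MainForm.ShowModal;  // runs a message loop until the window is closed
  finally
    MainForm.Free;
  end;
end.
```

SendMessage does not return until the main thread's handler has finished, so the worker thread waits while the results are being shown; PostMessage takes the same arguments but only queues the message and returns at once, which may suit a thread that has nothing more to do.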

The only problem that I've found seems to be that if the window with the user interface (which calls the thread) is closed before the thread finishes its work, the thread appears to die. I don't know enough about threads to know whether this is standard behaviour or some kind of bug; the thread doesn't have to 'know' who created it, only how it was created (i.e. the parameters). I'll have to find some sources to read about this.

Now I can look for other places that have heavy calculations (mainly those calculating means and standard deviations for a few thousand values) and see how I can use the thread and message method to ease the user interface.

* When I was thinking about writing this blog, the word 'quiver' had disappeared from my brain. It's as if the left hemisphere of my brain created a thread to find the word, as about fifteen minutes later the word 'quiver' popped into my mind when I was thinking about something else. Very appropriate.

Internal links
[1] 1548
[2] 1829
[3] 1310
[4] 1436
[5] 1443



This day in history:

Blog #  Date        Title                          Tags
139     13/10/2008  Fotheringay 2                  Sandy Denny, Fotheringay
414     13/10/2011  Kindle                         Kindle
640     13/10/2013  Tasks                          Programming, Delphi
1265    13/10/2019  Acting like an MBA (a 'suit')  Personal, MBA
1677    13/10/2023  Time out                       Personal, Song writing
