Friday, July 19, 2024

Sorting on two columns

I faced an interesting situation today in one of the OP's programs. There is a suite of statements that refer to behaviours that are non-standard, such as alcoholism, drug taking and morality. One would expect that only a few people (or a low percentage of respondents) would agree with the statements and so these questions can't be analysed by standard statistical methods. The OP wants to compare the percentage of respondents that agreed with the statement in the nine months before 7 October 2023 with those in the nine months after this date.

We have a similar report that works on a per-question level, so the first thing that I did was copy the unit then update the copied version so that it would work with the non-standard statements (accessing them is slightly different from accessing the regular questions). All went well until I got to the stage of outputting the data. Over ten years ago (!), I wrote about sorting data in an Excel spreadsheet via automation. That code works for sorting on one column. Here we wanted to sort first by scale (alcoholism, etc)  then  by difference of percentage per question; in other words, sorting on two columns.
 
 sheet.Range['A1', 'G481'].Sort (sheet.Range['D2'], xlAscending,
                                 EmptyParam, EmptyParam,  // key2
                                 EmptyParam, EmptyParam,  // key3
                                 xlAscending, xlNo, EmptyParam, True,
                                 xlTopToBottom, xlSyllabary);

This code seems amenable to sorting on even three columns - all one has to do (I assumed) is to place a range statement (e.g. sheet.range ['O2']) in place of the first 'emptyparam' place holder, and to replace the second 'emptyparam' with 'xlAscending'. This seems simple in theory, but in practice it didn't work. I kept getting error messages, and as these are in mangled Hebrew, it's very difficult to know what the actual error is and even more difficult to know how to fix it.

As it was Friday, I left things as they were (sorted by scale and question - these are natural sort orders that come from the original SQL statement) and had a nap. When I awoke, I reconsidered; if I can't sort in Excel then maybe I can sort the CSV lines before they get inserted into Excel. This was easier said than done as the field by which the second sort has to be done was at the end of a csv line. Eventually after two hours' work, returning to the sort [pun not intended] of Pascal code that I would write before Delphi (including a bubble sort!), I had the final csv and hence the Excel spreadsheet ordered initially by scale then by difference. My algorithm has to create a temporary string list, insert the strings created for a given scale into this list, sort this string list then add the sorted strings from the temporary list into the main string list (the temporary string list itself is not sorted; I created an array of indices and values and did a bubble sort on this array).

Eventually I finished and it was time to take the dog for a walk. It occurred to me during the walk that it might have been better to use the original code to insert the strings into Excel when they are sorted by scale and question number, then sort different ranges of the spreadsheet in Excel. In other words, if lines 3-10 in the spreadsheet belong to scale ALC, then sort only these lines according to column O. Then sort lines 11-18, etc. I suppose that this is possible although it would be somewhat awkward (although no more awkward than the current code). Automating Excel is considered to be a slow operation, so doing this eight times might be quite slow. The current 'sort the csv in advance' approach works faster than I expected so this 'sort Excel by sections' approach might well be slower.

Then I remembered the old adage, 'if the only tool that you have is a database then every problem looks like a new table'; how would I solve this problem if I were working in Priority? I would output the intermediate data into a table then use SQL to return the data sorted by however I would want it. Why couldn't I do the same thing here? As the code was based on a unit that used the csv approach, I had simply continued in the same vein without considering other options. But now that I think of it, using a temporary table (i.e. a clientdataset that is not connected to the database) might well be more effective. There would be no need to convert percentages into strings and generally the code would be more streamlined. The 'sort the csv in advance' approach that I used today has one expensive section of code that has to be repeated once per scale (this didn't happen in the original version) and of course an SQL approach would require this expensive section to be executed only once. I might even convert the original unit to this SQL approach.

I'll leave that for tomorrow.


This day in history:

Blog #Date TitleTags
9419/07/2007What I did at work todayProgramming, ERP, Thermal printers
74019/07/2014Statistics with SQL (Firebird)Programming, Delphi, SQL, Firebird, Statistics
96419/07/2016When the music's overDCI Banks, Peter Robinson
124319/07/2019Making the Greece holiday videoHoliday, Home movies, Andros, Athens, Greece
164319/07/2023If this is jazz then I'm all for itAmbient music, Matthew Halsall

No comments: