Friday, August 17, 2018

Using indexes

One of my "rules of life" is never to program after 9pm, as I won't sleep well afterwards. Last night I broke that rule and indeed did not sleep well.

Earlier in the evening, I had been working on a problem in the "ERP" program which I wrote for the Occupational Psychologist. One module is concerned with what are called in English 'reconciliations', although a better name might be 'match ups': these record which bill has been paid for by which receipt. There was a problem with one reconciliation for a customer, but I had difficulty in finding this problematic reconciliation.

After walking the dog (at nearly 9pm), it occurred to me that I could write a report which listed all the dubious reconciliations: these would have one or more bills and one or more receipts sharing the same reconciliation number. The query needed was not difficult to write
select bills.recon, count (*) from bills inner join receipts on receipts.recon = bills.recon where bills.customer = :p1 and bills.curdate >= :p2 and bills.recon > 0 group by bills.recon having count (*) > 2;
The only unusual part of that query is the final line: the 'having' clause allows one to compare a calculated value with an ordinal value. The ':p1' and ':p2' variables are parameters to the query.

I ran this query with the appropriate values for the variables and waited ... and waited ... and I thought at first that I had written a query which causes the database manager to crash. But no, after a few minutes - which is comparatively a lifetime - I got results: there was a dubious reconciliation. As I already had determined the same reconciliation manually, I was pleased that the query agreed with me, but I was more concerned with the performance of the query.

Adding a few indexes to the database should help - one on bills.customer (strangely, this was missing as it would help many queries), one on bills.recon and one on receipts.recon (these last two indexes wouldn't have much general use) - and indeed they did. After adding them, the time required for the query was reduced from about 2:15 minutes to 0.15 seconds, which makes the second query 900 times faster than the first! This speed increase was not a function of having the data cached, as I ran the first query a few times to see whether the long time was repeatable (and it was).

Theoretically, there should be an index for every foreign key in a table, but sometimes the value of doing so (faster queries) is outweighed by the cost of the index (slower inserting and updating).  The index on 'bills.customer' should have been there from the start as there are many queries which will benefit from this. The indexes on the 'recon' fields are less important but make a definite improvement here.

And to sign off: yes, I know that index is a Latin word, and as such its plural should be indices. But the original SQL programmers presumably did not have a classical education and so thought that the plural of index is indexes. It makes me shudder every time I write that false plural.

No comments: