Wednesday, April 13, 2011

Advanced SQL for me - NULLIF

It all began so innocently. The Occupational Psychologist (OP) sent me an email yesterday asking that a certain list box (containing customers; one chooses a customer and then sees a report based on that customer) contain only active customers. When I pointed out that at present, there is no way of determining what an active customer is, she wrote back suggesting that I add an 'active' field to the customers table. So I added an 'active' field to the customers table, added this field to the 'Edit Customers' dialog, and modified a few queries so that they loaded only active customers into certain list boxes.

I woke up in the middle of the night with all kinds of ideas about active/inactive customers. When making the additions yesterday evening, I decided to define a new customer as being inactive. Whenever a new docket is opened, its customer will be marked as active. This saves the users from manually defining active customers. Could I do something similar in order to define an inactive customer, where such a customer would be one who has no connected dockets whatsoever, or all the connected dockets are closed (each docket has a status, so I would be looking for dockets of a certain status)?

No problem. The first thing to do would be to add a field 'Closed' to the 'statuses' table; only one status can have this value. Then I could write a query which runs at program startup, listing all customers and the number of open dockets connected to them; all the customers with a count of 0 should be marked as inactive and all the others active.

Then I started considering how I could implement the necessary changes in the 'statuses' table: only one status can be considered 'future' and only one status 'closed'. This would require checking in a dialog which until now has not carried out such checks.

Following on from this, I started envisioning a non-modal dialog which runs on program startup and carries out all kinds of checks (at the moment, I only have the above check to run, but who knows what will happen once I implement this and we get inspired). The dialog should give feedback on what it's doing but shouldn't close automatically so that the user can see this feedback. On the other hand, the dialog should close automatically after a few minutes in order to prevent screen clutter, which means a timer has to be involved....

The query to discover which customers are currently active originally was
select customers.id, count (*)
from customers, dockets, statuses
where customers.id = dockets.customer
and dockets.status = statuses.id
and statuses.closed = 0
The only problem with the above is that customers who don't have any connected dockets would not appear. Then I thought that I should use a left join -
select customers.id, count (*)
from statuses, customers left join dockets
on customers.id = dockets.customer
where dockets.status = statuses.id
and statuses.closed = 0
group by customers.id
but this is liable to return a count of 1 for every customer, because every customer will be in the left join. Not only that, but if a customer doesn't have a docket, then the join with the statuses table will be invalid.

A better query would be
select customers.id, count (*)
from customers left join dockets
on customers.id = dockets.customer
group by customers.id
but this counts all the dockets, not just those which have been closed. The following seems to be the correct query
select customers.id, count (dockets.id)
from customers left join dockets
on customers.id = dockets.customer
left join statuses
on dockets.status = statuses.id
where statuses.closed = 0
group by customers.id
Unfortunately, this query returns 180 rows whereas the customers table has 182 rows!  It transpires that the two missing rows belong to customers who have one docket each, and both those dockets are closed. As far as I am concerned, a customer who only has closed dockets is equivalent to a customer who has no dockets.

After no small amount of faffing about, I discovered the NULLIF function which can be used inside the count function. This function takes two parameters: if the parameters are equal, then it returns null, else it returns the value of the first parameter. Using this function means that the final query is
select customers.id, count (nullif (dockets.status,
(select id from statuses where closed = 1)))
from customers left join dockets
on customers.id = dockets.customer
group by customers.id
The subquery is necessary because theoretically the program doesn't know what the closed status is.

Now I've worked out the SQL, I can implement it within the program.

No comments: