Wednesday, May 16, 2007

The 'theta' join and SQL arcana

A good day at work yesterday and the day before. I managed to solve a syntax problem preventing the execution of a new report, and in doing so learnt a lesson which I had been taught before and forgot. This gave me renewed confidence regarding the report generator in our new ERP program, and so yesterday I was able to create a few new reports.

These are statistical reports, showing how long the order approval process takes, how we stand regarding supply dates (as opposed to what we promised the customer), how long it takes to repair a chair and how long it takes to complete a service call. The quality management system (which I used to run) has goals for each of these services, and it is important for us to know where we stand in relation to these goals. In case we should surpass them, it's always possible to move the goal posts.

Coming off these reports, I had an 'a-ha' moment when I saw how I could write a report which shows which parts don't have a bill of materials. This can be generalised to "give me a list of something which doesn't have ", for example chairs which no one orders or unordered upholstery colours. I read somewhere once that this is called the 'theta' join: how to find values which are in one table but not another. In 'classical' SQL, this can be written as

select customer.name, customer.ip
from customer left join subscrip
on customer.id = subscrip.customer
where subscrip.ip is null
(this is an example from a program I wrote which manages the billing for internet access on my kibbutz and will find end customers who don't have billing data).

Unfortunately this syntax doesn't work as a 'naked SQL' query in the ERP program, let alone translated into the version needed for the report generator, so I had to shelve the idea originally. Yesterday I remembered seeing a query for a report which someone had written which used a subquery; I realised that I could use this as a replacement mechanism for the theta join. Translating back into SQL, I wrote
select part.partname, part.partdes
from part
where not exists
(select 'x' from part p1, partarc pa
where p1.part = pa.son
and pa.part = part.part)

What is this saying? If the subquery returns a value (that's the select 'x' part), then the where clause will be false and so there will be no value for the query. But if the subquery fails, then the where clause succeeds and so there will be values for part.partname and part.partdes.

A large step for (New) man, a small step for mankind. Or maybe it's the other way round: an hour's work from me and we can now produce data which other companies can only dream of.

No comments: