Friday, May 29, 2009

Who plays with whom?

In 1996, I started writing a cd database program; this would have been one of my first serious efforts at writing database programs in Delphi. The program would store physical information about musical cds (the cd's internal id number, how many tracks, how long each track lasts), and I would add data such as cd name, song title, who made the cd, who played, who composed, etc. Once this was done, I could get out of the database arcane facts such as how many tracks were composed by Richard Thompson, on how many tracks Simon Nicol played and so on. Maybe the highlight of the data mining was calculating on which tracks both RT and SN played.

Under the hood, the program was not written very well. My original source of information about Delphi database programming used the 'ttable' component all the time, so tables were lavishly scattered around the code. Somewhere around 2003, I started replacing tables with queries, at least in the places where I understood what was happening. Such queries made the code much simpler to understand, with a few terse but clear lines replacing such monstrosities as a record by record search of a table.

The 'who plays with whom' form was an exceptional case of using tables, most of them temporary, and presumably because of this form's complexity, I had stayed away from trying to improve it. Yesterday, I was having a routine look through the program, changing bits and pieces to current standards, when I came across this form. I knew what the code was supposed to do, but didn't like the way it was being done - and in fact, I cringe at what I had written twelve or thirteen years ago. Well, I didn't know better.

As it was dog walking time, I had the perfect opportunity to think about improving this function. I was fairly sure that I could replace almost all of the code (some of which dealt with finding which songs were common to two or more people, and some of which dealt with finding data about those songs from various tables - as if I had never heard of the 'join' statement in SQL) with one moderately advanced query. When I came back from our half hour walk, I set to work, and came up with this:

select tracks."track name", artists."artist name", cds."cd title", tracks."track id", count(*)
from musician, tracks, artists, cds
where musician."track id" = tracks."track id"
and tracks."artist id" = artists.id
and tracks."cd id" = cds."cd id"
and musician."person id" in (0)
group by tracks."track name", artists."artist name", cds."cd title", tracks."track id"
having count(*) = :an

The '(0)' gets replaced by a list of people (their id numbers), so basically the query returns a list of tracks (their name, the artist and the cd on which they appear) and how many times those tracks were returned from the musicians' table, given a list of musicians. The musicians table, being a link table between people and tracks, has three fields: a meaningless id, a person id and a track id, where each record means that person 'a' played on track 'b'. In retrospect, the record's id field is unnecessary but harmless; but I have refrained from changing the database structure as this would entail multiple changes in the program code.

The parameter 'an' is the number of people contained in the list passed to the query. The query returns a list of all the songs on which one or more people in the musicians' list played, but the count value can vary: it could be one (in which case only one of the people in the list played on the song), two or more. This value can only be checked after the query has completed, which is why the final line is a 'having' clause, something which I rarely use. The query must return only the songs on which all the people played, so the 'count' field must equal the number of people in the list.

No comments: