Friday, July 04, 2008

Bacon numbers

I see that I've managed to pass an entire month without blogging. There have been experiences which have been too delicate to share here, and other experiences which aren't very interesting (mainly health issues).

There is some new (to me) music which has been floating around. An advance copy of Randy Newman's new cd, "Harps and Angels", his first of original songs since 1999's "Bad Love" has been played several times, but has not been received here as well as some of his earlier material. To my ears, it contains too much 12 bar blues based material, and that's not a musical form which I care for.

I've also been belatedly listening to British soul singer Corinne Bailey Rae's eponymous record. I'd seen some of her videos on VH-1 and hadn't been overly impressed. At the time, I didn't know that she was British. Her name came up whilst reading a detective novel, in which she was described as coming from Leeds; I checked this out and found it to be true. I then heard a song of hers on the radio which I very much liked, so I thought that it was time that I listened to her properly. Unfortunately, the song which I liked isn't on the record, which makes me wonder whether it was really her. Every time I've tried searching for certain key lyrics in the song (I'm sure that she sang about "chasing paper"), I've found myriad references to The Beatles' "Two of us", and none to CBR. The disc itself has worn out its welcome; it's not a keeper.

Over the past few months, I've been recording films from television onto DVD. After a while, I found myself recording the same film more than once, and so thought that it was time that I organise these discs. For someone like myself, writing such a database program would be simple and fun. At first, it started off with a table of actors (name only), a table of films (name only) and a table of cast lists, in which the actors were matched to the films, but after a while, my ambitions grew.

First off was a list of directors, although it turns out that apart from Woody Allen and Robert Altman, there are very few directors who have more than a sample showing in my collection. Then I added a link to the imdb site; clicking on a button would bring up a web browser pointing to the appropriate page. Then of course there were all kinds of reports about actors: how many times each actor has appeared in the films which I have, with which other actors they have worked and with which directors they have appeared. Simple, harmless fun.

Today I got up and thought to myself, "Bacon numbers". This wouldn't necessarily have to start with Kevin Bacon (who only appears in a few of the films which I have); rather, I could check connections between any two actors. Obviously the act of someone who has too much time on his hands. When I told this to my wife, she started laughing, as I expected her to do.

But from a programming point of view, this is a very interesting exercise. The algorithm needed, basically, is the same algorithm which is fundamental to the Prolog language, a language which has always intrigued and inspired me. I've written a few such programs (one was for accounting purposes: a customer has twenty bills outstanding and makes a payment without saying what it was for; the program tries various combinations of bills until it gets to the same total), but this one was a bit different.

Here's the algorithm:
1. Add to the list the id of the actor whose Bacon number we are trying to calculate
2. Remove the first name from the list and store it in a variable
3. If the target actor (eg Kevin Bacon as per the original) and this actor have appeared together in a play, then print the distance, and set a finishing flag
4. If they haven't appeared together, then get all the actors who have appeared with this actor and add them to the list, increasing the distance
5. If the query in stage 4 returns no answers, then finish, otherwise
6. Go to stage 2.

The actual implementation was much easier than I expected. I had to add a flag to each actor to show whether s/he had been inspected during the current search, but apart from this, I was able to implement the algorithm using two database queries and a listbox.

I've checked various combinations of people and have got back the expected results. For example, I don't have a film in which Meg Ryan and Gary Sinise have appeared together (maybe there isn't one at all). Their mutual Bacon number is 2, via Tom Hanks (MR and TH have appeared in two films which I have together, and TH and GS have appeared in one). For Woody Allen and Sandra Bullock, the number is 3 at the moment (WA -> Diane Keaton -> Keanu Reeves -> Sandra Bullock). Eventually the program will surprise me and return a smaller number than the one I am expecting.

The only real problem with this program is that it uses the Prolog convention of a closed universe. I was checking Warren Beatty with Diane Keaton; I know that they appeared together in "Reds", so their number is 1. But I don't have this film in my library, and so the program couldn't find any link. The only WB film which I have is "McCabe and Mrs Miller", and the actors there don't seem to connect.... Wait a minute! Shelley Duvall was in M&MM with WB, and she was in "Annie Hall" with Diane Keaton. Maybe I didn't add her name to those films....

1 comment:

Anonymous said...

Great work.