Tuesday, July 28, 2009

Speed, speed, speed

As is apparent from my previous blog entries about Word Automation, one of the applications which I wrote, develop and support uses WA to write a complicated report. Unfortunately, as the complexity of the data rises, the report takes longer to produce (this isn't because of the WA), so much so that running this report on the client's computer was taking several minutes. I have tried to speed up the report as much as possible, by using techniques such as precalculating values and moving expensive operations (such as SQL selects) out of loops, but there is a limit to how much one can optimise.
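To illustrate what moving a SQL select out of a loop looks like, here is a minimal sketch in Python with an in-memory SQLite database. It is not the report's actual code: the `scores` table, its columns and both function names are invented for the example. The slow version runs one SELECT per examinee, whereas the fast version runs a single grouped SELECT before the loop and then only does dictionary lookups.

```python
import sqlite3


def build_db():
    # Hypothetical schema, invented purely for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (examinee_id INTEGER, scale TEXT, value INTEGER)"
    )
    rows = [(1, "A", 10), (1, "B", 12), (2, "A", 7), (2, "B", 9), (3, "A", 15)]
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    return conn


def totals_slow(conn, examinee_ids):
    # One SELECT per examinee: the expensive operation sits inside the loop.
    totals = {}
    for eid in examinee_ids:
        row = conn.execute(
            "SELECT SUM(value) FROM scores WHERE examinee_id = ?", (eid,)
        ).fetchone()
        totals[eid] = row[0] or 0  # SUM over no rows yields NULL (None)
    return totals


def totals_fast(conn, examinee_ids):
    # A single grouped SELECT, executed once before any per-examinee work.
    wanted = set(examinee_ids)
    totals = {eid: 0 for eid in examinee_ids}
    query = "SELECT examinee_id, SUM(value) FROM scores GROUP BY examinee_id"
    for eid, total in conn.execute(query):
        if eid in wanted:
            totals[eid] = total
    return totals
```

Both functions return the same totals. The saving comes only from the number of round trips to the database, which is why this kind of tuning runs out of steam once the queries are already outside the loops.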

Jon Bentley devotes a chapter in his seminal book "Programming Pearls" to optimising, and points out that the kind of optimisation which I was performing can only save a certain amount of time. In order to get big savings in time, one has to change the algorithm. Whilst I can't see how the algorithm can be changed (it's a question of get these data and print them, then get other data and print them), I started thinking of other ways of printing. This led me to consider the option of using a separate thread to print the document. I have never used threads before, but I found an article which demystified them.

I spent most of Thursday evening playing with the thread code from that article. Fortunately, the article explains how to make thread-safe calls to the Borland Database Engine (BDE), so I didn't have any problems there, although the lack of a main window meant that I had to write out the SQL code manually. Calling Word from the thread worked, and it seemed that I had found a solution to the speed problem. In another life, I cynically call this 'good management': let someone else worry about the problem. Here, instead of my program worrying about printing the document, a separate thread could do the work, freeing my program for the user so that more examinees could be entered.
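The idea of handing the printing to a separate thread can be sketched as follows. This is a Python illustration, not the code from the article or from my program: `print_report` is a hypothetical stand-in for the slow report, and the queue-plus-single-worker arrangement is just one common way of organising such a hand-off. The main program only puts names on the queue and carries on, while the worker does all of the printing by itself, one report at a time.

```python
import queue
import threading


def print_report(examinee, printed):
    # Hypothetical stand-in for the slow report; it only records the job.
    printed.append(f"report for {examinee}")


def start_print_worker(jobs, printed):
    def worker():
        while True:
            examinee = jobs.get()
            if examinee is None:  # sentinel value: no more work is coming
                jobs.task_done()
                break
            try:
                print_report(examinee, printed)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def run_demo():
    jobs = queue.Queue()
    printed = []
    thread = start_print_worker(jobs, printed)
    # The main program merely queues the work and is immediately free again.
    for name in ["Alice", "Bob", "Carol"]:
        jobs.put(name)
    jobs.put(None)  # ask the worker to stop once the queue is drained
    jobs.join()
    thread.join()
    return printed
```

Because there is only one worker and the queue is first-in, first-out, the reports come out in the order in which they were requested.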

I tested the new program extensively, once I had removed all the obvious bugs, and was dismayed to find that the program would fail frequently but unpredictably. I came to the sad conclusion that Word was not thread-safe, and that all my attempts were for naught.

Today I was reading another of Joel Spolsky's blog entries, this time on Office file formats. A comment at the end hinted at a new solution:
"Use a simpler format for writing files. If you merely have to produce Office documents programmatically, there’s almost always a better format than the Office binary formats that you can use which Word and Excel will open happily, without missing a beat."

In my application, I don't write a Word file, but instead use Word as an automation server. What if I used another program as an automation server? What if I wrote the report to a file as correctly formatted HTML, loaded that file in a browser and then printed it? Fortunately, I found straight away a snippet of code which shows how to operate IE as an automation server, and only a few minutes later I had written a test program which worked.

I only have two problems left:
1. Closing IE straight after printing seems to cause the file not to print; somehow I must add a delay before closing IE.
2. I have to convert my program's output to HTML. This is going to be extremely tedious, but not too difficult. I know how to write HTML which produces a table, but I don't yet know how to shade one or more columns.
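As for the second problem, one possible approach to shading columns is to give every cell in the chosen columns a background colour via an inline style. The sketch below is in Python rather than in my program's language, and the function name `render_table`, its parameters and the grey `#dddddd` are all assumptions made for the illustration.

```python
from html import escape


def render_table(headers, rows, shaded_columns=(), shade="#dddddd"):
    # Zero-based indexes of the columns which should get a background colour.
    shaded = set(shaded_columns)

    def cell(tag, index, text):
        style = f' style="background-color: {shade}"' if index in shaded else ""
        return f"<{tag}{style}>{escape(str(text))}</{tag}>"

    lines = ['<table border="1" cellspacing="0" cellpadding="4">']
    header_cells = "".join(cell("th", i, h) for i, h in enumerate(headers))
    lines.append("<tr>" + header_cells + "</tr>")
    for row in rows:
        data_cells = "".join(cell("td", i, v) for i, v in enumerate(row))
        lines.append("<tr>" + data_cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Calling `render_table(["Scale", "Score"], [["A & B", 10]], shaded_columns=[1])` shades the second column only. One thing worth checking before relying on this: browsers commonly leave out background colours when printing unless a setting is switched on, so the shading may appear on screen but not on paper.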

I'll give this solution an hour or two's work and see how I get on.
