Sunday, April 07, 2013

A new technique in Word Automation

In the first few months of the year, I spent time creating reports for the Occupational Psychologist in HTML: the reports were intended to be sent by email and this seemed to be the best way of doing so (I probably wrote ad nauseam about it). As a result, I managed to restore my HTML skills to the level at which they were about ten years ago. There hasn't really been any need to write HTML by hand since then, in the same way that there hasn't been any need to write x86 assembler by hand. Whilst the latter seems to have gone the way of the dinosaurs, the former can still be useful (indeed, every now and then I manually edit HTML code on this blog; for example, there seems to be no other way of adding a table).

Quite separately, the OP has updated a few of her computers to use Office 2013. I don't know what this version is supposed to bring to the party, but her son was able to obtain licences for a paltry sum ($9?) and so she decided to update. As a result, certain parts of my Word Automation code don't work as well as they used to; the problem is mainly with tables. In the last few days, there have been all kinds of glitches with tables, making them almost unreadable, and so we needed a new solution.

And so I leveraged my HTML skills. Over the weekend, I spent several hours converting automation code into HTML code; the process is fairly easy but every now and then there are problems. Word allows one to view a table as a two-dimensional array of data, which is very familiar to programmers (although I find that I hardly ever use arrays any more), and allows one to access cells in any order. HTML, by contrast, works line by line: a table has to be written one row at a time, and each row one cell at a time.

As it happens, the first Word code which I converted to HTML worked in a vertical manner: the Word code would fill in the first column for twenty rows, then the second column for twenty rows, etc. In order to keep my sanity, I declared a two-dimensional array within my program, wrote the necessary data into it column by column, then output it to HTML row by row.
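
The column-then-row trick can be sketched as follows. This is only an illustration, written in Python rather than Delphi so that it can be run on its own; the data, the grid size and the name grid_to_html are made up for the example and are not taken from the real report code.

```python
from html import escape

ROWS, COLS = 3, 2

# Hypothetical data, supplied one column at a time, as the Word code did.
columns = [
    ["Scale A", "Scale B", "Scale C"],  # column 0
    ["12", "7", "15"],                  # column 1
]

# Fill a ROWS x COLS grid column by column (random access, like Word cells).
grid = [["" for _ in range(COLS)] for _ in range(ROWS)]
for c, column in enumerate(columns):
    for r, value in enumerate(column):
        grid[r][c] = value


def grid_to_html(grid):
    """Emit the grid one row at a time, which is the order HTML requires."""
    lines = ["<table>"]
    for row in grid:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table_html = grid_to_html(grid)
print(table_html)
```

The array acts as a buffer between the order in which the data is produced and the order in which HTML needs it, so the code which calculates the values does not have to change.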

Apart from this extreme example, I also had to deal a few times with sparse arrays: in a row of twelve cells, maybe three have values and the rest are blank. This was easy to deal with using random access code but slightly more difficult with HTML.
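
One way of handling such a sparse row is sketched below, again in Python rather than Delphi. It assumes that the filled cells are held as a mapping from column index to text; the function name and the sample values are hypothetical, and using a non-breaking space for blank cells is my own choice rather than something which HTML requires.

```python
from html import escape


def sparse_row_to_html(values, width):
    """Render one table row from a {column_index: text} mapping.

    Columns which are missing from the mapping become blank cells, so the
    row always contains exactly `width` cells.
    """
    for col in values:
        if not 0 <= col < width:
            raise ValueError(f"column {col} is outside the row")
    cells = []
    for col in range(width):
        text = values.get(col)
        if text is None:
            cells.append("<td>&nbsp;</td>")  # blank cell
        else:
            cells.append(f"<td>{escape(text)}</td>")
    return "<tr>" + "".join(cells) + "</tr>"


# Three filled cells in a row of twelve.
row_html = sparse_row_to_html({0: "Verbal", 4: "63", 11: "7"}, 12)
print(row_html)
```

The point is that every cell has to be written, blank or not; otherwise the cells which follow a gap slide to the left.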

I should note that the reports are generated more quickly: it is easier to create a text file containing HTML code and then insert that file into a Word document than it is to automate Word.
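
The file-writing half of this can be sketched as below (Python rather than Delphi, for illustration only). The function name, the wrapper tags and the use of a temporary file are all assumptions; the real program may build its file differently. The Word half is not shown: it would be a single automation call along the lines of Selection.InsertFile with the path of the file.

```python
import os
import tempfile


def write_html_fragment(html_text, directory=None):
    """Write html_text to a new UTF-8 encoded .html file and return its path.

    The wrapper and the charset declaration are an assumption: the idea is
    to tell whatever reads the file that the text is UTF-8.
    """
    fd, path = tempfile.mkstemp(suffix=".html", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write('<html><head><meta http-equiv="Content-Type" '
                'content="text/html; charset=utf-8"></head><body>\n')
        f.write(html_text)
        f.write("\n</body></html>\n")
    return path


path = write_html_fragment("<table><tr><td>Scale</td><td>Mark</td></tr></table>")
with open(path, encoding="utf-8") as f:
    content = f.read()
os.remove(path)  # tidy up after the demonstration
print(content)
```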

Here are some examples:

(Word)
// One row, eight columns; the last two arguments are the default table behaviour and the autofit behaviour
wrdTable:= wrdDoc.Tables.Add (wrdSel.Range, 1, 8, 1, 2);
wrdSel.ParagraphFormat.Alignment:= wdAlignParagraphRight;
wrdTable.Select;
// Tight line spacing, no indents and no space before or after the paragraphs in the cells
wrdSel.ParagraphFormat.LineSpacing:= 10;
wrdSel.ParagraphFormat.CharacterUnitLeftIndent:= 0;
wrdSel.ParagraphFormat.CharacterUnitRightIndent:= 0;
wrdSel.ParagraphFormat.LineUnitBefore:= 0;
wrdSel.ParagraphFormat.LineUnitAfter:= 0;
// Header row: each cell is addressed directly by (row, column)
wrdTable.Cell (1, 1).Range.Text:= 'Scale';
wrdTable.Cell (1, 2).Range.Text:= 'Mark';
wrdTable.Cell (1, 3).Range.Text:= 'Percentile';
wrdTable.Cell (1, 4).Range.Text:= 'Niner*';
wrdTable.Cell (1, 5).Range.Text:= 'Value';
wrdTable.Cell (1, 6).Range.Text:= 'Average';
wrdTable.Cell (1, 7).Range.Text:= 'Std Dev';
wrdTable.Cell (1, 8).Range.Text:= 'Num Items';
(HTML)
with html do
 begin
  add (tabheader + divgrey);
  add (AnsiToUtf8 ('!th!Scale!/th!!th!Mark!/th!!th!Percentile!/th!' +
                   '!th!Niner*!/th!!th!Value!/th!!th!Average!/th!' +
                   '!th!Std Dev!/th!!th!Num Items!/th!!/tr!!/div!'));
 end;
As always, I can't actually post the real HTML code here as the Blogger editor interprets my quoted HTML as real HTML (even if using the <pre> tag). The exclamation marks in the above snippet need to be replaced with angle brackets.
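
For anyone who wants to see the real tags, the substitution can be done mechanically. The sketch below (Python, purely illustrative, with a made-up function name) assumes that the exclamation marks strictly alternate between opening and closing a tag and that no exclamation mark appears in the cell text itself, which holds for the snippet above.

```python
def placeholders_to_tags(text):
    """Replace alternating '!' characters with '<' and '>'."""
    out = []
    opening = True  # the next '!' opens a tag
    for ch in text:
        if ch == "!":
            out.append("<" if opening else ">")
            opening = not opening
        else:
            out.append(ch)
    if not opening:
        raise ValueError("odd number of '!' characters: a tag is left open")
    return "".join(out)


header = placeholders_to_tags("!th!Scale!/th!!th!Mark!/th!")
print(header)
```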

There is less fine control with HTML: tables are rendered in some default style. Most of the time, this is not a problem, but here and there I have noticed functionality in the Word program which I cannot reproduce with HTML.

The code is certainly less verbose although I have yet to decide whether this is an advantage. Normally, verbose code is easier to debug and theoretically one can easily get lost in all the HTML tags. One advantage of using HTML is that during development, I can always display what I have coded so far; it is easy to see where tags are missing or code has not been aligned correctly. With Word, the program starts and then crashes whenever the automation is wrong.

I should point out that some reports were a mixture of Word and Excel automation: text would be written, followed by a table, followed by a graph (created by Excel automation, then copied into the Word document). In these cases, I have not touched the Excel automation; most of the Word automation too has been left untouched, and only the tables have been created with HTML (meaning multiple insertions of HTML code).

I am waiting to hear from the OP how the HTML automation is received.

[edit from a few days later: I am chagrined to discover that I stumbled upon this technique of exporting HTML then reading it into Word several years ago. At the time, the HTML technique seemed to speed things up by only about 10%, so I quietly dropped the idea then promptly forgot about it. Today I reread all my blog entries about Office Automation and came across this again. Of course, today I have a much more serious problem than speed]
