Thursday, December 15, 2011

Front end program for converting HTML to PDF

I was approached by someone (not in my company) with a peculiar problem, connected with our mutual ERP program, Priority. Each month, this person has to send copies of the invoices issued to certain customers to a medical insurance company, in PDF format.

Until she approached me, she had been selecting and displaying each invoice separately (Priority uses Internet Explorer as its default display mechanism for all reports, including invoices), then sending the invoice to be 'printed' via a PDF printer (thus creating a PDF file) and then renaming the file to be the name of the customer (unfortunately, Priority gives an arbitrary name to files it creates). This process was taking a few hours every month and I was asked to speed it up.

My initial action was to change a definition in the PDF printer driver, so that the driver would create a separate file for every page sent to it. I then showed the person how to select and display all the necessary invoices in one go and send the data to the PDF printer, thus creating several files. Unfortunately, each file still had to be renamed. This simple action cut the time needed by perhaps 25%, but it wasn't enough.

There had to be a better way. The first stage in improving the process was to write a little program (a stored procedure, really) in Priority which creates one html file per invoice, where the file is named according to the customer name and the invoice number. Unfortunately, it transpired that it was not possible to create such a file when the customer name was in Hebrew, and as all the customers' names are in Hebrew, this was quite a problem. So I substituted the customer identity number (not the customer number) for the customer name and was able to create separate html files.

Then there only remained the problem of creating pdf files from these html files. At one stage, I asked whether the medical insurance company would be willing to receive html files, but the answer was negative. After contemplation, I realised that I needed a program to convert html to pdf in order to provide a complete solution.

Like many problems, this was easier said than done. I spent several hours googling 'html pdf convert' and then checking out the links. Most of the answers which I received pointed to online services, which would not be suitable. Of the remaining answers, most were for commercial programs. I found one program which seemed to be free; I downloaded it and tried it out but saw that it lacked a batch mechanism (my idea required converting 10+ files in one go). I tried a trial version of a commercial program: whilst this had a batch interface, it inserted a 'trial version' message in the pdf and, more damning, failed to convert a Hebrew html file.

Eventually I found a command line program (CLP) called wkhtmltopdf, written by an enthusiast programmer (as opposed to a commercial program written by a professional programmer). My experience is that such programs are often better than commercial ones, although getting them to work can be awkward. First I checked that this program could convert a Hebrew html file correctly (yes). Then I set about writing a front end program: this would list all the html files found in a specific directory, allow the user to choose which files to convert and then pass these files to the CLP for conversion.

My first version of the front end program wasn't too successful as it tried to start about ten simultaneous instances of the CLP; this brought my computer to its knees. Whilst waiting for my computer to reboot, I realised that I needed to send one file to the CLP, wait for it to finish converting, and then send the next file.

Another problem which I encountered was that the CLP couldn't handle files which were stored in a subdirectory of c:\program files, presumably because the html and pdf names are passed to it separated by a space, so a path which itself contains a space gets split. I solved this problem by using a routine which creates short directory names (eg c:\progra~1). The 'executing' flag was added as a precaution, so that the user cannot close the program before all the chosen files have been converted; the program's CanClose procedure checks the value of this variable and refuses to close if executing is true. Edit1.text holds the location of the CLP.
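
The short-name routine itself isn't shown in this post. As a rough sketch only, and not necessarily how my own shortdir is written, such a routine could be built on the Windows API function GetShortPathName. The function name ShortDir and the fallback behaviour (returning the long path unchanged if the conversion fails) are assumptions made for this illustration; SysUtils.ExtractShortPathName does much the same job.

```pascal
program ShortDirDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: returns the 8.3 ("short") form of an existing path,
// or the original path unchanged if Windows cannot supply one.
function ShortDir(const LongPath: string): string;
var
  Buf: array[0..MAX_PATH] of Char;
  Len: DWORD;
begin
  // On success the result is the number of characters copied (without the
  // terminating null); 0 means failure, and a value >= the buffer size
  // means the buffer was too small.
  Len := GetShortPathName(PChar(LongPath), Buf, Length(Buf));
  if (Len = 0) or (Len >= DWORD(Length(Buf))) then
    Result := LongPath
  else
    SetString(Result, Buf, Len);
end;

begin
  // The output depends on the machine: short names can be disabled per volume.
  Writeln(ShortDir('C:\Program Files'));
end.
```

Note that the path must exist for a short name to be returned, and that only the directory is shortened in my program; the file names themselves are passed on as they are.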

Here is the interesting part of the code.

procedure TForm1.ConvertBtnClick(Sender: TObject);
var
 i: integer;
 ExitCode: DWORD;
 clp, params, mydir, htmlname, pdfname: string;
 SEInfo: TShellExecuteInfo;

begin
 ConvertBtn.enabled:= false;
 executing:= true;   // checked by the form's CanClose procedure
 clp:= edit1.text;   // location of the CLP; a local keeps the PChar valid
 FillChar (SEInfo, SizeOf (SEInfo), 0);
 with SEInfo do
  begin
   cbSize:= SizeOf (TShellExecuteInfo);
   fMask:= SEE_MASK_NOCLOSEPROCESS;   // ask for a process handle in hProcess
   Wnd:= Application.Handle;
   lpFile:= PChar(clp);
   nShow:= SW_SHOWNORMAL;
  end;

 try
  mydir:= IncludeTrailingPathDelimiter (shortdir (dlb.directory));
  for i:= 0 to lb.items.count - 1 do
   if lb.Checked[i] then
    begin
     htmlname:= mydir + lb.items[i];
     pdfname:= ChangeFileExt (htmlname, '.pdf');   // handles .htm as well as .html
     params:= htmlname + ' ' + pdfname;
     SEInfo.lpParameters:= PChar(params);
     if ShellExecuteEx (@SEInfo) then
      begin
       // wait for this conversion to finish before starting the next one
       repeat
        Application.ProcessMessages;
        if not GetExitCodeProcess (SEInfo.hProcess, ExitCode) then break;
       until ExitCode <> STILL_ACTIVE;
       CloseHandle (SEInfo.hProcess);   // release the handle we asked for
      end;
    end;
 finally
  executing:= false;   // allow the form to close even if something went wrong
 end;
end;
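
It's not enough to use the Windows API ShellExecute function, as this simply launches the given program and returns at once, without waiting for it to finish; this is how I managed to create all the simultaneous instances of the CLP. ShellExecuteEx, called with the SEE_MASK_NOCLOSEPROCESS flag, hands back a process handle, and the program can then check that handle until the CLP has exited.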
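
For comparison, here is a sketch of a variation which blocks on the process handle with WaitForSingleObject instead of polling the exit code. It is not the code used in my program: the function name RunAndWait is made up for the example, and cmd.exe stands in for the CLP so that the example can be run anywhere.

```pascal
program RunAndWaitDemo;
{$APPTYPE CONSOLE}

uses
  Windows, ShellAPI, SysUtils;

// Hypothetical helper: starts Exe with Params, waits until it has finished
// and returns its exit code. Returns False if the process could not be
// started or no process handle was returned.
function RunAndWait(const Exe, Params: string; out ExitCode: DWORD): Boolean;
var
  SEInfo: TShellExecuteInfo;
begin
  ExitCode := 0;
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(SEInfo);
  SEInfo.fMask := SEE_MASK_NOCLOSEPROCESS;  // we want hProcess filled in
  SEInfo.lpFile := PChar(Exe);
  SEInfo.lpParameters := PChar(Params);
  SEInfo.nShow := SW_HIDE;

  Result := ShellExecuteEx(@SEInfo) and (SEInfo.hProcess <> 0);
  if not Result then
    Exit;
  try
    // Blocks until the child process exits; no busy loop needed.
    WaitForSingleObject(SEInfo.hProcess, INFINITE);
    Result := GetExitCodeProcess(SEInfo.hProcess, ExitCode);
  finally
    CloseHandle(SEInfo.hProcess);
  end;
end;

var
  Code: DWORD;
begin
  if RunAndWait('cmd.exe', '/c exit 3', Code) then
    Writeln('exit code: ', Code)
  else
    Writeln('could not start process');
end.
```

In a program with a form, an infinite wait would freeze the window, so there one would wait with a short timeout inside a loop and call Application.ProcessMessages between waits.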
