Thursday, May 15, 2008

Unicode resource files

I have mentioned in passing the 'flagship' product which I have developed with my occupational psychologist. This is not so much a product as a suite of programs:
  1. Admin - this is where all the tables are defined, and basically all the knowledge is here. I am constantly developing this program as we are always adding changes.
  2. Exam - this is what the user sees. This program presents 400 statements with which the examinee can either agree or not. The answers are stored in a file which gets passed on to the next stage. Very little development occurs with this program as it is supposed to be frozen.
  3. Results - in this program, the output file from the exam is read and various reports are created, based on the values in the output file and in the various data tables.
'Admin' and 'Results' were separated into two programs in order to allow a certain amount of security. The worker who prints the results is not able to make any changes in the knowledge database.

The exam was initially in Hebrew only, but at some time, we translated the statements into English, and allowed the possibility of running the exam in this language. The statements were held in a table with a simple structure (id, text, alive [some statements are no longer presented]), and in order to allow the program to be run in English, I added a field to this table in which the translated statements are stored.

Originally, the exam was run in our lab against the database, but when the wish arose to deploy the exam on the Internet, I had to find an alternative solution. I did this by extracting all the statements from the database and storing them in stringtable format in a resource file. This had the added benefit that I could strip from the exam program all the code which handles the database, and as a result, the final program (including all the statements twice) is smaller in size than the original db-based exam.
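For illustration only, the extraction step might look something like the sketch below. It is written in Python rather than Delphi, and it is not the original code: the function name, the use of the statement id as the string id, and the skipping of rows whose 'alive' flag is false are all assumptions based on the table structure described above.

```python
def build_stringtable(rows):
    """Turn (id, text, alive) rows into the text of an .rc STRINGTABLE.

    Hypothetical helper: it assumes the statement id doubles as the
    string id, and that retired statements are simply left out.
    """
    lines = ["STRINGTABLE", "BEGIN"]
    for stmt_id, text, alive in rows:
        if not alive:
            continue  # statements no longer presented are skipped
        # Inside an .rc string, a double quote is written as two quotes.
        escaped = text.replace('"', '""')
        lines.append(f'    {stmt_id}, "{escaped}"')
    lines.append("END")
    return "\n".join(lines) + "\n"
```

Fed a handful of rows fetched from the statements table, this returns text that can be saved with the 'rc' extension and handed to a resource compiler.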

As we say in Hebrew, "with the food comes the appetite". Once this had been done, the desire to add a Russian interface was aired. This makes a great deal of sense, as there are far more Russian speakers in Israel than there are mother-tongue English speakers, and in most cases, the Russian speakers' Hebrew is far weaker than that of the English speakers.

So we sent the statements file off for translation, and this returned last week as a Word file filled with Cyrillics. I copied this file into Notepad, gave it the structure needed for a stringtable, and saved it with the 'rc' extension, signifying to all and sundry that this is a resource file. When I saved the file initially, I received a warning saying that if I saved it as an ANSI file, I would lose important information. So I saved it as Unicode.

Trying to compile this file with Borland's resource compiler, brcc32, did not meet with much success (although in retrospect, this might well have been because there were still errors in the resource file). Then I realised that even if I managed to create a compiled resource file (with the 'res' extension), my program would be unable to read it, as Unicode characters (stored as UTF-16 under Windows) take up two bytes each rather than the single byte of 'ordinary' ANSI characters, and need special routines to read and display.
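The size difference is easy to see in a small sketch. This one is in Python rather than Delphi, and it rests on two assumptions: that 'ordinary' characters here means a single-byte ANSI code page such as Windows-1251 (the usual one for Cyrillic), and that Notepad's 'Unicode' option writes UTF-16 little-endian with a byte order mark, as it did in the Windows versions of the time. Both function names are made up for the example.

```python
import codecs


def encoded_sizes(text):
    """Return the byte length of text in an ANSI code page and in UTF-16."""
    return {
        "cp1251": len(text.encode("cp1251")),        # one byte per Cyrillic letter
        "utf-16-le": len(text.encode("utf-16-le")),  # two bytes per Cyrillic letter
    }


def as_notepad_unicode(text):
    """Encode text the way Notepad's 'Unicode' save option is assumed to:
    a two-byte byte order mark followed by UTF-16 little-endian data."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

A six-letter Cyrillic word such as "Привет" comes out as 6 bytes in cp1251 and 12 bytes in UTF-16, which is why code that reads a string as single-byte characters cannot make sense of the Unicode version.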

Following this, I spent the next few days learning about Unicode characters under Windows: how to save them in resource files, how to extract them and how to display them. I found a freeware resource compiler which works admirably (gorc), and eventually found Delphi components which can handle Unicode. I put everything together, and to my surprise and delight, the exam program can now be operated in Russian mode.

Now that this has been done, we can add even more languages, such as Spanish and even Swahili (gasp) without having to undergo the learning cycle of the past week.

I'm quite chuffed with myself.
