Friday, March 12, 2021

The continuing saga of saving Unicode text - this time in the 'manager' program

Due to Covid-19, for the past year almost all of the customers/examinees for the Occupational Psychologist have been completing their exams via the Internet; someone developed a method using Google Forms (or Docs, I'm not sure) to display one or two exams and to receive the output. These output files are very simple text files, normally in the form of an INI file, and there has been no difficulty in reading them.

There is one form - not an exam - that is causing problems now: the personal details of the examinees. I've received two files in which all the data has been written in Russian, and of course all our tools fail to read this correctly (they replace each Russian character with a question mark). My previous forays into Unicode (for example, this blog entry) had a program reading data into a database consisting of one or two tables, and so it was very easy (once I understood what to do!) to create a new database for these data. But we want to read the examinees' personal data into the 'manager' program (which is conceptually an ERP system) that has, at present, 101 tables in its database and around 260 different forms. There is no easy way of converting this system!
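The question marks can be reproduced outside our tools. What follows is only a minimal Python sketch, not the code of any of our programs: the section and key names in the sample file are invented, and cp1252 is an assumed stand-in for whatever single-byte code page the tools actually use. The file itself is read correctly; the Russian text is lost only when it is forced into a code page that has no Cyrillic letters.

```python
import configparser

# Hypothetical personal-details file; the section and key names are invented.
SAMPLE = "[details]\nfirst_name=Иван\nsurname=Petrov\n"


def read_details(text):
    """Parse INI text and return the [details] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["details"])


def to_legacy(value, codepage="cp1252"):
    """Round-trip a string through a single-byte code page.

    Characters that have no slot in the code page come back as '?'.
    cp1252 is an assumption; the real tools may use a different page.
    """
    return value.encode(codepage, errors="replace").decode(codepage)


details = read_details(SAMPLE)
legacy = {key: to_legacy(value) for key, value in details.items()}

print(details["first_name"])  # the Russian name, intact
print(legacy["first_name"])   # one question mark per Russian letter
print(legacy["surname"])      # Latin text survives unchanged
```

The Latin surname passes through untouched, which is why the exam output files never showed the problem.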

On the basis of the above-mentioned blog entry, I created a new test database consisting of one table. There are two very important definitions: the database itself has a default charset UTF-8, and most of the fields within the single table have the UNICODE_FSS charset. I was able to enter the Russian data manually into this table, save it and then display it (albeit in the database manager - I haven't got as far as trying this out in Delphi yet).

This is as far as I have got, and I wanted to document all this before I go any further.

A solution that I am considering is to maintain this UTF-8 database along with the regular database. The 'people' table will contain an id field - not an autoincrement, but the same number as in the regular database - along with the fields that need to be in unicode. Fortunately, these fields do not appear in any reports so they can be maintained separately. The form that displays the data will have to get half of its fields from the regular database and the other half from the UTF-8 database; I think that this is feasible. 
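To convince myself that the shared id is enough to tie the two halves together, here is a rough Python sketch of the idea. It is not the eventual Delphi code: two in-memory SQLite databases stand in for the regular database and the UTF-8 database, and the column names (phone, full_name) are invented for illustration.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the regular database
# and the separate UTF-8 database (an assumption for this sketch).
regular = sqlite3.connect(":memory:")
unicode_db = sqlite3.connect(":memory:")

# The regular database generates the id.
regular.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, phone TEXT)")
# The UTF-8 database never generates an id; it reuses the regular one.
unicode_db.execute(
    "CREATE TABLE people (id INTEGER NOT NULL UNIQUE, full_name TEXT)"
)


def add_person(phone, full_name):
    """Insert into the regular database first, then reuse its id."""
    cur = regular.execute("INSERT INTO people (phone) VALUES (?)", (phone,))
    person_id = cur.lastrowid
    unicode_db.execute(
        "INSERT INTO people (id, full_name) VALUES (?, ?)",
        (person_id, full_name),
    )
    return person_id


def load_person(person_id):
    """Build one record for the form from both databases."""
    row = regular.execute(
        "SELECT phone FROM people WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        return None
    extra = unicode_db.execute(
        "SELECT full_name FROM people WHERE id = ?", (person_id,)
    ).fetchone()
    return {
        "id": person_id,
        "phone": row[0],
        "full_name": extra[0] if extra else None,
    }


pid = add_person("03-5550100", "Иван Петров")
print(load_person(pid))
```

The regular database stays the owner of the id, so a person without a row in the UTF-8 database simply shows empty Unicode fields on the form.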

I will create a stand-alone form that will display this double data so that I can solve all (?) the problems in a simple test harness before I try to integrate this form into the rest of the ERP program. 
