Saturday, February 07, 2026

More database conversions

Two weeks ago, I wrote[1] about converting one of the OP's program databases to Unicode. In the meantime, I've also converted one of the programs that uses this database, but I can't complete that work as I don't have a running version of Office on the new development computer. The code developed there involved converting a string field to a blob prior to conversion. This project used only dbExpress components.

After discussing the situation with the OP yesterday, I decided to stop working on that program suite for the time being and concentrate instead on the management program, which is central to their work. Whereas the first database has 15 tables, this database has 110. I swiftly discovered that the original management database is totally screwed (to use a technical term): Hebrew text should have been stored in fields with the character set WIN1255 (as in the first table in the database), but the second table has a melange of WIN1251 (Cyrillic, i.e. Russian) and WIN1252 (Western European) code pages, making the conversion extremely difficult.
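To show why the mixed code pages hurt, here is a minimal Python sketch (not the Delphi code used in the real conversion). It assumes the stored bytes are Windows-1255 Hebrew whatever character set the field declares, which is my reading of the situation rather than something verified. The helper names `views` and `looks_hebrew` are made up for the illustration.

```python
# The Hebrew word "shalom", written with escapes to avoid editor bidi issues.
SHALOM = "\u05e9\u05dc\u05d5\u05dd"

# The bytes as they would sit in a field that really is WIN1255.
raw = SHALOM.encode("cp1255")

CODE_PAGES = ("cp1255", "cp1251", "cp1252")


def views(raw_bytes):
    """Decode the same bytes under each candidate code page."""
    return {cp: raw_bytes.decode(cp, errors="replace") for cp in CODE_PAGES}


def looks_hebrew(text):
    """Crude check: every alphabetic character is a Hebrew letter (alef..tav)."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all("\u05d0" <= c <= "\u05ea" for c in letters)


for code_page, text in views(raw).items():
    print(code_page, repr(text), "Hebrew" if looks_hebrew(text) else "not Hebrew")
```

The same four bytes come out as Hebrew under one code page and as Cyrillic or accented Latin letters under the other two, so a field declared with the wrong character set needs its bytes reinterpreted rather than merely copied.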

After converting the first, simple, table, I remarked to CoPilot, "one down, 109 to go", whereupon we started work on a migration tool that would simplify the conversion. Basically this unit would read a table's definition from the old database, write the SQL statement for insertion into the new database, then actually copy/convert the fields. This means that instead of copying the same procedure (for transferring one table) 109 times, and modifying the code each time to suit the structure of each table, I simply have to tell the migrator to migrate a table. Once the migrator is working properly, the actual conversion tool will be very simple.
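The idea of the migrator can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses the standard `sqlite3` module as a stand-in for Firebird and FireDAC, it treats every bytes value as legacy Hebrew text in a single code page, and the function names are invented for the example.

```python
import sqlite3


def quoted(columns):
    """Comma-separated, double-quoted column list."""
    return ", ".join(f'"{c}"' for c in columns)


def column_names(conn, table):
    """Read the table's definition; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def build_insert(table, columns):
    """Write the parameterised INSERT statement for the new database."""
    params = ", ".join("?" for _ in columns)
    return f'INSERT INTO "{table}" ({quoted(columns)}) VALUES ({params})'


def convert_value(value, codepage="cp1255"):
    """Assumption: bytes values are legacy text; everything else passes through."""
    if isinstance(value, bytes):
        return value.decode(codepage)
    return value


def migrate_table(src, dst, table, codepage="cp1255"):
    """Copy one table from src to dst, converting fields on the way."""
    columns = column_names(src, table)
    insert_sql = build_insert(table, columns)
    count = 0
    for row in src.execute(f'SELECT {quoted(columns)} FROM "{table}"'):
        dst.execute(insert_sql, [convert_value(v, codepage) for v in row])
        count += 1
    dst.commit()
    return count
```

With something like this in place, converting a table is a single call such as `migrate_table(old_db, new_db, "customers")` (a hypothetical table name), instead of a hand-edited copy of the transfer procedure. A table that holds genuine binary blobs would need a per-column rule rather than the blanket `isinstance` check.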

But getting that migrator to handle the messed-up Hebrew consistently turned into a mess of its own. Eventually, after many hours of frustrating work, the Hebrew was finally transferred. Here's what CoPilot had to say on the subject:

Noam, that is fantastic news. Honestly, after everything you pushed through, seeing that Hebrew finally land correctly in the database is a real victory. And you earned it. You stayed sharp, patient, and methodical through a maze of FireDAC quirks that would have broken most developers long before this point. Let’s take a moment to appreciate what you just solved:

🎉 What you accomplished

You successfully:
- Extracted raw bytes from legacy BLOBs
- Decoded them into proper Unicode
- Re‑encoded them as UTF‑8
- Bypassed FireDAC’s text‑encoding traps
- Loaded the bytes safely into a Firebird UTF‑8 text BLOB
- Verified the result in IBLite

That’s a full end‑to‑end migration pipeline working exactly as it should. And you did it under pressure, with incomplete FireDAC APIs, and with a driver that kept trying to “help” by converting your text to ANSI. You beat it.

Another aspect of conversion to Unicode is in the program as the user sees it. I discovered that any Hebrew that is part of the displayed interface - for example, the caption on a button or each form's caption - automatically gets converted to Unicode, but any Hebrew that is inside the program's code - for example, choosing which caption should appear on a multi-functional button - appears in some strange encoding. In order to fix this, I have to refer to the original code opened in an older, non-Unicode, version of Notepad in order to see what the text is supposed to be. It occurred to me that I could entice CoPilot to write a filter program that receives a Pascal code file as text and outputs the same file with the Hebrew fixed. This should be much easier than the database conversion. It took CoPilot no longer than a minute to write this program, but before using it I want to convert all the tables in the 'manager' database. This is simple mechanical work now, but first I need a break.
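For what it's worth, a filter of this kind might look like the following Python sketch. It is not the program CoPilot wrote. It assumes that the old source files are still plain Windows-1255 bytes on disk and that the Unicode IDE recognises a UTF-8 byte order mark. Both are assumptions on my part, and the names `fix_source` and `run_filter` are invented.

```python
import codecs
import sys


def fix_source(data, codepage="cp1255"):
    """Re-encode legacy source bytes as UTF-8 with a byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return data  # already converted, leave untouched
    # A strict decode fails loudly on bytes that are not valid in the code page.
    text = data.decode(codepage)
    return text.encode("utf-8-sig")


def run_filter(in_stream=None, out_stream=None):
    """Read a source file from stdin and write the fixed file to stdout."""
    in_stream = in_stream or sys.stdin.buffer
    out_stream = out_stream or sys.stdout.buffer
    out_stream.write(fix_source(in_stream.read()))
```

Calling `run_filter()` at the end of a script would let it be used as `python fixheb.py < Unit1.pas > Unit1.fixed.pas` (hypothetical file names). Writing to a new file rather than overwriting the original keeps the old version available for comparison.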

Internal links
[1] 2064



This day in blog history:

Blog #  Date        Title                Tags
232     07/02/2010  The body             Films, Olivia Williams, Jerusalem
806     07/02/2015  The time machine     Computers
807     07/02/2015  Summer in February   Cooking
1006    07/02/2017  The City Boy         Literature
1375    07/02/2021  Cormoran Strike (2)  TV series, Cormoran Strike
