Saturday, September 26, 2020

Priority procedures cross-referencer


I spent a few hours on Thursday evening, more than a few hours yesterday and several hours today working on the Priority procedures cross-referencer, and I hope that I have now completed it.

As in the army, everything divides into three. For this program, the first stage is parsing the input file, then displaying the references and finally displaying the analysis. The first stage can also be split into three: the tokeniser, the lexical analysis and the storage. A token is a string extracted from a text file; for example, if the current line is 'select part, partname from part', then there are five tokens (ignoring the comma): 'select', 'part', 'partname', 'from' and again 'part'. In programming languages with regular syntax, the tokeniser is normally quite straightforward, but it turns out that the procedural SQL language of Priority does not have regular syntax and cannot be considered context-free.
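To make the idea of a token concrete, here is a minimal Python sketch of a tokeniser for the example line. It is not the cross-referencer's own code; the regular expression (a word, optionally preceded by a colon, with punctuation such as commas skipped) is an assumption chosen for illustration, and it does not yet deal with string literals or operators.

```python
import re

# Assumed token pattern: an optional leading colon (variable marker)
# followed by an identifier. Anything else, such as a comma, is skipped.
TOKEN_RE = re.compile(r":?[A-Za-z_][A-Za-z0-9_]*")

def tokenise(line: str) -> list[str]:
    """Split one source line into identifier/keyword tokens."""
    return TOKEN_RE.findall(line)

if __name__ == "__main__":
    print(tokenise("select part, partname from part"))
    # ['select', 'part', 'partname', 'from', 'part']
```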

Here are two examples of this ad hoc syntax. First, I want to note when a variable is initialised and when it is not. Initialisation can take one of two forms: either an equals sign follows the token (e.g. :SEARCHNAME = '12345') or the keyword INTO precedes it (e.g. SELECT DAY INTO :DAYS). Having two opposite forms (one postfix and one prefix, to use the technical terms) makes this complicated to program. Second, the colon (:). Normally this marks a variable, e.g. :DAYS, but it can also separate the two branches of a ternary conditional expression (e.g. :DAYS < 7 ? 3 : 5).
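The two initialisation forms and the two meanings of the colon can be illustrated with a small Python sketch. The rules below are assumptions for illustration, not Priority's actual grammar, and the function names are hypothetical: a colon immediately followed by a letter marks a variable, while a colon followed by a space is treated as an operator; the postfix form is recognised only when ':VAR =' begins a statement; and the prefix form covers a comma-separated list of variables after INTO.

```python
import re

# Assumed token pattern, tried in order: string literal, variable (a colon
# immediately followed by an identifier), word or number, single operator.
# A colon followed by a space, as in a ternary expression, is an operator.
TOKEN_RE = re.compile(r"'[^']*'|:[A-Za-z_]\w*|\w+|[^\s\w]")

def tokenise(statement: str) -> list[str]:
    return TOKEN_RE.findall(statement)

def is_variable(tok: str) -> bool:
    # A bare ':' (ternary separator) is not a variable.
    return tok.startswith(":") and len(tok) > 1

def initialised_variables(statement: str) -> set[str]:
    """Variables initialised by one statement, under two assumed rules:
    postfix form: the statement starts with ':VAR ='
    prefix form:  the variable appears in the list after INTO
    """
    tokens = tokenise(statement)
    found = set()
    # Postfix form: ':SEARCHNAME = ...' at the start of the statement.
    if len(tokens) >= 2 and is_variable(tokens[0]) and tokens[1] == "=":
        found.add(tokens[0].upper())
    # Prefix form: 'SELECT ... INTO :A, :B ...'
    for i, tok in enumerate(tokens):
        if tok.upper() == "INTO":
            j = i + 1
            while j < len(tokens) and (is_variable(tokens[j]) or tokens[j] == ","):
                if is_variable(tokens[j]):
                    found.add(tokens[j].upper())
                j += 1
    return found
```

Requiring the assignment to start the statement is one way of not mistaking a comparison such as WHERE PART = :P for an initialisation, since the equals sign appears in both.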

Tokenising table aliases (e.g. GENERALLOAD F1) correctly took quite a bit of time.
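A simplified Python sketch of alias handling follows, assuming that a FROM clause is a comma-separated list in which each table name may be followed by an alias, and that the list ends at one of a few clause keywords. The keyword set and the function name are hypothetical and deliberately incomplete; the real difficulty lies in the cases this ignores.

```python
import re

# Keywords assumed to end a FROM list; anything else that follows a table
# name is treated as that table's alias. Illustrative, not complete.
CLAUSE_KEYWORDS = {"WHERE", "ORDER", "GROUP", "HAVING", "INTO", "UNION"}

def table_aliases(statement: str) -> dict[str, str]:
    """Map alias -> table for the FROM clause of one SELECT statement."""
    tokens = re.findall(r"\w+|,", statement)
    upper = [t.upper() for t in tokens]
    if "FROM" not in upper:
        return {}
    i = upper.index("FROM") + 1
    aliases = {}
    while i < len(tokens) and upper[i] not in CLAUSE_KEYWORDS:
        table = upper[i]
        i += 1
        alias = table  # without an alias, the table is referred to by name
        if i < len(tokens) and tokens[i] != "," and upper[i] not in CLAUSE_KEYWORDS:
            alias = upper[i]
            i += 1
        aliases[alias] = table
        if i < len(tokens) and tokens[i] == ",":
            i += 1
    return aliases
```

For example, table_aliases("SELECT F1.PART FROM GENERALLOAD F1, PART WHERE F1.PART = PART.PART") maps F1 to GENERALLOAD and PART to itself.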

The identifiers and their references are stored in a binary tree; this part was based on a cross-referencer written in standard Pascal that I found a few days ago. The references are stored in a queue for each node. I added a few fields to these types in order to store further information: the type of identifier (variable, cursor, table) and the operation in progress at the reference (e.g. variable initialisation, opening a cursor, linking a table). This part was simple. Displaying the references was also fairly straightforward.
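The storage scheme can be sketched in Python as follows; this is an illustration, not a translation of the Pascal program. Each tree node holds an identifier, its kind (variable, cursor or table) and a queue of references, each reference being a line number and an operation. The operation labels such as "init" and "link", and the function names, are hypothetical. An in-order traversal yields the identifiers in sorted order, which is what the reference listing needs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass
class Node:
    name: str
    kind: str                                   # "variable", "cursor" or "table"
    refs: deque = field(default_factory=deque)  # queue of (line, operation)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def add_reference(node: Optional[Node], name: str, kind: str,
                  line: int, op: str) -> Node:
    """Insert name if absent, then append (line, op) to its queue."""
    if node is None:
        node = Node(name, kind)
    if name < node.name:
        node.left = add_reference(node.left, name, kind, line, op)
    elif name > node.name:
        node.right = add_reference(node.right, name, kind, line, op)
    else:
        node.refs.append((line, op))
    return node

def in_order(node: Optional[Node]) -> Iterator[Node]:
    """Yield nodes in sorted order of name."""
    if node is not None:
        yield from in_order(node.left)
        yield node
        yield from in_order(node.right)

def display(root: Optional[Node]) -> None:
    for node in in_order(root):
        refs = ", ".join(f"{line}({op})" for line, op in node.refs)
        print(f"{node.name:<15} {node.kind:<9} {refs}")
```

Because the queue preserves insertion order, each identifier's references come out in the order they appear in the file.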

The analysis part depends on the type of identifier: there are certain checks for variables, certain checks for cursors and certain checks for tables. I found a method to make these checks as streamlined as possible.
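One possible way to streamline per-type checks (not necessarily the method referred to above) is a dispatch table that maps each kind of identifier to its own check function. In this Python sketch the operation labels and the individual checks are hypothetical examples; only "initialised but never used" corresponds to a result mentioned in this post.

```python
from typing import Callable

Ref = tuple[int, str]  # (line, operation); operation labels are hypothetical

def check_variable(name: str, refs: list[Ref]) -> list[str]:
    ops = [op for _, op in refs]
    problems = []
    if "init" in ops and "use" not in ops:
        problems.append(f"{name}: initialised but never used")
    if "use" in ops and "init" not in ops:
        problems.append(f"{name}: used but never initialised")
    return problems

def check_cursor(name: str, refs: list[Ref]) -> list[str]:
    ops = [op for _, op in refs]
    problems = []
    if "declare" in ops and "open" not in ops:
        problems.append(f"{name}: declared but never opened")
    if "open" in ops and "close" not in ops:
        problems.append(f"{name}: opened but never closed")
    return problems

def check_table(name: str, refs: list[Ref]) -> list[str]:
    ops = [op for _, op in refs]
    if "link" in ops and "unlink" not in ops:
        return [f"{name}: linked but never unlinked"]
    return []

# One entry per identifier type; adding a type means adding one entry.
CHECKS: dict[str, Callable[[str, list[Ref]], list[str]]] = {
    "variable": check_variable,
    "cursor": check_cursor,
    "table": check_table,
}

def analyse(identifiers: list[tuple[str, str, list[Ref]]]) -> list[str]:
    """identifiers: (name, kind, refs) triples, e.g. from a tree traversal."""
    problems = []
    for name, kind, refs in identifiers:
        check = CHECKS.get(kind)
        if check is not None:
            problems.extend(check(name, refs))
    return problems
```

The main loop never needs to know which checks exist; each identifier type owns its checks.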

I tested the program by running it alternately on a short test file, into which I sometimes inserted deliberate errors (so that I could check that the errors were being picked up), and on the file for the procedure that I wrote a few days ago. Each time, I looked through the references and noted mistakes that had to be fixed. Now I'm 99% confident that I've parsed the files correctly and have correctly marked variable initialisation (this was very complicated). Running the finished program on my procedure finds three variables that were initialised and never used. These can be safely deleted from the procedure.

My next step is to publicise the program within a small community, inviting examples of procedures whose analysis appears to be wrong. Maybe there are other checks that need to be added.
