Sunday, October 16, 2022

Indexing blogs

Towards the end of 2014, I wrote a simple database program to record all the blogs that I write. The need for this came when I reached 700 blogs and wanted to table the most popular tags in the past 100 blogs. Whilst the blog site allows one to retrieve blogs according to tag, the possibility to retrieve by date is more limited. The external interface allows one to see how many blogs were written in a given month and in a given year, but the blogs themselves can only be accessed individually (the internal interface lacks this). As a database programmer, I am used to more sophisticated options, such as showing all the blogs that were written in a given time period with a given tag. There is no option whatsoever to access blog entries by number.

So I wrote that simple program: it allows me to record the blog titles and their tags (a blog can have more than one tag). The most sophisticated part of the program automatically outputs a 'league table' of tags and counts that I can paste into this blog, as I have done here, for example. I've been content with the program for nearly eight years, but the last few days have seen various developments (or more accurately and in the language of my thesis, enhancements).

The picture on the left shows the parameters tab - one can choose to retrieve from a given date, until a given date, from ID until ID and by tag (ignore for the time being the radio button 'choose multiple tags') and the combobox 'tag 2'). As can be seen (with a little difficulty), I have chosen to display all the blogs written since 01/10/2022.
The lower picture displays the result of that database query. At the moment, 6 blogs have been written since the beginning of the month: they are displayed along with their id number, which is not particularly useful, and the date they were posted (ignore the 'web' column for the time being).

My first improvement was to handle the problem that I couldn't simply switch to the 'data' tab in order to enter new blogs: first I had to retrieve a list of blogs (e.g. from today or the beginning of the month) and only then could I make new entries. In technical terms, the query that displays the blogs hadn't been opened and so I couldn't make new entries. The fix was first to determine when this event would occur (pressing on the 'add' button when the query is closed), then to retrieve the id of the last blog entered, then to increment this number by 1 and place it in the 'from id' edit box, and finally to simulate pressing the 'show' button. This may sound like a lot but it's only a few lines of code. Then I could add new blog entries that would be displayed, because their id would be greater or equal to the value in the 'from id' edit box.

Writing the above makes me realise that I have not made the other buttons ('edit', 'set tags' and others) 'unabled' when there are no blogs displayed. This is a program for my personal use and I am not likely to press one of these buttons! Anyway, I don't think that anything would happen if I pressed those buttons when no data is being displayed.

The second improvement embodied thoughts on a more general topic: I use the above two-tab dialog box frequently in the OP's management 'ERP' program and I use the blog program as a test bed for the ERP program. The 'tag' combobox allows the choice of only one tag: in English, the query would be 'show me all the blogs that have the tag <something>'. In the ERP program, this would be 'show me all the entries for customer X'. I would like to extend this by allowing what is technically called an 'or' query: show me all the blogs that have the tag <X> or those that have the tag <Y>.

I solved this by reusing the dialog box that allows me to choose multiple tags for a given entry (see the picture on the left). Instead of attaching tags to an entry, I created a new table, tmptags,  that consists of one field only, tag; the chosen tags go into this table instead of the 'tag2entry' table.

Then the query in the main form had to be changed in order to use the 'tmptags' table; this was very easy. More difficult was deciding when to bring up this dialog and choose the tags, and how the code called by pressing the 'show' button should use the 'tmptags' table or the tag chosen in the 'tag 1' comboxbox, if at all.

Once I figured this out, it occurred to me that I could make a useful extension, and allow my program to execute the Internet browser and display a given blog. So I added an 'address' field both to the 'entries' table and to the form that allows me to enter or edit an entry. Going back to one of the earlier pictures, the tick in the (new) 'web' column means that double pressing a line will cause that blog to be displayed.

My final improvement (as of last night, of course) was to allow the possibility of choosing two tags and displaying only the entries that have both tags (in SQL, this is an 'and' query). The implementation was fairly simple: add another combobox and alter once again the SQL code for the query. Unfortunately, this time the SQL is complicated, as the 'tag2entry' and 'tags' table have to be referenced twice, meaning that aliases have to be used. Not only that, an alias is required in both usages of each table. This leads to code that looks like this...
select id, case when address is null then 0 else 1 end as web, curdate, name, address from entries inner join tag2entry t2e1 on t2e1.entry = entries.id inner join tags t1 on t1.id = t2e1.tag inner join tag2entry t2e2 on t2e2.entry = entries.id inner join tags t2 on t2.id = t2e2.tag where 1 = 1 and t1.name = '....' and t2.name = '....'
The 'where 1 = 1' is simply a placeholder line for the 'where' clause; this will always be true and is probably ignored by the SQL engine. Its purpose is to allow the following clauses to begin with 'and'; there's no need to check whether a 'where' line has already been added when building the dynamic query.

I have to remember that if I alias one appearance of a table (e.g. tags) in a query, then all appearances of the same table have to be aliased. This probably is the basis for the tendency of beginning programmers in SQL Server to alias everything, regardless of whether there is a need to do so.

The next improvement will be probably be something like 'when was each tag last used', which could be extended to 'which tags haven't been used since date X'.



Following the recent entry about Gillian McPherson, I was interested in seeing what else I had written with the tag '1971'; almost nothing appeared. I thought that this was due to a problem in my code, but eventually I remembered that I hadn't entered the first 600 blogs into my program. As from last night, I intend to enter maybe ten old blogs a day; this isn't easy as noted above: the blogger interface doesn't allow me access by id number. I'll use the external interface (as shown in the picture on the left) to access all the posts in 2005 - these obviously will be numbered 1 - 22. From thereon, I'll have to calculate what will be the id of the next entry that I intend to save in my program.

I thought that somewhere I had documented how to insert two pictures side by side in a blog; here I wrote that I've learnt how to place two pictures side by side in the blog, but I can't find now how I did this. I remember that it had to be done in the HTML view and that no positioning be chosen, but I'm sure that there's more to it than that. I found this morning a blog that shows how to do this with a table, but that wasn't too successful either. After another search, I've found the page that I remember. I'm going to reproduce part of the solution here; this assumes that the pictures have already been loaded into the HTML page.

Each image is surrounded by an opening DIV line.

<div class="separator" style="clear: both;">

and a closing one

</div>

You need to remove all those lines of code.

Then finally for each image you will see this code

style="display: block; padding: 1em 0; text-align: center; "

Remove the display:block; part so that it looks like this.

style="padding: 1em 0; text-align: center; "

Then if you have done it all, when you goto Preview or Compose mode you should see all your images together.

No comments: