Friday, October 10, 2025

How to define the 'sister' relationship/predicate in Prolog?

Previously, I defined the 'sister' predicate as
sister (X, Y):- female (X), parent (Z, X), parent (Z, Y).
In English, this says that one person is the sister of another if the first person is female and the two people share a parent. There are several problems with this definition that derive from Prolog itself and not from my interpreter.

The first problem is that Shaked will be shown as her own sister (as will all the other names defined as females that have a parent)! This is because X binds to Shaked as does Y. Nowhere was it said that X has to have a different value than Y. This is easily solved by adding another term to the rule, using the intrinsic operator \= to signify inequality:
sister (X, Y):- female (X), parent (Z, X), parent (Z, Y), X \= Y.

Whilst this solves the problem in Prolog, it opens up a can of worms in my implementation. Until now, every term has been in the form functor (any arguments); this can be described as prefix notation. Unlike this, the inequality relation as an infix statement, where the functor (or operator) \= appears between the two operands. This causes one routine in the parser to fault, as no parentheses are in this term. So the procedure ParseLine has to be extended to allow for infix expressions that effectively turns that term into the canonical representation \= (X, Y).

But more importantly, the ResolveGoals function has to know how to resolve such a term; obviously \= is not going to match any term in the database so first a check must be made to see if the current goal contains the inequality operator, and if it does, ResolveGoals should succeed if the operands - actually, the values of the operands - are equal (and fail if not). Otherwise, the current goal should be matched against the terms in the database.

Once that problem is out of the way, the query returns four solutions (the actual order of the solutions depends on the order in which the terms are stored in the database):
Solution 1: X = shaked Y = lior Solution 2: X = lior Y = shaked Solution 3: X = netta Y = nir Solution 4: X = netta Y = nir
The first two solutions are as expected, even though as humans we would regard the second solution as redundant. Ths requires us to know that both Lior and Shaked are girls, or at least are represented as females in the database; both names are unisexual in Hebrew so in other databases, Shaked (female) could be the sister of Lior (male) but not the other way around, or Lior (female) could be the sister of Shaked (male) - or maybe not at all, if both names are defined as male.

But what about solutions 3 and 4? This had me scratching my head for some time as to what could be the cause of this duplication as the interpreter is perfectly correct in displaying two solutions. After a while, I understood what was happening: in the first case, the facts parent (noam, netta) and parent (noam, nir) were being matched, whereas in the second case, the facts parent (sarit. netta) and parent (sarit, nir) were being matched.

As CoPilot puts it, accept that duplicates reflect multiple derivations.
This is actually faithful to Prolog’s semantics: each solution reflects a distinct proof path. If you want to show provenance, you could annotate which parent led to the match.

One minor piece of programming that came from me and not from CoPilot was the addition of a crude profiler: this counts how many times ResolveGoals is called and how many times Unify is called. These data show how hard CProlog is working and hints at the efficiency of the rules. I wrote a few paragraphs back that the actual order of the solutions depends on the order in which the terms are stored in the database; the order of the solutions also depends on the order of the terms in the query. 

But more importantly, a distinction has to be made between the logical meaning of a query and its practical (procedural) meaning. The two rules that I show below have the same logical meaning, but one is more efficient:
sister (X, Y):- female (X), parent (Z, X), parent (Z, Y), X \= Y.
sister (X, Y):- parent (Z, X), parent (Z, Y), female (X), X \= Y.

The first rule requires 48 calls to ResolveGoals and 346 calls to Unify, whereas the second requires 68 calls to ResolveGoals and 548 calls to Unify (the actual numbers depend on the number of terms in the database and how many match; if the queries are run agains the same database then the profiler information will show the relative efficiency). Obviously the first rule is more efficient; why is this so? This requires real-world knowledge: one would expect that 50% of the people mentioned in the database are defined as female and 50% are defined as male, so putting the female term first should exclude quite a few facts. The second rule checks the gender of the parent only after finding matching terms for parent, so one assumes that many facts with a male parent will be checked before the gender check, at which time Prolog discovers that it's been wasting its time.

This is a heuristic that I use frequently in SQL: order the terms in terms of their specifity. In other words, if a term will ignore 50% of the data, then it's better to put this before a term that will ignore only 25% of the data.

As it happened, because the 'parent' clauses about Netta came before those about Shaked, the solutions were displayed in reverse order. This is merely a curiosity and not imporant.

Two more very subtle problems arose. Here's the first:
?- father (X, Y). One of the solutions will be X = noam, Y = netta ?- father (noam, X). Did not return any solutions ?- father (noam, Y). Returned solutions as expected.
This is clearly the case of what might be called the variable renaming bug. Putting it briefly, the X in the rule father (X, Y) should not be the same as the X in the query father (noam, X). That wasn't too difficult to fix.

I wanted to check wheter recursive rules work. Here is a simple example
ancestor (X, Y):- parent (X, Y). ancestor (X, Y):- parent (X, Z), ancestor (Z, Y).
Thus I am the ancestor of any of my children, but I am also the ancestor of their children and will be of their children's children (the oldest is only 9 so there are no great-grandchildren!). Again, if I simply queried ?- ancestor (X,Y), then all the correct solutions would be presented (although at first there was a problem that might be called 'environmental leak'), but if I queried ?- ancestor (noam, X), there would be presented five solutions where in all of them X = noam. The number of solutions is correct (two children, three grandchildren) but of course the displayed answer is wrong. This took quite a time and several iterations of code before it was fixed. Part of the problem was the hack that CP had added to prevent the same variable appearing more than once.

I am stopping development here, as the interpreter now does everything that I want it to - which isn't actually very much. I suspect that I am much more interested in learning how to write the interpreter than actually use it!

As I wrote previously, [it's] more interesting is that CP didn't really 'understand' at first what was necessary. I could compare the development process as someone (for example, the OP) asking for something complicated from me; I would create a first version and test it to a certain extent. Then she would say, yes but it doesn't do something, so I would have to rewrite part of the program in order that it could do that specific something. Yes, but it doesn't do something else ... so on and on. The complete solution was arrived at after time - this is called the spiral methodology of programming. In this respect, CP was 'human': we started off with a short program that was continually expanded as more and more functionality was required. This interpreter get quite complicated - and only implements a subset of pure Prolog. For example, the whole topic of lists is not included.



This day in blog history:

Blog #Date TitleTags
41310/10/2011Watching the weight / 2Diet, Acupuncture
76410/10/2014Continuing to watch the weightTV series, Health, Walking
98110/10/2016Pain whilst walkingWalking
108010/10/2017HersonissosHoliday, Greece
126310/10/2019CosmologyPersonal, Gateway, Non-fiction books
134510/10/2020Simple baked salmonCooking

No comments: