\chapter{Testing and Analysis of the Performance}
In this chapter, we test the performance of the \TR implementation under
{\em XSB Prolog} (Version 1.4.1, 1994), {\em Quintus Prolog} (Version 3.1, 1994),
and {\em Sicstus Prolog} (Version 0.6, 1987). The tests on XSB Prolog and
Quintus Prolog were carried out on a Sun Sparc-5 with 32MB of main memory and
64MB of virtual memory. Some tests were also carried out on Sicstus Prolog,
but since it is an older version running on a different machine (a Sun Sparc-10
with 64MB of memory), its results are used for reference only. Most of the
tests involve graph algorithms, with the graph stored in the database.
The tests are divided into several categories, each exercising a different
feature of the implementation. These categories are:
\begin{enumerate}
\item simple updates;
\item connected components, which tests the performance in large, growing
databases in which many tuples are created and destroyed;
\item all pairs shortest paths problem, which shows the use of keys in defining
tuples and is the first test with non-linear complexity;
\item counter, which tests the use of the ``{\em set}'' command and the performance
of recursion depth on a small, fixed database; and
\item backtracking, which consists of two tests: Hamiltonian cycle and clique,
which test backtracking and undoing of updates in small and large
database, respectively.
\end{enumerate}
Some notes on the tests:
\begin{enumerate}
\item All tests were performed as the only application running on the machines
to avoid any possible effect from other processes running on the same
machine.
\item Each test was run once only, so possible variance in results from
run-to-run is unknown.
\item A fresh Prolog session was started for each test.
\item Runtime was measured by executing $statistics/1$ (a standard Prolog
predicate) before and after each transaction query. The predicate
$statistics/1$ is available in most Prologs (including XSB, Quintus, Sicstus,
and C Prolog); it prints information on memory use, including the
cputime used since the start of the Prolog session. The difference in
cputime between the two calls to $statistics/1$ is the runtime of the
transaction query.
\item All tests on XSB Prolog started with the allocation of 30,000 Kbytes
to local/global stack area and 15,000 Kbytes to trail/choice-point
stack area, i.e., XSB is started with the command {\em xsb -i -m 30000 -c
15000}.
\item Tuples were declared with a suitable size for reserved space. For example,
      in Section 5.1, the reserved space for $p/1$ is the number of
      tuples to be created in the test. These declarations affect tests
      in XSB Prolog only; in Quintus Prolog and Sicstus Prolog, predicates
      are indexed on the first argument by default.
\end{enumerate}
In all graphs, results of Prototype 3 in XSB Prolog, Quintus Prolog,
and Sicstus Prolog are labelled by XSB3, Quintus3, and Sicstus3 respectively.
Results of Prototype 4 are labelled by XSB4, Quintus4, and Sicstus4 for the
corresponding prototypes.
In the theory of algorithms, graphs are often represented as arrays,
so that any node or edge can be accessed in constant time. In a
\TR program, by declaring proper indexes on the graph predicates, tuples can
likewise be retrieved in constant time. For this reason, graph problems
run in their expected asymptotic time in the XSB version of \TRR.
All test results are plotted in two dimensions. We use three types of plots:
\begin{enumerate}
\item time versus number of nodes, edges, or components, etc.;
\item logarithm of time versus logarithm of number of nodes, edges or
components etc., ({\em log-log} graph); and
\item time versus logarithm of number of nodes, edges or components, etc.,
({\em log} graph).
\end{enumerate}
The first type of plot gives a general picture of the complexity of a
program. A straight line means the complexity is linear.
The log-log graph is used to find the degree of the polynomial when the
complexity is polynomial. For instance, suppose an algorithm has
polynomial complexity,
\begin{eqnarray}
t & = & n^p \nonumber \\
log(t) & = & p\; log(n) \nonumber
\end{eqnarray}
Then, if the graph of {\em log(t)} versus {\em log(n)} is a straight line,
the complexity of the program is polynomial and the slope of the graph is the
degree of the polynomial. Finally, the {\em log} graph is used to find the degree of the
exponent. For an algorithm with exponential complexity,
\begin{eqnarray}
t & = & a\; k^{bn} \nonumber \\
log(t) & = & log(a)\; +\; b\, n\; log(k) \nonumber
\end{eqnarray}
Therefore, if the graph of {\em log(t)} versus {\em n} is a straight line, the
complexity is exponential, and the slope of the line is $b\; log(k)$,
where $b$ is the coefficient in the exponent and $k$ is the base. Some notes on the analysis:
\begin{enumerate}
\item Logarithm with base 10 is used in this chapter.
\item Only Prototype 3 and Prototype 4 are tested because Prototype 1 and
Prototype 2 are designed for development purpose only.
\item Slopes of each graph are estimated using a least-squares fit, i.e.,
\begin{eqnarray}
slope & = & \frac{\sum x\; \sum y\; - n\; \sum xy}{(\sum x)^2\; - n\; \sum x^2} \nonumber \\
      & & \mbox{where $n$ is the number of points in the graph} \nonumber
\end{eqnarray}
\item In some of the tests, the complexity is measured in terms of the number
of edges rather than the number of nodes. Since we have chosen the
graphs with the number of nodes proportional to the number of
edges, the complexity of a problem in terms of the number of edges
is the same as in terms of the number of nodes.
\end{enumerate}
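As a sanity check, the least-squares formula above can be implemented directly. The following Python sketch (the helper name $fit\_slope$ is ours, not part of the prototypes) reproduces the expression term by term:

```python
def fit_slope(xs, ys):
    # Least-squares slope, in the same algebraic form as the formula above:
    # slope = (sum(x) * sum(y) - n * sum(xy)) / ((sum(x))^2 - n * sum(x^2))
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sx * sy - n * sxy) / (sx * sx - n * sxx)

# Points lying exactly on y = 2x + 1 should give a fitted slope of 2.
points_x = [1.0, 2.0, 3.0, 4.0]
points_y = [2 * x + 1 for x in points_x]
print(fit_slope(points_x, points_y))  # → 2.0
```

Applied to the log-log (or log) data points of a test run, the returned slope is the estimated polynomial degree (or exponent coefficient times $log(k)$).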
\section{Basic Updates}
\subsubsection{Insertions:}
In \TR programs, the insertion of a tuple differs from an assert in
Prolog, since it takes an extra step to check whether the tuple already exists
in the database. To test the implementation on the different Prolog systems,
we first compare their performance in inserting a number of tuples. We use
the following \TR program:
\begin{center}
\begin{tabular}{l}
$declare\; p/1\; size\; 20000.$ \\
$test1(0)$. \\
$test1(N) \leftarrow \; create(p(N))\; sc\; M\; is\; N-1\; sc\; test1(M).$ \\
\end{tabular}
\end{center}
\begin{figure}[t]
\hfil\epsfbox{fig51a.ps}
\caption{Graph for time required to create a number of p(X) tuples.}
\end{figure}
\begin{figure}[tb]
\hfil\epsfbox{fig51b.ps}
\caption{Log-log graph of time required to create p(X) tuples.}
\begin{center}
\begin{tabular}{l}
slope(XSB3) $\approx$\ slope(XSB4) $\approx$\ slope(Quintus3)
$\approx$\ slope(Quintus4) $\approx$\ 1.0
\end{tabular}
\end{center}
\end{figure}
If each tuple creation takes constant time, then the time for the execution
of $test1(N)$ should be linear in $N$. {\em Figure 5.1} shows the performance
of the program in Prototype 3 and Prototype 4. Since the graphs in the
$log-log$ plot in {\em Figure 5.2} are linear, the complexity of all
implementations is polynomial. In all cases, the slopes are approximately
1, so the degree of the polynomial is 1, i.e., linear. Thus, the time required to
create a single tuple is constant. Note that this test is not plotted for
Sicstus Prolog because of its lengthy execution time: in both Sicstus
prototypes, it takes over 30 minutes to create just 20,000 tuples!
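The duplicate check that makes $create$ more than a plain assert can be modeled with a short sketch (illustrative names only; a real Prolog database is of course not a Python list):

```python
db = []

def assert_tuple(t):
    # Plain Prolog-style assert: adds unconditionally, so duplicates
    # can accumulate in the database.
    db.append(t)

def create(t):
    # TR-style create: the extra step is a membership check; the update
    # fails if the tuple is already present.
    if t in db:
        return False
    db.append(t)
    return True

create(("p", 1))
print(create(("p", 1)))  # → False  (duplicate rejected)
assert_tuple(("p", 1))   # plain assert happily adds a duplicate
print(len(db))           # → 2
```

With an index on the inserted tuple's key, the membership check is a constant-time operation, which is what the linear overall times above confirm.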
\subsubsection{Indexes:}
The previous program creates a number of p-tuples in the database. As expected,
the execution time increases linearly with the number of p-tuples. However,
consider the following slightly different example for creating 20,000 binary
tuples:
\begin{center}
\begin{tabular}{l}
$declare\; p/2\; size\; 20000.$ \\
$test2(0).$ \\
$test2(N) \leftarrow\; create(p(1,N))\; sc\; M\; is\; N-1\; sc\; test2(M).$
\end{tabular}
\end{center}
\begin{figure*}[t]
\hfil\epsfbox{fig51c.ps}
\caption{Graph of creating p(N,1) tuples.}
\end{figure*}
\begin{figure*}[t]
\hfil\epsfbox{fig51d.ps}
\caption{Log-log graph of creating p(N,1).}
\begin{center}
\begin{tabular}{ll}
slope(XSB3) = 0.998 & slope(Quintus3) = 1.152 \\
slope(XSB4) = 0.994 & slope(Quintus4) = 0.994
\end{tabular}
\end{center}
\end{figure*}
The program works well in XSB Prolog. However, in Quintus Prolog, both
prototypes 3 and 4 take over half an hour to create just 20,000
tuples! The poor results are due to the default index in Quintus Prolog,
which is on the first argument. Since the first argument of $p/2$ is
always 1 in this example, every search in the database is exhaustive.
If we interchange the arguments of $p/2$, we obtain the graphs in {\em Figure 5.3}
and {\em Figure 5.4}, which show that the complexity is then linear in Quintus Prolog as well.
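The effect of first-argument indexing can be illustrated with a toy hash index (a model of the behavior, not the prototypes' code): with the index on the first argument, every $p(1,N)$ tuple falls into the same bucket, so each insertion's duplicate check scans all earlier tuples; indexing on the second argument gives one tuple per bucket.

```python
from collections import defaultdict

def create_all(tuples, index_arg):
    """Insert tuples one by one, checking for duplicates only inside the
    bucket selected by the indexed argument; returns total comparisons."""
    buckets = defaultdict(list)
    comparisons = 0
    for t in tuples:
        bucket = buckets[t[index_arg]]
        comparisons += len(bucket)   # the duplicate check scans the bucket
        if t not in bucket:
            bucket.append(t)
    return comparisons

n = 1000
tuples = [(1, i) for i in range(n)]   # p(1,N): first argument is constant
print(create_all(tuples, 0))  # one big bucket: 0+1+...+999 = 499500 comparisons
print(create_all(tuples, 1))  # one bucket per tuple: 0 comparisons
```

The quadratic total for the badly indexed case is exactly the behavior observed in Quintus Prolog before the arguments were interchanged.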
\subsubsection{Alternating Inserts and Deletes:}
Apart from inserting tuples, we can also test the performance of the
program by repeatedly creating and destroying a tuple. In the next example,
at most one tuple exists in the database at any time. However, we
should still reserve considerable tuple space in Prototype 3 (but not in
Prototype 4), because N insert tags and N delete tags for the key are
inserted into the database (where N is the number of tuples).
\begin{center}
\begin{tabular}{l}
$declare\; p/1\; size\; 20000.$ \\
$test3(0).$ \\
$test3(N) \leftarrow\; create(p(N))\; sc\; destroy(p(N))\; sc\;
M\; is\; N-1\; sc\; test3(M).$ \\
\end{tabular}
\end{center}
The result is shown in {\em Figure 5.5} and {\em Figure 5.6}.
\begin{figure*}
\hfil\epsfbox{fig51e.ps}
\caption{Results of repeatedly creating and destroying a tuple.}
\end{figure*}
\begin{figure*}
\hfil\epsfbox{fig51f.ps}
\caption{Log-log graph of repeatedly creating and destroying tuples.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 1.004 & slope(Quintus3) = 1.069 \\
slope(XSB4) = 1.001 & slope(Quintus4) = 1.014 & slope(Sicstus4) = 1.19
\end{tabular}
\end{center}
\end{figure*}
The difference between Prototype 3 and Prototype 4 is that in Prototype 4,
tuples are removed from the database immediately when they are destroyed,
whereas in Prototype 3, a delete\_tag on the key of the tuple is added to
remove the tuple from the database. So, in general, Prototype 4 should be
faster, because the database size remains fairly constant (and small)
throughout the execution.
Interestingly, the above transaction program performs better
in Prototype 3 than in Prototype 4 on XSB Prolog. This shows that it is faster
to assert a predicate than to retract one from the database. In the implementation,
$retract/1$ is used to remove a predicate from the database, and in XSB Prolog,
$retract/1$ reclaims program space every time it is executed. XSB Prolog also
provides another predicate, $retract\_nr/1$, which removes a predicate from the
database without reclaiming program space. It has a faster execution time, at
the expense of no space reclamation.
\clearpage
Considering the results of Prototype 4 only, Sicstus Prolog surprisingly
has the best performance. There are three possible explanations:
first, Sicstus Prolog optimizes tail recursion; second, retraction of
tuples is fastest in this Prolog; third, Sicstus Prolog updates small
databases quickly. However, comparing the results of Prototype 3, Sicstus
Prolog has the worst performance (not shown in the graph). The
execution of $test3(20000)$ requires over 2000 seconds to complete, whereas the
same execution on the other Prologs requires less than 20 seconds, as shown in
the graph. This result can be explained by the slow search time for a large
database in Sicstus Prolog (or, more exactly, slow search time for predicates
with large hash tables).
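The two deletion strategies can be contrasted in a small sketch (the class names are ours; a delete tag hides the tuple in the Prototype 3 style, whereas the Prototype 4 style retracts it outright):

```python
class TaggedDB:
    """Prototype-3 style: destroy adds a delete tag; the tuple itself stays."""
    def __init__(self):
        self.tuples = []       # grows with every create, never shrinks
        self.deleted = set()

    def create(self, t):
        self.tuples.append(t)
        self.deleted.discard(t)

    def destroy(self, t):
        self.deleted.add(t)    # tag the key instead of removing the tuple

    def live(self):
        return [t for t in self.tuples if t not in self.deleted]

class DirectDB:
    """Prototype-4 style: destroy retracts the tuple immediately."""
    def __init__(self):
        self.tuples = []

    def create(self, t):
        self.tuples.append(t)

    def destroy(self, t):
        self.tuples.remove(t)

    def live(self):
        return list(self.tuples)

p3, p4 = TaggedDB(), DirectDB()
for n in range(1000, 0, -1):          # the test3 loop: create then destroy p(N)
    for db in (p3, p4):
        db.create(("p", n))
        db.destroy(("p", n))
print(len(p3.tuples), len(p4.tuples))  # → 1000 0
```

Both databases agree on the visible tuples (none), but the tagged store has grown to 1000 physical entries, which is why Prototype 3 needs the reserved space mentioned above.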
\section{Bulk Updates}
\subsubsection{Bulk Insertion:}
One of the new features provided by Transaction Logic is the ability to perform bulk updates.
It provides a convenient way to copy one set of tuples to another. The following example
shows the performance of the program on bulk inserts from $p(X,Y)$ to $q(X)$. The predicate
$p/2$ is declared with the first argument as key, and the entire tuple of $q/1$ is declared as
the key. So, if the database contains $p(1,1), p(2,1), p(3,1), \ldots , p(N,1)$,
$bulk\_ins(p(X,Y),q(X))$ should create $q(1), q(2), q(3), \ldots , q(N)$ in the database.
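A minimal model of this bulk insert, assuming set semantics (the Python name $bulk\_ins$ merely mirrors the \TR primitive; the projection function is our own device):

```python
def bulk_ins(source, project):
    """Insert project(t) for every tuple t matching the source relation;
    set semantics makes duplicate results harmless."""
    return {project(t) for t in source}

N = 5
p = {(x, 1) for x in range(1, N + 1)}   # p(1,1), p(2,1), ..., p(N,1)
q = bulk_ins(p, lambda t: t[0])         # copy the key of each p tuple into q
print(sorted(q))  # → [1, 2, 3, 4, 5]
```

Since the projection touches each source tuple once, a linear running time in $N$ is the natural expectation, which the measurements below confirm.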
{\def\topfraction{.5}
\begin{figure*}[t]
\hfil\epsfbox{fig52a.ps}
\caption{Graph of the performance of bulk\_ins(p(X,Y),q(X)).}
\end{figure*}
}
\begin{figure*}[t]
\hfil\epsfbox{fig52b.ps}
\caption{Log-log graph of the performance of bulk\_ins(p(X,Y),q(X)) with N p(X,Y) tuples.}
\begin{center}
\begin{tabular}{ll}
slope(XSB3) = 1.05 & slope(Quintus3) = 1.15 \\
slope(XSB4) = 1.00 & slope(Quintus4) = 1.00
\end{tabular}
\end{center}
\end{figure*}
\begin{figure*}[t]
\hfil\epsfbox{fig52c.ps}
\caption{Graph comparing the performance of bulk deletion of N q(X) tuples.}
\end{figure*}
\subsubsection{Bulk Deletion:}
{\em Figure 5.7} shows that bulk insertion is linear in all cases. For bulk deletion,
shown in {\em Figure 5.9}, the complexity is also linear, but Prototype 4 performs much better
than \mbox{Prototype 3}. The reason is that, without the need for backtracking, the built-in
predicate $retractall/1$ can be used to bulk-delete tuples. Developing a more efficient
implementation of $bulk\_del/1$ for Prototype 3 is a subject for future work.
\clearpage
\section{Connected Components}
Having tested some basic database updates, we now compare the performance of the
\TR implementation on some graph problems. The program for finding connected
components uses a while\_loop to repeatedly apply depth-first search to all unvisited nodes.
During execution, each edge is traversed once, so the complexity should be linear in the
number of edges, and thus linear in the number of nodes for the graphs used in our tests.
It is interesting to note that the program exhaustively searches all the nodes and there are
no choice points in the rules, so there is no backtracking during the execution. We can
therefore use Prototype 4 on this problem for faster speed.
\begin{tabbing}
tabb \= findComponentsss \= :-- \= \kill
\> $declare\; unvisited/1\; size\; 1000.$ \\
\> $declare\; component/2\; size\; 1000.$ \\
\> $findComponents$ \> $\leftarrow$ \> $unvisited(P)\ :=\ node(P)\; sc$ \\
\> \> \> $while\; unvisited(N)\; do\; dfs(N,N).$ \\
\> $dfs(N,N1)$ \> $\leftarrow$ \> $create(component(N,N1))\; sc $ \\
\> \> \> $destroy(unvisited(N1))\; sc$ \\
\> \> \> $with\; [N,N1]\; while\; (edge(N1,N2),\ unvisited(N2))\;
do\; dfs(N,N2).$
\end{tabbing}
The above rules show the \TR program for finding the connected components of a graph
with 1000 nodes and a single component. The database for the graph is loaded explicitly,
and space for the nodes and edges is reserved during the declaration.
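The control flow of the above rules can be mirrored in Python. This is a sketch only: the dictionary and set stand in for the database predicates, and the function names are ours.

```python
def find_components(nodes, edges):
    """Label each node with the root of its component via depth-first search,
    following the findComponents/dfs rules above."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    unvisited = set(nodes)              # unvisited(P) := node(P)
    component = {}
    def dfs(root, n1):
        component[n1] = root            # create(component(root, n1))
        unvisited.discard(n1)           # destroy(unvisited(n1))
        for n2 in adj[n1]:
            if n2 in unvisited:         # while (edge(N1,N2), unvisited(N2))
                dfs(root, n2)
    while unvisited:                    # while unvisited(N) do dfs(N,N)
        n = next(iter(unvisited))
        dfs(n, n)
    return component

# A single cycle of 6 nodes has exactly one component.
nodes = list(range(6))
edges = [(i, (i + 1) % 6) for i in range(6)]
comp = find_components(nodes, edges)
print(len(set(comp.values())))  # → 1
```

Each edge is examined a constant number of times, so the expected running time is linear in the number of edges.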
The graph used in the test is just a simple cycle, as in the following figure:
\begin{figure*}[h]
\hfil\epsfbox{cycles.ps}
\caption{Cycles with 4 nodes and 12 nodes.}
\end{figure*}
The graph has only 1 component and the number of edges is proportional to the total
number of nodes. It is clear from the program that each node is visited once during the
execution. Since there is no backtracking involved in the execution, we can use Prototype 4
for the test. The program was tested by varying the number of nodes, and the
complexity of the program is expected to be linear.
The graph of time versus number of nodes, and the corresponding log-log graph, are
plotted in {\em Figure 5.11} and {\em Figure 5.12}.
\begin{figure*}[tb]
\hfil\epsfbox{fig53a.ps}
\caption{Graph comparing the performance for finding connected components.}
\end{figure*}
\begin{figure*}[t]
\hfil\epsfbox{fig53b.ps}
\begin{center}
\caption{Graph of log(number of nodes) against log(time) for the
connected component test.}
\begin{tabular}{lll}
slope(XSB3) = 1.157 & slope(Quintus3) = 1.982 & slope(Sicstus3) = 1.983 \\
slope(XSB4) = 1.335 & slope(Quintus4) = 1.834 & slope(Sicstus4) = 1.989 \\
\end{tabular}
\end{center}
\end{figure*}
From {\em Figure 5.11}, as expected, the program runs faster in Prototype 4 than in
Prototype 3. The improvement is not very significant because the declared predicates,
$unvisited/1$ and $component/2$, use their arguments as a key; in this case, the cost
of adding tuples to the database in Prototype 3 is similar to that in Prototype 4. For
this problem, the Prologs can be ranked from fastest to slowest as follows:
XSB $>$ Quintus Prolog $>$ Sicstus Prolog.
Considering the graph in {\em Figure 5.12}, the complexity of Prototype 3 in XSB is linear, as
expected. However, the degree of the polynomial is 1.3 in Prototype 4. One explanation for
this behavior is that as the number of nodes increases, the database size increases
linearly, but the time required for allocating trail points, check points, etc., grows
faster than linearly.
Apart from the slight defect of Prototype 4 in XSB Prolog, abnormal results also appear
for both prototypes in Sicstus Prolog and Quintus Prolog. As shown in {\em Figure 5.11}, the
complexity of the program becomes quadratic under these two systems. The reason is
that, with the limited indexing in Sicstus Prolog and Quintus Prolog, the search time
for some predicates grows linearly with the database size. This is significant for the predicate ``$component/2$''.
The first argument of component is regarded as the component name, and the second
argument is a node in the component. In this example, since there is only 1 component, the
number of component tuples equals the number of nodes, but the first argument of every
tuple is the same. With the default index on the first argument in
Sicstus Prolog and Quintus Prolog, the cumulative search time for the component predicate
therefore grows quadratically with the number of nodes.
In this case, the quadratic complexity can be overcome by interchanging the
arguments of $component/2$. The \TR program then becomes:
\vspace{3mm}
\begin{tabbing}
tabb \= findComponentsss \= :-- \= \kill
\> $declare\; unvisited/1\; size\; 1000.$ \\
\> $declare\; component/2\; key\; 1\; size\; 1000.$ \\
\> $findComponents$ \= $\leftarrow$ \= $unvisited(P)\; :=\; node(P)\; sc$ \\
\> \> \> $while\; unvisited(N)\; do\; dfs(N,N).$ \\
\> $dfs(N,N1)$ \> $\leftarrow$ \> $create(component(N1,N))\; sc$ \\
\> \> \> $destroy(unvisited(N1))\; sc$ \\
\> \> \> $with\; [N,N1]\; while\; (edge(N1,N2),\ unvisited(N2))\; do\; dfs(N,N2).$
\end{tabbing}
\vspace{3mm}
\begin{figure}
\hfil\epsfbox{fig53c.ps}
\centering
\caption{Graph comparing the performance of finding connected components
after reversing the arguments in component/2.}
\end{figure}
In this way, the indexing will be on the name of the node rather than the name of the
component. Since each node is in one and only one component, the search time should be
linear. The result of testing this program is plotted in {\em Figure 5.13} and {\em Figure 5.14}.
As shown in the above graphs, the performance of XSB Prolog is roughly the same in both
versions of the problem. However, the performance in Quintus Prolog has improved a lot.
In the second version, the complexity is much closer to linear (degrees of polynomial are
1.35 and 1.2 for Prototype 3 and Prototype 4, respectively). This means that the order of
arguments in storing tuples is important for running \TR programs in Quintus Prolog.
Although Sicstus Prolog has a default index on the first argument, the complexity of the
second version running on Sicstus Prolog is still quadratic in both prototypes. The poor
performance can be explained by Sicstus's slow tuple-access time in large databases, as
shown in Section 5.1. Although the first argument of the predicate is indexed, the
improvement is not significant enough to overcome the increase in search time caused by
the growth of the database.
\begin{figure}
\hfil\epsfbox{fig53d.ps}
\caption{Graph of log(time) versus log(number of nodes) for the problem of
finding connected components after reversing the arguments in component/2.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 1.02 & slope(Quintus3) = 1.352 & slope(Sicstus3) = 1.985 \\
slope(XSB4) = 1.181 & slope(Quintus4) = 1.204 & slope(Sicstus4) = 1.989 \\
\end{tabular}
\end{center}
\end{figure}
One may complain that the graphs in the previous test are too simple, and doubt whether the
program will work well on more complex graphs. To address this issue, we have performed
some tests on cycles of cliques with clique-size 4 (as in {\em Figure 5.15}).
\begin{figure*}[h]
\hfil\epsfbox{clique.ps}
\caption{Cycles of 4 cliques and 8 cliques with clique-size 4.}
\end{figure*}
{\em Figure 5.16} and {\em Figure 5.17} show the results of applying the second version of the
connected-components program to a cycle of cliques with clique-size 4. The performance
of the program on the clique graphs is very similar to its performance on the simple cycle.
\begin{figure}
\hfil\epsfbox{fig53e.ps}
\caption{Performance of finding connected components using a cycle of
cliques with clique-size 4.}
\end{figure}
\begin{figure}
\hfil\epsfbox{fig53f.ps}
\caption{Log-log graph of finding connected components with a cycle of cliques with
\mbox{clique-size 4}.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 0.968 & slope(Quintus3) = 1.156 & slope(Sicstus3) = 1.970 \\
slope(XSB4) = 1.036 & slope(Quintus4) = 1.138 & slope(Sicstus4) = 1.986 \\
\end{tabular}
\end{center}
\end{figure}
\clearpage
\section{All Pairs Shortest Paths Problem}
This example shows the advantage of using keys. Intuitively, the $dist(X,Y,Z)$ predicate
means that the minimum path length from node $X$ to node $Y$ is $Z$. Since each path has a
unique minimum length, the predicate can be defined with the first two arguments as key,
leaving the last argument as the value. We can use the $set$ predicate to change these
distance values. The program executes the minimum-path algorithm $n$ times. Since the
minimum-path algorithm searches each edge once, and the number of edges is proportional to
the number of nodes in our graphs, the minimum-path algorithm is of order $n$. So, the
complexity of the program should be on the order of $n^2$. A detailed explanation of the
program can be found in Section 3.5.1.
\begin{tabbing}
tabbingg \= declarrr \= with r \= \kill
\> {\em declare tempD/1 key 0.} \\
\> {\em declare front/1.} \\
\> {\em declare dist/3 key 2 size 100.} \\
\\
\> $apsp\ \leftarrow$ \> $create(tempD(0))\; sc$ \\
\> \> $forall\; [R]\; in\; node(R)\; do\; bfs(R).$ \\
\\
\> $bfs(R)\ \leftarrow $ \> $bulk\_del(front(X))\; sc$ \\
\> \> $set(tempD(0))\; sc$ \\
\> \> $create(front(R))\; sc$ \\
\> \> $create(dist(R,R,0))\; sc$ \\
\> \> $with\; [R]\; while\; newFront(R,\_)\; do$ \\
\> \> ( \\
\> \> \> $tempD(X)\ sc$ \\
\> \> \> $D\ is\ X+1\ sc$ \\
\> \> \> $set(tempD(D))\ sc$ \\
\> \> \> $front(M)\ :=\ newFront(R,M)\; sc$ \\
\> \> \> $bulk\_ins(front(N),\ dist(R,N,D))$ \\
\> \> ).
\end{tabbing}
Note that since $newFront(\_,\_)$ is in the condition of a while loop, it must be defined in the
database, rather than in the transaction base. Therefore, rules defining $newFront/2$ are not
included in the above transaction program. These rules should be loaded into the database
separately. The definition of $newFront/2$ is as follows:
\begin{eqnarray*}
newFront(R,N) & :- & front(M),\; edge(M,N),\; not(visited(R,N)). \\
visited(R,N) & :- & dist(R,N,\_).
\end{eqnarray*}
In the worst case, to calculate the minimum paths from one node to all others, every edge is
visited once. Therefore, the complexity is of the order $e \times n$, or $n^3$ in the worst case,
where $e$ is the number of edges and $n$ is the number of nodes. In our graphs, $e$ is
proportional to $n$, so the expected complexity is on the order of $n^2$.
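The $bfs$ transaction above amounts to a frontier-by-frontier breadth-first search from each node; the same computation can be sketched as follows (Python sets standing in for the $front$, $dist$, and $tempD$ predicates):

```python
def apsp(nodes, edges):
    """All-pairs shortest paths by one BFS per node, mirroring the apsp/bfs
    transactions: dist[(r, n)] is the minimum path length from r to n."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {}
    for r in nodes:                          # forall [R] in node(R) do bfs(R)
        d = 0                                # set(tempD(0))
        dist[(r, r)] = 0                     # create(dist(R,R,0))
        front = {r}                          # create(front(R))
        while front:                         # while newFront(R,_) do
            d += 1                           # D is X+1 sc set(tempD(D))
            front = {n for m in front for n in adj[m]
                     if (r, n) not in dist}  # front(M) := newFront(R,M)
            for n in front:
                dist[(r, n)] = d             # bulk_ins(front(N), dist(R,N,D))
    return dist

# On a 5-node cycle the farthest pairs are 2 apart.
nodes = list(range(5))
edges = [(i, (i + 1) % 5) for i in range(5)]
d = apsp(nodes, edges)
print(d[(0, 2)], d[(0, 3)])  # → 2 2
```

Each BFS touches every edge a constant number of times, giving the $e \times n$ bound derived above.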
Like the connected-components problem, the above program is tested on a sequence of
graphs of increasing size, each consisting of a single cycle of nodes. Such graphs force the
loop to iterate as much as possible. The test results are shown in {\em Figure 5.18} and the
corresponding log-log graph is shown in {\em Figure 5.19}.
\begin{figure}
\hfil\epsfbox{fig54a.ps}
\caption{Result of All Pairs Shortest Path Problem.}
\end{figure}
As expected, the complexity of the program is approximately quadratic when running
under XSB Prolog. The polynomial degree under Sicstus Prolog is 1 higher than
under XSB Prolog, again because of the $O(n)$ access time to individual
tuples. However, Quintus Prolog shows good complexity on this problem, also close to
quadratic, especially in Prototype 4. One explanation for Quintus's good performance on
this problem is that it may use larger hash tables for indexing predicates than
Sicstus does. So, when the number of nodes is not too large (less than 100), the increase in
access time with the number of nodes is negligible.
\begin{figure}
\hfil\epsfbox{fig54b.ps}
\caption{Log-log graph for all pair shortest path problem.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 2.40 & slope(Quintus3) = 2.79 & slope(Sicstus3) = 3.62 \\
slope(XSB4) = 2.40 & slope(Quintus4) = 2.21 & slope(Sicstus4) = 3.68 \\
\end{tabular}
\end{center}
\end{figure}
\begin{figure}
\hfil\epsfbox{fig54c.ps}
\caption{Performance of All Pairs Shortest Path Programs on clique graphs.}
\end{figure}
\begin{figure}
\hfil\epsfbox{fig54d.ps}
\caption{Log-log graph of All Pairs Shortest Path Programs on Clique Graphs.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 2.30 & slope(Quintus3) = 2.53 & slope(Sicstus3) = 3.67 \\
slope(XSB4) = 2.11 & slope(Quintus4) = 2.29 & slope(Sicstus4) = 3.71 \\
\end{tabular}
\end{center}
\end{figure}
{\em Figures 5.20} and {\em 5.21} show the results of testing the program on cycles of cliques (as in
Section 5.3). The program seems to perform better on the clique graphs, since the results are
close to quadratic in both XSB and Quintus Prolog.
\clearpage
\subsubsection{Floyd's Algorithm:}
The all pairs shortest path problem can also be solved using an algorithm called
``{\em Floyd's Algorithm}''. The following shows a transaction program for a modified version
of Floyd's algorithm:
\begin{tabbing}
tabbb \= apsp arrow \= forall \= fff \kill
\> {\em declare dist/3 key 2 size 100.} \\
\> $apsp\; \leftarrow $ \> $dist(X,Y,Z)\ :=\ distInit(X,Y,Z)\; sc$ \\
\> \> {\em forall [K] in node(K) do} \\
\> \> ( \\
\> \> \> {\em with [K] forall [I,J,D] in newDist(I,J,D,K) do} \\
\> \> \> ( \\
\> \> \> \em if dist(I,J,X) \\
\> \> \> then set dist(I,J,D) \\
\> \> \> else create dist(I,J,D) \\
\> \> \> ) \\
\> \> ).
\end{tabbing}
And the following database rules define the predicates $distInit/3$ and $newDist/4$:
\begin{small}
\begin{eqnarray*}
distInit(I,J,1) & :- & edge(I,J). \\
distInit(I,I,0) & :- & node(I). \\
newDist(I,J,D2,K) & :- & dist2(I,J,D2,K), dist(I,J,D1), (D2 < D1). \\
newDist(I,J,D,K) & :- & dist2(I,J,D,K), not(dist(I,J,\_)).\\
dist2(I,J,D,K) & :- & dist(K,I,D1), dist(K,J,D2), (D is D1+D2).
\end{eqnarray*}
\end{small}
The predicate $distInit(I,J,N)$ returns the initial distance between node $I$ and node $J$:
if $I$ and $J$ are the same, then $N$ is 0; if there is an edge between $I$ and $J$, then $N$ is 1.
The predicate $dist2(I,J,D,K)$ returns the distance $D$ from node $I$ to node $J$ via
node $K$. The predicate $newDist(I,J,D,K)$ returns the distance $D$ from node $I$ to
node $J$ via node $K$, provided that no distance from $I$ to $J$ is recorded in the
database or $D$ is shorter than the distance recorded in the database.
The transaction $apsp$ first initializes all possible distance tuples from the graph using
$distInit/3$. Then, for every node $K$, it uses $newDist(I,J,D,K)$ to search for a new
distance from node $I$ to node $J$ using node $K$ as an intermediate node. If there is no
distance recorded from node $I$ to node $J$, then $dist(I,J,D)$ is created. Otherwise, the
distance from $I$ to $J$ is set to $D$, because it is shorter than the distance currently
stored in the database.
\subsubsection{\em Example:}
\begin{enumerate}
\item With a ``line'' graph of 5 nodes, the initial distances are marked as shown in the graph
(paths with zero distance are omitted).
\begin{figure}[h]
\hfil\epsfbox{floyd1.ps}
\end{figure}
\item Suppose the program iterates by choosing nodes in the order $n1$, $n2$, $n3$, $n4$, $n5$.
In the first iteration, no new distance is added.
\item In the second iteration, the distance between $n1$ and $n3$ is added because $dist(n1,n2,1)$
and $dist(n2,n3,1)$ are in the database.
\begin{figure}[h]
\hfil\epsfbox{floyd2.ps}
\end{figure}
\item In the next iteration, since $dist(n1,n3,2)$ and $dist(n3,n4,1)$ are in the database, the
new distance between $n1$ and $n4$ is added. The tuple $dist(n2,n4,2)$ is also added in
this iteration.
\begin{figure}[h]
\hfil\epsfbox{floyd3.ps}
\end{figure}
\item In the next iteration, the distances between $n3$ and $n5$, $n2$ and $n5$, and $n1$ and $n5$
are added, as shown in the figure.
\clearpage
\begin{figure}[h]
\hfil\epsfbox{floyd4.ps}
\end{figure}
\item In the last iteration, $n5$ is used but no new distance tuples are found.
\end{enumerate}
Note: In the program, if $dist(X,Y,N)$ is added in an iteration, then $dist(Y,X,N)$ is also
added.
{\em Figures 5.23} and {\em 5.24} show the performance of the program on ``line'' graphs of the
following form:
\begin{figure}[h]
\hfil\epsfbox{floyd5.ps}
\caption{Line Graph with 9 nodes.}
\end{figure}
\begin{figure}[h]
\hfil\epsfbox{fig54e.ps}
\caption{Performance of modified Floyd's algorithm on line graphs.}
\end{figure}
\begin{figure}[t]
\hfil\epsfbox{fig54f.ps}
\caption{Log-log graph of modified Floyd's algorithm on line graphs.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 2.77 & slope(Quintus3) = 3.11 & slope(Sicstus3) = 4.52 \\
slope(XSB4) = 2.55 & slope(Quintus4) = 2.84 & slope(Sicstus4) = 4.52 \\
\end{tabular}
\end{center}
\end{figure}
\section{Binary Counter}
This program runs on a relatively small database. Also, unlike previous tests, the database
size does not change during execution. Predicates for the counter program are defined as
follows:
\begin{center}
\begin{tabular}{l}
declare bit/2 key 1 size 13. \\
declare max/1 key 0. \\
declare pos/1 key 0. \\
declare num/1 key 0.
\end{tabular}
\end{center}
Intuitively, $bit(I,V)$ means that bit $I$ of the counter has value $V$. Since each bit
has just one value, $I$ can be selected as the key. The remaining tuples can be thought of
as global variables: the names of the predicates correspond to variable names, and each
variable has just one value, so for each of these predicates the predicate name alone is
the key. The predicate $max/1$ stores the maximum bit of the counter, $pos/1$ stores the
current position of a bit pointer during execution, and $num/1$ is a temporary predicate
holding the number of increments performed by $count/1$.
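The kind of ripple-carry increment that the {\em set} command supports on this representation can be sketched as follows; the function name and the little-endian bit list are our own illustration, not the thesis program:

```python
def increment(bits):
    """Ripple-carry increment of a little-endian bit list in place:
    flip 1-bits to 0 until the first 0-bit, which becomes 1 (set(bit(I,V)))."""
    for i, b in enumerate(bits):
        if b == 0:
            bits[i] = 1          # carry absorbed: stop here
            return
        bits[i] = 0              # carry ripples to the next bit

M = 13                            # max/1: the number of bits in the counter
bits = [0] * M                    # all bits initialized to 0
for _ in range(100):              # count to 100
    increment(bits)
value = sum(b << i for i, b in enumerate(bits))
print(value)  # → 100
```

Because the database here is small and of fixed size, this test isolates the cost of $set$ and of recursion depth rather than the cost of database growth.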
\begin{eqnarray}
initialize(M) & \leftarrow & create\; num(0)\; sc \\
& & create\; max(M)\; sc \nonumber \\
& & create\; pos(0)\; sc \nonumber \\
& & with\; [M]\; while\; (pos(N),\; N < M)\; do\; \ldots \nonumber
\end{eqnarray}
\section{Backtracking}
\subsubsection{Hamiltonian Cycle:}
The first backtracking test searches for a Hamiltonian cycle in a graph.
The \TR program is:
\begin{tabbing}
tabb \= searchpathsss \= :-- \= \kill
\> {\em declare unvisited/1 size 10.} \\
\> {\em declare start/1.} \\
\> {\em declare component/2 size 10.} \\
\> {\em findhamo} \> $\leftarrow$ \> {\em unvisited(P) := node(P) sc} \\
\> \> \> {\em destroy(unvisited(X)) sc ! sc} \\
\> \> \> {\em searchpath(X,X).} \\
\> {\em searchpath(S,X)} \> $\leftarrow$ \> {\em edge(X,Y) sc} \\
\> \> \> {\em destroy(unvisited(Y)) sc} \\
\> \> \> {\em create(component(X,Y)) sc} \\
\> \> \> {\em searchpath(S,Y).} \\
\> {\em searchpath(S,X)} \> $\leftarrow$ \> {\em not(unvisited(\_)) sc} \\
\> \> \> {\em edge(X,S) sc} \\
\> \> \> {\em create(component(X,S)).}
\end{tabbing}
An explanation of the above \TR implementation is given in Section 3.5.2.
The query ``{\em findhamo sc fail}'' explores all paths in the graph; our tests are based on this
query. Without selecting the graph carefully, the number of possible paths may grow
exponentially with the number of nodes. For example, a complete graph has on the order of
$n!$ Hamiltonian-cycle traversals, where $n$ is the number of nodes, so a complete graph
with 10 nodes already has over 3 million possible cycles, which would take a very long time
to search. In our tests, each graph is a cycle of C3 graphs, as in {\em Figure 5.27}.
\begin{figure}[t]
\hfil\epsfbox{c3cycle.ps}
\caption{Cycles with 4 and 10 C3 Graphs.}
\end{figure}
Such graphs have just one Hamiltonian cycle. As described above, the Hamiltonian-cycle
problem is NP-complete, so the execution time should grow exponentially with the
number of nodes. The expected complexity of the program is $2^e$, where $e$ is the number of
edges, because the addition of one edge doubles the possible number of search paths: the
original search paths plus the new paths that result from including the new edge in the
original paths. Again, since the number of edges is chosen to be proportional to the number
of nodes, the complexity of the program should be $2^{O(n)}$.
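The shape of the test graphs and the exhaustive backtracking search can be sketched in Python. The construction below is our own reading of {\em Figure 5.27} (triangles joined in a ring by one bridge edge each), not necessarily the thesis's exact graphs:

```python
def cycle_of_triangles(k):
    """Undirected graph of k C3 graphs (triangles) joined in a ring:
    triangle i uses nodes 3i, 3i+1, 3i+2, and node 3i+2 is bridged to
    node 3(i+1) mod 3k of the next triangle.
    NOTE: this construction is an assumption based on Figure 5.27."""
    n = 3 * k
    adj = {v: set() for v in range(n)}
    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)
    for i in range(k):
        a, b, c = 3 * i, 3 * i + 1, 3 * i + 2
        link(a, b); link(b, c); link(a, c)   # the triangle itself
        link(c, (3 * (i + 1)) % n)           # bridge to the next triangle
    return adj

def count_hamiltonian_cycles(adj, start=0):
    """Count directed Hamiltonian cycles through `start`, the way
    findhamo/searchpath enumerate them: extend a path over unvisited
    nodes, then close it back to the start."""
    n = len(adj)
    visited = {start}
    found = 0
    def extend(x):
        nonlocal found
        if len(visited) == n:
            if start in adj[x]:              # close the cycle
                found += 1
            return
        for y in adj[x]:
            if y not in visited:
                visited.add(y)
                extend(y)
                visited.remove(y)            # undo, like a backtracked update
    extend(start)
    return found
```

On such a ring of triangles the search finds exactly one undirected Hamiltonian cycle (two directed traversals), while on a complete graph the count grows factorially, which is why the test graphs are chosen this way.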
{\em Figure 5.28} shows the graph of time versus the number of nodes, and {\em Figure 5.29} shows a
graph of $log(time)$ against number of nodes in order to deduce the degree of the
exponential.
As shown at the beginning of this chapter, the slope of the log-n graph is the product of the
degree of the exponential and the log of the base of the exponential. Since the base of the
exponential is 2, the degree of the exponential is $slope/log(2)$. The slopes in {\em Figure 5.29} are
approximately 1.42--1.46 in all three versions of Prototype 3:
\begin{eqnarray*}
degree\; of\; exponential & = & slope/log(2) \\
& = & 1.42/log(2) \\
& \approx & 1
\end{eqnarray*}
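The conversion from the slope of a log plot to the degree of the exponential can be sanity-checked on synthetic data. The Python sketch below (the constants and the choice of base-10 logarithms are our own, for illustration) fits the slope of $log(time)$ against $n$ for running times that grow as $2^n$ and recovers a degree of 1:

```python
import math

def fitted_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic running times t(n) = c * 2**(d*n).  On a plot of
# log10(t) against n the slope is d * log10(2), so the degree of
# the exponential is recovered as slope / log10(2).
ns = list(range(6, 16))
times = [0.01 * 2 ** n for n in ns]          # degree d = 1, c = 0.01
slope = fitted_slope(ns, [math.log10(t) for t in times])
degree = slope / math.log10(2)               # recovers d = 1
```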
Therefore, the program runs with the expected complexity, i.e., $2^{O(n)}$. Again, from
these graphs, Sicstus Prolog is the fastest Prolog for this program. This again shows
the strength of Sicstus on small databases (note that there are only 24 nodes and 36 edges in
the largest graph). However, it is disappointing that we cannot test the program on larger
graphs: due to the exponential complexity of the program, a graph with 20 nodes already
takes over 5 hours to test.
\begin{figure}
\hfil\epsfbox{fig56a.ps}
\caption{Graph of number of nodes versus time for executing Hamiltonian-cycle
searching program.}
\end{figure}
\begin{figure}
\hfil\epsfbox{fig56b.ps}
\caption{Log-n graph of number of nodes versus log(time) for finding Hamiltonian cycles.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 1.42 & slope(Quintus3) = 1.42 & slope(Sicstus3) = 1.46
\end{tabular}
\end{center}
\end{figure}
\clearpage
\section{Clique}
As discussed above, the Hamiltonian-cycle problem is suitable for small graphs only. To
exercise the backtracking capabilities of \TR on larger databases, this section uses the
$k$-clique problem. The maximum-clique-size problem is a well-known NP-complete problem.
However, it becomes a polynomial-time problem if the question is rephrased to that of
finding a clique of a fixed size $k$. The following \TR program (described in Section 3.5.2)
succeeds if and only if the graph in the database has a clique of size $k$.
\begin{tabbing}
tabb \= buildClique(K,N1) \= arrowww \= \kill
\> {\em declare candidate/1.} \\
\> {\em @ Return true iff the graph has a clique of size K} \\
\> {\em cliqueSize(K)} \> $\leftarrow $ \> {\em node(N) sc} \\
\> \> \> {\em candidate(CN) := edge(N,CN) sc} \\
\> \> \> {\em buildClique(K,N).} \\
\\
\> {\em @ Attempts to build a clique of size K containing node N1 from candidate nodes.} \\
\> {\em buildClique(1,\_).} \\
\> {\em buildClique(K,N1)} \> $\leftarrow$ \> $K\; >\; 1\; sc$ \\
\> \> \> {\em candidate(CN) := newCandidate(N1,CN) sc} \\
\> \> \> {\em candidate(N2) sc} \\
\> \> \> {\em K1 is K-1 sc} \\
\> \> \> {\em buildClique(K1,N2).}
\end{tabbing}
The graphs in our tests are chains of complete graphs having 5 nodes each, as in {\em Figure
5.31}. Such graphs have a maximum clique size of 5. If we ask the above program to find a
clique of size 5, the execution should take constant time on all the graphs, because every
node belongs to a clique of size 5, and the first one found will be chosen. However, when
the program is asked for a clique of size 6, then, since there is no clique of size 6, all
cliques will be exhaustively generated. The graphs are constructed so that the number of
cliques is linear in $N$, the number of nodes. Thus, execution time should be linear.
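This behaviour can be reproduced with a small Python transliteration of $cliqueSize$/$buildClique$. The ring-of-cliques construction is our own reading of {\em Figure 5.30} (adjacent 5-cliques sharing one node), not the thesis's exact graphs:

```python
def cycle_of_cliques(k, size=5):
    """k complete graphs of `size` nodes joined in a ring, adjacent
    cliques sharing one node, for k*(size-1) nodes in total.
    NOTE: assumed construction, consistent with the 500-clique /
    2000-node figures quoted for size-5 cliques."""
    n = k * (size - 1)
    adj = {v: set() for v in range(n)}
    for i in range(k):
        block = [(i * (size - 1) + j) % n for j in range(size)]
        for u in block:
            for v in block:
                if u != v:
                    adj[u].add(v)
    return adj

def clique_size(adj, k):
    """True iff the graph has a clique of k nodes.  Mirrors
    cliqueSize/buildClique: pick a node, keep its neighbours as
    candidates, then repeatedly pick a candidate and intersect the
    candidate set with that candidate's neighbours."""
    def build(candidates, still_needed):
        if still_needed == 0:
            return True
        return any(build(candidates & adj[n2], still_needed - 1)
                   for n2 in candidates)
    return any(build(adj[n], k - 1) for n in adj)
```

Asking for a clique of size 5 succeeds almost immediately, while asking for size 6 must exhaust every candidate set before failing, which is the worst case the tests measure.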
\subsubsection{\em Explanation of the complexity:}
Consider a graph consisting of a cycle of $K$ cliques of size 10 (similar to {\em Figure 5.30}). It
is obvious from the program that before a clique of size 10 is found, cliques of sizes 9,
8, ..., 1 are found. Therefore, in the worst case, the complexity of finding a clique of
size 10 starting from a particular node $N$ is equivalent to that of finding all cliques
containing $N$ that have size less than 10. If a node is not at the intersection between two
cliques, then, since its neighbours within its clique form a complete subgraph of 9 nodes,
there are in total $9!$ cliques of size less than 10 containing it. For nodes at the
intersections, the possible number of cliques is $2 \cdot 9!$. Now, such a graph has $8K$
nodes that are not at an intersection and $K$ nodes that are, so the worst-case complexity
of finding a clique of size 10 is $8K \cdot 9! + 2K \cdot 9! = 10K \cdot 9!$, which is
linear in $K$.
The program above is exponential in $K$, the requested clique size. However, by keeping $K$
fixed, the execution time should increase linearly with the number of nodes in our graphs.
The following tests use graphs consisting of a cycle of cliques with a fixed maximum clique
size. Again, since we have to test the worst case of the program, we force the program to
check all possible cliques by setting the requested clique size $K$ to be one higher than the
maximum clique size of the graph. Since the purpose of running this program is to test the
performance of backtracking on large databases, we use bigger graphs with a small maximum
clique size. {\em Figure 5.30} shows a graph consisting of a cycle of cliques of size 5. With
this construction, we can use a cycle with 500 cliques (or 2000 nodes) for the test.
\begin{figure}
\hfil\epsfbox{5clique.ps}
\caption{A graph of 8 cliques with size 5.}
\end{figure}
To force an exhaustive search of all cliques, the tests were run with the query
$cliqueSize(6)$, i.e., checking for a clique of size 6, which does not exist.
\begin{figure}
\hfil\epsfbox{fig57a.ps}
\caption{Performance on graphs with maximum clique size of 5.}
\end{figure}
\begin{figure}
\hfil\epsfbox{fig57b.ps}
\caption{Log-log graph of log(number of cliques) versus log(time) for clique problem.}
\begin{center}
\begin{tabular}{lll}
slope(XSB3) = 0.998 & slope(Quintus3) = 0.991 & slope(Sicstus3) = 1.887 \\
\end{tabular}
\end{center}
\end{figure}
Apart from Sicstus Prolog, the observed complexity is linear, which is what we expected.
When the graph is small, the performance of Sicstus Prolog is very close to that of the
others. However, as the size of the graph increases, Sicstus Prolog becomes much slower
than the others, and its complexity is quadratic. Again, this is its weakness on large
databases. In this example, the complexity of the program in Quintus Prolog is also linear
because, apart from the $edge/2$ predicate, none of the tuples is defined with more than one
argument as a key. Although indexing is needed on both arguments of $edge/2$, and Quintus
Prolog provides indexing on the first argument only, each node has at most ten edges on it,
so each access to $edge/2$ still takes roughly constant time, keeping the total linear.
\subsubsection{Conclusions}
The above tests show that, among the three Prologs, XSB Prolog performs in the most
understandable way and achieves the expected complexities in most cases. On the other
hand, although Sicstus Prolog showed its superiority on small databases, it did not achieve
the expected complexity in tests with large databases. Quintus Prolog performs very
unpredictably in many tests: sometimes it performs well, but sometimes it performs the
worst without any obvious reason. For example, Quintus performs much better in the
clique problem than in the Hamiltonian-cycle and connected-component problems.
Also, although Prototype 3 on Quintus Prolog shows non-linearity in the counter test,
Prototype 4 on Quintus Prolog performs even better than XSB Prolog on the same test.