cell patterns. Fig.3.2shows a torus world of size 4 and the state transitions by Rule 110 with the initial configuration(1,2,3,4) = (0,0,1,0). The columns numbered (4) and (1) are thus identical to columns 4 and 1, respectively. Note that the configurations reach the attractor,(1,1,1,0)→(1,0,1,1)→(1,1,1,0).
Due to these approximations, the number of possible state transitions can be made smaller in the case of elementary CAs like Fig.3.2. Our learning framework can handle both limited frames and torus worlds by considering adequate state transitions representation as input. For example, to represent a torus world of size 4, a configuration is represented by a vector with 6 elements (0,1,2,3,4,5):1,2,3,4respectively represent their values in the corresponding cells, and0,5 respectively represent the values of cells 4 and 1 (colored gray when the value is 1). This last information can be represented in a background theory as the two rules with the time argument:
c(0, t)←c(4, t), c(5, t)←c(1, t),
wherec(x) represents a cellx andc(x, t) is its state at time stept. Unfortunately, these two rules do not have corresponding rules without the time argument, since the head literals refer the time steptinstead oft+ 1. Hence, simple removal of the time argument from the both sides changes the dynamic meaning of the NLP in application of theTP operator that infers about the next time step. Then, without the time argument, we should copy the rules forc(4)to those for c(0), and copy the rules forc(1)to those forc(5).
Now we use non-ground resolution and consider the following two biases:
• Bias I: The body of each rule contains at mostnneighbor literals.
TABLE3.7: LF1T algorithm with Bias I on Rule 110 in Torus world
Step I →J Operation Rule ID P
1 000100→001100 R001c(2) c(2)← ¬c(1)∧ ¬c(2)∧c(3) 1 1 R010c(3) c(3)← ¬c(2)∧c(3)∧ ¬c(4) 2 1,2 2 001100→011101 R001c(1) c(1)← ¬c(0)∧ ¬c(1)∧c(2) 3
lg(3,1) c(x)← ¬c(x−1)∧ ¬c(x)∧c(x+ 1) 4 2,4 R011c(2) c(2)← ¬c(1)∧c(2)∧c(3) 5
res(5,4) c(2)← ¬c(1)∧c(3) 6 2,4,6 R110c(3) c(3)←c(2)∧c(3)∧ ¬c(4) 7
res(7,2) c(3)←c(3)∧ ¬c(4) 8 4,6,8 3 011101→110111 R011c(1) c(1)← ¬c(0)∧c(1)∧c(2) 9
lg(9,6) c(x)← ¬c(x−1)∧c(x)∧c(x+ 1) 10
lg(10,4) c(x)← ¬c(x−1)∧c(x+ 1) 11 8,11 R110c(3) c(3)←c(2)∧c(3)∧ ¬c(4) 12
R101c(4) c(4)←c(3)∧ ¬c(4)∧c(5) 13
res(13,11) c(4)← ¬c(4)∧c(5) 14 8,11,14 4 110111→011101 R110c(1) c(1)←c(0)∧c(1)∧ ¬c(2) 15
lg(15,8) c(x)←c(x−1)∧c(x)∧ ¬c(x+ 1) 16 8,11,14,16 R101c(2) c(2)←c(1)∧ ¬c(2)∧c(3) 17
lg(17,14) c(x)←c(x−1)∧ ¬c(x)∧c(x+ 1) 18
res(18,11) c(x)← ¬c(x)∧c(x+ 1) 19 8,11,16,19 R011c(3) c(3)← ¬c(2)∧c(3)∧c(4) 20
res(20,19) c(3)←c(3)∧c(4) 21
res(21,19) c(3)←c(4) 22 11,16,19,22 res(8,22) c(3)←c(3) 23 11,16,19,22,23
• Bias II: The rules are universal for every time step and for any position. This means that the same states of the neighbor cells always implies the same state in the center cell at the next time step.
Combining these two biases, we can adaptLF1Tto learn dynamics of CAs. Using Bias I, the rule construction process only considersnliterals (heren = 3) in the neighbors of the cell in the body of a rule. With Bias I, ground resolution is not sufficient to compare non-ground rules with ground rules, for that we need non-ground resolution. We apply anti-instantiation (AI) for getting universal rules with Bias II, whenever a newly added ruleRIAis not subsumed by any rule in the current program. We can guarantee the soundness of this generalization under Bias II. However, without Bias I, we cannot determine the body literals for construction of each universal rule, so that we must examine the effects from non-neighbor cells too.
TABLE3.8: LF1T algorithm with Biases I and II on Rule 110 in Torus world
Step I →J Operation Rule ID P
1 000100→001100 R001c(2) c(2)← ¬c(1)∧ ¬c(2)∧c(3) 1
ai(1) c(x)← ¬c(x−1)∧ ¬c(x)∧c(x+ 1) 2 2 R010c(3) c(3)← ¬c(2)∧c(3)∧ ¬c(4) 3
ai(3) c(x)← ¬c(x−1)∧c(x)∧ ¬c(x+ 1) 4 2,4 2 001100→011101 R001c(1) c(1)← ¬c(0)∧ ¬c(1)∧c(2) 5
R011c(2) c(2)← ¬c(1)∧c(2)∧c(3) 6 ai(6) c(x)← ¬c(x−1)∧c(x)∧c(x+ 1) 7
res(7,2) c(x)← ¬c(x−1)∧c(x+ 1) 8 8,4 res(7,4) c(x)← ¬c(x−1)∧c(x) 9 8,9
R110c(3) c(3)←c(2)∧c(3)∧ ¬c(4) 10 ai(10) c(x)← ¬c(x−1)∧c(x)∧ ¬c(x+ 1) 11
res(11,9) c(x)←c(x)∧ ¬c(x+ 1) 12 8,9,12 3 011101→110111 R011c(1) c(1)← ¬c(0)∧c(1)∧c(2) 13
R110c(3) c(3)←c(2)∧c(3)∧ ¬c(4) 14 R101c(4) c(4)←c(3)∧ ¬c(4)∧c(5) 15 ai(15) c(x)←c(x−1)∧ ¬c(x)∧c(x+ 1) 16
res(16,8) c(x)← ¬c(x)∧c(x+ 1) 17 8,9,12,17 4 110111→011101 R110c(1) c(1)←c(0)∧c(1)∧ ¬c(2) 18
R101c(2) c(2)←c(1)∧ ¬c(2)∧c(3) 19
R011c(3) c(3)← ¬c(2)∧c(3)∧c(4) 20 8,9,12,17
Using Bias I only,LF1Tin a limited frame of width 4 learns the following rules for Wolfram’s Rule 110:
c(2)← ¬c(2)∧c(3), c(3)← ¬c(2)∧c(3), c(3)←c(3)∧ ¬c(4),
c(x)← ¬c(x−1)∧c(x+ 1), c(x)←c(x−1)∧c(x)∧ ¬c(x+ 1).
(3.3)
Instead, when we use a torus world of length 4 for Wolfram’s Rule 110 inLF1T with Bias I only, Table3.7shows the learning process2and the following NLP is obtained:
c(3)←c(3), c(3)←c(4),
c(x)← ¬c(x)∧c(x+ 1), c(x)← ¬c(x−1)∧c(x+ 1), c(x)←c(x−1)∧c(x)∧ ¬c(x+ 1).
(3.4)
2In Tables3.7and 3.8, interpretationsI andJare represented as configurations, that is, c(i) ∈ Iiff c(i)is true. Operationlg(R1, R2)takes the least generalization ofR1andR2with the same pattern, which generalizes the common terms inR1andR2into variables, andai(R)takes the anti-instantiation ofR.
Both programs (3.3) and (3.4) are quite different from the original rules in Table3.6. On the other hand, if we use Biases I and II in either a limited frame of width 4 or a torus world of length 4, we get the following NLP (the process is in Table3.7), which are equivalent to the original transition rule in Table3.6:
c(x)←c(x)∧ ¬c(x+ 1), c(x)← ¬c(x−1)∧c(x), c(x)← ¬c(x−1)∧c(x+ 1), c(x)← ¬c(x)∧c(x+ 1).
(3.5)
In learning an NLP for Rule 110 with Biases I and II, we get interesting generalizations. The NLP obtained from the trace of Rule 110 withLF1Tbecomes more compact in 4 rules, whereas the original transition rule representing the dynamics of this CA in Table3.6consists of 5 rules.
However, there still exists a redundancy here; we can omit either the second or the third rule from (3.5). But both rules are minimal rules, their body are prime implicant conditions of the head.
to reach to the attractor [79]. Then we can do reverse engineering to get the corresponding state transitions rules for the Boolean network.
3.2 BDD Algorithms for LF1T
In the previous section we presented the first version of the LF1Talgorithm. This algorithm has two main weak points. The first one concern its performance: the first implementations were not able to tackle model with more than 15 variables. The second one concern its output:
the NLP learned realize the input transitions, but there is no guaranty on the minimality of the rules learned. Lot of improvement could be done regarding the representation of the rules in the implementation. And it is this problem that we tackle first and that we discuss in this section.
How to solve the second problem is shown in the next section.
Now we present a new LF1T algorithm based on an efficient data structured inspired from OBDD and Zero-suppressed BDD. A BDD is a canonical representation of a Boolean formula [54,55]. The novelty of this approach is the integration of LF1T operations into a BDD structure to perform ground resolution. In this approach, one BDD represents a set of rules that have the same head. Figure3.3show the evolution of the BDD that represents rules ofpin Example3.2:
In this figure, the last schema of step 9 represents a BDD that contains two rulesp←p∧qand p ← q∧r which both havepas their head. The internal nodes of our data structure represent literals, and outgoing edges represent their polarity. In Figure3.3, the first BDD has one root node which represents the literalpand the edge between its child nodeqrepresents the fact that pis positive in the rulep←p∧q.
Like an OBBD [61, 62], our structure respects a total variable ordering: ifp, care two nodes, c is a child ofp andlp, lc their literals respectively,then lp < lc. If there is an edge between two nodesp, cthat are not neighbors in the ordering, it means that all literals between them are absent from the rules encoded by paths includingpandc. Like a ZDD, our BDD structure can have multiple root nodes, but only one leaf; it only represents positive rules. A root node always represents the first literal of one or multiple rules. The leaf node represents the end of all rules;
it is the unique child of the last literal of every rule represented by the BDD.
Usual BDD merging operations are not sufficient to perform the generalization operations of LF1T. In LF1T, these operations are equivalent to the use of na¨ıve resolution without Pold. In Figure 3.3, the generalization obtained in step 2 can be obtained by usual BDD merging operations: the noder has a positive and negative link to the same node (the leaf) and should be removed according to BDD merging operations. But the generalization obtained by ground resolution on step 9 cannot be obtained by usual BDD merging operations.
To use ground resolution within a BDD structure we need to introduce specific merging oper-ations. These operations have to ensure that the set of rules represented by a BDD is always minimal regarding ground resolution. In Figure3.3, the last BDD of each learning step respects this notion of minimality.
p
q
r 1
1
1
p
q
r 1
1
1 0
p
q 1
1
p
q q
r
1
1 1
1 0
p
q q
r 1
1
1 1
Step 1 Step 2 Step 9
Step 11 p
q q
r
q
r 1
1
1
1
1 0
0
p
q q
r
q 1
1 1
1 1
0
q q
r 1
1
1
q
1
+#14
+#1 +#3 +#4 +#8 +#9
+#15 +#16 -#9
FIGURE 3.3: Evolution of the BDD ofpin Example3.2, edge labelled by 0 represents