4.3 Multivalued Delayed Systems
4.3.2 Algorithm
In section4.1we proposed a method to learn delayed influences of Boolean systems: theLFkT algorithm. In this section, we propose a new version of this algorithm that handle multi-valued variables. Furthermore, the delays are now computed dynamically and does not need to be known or fixed to the maximal value (size of the longest trace).
LFkT:
• Input: A set of traces of executionsOof a multi-valued Markov(k) systemS.
• Step 1: Initialize a logic program with fact rules.
• Step 2: Pick a traceT fromOand update the delay considered accordingly.
– Initialize a logic program with fact rules for each new delay.
– Revise these logic programs with all previous traces (like step 3).
• Step 3: Convert the trace into interpretation transitions and revise the logic programs using least specialization.
• Step 4: If there is remaining trace inO, go back to step 2.
• Step 5: Merge all logic programs into one while avoiding rules subsumption.
• Step 6: Remove all rules that are not necessary to explain the observations.
• Output: A set of rules which realizesO.
The detailed pseudo code of LFkT is given in Algorithm 17. Like before, the idea of the algorithm is to start with the most general rules and use least specialization iteratively on each traces to make the rules leanred consistent with the all input observations.
1)The algorithm starts with a logic program that only contains all possible fact rules and as-sumes that the system to learn is Markov(1) (lines 6-9). These different programs are merged at the end to constitute a logic program that realizes all consistent traces ofO.2.1)Before learning from a trace, we need to guarantee that we are considering a valid delay according to the trace (lines 13-20). That is why we check the minimal delay required to explain the trace by using thedelayfunction, whose pseudo code is given in Algorithm18. If this delay is greater than the one currently considered by the algorithm, it updates this delay and generates programs for all missing delays (lines 14-20). All previously analyzed traces are then re-analyzed but only for these new programs. This allows to learn only the missing delayed rules.2.2)Then it checks the consistency of the new trace with previously analyzed ones (lines 21-31). The delay considered is increased if necessary. In practice, the consistency of the new traces with previously analyzed ones can be directly checked from the programs that are learned. If the program that considers the biggest delaykhas no rule that can realize the last transition of T (if6 ∃R ∈ Pk′, b(R) ⊆I with (I, Sn) := the |T|-step interpretation transition of T), then the trace is not k-consistent with at least one of the previous ones. 2.3)The program that is learned is revised according to the new trace using least specialization (lines 34-37). In order to use least specialization, we need to convert the trace of execution into interpretation transitions. This conversion is done by the function interprete, whose pseudo code is given in Algorithm20. Here, min(k,|T|)
Algorithm 17LFkT(O,B) : Learn a set of rules that realizeO 1: INPUT:Oa set of traces of executions,Ba set of atoms
2: OUTPUT:Pa logic program that realizes the transitions ofO.
3: P′a vector of set of rules
4: Ea set of pairs of interpretations(I, J)
5: kan integer
6: // 1) InitializeP′with the most general logic program
7: foreach atomvarval∈ Bdo
8: P1′:=P1′∪ {varval←}
9: end for
10:k:= 1// Assume Markov(1)
11:// 2) Learning phase
12:whileO6=∅do
13: pick a traceT∈O
14: // 2.1) Check delay of the trace
15: if delay(T)> kthen// Extend the delay to learn
16: fori=k+ 1todelay(T)do
17: foreach atomvarval∈ Bdo
18: foreach atomvar′val′∈ Bdo
19: Pi′:=Pi′∪ {varval←var′valt−i′}
20: end for
21: end for
22: end fork:=delay(T)
23: foreach traceT′∈O′do
24: P′:=learn(P, T′, k,B)
25: end for
26: end if
27: // 2.2) Check consistency with previous traces
28: if∃T′∈O′, TandT′are notk-consistentthen
29: foreachk′fromktomin(|T|,|T′|)do
30: ifTandT′arek′-consistentthen
31: fori=ktok′do
32: foreach atomvarval∈ Bdo
33: foreach atomvar′val′∈ Bdo
34: Pi′:=Pi′∪ {varval←vart−i′val′}
35: end for
36: end for
37: end for
38: foreach traceT′∈O′do
39: P′:=learn(P, T′, k,B)
40: end for
41: k := k’
42: else//TandT′are not consistent, cannot happen ifOis consistent
43: EXIT: non-deterministic input
44: end if
45: end for
46: end if
47: // 2.3) SpecifyP′by the interpretations of the trace
48: P′:=learn(P, T ,1,B)
49: O:=O\ {T}
50: O′:=O′∪ {T}
51:end while
52:// 3) Merge the programs into a unique logic program
53:merging:=∅
54:foreachifrom1tokdo
55: remove fromPi′all rules subsumed by a rule ofmerging
56: merging:=merging∪Pi′
57:end for
58:// 4) Keep only the rules that can realize the observations
59:P:=∅
60:foreachT′∈O′do
61: E:=interprete(T′)
62: foreach(I, J)∈Edo
63: foreachR∈mergingdo
64: ifb(R)⊆Iandh(R)∈Jthen
65: P:=P∪ {R}
66: end if
67: end for
68: end for
69:end for
70:returnP
interpretation transitions are extracted from the trace, one for each possible delay inferior to the currently considered one, that isk. Following this method, it produces onemin(k,|T|)-step interpretation, onemin(k,|T|)−1interpretation, . . . , one 1-step interpretation. The function
Algorithm 18delay(T) : Compute the minimal delay of a trace 1: INPUT: a trace of executionT = (S0, . . . , Sn)
2: OUTPUT:delayan integer
3: delay := 1
4: foreachifrom1ton−1do
5: foreachjfromiton−1do
6: ifSi =Sj then
7: k := 1
8: whilek≤iANDSi−k=Sj−kdo
9: k:=k+ 1
10: end while
11: delay := max(delay,k)
12: end if
13: end for
14: end for
15: returndelay
Algorithm 19learn(P, T, min delay,B) : RevisePto avoid the subsumption ofR
1: INPUT:P a vector of logic program,T a trace of execution andmin delayan integer
2: OUTPUT: a vector of logic program
3: E:=interprete(T)
4: foreachifrommin delayto|T|do
5: foreach k-step interpretation(I, J)∈E withk≥ido
6: remove fromI all atomsvart−nval withn > i
7: foreach atomvarval ∈J do
8: foreachvarval′ ∈ B, val′ 6=valdo
9: RvarI val′ :=varval′ ←V
lj∈Ilj 10: Pi:=Specialize(Pi, RIvarval′,B)
11: end for
12: end for
13: end for
14: end for returnP
outputs them as a vector of interpretation transitionsE, where eachEi corresponds to ani-step interpretation transition of a sub-trace of sizeiofT. The algorithm iteratively learns from each pair of interpretations ofE. Now it only needs to apply the least specialization by analyzing each pair of interpretations(I, J) ∈ E. For each atomvarval thatdoes not appearinJ, it infers ananti-rule: RIvarval := varval ← V
Bi∈IBi, Then, least specialization is used to make each corresponding logic programPi′consistent withRIvarval, according to the delay of interpretation transition. Algorithm21shows the pseudo code of this operation. In the functionspecialize, it first extracts all rulesRP ∈ P that subsumesRIA. It generates the least specialization of each RP by generating a rule for each literal in RIvarval. Each rule contain all literals of RP, plus a literal that represents another value of the variable represented by a literal inRIvarval, so that
Algorithm 20interprete(T) : Extract interpretation transitions from a trace 1: INPUT: a trace of executionT = (S0, . . . , Sn)
2: OUTPUT:Ea set of pairs of interpretations
3: E:=∅
// Extract interpretations
4: foreachkfrom1to|T|do
5: T′ := (S0, . . . , Sk)// the sub-trace of sizekofT that start fromS0
6: I :=∅
7: foreach statesk′ beforeskinT′do
8: delay:=k−k′
9: foreach atoma∈sk′do
10: I :=I∪ {at−delay}
11: end for
12: E:=E∪(I, Sk)
13: end for
14: end for
15: returnE
Algorithm 21specialize(P,R,B) : specializePto avoid the subsumption ofR 1: INPUT: a logic programP, a ruleR, a set of atomsB
2: OUTPUT: the least specialization ofP byR.
3: conf licts: a set of rules
4: conf licts:=∅
// Search rules that need to be specialized
5: foreach ruleRP ∈P do
6: ifRP subsumesRthen
7: conf licts:=conf licts∪RP
8: P :=P\RP
9: end if
10: end for
// Revise the rules by least specialization
11: foreach ruleRc ∈conf lictsdo
12: foreach literalvarvalt−k ∈b(R)do
13: ifvarvalt−k∈/ b(Rc)then
14: foreachvarvalt−k′ ∈ B, val′ 6=valdo
15: Rc′ := (h(Rc)←(b(Rc)∪vart−kval′))
16: ifP does not subsumeR′cthen
17: P :=P\all rules subsumed byR′c
18: P :=P∪R′c
19: end if
20: end for
21: end if
22: end for
23: end for
24: returnP
RvarI valis not subsumed anymore by that rule. Thenspecializeadds inP all the generated rules that are not subsumed byP, so thatP becomes consistent with the transition(I, J).
3) After analyzing all traces ofO, the kprograms that have been learned are merged into a unique logic program while taking care that subsumed rules are discarded.4)All rules that are not necessary to explain the observations are discarded. The algorithm only keeps the rules that can be used to realize at least one of the transition of the input traces. Finally,LFkToutputs a logic program that realizes all consistent traces of execution ofO.
Theorem 4.18(Correctness ofLFkT). LetP be a logic program,Bbe the Herbrand base of P andBkbe the timed Herbrand base ofP with periodk. Let S be a Markov(k) system with respect toP. LetO be a set of traces ofS. UsingO as input,LFkToutputs a logic program that realizes all consistent traces ofO.
Proof. LetV be the vector of interpretation transition extracted fromO byLFkT(Algorithm 20). According to Theorem 4 of [64], initializing LF1T with {p.|p ∈ B}, by using minimal specialization iteratively on a set of interpretation transitionsE, we obtain a logic programP that realizes E. SinceLFkT uses this method on each element of V, LFkTlearns a vector of logic programsP′ such that each logic programp′n ∈ P′ realizes the corresponding set of interpretation transitionsvn∈V,n≥1.
Let p′n ∈ P′ be the logic program learn from vn ∈ V, n ≥ 1. p′n is obtained by minimal specialization of{p.|p∈ B}with all anti-rule ofvn(non consistent rule). According to Theorem 3 of [64],p′ndoes not subsume any anti-rule that can be inferred fromvn. Then,p′nrealizes all deterministic transition ofvn, that is∀(I, J)∈vn,6 ∃(I, J′), J 6=J′.
Sincevncontainsn-step interpretation transition that represent all sub-traces of sizenofO,p′n realizes all consistent sub-trace of size nofO. LetPn−1 be a logic program that realizes all consistent sub-traces of size at mostn−1ofO.p′ncan contain a ruleRsuch that(Bn\ Bn−1)∩ b(R) = ∅ (no literal ofR refers to thet−nstate of the variables). In this caseRrealizes a sub-trace of sizenand also some sub-traces of size at mostn−1. If these sub-traces of size n−1 are consistent, then they are necessary realized byPn−1. Pn−1 ∪ {R}does not realize more consistent sub-trace of size at mostn−1thanPn−1. LetSRbe the set of rules ofp′nof the formR, then(p′n\SR)only realizes all sub-traces of sizenofO. Then the logic program Pn=Pn−1∪(p′n\SR)only realizes all consistent sub-trace of size at mostn−1ofOand all sub-traces of sizenofO, that isPnrealizes all consistent sub-traces of size at mostnofO.
Letp′1 ∈P′ be the logic program learned fromv1 ∈V, and letP = p′1. LetR′ be all rules of the logic programp′nsuch that(Bn\ Bn−1)∩b(R′) 6= ∅. Iteratively adding rules R′ into P, starting by the logic programp′2untilp′k, we obtain a logic program that realizes all consistent sub-traces of size at mostkofO. As a result, usingO as input,LFkToutputs a logic program that realizes all consistent traces ofO.
Theorem 4.19(Complexity). LetP be a logic program,Bbe the Herbrand base ofP andBk
be the timed Herbrand base ofP with periodk. LetSbe a multi-valued Markov(k) system with respect toP. Letnbe the number of variable ofS. Letv be the maximal number of value of a variable ofS. LetO be a set of traces of execution ofS. The complexity of learningS from OwithLFkTis respectively:O(n·vnk+1+|O|)for memory andO( P
T∈O
|T| ·nvnk+3+|O| · n2k2+n·vnk+2+n·vnk+1· |O| ·k)for runtime.
Proof. nisthenumberof possibleheadsof rulesofS.nk is the maximum size of a rule ofS, i.e.
the number of literals in the body; a literal can appear at most one time in the body of a rule.
For each rule head of B there are vnk possible bodies: each literal can be present or absent from the body. From these preliminaries we conclude that the size of a Markov(k) system S learned byLFkT is at most |S| = n·vnk. To learn S, LFkTneeds to store kprogramsPi
that are Markov(i) system with respect toP,1 ≤i≤k. The algorithm also needs to store the previously analyzed traces in order to update the considered delay.
Conclusion 1:the memory use ofLFkTisO(
k
P
i=1
|Pi|+O) =O(k·n·vknk+|O|)that is bound byO(n· · ·vnk+1+|O|).
For each traceT ofO,LFkTextracts|T|pairs of interpretations. For each pair of interpretation (I, J), LFkTinfers an anti-rule ruleRIAfor each A ∈ B, A 6∈ J. LFkTcompares eachRIA with all rules of each programsPi. There is at most|B| −nanti-rules that can be infered from (I, J)byLFkTand the size of each programPi is bound byO(n·vknk). Then, the complexity of learning one trace of execution T ∈ O withLFkT isO(|T| · |B| −n·k|Pi|) = O(|T| · nv−n·kn·vknk) = O(|T| ·n2vnk+2−n)that is bound byO(|T| ·nvnk+3). To update the considered delay, the algorithm has to check the delay of each new trace T, this operation belongs to O(|T|2) = (n2k2). And, checking the consistency of a new traces with previous analyzed one is bound byO(|O| ∗n2k2). Merging the programs requires to compare all rules to detect subsumption, it has a complexity ofO(n·vnk+2). Finally, removing the rules that are not necessary to realizeOrequires to compare each rules with allk-step interpretation ofO, thus it requiresO(n·vnk+1· |O| ·k).
Conclusion 2:The complexity of learningSfromOwithLFkTisO(P
T∈O
|T| ·nvnk+3+|O| · n2k2+n·vnk+2).