Algorithm - Multivalued Delayed Systems - 本文 Thesis 総合研究大学院大学学術情報リポジトリ A1800本文

4.3 Multivalued Delayed Systems

4.3.2 Algorithm

In section4.1we proposed a method to learn delayed influences of Boolean systems: theLFkT algorithm. In this section, we propose a new version of this algorithm that handle multi-valued variables. Furthermore, the delays are now computed dynamically and does not need to be known or fixed to the maximal value (size of the longest trace).

LFkT:

• Input: A set of traces of executionsOof a multi-valued Markov(k) systemS.

• Step 1: Initialize a logic program with fact rules.

• Step 2: Pick a traceT fromOand update the delay considered accordingly.

– Initialize a logic program with fact rules for each new delay.

– Revise these logic programs with all previous traces (like step 3).

• Step 3: Convert the trace into interpretation transitions and revise the logic programs using least specialization.

• Step 4: If there is remaining trace inO, go back to step 2.

• Step 5: Merge all logic programs into one while avoiding rules subsumption.

• Step 6: Remove all rules that are not necessary to explain the observations.

• Output: A set of rules which realizesO.

The detailed pseudo code of LFkT is given in Algorithm 17. Like before, the idea of the algorithm is to start with the most general rules and use least specialization iteratively on each traces to make the rules leanred consistent with the all input observations.

1)The algorithm starts with a logic program that only contains all possible fact rules and as-sumes that the system to learn is Markov(1) (lines 6-9). These different programs are merged at the end to constitute a logic program that realizes all consistent traces ofO.2.1)Before learning from a trace, we need to guarantee that we are considering a valid delay according to the trace (lines 13-20). That is why we check the minimal delay required to explain the trace by using thedelayfunction, whose pseudo code is given in Algorithm18. If this delay is greater than the one currently considered by the algorithm, it updates this delay and generates programs for all missing delays (lines 14-20). All previously analyzed traces are then re-analyzed but only for these new programs. This allows to learn only the missing delayed rules.2.2)Then it checks the consistency of the new trace with previously analyzed ones (lines 21-31). The delay considered is increased if necessary. In practice, the consistency of the new traces with previously analyzed ones can be directly checked from the programs that are learned. If the program that considers the biggest delaykhas no rule that can realize the last transition of T (if6 ∃R ∈ P_k^′, b(R) ⊆I with (I, Sn) := the |T|-step interpretation transition of T), then the trace is not k-consistent with at least one of the previous ones. 2.3)The program that is learned is revised according to the new trace using least specialization (lines 34-37). In order to use least specialization, we need to convert the trace of execution into interpretation transitions. This conversion is done by the function interprete, whose pseudo code is given in Algorithm20. Here, min(k,|T|)

Algorithm 17LFkT(O,B) : Learn a set of rules that realizeO 1: INPUT:Oa set of traces of executions,Ba set of atoms

2: OUTPUT:Pa logic program that realizes the transitions ofO.

3: P^′a vector of set of rules

4: Ea set of pairs of interpretations(I, J)

5: kan integer

6: // 1) InitializeP^′with the most general logic program

7: foreach atomvar^val∈ Bdo

8: P1^′:=P1^′∪ {var^val←}

9: end for

10:k:= 1// Assume Markov(1)

11:// 2) Learning phase

12:whileO6=∅do

13: pick a traceT∈O

14: // 2.1) Check delay of the trace

15: if delay(T)> kthen// Extend the delay to learn

16: fori=k+ 1todelay(T)do

17: foreach atomvar^val∈ Bdo

18: foreach atomvar^′val^′∈ Bdo

19: P_i^′:=P_i^′∪ {var^val←var^′val_t−i^′}

20: end for

21: end for

22: end fork:=delay(T)

23: foreach traceT^′∈O^′do

24: P^′:=learn(P, T^′, k,B)

25: end for

26: end if

27: // 2.2) Check consistency with previous traces

28: if∃T^′∈O^′, TandT^′are notk-consistentthen

29: foreachk^′fromktomin(|T|,|T^′|)do

30: ifTandT^′arek^′-consistentthen

31: fori=ktok^′do

32: foreach atomvar^val∈ Bdo

33: foreach atomvar^′val^′∈ Bdo

34: P_i^′:=P_i^′∪ {var^val←var_t−i^′val^′}

35: end for

36: end for

37: end for

38: foreach traceT^′∈O^′do

39: P^′:=learn(P, T^′, k,B)

40: end for

41: k := k’

42: else//TandT^′are not consistent, cannot happen ifOis consistent

43: EXIT: non-deterministic input

44: end if

45: end for

46: end if

47: // 2.3) SpecifyP^′by the interpretations of the trace

48: P^′:=learn(P, T ,1,B)

49: O:=O\ {T}

50: O^′:=O^′∪ {T}

51:end while

52:// 3) Merge the programs into a unique logic program

53:merging:=∅

54:foreachifrom1tokdo

55: remove fromP_i^′all rules subsumed by a rule ofmerging

56: merging:=merging∪P_i^′

57:end for

58:// 4) Keep only the rules that can realize the observations

59:P:=∅

60:foreachT^′∈O^′do

61: E:=interprete(T^′)

62: foreach(I, J)∈Edo

63: foreachR∈mergingdo

64: ifb(R)⊆Iandh(R)∈Jthen

65: P:=P∪ {R}

66: end if

67: end for

68: end for

69:end for

70:returnP

interpretation transitions are extracted from the trace, one for each possible delay inferior to the currently considered one, that isk. Following this method, it produces onemin(k,|T|)-step interpretation, onemin(k,|T|)−1interpretation, . . . , one 1-step interpretation. The function

Algorithm 18delay(T) : Compute the minimal delay of a trace 1: INPUT: a trace of executionT = (S₀, . . . , S_n)

2: OUTPUT:delayan integer

3: delay := 1

4: foreachifrom1ton−1do

5: foreachjfromiton−1do

6: ifSi =Sj then

7: k := 1

8: whilek≤iANDS_i−k=S_j−kdo

9: k:=k+ 1

10: end while

11: delay := max(delay,k)

12: end if

13: end for

14: end for

15: returndelay

Algorithm 19learn(P, T, min delay,B) : RevisePto avoid the subsumption ofR

1: INPUT:P a vector of logic program,T a trace of execution andmin delayan integer

2: OUTPUT: a vector of logic program

3: E:=interprete(T)

4: foreachifrommin delayto|T|do

5: foreach k-step interpretation(I, J)∈E withk≥ido

6: remove fromI all atomsvar_t−n^val withn > i

7: foreach atomvar^val ∈J do

8: foreachvar^val^′ ∈ B, val^′ 6=valdo

9: R_var^I _val′ :=var^val^′ ←V

lj∈Ilj 10: Pi:=Specialize(Pi, R^I_var_val′,B)

11: end for

12: end for

13: end for

14: end for returnP

outputs them as a vector of interpretation transitionsE, where eachEi corresponds to ani-step interpretation transition of a sub-trace of sizeiofT. The algorithm iteratively learns from each pair of interpretations ofE. Now it only needs to apply the least specialization by analyzing each pair of interpretations(I, J) ∈ E. For each atomvar^val thatdoes not appearinJ, it infers ananti-rule: R^I_varval := var^val ← V

Bi∈IBi, Then, least specialization is used to make each corresponding logic programP_i^′consistent withR^I_varval, according to the delay of interpretation transition. Algorithm21shows the pseudo code of this operation. In the functionspecialize, it first extracts all rulesRP ∈ P that subsumesR^I_A. It generates the least specialization of each R_P by generating a rule for each literal in R^I_varval. Each rule contain all literals of R_P, plus a literal that represents another value of the variable represented by a literal inR^I_varval, so that

Algorithm 20interprete(T) : Extract interpretation transitions from a trace 1: INPUT: a trace of executionT = (S₀, . . . , S_n)

2: OUTPUT:Ea set of pairs of interpretations

3: E:=∅

// Extract interpretations

4: foreachkfrom1to|T|do

5: T^′ := (S₀, . . . , S_k)// the sub-trace of sizekofT that start fromS₀

6: I :=∅

7: foreach statesk^′ beforeskinT^′do

8: delay:=k−k^′

9: foreach atoma∈sk^′do

10: I :=I∪ {at−delay}

11: end for

12: E:=E∪(I, Sk)

13: end for

14: end for

15: returnE

Algorithm 21specialize(P,R,B) : specializePto avoid the subsumption ofR 1: INPUT: a logic programP, a ruleR, a set of atomsB

2: OUTPUT: the least specialization ofP byR.

3: conf licts: a set of rules

4: conf licts:=∅

// Search rules that need to be specialized

5: foreach ruleR_P ∈P do

6: ifRP subsumesRthen

7: conf licts:=conf licts∪RP

8: P :=P\R_P

9: end if

10: end for

// Revise the rules by least specialization

11: foreach ruleRc ∈conf lictsdo

12: foreach literalvar^val_t−k ∈b(R)do

13: ifvar^val_t−k∈/ b(Rc)then

14: foreachvar^val_t−k^′ ∈ B, val^′ 6=valdo

15: R_c^′ := (h(Rc)←(b(Rc)∪var_t−k^val^′))

16: ifP does not subsumeR^′_cthen

17: P :=P\all rules subsumed byR^′_c

18: P :=P∪R^′_c

19: end if

20: end for

21: end if

22: end for

23: end for

24: returnP

R_var^I valis not subsumed anymore by that rule. Thenspecializeadds inP all the generated rules that are not subsumed byP, so thatP becomes consistent with the transition(I, J).

3) After analyzing all traces ofO, the kprograms that have been learned are merged into a unique logic program while taking care that subsumed rules are discarded.4)All rules that are not necessary to explain the observations are discarded. The algorithm only keeps the rules that can be used to realize at least one of the transition of the input traces. Finally,LFkToutputs a logic program that realizes all consistent traces of execution ofO.

Theorem 4.18(Correctness ofLFkT). LetP be a logic program,Bbe the Herbrand base of P andBkbe the timed Herbrand base ofP with periodk. Let S be a Markov(k) system with respect toP. LetO be a set of traces ofS. UsingO as input,LFkToutputs a logic program that realizes all consistent traces ofO.

Proof. LetV be the vector of interpretation transition extracted fromO byLFkT(Algorithm 20). According to Theorem 4 of [64], initializing LF1T with {p.|p ∈ B}, by using minimal specialization iteratively on a set of interpretation transitionsE, we obtain a logic programP that realizes E. SinceLFkT uses this method on each element of V, LFkTlearns a vector of logic programsP^′ such that each logic programp^′_n ∈ P^′ realizes the corresponding set of interpretation transitionsvn∈V,n≥1.

Let p^′_n ∈ P^′ be the logic program learn from vn ∈ V, n ≥ 1. p^′_n is obtained by minimal specialization of{p.|p∈ B}with all anti-rule ofv_n(non consistent rule). According to Theorem 3 of [64],p^′_ndoes not subsume any anti-rule that can be inferred fromvn. Then,p^′_nrealizes all deterministic transition ofvn, that is∀(I, J)∈vn,6 ∃(I, J^′), J 6=J^′.

Sincev_ncontainsn-step interpretation transition that represent all sub-traces of sizenofO,p^′_n realizes all consistent sub-trace of size nofO. LetPn−1 be a logic program that realizes all consistent sub-traces of size at mostn−1ofO.p^′_ncan contain a ruleRsuch that(Bn\ Bn−1)∩ b(R) = ∅ (no literal ofR refers to thet−nstate of the variables). In this caseRrealizes a sub-trace of sizenand also some sub-traces of size at mostn−1. If these sub-traces of size n−1 are consistent, then they are necessary realized byP_n−1. P_n−1 ∪ {R}does not realize more consistent sub-trace of size at mostn−1thanPn−1. LetSRbe the set of rules ofp^′_nof the formR, then(p^′_n\S_R)only realizes all sub-traces of sizenofO. Then the logic program Pn=Pn−1∪(p^′_n\SR)only realizes all consistent sub-trace of size at mostn−1ofOand all sub-traces of sizenofO, that isPnrealizes all consistent sub-traces of size at mostnofO.

Letp^′₁ ∈P^′ be the logic program learned fromv₁ ∈V, and letP = p^′₁. LetR^′ be all rules of the logic programp^′_nsuch that(Bn\ Bn−1)∩b(R^′) 6= ∅. Iteratively adding rules R^′ into P, starting by the logic programp^′₂untilp^′_k, we obtain a logic program that realizes all consistent sub-traces of size at mostkofO. As a result, usingO as input,LFkToutputs a logic program that realizes all consistent traces ofO.

Theorem 4.19(Complexity). LetP be a logic program,Bbe the Herbrand base ofP andBk

be the timed Herbrand base ofP with periodk. LetSbe a multi-valued Markov(k) system with respect toP. Letnbe the number of variable ofS. Letv be the maximal number of value of a variable ofS. LetO be a set of traces of execution ofS. The complexity of learningS from OwithLFkTis respectively:O(n·v^nk+1+|O|)for memory andO( P

T∈O

|T| ·nv^nk+3+|O| · n²k²+n·v^nk+2+n·v^nk+1· |O| ·k)for runtime.

Proof. nisthenumberof possibleheadsof rulesofS.nk is the maximum size of a rule ofS, i.e.

the number of literals in the body; a literal can appear at most one time in the body of a rule.

For each rule head of B there are v^nk possible bodies: each literal can be present or absent from the body. From these preliminaries we conclude that the size of a Markov(k) system S learned byLFkT is at most |S| = n·v^nk. To learn S, LFkTneeds to store kprogramsPi

that are Markov(i) system with respect toP,1 ≤i≤k. The algorithm also needs to store the previously analyzed traces in order to update the considered delay.

Conclusion 1:the memory use ofLFkTisO(

i=1

|Pi|+O) =O(k·^n·v_k^nk+|O|)that is bound byO(n· · ·v^nk+1+|O|).

For each traceT ofO,LFkTextracts|T|pairs of interpretations. For each pair of interpretation (I, J), LFkTinfers an anti-rule ruleR^I_Afor each A ∈ B, A 6∈ J. LFkTcompares eachR^I_A with all rules of each programsPi. There is at most|B| −nanti-rules that can be infered from (I, J)byLFkTand the size of each programPi is bound byO(^n·v_k^nk). Then, the complexity of learning one trace of execution T ∈ O withLFkT isO(|T| · |B| −n·k|Pi|) = O(|T| · nv−n·k^n·v_k^nk) = O(|T| ·n²v^nk+2−n)that is bound byO(|T| ·nv^nk+3). To update the considered delay, the algorithm has to check the delay of each new trace T, this operation belongs to O(|T|²) = (n²k²). And, checking the consistency of a new traces with previous analyzed one is bound byO(|O| ∗n²k²). Merging the programs requires to compare all rules to detect subsumption, it has a complexity ofO(n·v^nk+2). Finally, removing the rules that are not necessary to realizeOrequires to compare each rules with allk-step interpretation ofO, thus it requiresO(n·v^nk+1· |O| ·k).

Conclusion 2:The complexity of learningSfromOwithLFkTisO(P

T∈O

|T| ·nv^nk+3+|O| · n²k²+n·v^nk+2).

ドキュメント内本文 Thesis 総合研究大学院大学学術情報リポジトリ A1800本文 (ページ 94-100)