4.3 Left-corner Dependency Parsing
4.3.2 Transition system
Stack: Buffer:
w2
w1 x
w3w5 w4
w6 w7
w8, w9,· · · σ= [h2, x({3,5})i,h6,7i]
β= [8,9,· · · , n]
A={(2,1),(5,4),(6,7)}
Figure 4.3: Example configuration of a left-corner transition system.
SHIFT (σ, j|β, A)7→(σ|hji, β, A)
INSERT (σ|hσ′1|i|x(λ)i), j|β, A)7→(σ|hσ1′|i|ji, β, A∪ {(i, j)} ∪ {∪k∈λ(j, k)}
LEFTPRED (σ|hσ11,· · · i, β, A)7→(σ|hx(σ11)i, β, A) RIGHTPRED (σ|hσ11,· · · i, β, A)7→(σ|hσ11, x(∅)i, β, A)
LEFTCOMP (σ|hσ′2|x(λ)i|hσ11,· · · i, β, A)7→(σ|hσ′2|x(λ∪ {σ11})i, β, A)
RIGHTCOMP (σ|hσ′2|x(λ)i|hσ11,· · · i, β, A)7→(σ|hσ′2|σ11|x(∅)i, β, A∪ {∪k∈λ(σ11, k)}) Figure 4.4: Actions of the left-corner transition system including two shift operations (top) and reduce operations (bottom).
Shift Action As in the left-corner PDA, SHIFTmoves a token from the top of the buffer to the stack. INSERTcorresponds to the SCANoperation of the PDA, and replaces a dummy node on the top of the stack with a token from the top of the buffer. Note that before doing a shift action, a dummyxcan be replaced by any words, meaning that arcs from and toxare unspecified. This is the key to achieve incremental parsing (see the beginning of this chapter). It is INSERTthat these arcs are first specified, by filling the dummy node with an actual token. As in the left-corner PDA, the top element of the stack must be complete after a shift action.
Reduce Action As in the left-corner PDA, a reduce action is applied when the top element of the stack is complete, and changes it to an incomplete element.
LEFTPRED and RIGHTPRED correspond to PREDICTION in the left-corner PDA. Figure 4.5 describes the transparent relationships between them. LEFTPREDmakes the head of the top stack element (i.e.,σ11) as a left dependent of a new dummyx, while RIGHTPREDpredicts a dummyx as a right dependent ofa. In these actions, if we think the original and resulting dependency forms in CFG, the correspondence to PREDICTION in the PDA is apparent. Specifically, the CFG forms of the resulting trees in both actions are the same. The only difference is the head label of the parent symbol, which isxin LEFTPREDwhileain RIGHTPRED.
A similar correspondence holds between RIGHTCOMP, LEFTCOMP, and COMPOSITIONin the PDA. We can interpret LEFTCOMP as two step operations as in COMPOSITION in the PDA (see Section 2.2.3): It first applies LEFTPRED to the top stack element, resulting in x
a , and then unifies twoxs to comprise a subtree. The connection to COMPOSITIONis apparent from the figure.
RIGHTCOMP, on the other hand, first applies RIGHTPREDtoa, and then combines the resulting tree and the second top element on the stack. This step is a bit involved, which might be easier to
LEFTPRED: RIGHTPRED: a
x a
X[a]
a X[x]
X[x]
X[a]
a
a a
x
X[a]
a X[a]
X[x]
X[a]
a
LEFTCOMP: RIGHTCOMP:
b a x b
x a
X[a]
a X[b]
X[x]
X[b]
b X[b]
X[x]
X[x]
X[a]
a X[b]
b
b a x b
a x
X[a]
a X[b]
X[x]
X[b]
b X[b]
X[a]
X[x]
X[a]
a X[b]
b
Figure 4.5: Correspondences of reduce actions between dependency and CFG. We only show min-imal example subtrees for simplicity. However,acan have an arbitrary number of children, so can borx, providedxis on a right spine and has no right children.
understand with the CFG form. On the CFG form, two unified nodes are X[x], which is predicted top-down, and X[a], which is recognized bottom-up (with the first RIGHTPREDstep).3 Sincexcan be unified with any tokens, at this point,xin X[x] is filled witha. Returning to dependency, this means that we insert the subtree rooted ata(after being applied RIGHTPRED) into the position of xin the second top element.
Note that from Figure 4.5, we can see that the dummy nodexcan only appears in a right spine of a CFG subtree. Now, we can reinterpret INSERT action on the CFG subtree, which attaches a token to a (predicted) preterminal X[x], as in SCAN of the PDA, and then fills every x in a right spine with a shifted token. This can be seen as a kind of unification operation.
Relationship to the left-corner PDA As we have seen so far, though our transition system di-rectly operates dependency trees, we can associate every step with a process to expand (partial) CFG parses as in the manner that the left-corner PDA would do.4 In every step, the transparency
3If the parent of X[x] is also X[x], all of them are filled witharecursively. This situation corresponds to the case in which dummy nodexin the dependency tree has multiple left dependents, as in the resulting tree by LEFTCOMPin Figure 4.5.
4To be precise, we note that this CFG-based expansion process cannot be written in the form used in the left-corner PDA. For example, if we write items in LEFTCOMPand RIGHTCOMPin Figure 4.5 in the form ofA/B, both results in the same transition: X[b]/X[x] X[a]7→X[b]/X[x]. This is due to our use of a dummy nodex, which plays different roles
Step Action Stack (σ) Buffer (β) Set of arcs (A)
ε a b c d ∅
1 SHIFT hai b c d ∅
2 RIGHTPRED ha, x(∅)i b c d ∅
3 SHIFT ha, x(∅)ihbi c d ∅
4 RIGHTPRED ha, x(∅)ihb, x(∅)i c d ∅
5 INSERT ha, x(∅)ihb, ci d (b, c)
6 RIGHTCOMP ha, b, x(∅)i d (b, c),(a, b)
7 INSERT ha, b, di (b, c),(a, b),(b, d)
Figure 4.6: An example parsing process of the left-corner transition system.
of two tree forms, i.e., dependency and CFG, is always preserved. This means that at the final configuration the CFG parse would be the one corresponding to the resulting dependency tree, and also at each step the stack depth is identical to the one that is incurred during parsing the final CFG parse with the original left-corner PDA. We will see this transparency with an example next. The connection between the stack depth and the degree of center-embedding, that we established in The-orem 2.1 for the PDA, also directly holds in this transition system. We restate this for our transition system in Section 4.3.4.
Example For an example, Figure 4.6 shows the transition of configurations during parsing a tree aybyc d, which corresponds to the parse in Figure 2.10(b) and thus involves one degree of center-embedding. Comparing to Figure 2.14, we can see that two transition sequences for the PDA (for CFG) and the transition system (for dependency) are essentially the same: the differences are that PREDICTION and COMPOSITION are changed to the corresponding actions (in this case, RIGHT -PREDand RIGTHCOMP) and SCANis replaced with INSERT. This is essentially due to the trans-parent relationships between them that we discussed above. As in Figure 2.14, the stack depth two after a reduce action indicates center-embedding, which is step 4.
Other properties As in the left-corner PDA, this transition system also performs shift and reduce actions alternately (the proof is almost identical to the case of PDA). Also, given a sentence of length n, the number of actions required to arrive at the final configuration is2n−1, because every token except the last word must be shifted once and reduced once, and the last word is always inserted as the final step.
Every projective dependency tree is derived from at least one transition sequence with this sys-tem, i.e., our system iscompletefor the class of projective dependency trees (Nivre, 2008). Though we omit the proof, this can be shown by appealing to the transparency between the transition system and the left-corner PDA, which is complete for a given CFG.
in two actions (e.g., RIGTHCOMPassumes the firstxisa) but the difference is lost with this notation. We thus claim that a CFG-based expansion step corresponds to a step in the left-corner PDA in that every action in the former expands the tree in the same way as the corresponding action of the left-corner PDA, as explained by Figure 4.5 (for reduce actions) and the body (for INSERT); the equivalence of SHIFTis apparent.
However, our system is unsound for the class of projective dependency trees, meaning that a transition sequence on a sentence does not always generate a valid projective dependency tree. We can easily verify this claim with an example. Leta b cbe a sentence and consider the action sequence
“SHIFTLEFTPRED SHIFT LEFTPRED INSERT” with which we obtain the terminal configuration ofσ = [x(a), c];β= [];A={(b, c)}5, but this is not a valid tree. The arc-eager system also suffers from a similar restriction (Nivre and Fernández-González, 2014), which may lead to lower parse accuracy. Instead of fixing this problem, in our parsing experiment, which is described in Section 4.5, we implement simple post-processing heuristics to combine those fragmented trees that remain on the stack.