-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support IELR(1) Parser Generation #398
Conversation
2d975c5
to
a49fe82
Compare
A couple months ago I poked around the internet looking for independent IELR(1) implementations, and besides |
a49fe82
to
604737d
Compare
b3078e1
to
0a300f7
Compare
0a300f7
to
0ba3241
Compare
9bf8584
to
7da9676
Compare
On 7da9676 , Lrama can parse CRuby's parse.y at ruby/ruby@028958f . These are the files of parsing results.
|
@@ -92,6 +92,20 @@ def compute | |||
report_duration(:compute_default_reduction) { compute_default_reduction } | |||
end | |||
|
|||
def compute_ielr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Ref "3.5.4. Algorithm" of the paper.
@@ -524,5 +541,52 @@ def compute_default_reduction | |||
end.first | |||
end | |||
end | |||
|
|||
def split_states | |||
transition_queue = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: It seems this variable is not used.
@@ -524,5 +541,52 @@ def compute_default_reduction | |||
end.first | |||
end | |||
end | |||
|
|||
def split_states |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: "Definition 3.45 (split_states)"
@@ -142,5 +160,274 @@ def rr_conflicts | |||
conflict.type == :reduce_reduce | |||
end | |||
end | |||
|
|||
def propagate_lookaheads(next_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Definition 3.40 (propagate_lookaheads)
def propagate_lookaheads(next_state) | ||
next_state.kernels.map {|item| | ||
lookahead_sets = | ||
if item.position == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Corresponding to case 2 of Definition 3.40 because position
is 0-origin in this code base.
def propagate_lookaheads(next_state) | ||
next_state.kernels.map {|item| | ||
lookahead_sets = | ||
if item.position == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: I personally like to make the condition order to be aligned with the original paper for readability of the algorithm. For example
if item.position > 1
...
else
...
end
if item.position == 1 | ||
goto_follow_set(item.lhs) | ||
else | ||
kernel = kernels.find {|k| k.predecessor_item_of?(item) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: C[k]
is the kernel (1) whose LHS and RHS are same with the item's one and (2) whose position is item.position - 1
. #predecessor_item_of?
is a method for such calculation.
item_lookahead_set[kernel] | ||
end | ||
|
||
[item, lookahead_sets & next_state.lookahead_set_filters[item]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: This is same with the definition case 1 because case 1 requires tokens which are in both lookahead_set_filters of s'
kernel and item_lookahead_sets of s
kernel.
@@ -92,6 +92,20 @@ def compute | |||
report_duration(:compute_default_reduction) { compute_default_reduction } | |||
end | |||
|
|||
def compute_ielr | |||
report_duration(:split_states) { split_states } | |||
report_duration(:compute_direct_read_sets) { compute_direct_read_sets } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Compute LA set again, once states are splitted. See Phase 4 and Phase 5 on "3.1. Overview".
!@item_lookahead_set.nil? | ||
end | ||
|
||
def compatible_lookahead?(filtered_lookahead) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: This method is same with "Definition 3.43 (is_compatible)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: If so, I prefer to use is_compatible?
as a method name for clarity.
@@ -23,6 +23,12 @@ def initialize(id, accessing_symbol, kernels) | |||
@conflicts = [] | |||
@resolved_conflicts = [] | |||
@default_reduction_rule = nil | |||
@predecessors = [] | |||
@lalr_isocore = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]:
@lalr_isocore
is an implementation oflalr1_isocores
(Definition 3.36)@ielr_isocores
is an implementation ofisocore_nexts
(Definition 3.37)- Their initialization logic is written in Definition 3.45 (split_states).
} | ||
end | ||
|
||
def lookahead_set_filters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Definition 3.38 (lookahead_set_filters)
[action, nil] | ||
elsif action.is_a?(Reduce) | ||
if action.rule.empty_rule? | ||
[action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: If need to get the key, I like to use inadequacy_list.map
or h = {}; inadequacy_list.each ...
so that the code accesses its key clearly.
|
||
def annotate_predecessor(predecessor) | ||
annotation_list.transform_values {|actions| | ||
token = annotation_list.key(actions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: Same with #annotate_manifestation
, it's more clear to use inadequacy_list.map
or h = {}; inadequacy_list.each ...
to access its key.
} | ||
end | ||
|
||
def inadequacy_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: Definition 3.27 (inadequacy_lists). Difference from the original paper is that tuple (state, token, actions)
is used in the original paper but tuple (token, actions)
is used in this implementation.
inadequacy_list.transform_values {|actions| | ||
actions.map {|action| | ||
if action.is_a?(Shift) | ||
[action, nil] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: nil
means "undefined" in the original paper.
if action.is_a?(Shift) | ||
[action, nil] | ||
elsif action.is_a?(Reduce) | ||
if action.rule.empty_rule? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: I prefer the order of sub condition follows the same order of the original paper.
if action.rule.empty_rule? | ||
[action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))] | ||
else | ||
contributions = kernels.map {|kernel| [kernel, kernel.rule == action.rule && kernel.end_of_rule?] }.to_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: "C[j] = (l -> r, |r| + 1)" in Definition 3.30 2-(a) means end_of_rule
.
} | ||
end | ||
|
||
def lhs_contributions(sym, token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note]: sym
is the symbol (non-terminal) of the rule's LHS.
end | ||
|
||
def lhs_contributions(sym, token) | ||
shift, next_state = nterm_transitions.find {|sh, _| sh.next_sym == sym } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nits]: Nonterminal transition is called goto. Shift is used for terminal transition.
%token c | ||
%define lr.type ielr | ||
|
||
%% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This grammar example comes from Fig 5 of the original paper. Could clarify it as comments?
@junk0612 I created 2 test cases as reproduction of #398 (comment). |
|
||
def split_states | ||
transition_queue = [] | ||
@states.each do |state| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering it needs to limit the iteration only for the LALR states like @states[0...@states.size].each ...
, in other words do we need to iterate states which are created by #compute_state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be my misunderstandings. We may need to iterate all states which includes newly created one.
To reproduce mysterious behaviors for ruby parse.y, this commit add 2 test cases releated these behaviors. 1. "states/ielr_prec.y" causes meaningless state split. All conflicts are resovled with only LALR, however `#compute_ielr` splits state 8 to state 11, state 10 to state 12 and changes state 10 moves to state 11 with "relop". 2. "states/ielr_nonassoc.y" causes duplicated "error" on state 6. this commit is picked from yui-knk@ca288c8
25ef188
to
fead1a4
Compare
Support the generation of the IELR(1) parser described in this paper.
https://www.sciencedirect.com/science/article/pii/S0167642309001191
This PR will be ready for merging when the following is done.