Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support IELR(1) Parser Generation #398

Merged
merged 8 commits into from
Jan 18, 2025

Conversation

junk0612
Copy link
Collaborator

@junk0612 junk0612 commented Apr 23, 2024

Support the generation of the IELR(1) parser described in this paper.
https://www.sciencedirect.com/science/article/pii/S0167642309001191

This PR will be ready for merging when the following is done.

  • Add tests
    • Currently, I have only created a grammar file and verified its operation in my local environment.
  • Refactor
    • The data structure is awful and needs to be improved because the priority was to achieve the expected results.

@junk0612 junk0612 force-pushed the generate_ielr_parser branch from 2d975c5 to a49fe82 Compare June 25, 2024 17:19
@jasone
Copy link

jasone commented Jul 26, 2024

A couple months ago I poked around the internet looking for independent IELR(1) implementations, and besides lrama found only an inactive rust-based attempt. Recently I completed my own implementation and wrote a technical report with the hope that it might help others. IELR(1) turns out to be a real bear to figure out, even if it takes only a moderate amount of code. I hope you find the time and inspiration to complete this effort. Good luck!

@junk0612 junk0612 force-pushed the generate_ielr_parser branch from a49fe82 to 604737d Compare September 4, 2024 06:28
@junk0612 junk0612 force-pushed the generate_ielr_parser branch 7 times, most recently from b3078e1 to 0a300f7 Compare September 25, 2024 18:33
@junk0612 junk0612 marked this pull request as ready for review September 25, 2024 18:33
@junk0612 junk0612 requested review from ydah and yui-knk September 25, 2024 18:33
@junk0612 junk0612 force-pushed the generate_ielr_parser branch from 0a300f7 to 0ba3241 Compare October 11, 2024 09:42
@junk0612 junk0612 force-pushed the generate_ielr_parser branch from 9bf8584 to 7da9676 Compare November 14, 2024 05:10
@junk0612
Copy link
Collaborator Author

junk0612 commented Nov 14, 2024

On 7da9676 , Lrama can parse CRuby's parse.y at ruby/ruby@028958f .

These are the files of parsing results.
ruby path/to/id2token path/to/parse.y | bundle exec lrama -v - lalr
lalr.output

ruby path/to/id2token path/to/parse.y | bundle exec lrama -D lr.type=ielr -v - ielr
ielr.output

@@ -92,6 +92,20 @@ def compute
report_duration(:compute_default_reduction) { compute_default_reduction }
end

def compute_ielr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Ref "3.5.4. Algorithm" of the paper.

@@ -524,5 +541,52 @@ def compute_default_reduction
end.first
end
end

def split_states
transition_queue = []
Copy link
Collaborator

@yui-knk yui-knk Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: It seems this variable is not used.

@@ -524,5 +541,52 @@ def compute_default_reduction
end.first
end
end

def split_states
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: "Definition 3.45 (split_states)"

@@ -142,5 +160,274 @@ def rr_conflicts
conflict.type == :reduce_reduce
end
end

def propagate_lookaheads(next_state)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Definition 3.40 (propagate_lookaheads)

def propagate_lookaheads(next_state)
next_state.kernels.map {|item|
lookahead_sets =
if item.position == 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Corresponding to case 2 of Definition 3.40 because position is 0-origin in this code base.

def propagate_lookaheads(next_state)
next_state.kernels.map {|item|
lookahead_sets =
if item.position == 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: I personally like to make the condition order to be aligned with the original paper for readability of the algorithm. For example

if item.position > 1
  ...
else
  ...
end

if item.position == 1
goto_follow_set(item.lhs)
else
kernel = kernels.find {|k| k.predecessor_item_of?(item) }
Copy link
Collaborator

@yui-knk yui-knk Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: C[k] is the kernel (1) whose LHS and RHS are same with the item's one and (2) whose position is item.position - 1. #predecessor_item_of? is a method for such calculation.

item_lookahead_set[kernel]
end

[item, lookahead_sets & next_state.lookahead_set_filters[item]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: This is same with the definition case 1 because case 1 requires tokens which are in both lookahead_set_filters of s' kernel and item_lookahead_sets of s kernel.

@@ -92,6 +92,20 @@ def compute
report_duration(:compute_default_reduction) { compute_default_reduction }
end

def compute_ielr
report_duration(:split_states) { split_states }
report_duration(:compute_direct_read_sets) { compute_direct_read_sets }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Compute LA set again, once states are splitted. See Phase 4 and Phase 5 on "3.1. Overview".

!@item_lookahead_set.nil?
end

def compatible_lookahead?(filtered_lookahead)
Copy link
Collaborator

@yui-knk yui-knk Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: This method is same with "Definition 3.43 (is_compatible)".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: If so, I prefer to use is_compatible? as a method name for clarity.

@yui-knk yui-knk mentioned this pull request Jan 13, 2025
@@ -23,6 +23,12 @@ def initialize(id, accessing_symbol, kernels)
@conflicts = []
@resolved_conflicts = []
@default_reduction_rule = nil
@predecessors = []
@lalr_isocore = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]:

  • @lalr_isocore is an implementation of lalr1_isocores (Definition 3.36)
  • @ielr_isocores is an implementation of isocore_nexts (Definition 3.37)
  • Their initialization logic is written in Definition 3.45 (split_states).

}
end

def lookahead_set_filters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Definition 3.38 (lookahead_set_filters)

[action, nil]
elsif action.is_a?(Reduce)
if action.rule.empty_rule?
[action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: If need to get the key, I like to use inadequacy_list.map or h = {}; inadequacy_list.each ... so that the code accesses its key clearly.


def annotate_predecessor(predecessor)
annotation_list.transform_values {|actions|
token = annotation_list.key(actions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: Same with #annotate_manifestation, it's more clear to use inadequacy_list.map or h = {}; inadequacy_list.each ... to access its key.

}
end

def inadequacy_list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: Definition 3.27 (inadequacy_lists). Difference from the original paper is that tuple (state, token, actions) is used in the original paper but tuple (token, actions) is used in this implementation.

inadequacy_list.transform_values {|actions|
actions.map {|action|
if action.is_a?(Shift)
[action, nil]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: nil means "undefined" in the original paper.

if action.is_a?(Shift)
[action, nil]
elsif action.is_a?(Reduce)
if action.rule.empty_rule?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: I prefer the order of sub condition follows the same order of the original paper.

if action.rule.empty_rule?
[action, lhs_contributions(action.rule.lhs, inadequacy_list.key(actions))]
else
contributions = kernels.map {|kernel| [kernel, kernel.rule == action.rule && kernel.end_of_rule?] }.to_h
Copy link
Collaborator

@yui-knk yui-knk Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: "C[j] = (l -> r, |r| + 1)" in Definition 3.30 2-(a) means end_of_rule.

}
end

def lhs_contributions(sym, token)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note]: sym is the symbol (non-terminal) of the rule's LHS.

end

def lhs_contributions(sym, token)
shift, next_state = nterm_transitions.find {|sh, _| sh.next_sym == sym }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits]: Nonterminal transition is called goto. Shift is used for terminal transition.

%token c
%define lr.type ielr

%%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This grammar example comes from Fig 5 of the original paper. Could clarify it as comments?

@yui-knk
Copy link
Collaborator

yui-knk commented Jan 14, 2025

@junk0612 I created 2 test cases as reproduction of #398 (comment).
See yui-knk@ca288c8


def split_states
transition_queue = []
@states.each do |state|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering it needs to limit the iteration only for the LALR states like @states[0...@states.size].each ..., in other words do we need to iterate states which are created by #compute_state?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be my misunderstandings. We may need to iterate all states which includes newly created one.

To reproduce mysterious behaviors for ruby parse.y,
this commit add 2 test cases releated these behaviors.

1. "states/ielr_prec.y" causes meaningless state split.
   All conflicts are resovled with only LALR, however `#compute_ielr`
   splits state 8 to state 11, state 10 to state 12 and changes state 10
   moves to state 11 with "relop".

2. "states/ielr_nonassoc.y" causes duplicated "error" on state 6.

this commit is picked from
yui-knk@ca288c8
@junk0612 junk0612 force-pushed the generate_ielr_parser branch from 25ef188 to fead1a4 Compare January 16, 2025 13:43
@junk0612 junk0612 merged commit b1081fb into ruby:master Jan 18, 2025
22 checks passed
@junk0612 junk0612 deleted the generate_ielr_parser branch January 18, 2025 05:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants