Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: #3758 Extrememly poor performance on poorly written grammars #3880

Merged
merged 2 commits into from
Nov 20, 2022
Merged

fix: #3758 Extrememly poor performance on poorly written grammars #3880

merged 2 commits into from
Nov 20, 2022

Conversation

jimidle
Copy link
Collaborator

@jimidle jimidle commented Sep 10, 2022

fix: #3875 Extremely poor performance on poorly written grammars

The problem was once again a poorly written hash function, which caused ATN config sets to
hash poorly. The poor grammar caused many thousands of DFA states to hash to the same
bucket. In your average grammar, this situation would not arise.

For now, I have changed that hash function to use the memory address of the slice that
contains the configs, though i think that this is sub-optimal. I will revisit this as part of a
general attack at further performance gains.

The problem was once again a poorly written hash function, which caused ATN config sets to
hash poorly. The poor grammar caused many thousands of DFA states to hash to the same
bucket. In your average grammar, this situation would not arise.

For now, I have changed that hash function to use the memory address of the slice that
contains the configs, though i think that this is sub-optimal.  I will revist this as part of a
general attack at further performance gains.

Signed-off-by: Jim.Idle <jimi@gatherstars.com>
@jimidle
Copy link
Collaborator Author

jimidle commented Sep 12, 2022

@parrt Ready to merge when you get a chance

@parrt
Copy link
Member

parrt commented Sep 15, 2022

hash function to use the memory address of the slice

But won't that cause no sharing of any kind? I believe two DFA states with the exact same content should get the same hash otherwise it will look like no DFA state has ever been seen before and will prevent DFA usage I think.

@jimidle
Copy link
Collaborator Author

jimidle commented Sep 19, 2022

Yes, but what had been done was that they all got exactly the same hash - I have to come back to this one I think. Java uses the hash of the enclosing set object. Go does not have that and it was using the hash of each config in the set. I don't think that using the address is quite right, but I have not thought of a better way to reflect Java hash just yet. I do not like using the address because even though Go does not relocate when garbage collecting, it might not always be the case.

We can leave this unmerged for now until I get a little time to re-look. It is the only place where it seems a bit tricky to get hashing working like in Java.

Signed-off-by: Jim.Idle <jimi@gatherstars.com>
@jimidle
Copy link
Collaborator Author

jimidle commented Sep 20, 2022

@parrt This last change fixes Hash() and optimizes allocations of comparators. DFA reuse works as it should.

@parrt
Copy link
Member

parrt commented Nov 20, 2022

Ah. missed the update. I'll try it out.

@parrt parrt added this to the 4.11.2 milestone Nov 20, 2022
@parrt parrt merged commit 1e377b4 into antlr:dev Nov 20, 2022
@parrt
Copy link
Member

parrt commented Nov 20, 2022

Works! Great. thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants