{WIP} (GH-813) Use AST for Syntax Folder #806

Closed
wants to merge 2 commits into from

Contributor

@glennsarti glennsarti commented Dec 7, 2018

Requires #825 to be merged first and that merged back to the 2.0.0 branch

DO NOT MERGE

The AST contains the most correct version of how a script is interpreted. This
includes regions of text. Currently the code folder only uses the Tokens which
requires the folder to re-implement some of the AST behaviour e.g. matching
token pairs for arrays etc. The code folder should be implemented using as much
of the AST as possible. This commit;

  • Moves most of the region detection to use the AST Extents and uses a new
    FindFoldsASTVisitor.
  • Modifies the tests and language server to use the new fold detection
    class.

Managed to go from 32ms to 20ms

@glennsarti
Contributor Author

Because the AST drops comments we need to use a combination of AST and Tokens. The AST+tokens method now has parity in features with the tokens only method. Initial performance tests;

Scenario

The same script is given to both folders. It should contain 4096 regions. This is a VERY LARGE number and is considered an edge case. However, the large number should make differences easier to see.

The script is passed through both folders 20 times, but the first result is ignored. Due to C# internal "stuff", the first run takes 130+ms, but subsequent runs take < 40ms.
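
For reference, a minimal sketch of the timing approach described above. This is not the benchmark code used in this PR: `TimeRuns` and the stand-in workload are hypothetical, and the sketch is written as C# 9+ top-level statements for brevity.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: times `action` `runs` times and drops the first
// (warm-up) measurement, as described in the scenario.
static double[] TimeRuns(Action action, int runs)
{
    var durations = new double[runs];
    for (int i = 0; i < runs; i++)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        durations[i] = stopwatch.Elapsed.TotalMilliseconds;
    }

    // Skip(1) discards the first run so it does not skew the average.
    return durations.Skip(1).ToArray();
}

// Stand-in workload (an assumption); the real test would call the folder here.
int counter = 0;
double[] timings = TimeRuns(() => { counter++; }, 20);

Console.WriteLine($"Kept runs: {timings.Length}");
Console.WriteLine($"Average Duration={timings.Average()}ms");
```

With 20 runs, 19 measurements are kept, which matches the number of values behind each "Average Duration" line in the raw data.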

Results

  • The average time for token-only processing is 32ms whereas the AST+token method is 30ms. While there is a difference, given the spread of results and a user's inability to perceive a 2ms difference, I consider these equal.

  • Profiling the AST+token method showed that most of the time (~66%) is spent processing comments in the tokens. Future speed improvements should focus here.

Raw Data;

**************
--- Tokens Mode
**************
Regions: 4096
Duration: 150.8164ms
**************
Regions: 4096
Duration: 34.9555ms
**************
Regions: 4096
Duration: 43.0018ms
**************
Regions: 4096
Duration: 31.0003ms
**************
Regions: 4096
Duration: 33.9498ms
**************
Regions: 4096
Duration: 32.0198ms
**************
Regions: 4096
Duration: 33.858ms
**************
Regions: 4096
Duration: 29.0313ms
**************
Regions: 4096
Duration: 31.2296ms
**************
Regions: 4096
Duration: 32.997ms
**************
Regions: 4096
Duration: 32.0299ms
**************
Regions: 4096
Duration: 29.9988ms
**************
Regions: 4096
Duration: 33.9566ms
**************
Regions: 4096
Duration: 31.936ms
**************
Regions: 4096
Duration: 33.442ms
**************
Regions: 4096
Duration: 29.0231ms
**************
Regions: 4096
Duration: 29.0252ms
**************
Regions: 4096
Duration: 32.0295ms
**************
Regions: 4096
Duration: 32.0215ms
**************
Regions: 4096
Duration: 30.3724ms
Average Duration=32.4146368421053ms
**************
--- AST Mode
**************
AST Time=5.9918ms  Tokens Time=23.2852ms  Post Process Time=2.9986ms  LastLine Time=0
Regions: 4096
Duration: 34.3116ms
**************
AST Time=3.0402ms  Tokens Time=23.9945ms  Post Process Time=3.9651ms  LastLine Time=0
Regions: 4096
Duration: 30.9998ms
**************
AST Time=3.0255ms  Tokens Time=24.0053ms  Post Process Time=4.9631ms  LastLine Time=0
Regions: 4096
Duration: 31.9939ms
**************
AST Time=3.0635ms  Tokens Time=20.9375ms  Post Process Time=4.9969ms  LastLine Time=0
Regions: 4096
Duration: 30.0037ms
**************
AST Time=3.9962ms  Tokens Time=21.0421ms  Post Process Time=3.9663ms  LastLine Time=0
Regions: 4096
Duration: 30.0677ms
**************
AST Time=3.0098ms  Tokens Time=21.3444ms  Post Process Time=2.9644ms  LastLine Time=0
Regions: 4096
Duration: 27.3186ms
**************
AST Time=3.9781ms  Tokens Time=23.0171ms  Post Process Time=2.9937ms  LastLine Time=0
Regions: 4096
Duration: 30.9905ms
**************
AST Time=2.9973ms  Tokens Time=20.0569ms  Post Process Time=3.0119ms  LastLine Time=0
Regions: 4096
Duration: 26.0661ms
**************
AST Time=3.9749ms  Tokens Time=21.8788ms  Post Process Time=3.9959ms  LastLine Time=0
Regions: 4096
Duration: 29.8496ms
**************
AST Time=3.025ms  Tokens Time=19.942ms  Post Process Time=4.9906ms  LastLine Time=0
Regions: 4096
Duration: 27.9576ms
**************
AST Time=2.965ms  Tokens Time=20.0189ms  Post Process Time=5.0233ms  LastLine Time=0
Regions: 4096
Duration: 28.0072ms
**************
AST Time=2.9957ms  Tokens Time=21.0002ms  Post Process Time=3.0115ms  LastLine Time=0
Regions: 4096
Duration: 28.0071ms
**************
AST Time=3.9948ms  Tokens Time=20.0004ms  Post Process Time=2.9992ms  LastLine Time=0
Regions: 4096
Duration: 26.9944ms
**************
AST Time=2.9503ms  Tokens Time=19.0582ms  Post Process Time=3.9449ms  LastLine Time=0
Regions: 4096
Duration: 25.9534ms
**************
AST Time=2.9971ms  Tokens Time=25.0724ms  Post Process Time=3.9313ms  LastLine Time=0
Regions: 4096
Duration: 32.0008ms
**************
AST Time=2.9616ms  Tokens Time=20.9942ms  Post Process Time=5.0282ms  LastLine Time=0
Regions: 4096
Duration: 28.984ms
**************
AST Time=2.9947ms  Tokens Time=20.0403ms  Post Process Time=3.975ms  LastLine Time=0
Regions: 4096
Duration: 27.01ms
**************
AST Time=2.9958ms  Tokens Time=21.9983ms  Post Process Time=2.9993ms  LastLine Time=0
Regions: 4096
Duration: 27.9934ms
**************
AST Time=3.9941ms  Tokens Time=20.947ms  Post Process Time=2.9973ms  LastLine Time=0
Regions: 4096
Duration: 27.9384ms
**************
AST Time=4.0074ms  Tokens Time=21.9958ms  Post Process Time=3.0579ms  LastLine Time=0
Regions: 4096
Duration: 30.0008ms
Average Duration=28.8493157894737ms
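
The two averages above can be reproduced from the raw durations by dropping the first run of each mode. The sketch below does that and also reports the sample standard deviation as one way to quantify the "spread of results"; `Summarise` is a hypothetical helper, not code from this PR, and the sketch uses C# 9+ top-level statements.

```csharp
using System;
using System.Linq;

// Returns the mean and sample standard deviation, ignoring the first run.
static (double Mean, double StdDev) Summarise(double[] runs)
{
    double[] kept = runs.Skip(1).ToArray();
    double mean = kept.Average();
    double variance = kept.Sum(d => (d - mean) * (d - mean)) / (kept.Length - 1);
    return (mean, Math.Sqrt(variance));
}

// Durations in ms, copied from the raw data above (first run included).
double[] tokensMode =
{
    150.8164, 34.9555, 43.0018, 31.0003, 33.9498, 32.0198, 33.858, 29.0313,
    31.2296, 32.997, 32.0299, 29.9988, 33.9566, 31.936, 33.442, 29.0231,
    29.0252, 32.0295, 32.0215, 30.3724,
};
double[] astMode =
{
    34.3116, 30.9998, 31.9939, 30.0037, 30.0677, 27.3186, 30.9905, 26.0661,
    29.8496, 27.9576, 28.0072, 28.0071, 26.9944, 25.9534, 32.0008, 28.984,
    27.01, 27.9934, 27.9384, 30.0008,
};

var tokens = Summarise(tokensMode);
var ast = Summarise(astMode);

// The means should match the "Average Duration" lines (about 32.41 and 28.85).
Console.WriteLine($"Tokens mode: mean={tokens.Mean:F2}ms stddev={tokens.StdDev:F2}ms");
Console.WriteLine($"AST mode:    mean={ast.Mean:F2}ms stddev={ast.StdDev:F2}ms");
```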

@glennsarti
Contributor Author

So after some playing around: enumerating over tokens is relatively cheap (~1-2ms). Running regexes, however, is where the processing time is being taken up. After merging the three comment extraction methods into one, I could remove one of the three regexes.

In the performance example I'm using, I managed to shave off 8ms, bringing the total to 10ms.
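
To illustrate the idea of doing region detection with fewer regex evaluations, here is a rough sketch that matches both `#region` and `#endregion` with a single pattern and pairs them with a stack. It is only an illustration: the pattern, the `FindRegions` helper and the (line, text) input shape are assumptions, not the code in TokenOperations.cs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: pairs region start/end comments in a single pass.
static List<(int Start, int End)> FindRegions(
    IEnumerable<(int Line, string Text)> comments, Regex marker)
{
    var open = new Stack<int>();
    var regions = new List<(int Start, int End)>();
    foreach (var (line, text) in comments)
    {
        // One regex evaluation per comment, whatever kind of marker it is.
        Match match = marker.Match(text);
        if (!match.Success) { continue; }

        if (!match.Groups["end"].Success)
        {
            open.Push(line);               // "#region": remember the start line
            continue;
        }

        if (open.Count > 0)
        {
            regions.Add((open.Pop(), line)); // "#endregion": close the nearest start
        }
    }
    return regions;
}

// Assumed pattern: the optional "end" group distinguishes the two markers.
var regionMarker = new Regex(@"^\s*#(?<end>end)?region\b", RegexOptions.IgnoreCase);

var comments = new[]
{
    (1, "#region Outer"),
    (2, "# just a comment"),
    (3, "#region Inner"),
    (5, "#endregion"),
    (9, "#endregion Outer"),
};

List<(int Start, int End)> regions = FindRegions(comments, regionMarker);
foreach (var (start, end) in regions)
{
    Console.WriteLine($"{start}-{end}");   // prints 3-5, then 1-9
}
```

Inner regions are emitted before the regions that enclose them, because a region is only recorded when its end marker is seen.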

**************
--- AST Mode
**************
AST Time=6.0034ms  Tokens Time=17.0162ms  Post Process Time=5.0015ms  LastLine Time=0.9993
Regions: 4096
Duration: 30.0184ms
**************
AST Time=1.9973ms  Tokens Time=10.004ms  Post Process Time=2.9974ms  LastLine Time=0
Regions: 4096
Duration: 15.9984ms
**************
AST Time=3.0015ms  Tokens Time=12.9987ms  Post Process Time=3.0585ms  LastLine Time=0
Regions: 4096
Duration: 20.0478ms
**************
AST Time=5.031ms  Tokens Time=8.9539ms  Post Process Time=3.0001ms  LastLine Time=0
Regions: 4096
Duration: 16.985ms
**************
AST Time=5.0027ms  Tokens Time=9.9993ms  Post Process Time=4.9937ms  LastLine Time=0
Regions: 4096
Duration: 20.9993ms
**************
AST Time=2.9983ms  Tokens Time=13.5376ms  Post Process Time=2.9997ms  LastLine Time=0
Regions: 4096
Duration: 20.5355ms
**************
AST Time=5.0201ms  Tokens Time=8.9934ms  Post Process Time=3.0121ms  LastLine Time=0
Regions: 4096
Duration: 18.0136ms
**************
AST Time=4.9883ms  Tokens Time=9.9976ms  Post Process Time=5.0662ms  LastLine Time=0
Regions: 4096
Duration: 21.0301ms
**************
AST Time=3.0012ms  Tokens Time=8.999ms  Post Process Time=4.033ms  LastLine Time=0
Regions: 4096
Duration: 16.0332ms
**************
AST Time=4.9813ms  Tokens Time=13ms  Post Process Time=3.0018ms  LastLine Time=0
Regions: 4096
Duration: 21.9839ms
**************
AST Time=5.0048ms  Tokens Time=9.0367ms  Post Process Time=3.0004ms  LastLine Time=0
Regions: 4096
Duration: 18.0069ms
**************
AST Time=4.047ms  Tokens Time=7.9539ms  Post Process Time=5.0069ms  LastLine Time=0
Regions: 4096
Duration: 18.0248ms
**************
AST Time=3.9992ms  Tokens Time=12.0011ms  Post Process Time=3.9977ms  LastLine Time=0
Regions: 4096
Duration: 19.998ms
**************
AST Time=3.0395ms  Tokens Time=10.9603ms  Post Process Time=2.9978ms  LastLine Time=0
Regions: 4096
Duration: 16.9976ms
**************
AST Time=4.9982ms  Tokens Time=7.9584ms  Post Process Time=3.5664ms  LastLine Time=1.0087
Regions: 4096
Duration: 17.5317ms
**************
AST Time=7.9965ms  Tokens Time=16.9999ms  Post Process Time=7.0079ms  LastLine Time=0
Regions: 4096
Duration: 33.0123ms
**************
AST Time=3.0052ms  Tokens Time=11.9927ms  Post Process Time=3.0038ms  LastLine Time=0.9977
Regions: 4096
Duration: 18.9994ms
**************
AST Time=6.0071ms  Tokens Time=11.9919ms  Post Process Time=6.0256ms  LastLine Time=0
Regions: 4096
Duration: 24.0246ms
**************
AST Time=1.9994ms  Tokens Time=10.0014ms  Post Process Time=6.997ms  LastLine Time=0
Regions: 4096
Duration: 19.9984ms
**************
AST Time=1.9989ms  Tokens Time=16.5137ms  Post Process Time=4.0035ms  LastLine Time=0
Regions: 4096
Duration: 23.525ms
Average Duration=20.0918684210526ms

@glennsarti
Contributor Author

I tried using a hashtable instead of an array to hold the foldable regions, but it only gained 1ms at best.

@glennsarti glennsarti force-pushed the spike-try-ast branch 2 times, most recently from a6f9341 to 22ead9d Compare December 9, 2018 13:25
@glennsarti glennsarti changed the title {WIP} (TODO) Use AST for Syntax Folder {WIP} (GH-813) Use AST for Syntax Folder Dec 9, 2018
@glennsarti glennsarti force-pushed the spike-try-ast branch 2 times, most recently from b24e7c4 to ffe2485 Compare December 12, 2018 05:58
@glennsarti glennsarti changed the title {WIP} (GH-813) Use AST for Syntax Folder (GH-813) Use AST for Syntax Folder Dec 12, 2018
@glennsarti
Contributor Author

Rebased. Ready for merge

ping @rjmholt

Contributor

@rjmholt rjmholt left a comment

This is looking really great. I've left a few comments

src/PowerShellEditorServices/Language/AstOperations.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/TokenOperations.cs Outdated (resolved)
src/PowerShellEditorServices/Language/TokenOperations.cs Outdated (resolved)
src/PowerShellEditorServices/Language/TokenOperations.cs Outdated (resolved)
src/PowerShellEditorServices/Language/TokenOperations.cs Outdated (resolved)
@glennsarti
Contributor Author

@rjmholt Managed to fix most of your requests. Only 2 or 3 outstanding questions. I think your suggestions will also have had a minor speed improvement too. Haven't taken the time to prove it yet.

@glennsarti
Contributor Author

Bouncing the PR due to transient Appveyor failure in DebuggerBreaksWhenRequested [FAIL].

@glennsarti glennsarti closed this Dec 12, 2018
@glennsarti glennsarti reopened this Dec 12, 2018
@glennsarti
Contributor Author

CI is green, ready for review.

src/PowerShellEditorServices/Language/TokenOperations.cs Outdated (resolved)
{
tokenCommentRegionStack.Push(token);
continue;
}
Contributor

I would put a newline here and in places below where new condition blocks begin

Contributor Author

I'm not convinced on this. I kept them grouped because they are all to do with #region detection. I didn't refactor to a method because I merged two functions together to create this. Otherwise I end up doing too many regexes and comparisons.

@glennsarti glennsarti changed the title (GH-813) Use AST for Syntax Folder {WIP} (GH-813) Use AST for Syntax Folder Dec 13, 2018
@glennsarti
Contributor Author

Bouncing the PR to force appveyor to re-run

@glennsarti glennsarti closed this Dec 13, 2018
@glennsarti glennsarti reopened this Dec 13, 2018
@glennsarti
Contributor Author

Bouncing the PR to force appveyor to re-run

@glennsarti glennsarti closed this Dec 13, 2018
@glennsarti glennsarti reopened this Dec 13, 2018
Collaborator

@SeeminglyScience SeeminglyScience left a comment

Awesome! Got some optional style feedback to bring it in line with some of the other newer files.

Only one major thing I'd like to see addressed before merge (the comment about class support at the top of FindFoldsVisitor)

int endLineOffset = 0;
// If we're showing the last line, decrement the Endline of all regions by one.
if (this.currentSettings.CodeFolding.ShowLastLine) { endLineOffset = -1; }
foreach (FoldingReference fold in FoldingOperations.FoldableRegions(
Collaborator

Instead of a line break in the foreach, you can also save it to a local variable first. That'll be optimized out by the compiler in a release build.

Contributor Author

Not quite sure what you mean here.... do you mean;

var temp = FoldingOperations.FoldableRegions(script.ScriptTokens, script.ScriptAst);
foreach (FoldingReference fold in temp)
{
....

If so what is that actually getting us?

src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FoldingReference.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FoldingReference.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
src/PowerShellEditorServices/Language/FindFoldsVisitor.cs Outdated (resolved)
/// <summary>
/// The visitor used to find the all folding regions in an AST
/// </summary>
internal class FindFoldsVisitor : AstVisitor
Collaborator

Sorry if I sound like a broken record, but we're sure this works with classes right? I would have expected this to either need to inherit AstVisitor2 or dip into tokens for classes. Can we get a basic class example added to the tests? Something like

class TestClass {
    [string[]] $TestProperty = @(
        'first',
        'second',
        'third')

    [string] TestMethod() {
        return $this.TestProperty[0]
    }
}

Contributor Author

Welp...this is a problem....works on master but not after this PR.... More work for me to do.

Contributor Author

So we did indeed need to use AstVisitor2 for the VisitTypeDefinition method. Added a test for this and it is now passing.

Fixed.

The AST contains the most correct version of how a script is interpreted. This
includes regions of text. Currently the code folder only uses the Tokens which
requires the folder to re-implement some of the AST behaviour e.g. matching
token pairs for arrays etc.  The code folder should be implemented using as much
of the AST as possible.  This commit;

* Moves most of the region detection to use the AST Extents and uses a new
  FindFoldsASTVisitor.
* Modifies the tests and language server to use the new fold detection
  class.
* Moved the code to modify the end line of folding regions to the language
  server code.
…o it's own class

Previously the folding provider created many intermediate arrays and lists and
required post-processing.  This commit changes the behaviour to use an
accumulator pattern with an extended Dictionary class.  This new class adds a
`SafeAdd` method to add FoldingRanges, which then has the logic to determine if
the range should indeed be added, for example, skipping nulls or ranges where a
larger range already exists.

By passing around this list using ByReference we can avoid creating many objects
which are just then thrown away.

This commit also moves the ShowLastLine code from the FoldingProvider into the
Language Server.  This reduces the number of array enumerations to one.
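
A rough sketch of the accumulator idea described in this commit message, under stated assumptions: the actual commit extends Dictionary with a `SafeAdd` method, whereas this sketch uses a plain helper function over a dictionary keyed by start line (the key choice and the tuple range type are assumptions). The final loop shows the ShowLastLine end-line offset being applied in a single enumeration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SafeAdd method: adds `range` unless it is
// null or a range starting on the same line already ends at or after it.
static void SafeAdd(
    Dictionary<int, (int Start, int End)> folds, (int Start, int End)? range)
{
    if (range is null) { return; }

    var candidate = range.Value;
    if (folds.TryGetValue(candidate.Start, out var existing)
        && existing.End >= candidate.End)
    {
        return; // a larger (or equal) range is already recorded
    }

    folds[candidate.Start] = candidate;
}

// The dictionary is a reference type, so every caller adds to the same
// accumulator and no intermediate lists are created.
var folds = new Dictionary<int, (int Start, int End)>();
SafeAdd(folds, (1, 10));
SafeAdd(folds, (1, 4));    // ignored: (1, 10) is larger
SafeAdd(folds, null);      // ignored: nothing to add
SafeAdd(folds, (12, 20));

// Apply the ShowLastLine adjustment once, while enumerating the results.
bool showLastLine = true;
int endLineOffset = showLastLine ? -1 : 0;
foreach (var fold in folds.Values)
{
    Console.WriteLine($"{fold.Start}-{fold.End + endLineOffset}");
}
```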
@glennsarti
Contributor Author

We can't use the AST as it's heavily dependent on PS version, e.g. I need to use the latest AST Visitor but it doesn't exist in PS3.

This means I need to use tokens, not the AST. I'll raise a new PR with a faster token processor.

@TylerLeonhardt
Member

If you would like, you could send this over to the 2.0.0 branch. There, we are only supporting 5.1+

@glennsarti
Contributor Author

@TylerLeonhardt Yeah I need AstVisitor2 :-(

@glennsarti glennsarti changed the base branch from master to 2.0.0 December 14, 2018 02:36
@glennsarti glennsarti reopened this Dec 14, 2018
@glennsarti
Contributor Author

Retargeting to 2.0.0... I expect a WHOLE bunch of issues till I rebase.

@TylerLeonhardt
Member

Yeah let us rebase 2.0.0 to master. That will help

@glennsarti
Contributor Author

Closed in preference to #853

@glennsarti glennsarti closed this Jan 21, 2019
4 participants