Formatter: Recycle parsed AST and tokens in between rule invocations when no correction was applied to improve performance #1462
Merged
bergmeister merged 5 commits into PowerShell:master on Apr 28, 2020
Conversation
PR Summary
This saves on the expensive calls to the parser and the rule itself, which are currently made for every rule invocation AND after every applied DiagnosticRecord. With this PR, parsing is skipped and the previous result is reused if a rule did not return a DiagnosticRecord that changed the analysed script definition string. If no DiagnosticRecord is returned from a rule during formatting, this simple change can dramatically speed up formatting: in the case of a pre-formatted file (always using PowerShell's build.psm1 for reference), formatting is 3 times faster. In practice, I'd guess the gain is only around 50%-100%, depending on how well formatted the script already is. Because I had to change publicly exposed methods, it is technically speaking a breaking change, but I think we can live with that.

Since I haven't written this core logic, I can only guess it was written like this because applying a change invalidates the previous analysis, which explains why one cannot simply run the rules in parallel. However, I imagine one could avoid re-parsing between applying a list of DiagnosticRecords altogether by adapting the line and column numbers (which would also allow for parallelisation of the rules). Due to this re-analysis pattern, one could probably improve performance even further by stopping rule analysis once the first object comes back. I actually tried that locally, but it broke a few tests, so it may not be as easy as it seems; let's leave that for later.

PR Checklist
- .cs, .ps1 and .psm1 files have the correct copyright header
- Add WIP: to the beginning of the title and remove the prefix when the PR is ready.
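The optimization the summary describes (skip the expensive re-parse when the previous rule returned no correction, i.e. the script text is unchanged) can be sketched as follows. This is a hypothetical, language-agnostic illustration in Python, not PSScriptAnalyzer's actual C# code; the class and method names are made up, and Python's `ast.parse` stands in for the PowerShell parser.

```python
import ast


class CachingFormatter:
    """Sketch of the PR's idea: reuse the parsed result between rule
    invocations while the script text has not changed."""

    def __init__(self, parser=ast.parse):
        self._parser = parser        # stand-in for the expensive PowerShell parser
        self._cached_text = None
        self._cached_ast = None
        self.parse_count = 0         # instrumentation to show cache hits

    def _parse(self, text):
        # Only re-parse when a rule actually changed the script text;
        # otherwise the previously parsed result is still valid.
        if text != self._cached_text:
            self.parse_count += 1
            self._cached_ast = self._parser(text)
            self._cached_text = text
        return self._cached_ast

    def run_rules(self, text, rules):
        # Each rule gets the (possibly cached) parse of the current text
        # and returns the possibly corrected text.
        for rule in rules:
            tree = self._parse(text)
            text = rule(text, tree)  # identical text -> next parse is a cache hit
        return text


if __name__ == "__main__":
    formatter = CachingFormatter()
    noop_rule = lambda text, tree: text  # a rule that finds nothing to correct
    result = formatter.run_rules("x = 1\n", [noop_rule, noop_rule, noop_rule])
    # A pre-formatted script is parsed once, no matter how many rules run.
    print(formatter.parse_count)  # → 1
```

For a pre-formatted file no rule ever changes the text, so the parser runs exactly once for the whole rule pipeline instead of once per rule, which matches the roughly 3x speed-up reported above for an already-formatted script.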