-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metadata should not depend on (absolute) text spans #11304
Comments
I like the idea of having symbolic source-code references, but there are a lot of syntactic cases that will each need their own solution. The proposed path type (
Another thing to consider is that the LHS of a binding is not strictly an identifier; it is a pattern. I don't think the backend currently supports destructuring-bindings at all, but once that is implemented there won't always be a simple way to stringify the LHS of a binding. If the goal is storing metadata in a way that it is resilient to external edits, there's a simpler way: We can use the module source code as a map of itself. Include a snapshot of the module alongside the serialized metadata; then to load a module from disk:
This way we would preserve all metadata, anywhere in the AST. |
I think this is not much of a problem: the key of a given metadata may be just the entire binding - assuming every binding must introduce a variable, they would have to be unique anyway. As for subexpressions: I think we could just design a "breadcrumb" identification of widgets inside an existing node, which could be even a bit smarter than AST crumbs. The only real problem I see here are the "bindingless" nodes - but here we could make our graph requiring to give them a name when trying to assign any metadata (like position).
How |
Currently it tracks reordered lines, but not lines that are both reordered and mutated: enso/app/ydoc-shared/src/ast/parse.ts Line 890 in d1ee7fa
It would be straightforward to add binding-aware block comparison in order to handle reordered, mutated lines--easier I think than defining an addressing scheme that can identify any of the syntactic constructs we render as components, and any of their subexpression ASTs. |
Well, I think it sounds quite good to me. I would only make sure the code snapshot is "encrypted" for the user, so they won't edit the snapshot instead of the code by accident. Something sort of "compress + base64". |
Essential part of new meta-data format is identification of its version. It doesn't matter that the format isn't good enough for future evolution of the language/engine. Once it is found insufficient, we will define new format and change its version identification.
We are looking for a fast solution that allows users to edit |
@kazcw explained to me:
I see. Such a duplication goes against the attempt to make META-DATA section smaller. Making the enormous meta-data smaller was a huge driver behind We want to make sure the meta-data section is even smaller than right now (ideas described in #7989), not doubling the size of the user code. |
I think a compressed snapshot won't take as much. Also, the doubling of source code is ok for me - code files aren't particularly big after all. And, in files where every node has metadata attached (position, visualization, color...) it will be hard not to double the code size, actually. The problem we had was not that the metadata section doubled the size, but that it increased it two orders of magnitude. |
Agree - I'm not worried about making the metadata smaller than it is now. The previous version where it would be multiple kb for a small file was the problem. The most important goal of this change is to make it more resilient to external edits (we want to enable changing descriptions or editing in VS Code without losing all metadata). Adding versioning should also allow us to evolve it going forward which would be a great win. If we end up with something where a user could rename a variable in a text editor with find and replace, this would be a fantastic. The original suggestion of reffering to |
The |
This feels like a much larger piece of work than the original suggestion. @kazcw how long would take to implement this kind of approach (bearing in mind it couldn't interrupt your work stream you already have)? And presumably, other than us throwing it away later - the other approach wouldn't stop us doing it later. |
3 days, at most. |
Add a test case ensuring that the `Ast.syncToCode` algorithm is able to maintain AST identities when a binding and its reference are renamed. This is an important case, as mentioned here: #11304 (comment)
Here's my proposal. The idea is to add a layer to achieve edit-resilience, thus it is mostly orthogonal to other planned metadata improvements: Metadata format:
Writing a file (ydoc-server):
Reading a newly-opened file (ydoc-server):
|
Note from today's meeting: The engine should not start execution until after the ydoc-server has updated the metadata map (which might cause UUID changes), or determined that no update is needed. I think this the process I described above ensures this (but this should be kept in mind when implementing): The ydoc-server should process the file fully (and send any necessary edit to the backend) before sharing the AST with the GUI, requesting execution, or doing anything else with the source. |
Dmitry Bushev reports a new STANDUP for yesterday (2024-10-21): Progress: Updated the stored metadata with the new snapshot field. Testing in gui. It should be finished by 2024-10-28. Next Day: Next day I will be working on the #11304 task. Continue working on the task |
Dmitry Bushev reports a new STANDUP for today (2024-10-22): Progress: Working on the recovery logic. Updated file opening in Ydoc to read the stored metadata snapshot and compare the file contents. Testing in gui. It should be finished by 2024-10-28. Next Day: Next day I will be working on the #11304 task. Continue working on the task |
Dmitry Bushev reports a new STANDUP for yesterday (2024-10-23): Progress: Fixed an issue with editing the module contents. Finished the overall recovery logic shape. The logic seems to work in gui It should be finished by 2024-10-28. Next Day: Next day I will be working on the #11304 task. Continue working on the task |
Dmitry Bushev reports a new STANDUP for today (2024-10-24): Progress: Implemented saving the updated code snapshot after it was successfully recovered. Eliminated one redundant parsing of the source code. Working on the PR comments. Finished and undrafted the PR It should be finished by 2024-10-28. Next Day: Next day I will be working on the #11304 task. Continue working on the task |
Dmitry Bushev reports a new STANDUP for today (2024-10-25): Progress: Looking into the context management in the gui. Implemented hot reloading of the insight script. Created the PR. It should be finished by 2024-10-28. Next Day: Next day I will be working on the #11304 task. Continue working on the task |
close #11304 Changelog: - update: add `ide.snapshot` optional metadata field containing the source code of the file - update: `syncFileContents` method tries to repair the metadata spans when it detects that the source file was edited and the received code does not match the code stored in the `ide.snapshot` metadata field # Important Notes Tested in gui
User Visible Goal
Let the user "edit source files in an external editor" without completely breaking information persisted in the METADATA section. E.g. without loosing positions, color, etc. of nodes in the graph, state of widgets, etc.
Terminology
The META-DATA section consists of two lines and on the fly mappings that are part of
TextEdit
requests:UUID
s and their source locations - together with on the fly mappings it is calledIdMap
TextEdit
requests - together with the first line of META-DATA section forming so calledIdMap
UUIDs are important for language server protocol. IDE and the engine communicate via language server protocol and they use UUIDs to identify "locations".
Since #10182 the first META-DATA line contains only UUIDs that appear on the second line - e.g. are used by IDE to associate some persistent information with those locations. The rest of the UUIDs is generated on the fly - each
TextEdit
request can contain an on the fly additions to the first META-DATA line in the file forming newIdMap
.Constraints
That way we can satisfy the user goal without impacting the whole system and keep the change located just to the change of the META-DATA section format. Find two alternative solutions (classical and snapshot) to this problem in the next sections.
Snapshot: Persist Code Twice (obfuscated by base64)
Explained by @kaz at #11304 (comment)
Classical: AST Based Anchor & Local Text Spans
The metadata stored in files (by the IDE) currently only relates to nodes - e.g. to
var = expr
statements inside of method bodies. Let's use the AST path to such element as an anchor to identify a semantic location in the source code. E.g. each IDE node can be identified using its path in the AST tree. Let's use following format for the anchor:{method pointer}.{variable name}
. I.e.the
op1
node can be identified asmain.op1
.Local Text Spans
In addition to the above format of AST based anchor identification, we have to have a way to specify an exact location (just like the current system of absolute text spans does). To do so, let's support exact identification by relative text spans. E.g.:
[Span, UUID]
pairs (absolute span from the beginning of the file to UUID mapping),[AST Path, Span, UUID]
trippleAST Path
provides an offset neutral location inside of the AST - an anchor resilient to user edits (all but removal or rename of a method or its variable). The local text span allows to fine tune the location to any expression or element inside the nearest AST anchor.Can it Work?
UUID
in its protocol as usual[AST Path, Relative Span, UUID]
tripple remains stable with all the edits not related to the anchor itself or content up to next anchorYes, it is going to work.
The text was updated successfully, but these errors were encountered: