Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent Undo #2021

Open
jupart opened this issue Apr 30, 2018 · 21 comments
Open

Persistent Undo #2021

jupart opened this issue Apr 30, 2018 · 21 comments

Comments

@jupart
Copy link

jupart commented Apr 30, 2018

Vim has a feature called persistent_undo that is a nice quality-of-life feature I really like.

The implementation in nvim is like: It writes some kind of binary file to VIM_UNDO_DIR for each file. The undo files are written with their full paths separated by %, like %home%name%path%to%changed%file.c.

Does kak plan to support something like this one day? I think I'd like to implement a plugin for it either way, but I'm not sure where to start other than just to dig into other plugins.

@danr
Copy link
Contributor

danr commented May 1, 2018

Note: the undo history is not exposed to plugins right now so there is no way to support this as a plugin without changing the c++ source code as well.

@mawww
Copy link
Owner

mawww commented May 1, 2018

Exposing the necessary data so that a support script can take care of storing/restoring would be my preferred way. The problem is how to expose the undo data. This is related to the requests for a nice interface to the undo tree, we need to expose the undo tree data to the external/script world and let them do their thing (a graphical UI should be able to get the undo data and provide its own interface).

So, first steps would be to make it possible to get the undo tree from Kakoune. For that a text format must be defined. An undo tree is a tree of history nodes. history nodes can be reduced to a list of modifications and a timestamp. modifications are a tuple (insert/erase, buffer position, text content).

Second phase would be to make it possible to set the undo tree from a script. This is tricky because we need to validate that the undo tree is valid with the current state of the buffer.

@yshui
Copy link

yshui commented Jul 17, 2018

@mawww A history node should also contain its parent node. It is a undo tree after all.

And for validating, we can probably store and verify a checksum of the buffer.

@mawww
Copy link
Owner

mawww commented Jul 18, 2018

@yshui Right, the parent would be an index in the array. (so history node would be a tuple (parent index, timestamp, list of modifications).

Validating with a checksum is not safe enough, this is arbitrary data coming from an external source, Kakoune cannot trust it to be valid, it needs to either check it, or tolerate invalid undo data.

@yshui
Copy link

yshui commented Jul 18, 2018

@mawww I was thinking more about "how would the script figure out if the undo data is still valid" aspect.

I agree kakoune needs to check the undo data, but that's probably not difficult to do.

@mawww
Copy link
Owner

mawww commented Jul 26, 2018

Its pretty involved because each undo node in the undo tree applies to its parent buffer state, so in order to check that an undo node is valid, we need to compute its parent buffer state. Trying to do that without having to apply the modifications to the buffer each time might be a bit tricky to implement, I need to take a look.

For the script, it can do that however it wants, Kakoune will only provide an interface to set the undo tree, and complain if the undo tree is invalid based of the current buffer state. Its true that an eventual script doing undo tree restoration will likely use a checksum or similar to early validate the data its loading and notify the user if its out of date.

@dirkson
Copy link

dirkson commented Aug 4, 2018

+1 for this. After having persistent undo, I really can't imagine switching to an editor without it.

@eraserhd
Copy link
Contributor

OK, below is my ... well, third or fourth ... draft of design for a format for history. Thoughts?

(I'm planning on doing the coding, whatever the final design, FYI.)

Goals

  1. Provide all information necessary for a "Persistent Undo" plugin.
  2. Provide all information necessary for a tool like parinfer-rust to inspect
    all the changes to the buffer since it was last seen.
  3. Enable for "undojoin"-like functionality: easily amend the current history
    node.
  4. Be easy to parse from POSIX shell.

Nice to have:

  1. Use one variable, allowing history to be replaced with set-option.
  2. Have the format be compact, to avoid environment variable limits.
  3. Able to support cursor positions or timestamps for each node in the future.

Data Items

.HistoryNode

  1. Parent
  2. Redo Child
  3. Undo group (a vector of modifications)

.Modification

  1. Insert/Erase
  2. Buffer coordinate
  3. Text

Format

The data is represented in ( entity, attribute, value ) triplets. "Entity"
values are the ids of history items being described. "Attributes" are
the single-character identifiers r for redo child, m for modification, and
p for parent. Additonal attributes such as selections or timestamps can be
added at a later time.

 0 r 1                   # node 1 is the redo child for node 0
 0 m '+|1.2|hello world' # node 0 has a modification adding "hello world" at 1.2
 0 m '-|2.7|foo'         # node 0 has a modification removing "foo" at 2.7
 1 p 0                   # node 1 has parent 0
 1 r 2                   # node 2 is the redo child for node 1
 1 m '+|2.2|bar'         # node 1 has a modification adding "bar" at 2.2
 1 a _                   # 1 is the active history_id
 2 p 1
 ...

While possible to further normalize the data by introducing modification ids
and describing modifications with ( entity, attribute, value ) triplets, I
figure this inflates the size of the data unnecesarily: the structure of
modifications seems more fixed than the structure of history nodes, and since
it is only three fields without special cases needing more or fewer attributes
(like our history root), let's not pay the penalty for extensibility.

Modifications are represented as op|coord|text strings. text is at the
end so it can contain arbitrary text, including pipe symbols, and still be
easily parsed by shell, like so:

 op="${change%%|*}"
 change="${change#*|}"
 coord="${change%%|*}"
 text="${change#*|}"

@Screwtapello
Copy link
Contributor

Kakoune has an undo tree, not just linear history. Therefore, it doesn't make much sense for a HistoryNode to have "a redo child" since it may have many redo children. How about:

  • a HistoryNode has a single parent
  • HistoryNodes do not mention anything about their children; whoever's reading the data can reconstruct that information from all the parent links
  • As well as HistoryNodes, the history data should include:
    • the ID of the "tip" HistoryNode, the one that "redo" steps toward
    • the distance from node 0 to the current state of the buffer

For example, if I start with an empty buffer and do:

ihello<esc>i world<esc>ui planet<esc>u

...I would expect the history tree to look like:

0 ""
|
1 "hello"
|\
| 2 "hello world"
|
3 "hello planet"

There are four nodes (0-3) and two tips (2, 3). The current tip is the second one (1, in zero-based numbering), so the linear undo chain would be 0 → 1 → 3. The current state of the buffer is 1 step along that chain, so the history would also need to include:

t 1
s 1

Instead of storing the index of the tip, we could store the actual tip node ID, but what if the given node ID wasn't a tip? Instead of storing the index of the current state, we could store the actual state node ID, but what if the given node ID wasn't on the path from the origin to the selected tip? Doing it this way makes it easier for tools to validate history, since there's fewer things that can go wrong.

There should also be some kind of signature for the current buffer state, to guarantee we aren't loading undo-history for the wrong file, or a file that's been externally modified.

Another thing to think about: do HistoryNodes represent the states of a buffer (like Git commit IDs) or the transitions between states?

Lastly, I know mawww said (insert/erase, position, content), but I wonder if it wouldn't be simpler to model things as: (position, old text, new text):

  • to apply a change, instead of having to check an "insert/erase" flag and do different things, we can just delete the length of the old text and insert the new text every time.
  • to undo a change, just call the "apply change" function with the old and new parameters swapped.
  • this also allows us to model a change or replace operation in a single node, instead of combining an insert and a delete.

@eraserhd
Copy link
Contributor

On the point of redo child and modelling the “tip”:

one of my earlier drafts worked this way, but then I thought about this situation: you jump to a different, internal history node with some kind of undo-tree plugin. This node has multiple children and doesn’t appear on the path from the root to the redo tip. Then you type U to redo. What do we do? We can’t reach the previous tip from our new node, so we have no way of using that to figure out what redo must do.

In reality, there isn’t just one linear undo history, but as many as the tree has leaves, and Kakoune needs to decide what U means for every internal node.

Perhaps there is an algorithm? Maybe redo always travels towards the newest (most recently added) grandchild? Does this work if we jump then edit?

(I’m thinking about it more now, and I think it does work...)

Anyway, the “redo child” already exists in Kakoune’s data structures, which is why I modelled it here. If we can compute it, then we can remove it from the text format.

Anyway, way past my bedtime. I’ll think about the rest tomorrow.

@Screwtapello
Copy link
Contributor

... you jump to a different, internal history node with some kind of undo-tree plugin. This node has multiple children and doesn’t appear on the path from the root to the redo tip. Then you type U to redo. What do we do?

You got me to stop imagining how history might work, and actually look at the code. It turns out that Kakoune's internal state does indeed store a "redo child" for each node, and this value is essentially set arbitrarily as you wander the tree. Therefore, properly saving the undo history requires properly serializing the redo child data.

On the other hand, the "redo child" information will be mostly predictable (I bet most history nodes have exactly one child) and including a lot of redundant data violates "Have the format be compact". It also makes the format more difficult to parse, since the parser now has to validate that the redo_child is an actual child of the node in question.

How about this: we just record the history_id of the current buffer state, and don't record anything about "redo child" or "tip". At load time, we set all the "redo_child" fields to point at the highest-numbered child, except for nodes on the path from the root to the current state. That's simple, and only breaks if the user makes one change, undoes, makes a second change, uses <a-u> to go back to the first change, and undoes again.

Alternatively, do that, but record the "tip" (calculated by following "redo_child" fields as far as you can) as I described. Still pretty simple, and all the history nodes on the undo/redo chain will be correct at load time. Other history nodes will be "wrong", but the only way to get to them is via <a-u> and <a-U>. Those commands rewrite "redo_child" as they traverse the tree, and they can traverse the tree in an arbitrarily complex order (since they iterate by node ID rather than depth-first-search or whatever), so in general it's not a good idea to make assumptions about what "redo" will do after jumping to a history node, even today.

@eraserhd
Copy link
Contributor

I've created this design document, and I'm keeping it updated based on chats.

@pocomane
Copy link

The first step (exposing the history) seems done. Is there some news about the second step (make it possible to set the history)?

@tbodt
Copy link

tbodt commented Jul 14, 2022

Any reason this couldn't be part of kakoune core instead of having to be a plugin? It might well be easier to implement that way

@mawww
Copy link
Owner

mawww commented Jul 19, 2022

@tbodt The hard problem to solve is how to validate the data, this is expected to be provided by a plugin so that we can just provide the necessary command to load history and not bother about where/how it is stored, when to save, etc...

I think the design space is just too wide for this to be built-in, and we would not gain that much from it as we would not be able to trust whatever file we load anyway.

@pocomane
Copy link

pocomane commented Jul 24, 2022

What about minimal checks at set time, and coherence check at runtime? I mean:

  • When a plugin sets the history, just check for cycles in graph; that should be simple with some "Every parent ID (and timestamp) must be lower of the child one" policy.
  • When the user asks to undo a deletion step, do no check anything since they are always consistent with the buffer
  • When the user ask to undo an insertion step, check that at the buffer position there is the expected text; if not, truncate the history branch at current point, and signal an error (maybe blaming the plugin).

These my 2 cents... somebody here knows how vim solves the issue?

PS. I have only read the Change API doc, so sorry if the actual format is more complex.

@tbodt
Copy link

tbodt commented Jul 24, 2022

vim doesn't allow setting the undo tree from a plugin, it just loads from a file which it assumes it was the one to create. So no need for validation.

Edit: I believe if you change the file without using vim while vim is closed, the history is not loaded the next time you open vim. So there is minimal validation, checking that the file contents are as expected.

Please note however that I don't actually know how it works and am just attempting to explain my intuition gained from using the feature.

@pocomane
Copy link

pocomane commented Jul 25, 2022

Loading from a file imho have the same problematic of allowing external plugin to set the info. The file can be corrupted, can be edited outside the editor (as @tbodt said), can be accessed from a plugin (on vim I think it is possible).

I made a quick test: coping an old version of vim undo file back, after some mofication, leads to an empty history. This mean, I think, that some sort of validation is done, since if I simply change the modification timestamp the history is kept.

However I got another "Quick and Dirty" idea. Let the plugin be able to set the history only if the history is correctly signed. The history hash/sign/checksum is provided by kak when a plugin read the current history (and obviously is valid only for that history).

In this way a plugin will be able to set the history only providing a value that was previously generated by kak. A bit limitating, but maybe simple to implement. And practically should reduce the problem to the same scenario in which the core editor is the one in charge to store the history files.

EDIT - I realized that the plugin may still set an history taken from another buffer or an old one not compatible anymore with the buffer. To avoid this cases, a checksum of the current state of the buffer may be added to the signing process.

@Screwtapello
Copy link
Contributor

Signing the data doesn't actually prove the data makes sense, just that the data was written out by some version of Kakoune (or some tool that copied Kakoune's signing algorithm). Even without any malicious activity, some version of Kakoune may have a bug that makes it produce invalid history data which gets signed correctly, and other versions of Kakoune need to reject that data rather than crash or corrupt the user's documents.

@pocomane
Copy link

@Screwtapello, an editor that handles the persistent undo file by itself can have the same issues (an old version, due to a bug, produces a wrong file for the new version, etc). The only difference is that a malicious plugin can reproduce the signing. But I think kak must just be sure that a plugin do not mess up the history by mistake, not that a malicious plugin can not do nothing bad. I think that a malicious plugin can already do a lot more bad things right now (e.g. delete the buffer or try to remove all the files in your home, recursively).

I agree that kak must not crash for a wrong history, but this is different from full validation, right?

However, after all, I would prefer the other system I talked about: cycle detection at load time + consistence check at "undo time". To those who know the kak internals: is this doable with a reasonable effort? Would it work?

@Screwtapello
Copy link
Contributor

Now that #4777 has been merged, and history is a list of items rather than a tree, would it make it easier for Kakoune to validate an alleged undo history provided by a plugin?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants