Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop modelDB and use Yjs directly #253

Open
10 of 72 tasks
davidbrochart opened this issue Jun 27, 2024 · 15 comments
Open
10 of 72 tasks

Drop modelDB and use Yjs directly #253

davidbrochart opened this issue Jun 27, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@davidbrochart
Copy link

davidbrochart commented Jun 27, 2024

Problem

JupyterLab has an abstraction layer called modelDB in the observables package, which currently serves as a proxy for the Yjs shared types. At the time real-time collaboration was being implemented in JupyterLab using Yjs, there was the question of whether to drop modelDB or keep it as an abstraction layer on top of Yjs, and it was decided to keep it so that another CRDT backend could be used, if needed. There are a number of issues with that:

  • Yjs has features that modelDB doesn't have, which therefore cannot be used, for instance:
    • deep observability,
    • transaction support.
  • since data lives in both models, memory is duplicated.
  • since there is a double binding between modelDB and Yjs, special care must be taken in order to avoid circular dependencies.

All of this results in additional complexity, for really no benefit because JupyterLab already depends on jupyter-ydoc which depends on Yjs, so JupyterLab is not CRDT-agnostic (although I agree the CRDT-specific part is well contained in jupyter-ydoc).
Additionally, as Jupyter is moving towards server-side execution, new issues start to appear because the modelDB abstraction doesn't exist in the backend, where we directly use pycrdt (Python bindings to Yrs, the Rust port of Yjs). Therefore we cannot communicate at the modelDB level between the backend and the frontend.
Finally, the possibility of using another CRDT library in the future is getting smaller as Yjs is getting more and more popular. The only real alternative is Automerge that a team of JupyterLab developers evaluated and which was eventually discarded.

Proposed Solution

I propose to drop modelDB altogether from the JupyterLab code base, and directly use Yjs instead.
See jupyterlab/jupyterlab#16481 for more discussions.

@jupyterlab/jupyterlab-council Let's vote:

EDIT: I added @RRosio, @SylvainCorlay and @Zsailer to the list of voters, according to this list.

@fperez
Copy link

fperez commented Jun 27, 2024

Please take my comments with very limited value as I'm not deeply knowledgeable about this specific part of the codebase, and I may well be missing an important conterpoint.

But a while back in Jupyter (before the name even existed, back when it was all IPython) we tried to really stick to a YAGNI principle, as we were bitten by premature and unnecessary generalizations. A clear example was our early adoption of a concept of "pages" in the notebook format without any UI to actually support it. Eventually there was an Emacs frontend that supported that feature, and it wasn't great when we decided to remove it and offer other simpler navigation functionality, as it meant deprecating something that (small, but still important) user community might have come to rely on.

This seems like a perfect opportunity to apply that stricter principle and remove this otherwise unnecessary layer that is having current real costs for a supposed future benefit that we may never actually realize/need.

@jtpio
Copy link
Member

jtpio commented Jun 28, 2024

Thanks @davidbrochart for opening the issue.

As discussed in jupyterlab/jupyterlab#16481, one of the main challenges with this proposal will be the breaking changes it will introduce, and the need for a new major release of the related package(s).

Posting my comment from jupyterlab/jupyterlab#16481 (comment) here for visibility:

We should be very careful about this, because extension authors already had to migrate their extensions twice in a rather short period of time with JupyterLab 3 (released in December 2020) and JupyterLab 4 (released in May 2023). For lab 3 the main change was about the packaging of extensions. But lab 4 came with a long list of breaking changes in various packages: https://jupyterlab.readthedocs.io/en/latest/extension/extension_migration.html#jupyterlab-3-x-to-4-x

Based on the various discussions in the last dev meetings, it sounded like the general feeling was to wait as long as possible before making breaking changes that would have a direct impact on extension developers. And avoid extension development fatigue and extension compatibility issues for end users (with extensions only compatible in an old version of lab).

If breaking changes can't be avoided, maybe there could be a way to improve the extension migration script to automatically perform the necessary changes on an extension code base. This would help extension authors migrate their code more easily with the less friction.

Also, it would be useful to provide some examples of the kind of changes extension authors should expect, for example in a form of a before / after or diff.

@krassowski
Copy link
Member

krassowski commented Jun 28, 2024

I very much support @jtpio. I am afraid that some folks my not appreciate how dire the situation is and how exhausted extension authors are with constant breaking changes, and users with the extensions not working correctly or at all in the supported versions.

I am also afraid that this can increase the learning curve for developing extensions which interact with documents. Already with the previous migration changes were made without documenting how to do basic interactions with the documents, nor proper migration guidance:

Edit: and there are outstanding issues that broke things in editors severely degrading the UX for me that have not been fixed since, like:

After this change, an extension author will need to learn one more thing which is the internals of yjs, and what yjs is in the first place. Right now they are not required to do so.

I will likely be voting "No" but not because I oppose this change in principle and will oppose it always - I think that the advantages of the change do no outweigh the risks at this time, this can change if we come back to this in a years time for example.

Quoting from jupyterlab/jupyterlab#16481 (comment)

For instance with Yjs, changes to a nested shared type can be observed: [...] But not with ModelDB: [...]

In principle this could be solved by extending the ModelDB, right? We could have changedDeep signal as well.

@davidbrochart
Copy link
Author

In principle this could be solved by extending the ModelDB, right? We could have changedDeep signal as well.

Maybe but it's not the only missing feature.

@echarles
Copy link
Member

A few years ago, we already had those discussions, and luckily we have today a ydoc package that proves that the abstraction approach makes sense. Other projects (SyncedStore...) outside of JupyterLab in the meantime have also successfully taken the abstraction approach, creating collaborative data structures, or distributed state management solutions, on top of Y.js.

To be coherent, this proposal should be extended to the ydoc package. ALL the collaborative data structure of JupyterLab should be architectured the same way. Therefor, this proposal should be simply be closed and reopened covering both the observable and ydoc packages, otherwise JupyterLab core and extensions developers will have to deal with an architectural bicephal implementation and thinking.

Also, it would be useful to provide some examples of the kind of changes extension authors should expect, for example in a form of a before / after or diff.

I was thinking to ask the same thing on the initial discussion but suddenly has been taken in this vote process... anyway, thank you @jtpio, this will help voters to make sense of the expected changes. Would it be possible to give 2 examples: a simple use case, and a complex use case?

As additional information, could you also please list the impacted packages and identify the impact on the external jupyterlab repositories (including the non-official extensions), this will help voters.

This seems like a perfect opportunity to apply that stricter principle and remove this otherwise unnecessary layer that is having current real costs for a supposed future benefit that we may never actually realize/need.

@fperez this proposal is about removing the abstraction for the observables package only, and keeping it for ydoc, which brings questions about the validity of the raised arguments. This proposal would lead to bicephal code which does not make sense to me.

In principle this could be solved by extending the ModelDB, right? We could have changedDeep signal as well.

@krassowski Like you and @jtpio I am much worried on the impacts for JupyterLab third-party consumers and extenders. You may be answered with a some crafted migration tooling (the complexity of the current tooling is btw the biggest complaints I hear), release and communication plans, by this proposal supporters.

My strongest concern is however the proposed architecture. It looks to me that instead of making better an existing good and working thing, it it proposed to drop it and replace it with something weaker and bringing others and more issues, leading to non-coherent implementations (raw yjs for observables, abstraction layer for ydoc).

I am voting -1.

(to be clear, even if this proposal would cover both observables and ydoc, I would be voting -1 as favoring abstraction for that specific aspect).

@davidbrochart
Copy link
Author

we have today a ydoc package that proves that the abstraction approach makes sense

It only proves that we made it work with the modelDB abstraction, but as I said there are a number of issues with that.

To be coherent, this proposal should be extended to the ydoc package

Of course I would also drop modelDB in jupyter-ydoc.

@fperez this proposal is about removing the abstraction for the observables package only, and keeping it for ydoc

You must have misunderstood me, jupyter-ydoc must be updated accordingly, as any other extension that depends on modelDB.

it it proposed to drop it and replace it with something weaker and bringing others and more issues, leading to non-coherent implementations (raw yjs for observables, abstraction layer for ydoc).

How is Yjs weaker than modelDB? What issues does it bring?
Again, it seems obvious that jupyter-ydoc won't have any abstraction layer anymore.

@echarles
Copy link
Member

@davidbrochart
Copy link
Author

Of course not, but YNotebook would not use modelDB anymore.

@echarles
Copy link
Member

echarles commented Jun 28, 2024

Turning round again and again... the arguments you are using to justify the change you propose don't make sense to me and this change is just driving to bicephal implementations (raw yjs in some case, and abstraction in other case). I will further abstain on this thread and your replies to avoid cluttering the discussion.

@davidbrochart
Copy link
Author

Sorry if I'm not clear enough. I just want to drop the modelDB abstraction, both in JupyterLab and jupyter-ydoc. No abstraction layer anymore, I hope that this is clear.

@fcollonval
Copy link
Member

fcollonval commented Jun 28, 2024

teasing on

We can actually close this one and jupyterlab/jupyterlab#16481 too. As it is solved by jupyterlab/jupyterlab#12695 + jupyterlab/jupyterlab#13168 + jupyterlab/jupyterlab#13247

for those in doubt we are still using ModelDB (and we don't), just search the code.

teasing off

When refactoring the API to ease the integration of the shared models, Carlos and I after various discussions and reading of various issues and PRs decided to create ydoc to have a semantic API hiding YJs to ease the understanding of the developers. As any abstraction, it can definitely be improved and no solution are provided to deal with advanced features of YJs like transaction (although that should not be needed if the semantic API is well done) or the undo manager (that one can be tricky).

The data duplication between a JupyterLab model and a shared model is still happening for the cell output model and the attachments (search for globalModelDBMutex). This was left as is by lack of time and to not block 4.0.

If changing those parts (and the outputs is probably what you @davidbrochart have in mine) is the goal of this, then drafting a PR to see the extend of API breaking changes sound reasonable and not too expensive.

Regarding using Yjs directly, I'm strongly against it because there are lots of subtle things about it and having a semantic layer as ydoc is much better for developers.


Note

ydoc does not depend on JupyterLab objects; hence not on the observables package.

    "dependencies": {
        "@jupyterlab/nbformat": "^3.0.0 || ^4.0.0-alpha.21 || ^4.0.0",
        "@lumino/coreutils": "^1.11.0 || ^2.0.0",
        "@lumino/disposable": "^1.10.0 || ^2.0.0",
        "@lumino/signaling": "^1.10.0 || ^2.0.0",
        "y-protocols": "^1.0.5",
        "yjs": "^13.5.40"
    }

@davidbrochart
Copy link
Author

davidbrochart commented Jun 28, 2024

When I say using Yjs directly, I don't mean getting rid of jupyter-ydoc's API, but not using the observables package as an abstraction layer on top of Yjs in JupyterLab.

@ellisonbg
Copy link
Contributor

ellisonbg commented Jun 28, 2024

Overall, I am in favor of this proposed direction, but it sounds like there are still technical details to work through. However, I also agree with others comments about the need for us to be careful about breaking APIs. @davidbrochart could you update the proposal with more information about what parts of this work can be done in the 4.x series of releases (backwards compatible) and which parts would need to go into major releases (presumably 5.x). For example, can we switch to having Yjs be the single source of truth for all of our own code, but still offer a ModelDB compatibility layer for the rest of the 4.x series? I am generally aware of a number of other backward incompatible changes that are needed across different JupyterLab and Jupyter Server repos, so we can't avoid breaking APIs forever. It maybe helpful for planning purposes for us to begin mapping out upcoming breaking changes, so we can better assess the timing and tradeoffs.

@fcollonval
Copy link
Member

When I say using Yjs directly, I don't mean getting rid of jupyter-ydoc's API, but not using the observables package as an abstraction layer on top of Yjs in JupyterLab.

Is it the observables or the Lumino signals you want to get ride off? If this is the Lumino signals, I'm strongly against using two notifications mechanism. Lumino signals is the JupyterLab standard way to connect code pieces since the beginning of JupyterLab and we should avoid having multiple communication mechanisms in public API.

@davidbrochart
Copy link
Author

Is it the observables or the Lumino signals you want to get ride off?

Only the observables package. Yjs doesn't have the equivalent of signals. But the changes emitted through these signals would have the Yjs delta format, not the observables change format. This is already what we do in jupyter-ydoc, for instance here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants