-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: rfc for supporting tables in excalidraw #1
base: main
Are you sure you want to change the base?
Conversation
active-rfcs/0000-tables.md
Outdated
{ | ||
"id": "cell-1", | ||
"row": 0, | ||
"column": 0, | ||
"x": 100, | ||
"y": 100, | ||
"width": 100, | ||
"height": 50, | ||
"text":"Cell 1", | ||
"fontSize": 16; | ||
"fontFamily": 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwelle, as per my understanding, can you confirm if this is what you meant by virtual
elements? These elements would be drawn at the time of rendering so there will be no existence of actual separate text elements
@Mrazator I have tried incorporating some of your suggestions and answered some of the queries in the updated RFC. Could you elaborate on what you meant by |
Sure, the flat array of keeping all the cells and the related metadata seems quite brittle & inflexible long-term. I would aim at lifting most of the cell metadata above, so it's easily accessible (table dimensions, row order, column order, etc.). It should make the data also more understandable and flexible, so we don't have to iterate through all the cells and calculate the mental model for each potential use case - swap rows/columns (more below), column sorting, ensuring table integrity (no duplicate rows/columns), increasing/decreasing table dimensions, etc. The example given was about the common drag-and-drop swapping behaviour of being able to pick a row/column and drop it into a different position within a table (see below examples from Notion and Notes). With a flat hierarchy (one array) to keep all the cells, such an operation would rely on a sequential search going through all the cells. Screen.Recording.2024-07-15.at.12.27.07.movScreen.Recording.2024-07-15.at.12.28.48.movOne simple way of improving this is to have a matrix (simplified): {
header: ["A", "B", "C"],
rows: [
[1, 2, 3],
[4, 5, 6]
]
} This way you can easily perform the following:
Such a format is probably fine for import/export or even simple single-player scenarios, but it raises issues in multiplayer, affecting both I will post an update here once I sort my thoughts out. |
active-rfcs/0000-tables.md
Outdated
|
||
Yes we will need write custom logic to handle reconcilation in `tables`, mainly attaching `timestamp` to each cell and writing a custom algorithm to resolve the conflicts and ensuring latest changes are applied to all the collaborators. | ||
|
||
### Storing cells with respect to rows to ease reordering of rows and columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Mrazator as per your suggestion, I have added an alternative approach for taking care of swapping rows and columns.
I have added index
and lastUpdated
fields to resolve conflicts during collaboration.
Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that in order to support concurrent changes out of the box, we cannot rely on an array at all and thus we would have to replace the whole thing with objects, which gets ugly pretty quickly. Very rough format idea:
// initial state
{
header: {
"column_A": { value: "A", index: "a0", isDeleted: false },
"column_B": { value: "B", index: "a1", isDeleted: false },
"column_C": { value: "C", index": "a2", isDeleted: false },
},
rows: {
"row_0": {
cells: {
"cell_0": { value: 1, index: "a0" },
"cell_1": { value: 2, index: "a1" },
"cell_2": { value: 3, index: "a2" },
},
index: "a0"
isDeleted: false,
},
"row_1": {
cells: {
"cell_0": { value: 4, index: "a0" },
"cell_1": { value: 5, index: "a1" },
"cell_2": { value: 6, index: "a2" },
},
index: "a1",
isDeleted: false,
},
}
}
That is all based on simple scenarios, which modify columns & rows (notice the likely necessity for soft deletions - isDeleted
).
A) To the table defined above, "A" added a row [7, 8, 9]
, "B" removed a column "C", "A" performs undo, "B" performs undo
B) To the table defined above, "A" added a column D
, "A" performs undo, "B" added a column "E", "A" performs redo
Questionably, similar soft deletions might be / might not be needed on the cell level, but I still didn't go through all the possible cases.
Btw. I think we will likely want to avoid relations with created with boundElements
, as those introduced a lot of challenges during multiplayer undo-redo, so I am guessing it would end in similar cases here. For details check the conflict resolution of relations defined with boundElements
as part of the History
tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrazator in the nested tree structure above, unless the object order is significant, you'd need to sort by index
first during render if you didn't want to keep explicit x
& y
(and potentially width
& height
).
At that point we can go with a flat structure and it'll have similar perf profile (except you sort once, but longer array which might be slower depending on number of cells).
Ideally we get rid of sorting.
Below, I'm using a flat structure, but need to track x
/y
/width
/height
.
{
"type": "table",
"cells": {
"cell-id-1": {
"id": "cell-id-1",
"row": 0,
"column": 0,
"x": 0,
"y": 0,
"width": 100,
"height": 50
},
"cell-id-2": {
"id": "cell-id-2",
"row": 0,
"x": 100,
"y": 0,
"column": 1,
"width": 100,
"height": 50
}
}
}
If we want to get rid of x
& y
, we'd have to keep the auxiliary rows
mapping as @Mrazator suggested).
As for width
& height
, IMO we need to track that regardless, else we'd need to compute it dynamically from the text content and find a maxHeight
for each row, an maxWidth
for each column, before we could render any cell.
But by tracking these, a simple change in dimension for once cell would cascade for all the cells in given row/column based on what dimension changed.
If we track x
& y
as well, updates would cascade even more.
We would need to sync on property-level instead of element-level.
As for isDeleted
, I'm still not convinced we need it for either.
active-rfcs/0000-tables.md
Outdated
{ | ||
"id": "table-id", | ||
"type": "table", | ||
"x": 100, | ||
"y": 100, | ||
"title": "Table Title", | ||
"rows": [{ | ||
"id": "row-1", | ||
"index": 0, | ||
"lastUpdated": 1633257617000 | ||
"cells": [{ | ||
"id": "cell-1", | ||
"x": 100, | ||
"y": 100, | ||
"width": 100, | ||
"height": 50, | ||
"content": "Cell-11", | ||
"timestamp": 1633257618000 | ||
}, | ||
{ | ||
"id": "cell-2", | ||
"x": 200, | ||
"y": 100, | ||
"row": 0, | ||
"column": 1, | ||
"width": 150, | ||
"height": 50, | ||
"content": "Cell-12", | ||
"timestamp": 1633257619000 | ||
} | ||
]}, | ||
{ | ||
"id": "row-2", | ||
"index": 1, | ||
"lastUpdated": 1633257620000 | ||
"cells:[{ | ||
"id": "cell-3", | ||
"x": 100, | ||
"y": 150, | ||
"width": 120, | ||
"height": 60, | ||
"content": "Cell-21", | ||
"timestamp": 1633257621000 | ||
}, | ||
{ | ||
"id": "cell-4", | ||
"x": 220, | ||
"y": 150, | ||
"width": 130, | ||
"height": 60, | ||
"content": "Cell-22", | ||
"timestamp": 1633257617000 | ||
|
||
} | ||
}], | ||
"columns:": [{ | ||
"id": "column-1", | ||
"index": 0, | ||
"title": "Column 1", | ||
"timestamp": 1633257617000 | ||
}, | ||
{ | ||
"id": "column-2", | ||
"index": 1, | ||
"title": "Column 2", | ||
"timestamp": 1633257617000 | ||
}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the discussion here so it reflects all the fields (lastUpdated,
timestamps
index` and more) and the high-level approach.
I agree using arrays will impact the perf especially when it comes to lookup so moving to objects will be a better option hence here is the proposed structure 👇🏻
{
"id": "table-id",
"type": "table",
"x": 100,
"y": 100,
"title": "Table Title",
"cells": {
"cell-11": {
"id": "cell-11",
"width": 100,
"height": 50,
"content": "Cell-11",
"rowId": "row-1",
"columnId": "col-1",
"timestamp": 1633257618000
},
"cell-12": {
"id": "cell-12",
"row": 0,
"column": 1,
"width": 150,
"height": 50,
"content": "Cell-12",
"rowId": "row-1",
"columnId": "col-2",
"timestamp": 1633257619000
},
"cell-21":{
"id": "cell-21",
"width": 120,
"height": 60,
"content": "Cell-21",
"rowId": "row-2",
"columnId": "col-1",
"timestamp": 1633257621000
},
"cell-22": {
"id": "cell-22",
"width": 130,
"height": 60,
"content": "Cell-22",
"rowId": "row-2",
"columnId": "col-2",
"timestamp": 1633257617000
}
},
"rows": {
"row-1": {
"id": "row-1",
"index": 0,
"cellIds":["cell-11", "cell-12"],
"lastUpdated": 1633257617000,
"isDeleted": false,
},
"row-2": {
"id": "row-2",
"index": 1,
"cellIds":["cell-21", "cell-22"],
"lastUpdated": 1633257620000,
"isDeleted": false,
}
},
"columns:": {
"column-1": {
"id": "column-1",
"index": 0,
"title": "Column 1",
"timestamp": 1633257617000,
"isDeleted": false,
},
"column-2": {
"id": "column-2",
"index": 1,
"title": "Column 2",
"timestamp": 1633257617000,
"isDeleted": false,
}
},
- Each row has a mapping of
cellIds
to identify which cells are mapped to that row. - Depending on whether the virtual text elements approach works out, the
content
field will be replaced withboundTextElementIds
/boundTextId
to keep it simple and thetimestamp
field will be removed as well from each cell. Can rename thecontent
totext
to keep it simple. - As per
isDeleted
for tracking row and column deletion we will need it. As perisDeleted
for each cell, I am not sure if we will need it so can revisit this later. - As per
width
andheight
, we need it as mentioned in RFC since each cell may have a different dimension - As per
x
/y
, sincecellIds
are there for each cell so we can compute the coordinates for each cell withcellIds
andwidth
andheight
- I have kept
rowId
andcolumnId
for each cell to identify the rows and columns for quick look up when cell updated since now it's a flatten structure. Can remove later if not needed but most likely will be needed during updates.
If the above structure looks good, I will get started on implementation and we can explore the scenarios on the way.
cc @Mrazator @dwelle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1. Each row has a mapping of cellIds to identify which cells are mapped to that row.
do you anticipate we'll be doing lookups from cell -> row/column to warrant the expanded data structure of rows: {}, columns: {}
instead of simple rows: string[][]
@Mrazator suggested?
EDIT: Actually yes, during local edits so we can easily update applicable row + column cells when source cell dimensions change. Though I assume we could do array lookup or in-memory cache and it'd be fine, too.
The way I look at it, the renderer will go row by row and look up the cell by id to render it (sow row -> cells), instead.
3. As per
isDeleted
for tracking row and column deletion we will need it. As perisDeleted
for each cell, I am not sure if we will need it so can revisit this later.
Actually scratch my previous comment, as I was considering isDeleted
flag only from the perspective of the history. We will need the flag for collab/reconciliation. So we'll need it for each cell separately.
But why track it for whole columns/rows?
4. As per width and height, we need it as mentioned in RFC since each cell may have a different dimension
True, didn't consider users can make the columns/rows bigger than the text itself. Although perhaps the MVP doesn't need to support manual resizing (but it probably won't be that hard to implement).
Now that I'm thinking about it more, we'll definitely need per-property syncing & reconciliation, otherwise if someone is editing two different cells in the same row/column, there would be conflicts such as when one user writes enough text it expands the row height, while the other user edits a different cell in that row.
Row/column reordering edits would have similar conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually scratch my previous comment, as I was considering isDeleted flag only from the perspective of the history. We will need the flag for collab/reconciliation. So we'll need it for each cell separately.
Why for each cell? Users can delete a row or a column but not individual cell
But why track it for whole columns/rows?
To track the deletion of entire row / column and this will be used in undo/redo and collab reconciliation as well.
Now that I'm thinking about it more, we'll definitely need per-property syncing & reconciliation, otherwise if someone is editing two different cells in the same row/column, there would be conflicts such as when one user writes enough text it expands the row height, while the other user edits a different cell in that row.
Per property syncing as in? With the help of index
, lastUpdated
in row and column, and timestamp
for each cell we could come up with a reconciliation algorithm (theoretically as I think but will be more clear once implemented)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why for each cell? Users can delete a row or a column but not individual cell
..
To track the deletion of entire row / column and this will be used in undo/redo and collab reconciliation as well.
Yeah nvm, I didn't think this one through.
Per property syncing as in? With the help of
index
,lastUpdated
in row and column, andtimestamp
for each cell we could come up with a reconciliation algorithm (theoretically as I think but will be more clear once implemented)
So that the remote client doesn't throw away someone's text
update just because someone (e.g. themselves) have updated width
of that cell more recently due to table reflow.
But unless we want to keep separate version
for each respective attribute (or ones we deem should be synced per-attribute), we'll have to come up with something else, such as in-memory versioning for attributes. But that feels brittle and would sidestep element.version
entirely, adding more complexity.
Or we can ignore it, but then we should come up with a data structure which requires the least updates of neighboring cells during reflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I get the point, this is a valid use case, I have updated the RFC with the latest structure, queries/updates so we can start the implementation.
As per the syncing, I will come back to this later and we can decide how we want to go about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, most likely we will have to add per-property syncing for tables as @dwelle was also suggesting.
Each cell property can have a timestamp
property as well and the conflicts will be resolved based on that to ensure all collaborators' updates are applied correctly and there isn't randomness of who wins.
Let's say user A updates the width of the cell and user B updates the text of the cell, so the updates would look like 👇🏻
{
"id": "cell-11",
{
width: 60,
timestamp: 1633257618000
},
{
height: 50,
timestamp: 1633257616000,
},
content: "Hello",
timestamp: 1633257616000,
}
}
And updates of user B would look like 👇🏻
{
"id": "cell-11",
{
width: 60,
timestamp: 1633257616000
},
{
height: 50,
timestamp: 1633257616000,
},
content: "Hello",
timestamp: 1633257618000,
}
}
Now each property timestamp will be compared and one with a greater timestamp value will be applied for that property
width -> User A wins
height -> Both are equal
content -> User B wins
However, I strongly think we are going towards OT so most likely we might want to try out OT (Operational Transform) for tables to resolve the concurrent conflicts efficiently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot rely on regular timestamps (in such format) for synchronisation purposes (in general in a distributed system), as you cannot ensure the clocks on the client are in sync. Instead, lamports clocks (~version
) and their combinations with versionNonce
/clientId
or in the form of HLC
are usually used. Also for a traditional OT, you would need a server, which does not go well with E2EE and local-first principles.
I am also not persuaded we need syncing on the properties of the individual cells, as it will become unnecessarily complex not just for reconciliation, but also for the multiplayer history. Instead, as I suggested above, I would aim at lifting most of the metadata from the cells up. For example, if the width
is defined in a column and the height
is in the row
, you don't need property-based syncing on the cell.
Surely there will be more properties on the cell level like fontFamily
, but if we survived with element-based sync on all the other elements till now, we should be good to go for the cells as well. I would either attack property-level syncing before or after we have tables, but not as part of the tables and not just suddenly for the cells.
In its simplest form, when it comes to concurrent updates to cells, it would look something like:
User A updated cell-11
:
{
"id": "cell-11",
"originalText": "Hello from A",
"fontFamily": 1,
"version": 100
}
User B updated cell-11
:
{
"id": "cell-11",
"originalText": "Hello from B",
"fontFamily": 2,
"version": 100,
}
As we got used to now, since versions are the same, we take the one with a higher versioNonce
. We could have versionNonce
specified alongside the version
. But ideally, as mentioned above. versionNonce
should be the same for all the captured updates, so that one client consistently wins over the other, and the table data won't just become a mess.
This means we could just have one versionNonce
on the whole table
level, and in case of conflicts, take all the updates from the table with higher versionNonce
. I.e. if User A update generated a versionNonce
: 111 and User B generate a versionNonce
: 222, User B should win for all conflicting changes - cells with the same version. Similar versioning can be applied to rows
and columns
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a combination of version
and versionNonce
is what we use now as well.
For dimensions
we can have width
defined on column
and height
defined in row
and then the top level version and versionNonce combination should work well (apart from selecting random client when version is same and whichever version nonce is higher, but thats ok as we do that now as well)
Even if I ignore text-related properties (font size, font family etc for now where we need property-based syncing), later when we have full-fledged WYSIWYG, more properties on the cell will be added and syncing could become more challenging unless we check each type of property update or have some property based syncing algorithm - each cell has a version and version nonce (That's what feels theoretically rn).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rich text will be another challenge, but IMHO should not be done sooner than properties level sync. Once we have sync on properties, we will need to re-think the whole text-level reconciliation and after that, we will likely do something like https://www.inkandswitch.com/peritext/ for rich text (either ourselves or partially reuse an existing CRDT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got me thinking, what's stopping us from a flat hierarchy (elements JSON) with the additional row
element type?
- conceptually enforces a parent-child
table
>row
>cell
hierarchy (runtime) - loading each element into an in-memory
Map
would rely on its insertions order (~ordering based onindex
just on init) and provide 0(1) lookups - we could keep a plain-old text element for cells (hence once we tackle rich-text / text-level-reconcilliation, tables would be solved automatically)
- reconciliation would stay more or less untouched (for now)
- multiplayer history would mostly be provided out of the box - at least until we recognize some conflicts worth tweaking
- render step would skip rendering of all rows/cells (text elements) and render it with the table element (similarly as we do with bound text now) - hence keeping elements' order denormalized (similar to
frames
>frame childs
)
RFC for excalidraw/excalidraw#4847