
Investigate "document model" requirement #1

Closed
1 of 2 tasks
valiafetisov opened this issue Feb 15, 2023 · 4 comments

Comments

valiafetisov commented Feb 15, 2023

Goal

Clarity on the requirement and technical solutions

Context

During the last meeting we were introduced to the "document model" requirement. In order to implement it properly, we need to figure out:

  • Why is there such a requirement (what problem does it try to solve)?
  • What is the essential underlying functionality?
  • What is essential to start with and what can be deprioritized?

Note: to keep the scope small, let's not discuss very long-term topics in this issue, but just clarify the things that might influence our initial setup/approach here.

Assets

Tasks

  • Answer the questions above based on the available information
  • Raise more questions if needed

valiafetisov commented Feb 15, 2023

Why is there such a requirement (what problem does it try to solve)?

From reading the document, I take away:

...Because business process modeling has almost entirely been ignored so far by the blockchain industry (with the exception of supply-chain modeling), there is a tremendous competitive advantage that can be gained...

The goal is to build a platform that can give insights into the business processes of Maker. But since the scope of the MVP is (at the moment) only the database model + an API on top, I assume the modelling can happen in an upper layer or later in the development phase. The only requirement for us here is then to closely follow the Domain-driven design pattern.

Document model Design Principles
self-contained data structures that capture critical business information. They can be edited, stored, and exchanged by DAO contributors (just like spreadsheets, Word docs, etc.).

From this I assume that we will also (in the long term) need to create some kind of export/import format + a viewer for the document.

The Powerhouse document model also defines the business logic that defines how the content can be manipulated

This is a trickier requirement: it implies we should export operations (i.e. actual code) + the processor (also code) together with the document's static state. So we would need to export not only the data, but also the model(s) related to that data. This becomes especially tricky if we expect migrations and model changes (see more points below).
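As an illustration, a minimal sketch of what such a self-contained export bundle might need to contain (all type and field names below are hypothetical, not from the doc):

```ts
// Hypothetical shape of a self-contained document export: the static state
// alone is not enough, the operation history and a reference to the exact
// processor version have to travel with it for the document to stay executable.
interface DocumentBundle {
  state: Record<string, unknown>;                                      // static snapshot
  operations: { type: string; payload: unknown; timestamp: string }[]; // edit history
  processor: { name: string; version: string };                        // code reference
}
```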

The document’s State is simply a structured data format, for example a JSON structure

The provided example is too simple, because it displays only one-to-one or one-to-many relations. If we were to introduce any other relation, like many-to-many or many-to-one, we would end up with data duplication (e.g. comments + users, where several comments contain the same user object). Therefore, the proposed format only works as a representation of the data, not as a storage/exchange medium. We might want to use something like the jsonapi spec instead.
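To illustrate the duplication problem with the comments + users example (data is made up):

```ts
// Naive nested state: the same user object is duplicated in every comment.
const nested = {
  comments: [
    { id: "c1", text: "Looks good", author: { id: "u1", name: "alice" } },
    { id: "c2", text: "Agreed", author: { id: "u1", name: "alice" } }, // duplicate
  ],
};

// Normalized, JSON:API-style state: each record is stored once and referenced by id.
const normalized = {
  users: [{ id: "u1", name: "alice" }],
  comments: [
    { id: "c1", text: "Looks good", author: "u1" },
    { id: "c2", text: "Agreed", author: "u1" },
  ],
};
```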

Operations

This requirement reminds me of the "operational transformation" algorithm used in complex realtime collaborative systems (e.g. Google Docs). On the other hand, it also matches the description of blockchain transaction "calldata". Both seem like overly complex technologies for the specified goal. If there are no other hidden problems this requirement tries to solve, I think a history of edits + data about who made those edits and when might be sufficient here. That would produce the same data as in the examples, but without the "replay to get into the final state" functionality. I think "replay" also contradicts the point about making database exports/read-only replicas accessible and useful for analysts familiar with SQL.
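A minimal sketch of what such an edit-history record could look like (field names are assumptions, not from the doc):

```ts
// Plain edit-history record: answers "who changed what and when"
// without requiring operation replay to reconstruct the state.
interface EditRecord {
  documentId: string;
  editor: string;         // who made the edit
  editedAt: string;       // when, e.g. an ISO 8601 timestamp
  field: string;          // what was changed
  previousValue: unknown;
  newValue: unknown;
}
```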

History, undo, redo, and pruning

Keeping a history of operations in order to be able to replay them would require us (see the sketch after this list):

  • To run migrations not only on the final state, but also on all operations
  • To keep the complete history of the processor code (which effectively turns this application into a private blockchain where every version of the processor code is preserved along with the data)
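A sketch of the first point, assuming a field gets renamed between two hypothetical model versions:

```ts
// If replay is required, every stored operation must be migrated whenever
// the model changes; migrating only the final state is no longer enough.
type RenameV1 = { type: "rename"; name: string };
type RenameV2 = { type: "rename"; title: string }; // field renamed in v2

function migrateOperation(op: RenameV1): RenameV2 {
  // Without this step, old operations can no longer be replayed by the v2 processor.
  return { type: "rename", title: op.name };
}
```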

Document Processor

This is basically a smart contract on steroids (since usual smart contracts can't really work with complex data structures such as JSON objects). See my point above for the overall question about distributing code together with data.
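To make the comparison concrete, a minimal sketch assuming the processor acts as a reducer over the operation history (the operation types are made up):

```ts
type Operation =
  | { type: "setTitle"; title: string }
  | { type: "addTag"; tag: string };

interface DocState { title: string; tags: string[] }

// The processor is a pure function from (state, operation) to the next state,
// much like a smart contract processing a transaction.
function process(state: DocState, op: Operation): DocState {
  switch (op.type) {
    case "setTitle": return { ...state, title: op.title };
    case "addTag":   return { ...state, tags: [...state.tags, op.tag] };
  }
}

// "Replay" is then just a fold over the operation history:
const ops: Operation[] = [
  { type: "setTitle", title: "Budget 2023" },
  { type: "addTag", tag: "finance" },
];
const finalState = ops.reduce(process, { title: "", tags: [] });
```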

Open, Edit, Save, Send
create new documents and modify them in the document editor

I think it's a good idea to implement an auto-generated UI on top of the data structures defined in our system instead of expecting every other system to reinvent the wheel. I would actually push this requirement to the top in order to make the MVP version usable end-to-end without external tools or curl commands. In the long term we could maybe even provide a frontend plugin/iframe to make it easier to integrate the editing interface into other platforms.

store documents locally
send and receive documents

This requirement is doable in the long term if we keep all logic out of the database and resolvers, so that it can potentially run inside the browser. I would even consider using SQLite instead of PostgreSQL, because it's actually faster and can work in the browser. This way, a private SQLite instance can be created in the browser to keep all the data. This is still a complex requirement, since it introduces immense complexity for locally run migrations and makes data integrity a hard task.
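For reference, a minimal sketch of such a private in-browser instance using sql.js (a SQLite build compiled to WebAssembly); the calls below follow the sql.js API, but the schema is made up:

```ts
import initSqlJs from "sql.js";

async function createLocalDb() {
  const SQL = await initSqlJs();   // loads the WebAssembly binary
  const db = new SQL.Database();   // private, in-memory SQLite instance
  db.run("CREATE TABLE documents (id TEXT PRIMARY KEY, state TEXT)");
  db.run("INSERT INTO documents VALUES (?, ?)", [
    "doc-1",
    JSON.stringify({ title: "Budget 2023" }),
  ]);
  return db;
}

// db.export() returns the database as a Uint8Array, which could double
// as the exchange format for "send and receive documents".
```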

Attachments

I generally don't see a problem with this requirement, although I assume it's a long-term one. For simple files we can start with base64 encoding them and later introduce a more complex system with file upload, as soon as we figure out the "export/import" and "local editing" functionalities.
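A sketch of the base64 starting point (the file name and record shape are made up; Node's Buffer API is used here, a browser would use FileReader instead):

```ts
import { readFileSync } from "node:fs";

// Inline attachment: the file bytes live inside the document itself,
// so no separate upload service is needed for the MVP.
const bytes = readFileSync("invoice.pdf"); // hypothetical file
const attachment = {
  name: "invoice.pdf",
  mimeType: "application/pdf",
  data: bytes.toString("base64"),
};
```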

References

Many data exchange formats (like the jsonapi spec linked above) seem to already support this. In the GraphQL world there is also a technique of prefixing UUIDs with unique type markers to make it clear what type of record is referenced, essentially turning UUIDs into a form of reference.
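A minimal sketch of the prefixed-ID technique (the helper names are hypothetical):

```ts
import { randomUUID } from "node:crypto";

// The prefix encodes the record type, so every ID is self-describing
// and can act as a typed reference, e.g. "user_2c9e7a1f-...".
function makeId(type: "user" | "comment"): string {
  return `${type}_${randomUUID()}`;
}

function referencedType(id: string): string {
  return id.split("_")[0];
}
```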

Business Analysis Process
because input and output data are so clearly defined, software developers have the ideal specification format to quickly turn the business logic into production-grade code

Are we talking about a platform that lets users write frontend-only code to get insights into the data (without the ability to modify the data)? Like observablehq.com? Then, I think, integrating the data with existing tools and providing integration examples will potentially be more useful than introducing a completely new application that someone has to spend time learning. Or is this UI supposed to be a "playground" to develop new operations/processors and test them over actual data before submitting them as a pull request? I think I'm missing concrete examples of the usage of such an application to understand it. Or, more generally put, what does this requirement try to solve?

Overall

Thank you for the doc @wkampmann, I think it clarifies a lot of points and gives a lot of insight into the future of the tool. Most of the things are not directly relevant for starting work on the project, but are useful to keep in mind. Main takeaways: we should aim to have a UI already for the MVP, we must preserve the history of records from day 1, and we should keep in mind potential frontend-only/offline-first functionality in the future. Next, I will do technical research on existing tools to come up with a framework that will let us accomplish all possible outcomes of this evaluation.

@valiafetisov

@wkampmann, would you be able to bring some input on the raised questions? I think it will be most important to outline the underlying non-technical requirements here, since that will allow us to derive technical decisions from them. I.e.: what exactly is the problem that the proposed model is trying to solve?

@valiafetisov

Also, please let me know here if I misinterpreted/misunderstood something on the page while deriving requirements from it.

@valiafetisov

So, the outcome of the investigation, based on the last meeting on 24.03.2023: the document model requirement will be split into 4 parts:

  • Business level modelling (what operations each document should and should not allow) – responsibility of the SES
  • Business logic implementation (defining operations and their effects) – responsibility of the SES
    • Roles and permissions handling
  • Business logic as the API (authentication, authorisation, storage, retrieval, etc) – done by us
  • Document editing UI (executing operations offline and interacting with the API) – responsibility of the SES

Current integration schema:

[Image: 20230323-switchboard-architecture]
