
Data/Schema scoping #2

Open
valadas opened this issue Sep 17, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@valadas
Member

valadas commented Sep 17, 2021

Currently, all schemas and the data they contain are global for the whole instance.

I believe the solution only makes sense if we can at least scope it by portal. A PortalId field here would do it for the data but not for the schema. One site may want a Contact table that has 3 fields and no relations, but another site may also want a Contact table with 10 fields and 3 relations. Those schemas would be incompatible.

For other use cases, the instance owner may want schema/data that is global for all portals and would hate having per-portal scoping forced on them.

One solution to implement this without breaking the global option would be to add an optional _portalId suffix to the table names.

So Contacts would be global for the instance, but Contacts_1 and Contacts_2 would be scoped per portal.

This leaves us with just the naming conflict between the global and portal-specific tables... I am not sure of the best way to solve that one, but since it's all WebAPI driven, the APIs could accept a portalId in their requests: if it is not provided they would serve the global data, and if it is provided they would serve the portal-specific data.
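To make that concrete, here is a minimal C# sketch of the naming convention (GetTableName is a made-up helper for illustration, not an existing API):

```csharp
// Minimal sketch of the proposed naming convention. GetTableName is a
// made-up helper for illustration, not an existing API.
public static class ContentTableNaming
{
    public static string GetTableName(string contentType, int? portalId)
    {
        // "Contacts"   -> global, shared by the whole instance
        // "Contacts_1" -> scoped to portal 1
        return portalId.HasValue
            ? $"{contentType}_{portalId.Value}"
            : contentType;
    }
}
```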

@valadas valadas added the enhancement New feature or request label Sep 17, 2021
@valadas valadas added this to the 1.0.0 milestone Sep 17, 2021
@X3Technology
Collaborator

100% agree with your proposed approach for table naming.
The user needs to be able to specify whether the ContentType is scoped by portal or global.
It's easy enough to handle this requirement within the database table naming, but I am concerned about the impact on the WebAPI endpoints.

Currently, the URL to get all Contacts looks like "/api/content/contacts".

If the request is coming from DNN, there is a portal context, and it is easy to grab the PortalId and then point at the correct table using the naming scheme proposed above.

But, what if the request is coming from outside DNN (thinking headless CMS here)? Then, there would be no portalId in the request context.

I think the only solution would be to append a portalId parameter to the URL, something like "/api/content/contacts/portalid/1", which I don't love, but I'm not sure there is a better option.

I am also wondering how the authentication layer might impact this. For example, if a user authenticates and is only allowed access to PortalId=3, then it would be simple to point his request for "/api/content/contacts/" to the 'Contacts_3' table. But what if he authenticates with a superuser account and has access to all of the Contacts_ tables? Which one to point to then?

@valadas
Member Author

valadas commented Sep 17, 2021

But, what if the request is coming from outside DNN (thinking headless CMS here)? Then, there would be no portalId in the request context.

It's still possible if the proper portal alias is used, I believe... But I think I would prefer to be explicit, because one caller may need the global data and another the portal-specific data.

For authentication, we definitely need to get that right. I guess a host can access anything, but an admin or other user would only be able to access the global data or the data for their own portal, not data belonging to other portals. And to make it even more complex, it may need to be different for reading and writing :) We will certainly have some fun in that area with the permission grid 😄

@X3Technology
Collaborator

Yes, I think you are right. If the portal alias is in the URL, we should be able to get the PortalId from the API request context.

Regarding the global table concept, I am also wondering about how to solve the use case where one portal's data could be shared with other portals.

In fact, maybe it's not even necessary to have "global" tables. Maybe it's as simple as making one portal's version of the table accessible to everyone.

So, a user could set everything up under portal 0, then use the permissions system to make it accessible to all portals.
Basically, using the Roles defined in the Roles table (including the global -1, -2, -3 roles), the user would be able to select specific roles from each portal and assign the proper VIEW, EDIT, etc. permissions from the Permissions table. Do we need new StructuredContent-specific API permissions?

As for the API route formatting, here is my thought:
'/api/content/contacts' would default to the portalId determined by the context of the request (derived from the portal alias).

However, all API calls would have an optional 'portalId' parameter that would override the context portalId and point to the table for that specific portal.

So: '/api/content/contacts/portalId/3' would point to Portal 3's version of the Contacts table (the necessary permissions would still apply).

I think that's the cleanest solution and still allows us to have individual portal versions as well as shared (global) versions.
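A rough sketch of how that route shape could look in a Web API controller, assuming DNN's DnnApiController base class and ASP.NET Web API attribute routing; the controller, route templates, and placeholder query are illustrative only, not actual project code:

```csharp
using System.Web.Http;
using DotNetNuke.Web.Api;

// Hypothetical sketch of the proposed route shape; assumes attribute
// routing is wired up, which should be verified in a real DNN module.
public class ContentController : DnnApiController
{
    // GET /api/content/contacts -> portal resolved from the request context
    [HttpGet]
    [Route("content/{contentType}")]
    public IHttpActionResult GetItems(string contentType)
    {
        // PortalSettings comes from the portal alias used in the request.
        return GetItemsForPortal(contentType, PortalSettings.PortalId);
    }

    // GET /api/content/contacts/portalId/3 -> explicit override
    [HttpGet]
    [Route("content/{contentType}/portalId/{portalId:int}")]
    public IHttpActionResult GetItemsForPortal(string contentType, int portalId)
    {
        // TODO: verify the caller has VIEW rights on this portal's data.
        var table = $"{contentType}_{portalId}";
        return Ok(table); // placeholder: a real implementation would query `table`
    }
}
```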

@valadas
Member Author

valadas commented Sep 17, 2021

" Regarding the global table concept, I am also wondering about how to solve the use case where one portal's data could be shared with other portals.

I think we could handle this using portal groups. There is a field called "EffectivePortalId", if I recall correctly, which gives the root portal for the group no matter which portal in the group you came from; that could then be used to scope multi-portal data and permissions.
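Something along these lines, assuming the platform method is what I remember it to be (verify before relying on it):

```csharp
using DotNetNuke.Entities.Portals;

// Sketch: resolve the portal group's root portal before building the
// table name, so every portal in a group shares the same scoped tables.
// GetEffectivePortalId is recalled from the platform API -- verify.
public static class GroupScopedNaming
{
    public static string GetTableName(string contentType, int portalId)
    {
        // Returns the master portal's id when the portal belongs to a
        // portal group, otherwise the id passed in.
        var effectivePortalId = PortalController.GetEffectivePortalId(portalId);
        return $"{contentType}_{effectivePortalId}";
    }
}
```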

@valadas valadas removed this from the 0.1.0 milestone Sep 19, 2021
@WillStrohl
Member

Am I reading this correctly... When you add a new data type for a new instance of structured content, there's a new data table created in the database?

@david-poindexter
Contributor

david-poindexter commented Sep 20, 2021

Am I reading this correctly... When you add a new data type for a new instance of structured content, there's a new data table created in the database?

@hismightiness yes, when a new Content Type is created, a corresponding custom table is created to house its data.
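Roughly speaking, the Content Type's field definitions get translated into a CREATE TABLE statement with native column types; here is a hypothetical sketch of the idea (not the project's actual code):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: a Content Type's field definitions are turned
// into a CREATE TABLE statement with native column types.
public sealed class FieldDefinition
{
    public string Name { get; set; }
    public string SqlType { get; set; } // e.g. "NVARCHAR(200)", "INT", "BIT"
}

public static class ContentTypeInstaller
{
    public static string BuildCreateTableSql(string tableName, IEnumerable<FieldDefinition> fields)
    {
        var columns = string.Join(",\n    ",
            fields.Select(f => $"[{f.Name}] {f.SqlType} NULL"));
        return "CREATE TABLE [" + tableName + "] (\n" +
               "    [" + tableName + "Id] INT IDENTITY PRIMARY KEY,\n" +
               "    " + columns + "\n);";
    }
}
```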

@WillStrohl
Member

Thanks for the clarification, David. I'm not a fan of that approach, but whatever the community decides is what we'll go with. :)

@X3Technology
Collaborator

X3Technology commented Sep 20, 2021 via email

@WillStrohl
Member

This could lead to schema bloat, increase the chances of orphaned data, and present a challenge for some of our clients that have to audit and report any schema change to their security team, regardless of how it happened.

@X3Technology
Collaborator

X3Technology commented Sep 21, 2021

Ok, so I read three objections:

  1. Bloated Schema
  2. Orphaned Data
  3. Schema Documentation

I must assume that since you are opposed to native tables for the custom data, you would be in favor of an EAV model similar to the core UserProfile table schema or to other structured content solutions like 2Sexy (or others).

  1. Bloated Schema. I don't see how the superuser (or permissioned user) having the ability to create custom tables is really any different from the same user having the ability to install multiple custom third-party modules that install their own custom tables. If the organization doesn't want people to "bloat the schema" by creating custom tables, they can 100% turn off that ability for anyone who doesn't have the proper authority.

Further, at the end of the day, the data has to go somewhere. Whether it is in a custom table like this solution proposes or shoved into a generic EAV model, the data has to live somewhere. When you have native tables, at least the table's name gives some context to the kind of data it contains. A generic EAV table could contain anything and everything.

  2. Orphaned Data. The whole premise of a relational database is to capture how things, you know, relate to each other. Enforcing referential integrity at the database level (which this solution does; see the sketch after this list) ensures that orphaned records cannot happen. Even if you tried, you couldn't create an orphaned record, because the database will prevent the delete from ever happening. If you stuff all this same data into an EAV model, the referential integrity has to be enforced at the business logic layer rather than at the database level (where it belongs). It is entirely possible that a user could go directly into an EAV table and manually delete a parent record without remembering/knowing to also manually delete its child records. The argument of orphaned data actually supports the native table model, not the EAV model.

  3. Schema Documentation. I don't have any clients with these stringent requirements, but this feels like a business requirement, not a technical one. My guess is that the vast, vast majority of our users don't have these requirements, and we shouldn't make broad technical architecture decisions based on the business requirements of a small minority of our users. Further, there is nothing preventing the admin user from documenting the tables created in this system just as they would if they installed a custom third-party module.
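To illustrate the referential-integrity point from item 2 above, here is roughly what database-level enforcement looks like (table and column names are made up for the example):

```csharp
// Made-up table/column names; the point is that the FOREIGN KEY makes the
// database itself reject any delete that would leave orphaned child rows.
internal static class ReferentialIntegrityExample
{
    public const string CreateChildTableSql = @"
CREATE TABLE ContactPhones_3 (
    PhoneId   INT IDENTITY PRIMARY KEY,
    ContactId INT NOT NULL,
    Number    NVARCHAR(50) NOT NULL,
    CONSTRAINT FK_ContactPhones_Contacts
        FOREIGN KEY (ContactId) REFERENCES Contacts_3 (ContactId)
);";
}
```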

As food for thought, let me offer these benefits that you may not have thought about.

  1. Database Optimization / Performance
    Using native database tables, each table can be optimized for exactly the type of data that it holds. In an EAV schema, each piece of data must be stored in a generic string field (probably an nvarchar(MAX)) and then magically converted into the proper data type in the business logic layer at run time. This has several problems. First, the conversion back and forth from string to whatever datatype the field is supposed to be is inherently not as optimized as using native datatypes. Second, it's entirely possible that someone could manually manipulate the database (import/export?) and accidentally put in a value that the business layer won't know how to handle (like putting a date into what is supposed to be a boolean field). Third, at the storage level, putting a bit value (or an integer value or a date value) into a string field is equally non-optimal. Booleans belong in a bit field, integers belong in an integer field, etc., etc. Columns have datatypes for good reason, and part of that is so the data can be optimally stored and indexed for query performance.

As an aside, I actually started this project using an EAV model. I built the entire thing to work using an EAV model; it works great. I then tested it by creating a data object that had 20 fields of various data types and loaded it with 1 million records of fake data. I then modeled the same object using native tables that could be optimized for the data they contained, and tested the performance of querying data from both models. The results were not even close. The EAV model had to do 20 inner joins in order to construct the results (see the query-shape sketch after this list); the discrete table is nothing more than "Select * from XXX". No joins, no data conversions. It was night and day. One end goal for this system is to be the starting point for a headless CMS. We need a system that is optimized at the lowest possible level for performance, not a house built on sand. EAV defeats the entire point of using a relational database in the first place.

  2. Ease of Use
    Have you ever tried to build a report that exports the UserProfile table? Or a raw SQL query that gets data out of an EAV-model solution? It's not easy to build (or maintain). In fact, I don't even use the UserProfile table on any of my projects anymore. If a client needs the UserProfile functionality, I build a custom table to model exactly what they want. It's easier to understand, maintain, and use.

  3. Portability
    What if the client wants to import some existing data into the data model? Or export it to some other system? Or build a custom third-party module on top of the data? Good luck if your data is in an EAV table schema... The EAV model is only useful within the context of the solution implementing it. Relative to custom tables, it is extremely difficult to use the data anywhere other than in the solution it belongs to.
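To make the join-count point from the aside above concrete, compare the shape of the two reads (a sketch with made-up EAV table names, showing 3 of the 20 fields):

```csharp
// Query shapes only; ContentItems/ContentValues are made-up EAV tables.
internal static class QueryShapeComparison
{
    // Native table: a single scan over typed columns.
    public const string NativeQuery = "SELECT * FROM Contacts_3;";

    // EAV: one join per field just to pivot rows back into columns.
    // With 20 fields this becomes 20 joins.
    public const string EavQuery = @"
SELECT i.ItemId,
       v1.Value AS FirstName,
       v2.Value AS LastName,
       v3.Value AS Email
FROM   ContentItems i
JOIN   ContentValues v1 ON v1.ItemId = i.ItemId AND v1.FieldName = 'FirstName'
JOIN   ContentValues v2 ON v2.ItemId = i.ItemId AND v2.FieldName = 'LastName'
JOIN   ContentValues v3 ON v3.ItemId = i.ItemId AND v3.FieldName = 'Email';";
}
```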

Now, don't get me wrong. EAV absolutely has its place, but it isn't in a solution whose primary purpose is to manage potentially large volumes of data. I fully intend to build a FormBuilder that will reuse most of the concepts found in this solution, and I will use the EAV model for that solution. However, the primary use case for a forms module is much different from that of a structured content solution. First, the expected data population of a forms module is much smaller than that found in structured content scenarios, so performance is not nearly the concern for a forms generator, whose primary database I/O is inserting singleton records and the occasional data dump for a report. A general-use structured content solution needs to be able to handle high-traffic read operations over the entire dataset and return results as quickly as possible; in a forms module scenario, native tables just seem like overkill. Second, importing/exporting or using the data outside the context of the module is also not as much of a priority there. In that use case, the EAV model is absolutely a valid choice. But for what we are proposing and how it could potentially be used, the EAV model is just not the right choice.

@WillStrohl
Member

Great discussion, Jay. Thanks for taking the time to write out your thoughts.

RE: Bloated Schema
When installing an extension that brings new schema with it, it's a known set of schema changes. Most clients would have already identified these dependencies and would know they're going to happen in production before they occur, and it's usually not an overwhelming number of tables.

I have two clients I already know could never use this design, because a new data type or similar would trigger a security event when a new table is created by a content editor. Now the site has to undergo an audit related to content changes. Installation and content updates are two different levels of risk that become directly related in this scenario. In fact, if this is the pattern that's rolled out, I already know I'll need to uninstall this as part of each upgrade to prevent potential security events later.

RE: Ease of Use
I agree. Your points would be a trade-off if that approach were taken.

RE: Portability
I disagree with this point entirely as a con. We have a module developer feature called IPortable that literally uses this pattern to make content and any other serializable data portable. It's simply a matter of having a well-defined schema and logic pattern.

Your points about scale are good ones too. There are ways to address that as well.

By the way, I'm not saying this needs to change. I'm just adding dialog based on our own clients.
