-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: User-defined SQL schemas #30916
Conversation
8ff1d1d
to
c75cd57
Compare
c75cd57
to
8628ec7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very comprehensive RFC!
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
(I like the idea to avoid descriptors because that would ask complex questions about leases and the allocation of ranges.)
Can you elaborate on this concern? I was imagining that stored schemas would be implemented with some sort of SchemaDescriptor
and would be present in the system.namespace
table very much like you describe in this section. How would any of these alternatives affect the allocation of ranges (or range IDs which you allude to elsewhere)? Are the leases you're concerned about table leases? We don't currently have leases on databases (unless something changed that I'm not aware of).
system.namespace
was initially designed to support arbitrary hierarchy similar to a filesystem, but we limited the initial lookup support to only 2 levels (database, table). I think you might be aware of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Can you elaborate on this concern? I was imagining that stored schemas would be implemented with some sort of
SchemaDescriptor
and would be present in thesystem.namespace
table very much like you describe in this section. How would any of these alternatives affect the allocation of ranges (or range IDs which you allude to elsewhere)? Are the leases you're concerned about table leases? We don't currently have leases on databases (unless something changed that I'm not aware of).
system.namespace
was initially designed to support arbitrary hierarchy similar to a filesystem, but we limited the initial lookup support to only 2 levels (database, table). I think you might be aware of this.
-
about ranges: currently every descriptor ID, be it db or table, causes a range to be allocated.
-
about leases: the fact there are no leases on dbs currently is a bug and we are lucky our users do not notice because traffic on db descriptors is low.
In contrast once schemas are available there will be many more create/drop traffic on schemas, also inside transactions, and users will be much more sensitive to the lack of proper transactional semantics on those. I don't want to be casual with them as we've been with databases.
-
about system.namespace and hierarchy. We're making a SQL database with at most 2-3 levels. The full generality of system.namespace was premature design IMHO. Also even if we wanted to implement arbitrary deep objects (say, if/when we want a filesystem) this schema is poor because it requires recursive queries to normalize/resolve paths. There are better approaches to storing filesystem paths, for example that used in ReiserFS or ZFS, and if/when we want to do that we should get inspiration from there.
Until then I would like to avoid considering this theoretical situation in this discussion, and instead capitalize/utilize the fact we need at most 2-3 levels to optimize/simplify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
about ranges: currently every descriptor ID, be it db or table, causes a range to be allocated.
Right, but I don't think the addition of schemas will cause this problem to get much worse. We should fix this for databases.
In contrast once schemas are available there will be many more create/drop traffic on schemas, also inside transactions, and users will be much more sensitive to the lack of proper transactional semantics on those. I don't want to be casual with them as we've been with databases.
Hmm, ok. Can you elaborate on these concerns a bit more? I'm not up to speed on what problems the lack of database schemas is causing. Also, cc @vivekmenezes who has been thinking about "schema leases" for 2.2. I'm not quite sure what level of "schema" he is thinking of.
Until then I would like to avoid considering this theoretical situation in this discussion, and instead capitalize/utilize the fact we need at most 2-3 levels to optimize/simplify things.
Ack. I didn't mean to suggest that we should support more levels than database / schema / table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Right, but I don't think the addition of schemas will cause this problem to get much worse. We should fix this for databases.
"Much" perhaps not, but worse definitely. In test frameworks, schemas get created at a similar rate as tables. So we'd double the traffic on identifiers. This alone actually would make alternative C below more attractive.
Hmm, ok. Can you elaborate on these concerns a bit more? I'm not up to speed on what problems the lack of database schemas is causing.
I'll look into it, I can't recall the specific problematic SQL patterns out of my head. However perhaps @vivekmenezes knows them by heart?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
"Much" perhaps not, but worse definitely. In test frameworks, schemas get created at a similar rate as tables. So we'd double the traffic on identifiers. This alone actually would make alternative C below more attractive.
I'm going to file an issue about this. I wouldn't want these spurious empty ranges to be a reason to prefer one of these alternatives over another. I believe it shouldn't be too difficult to fix. For the purposes of this RFC, I think it should be a minor consideration (though it is great that you pointed it out).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
I'm going to file an issue about this.
Note that the issue here is independent of SchemaDescriptors
. Simply allocating IDs from the same namespace as table IDs and database IDs (as is being suggested in this alternative) will cause the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm going to file an issue about this.
Note that the issue here is independent of
SchemaDescriptors
. Simply allocating IDs from the same namespace as table IDs and database IDs (as is being suggested in this alternative) will cause the problem.
issue about databases having empty ranges: #24847
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 122 at r1 (raw file):
- using a simple 1-part name, e.g. `users`, both the "current database" (`SET database` / `USER`) and the *current schema* (`SET
s/USER/USE/
docs/RFCS/20181003_schemas.md, line 224 at r1 (raw file):
inherit directly from database to table even with schemas defined. The ability to customize zones per schema would be a feature that
I think this is worth doing at the same time that we introduce user-defined schemas.
docs/RFCS/20181003_schemas.md, line 362 at r1 (raw file):
latency. ### Alternative B: schemas with descriptors
I prefer option B.
I dislike option A because it creates such a difference between databases (which have descriptors) and schemas (which have some other to-be-determined place to store the things that are currently in the DB descriptor). Option A would be more appealing to me if we planned to move away from database descriptors too and make them use the same system as for schemas.
Option C is acceptable, but it feels hackier to me than option A or B. I think the concerns about database descriptor leases are more severe for option C (for option B, operations on schemas would not be substantially more common or more complex than operations on databases are today, so it doesn't make things much worse, but in option C the kinds of changes we'd make to database descriptors become more complex). Vivek's schema leases should take care of this no matter which alternative we choose here, though.
that's not quite corrects -- we've found that test frameworks create/destroy schemas at a similar rate as tables, not databases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 31 at r1 (raw file):
A. compatibility with PostgreSQL -- ORMs and client apps expect this to work. B. simplify the management of multi-user or multi-app clusters.
I'm not convinced that B is important at this stage. Is A a priority?
docs/RFCS/20181003_schemas.md, line 356 at r1 (raw file):
(for example, privileges and grants for creating new tables) would be stored in a separate system table. (to be defined by further experimentation)
i think privileges should be kept out of descriptors. See #2939
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 31 at r1 (raw file):
Previously, vivekmenezes wrote…
I'm not convinced that B is important at this stage. Is A a priority?
Robert has pressed for A, it is found to be a blocker in the sales pipeline
Lakshmi has pressed for B, it is found to be demand from managed users
The motivation for this work (and the fact it was put on the 2.2 roadmap) comes from them. We can ask Andy Woods for clarification.
docs/RFCS/20181003_schemas.md, line 356 at r1 (raw file):
Previously, vivekmenezes wrote…
i think privileges should be kept out of descriptors. See #2939
I tend to agree -- although that seems to be an orthogonal concern to this RFC. As long as table/db descs are storing descriptors, any new schema descriptors can do the same; only when we actively work to remove privs from db/table descs, should we remove them from schema descs too. The design for the two things ought to be consistent.
I will reword this to express this more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Previously, vilterp (Pete Vilter) wrote…
issue about databases having empty ranges: #24847
FYI, databases and views will no longer create empty ranges. So the "allocation of ranges" part here is now FUD. 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
FYI, databases and views will no longer create empty ranges. So the "allocation of ranges" part here is now FUD. 😃
Saw that. Nice work @ridwanmsharif!
8628ec7
to
4f558b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
docs/RFCS/20181003_schemas.md, line 122 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
s/USER/USE/
Done.
docs/RFCS/20181003_schemas.md, line 224 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I think this is worth doing at the same time that we introduce user-defined schemas.
Done.
docs/RFCS/20181003_schemas.md, line 280 at r1 (raw file):
Previously, vilterp (Pete Vilter) wrote…
Saw that. Nice work @ridwanmsharif!
Done.
docs/RFCS/20181003_schemas.md, line 356 at r1 (raw file):
Previously, knz (kena) wrote…
I tend to agree -- although that seems to be an orthogonal concern to this RFC. As long as table/db descs are storing descriptors, any new schema descriptors can do the same; only when we actively work to remove privs from db/table descs, should we remove them from schema descs too. The design for the two things ought to be consistent.
I will reword this to express this more clearly.
Done.
docs/RFCS/20181003_schemas.md, line 362 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I prefer option B.
I dislike option A because it creates such a difference between databases (which have descriptors) and schemas (which have some other to-be-determined place to store the things that are currently in the DB descriptor). Option A would be more appealing to me if we planned to move away from database descriptors too and make them use the same system as for schemas.
Option C is acceptable, but it feels hackier to me than option A or B. I think the concerns about database descriptor leases are more severe for option C (for option B, operations on schemas would not be substantially more common or more complex than operations on databases are today, so it doesn't make things much worse, but in option C the kinds of changes we'd make to database descriptors become more complex). Vivek's schema leases should take care of this no matter which alternative we choose here, though.
I have addded your reaction to the rationale section below.
4f558b8
to
9e80d86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed/studied this with @bobvauwter today. Came up with new alternative D; also found pitfalls due to mixed-version clusters with the other alternatives and the new one. Came up with a test scenario to implement as foundation for the evaluation of the various alternatives.
Reviewable status: complete! 0 of 0 LGTMs obtained
Waiting for this one to go live. Any ETA? |
No earlier than 19.2
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
@Disturbing, out of curiosity, what's your use case? In CockroachDB, there's little difference between a |
Im working on using hasura's graphql engine @ global scale. Last mentioned, there's an error here: hasura/graphql-engine#411 (comment) I can ask their team to change the setup if you have a recommendation. I personally haven't ran the startup just yet, but I should be able to find some time to repo this error if it still exists or not. |
@Disturbing my recommendation would be to use databases instead of schemas and see whether that satisfies your use case. |
9e80d86
to
1128801
Compare
Updated RFC with new version numbers, also minor simplification informed by #41482 |
hey @knz is this still up to date given the temp table work? i'm not sure what the follow up is given that schemas have an furthermore, there is a performance issue we need to solve with table resolution given #44802 |
I like B best, but as you note, we'll need a story around caching and retrieval. I would like to move databases and schemas to be leased resources like tables. There are lots of details to be worked out but in my mind: A database version will change when adding, deleting, or renaming a child. Same for a schema. We'll also need a migration to move the tables to a public schema. There's lots of details to work out here. This is all part of planning work for the upcoming sql schema team. |
This draft RFC has not been updated since Arul chose his scheme for the new system.namespace. Now that this choice is made, some of the discussion in the RFC can be collapsed. is this a discussion we want to pick back up now? |
It looks like user defined schemas will be an item for the sql features team in 20.2, so its time for us to start thinking about this again. |
Yep, also on the schema teams plate. The tentative breakdown is that name resolution and the associated leasing falls on the schema team. Let's make sure to coordinate. |
For anyone following this issue, there's an updated RFC for user-defined schemas here: #48276 |
Release note: None
1128801
to
88e35a3
Compare
At knz's suggestion I am marking this version of the RFC as "superseded" and merging it. bors r+ |
❌ The GitHub CI (Cockroach) build has failed on 88e35a37. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Build succeeded |
48276: rfcs: user-defined schemas r=lucy-zhang a=lucy-zhang #30916 is @knz's original RFC for user-defined schemas. I created a mostly-new one because our support for schemas has improved significantly over the last few releases (in part motivated by temp tables), and now many of the open questions in the previous RFC have been resolved. This new version attempts to pinpoint the challenges we have yet to deal with. A few sections of this are taken more or less directly from the previous RFC, where applicable. Closes #47148. Release note: None Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Informs #30916.
Release note: None