-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: make view descriptors depend on table/columns by IDs #15388
Conversation
Great stuff! How would One thing I don't quite understand is: if I'm understanding the change correctly, we look up the current table and column names when we print out the expression. Why wouldn't showing that (without the numeric references) be correct? In your example, suppose column "v" was renamed to "w", we we would display it with the new column name "w". Review status: 0 of 16 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. pkg/sql/testdata/logic_test/views, line 504 at r1 (raw file):
Can we have some tests that rename tables/columns and then do SHOW CREATE VIEW again? Comments from Reviewable |
Renames are not yet activated, as per the commit message (I haven't implemented the necessary migration yet). However, hypothetically:
The "AS kv" clause is embedded into the view descriptor, for good reason: every potential other reference to the names "kv", "k" or "v" in the rest of the query cannot be (easily) rewritten. To answer your other points:
No that's not correct; the names are embedded alongside the IDs during view creation. The code for
That's because we do not have valid SQL syntax to display a non-numeric-reference table where the set of columns is different. Again the example from initially (hypothetically):
What do you expect to see?
Maybe you're thinking about constructing an ad-hoc subquery |
I see. My thinking was that we could just replace the IDs with the current names, and we can do that even if we have to keep the This projection is only useful if we want to leave any The ad-hoc subquery to do the projection would work too I guess, as long as we only include it when a projection is necessary. There wouldn't be any round-tripping issues unless there's a schema change in-between. |
Review status: 0 of 16 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. pkg/sql/data_source.go, line 443 at r2 (raw file):
is an Comments from Reviewable |
In a regular AS clause ( I will look into doing the other thing you mentioned earlier. |
Thanks for doing this, @knz! Let me know if you want a full review from me, otherwise I'm going to trust that @RaduBerinde found everything I would have and more. Review status: 0 of 16 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. Comments from Reviewable |
Bumping to 1.1, too much cleanup to do here for today |
I'm kinda bummed about this PR, there's an unforeseen hurdle: when restoring a backup that was made prior to this change, the descriptor must be rewritten but it's not easy to handle this in the restore context. A good way forward would be to introduce a new FormatVersion value for table descriptors, which is one beyond the current one, and indicates the descriptor is using structural dependencies. This way the backup code can mostly ignore the problem. (It also solves the other problem which this PR had from the beginning: the lack of a migration story that would enable ALTER RENAME.) However doing so requires introducing a version upgrade path that doesn't exist currently: one that looks up other descriptors in the process of updating the current one. As the descriptor upgrade can occur in the lease code, I am very concerned that there's some kind of care to be taken that the table descriptors that will be used during the rename are "atomically considered" with the lease being taken on the upgraded descriptor. I do not know how to do this. @jordanlewis do you think we could sit together at some point and look at this together? I believe you are more familiar with this code. |
b2377fe
to
90370ee
Compare
@cuongdo I would like to ressucitate this PR, but I fear it needs concerted attention from people comfortable with both the lease code, descriptor code and perhaps backup code. |
ffa0c99
to
28baa3c
Compare
f589bc1
to
efa474d
Compare
bcf0cf5
to
3cacc5e
Compare
An update on this PR:
|
Some tests failing due to base changes. Will also fix next. |
One step forward - the only remaining broken test is because of #17306, which I will fix shortly. |
8af350b
to
d9271e2
Compare
Prior to this patch, view dependency analysis during CREATE VIEW was broken because it would only check dependencies after plan optimization, i.e. possibly after some dependency were lost. For example, with the queries: ```sql CREATE VIEW v AS SELECT k FROM (SELECT k,v FROM kv) -- loses dependency on kv.v CREATE VIEW v AS SELECT k,v FROM kv WHERE FALSE -- loses dependency on kv ``` This patch addresses the issue as follows: - the dependencies are now collected during the initial construction of the query plan, before any optimization are applied. This way we ensure the dependency tracking is complete. - a migration is implemented that will fix any view descriptor and corresponding dependency information that was populated prior to this fix.
This patch does what it says on the label: the descriptor still contains a valid SQL query, but with all table names rewritten to use numeric table references. These are hidden initially, until the referenced tables gets a new schema. For example: ``` CREATE TABLE kv(k INT PRIMARY KEY, v INT); CREATE VIEW vx AS SELECT v AS x FROM kv; SHOW CREATE VIEW vx; +------+-----------------------------------------+ | View | CreateView | +------+-----------------------------------------+ | vx | CREATE VIEW vx AS SELECT v AS x FROM kv | +------+-----------------------------------------+ (1 row) ALTER TABLE kv ADD COLUMN d INT; SHOW CREATE VIEW vx; +------+--------------------------------------------------------------+ | View | CreateView | +------+--------------------------------------------------------------+ | vx | CREATE VIEW vx AS SELECT v AS x FROM [64(1, 2) AS kv (k, v)] | +------+--------------------------------------------------------------+ (1 row) ``` A side effect of this patch is that view definitions can now use star expansion, i.e. `CREATE VIEW kv_alias AS SELECT * FROM kv` is now possible and valid. (And there was much rejoicing.) Another side effect, visible in the example above, is that `SHOW CREATE VIEW` now reveals the structural dependency (showing both the ID and the original table name) when the table's name or column definitions change. This may or may not be desirable from a UX perspective, however anyone wishing to improve upon this will take note that if the column list in the table descriptor was altered after the view was created (e.g. to add new columns, rename columns or remove columns not depended on by the view), there is no valid way to print out the view definition using valid SQL but without using the table reference syntax. Consider the example above: suppose column "v" was renamed to "w"; trying to print out as `create view vx as select v as x from kv` would be invalid because then column "v" would have disappeared; then suppose "v" was renamed to "w" and a new unrelated column "v" was added, trying `create view vx as select v as x from kv as kv(v)` would be invalid as well because the new column `v` is unrelated to the one the view was intended to depend on. Due to these obstacles, it is advisable to let the numeric table reference show up in the output of `SHOW CREATE VIEW` and document this behavior. Although it would be correct to do so for newly-created clusters, this patch does *not* lift the restriction on ALTER that prevents it from renaming tables or columns depended on by views. This change would be correct on new clusters because once the guarantee is enforced that all view descriptors depend on table/columns by IDs, renaming becomes possible (renaming preserves the ID); however, for previously created clusters there may exist already some view descriptors containing by-name table references, and allowing those to be renamed would be unsound. The proper approach is to completement this patch by a cluster migration which rewrites all existing view descriptors. This work is left to a subsequent commit. A last improvement brought by this PR is that column renames are now properly visible in pg_catalog: ```sql > CREATE VIEW v (x) AS SELECT k FROM kv > SELECT definition FROM pg_catalog.pg_views WHERE viewname = 'v' SELECT k AS x FROM (SELECT k FROM kv) ```
cc @danhhz can you check I didn't do anything foolish in |
I only looked at restore.go. Left one comment Review status: 0 of 44 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/ccl/sqlccl/restore.go, line 311 at r4 (raw file):
since we error in the second branch, this is equivalent to In the event that you strongly disagree and want to keep it like this, then please rename This all also really wants a backup/restore test, but I'm fine leaving that for a followup given how big this PR is already Comments from Reviewable |
@RaduBerinde @jordanlewis I am considering closing this PR in favor of another solution I discovered this week: instead of rewriting the view query as done in this PR, instead,
The advantage is that the view query would stay identical which would improve UX when looking at the view query after new columns have been added to the underlying views/tables. What do you think? |
I agree that would better from a UX perspective. I don't know if there are any subtleties involved implementation-wise, I'll let @jordanlewis chime in. |
Closing in favor of #17501. |
(Needed for #12611 and #13968.)
This patch does what it says on the label: the descriptor still
contains a valid SQL query, but with all table names rewritten to use
numeric table references. These are hidden initially, until
the referenced tables gets a new schema.
For example:
A side effect of this patch is that view definitions can now use star
expansion, i.e.
CREATE VIEW kv_alias AS SELECT * FROM kv
is nowpossible and valid. (And there was much rejoicing.)
Another side effect, visible in the example above, is that
SHOW CREATE VIEW
now reveals the structural dependency (showing both theID and the original table name) when the table's name or column
definitions change.
This may or may not be desirable from a UX perspective, however anyone
wishing to improve upon this will take note that if the column list in
the table descriptor was altered after the view was created (e.g. to
add new columns, rename columns or remove columns not depended on by
the view), there is no valid way to print out the view definition
using valid SQL but without using the table reference syntax. Consider
the example above: suppose column "v" was renamed to "w"; trying to
print out as
create view vx as select v as x from kv
would beinvalid because then column "v" would have disappeared; then suppose
"v" was renamed to "w" and a new unrelated column "v" was added,
trying
create view vx as select v as x from kv as kv(v)
would beinvalid as well because the new column
v
is unrelated to the one theview was intended to depend on. Due to these obstacles, it is
advisable to let the numeric table reference show up in the output of
SHOW CREATE VIEW
and document this behavior.Although it would be correct to do so for newly-created clusters, this
patch does not lift the restriction on ALTER that prevents it from
renaming tables or columns depended on by views. This change would be
correct on new clusters because once the guarantee is enforced that
all view descriptors depend on table/columns by IDs, renaming becomes
possible (renaming preserves the ID); however, for previously created
clusters there may exist already some view descriptors containing
by-name table references, and allowing those to be renamed would be
unsound. The proper approach is to completement this patch by a
cluster migration which rewrites all existing view descriptors. This
work is left to a subsequent commit.
A last improvement brought by this PR is that column renames are now
properly visible in pg_catalog:
cc @a-robinson