Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change postgres get_catalog to not use information_schema #1540

Merged
merged 4 commits into from
Aug 4, 2019
Merged

Change postgres get_catalog to not use information_schema #1540

merged 4 commits into from
Aug 4, 2019

Conversation

elexisvenator
Copy link
Contributor

  • information_schema in Postgres is not very performant due to the complex views used to create it
  • use underlying pg_catalog tables/views instead
  • returns the same rows/columns as the information_schema version
  • order of rows is different, this is because there was only a partial sort on the information_schema version
  • column_type will return different values to before
    • some arrays were ARRAY, will now be {type}[]
    • user-defined types were previously USER_DEFINED, now will be the name of the user-defined type
      ^ main point of this PR
  • performance is 2-5x faster, depending on query caching

 - `information_schema` in Postgres is not very performant due to the complex views used to create it
 - use underlying `pg_catalog` tables/views instead
 - returns the same rows/columns as the `information_schema` version
 - order of rows is different, this is because there was only a partial sort on the `information_schema` version
 - `column_type` will return different values to before
   - some arrays were `ARRAY`, will now be `type[]`
   - user-defined types were previously `USER_DEFINED`, now will be the name of the user-defined type <-- main point of this PR
 - performance is 2-5x faster, depending on query caching
@elexisvenator
Copy link
Contributor Author

Docs with custom data types :D

image

@elexisvenator
Copy link
Contributor Author

No sure what is failing in tests, looks like a date formatting issue? Doesn't seem related to the code I touched.

@drewbanin
Copy link
Contributor

Thanks for the PR @elexisvenator - this looks really great! Don't worry about the integration tests - we'll kick those off for you.

I'd need to do some searching here, but I recall reading that different tables/views are present in pg_tables vs information_schema vs other system tables depending on the permissions that a given user has. Do you know anything about that? Is it something we'd need to consider here?

I'd like to play around with this one a little bit, but this looks very, very good :). Thanks again!

@elexisvenator
Copy link
Contributor Author

elexisvenator commented Jun 15, 2019

It's interesting to check the where clauses of the other views to see how they work. For comparison, here is information_schema.tables:

  WHERE ((pg_class.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"])) 
    AND (NOT pg_is_other_temp_schema(npg_class.oid)) 
    AND (   pg_has_role(pg_class.relowner, 'USAGE'::text) 
         OR has_table_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) 
         OR has_any_column_privilege(pg_class.oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)));

and here is the pg_catalog ones:

-- pg_catalog.pg_tables
  WHERE (pg_class.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));
-- pg_catalog.pg_views
  WHERE (pg_class.relkind = 'v'::"char");

So information_schema includes foreign tables, but then also does a bunch of privilege checks that the pg_catalog views dont. a relkind of p apparently means partitioned table but that wasn't actually documented in the postgres docs :/.

Happy to go back and add the permission checks and tweaks

@drewbanin
Copy link
Contributor

I spent some time playing around with this and noticed that there are columns in the response that have names like ........pg.dropped.6......... Have you seen anything like that in your experiments? Looks like we can filter on col.attisdropped, but I'd be curious to know if there are any other subtle differences like this that might pop up

@drewbanin
Copy link
Contributor

@elexisvenator I'd love to target this for a patch release like 0.14.1 -- is this something you think you'll have time to iterate on? I think it's very, very close, and @beckjake and I are big fans of your implementation :)

@elexisvenator
Copy link
Contributor Author

Sure, ill tweak it on the weekend :)

I suspect the pg.dropped columns are getting cleaned up by the privilege checks, will experiment further

@elexisvenator
Copy link
Contributor Author

So ran some more tests, these changes should stop the deleted columns appearing.

The one big thing that the information_schema views do that I left out is the table and column permission checks. They seem to be intended for a system where multiple users create their own isolated set of tables or even columns within the same database. This isn't really the type of scenario that dbt would be used in, and given that dbt ignores results for any schemas/tables that dont belong to things that dbt creates I dont see (the unlikely chance of) any extra results due to no permission checks being a problem.

@drewbanin
Copy link
Contributor

Thanks @elexisvenator - I buy all of this. You're right - dbt will match up the relations it finds in the catalog with the models/sources/seeds/etc that exist in the dbt project, so I don't think visibility into the existence of objects that a user can't access should be a problem.

Let me kick off the tests here, then we should be good to merge this. We'll probably want to change the target branch - let me chat with @beckjake about that one.

@drewbanin drewbanin changed the base branch from dev/wilt-chamberlain to dev/0.14.1 July 2, 2019 18:13
v = view
r, f, p = all are different forms of table
@cmcarthur cmcarthur requested a review from drewbanin August 1, 2019 15:05
@drewbanin
Copy link
Contributor

@elexisvenator I lost track of this one! Kicked off the tests here. When they pass, this will be ready to roll for (hopefully!) 0.14.1 :)

@drewbanin
Copy link
Contributor

Merging this for dbt v0.14.1 -- thanks for your contribution (and your patience :p) @elexisvenator!

@drewbanin drewbanin added this to the 0.14.1 milestone Aug 4, 2019
@drewbanin drewbanin merged commit b2f2e69 into dbt-labs:dev/0.14.1 Aug 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants