Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Postgres enums #580

Closed
wants to merge 9 commits into from

Conversation

dstu
Copy link
Contributor

@dstu dstu commented Jan 15, 2017

This PR provides the infer_enums! macro, which creates Rust enums for each enum discovered in a database, and extends infer_schema! so that you can name columns .

See #343 for discussion about how this has been developed up until now.

Items to be addressed before this is ready for merge:

  • Correctly implement all necessary traits for newly declared types.
  • Clean up changes to infer_schema! and related macros. (Argument-passing to indicate where to look up non-builtin types is rather unslightly.)
  • Add more Rust documentation describing how the new features work.
  • Fix migrations and tests so that tests pass on SQLite again. (SQLite does not support custom enum types the way Postgres does.)
  • Get things working within custom schemas.
  • Implement and test arrays of custom types.

dstu added 8 commits January 14, 2017 04:00
…ostgres

enums.

No actual encoding/decoding is done yet, but codegen compiles and looks
reasonably okay. Testing is lacking, so the code is far from guaranteed from
being correct at this point.
Encoding/decoding of actual enum values is not yet supported. (impls are
suggested but just call unimplemented!). Enum-valued table columns is also yet
not supported.
New impls of HasSqlType cannot be defined for types inside the diesel crate, so
we can't do something like:

impl HasSqlType<NewType> for diesel::pg::Pg { ... }

This commit adds the trait ProvidesSqlType, which can be implemented for new
types, and a blanket impl so that any type with a suitable impl of
ProvidesSqlType gets a corresponding impl of HasSqlType. So new types can be
added with an impl declaration like:

impl ProvidesSqlType<diesel::pg::Pg> for NewType { ... }
diesel_codegen/Cargo.toml: add dependency on syntex_syntax in an attempt at
getting access to the "self" keyword for qualifying symbol lookups. May not be
the right way to go.

diesel_codegen/src/lib.rs: raise macro recursion limit to
enable builds with inferred enum types. Pull in syntex_syntax.

diesel_codegen/src/schema_inference.rs: propagate information about types being
builtin or derived from the DB schema. Some fuzziness here about how exactly to
do local name lookups (WRT "self" and not the crate root).

diesel_codegen/src/type_inference.rs: move some utility functions to

diesel_codegen_shared/src/schema_inference/pg.rs. Update enum code generation to
try to implement with all necessary interfaces.

diesel_codegen_shared/src/lib.rs: add conditional dependency on itertools.

diesel_codegen_shared/src/schema_inference/data_structures.rs: track whether a
column type is a builtin or custom.

diesel_codegen_shared/src/schema_inference/mod.rs: conditional dependency on
itertools, export utility functions.

diesel_codegen_shared/src/schema_inference/pg.rs: move utility functions
here. Add hacky heuristic for detecting if a type is built-in.

diesel_codegen_shared/src/schema_inference/sqlite.rs: mark all SQLite types as builtin.

diesel_tests/tests/associations.rs: add values for new enum type columns.

diesel_tests/tests/boxed_queries.rs: add values for new enum type columns.

diesel_tests/tests/errors.rs: add values for new enum type columns.

diesel_tests/tests/expressions/mod.rs: add values for new enum type columns.

diesel_tests/tests/filter.rs: add values for new enum type columns.

diesel_tests/tests/filter_operators.rs: add values for new enum type columns.

diesel_tests/tests/find.rs: check values of new enum type columns.

diesel_tests/tests/insert.rs: insert values for new enum type columns.

diesel_tests/tests/joins.rs: check values of new enum type columns.

diesel_tests/tests/macros.rs: add values for new enum type columns.

diesel_tests/tests/order.rs: add values for new enum type columns.

diesel_tests/tests/schema.rs: add a call to infer_enums! so that we get our new
enum types. Unfortunately, it's still necessary to add an additional "pub use
self::blah" because I have not yet figured out how to import the new enum types
from their captor module automagically.

diesel_tests/tests/select.rs: select values for new enum type columns.

diesel_tests/tests/update.rs: update values for new enum type columns. It might
be a good idea to try to add a test updating an enum typed column.

migrations/postgresql/20161224024421_add_enum_columns/{up,down}.sql: add a
column for a new enum type. Still don't yet have things working in custom
Postgres schemas yet.
This feels like a bit of a hack, since it side-steps the type resolution issue
and continues to rely on a hard-coded heuristic for figuring out which types are
declared where, but it seems reasonable. Users of the library can declare or
"pub use" anything they need in the namespace they're working
with. This leaves types in schemas out in the cold, unless we're willing to
assume some sort of convention of parallel structure between Rust namespaces in
the extra types module and DB schemas.

diesel_codegen/src/lib.rs: bring up to date with changes from master.
diesel_codegen/src/schema_inference.rs: Thread extra types module through.
diesel_codegen/src/type_inference.rs: bring up to date with changes from master.
diesel_codegen_shared/src/schema_inference/mod.rs: Thread extra types module through.
diesel_codegen_shared/src/schema_inference/pg.rs: Thread extra types module
through and resolve non-builtin types in that namespace.
diesel_tests/tests/lib.rs: new module for automagic enum decls.
diesel_tests/tests/schema.rs: pass new macro parameter for the extra types module
@dstu dstu changed the title Create enums Support for Postgres enums Jan 15, 2017
@killercup
Copy link
Member

Nice! I saw your branch a while ago, and am glad to see this PR. This is quite a lot of code! I'll try to read through it this evening, or tomorrow afternoon. Feel free to bug me if I forget about it though :)

Some record of the original variant name is necessary so that SQL generation
uses the right literals.
Copy link
Member

@killercup killercup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew, this is a lot of new code. That's amazing work!

I read through it, but probably not as in-depth as I should have (esp. the type_inference module). You seem to be working on adding even more things anyway :)

Before this reaches the magic 1000 lines mark: Do you think it'd make sense to land this in several batches? E.g., adding support for custom schemas and enums in PG array can probably be added in future PRs.


fn infer_enums_for_schema_name(database_url: &str, schema_name: Option<&str>,
type_list: Option<&str>, camel_case_types: bool,
camel_case_variants: bool) -> quote::Tokens {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is rustfmt? Can you try to match the fn declaration style used in e.g. column_def_tokens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting is mostly a combination of what rust-mode does in Emacs and my attempts at respecting cosmetic issues like line length as per a hazy recollection of the recommendations aturon compiled. If you prefer, I'll run rustfmt over things before declaring this ready for a merge.

infer_table_from_schema!(#database_url, #table_name);
}
pub use self::#mod_ident::*;
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor this to let macro_call = if let Some(type_module) = extra_types_module { quote! { infer_table_from_schema!(extra_types_module=#ype_module, #database_url, #table_name); } } else { quote! { infer_table_from_schema!(#database_url, #table_name); } } so the mod and pub use doesn't need to be repeated?

@@ -1,4 +1,5 @@
use syn;
// use syntex_syntax::symbol::keywords;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place where you use syntex right? Please please please remove this dependency if it's not going to be used, it takes ages to compile. (And if we need something that syn doesn't have in this regard, I'd gladly make a PR to syn!)

Copy link
Contributor Author

@dstu dstu Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was trying to use syntex to work around some issues but don't need it right now. I'll kill the dependency if it's not needed (which it probably won't be) before any merge.

@@ -74,19 +88,51 @@ mod information_schema {
}
}

pub fn determine_column_type(attr: &ColumnInformation) -> Result<ColumnType, Box<Error>> {
pub fn camel_cased(snake_case: &str) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Finally! PascalCase! Errrr, I mean, "upper camel case". (On a more serious note, I think this fn should be called snake_to_type_case.)

Please note that all other case conversions we do are trivial conversions by choice, to not introduce surprisingly complex behavior.

The snake case version we get here are valid Postgres identifiers, right? What properties do they have in comparison to Rusts type identifiers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK on changing the method name.

Right now, this PR converts a Postgres type name or variant foo_bar into FooBar on the Rust side. There are bool arguments to the macro expansion methods to control whether this happens, but they aren't currently used because there is no macro-level interface for toggling them.

I think it makes sense to map to Rust-style naming, but I am willing to be convinced otherwise. If any name manging could happen, clearly documenting default behavior and providing an option to turn it off are also necessary.

#[test]
fn camel_cased_i18n() {
assert_eq!("Außerdem", camel_cased("außerdem"));
assert_eq!("AuSSerdem", camel_cased("au_ßerdem"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N̵͡i̛c͘e ̀e͜d̨͘͢g͜͡e͠͞ ̴̛c̶̨͞a͢͜s͞ȩ̵s.͟͠

Can enum variant names contain umlauts, though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Rust, it's an open issue.

In Postgres, it appears perfectly valid to have both types and variants contain umlauts. I'll have to dig deeper into the documentation to see if there are any surprises with regards to other unusual characters.

It seems like eventual convergence (arbitrary whitespace-free Unicode allowed) is an eventual goal, but it would probably be better to check explicitly that we're generating a legal type name and die with a reasonable explanation rather than letting rustc barf.

/// schema name.
///
/// This macro can only be used in combination with the `diesel_codegen` or
/// `diesel_codegen_syntex` crates. It will not work on its own.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up: diesel_codegen_syntex will be removed in 0.10

@@ -1,4 +1,5 @@
#![deny(warnings)]
#![recursion_limit = "512"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this needed for? (It's fine, just maybe add a comment what requires this, so we can remove it when it's no longer needed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quote! macro makes use of (potentially deep) recursion, and my large invocations of it are running into the default limit. This number should be tuned to fit whatever this PR's eventual needs are; it may be that we wind up needing something smaller at first.

@@ -249,10 +249,20 @@ pub type VarChar = Text;
#[cfg(feature = "postgres")]
pub use pg::types::sql_types::*;

pub trait ProvidesSqlTypeFor<DB> where DB: TypeMetadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see a doc comment here telling me what this type is all about. I assume it abstracts over structs and enums providing SQL types? Maybe also mention HasSqlType is implemented for everything that implements this trait.

quote = "0.3.10"
diesel = { version = "0.9.0", default-features = false }
diesel_codegen_shared = { version = "0.9.0", default-features = false }

[dependencies.syn]
version = "0.10.3"
features = ["full"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the same inline table syntax as above?

syn = { version = "0.10.3", features = ["full"] }

@@ -10,11 +10,15 @@ repository = "https://github.com/diesel-rs/diesel/tree/master/diesel_codegen"
keywords = ["orm", "database", "postgres", "sql", "codegen"]

[dependencies]
syn = "0.10.3"
syntex_syntax = "0.52.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can not introduce this dependency. Syntex takes ages to compile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I don't think this will be needed.

@dstu
Copy link
Contributor Author

dstu commented Jan 18, 2017

This can be broken up fairly easily; it is quite feasible to implement support simply for generating enum types to match what's in a database.

I'll make changes to the PR that address the comments so far when I have more time this week.

@@ -12,6 +12,22 @@
/// This macro can only be used in combination with the `diesel_codegen` or
/// `diesel_codegen_syntex` crates. It will not work on its own.
macro_rules! infer_schema {
(extra_types_module = $extra_types_module: expr, $database_url: expr) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the solution of the user manually specifying extra_types_module.

From a usability standpoint, it would be nicer to be able to infer_everything! and just get tables, types, and (eventually) views from the database.

From a design standpoint, this is just kicking the problem down the field a short distance. As-is, all peers to SQL types live in diesel::types, and library internals freely assume this. Looking up types that are not on a whitelist on a single user-specified alternate path works for now, but a few problems are readily evident:

  • All DB-derived types are assumed to live in one namespace. In Postgres, you have the potential for multiple namespaces (which naturally arise from each distinct database connection string, or even from schemas nested inside of a database). This could lead to name clashes if, for example, identically named types are declared in different databases or schemas. (It seems like a good solution is to recommend strongly that different invocations of infer_schema! use different extra_types_module settings, but I'd prefer a cleaner design where remembering to do that sort of thing isn't necessary.)
  • The binding between a DB type and a Rust type is implicit (based on type name alone). This loose coupling could lead to bugs whenever there are name clashes.
  • It is quite conceivable that a user will want to use types that come from disparate Rust namespaces, at which point they will have to do a lot of tedious and potentially confusing re-exporting with pub use (so that everything is in one namespace), or we will have to extend this mechanism to support having multiple extra type paths.

This could be solved by creating an intermediate representation of types in a database schema and mapping DB types to Rust types through that. This would require threading some sort of global table state through most of the code generation process, so the effects of adopting this design would be felt throughout a lot of Diesel. (Perhaps simply having a global shared structure would be adequate, since it will only be used at compilation time.)

Having a central, explicit set of data structures that describe types, tables, views, procedures, etc., could be quite useful. I suspect that some sort of compile-time metadata model would be very helpful for validating what is read from the database ("table has a primary key structure we don't support"), generating good error messages ("typename foobaz is not a valid Rust identifier"), debugging/testing (write tests against the intermediate representation code), streamlining code generation (all necessary state is in one obvious place), and error reporting (have richer local state from which to produce meaningful error messages).

This approach could go very wrong by being too top-heavy and prematurely putting straightjacketing Diesel. The project has gotten really far without building a top-down intermediate model of database components, and maybe it's best to keep it that way for now. It would be unpleasant to have to refactor a big metadata layer every time you want to support a new DB feature.

If it seems a good idea, I suggest working towards merging just the enum declaration part of this PR and opening a separate issue to discuss whether a metadata model is necessary or a good idea. That discussion may yield solutions to the problems I've described here which can be used to add DB type support to infer_schema! without having to commit to a great deal more overhead.

@gsollazzo
Copy link

Hi!
When can we expect to see this PR merged in master?
Thanks

@dstu
Copy link
Contributor Author

dstu commented Feb 17, 2017

I apologize for not having been on top of this. I may be able to put some more time into it this weekend. I still need to rebase on top of the latest Diesel release and restrict this PR's scope to consist only of generating types for Postgres enums.

@dstu
Copy link
Contributor Author

dstu commented Mar 13, 2017

After looking at how far diesel has progressed since I wrote this PR, I have concluded that it must be rewritten. The basic approach (query Postgres catalog tables to get enum definitions) is still perfectly valid, but diesel's switch to using SQL schema standard means that the way this is done has to change. I am closing this PR and will move discussion back to #343.

@dstu dstu closed this Mar 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants