Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for insert into tbl1 from select * from tbl2 queries #1106

Closed
lancecarlson opened this issue Aug 15, 2017 · 6 comments · Fixed by #1496
Closed

Support for insert into tbl1 from select * from tbl2 queries #1106

lancecarlson opened this issue Aug 15, 2017 · 6 comments · Fixed by #1496
Labels
Milestone

Comments

@lancecarlson
Copy link
Contributor

Is this something diesel would/should support? (Correct me if Diesel already supports this)

@lancecarlson
Copy link
Contributor Author

Thinking this through, I would imagine the syntax might look like this:

diesel::insert(diesel::select_table).into(diesel::insert_into_table);

I'm not sure that the dsl adds anything over the SQL unless it can generate the columns from the struct because (at least in postgres) you have to cast the select output into a table with an explicit column manifest/schema.

@killercup
Copy link
Member

If the tables have the exact same columns, you could try

diesel::insert(table1::table.select(table1::all_columns)).into(table2::table)

@sgrif
Copy link
Member

sgrif commented Dec 16, 2017

Update: 73778af0 was very much in service of this eventual feature. Any version of this that lands will require explicitly listing the columns. e.g. We will not support INSERT INTO table SELECT ... but we will support INSERT INTO table (col1, col2) SELECT ....

I would love to see ideas on concrete API proposals for this. Right now the best I've got is:

insert_into(table1)
    .from_select(
        (col1, col2, col3),
        table2.select(...).etc,
    )

I think we can do better though. In particular, I don't like the amount of rightward drift there. It reminds me very much of our old on_conflict function from 0.16, which we deprecated in 0.99 for the same ergonomic reasons

@sgrif sgrif added this to the 1.2 milestone Jan 15, 2018
@sgrif
Copy link
Member

sgrif commented Jan 15, 2018

I think I'm going to go with this API:

insert_into(table1)
    .from_select(table2.select(...))
    .into_columns((col1, col2, col3))

The order is backwards from what you'd write in SQL, but it lets us default to table1::all_columns, and any method that allowed you to specify the columns doesn't make sense unless you know that you're inserting from a select statement.

@sgrif
Copy link
Member

sgrif commented Jan 16, 2018

So I prototyped this out, and found that the select statement generally was so long we had the same rightward drift problem. You can work around it by sticking stuff in locals, but then the variable is awkward to name and really "you can work around this crappy API" doesn't make it acceptable. @weiznich had a really good idea in the gitter room though. select_statement.insert_into(table).columns(col_list), which is even more backwards from the SQL, but definitely lets you insert line breaks where it's natural. TBH this is also pretty close to how I read these queries anyway, since the select statement is by far the most important part.

I'm going to play around with it some more, but I think I want to go with that API. The main issue will be making it discoverable. It'll have to end up being a trait that's implemented on BoxedSelectStatement, SelectStatement, and Table rather than an inherent method on IncompleteInsertStatement. I suppose linking to it from the docs, and adding a section to the insert guide will probably be sufficient.

sgrif added a commit that referenced this issue Jan 17, 2018
This is a long overdue refactoring. I'm going to do a similar
refactoring for `InsertValues` as part of #1106, but this shares a lot
of groundwork for that and is much smaller in scope. `Changeset` has
always been entirely redundant with `QueryFragment`, but we had to keep
it as a separate trait for two places where the implementation changed:

- `Eq` doesn't qualify the column name
- tuples skip commas when `is_noop` returned true (for `Changeset` this
  was only the case for `None` or a tuple where all elements were `None`)

The first one is easy to solve, we just return a type other than `Eq` in
`AsChangeset` (there was very little reason to use `Eq` in the first
place. We weren't able to re-use any of its code.)

The second one is a bit trickier, since we need to replicate the
`is_noop` method in `QueryFragment`. I'm fine with doing this, since
`InsertValues` has the same method (and tuples implement `walk_ast`
identically for `InsertValues` and `Changeset`), so it makes sense to
share them.

Originally I added an explicit `noop` method to `AstPass` which we would
call. However, it felt weird that the impl for `()` which literally just
does `Ok(())` didn't call `noop`. At that point I realized I could just
generalize this to "did a method that generates SQL get called?" This
works fine for updates. I'm not 100% sure if it'll work for inserts, but
it's worth a shot.

I should note that the semantics of the new `is_noop` are slightly
different than the one on `InsertValues`, which explicitly states that
an empty array will return `false`. This will make it return `true` when
we switch inserts to use this. Since we already have to explicitly
handle empty arrays long before we start checking `is_noop`, I think
that's fine.
@sgrif
Copy link
Member

sgrif commented Jan 18, 2018

I ended up generalizing the API. The only API that is being explicitly added for this feature is the into_columns method, which only works for insert from select. Otherwise, passing a select statement to values will now "just work", and any query that can be written as insert_into(table).values(values) can instead be written as values.insert_into(table)

sgrif added a commit that referenced this issue Jan 18, 2018
This feature has been in the works for a very long time, and has a lot
of context... I've added headers so if you already know about the
iteration of the API and the evolution of `InsertStatement` internally,
skip to the third section.

Getting to this API
===

I'd like to give a bit of context on the APIs that have been considered,
and how I landed on this one.

To preface, all of the iterations around this have been trying to
carefully balance three things:

- Easy to discover in the API
- Being syntactically close to the generated SQL
- Avoiding rightward drift

For most of Diesel's life, our API was `insert(values).into(table)`.
That API was originally introduced in 0.2 "to mirror `update` and
`delete` (it didn't mirror them. It was backwards. It's always been
backwards).

My main concern with the old API actually was related to this feature.
I couldn't come up with a decent API that had you specify the column
list (you basically always need to specify the column list for this
feature).

So in 0.99 we changed it to what we have now, and I had toyed around
with `insert_into(table).columns(columns).values(select)`, as well as
`insert_into(table).from_select(columns, select)`. I was leaning towards
the second one for a while (I didn't realize at the time that it was
exactly SQLAlchemy's API). I hated the `columns` method because it was
unclear what it was doing until you saw you were inserting from a select
statement. It also implied an interaction with tuples that didn't exist.

However, another thing that happened in 0.99 was that we deprecated our
old upsert API. The `from_select` form reminded me far too much of the
old `on_conflict` API, which we had just removed. In practice what that
API would give you was something like this:

```rust
insert_into(posts::table)
    .from_select(
        (posts::user_id, posts::title),
        users::table
            .select((
                users::id,
                users::name.concat("'s First Post"),
            )),
    )
```

That's just far too much rightward drift for me. Yes, you can assign the
args to local variables, but they're awkward to name and now your code
reads weird. I thought moving the columns list to another method call
would help, but it doesn't.

```rust
insert_into(posts::table)
    .from_select(
        users::table
            .select((
                users::id,
                users::name.concat("'s First Post"),
            )),
    )
    .into_columns((posts::user_id, posts::title))
```

Eventually a member of the Diesel team had the idea of making this an
`insert_into` method on select statements. This solves all of my
concerns around rightward drift, though at the cost of the other two
priorities. The API now looked like this:

```
users::table
    .select((
        users::id,
        users::name.concat("'s First Post"),
    ))
    .insert_into(posts::table)
    .into_columns((posts::user_id, posts::title))
```

I liked the way the code flowed, but I had concerns around
discoverability, and didn't like how backwards it was from SQL. But I
could live with it and started implementing.

Up until this point I had been assuming that we would have an
`InsertFromSelectStatement` struct, which was distinct from
`InsertStatement` and necessitated the differently named methods. I
realized when I started digging into it though, that we really just want
to re-use `InsertStatement` for this. It seems obvious in hindsight.

And if we were going to use that structure, that meant that it wouldn't
be much harder to just make passing `SelectStatement` to `values` work.
This automatically solves most of my concerns around discoverability,
since it now just works exactly like every other form of insert.

That said, I really don't like the rightward drift. I liked the
`.insert_into` form for being able to avoid that. But for the final
implementation, I just generalized that feature. *Anything* that can be
written as `insert_into(table).values(values)` can now be written as
`values.insert_into(table)`.

Context around InsertStatement
===

This file has churned more than just about any other part of Diesel. I
feel like I've re-written it nearly every release at this point. I think
the reason it's churned so much is for two reasons. The first is that
it's kept a fundamental design flaw through 1.1 (which I'll get to), and
we've been constantly working around it. The second is that `INSERT` is
actually one of the most complex queries in SQL. It has less variations
than `SELECT`, but far more of its variations are backend specific, or
have different syntaxes between backends.

`InsertStatement` was originally added for 0.2 in c9894b3 which has a
very helpful commit message "WIP" (good job past Sean). At the time we
only supported PG, so we actually had two versions -- `InsertStatement`
and `InsertQuery`, the latter having a returning clause. I'm honestly
not sure why I didn't do the `ReturningClause<T>` `NoReturningClause`
dance we do now and did back then in `SelectStatement`. Maybe I thought
it'd be easier?

Anyway this file had to go through some pretty major changes in 0.5 when
we added SQLite support. We needed to disallow batch insert and
returning clauses on that backend. Additionally, the way we handle
default values had to work differently on SQLite since it doesn't
support the `DEFAULT` keyword.

At this point though, it still a few things. The query starts with
`INSERT INTO`, had a columns list, then the word `VALUES` and then some
values. It also managed all parenthesis. (Yes it was trivial to make it
generate invalid SQL at that point).

Fast forward to 0.9, I decided to support batch insert on SQLite. This
needs to do one query per row, and I felt that meant we needed a
separate struct. I didn't want to have `BatchInsertStatement` and
`BatchInsertQuery` and `InsertStatement` and `InsertQuery`, so we got
the `NoReturningClause` struct, and got a little closer to how literally
every other part of Diesel works.

In 0.14 we added `insert_default_values` which left us with
`InsertStatement`, `BatchInsertStatement`, and `DefaultInsertStatement`,
which eventually went back down to the two.

The last time the file went through a big rewrite was in 0.99 when I
finally unified it down to the one struct we have today.

However, it still had the fatal flaw I mentioned earlier. It was trying
to handle too much logic, and was too disjoint from the rest of the
query builder. It assumed that the two forms were `DEFAULT VALUES` or
`(columns) VALUES ...`, and also handled parens.

We've gone through a lot of refactoring to get rid of that. I finally
think this struct is at a point where it will stop churning, mostly
because it looks like the rest of Diesel now. It doesn't do anything at
all, and the only thing it assumes is that the word `INTO` appears in
the query (which I'm pretty sure actually is true).

The actual implementation of this commit
====

With all that said, this commit is relatively simple. The main addition
is the `InsertFromSelect` struct. It's definitely a Diesel struct, in
that it basically does nothing, and all the logic is in its `where`
clauses.

I tried to make `Insertable` work roughly everywhere that methods like
`filter` work. I couldn't actually do a blanket impl for tables in Rust
itself, because it made rustc vomit on recursion with our impls on
`Option`. That shouldn't be happening, but I suspect it's a lot of work
to fix that in the language.

I've also implemented it for references, since most of the time that you
pass values to `.values`, you pass a reference. That should "just work"
here unless we have a good reason not to.

The majority of the additions here are tests. This is a feature that
fundamentally interacts with many of our most complex features, and I've
tried to be exhaustive. Theres ~3 lines of tests for every line of code
added. I've done at least a minimal test of this feature's interaction
with every other feature on inserts that I could think of. I am not so
much worried about whether they work, but I was worried about if there
was a syntactic edge case I didn't know about. There weren't. Same with
the compile tests, I've tried to think of every way the feature could be
accidentally misused (bad arguments to `into_columns` basically), as
well as all the nonsense things we should make sure don't work (putting
it in a tuple or vec).

I didn't add any compile tests on making sure that the select statement
itself is valid. The fact that the `Query` bound only matches valid
complete select statements is already very well tested, and we don't
need to re-test that here. I also did not explicitly disallow selecting
from the same table as the insert, as this appears to be one of the few
places where the table can appear twice with no ambiguity.

Fixes #1106
sgrif added a commit that referenced this issue Jan 20, 2018
This feature has been in the works for a very long time, and has a lot
of context... I've added headers so if you already know about the
iteration of the API and the evolution of `InsertStatement` internally,
skip to the third section.

Getting to this API
===

I'd like to give a bit of context on the APIs that have been considered,
and how I landed on this one.

To preface, all of the iterations around this have been trying to
carefully balance three things:

- Easy to discover in the API
- Being syntactically close to the generated SQL
- Avoiding rightward drift

For most of Diesel's life, our API was `insert(values).into(table)`.
That API was originally introduced in 0.2 "to mirror `update` and
`delete` (it didn't mirror them. It was backwards. It's always been
backwards).

My main concern with the old API actually was related to this feature.
I couldn't come up with a decent API that had you specify the column
list (you basically always need to specify the column list for this
feature).

So in 0.99 we changed it to what we have now, and I had toyed around
with `insert_into(table).columns(columns).values(select)`, as well as
`insert_into(table).from_select(columns, select)`. I was leaning towards
the second one for a while (I didn't realize at the time that it was
exactly SQLAlchemy's API). I hated the `columns` method because it was
unclear what it was doing until you saw you were inserting from a select
statement. It also implied an interaction with tuples that didn't exist.

However, another thing that happened in 0.99 was that we deprecated our
old upsert API. The `from_select` form reminded me far too much of the
old `on_conflict` API, which we had just removed. In practice what that
API would give you was something like this:

```rust
insert_into(posts::table)
    .from_select(
        (posts::user_id, posts::title),
        users::table
            .select((
                users::id,
                users::name.concat("'s First Post"),
            )),
    )
```

That's just far too much rightward drift for me. Yes, you can assign the
args to local variables, but they're awkward to name and now your code
reads weird. I thought moving the columns list to another method call
would help, but it doesn't.

```rust
insert_into(posts::table)
    .from_select(
        users::table
            .select((
                users::id,
                users::name.concat("'s First Post"),
            )),
    )
    .into_columns((posts::user_id, posts::title))
```

Eventually a member of the Diesel team had the idea of making this an
`insert_into` method on select statements. This solves all of my
concerns around rightward drift, though at the cost of the other two
priorities. The API now looked like this:

```
users::table
    .select((
        users::id,
        users::name.concat("'s First Post"),
    ))
    .insert_into(posts::table)
    .into_columns((posts::user_id, posts::title))
```

I liked the way the code flowed, but I had concerns around
discoverability, and didn't like how backwards it was from SQL. But I
could live with it and started implementing.

Up until this point I had been assuming that we would have an
`InsertFromSelectStatement` struct, which was distinct from
`InsertStatement` and necessitated the differently named methods. I
realized when I started digging into it though, that we really just want
to re-use `InsertStatement` for this. It seems obvious in hindsight.

And if we were going to use that structure, that meant that it wouldn't
be much harder to just make passing `SelectStatement` to `values` work.
This automatically solves most of my concerns around discoverability,
since it now just works exactly like every other form of insert.

That said, I really don't like the rightward drift. I liked the
`.insert_into` form for being able to avoid that. But for the final
implementation, I just generalized that feature. *Anything* that can be
written as `insert_into(table).values(values)` can now be written as
`values.insert_into(table)`.

Context around InsertStatement
===

This file has churned more than just about any other part of Diesel. I
feel like I've re-written it nearly every release at this point. I think
the reason it's churned so much is for two reasons. The first is that
it's kept a fundamental design flaw through 1.1 (which I'll get to), and
we've been constantly working around it. The second is that `INSERT` is
actually one of the most complex queries in SQL. It has less variations
than `SELECT`, but far more of its variations are backend specific, or
have different syntaxes between backends.

`InsertStatement` was originally added for 0.2 in c9894b3 which has a
very helpful commit message "WIP" (good job past Sean). At the time we
only supported PG, so we actually had two versions -- `InsertStatement`
and `InsertQuery`, the latter having a returning clause. I'm honestly
not sure why I didn't do the `ReturningClause<T>` `NoReturningClause`
dance we do now and did back then in `SelectStatement`. Maybe I thought
it'd be easier?

Anyway this file had to go through some pretty major changes in 0.5 when
we added SQLite support. We needed to disallow batch insert and
returning clauses on that backend. Additionally, the way we handle
default values had to work differently on SQLite since it doesn't
support the `DEFAULT` keyword.

At this point though, it still a few things. The query starts with
`INSERT INTO`, had a columns list, then the word `VALUES` and then some
values. It also managed all parenthesis. (Yes it was trivial to make it
generate invalid SQL at that point).

Fast forward to 0.9, I decided to support batch insert on SQLite. This
needs to do one query per row, and I felt that meant we needed a
separate struct. I didn't want to have `BatchInsertStatement` and
`BatchInsertQuery` and `InsertStatement` and `InsertQuery`, so we got
the `NoReturningClause` struct, and got a little closer to how literally
every other part of Diesel works.

In 0.14 we added `insert_default_values` which left us with
`InsertStatement`, `BatchInsertStatement`, and `DefaultInsertStatement`,
which eventually went back down to the two.

The last time the file went through a big rewrite was in 0.99 when I
finally unified it down to the one struct we have today.

However, it still had the fatal flaw I mentioned earlier. It was trying
to handle too much logic, and was too disjoint from the rest of the
query builder. It assumed that the two forms were `DEFAULT VALUES` or
`(columns) VALUES ...`, and also handled parens.

We've gone through a lot of refactoring to get rid of that. I finally
think this struct is at a point where it will stop churning, mostly
because it looks like the rest of Diesel now. It doesn't do anything at
all, and the only thing it assumes is that the word `INTO` appears in
the query (which I'm pretty sure actually is true).

The actual implementation of this commit
====

With all that said, this commit is relatively simple. The main addition
is the `InsertFromSelect` struct. It's definitely a Diesel struct, in
that it basically does nothing, and all the logic is in its `where`
clauses.

I tried to make `Insertable` work roughly everywhere that methods like
`filter` work. I couldn't actually do a blanket impl for tables in Rust
itself, because it made rustc vomit on recursion with our impls on
`Option`. That shouldn't be happening, but I suspect it's a lot of work
to fix that in the language.

I've also implemented it for references, since most of the time that you
pass values to `.values`, you pass a reference. That should "just work"
here unless we have a good reason not to.

The majority of the additions here are tests. This is a feature that
fundamentally interacts with many of our most complex features, and I've
tried to be exhaustive. Theres ~3 lines of tests for every line of code
added. I've done at least a minimal test of this feature's interaction
with every other feature on inserts that I could think of. I am not so
much worried about whether they work, but I was worried about if there
was a syntactic edge case I didn't know about. There weren't. Same with
the compile tests, I've tried to think of every way the feature could be
accidentally misused (bad arguments to `into_columns` basically), as
well as all the nonsense things we should make sure don't work (putting
it in a tuple or vec).

I didn't add any compile tests on making sure that the select statement
itself is valid. The fact that the `Query` bound only matches valid
complete select statements is already very well tested, and we don't
need to re-test that here. I also did not explicitly disallow selecting
from the same table as the insert, as this appears to be one of the few
places where the table can appear twice with no ambiguity.

Fixes #1106
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants