Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: EXPOSED-436 Allow using insert values on update with upsert() #2172

Merged
merged 3 commits into from
Aug 18, 2024

Conversation

bog-walk
Copy link
Member

@bog-walk bog-walk commented Jul 24, 2024

Description

Summary of the change:
onUpdate parameter of upsert() (and batchUpsert()) has been deprecated and replaced with a parameter that takes a lambda with an UpdateStatement as its argument, so that users have access to database-specific expression that allows to reference values that would have been used in INSERT clause:

// with conflict, on update, newly stored word will be old stored word concatenated to new number: "Word A || 9"
TestTable.upsert(
    onUpdate = { it[TestTable.word] = concat(TestTable.word, stringLiteral(" || "), insertValue(TestTable.count)) }
) {
    it[word] = "Word B"
    it[count] = 9
}

Detailed description:

  • Why:
    onUpdate currently only accepts a list of columns to include in the UPDATE clause and the values/expressions to use with them. It is not possible to refer to the values that would have been inserted had there been no conflict. This is already done implicitly if a user chooses to omit an argument for onUpdate, but the user cannot manually choose to do this themselves.
  • What:
    The onUpdate parameter is now of type UpsertBuilder.(UpdateStatement) -> Unit, so that it has access to new expression insertValue(column). This makes it possible to reference these values using database-specific syntax like EXCLUDE, VALUES(), and alias identifier notation.
  • How:
    • onUpdate property of underlying class is deprecated in old function
    • New expression insertValue() has been added, which can only be called from within an upsert statement (as its syntax is useless elsewhere)
    • To limit duplication across UpsertStatement and BatchUpsertStatement, a new interface, UpsertBuilder, is added to house all common logic. This also removes business logic from FunctionProvider, which was using duplicate code that already existed, for example, in InsertStatement.

Type of Change

Please mark the relevant options with an "X":

  • New feature
  • Documentation update

Affected databases:

  • MariaDB
  • Mysql5
  • Mysql8
  • Oracle
  • Postgres
  • SqlServer
  • H2
  • SQLite

Checklist

  • Unit tests are in place
  • The build is green (including the Detekt check)
  • All public methods affected by my PR has up to date API docs
  • Documentation for my change is up to date

Related Issues

EXPOSED-436

Comment on lines 98 to 104
/**
* Specifies that this column should be updated using the same values that would be inserted if there was
* no violation of a unique constraint in an upsert statement.
*
* @sample org.jetbrains.exposed.sql.tests.shared.dml.UpsertTests.testUpsertWithManualUpdateUsingInsertValues
*/
fun <T> Column<T>.asForInsert(): ExpressionWithColumnType<T> = AsForInsert(this, this.columnType)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always open to naming suggestions, here and elsewhere.

Most other ORMs support this syntax by calling it excluded, though I'm hesitant to do that because it's a strong bias towards PostgreSQL.
For example, in ORMs that chain for statement building, the pattern is usually something like:

upsert().set { excluded -> column = excluded.column }

// or

upsert().set(column to excluded(column))

This implementation could also be done as a regular function, instead of an extension function, if preferred.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e5l @obabichevjb This expression will generate 4 different syntax depending on the db:
EXCLUDED.column, NEW.column, VALUES(column), or S.column.

I find Table.column.excluded() quite unclear, especially if not coming from a PostgreSQL background knowing exactly the SQL needed.

TestTable.upsert(
    onUpdate = { listOf(TestTable.word to TestTable.word.excluded()) }
)

Table.column.new() or new(Table.column) may a bit more clear and I like it, but it is still a MySQL-specific syntax. Although, MySQL would accept EXCLUDED (or any alias) as well, since the syntax is only based on the alias of the insert values (which we chose to set as NEW).

TestTable.upsert(
    onUpdate = { listOf(TestTable.word to new(TestTable.word)) }
)

Another alternative that we could consider is a variant of: Table.column.insertValue(), insertValue(Table.column), insertValueFor(Table.column), insertValueOf(Table.column).

TestTable.upsert(
    onUpdate = { listOf(TestTable.word to TestTable.word.insertValue()) }
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a draft here and GitHub removed it =( never trust browser inputs...

I had a thought that it could be better to make a function in the context of onUpdate with column argument instead of column extension function.

In this case it would be easier to find it in this context. If you know that you expect VALUES(column) in sql, you can start to type values word and get autocomplete of function with one column argument, and it will be clear what it does.

I don't like when multiple functions with different names do the same thing, but we may provide two function excluded(column) and values(column). More generic variant I can imagine now are fromInsert(), insertValue().

One another variant, I believe less, because it could be harder to implement and maintain is providing a constant (let's say EXCLUDED) that would have same type as original table. In this case it would be possible to write something like TestTable.word to EXCLUDED.word

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I have no strong preference here, but it looks like jooq and ktorm use excluded, so probably it would be better to align it with them, to make moving from those libraries (and vice versa >_<) eaiser

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree and I can switch to column argument function instead if we agree on a name.

In general, I have no strong preference here, but it looks like jooq and ktorm use excluded, so probably it would be better to align it with them, to make moving from those libraries (and vice versa >_<) eaiser

In this case, I don't see it this way, for 2 reasons:

  1. I agree it's good to align with other ORM, especially when it makes sense and the common consensus is clear and fits standard SQL (and Kotlin conventions). I don't think there's anything wrong with being different sometimes (or even setting a new standard for the others ourselves), especially if it aligns with the project's goals.
  2. They only use excluded in very specific context. jooq for example splits its upsert functionality based on database support by using different extension functions on the common insert statement builder. So only a subset of databases will use .onConflict() and only those will have access to excluded(). Other databases go the way of .onDuplicateKeyUpdate() and don't have any access to excluded(). ktorm follows similar principle with its database-specific modules and builder classes, so the class that provides excluded isn't available in any module but PG and SQLite.

I'm hoping we can come up with a generic variant to cover all db, but about having multiple functions with different names that do the same thing:

  • I agree about not liking it, but these functions would only exist within the upsert interface if that helps.

I like new() or insertValue() but I'll try to think of others.

@bog-walk bog-walk requested review from e5l and obabichevjb July 24, 2024 03:25

/**
* Appends an SQL update command for a derived table (with or without alias identifiers) to [this] QueryBuilder.
* Appends to a [queryBuilder] the SQL syntax for a column that represents the same values from the INSERT clause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, the name is not clear. Let's discuss the purpose and try to find something better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the parameter position it looks better

tester.batchUpsert(
newWords,
onUpdate = {
listOf(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's breaking change anyway, would it better to make here a builder like other insert/update/upsert builders, it looks like:

onUpdate = {
  it[tester.word] = concat(tester.word, stringLiteral(" || "), tester.count.asForInsert())
  it[tester.count] = intLiteral(1).plus(tester.count.asForInsert())
}

For me it's always painfull to check if I should write in the builder this[column] = value or it[column] = value, with this change we will have also listOf(column to value) (don't know if we use such patter somewhere else)

Probably we could try to use for setters only it, and from this take only contextual methods (like asForInsert() in this context).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, to avoid breaking change, we could deprecate this method, and create another one where onUpdate is already a lambda

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to make a new builder as I already tried using the existing builder constructs and had issue with capturing the set arguments within the correct statement.

About the deprecation, could you let me know if/how it is possible? When I tried to deprecate the old and add the new with a lambda, because of the default arguments, it causes all usages to fail with overload resolution ambiguity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the point about deprecations...

Not sure that it's legal, but removing default parameter from onUpdate of old function would solve that problem:

fun <T : Table, E : Any> T.batchUpsert(
    data: Iterable<E>,
    vararg keys: Column<*>,
    onUpdate: List<Pair<Column<*>, Expression<*>>>?,
    onUpdateExclude: List<Column<*>>? = null,
    where: (SqlExpressionBuilder.() -> Op<Boolean>)? = null,
    shouldReturnGeneratedValues: Boolean = true,
    body: BatchUpsertStatement.(E) -> Unit
): List<ResultRow> {
    return batchUpsert(data.iterator(), onUpdate?.let { { it } }, onUpdateExclude, where, shouldReturnGeneratedValues, keys = keys, body = body)
}

I'm trying to understand would it break somebody, and imagination says that should not.. but I can not be 100% sure. All the calls with onUpdate should continue to call the old deprecation function. All the calls without onUpdate should start to use new one automatically... is that logic?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In merge command I was creating InsertStatement and UpdateStatement as receivers, and took the arguments from there, not sure if it's applicable here.. I still have an idea to reuse merge command for upsert if it's possible, but last time I face the problem, that I needed currentDialect in the moment it's not available yet, but I hope I'll make one more attempt later

@bog-walk bog-walk force-pushed the bog-walk/add-upsert-exclude-syntax branch from 6c70016 to 299d639 Compare August 9, 2024 02:44
Comment on lines +631 to +655
@Deprecated(
"This `batchUpsert()` with `onUpdate` parameter will be removed in future releases. " +
"Please use function `onUpdate()` in `body` lambda block instead.",
level = DeprecationLevel.WARNING
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to use ReplaceWith but this may be too complex a change. The closest I could get is with the following:

ReplaceWith("upsertReturning(keys = keys, onUpdateExclude = onUpdateExclude, where = where) { body.invoke();it.onUpdate { onUpdate.invoke() } }")

// takes this original
Items.upsert(
    onUpdate = { listOf(Items.price to Items.price.times(10.0)) }
) {
    it[id] = 1
    it[name] = "B"
    it[price] = 200.0
}

// and converts it to this
Items.upsert(
    keys = arrayOf()
) {
    it[id] = 1
    it[name] = "B"
    it[price] = 200.0
    it.onUpdate {
        listOf(price to price.times(10.0)) // getting this to convert is difficult
    }
}

Comment on lines +86 to +89
when (this) {
is UpsertStatement<*> -> updateValues.putAll(arguments)
is BatchUpsertStatement -> updateValues.putAll(arguments)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, updateValues would be a property of the interface so this could be simplified, but doing so would expose the mutable map publicly since properties can't be internal or private.

Comment on lines +131 to +132
/** Returns the expressions to be used in the update clause of an upsert statement, along with their insert column reference. */
internal fun UpsertBuilder.getUpdateExpressions(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 functions are defined as extension functions of the interface to force them to be internal only, since method's can't be.

Comment on lines +374 to +376
it.onUpdate { update ->
update[count] = intLiteral(100) times insertValue(count)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test highlights the duality of this form of onUpdate.
The inner function will act using the same pattern as an InsertStatement, for example.

So with upsert(), the statement is passed as the argument to the lambda block and the new function (like the setters above it) will only be accessible by it.onUpdate(). This leads to the implicit parameter being shadowed, which is why I've named it update here.

With batchUpsert() below, the statement is the receiver, so it's accessed as this.onUpdate() and the inner parameter doesn't need to be named to be less confusing.

@bog-walk
Copy link
Member Author

bog-walk commented Aug 9, 2024

@e5l @obabichevjb I've used the feedback and tried a different approach. Namely 2 changes:

  • Column.asForInsert() is now insertValue(column)
  • The parameter onUpdate is deprecated and replaced with the function onUpdate() that can be used inside the lambda block to set column-value assignments for the update part.
    • I thought to deprecate onUpdateExclude as well and move it to the new function like onUpdate(exclude = ?), but the logic for the 2 is completely separate so I didn't. Meaning that if an argument is passed to be excluded from the update part, then column-value assignments will never be set in the lambda (and vice versa). Because current internal logic ignores excluded columns if a specific update builder is specified by onUpdate.

Please let me know if you think this is a step in a better direction.

@obabichevjb
Copy link
Collaborator

obabichevjb commented Aug 9, 2024

@bog-walk The new variant looks much better to me.

In terms of potential improvements, I'd like to note some API moments.

Looking at the following example:

tester.upsert {
    it[id] = 1
    it[word] = "Word B"
    it[count] = 9

    it.onUpdate { update ->
        update[count] = intLiteral(100) times insertValue(count)
    }
}

In the original upsert it was easy to understand how to configure onUpdate part, because it was an upserts argument. In the new variant it could be potentially not so obvious for the user to understand where is the onUpdate part, because it's inside it parameter, that is of UpsertStatement<?> type with many other parameters. If you start to type it. IDEA will suggest many fields, and it's actually unclear, which are for the user's usage, and which are internal (I will add a screenshot).

Screenshot 2024-08-09 at 10 00 33

Also it is used in 2 ways: as setter for columns, and as an object with extra methods. It looks like it is a kind of magic map, where you can put values, and perform actions.

I don't know if it's easy to do, but probably upserts body block could be called in the context of something what would provide onUpdate methods, in this case the code could look like:

tester.upsert {
    it[id] = 1
    it[word] = "Word B"
    it[count] = 9

    onUpdate { update ->
        update[count] = intLiteral(100) times insertValue(count)
    }
}

@obabichevjb
Copy link
Collaborator

obabichevjb commented Aug 9, 2024

I also like that "insert" and "update" parts are symmetrical in terms of setting column values, so I can write something like this for example (I usually prefer explicit body arguments for outer code blocks):

tester.upsert { insert ->
    insert[id] = 1
    insert[word] = "Word B"
    insert[count] = 9

    insert.onUpdate {
        it[count] = intLiteral(100) times insertValue(count)
    }
}

But for me it looks quite asymmetrical in terms of configuring "update" inside "insert". Probably it comes from merge command, where it was possible to configure insert and update parts independently.

So if the logic that upsert is made first to make insert, and in the case of conflict to react somehow with update, the current variant is completely ok.

But just like an alternative option, the statement could look like:

tester.upsert { 
  onInsert {
    it[id] = 1
    it[word] = "Word B"
    it[count] = 9
  }
  onUpdate {
      it[count] = intLiteral(100) times insertValue(count)
  }
}


/**
* Appends an SQL update command for a derived table (with or without alias identifiers) to [this] QueryBuilder.
* Appends to a [queryBuilder] the SQL syntax for a column that represents the same values from the INSERT clause
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the parameter position it looks better

onUpdate parameter of upsert() (and batchUpsert()) currently accepts a list of
columns to include in the UPDATE clause and the values/expressions to use. It is
not possible to refer to the values that would have been inserted had there been
no conflict.

The onUpdate parameter now accepts a lambda with an UpsertStatement as its receiver,
so that it has access to new expression `asForInsert()`. This makes it possible
to reference these values using database-specific syntax like EXCLUDE, VALUES(),
and alias identifier notation.

Additional:
- To limit duplication across UpsertStatement and BatchUpsertStatement, a new
interface, UpsertBuilder, is included to house common logic. This removes business
logic from FunctionProvider for everything up to statement preparation.
Deprecate existing upsert functions and replace with variants that do not have
onUpdate parameter. This functionality is now accomplished using onUpdate() in
the upsert lambda directly, which utilizes UpdateStatement to set column-value
assignments.

Change asForInsert() to accept arguments instead of column receivers and rename
to insertValue().

Change visibility modifier of some interface functions to internal.

Update documentation with new upsert builder construct.
- Revert onUpdate class and function type to original.
- Make class property deprecation level warning
@bog-walk bog-walk force-pushed the bog-walk/add-upsert-exclude-syntax branch from b7206b3 to 945f8a2 Compare August 18, 2024 01:17
@bog-walk bog-walk merged commit 8726aa8 into main Aug 18, 2024
5 checks passed
@bog-walk bog-walk deleted the bog-walk/add-upsert-exclude-syntax branch August 18, 2024 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants