Default to the `pooled-lo` optimizer in Hibernate ORM #31899

yrodiere · 2023-03-16T12:01:05Z

Description

To be considered for Quarkus 3 as a way to address the infamous backwards-incompatible sequence changes in Hibernate ORM (see #31521)

Hibernate ORM defaults to the pooled optimizer, which assumes the pool of IDs it can use is between <sequence value>-<increment size>+1 and <sequence value>.

It works, but has a significant downside: every time you restart a sequence manually (native SQL, import script), you have to take care to set the value to max(id) + <increment size>, otherwise Hibernate ORM will use incorrect IDs. E.g. if your max id is 2 and you restart your sequence to 3, next time Hibernate ORM retrieves the sequence value it will assign IDs between 3-<increment size>+1 and 3. With the default increment size of 50, that leads to Hibernate ORM starting with negative IDs (-46, -45, ...) and eventually coming all the way back to already assigned IDs, i.e. 1, 2.
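The arithmetic above can be checked with a small sketch. This is plain Python that models only the range formula stated in this issue, not Hibernate ORM code; `pooled_ids` is a hypothetical helper name, and the increment size of 50 is the default mentioned above.

```python
def pooled_ids(sequence_value: int, increment_size: int = 50) -> range:
    """IDs the pooled optimizer assumes it owns, per the formula above:
    <sequence value> - <increment size> + 1 up to <sequence value>, inclusive."""
    return range(sequence_value - increment_size + 1, sequence_value + 1)


# Existing rows have IDs 1 and 2; the sequence was restarted to 3 by hand.
existing_ids = {1, 2}
pool = pooled_ids(3)

first_id, last_id = pool[0], pool[-1]          # -46 and 3
collisions = sorted(existing_ids & set(pool))  # [1, 2]

# Restarting to max(id) + <increment size> instead gives a clean pool.
safe_pool = pooled_ids(max(existing_ids) + 50)  # 3 to 52
print(first_id, last_id, collisions, safe_pool[0], safe_pool[-1])
```

Restarting to 52 rather than 3 is the "take care" step the paragraph refers to: the pool then starts right after the highest existing ID.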

So, that can be surprising, and hence it's not a great default.

Hibernate ORM provides another optimizer, pooled-lo. That optimizer will assume the pool of IDs it can use is between <sequence value> and <sequence value>+<increment size>-1... which removes the problem mentioned above: if you restart your sequence to 3, Hibernate ORM will use IDs 3 to 52, which is exactly what you'd naively assume.
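For comparison, the same restart-to-3 scenario under the pooled-lo formula, again as a minimal Python sketch of the arithmetic only (`pooled_lo_ids` is a hypothetical helper, not a Hibernate ORM API):

```python
def pooled_lo_ids(sequence_value: int, increment_size: int = 50) -> range:
    """IDs the pooled-lo optimizer assumes it owns, per the formula above:
    <sequence value> up to <sequence value> + <increment size> - 1, inclusive."""
    return range(sequence_value, sequence_value + increment_size)


# Existing rows have IDs 1 and 2; the sequence was restarted to max(id) + 1 = 3.
existing_ids = {1, 2}
pool = pooled_lo_ids(3)

first_id, last_id = pool[0], pool[-1]          # 3 and 52
collisions = sorted(existing_ids & set(pool))  # []
print(first_id, last_id, collisions)
```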

Interestingly, switching from the pooled optimizer to the pooled-lo optimizer... is a backwards-compatible change! If the pooled optimizer leaves the sequence with e.g. value 151 (intending to use values 102 to 151 in the next pool), then upon restarting after changing the optimizer, the pooled-lo optimizer will simply skip those values and use values 151 to 200 instead.
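A rough way to convince yourself of the compatibility claim, sketched in Python. It assumes, for illustration, that the last value the pooled optimizer fetched before the switch was 101 and that the sequence hands out 151 next; the helper names are hypothetical and only the range formulas from this issue are modelled.

```python
INCREMENT = 50  # default increment size mentioned in this issue


def pooled_ids(sequence_value: int) -> range:
    # pooled: <sequence value> - <increment size> + 1 .. <sequence value>
    return range(sequence_value - INCREMENT + 1, sequence_value + 1)


def pooled_lo_ids(sequence_value: int) -> range:
    # pooled-lo: <sequence value> .. <sequence value> + <increment size> - 1
    return range(sequence_value, sequence_value + INCREMENT)


# Before the switch: pooled fetched 101 and may have used any of 52..101.
used_before = set(pooled_ids(101))

# After the switch: pooled-lo fetches the next sequence value, 151.
pool_after = pooled_lo_ids(151)  # 151..200

overlap = used_before & set(pool_after)               # empty: nothing reused
skipped = range(max(used_before) + 1, pool_after[0])  # 102..150 are never used
print(sorted(overlap), skipped[0], skipped[-1])
```

The cost of the switch in this model is a one-time gap (102 to 150), not a collision.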

So, we could consider making the pooled-lo optimizer the default in Quarkus. The only problem would be if other actors hit the database and expect the pooled optimizer to be used; they might end up using IDs that the pooled-lo optimizer already used. We might consider this acceptable, since:

- Those collisions will lead to immediate error messages, assuming the DB has integrity constraints on primary keys (all DBs do, right? right?)
- People in this situation are probably advanced users who can debug this.
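The "other actors" risk can be illustrated with the same arithmetic. The sketch below is Python with hypothetical helper names; it assumes the Quarkus application (using pooled-lo) fetches 151 from the sequence and another client that still expects pooled fetches the following value, 201.

```python
INCREMENT = 50  # default increment size mentioned in this issue


def pooled_ids(sequence_value: int) -> range:
    # pooled: <sequence value> - <increment size> + 1 .. <sequence value>
    return range(sequence_value - INCREMENT + 1, sequence_value + 1)


def pooled_lo_ids(sequence_value: int) -> range:
    # pooled-lo: <sequence value> .. <sequence value> + <increment size> - 1
    return range(sequence_value, sequence_value + INCREMENT)


app_pool = set(pooled_lo_ids(151))  # application believes it owns 151..200
other_pool = set(pooled_ids(201))   # other actor believes it owns 152..201

# IDs both sides think they own; inserting one twice violates the primary key.
contested = sorted(app_pool & other_pool)
print(contested[0], contested[-1], len(contested))
```

In this model almost the whole pool is contested, which is why the primary key constraint is what turns the mismatch into an immediate, visible error.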

See https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#identifiers-generators-optimizer for documentation about Hibernate ORM optimizers.

Implementation ideas

No response


quarkus-bot · 2023-03-16T12:01:09Z

/cc @Sanne (hibernate-orm), @gsmet (hibernate-orm)

yrodiere · 2023-03-16T12:05:46Z

@Sanne , @gsmet , see above. Any opinion on this change? It's not completely safe but I think it would be a net gain for new users and demos, and not particularly worse for existing users and complex applications.

@sebersole stated some time ago that pooled-lo could have been a better default for Hibernate ORM, and I agree with him, but I don't think we can reasonably change that in ORM right now. In Quarkus, though... it may be more reasonable.

gsmet · 2023-03-16T12:11:26Z

My question about pooled-lo is that anytime you start your app, you skip 50 items, right?

Sanne · 2023-03-16T12:35:04Z

I think it's a reasonable idea. Also, the whole point of using "generated ids" is that the business logic shouldn't really care which IDs are actually being generated; so if there are some additional artificial gaps of numbers that go unused, that's totally fine, especially as we're not wasting significant ranges of numbers from the available space of ids.

As an alternative, if this plan wouldn't work for some reason, also bear in mind that the increment of the sequence is set to the range of the optimizer ORM is using and this is validated by the schema tools; so even just recommending that users add a single "nextvalue" instruction in their scripts would also spare them from doing any math - the unsolved problem is awareness about this being necessary.
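To make the "single nextvalue" idea concrete, here is a simplified Python model. `FakeSequence` is a hypothetical stand-in for a database sequence whose increment matches the optimizer's increment size (50), and the pool formula is the pooled one from the issue description; real databases and Hibernate ORM have more edge cases than this sketch covers.

```python
class FakeSequence:
    """Toy model of a database sequence with a fixed increment (an assumption)."""

    def __init__(self, start: int, increment: int) -> None:
        self.next_value = start
        self.increment = increment

    def restart(self, value: int) -> None:
        self.next_value = value

    def nextval(self) -> int:
        value = self.next_value
        self.next_value += self.increment
        return value


def pooled_ids(sequence_value: int, increment_size: int = 50) -> range:
    # pooled: <sequence value> - <increment size> + 1 .. <sequence value>
    return range(sequence_value - increment_size + 1, sequence_value + 1)


existing_ids = {1, 2}
seq = FakeSequence(start=1, increment=50)

# Import script: restart to max(id) + 1, then burn one value, no math needed.
seq.restart(max(existing_ids) + 1)
burned = seq.nextval()  # 3

# Application start: the pooled optimizer fetches the next value, 53.
pool = pooled_ids(seq.nextval())  # 4..53
print(burned, pool[0], pool[-1], sorted(existing_ids & set(pool)))
```

The extra call wastes one ID (3) in this model but keeps the pool positive and clear of existing rows.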

Speaking of awareness - wondering if ORM couldn't validate ranges on boot, as we know which tables are sourcing their ids from each generator; but that's not trivial and not something I'd recommend looking into right now.

yrodiere · 2023-03-16T12:49:59Z

> My question about pooled-lo is that anytime you start your app, you skip 50 items, right?

Only on the first start after switching from pooled to pooled-lo. I don't see any reason it would skip items on the next starts.

That is, except the fact that you essentially "lose" the current content of the pool when you stop an application, but that's true for all optimizers. As Sanne said, gaps in entity IDs are something people are expected to live with when using optimizers. Not something frequent, but something that happens.
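The "lost pool" effect is easy to picture with numbers. This Python sketch assumes pooled-lo with an increment size of 50, an application that fetched 151, inserted three rows, and was then stopped; on the next start it fetches 201. The values are illustrative only.

```python
INCREMENT = 50  # default increment size mentioned in this issue


def pooled_lo_ids(sequence_value: int) -> range:
    # pooled-lo: <sequence value> .. <sequence value> + <increment size> - 1
    return range(sequence_value, sequence_value + INCREMENT)


# First run: pool 151..200, only three IDs handed out before shutdown.
first_run_pool = pooled_lo_ids(151)
used = list(first_run_pool[:3])  # [151, 152, 153]

# Second run: the in-memory pool is gone, so a fresh sequence value is fetched.
second_run_pool = pooled_lo_ids(201)

# IDs that no run will ever assign.
gap = range(used[-1] + 1, second_run_pool[0])  # 154..200
print(used, gap[0], gap[-1], len(gap))
```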

> the unsolved problem is awareness about this being necessary.

Right, that's the whole point of this proposal. Wrong increment sizes are pretty much dealt with; the remaining problem is correctly (re-)initializing sequence values, and that's not just a migration problem: it will potentially affect anyone starting a new application.

> wondering if ORM couldn't validate ranges on boot, as we know which tables are sourcing their ids from each generator; but that's not trivial and not something I'd recommend looking into right now.

Agreed on both points: would be nice, but now's not a good time. I suspect that would open up a whole new range of problems, since I don't think the schema tools look at entity data at all at the moment.

sebersole · 2023-03-16T12:58:18Z

I have indeed said that - pooled-lo should really be the default, in hindsight.

I did want to chime in about the "schema validation" bit. So far, validation has been limited to checking the actual schema. What you are suggesting here goes beyond that into validating some of the data contained in that schema. Different thing. It is on par with, e.g., validating that columns mapped to enums contain only valid values.

Not saying that does not have value or merit, but it's a whole new thing entirely.

gsmet · 2023-03-16T13:32:36Z

OK, then I think using pooled-lo as the default for Quarkus makes sense.

Sanne · 2023-03-16T14:10:09Z

> I did want to chime in about the "schema validation" bit. So far, validation has been limited to checking the actual schema. What you are suggesting here goes beyond that into validating some of the data contained in that schema. Different thing. It is on par with, e.g., validating that columns mapped to enums contain only valid values.

Yes, absolutely; I might have been unclear:

1. ORM does validate the schema, including checking that each sequence defines exactly the expected increment
2. We could leverage the knowledge sourced from schema validation to additionally validate used ranges

It's definitely two separate items, the first one an existing feature of the schema validator, the second being wishful thinking I'm just pointing out to consider in the future. Sorry for the tangent.

yrodiere added kind/enhancement New feature or request triage/quarkus-3 labels Mar 16, 2023

quarkus-bot bot added area/hibernate-orm Hibernate ORM area/persistence OBSOLETE, DO NOT USE labels Mar 16, 2023

quarkus-bot bot moved this to Todo in Quarkus Roadmap/Planning Mar 16, 2023

quarkus-bot bot added this to Quarkus Roadmap/Planning Mar 16, 2023

yrodiere mentioned this issue Mar 17, 2023

Allow configuring the default ID optimizer and default to pooled-lo #31947

Merged

gsmet closed this as completed in #31947 Mar 20, 2023

github-project-automation bot moved this from Todo to Done in Quarkus Roadmap/Planning Mar 20, 2023

quarkus-bot bot added this to the 3.0 - main milestone Mar 20, 2023

yrodiere mentioned this issue Mar 21, 2023

Hibernate ORM - Document how to handle sequences in import.sql #31521

Closed

michalvavrik mentioned this issue Mar 21, 2023

Fix MultiplePersistenceUnitTest by removing workaround that adds a gap of 50 as pool optimizer now uses pooled-lo strategy quarkus-qe/beefy-scenarios#375

Merged
