Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use lambdas for transaction scoping; Don't reload objects after commit #552

Merged
merged 2 commits into from
May 5, 2024

Conversation

nscuro
Copy link
Collaborator

@nscuro nscuro commented Apr 20, 2024

The way transactions were previously handled made it impossible to compose multiple transactional operations into larger tasks. PersistenceManager#currentTransaction()#begin() fails when the transaction is already active. This in turn opens the door to data inconsistencies, when parts of a task are committed but later operations fail, with no way to roll back to the original state before the task was started.

It also encouraged commits in high frequencies, which can take a toll on the database's write-ahead-log for high volume processing use-cases.

There was no handling of rollbacks, such that a failed commit could have "poisoned" an entire PersistenceManager, as no further persistence operations can be performed until after a rollback was executed.

This change introduces the Transaction class, which coordinates transaction begin, commit, and rollback. It is also capable of handling nested transactions, by joining an existing active one if required. This aspect can be controlled via propagation settings.

It also removes the need to reload objects from the database after each commit. This behavior has historically added quite a big overhead, and apparently was needed because DataNucleus transitions objects into "hollow" state (all fields except the primary key are unloaded). The unloading of fields is now disabled globally by setting the DataNucleus.RetainValues property to true.

The ScopedCustomization class was added to allow for scoped changes to a PersistenceManager, without impacting later operations on it. Usually things like adding fetch groups via pm.getFetchPlan().addGroup("foo") are intended to only affect one or a handful of queries, but in reality it affects all queries for the lifetime of the PersistenceManager. This has historically led to over-fetching. Now, it's easier to scope such things more tightly.

Further, this PR makes an effort to ensure that all Query objects are closed once consumed. Not closing query objects unnecessary leaks resources in cases where the owning PersistenceManager is not short-lived.

References:

The way transactions were previously handled made it impossible to compose multiple transactional operations into larger tasks. `PersistenceManager#currentTransaction()#begin()` fails when the transaction is already active. This in turn opens the door to data inconsistencies, when parts of a task are committed but later operations fail, with no way to roll back to the original state before the task was started.

It also encouraged commits in high frequencies, which can take a toll on the database's write-ahead-log for high volume processing use-cases.

There was no handling of rollbacks, such that a failed commit could have "poisoned" an entire `PersistenceManager`, as no further persistence operations can be performed until after a rollback was executed.

This change introduces the `Transaction` class, which coordinates transaction begin, commit, and rollback. It is also capable of handling nested transactions, by joining an existing active one if required. This aspect can be controlled via propagation settings.

It also removes the need to reload objects from the database after each commit. This behavior has historically added quite a big overhead, and apparently was needed because DataNucleus transitions objects into "hollow" state (all fields except the primary key are unloaded). The unloading of fields is now disabled globally by setting the `DataNucleus.RetainValues` property to `true`.

The `ScopedCustomization` class was added to allow for scoped changes to a `PersistenceManager`, without impacting later operations on it. Usually things like adding fetch groups via `pm.getFetchPlan().addGroup("foo")` are intended to only affect one or a handful of queries, but in reality it affects all queries for the lifetime of the `PersistenceManager`. This has historically led to over-fetching. Now, it's easier to scope such things more tightly.

Further, this PR makes an effort to ensure that all `Query` objects are closed once consumed. Not closing query objects unnecessary leaks resources in cases where the owning `PersistenceManager` is **not** short-lived.

References:

* https://www.datanucleus.org/products/accessplatform_6_0/jdo/persistence.html#lifecycle
* https://www.datanucleus.org/products/accessplatform_6_0/jdo/query.html#_closing_a_query
* https://www.datanucleus.org/products/accessplatform_6_0/jdo/persistence.html#transaction_local
* https://github.com/search?q=repo%3AGoogleCloudPlatform%2Fdatanucleus-appengine%20retainvalues&type=code

Signed-off-by: nscuro <nscuro@protonmail.com>
nscuro added a commit to nscuro/dependency-track that referenced this pull request Apr 20, 2024
The biggest impact is that some tests now require eviction of the L1 cache to see changes made in the functionality they're testing.

stevespringett/Alpine#552
Signed-off-by: nscuro <nscuro@protonmail.com>
@stevespringett stevespringett merged commit 84bd9dc into stevespringett:master May 5, 2024
2 checks passed
@nscuro nscuro deleted the transactions branch May 6, 2024 12:08
nscuro added a commit to nscuro/Alpine that referenced this pull request May 7, 2024
The `getCount` methods were refactored to not modify the original `Query` object in stevespringett#552. Neither JDO nor DataNucleus offer a mechanism to copy a `Query` object 1:1, hence a new object was created instead.

This however broke `getCount` for queries that make use of subqueries (and potentially other advanced features).

This change reverts it back to temporarily modify the query object again.

Signed-off-by: nscuro <nscuro@protonmail.com>
nscuro added a commit to nscuro/dependency-track that referenced this pull request May 10, 2024
This is to properly support nested transactions as introduced in stevespringett/Alpine#552.

Signed-off-by: nscuro <nscuro@protonmail.com>
MM-msr pushed a commit to MM-msr/dependency-track that referenced this pull request Jun 18, 2024
This is to properly support nested transactions as introduced in stevespringett/Alpine#552.

Signed-off-by: nscuro <nscuro@protonmail.com>
nscuro added a commit to DependencyTrack/hyades-apiserver that referenced this pull request Jul 31, 2024
This is to properly support nested transactions as introduced in stevespringett/Alpine#552

Ports DependencyTrack/dependency-track#3711 from Dependency-Track v4.12.0-SNAPSHOT.

Signed-off-by: nscuro <nscuro@protonmail.com>
nscuro added a commit to DependencyTrack/hyades-apiserver that referenced this pull request Jul 31, 2024
This is to properly support nested transactions as introduced in stevespringett/Alpine#552

Ports DependencyTrack/dependency-track#3711 from Dependency-Track v4.12.0-SNAPSHOT.

Signed-off-by: nscuro <nscuro@protonmail.com>
netomi pushed a commit to netomi/dependency-track that referenced this pull request Aug 8, 2024
This is to properly support nested transactions as introduced in stevespringett/Alpine#552.

Signed-off-by: nscuro <nscuro@protonmail.com>
nscuro added a commit to nscuro/dependency-track that referenced this pull request Dec 8, 2024
Fields are no longer unloaded when a transaction commits (`DataNucleus.RetainValues` is enabled globally), as of stevespringett/Alpine#552.

Signed-off-by: nscuro <nscuro@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants