-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better exception when accessing two non-XA datasources in a transaction #40431
Comments
I assume that Agroal is swallowing the exception so just report that as the root cause, ie changing the JTA enlistment semantics is not the correct approach. Note that there is an open PR to allow mulitple XA unaware resources (#40365) . |
@mmusgrov There is no exception from Narayana here, just a method returning a boolean:
I can't, see above :)
Yes I'm aware of that, having been involved quite a bit xD As we discussed though, this is a temporary solution. We need to think of longer term solutions, and providing clear exception messages when someone tries to do something incorrect is the first step. |
So can we enable arjuna logging (unless there are already warnings) to see why the enlist call returns false? |
If "Agroal 2.3+ now makes it possible to detect that an application is accessing two non-XA datasources in a transaction" why does it still attempt the enlistment? |
I need more information in order to provide a better analysis of why the enlist is failing. |
@shawkins and @yrodiere Also note that in the multiple one phase aware resources case, if the first one phase aware resource commits and the second one rolls back then overall result reported back to the caller is rollback even though the first one committed (they ought to get a heuristic outcome) and I am fixing that at the moment (I'm on a national holiday today so it may have to wait until tomorrow to be completed). Since we never anticipated use of multiple one phase resources I am busy testing it now and writing test cases where I find deficiencies. |
It "makes it possible to detect" by implementing that
Good to know, thanks. And this all definitely can wait until tomorrow (or later) as far as I'm concerned, so don't feel forced to work today :) |
So is the test adding the two XA unaware resources with the allowUnsafeMultipleLastResources property implemented and set? If so then that is probably enough for me to reproduce the problem. But if the test is doing something else then I need more information to determine why the second resource is being rejected. |
@mmusgrov Yes with allowUnsafeMultipleLastResources not enabled, then the second Connection access like this shawkins/quarkus-quickstarts@014e306#diff-761c636458b29cbabf9e4f4b500472088e222e1310327cade67b7db30a8febb0R25 will result in the exception in the description. The AddOutcome is AR_DUPLICATE, but the calling logic is treating that the same as AR_REJECTED - https://github.com/jbosstm/narayana/blob/ae4755680f1e5dc7988f8525d1352d24a8adfffd/ArjunaJTA/jta/classes/com/arjuna/ats/internal/jta/transaction/arjunacore/TransactionImple.java#L638 |
You don't need This is the exact problem we've been discussing for the past week(s), but I'm talking about a different improvement in that area:
Just access two Agroal, non-XA datasources in the same transaction, in Quarkus 3.8.2 or 3.9.latest or 3.10.latest or Quarkus main, and you'll get this exception. |
Thanks @shawkins , I'll look into creating a narayana unit test to duplicate this behaviour. |
I get this same error when pointing Keycloak 24.0.4 at an empty MSSQL version 15 or 16 database, it fails after creating all the tables and starts to populate them. c:\DCX5\keycloak-24.0.4\bin>kc.bat start-dev --db mssql --db-url "jdbc:sqlserver://localhost:1433;databaseName=dcx5_keycloak;trustServerCertificate=false;encrypt=false" --db-username sa --db-password XXXXXX UPDATE SUMMARY
|
nm i fixed it by adding non-xa true to my start command c:\DCX5\keycloak-24.0.4\bin>kc.bat start-dev --db mssql --db-url "jdbc:sqlserver://localhost:1433;databaseName=dcx5_keycloak;trustServerCertificate=false;encrypt=false" --db-username sa --db-password XXXXXX --transaction-xa-enabled false |
@yrodiere and @shawkins Changing the exception message is quite invasive and impacts existing users so, strictly speaking, the original message would need deprecation first. How about the following explicit log message instead:
|
That would be in addition to the warning issued during start up if the allowUnsafeMultipleLastResources property is set. |
@mmusgrov instead of duplicate, could it say multiple? |
I don't understand why we're talking merely about changing a message. Once again, the more important obstacle is that the API between Agroal and Narayana is a method returning a boolean to indicate success/failure, which doesn't allow proper exception propagation. And, critically, doesn't allow Agroal to wrap Narayana's failure with its own, adding context (such as the fact that the "resource" we're talking about is a Now, introducing a new method might require deprecating the old one, sure (though that's not strictly required). Changing the behavior of the existing API, similarly, might require introducing a configuration flag defaulting to the old value, with some deprecation to warn about future changes. I don't see how any of that is blocking anything, though.
IMO such a log is only useful as a quick, temporary stopgap if further work needs more time. As a user, getting an exception with little information and having to look at the logs to understand the exception is poor user experience, be it only because logs are easy to miss. If you want to go that way, FWIW I'd explain the problem and potential solutions a bit more:
Phrasing probably needs improvement, but I think this information needs to be there. |
Where programmatic exception handling can be used for runtime resolution of a problem, then the exception approach has some advantages. Here however the exception indicates a logic error rather than an environmental one and needs to be resolved at development time, not runtime. The only thing that code noticing the problem can reasonably do, is log the exception. So either narayana generates a log message (it already does) or something else catches the exception and then does. Either way, the end result is developers needing to look at the log. That log may be written to console in dev mode to slide the WARN log messages or stack traces under the developers nose, but its still just a log. Whilst the JTA spec that says how enlistResource works does leave some room for maneuver here, there is little to be gained by making behavior changes that risk breaking existing uses of the standard API.
Narayana did not fail. It correctly detected and prevented an unsafe situation that could have resulted in data loss. It's functioning as intended. TL;DR: Improved usability is a good thing, but changing API behavior is not. Fortunately, one does appear to require the other on this occasion. |
You don't necessarily need to change that behavior, or even Jakarta Transaction. You could simply expose a superset of the Jakarta Transaction APIs in Narayana. After all, the Agroal integration is already specific to Narayana.
The reason for the inability to add a resource. All Agroal has is an How in the world is Agroal supposed to go from here to giving an actionable message to users such as "cannot register two non-XA resources in the same transaction; enable XA for this datasource or use two separate transactions"?
Context is everything. A single exception shows you the problem, where it happened and (if it's worded correctly) how to address it. With two log lines, on the other hand, you split this information in two places that are not necessarily going to be easy to reconcile:
Logs are no way to report problems of that kind of importance to end users. This is just a workaround for an API unsuitable to Agroal's needs. It's fine if you don't want/can't change it, but let's not pretend everything is alright here. |
Really? Historically it was specific to the tx component integration SPI, which allowed it to share work with wildfly. Which bits are coupling directly against narayana? Or is the integration SPI now considered part of narayana as it's the only supported implementation?
Fair. I think your best bet here is just to assume that if you enlist a LastResource and get false, it's because there already is another one enlisted. There are very few other reasons that will ever cause an enlistment problem for a LastResource since its start call won't go over the network. I'd rather deal with the vanishingly small number of false positives than change the API to get to 100% certainty on the reason. I don't recall seeing one with any other cause in many years, but if you're worried then keep track of which LastResource a tx has by using TransactionSynchronizationRegistry.putResource/getResource
see above, but: narayana already does that and your argument seems to be they don't read the messages anyhow :-)
Indeed. What information do you need that isn't already in the one log line that ARJUNA12140 writes?
There is nothing else. Exceptions are just just log entries that happen to include a stack trace. Those frames that are within arjuna code don't provide any useful context to users. The end user only needs the exception API if they are going to try to handle it programmatically, which they aren't.
I'm not. JTA is a mess of an API, because it's a kludge over XA's C centric conventions. But I'm also far from convinced that changing the API for this brings enough, or indeed any, benefit to justify the effort. There are definitely improvements that could be made, but we can get something good enough without going that far. If we're going to start making tx API changes, there are at least three on my list that are far more impactful than this one.
I've been doing tx support for two decades, it's going to take quite a lot to surprise me. Seeing a material drop in support load from this change may - there just isn't the evidence it is causing enough problems to be worth the effort to implement a new API. Much quicker changes, like tweaking the existing agroal exception text to include more of the context that's already available but which it's not currently using, would give most of the benefit whilst leaving most engineering time for higher priority issues. |
@yrodiere FYI We do actually indicate why the transaction rolls back when it is asked to commit. If you look at exception (exception.getCause()), although the reason string could be improved (I'll raise a Narayana PR to do that). Did you paste all of the exception trace? You should see something like:
|
I have no idea of the history behind this, I've never even contributed to Agroal. But I can point to the integration, which is specific to Narayana (and as far as Quarkus is concerned, I'm happy it is): https://github.com/agroal/agroal/blob/30465b6eaae28366449832e1ee22bbe8a9b83562/agroal-narayana/src/main/java/io/agroal/narayana/NarayanaTransactionIntegration.java#L22-L25
The stacktrace. What part of the user code is triggering the issue. Though I suppose you could include it in the warning log, however unusual that is.
Alright, then the problem is priority/bandwidth -- that I can completely understand.
Alright, then @barreiro our best chance at solving this would be to change the Agroal message to something like below; would that be possible?
|
I didn't even know The thing is... if Agroal cannot enlist a resource, it's faced with a situation it cannot recover from, so it throws an exception. And I suspect that blocks this See the stacktrace from #39283 , there is no mention of
We could try to look into ways of leveraging suppressed exceptions to report both to the user, but considering the exception from Narayana just reports a stacktrace we already have, without a detailed message like "you can't add two non-XA resources to a single transaction", I'm not sure that's worth the trouble. |
If you look at the cause of the exception (e.getCause()) then setRollbackOnly will appear in the backtrace. |
Our exception includes the root cause and any exception that Agroal generates should maintain the root cause. I can look into including a more detailed reason message in the exception's root cause. |
#40779 introduces a better error message (see agroal/agroal@6c62f82) |
The narayana internal frames won't mean anything to them. Their own code's frames are there anyhow in the exception that agroal already throws.
That actually is the deferred exception support at work, just in a different way. When commit is called, the transaction system MUST complete termination of the transaction before returning control. That is, it can't stop half way through terminating the tx and throw an exception. So if there is a problem, such as the pre-commit flush failing as here, then it catches and stashes that exception, goes on to finish the job by rolling back, then attaches the exception to the RollbackException it throws at the end. The same pattern can be used when a user explicitly calls setRollbackOnly rather than the tx system catching an exception - the stack at the point where the user or library called setRollbackOnly is stashed and thrown as cause if the user later tries to call commit. We could do that internally - catch the hibernate exception and call setRollbackOnly, relying on it to do the stack capture, but then the cause would rooted in the exception handler, not in hibernate. Nothing would stop us stashing some more context when we internally call setRollbackOnly before returning false from enlistResource, but that would help the tx rollback message, not the one that's thrown by agroal. A slightly kludgy workaround would be to have enlistResource stash the offending LastResource instances on the tx context, from which agroal could retrieve them by tsr.getResource. |
We're going in circles... Yes the stacktrace with user frames is in the Agroal exception, but that exception has no information about what failed exactly. Anyway. Let's settle for the hardcoded message introduced in #40779 / agroal/agroal@6c62f82 , which is similar to what you suggested. It's probably the best compromise right now considering adding new (throwing) methods in Narayana is not an option. Closing, thanks. |
See also #41090 |
@yrodiere FYI we added a specific arjuna logging code for the message that get's reported in the rollback exception, notice that I took onboard your recommended text. |
We did that in narayana pr jbosstm/narayana#2256 |
Got it @mmusgrov , thanks! |
Description
#39283 revealed that Agroal 2.3+ now makes it possible to detect that an application is accessing two non-XA datasources in a transaction, which is unsafe.
When detected, we get an exception looking like this:
The root cause is really not helpful, because it doesn't specify why the connection couldn't be enlisted to the transaction. We should improve that message, and that means changes in Agroal.
@barreiro I think you had plans in that area; could you please link the corresponding Agroal Jira?
Implementation ideas
Unfortunately
Transaction.enlistResource
only returns a boolean to state whether enlistment worked or not.So here are our only solutions:
enlistResource
which throws an exception on failure, with details regarding what failed exactly ("Can't add two non-XA resources")The text was updated successfully, but these errors were encountered: