Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec UUID support #319

Merged
merged 2 commits into from
Jan 17, 2022
Merged

spec UUID support #319

merged 2 commits into from
Jan 17, 2022

Conversation

gavinking
Copy link
Contributor

Draft support for UUID.

See #151 and #152.

Thoughts?

@gavinking gavinking linked an issue May 5, 2021 that may be closed by this pull request
@gavinking
Copy link
Contributor Author

Note: I changed only orm_3_0.xsd. I'm unclear on whether this should also affect orm_2_2.xsd.

@m-reza-rahman
Copy link

This looks good to me. It is definitely a timely, relevant and cloud friendly feature. I hope this makes it into the next revision.


</xsd:documentation>
</xsd:annotation>
<xsd:restriction base="xsd:token">
<xsd:enumeration value="TABLE"/>
<xsd:enumeration value="SEQUENCE"/>
<xsd:enumeration value="IDENTITY"/>
<xsd:enumeration value="UUID"/>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this needs a new schema since this is JPA 3.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, new schema is required. 2.2/3.0 (and all other already released versions) should be taken as written in the stone with no modifications allowed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wasn't sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gavinking can you address this, please?

* primary keys for the entity by generating a RFC 4122
* Universally Unique IDentifier.
*/
UUID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a value to enum in java is always backwards incompatible change. Given the strict requirement on semantic versioning for spec projects, we would need to bump spec version to 4.0 - requires new project plan or wait till there is progress review for current version and bump the spec version during that.

Copy link
Contributor

@lukasj lukasj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the idea and the change is good and goes in the right direction. The only problem I see is the way it is done is not backwards compatible. Would it be possible to make the change in a backwards compatible way?

@gavinking
Copy link
Contributor Author

gavinking commented May 6, 2021

Would it be possible to make the change in a backwards compatible way?

Well, yes, there's two reasonable ways I can see.

The first is a minimal way, but sacrifices functionality: instead of introducing a new enum value, we could just use AUTO, and infer UUID generation based on the Java type of the field. The issues with this are:

  • that some people might like to use fields of type String to hold their UUIDs, and
  • asymmetry: UUID is an implicitly second-class citizen compared to the other generation strategies.

The second way is a much bigger change, but addresses a more significant flaw in the current design. Currently @GeneratedValue has two things:

  • a strategy, which is an enumerated value, and
  • a generator, which is a string-based name of a generator defined elsewhere.

These two things overlap in the sense that you never need to specify both of them.

The thing is that @GeneratedValue was designed before any of us had any experience at all programming with annotations, and we got it wrong:

  • There's not just three "strategies". Rather, there's an open-ended list of algorithms for generating identifiers, where providers support things beyond the spec, and users often implement their own.
  • Unfortunately, using string-based names to refer to generators is untypesafe. And this approach is quite uncomfortable in other ways. For example, I often feel like I don't have a good place to put the @SequenceGenerator annotation.

In summary, the current design just doesn't offer any truly clean way for JPA implementors and users to integrate their own custom id generation strategies. And now, apparently, it's even a problem for the JPA spec itself!

A better approach, it seems to me, would be to give @GeneratedValue a new member of type Class. This is more in line with the approach we introduced in later specs like CDI. A reference to a class is typesafe, and unlike an enum, it models an open-ended list. I'm not sure whether, for this problem, the reference would be to an annotation class, or to an interface, or to a class class—we would need to think that through. However, something approximately like the following would be reasonable, it seems to me:

  1. Add a new member to @GeneratedValue, say source or strategyClass.
  2. Introduce the types UUID and Identity to represent the built-in UUID and identity column generation strategies.
  3. Allow users to specify a type annotated @SequenceGenerator or @TableGenerator that represents their sequence or table generator.
  4. Allow users to specify a type that's interpreted by the JPA provider in some vendor-specific way, giving them a more comfortable way to integrate their own custom id generation strategies, or extensions provided by the vendor.

This solution looks perhaps a little baroque at first, but it actually solves multiple sources of discomfort in a way that's rather future-proof.

So one could write:

@Id @GeneratedValue(source=UUID.class)
String id;

Or:

@Id @GeneratedValue(source=MyIdTable.class)
long id;

....

@TableGenerator(table="my_ids", allocationSize=100)
interface MyIdTable {}

Or

@Id @GeneratedValue(source=MyCustomStategy.class)
MyId id;

(In principle it can be made more typesafe by having a marker interface for types representing id generators. Not sure if that's worth doing.)

Thoughts?

@gavinking
Copy link
Contributor Author

gavinking commented May 6, 2021

A variation of the second option, also worth considering, would be to introduce a @Generator meta-annotation, and instead of writing @GeneratedValue(source=UUID.class) or @GeneratedValue(source=MyCustomStategy.class), you would just write @UUID or @MyCustomStategy.

Actually perhaps that's better.

@rbygrave
Copy link

rbygrave commented May 6, 2021

Just a thought in case it helps regarding

The first is a minimal way, but sacrifices functionality: instead of introducing a new enum value, we could just use AUTO, and infer UUID generation based on the Java type of the field. The issues with this are:

that some people might like to use fields of type String to hold their UUIDs, and

As I see it people can already achieve this via a named custom generator. Maybe not super "nice" but it can be done.

asymmetry: UUID is an implicitly second-class citizen compared to the other generation strategies.

If I understand this correctly we are saying that because it's being implied via AUTO it isn't an equal with the other strategies that can all be defined nice and explicitly with their own enum value - yes hmmm, not ideal long term.

@gavinking
Copy link
Contributor Author

@rbygrave yes I agree it's possible, I just don't see it as a complete solution, and we would end up having to come back to it in the future.

@gavinking
Copy link
Contributor Author

gavinking commented May 6, 2021

Actually perhaps that's better.

Yeah, so scratch what I said earlier, I think it's cleaner to do it via a meta-annotation.

So we would have to:

  • introduce a new meta-annotation @Generator
  • add @UUID and @Identity, both bearing this annotation
  • weaken the annotation constraint on @SequenceGenerator and @TableGenerator to allow these annotations to be used as meta-annotations

Then users would annotate their generated fields with a generator annotation instead of @GeneratedValue. This is of course just the pattern we used everywhere in CDI. (I should have reached straight for this hammer.)

@gavinking
Copy link
Contributor Author

Excuse me, it would have to be called @GeneratedUUID or something like that to avoid the name collision. (Thinking out loud here.)

@lukasj
Copy link
Contributor

lukasj commented May 6, 2021

@UUID and @Identity => @UUIDGenerator and @IdentityGenerator to have the naming consistent with @SequenceGenerator and @TableGenerator?

I like meta-annotation approach

@gavinking
Copy link
Contributor Author

gavinking commented May 6, 2021

@lukasj Well yeah I considered that option but then didn't like it because the new annotations are describing the generated field, the old (now meta-)annotations are describing the generator itself.

I'm not sure if I'm being clear but what I mean is @UUIDGenerator UUID id reads a bit wrong because the field isn't a generator. That's why I suggested @UUIDGenerated UUID id. Do you get me?

@lukasj
Copy link
Contributor

lukasj commented May 6, 2021

Yep, I see your point and it makes sense

@sebersole
Copy link
Contributor

We are getting a split discussion now between here and the original issue...

As I mentioned there, we have 2 pieces of information we need:

  1. What is the database representation we should use
  2. How do we generate these values

(1) is relevant whether the UUID is an identifier or a simple basic value. (2) is ofc only relevant for identifiers.

IMO applying a @GeneratedValue to a UUID-typed id attribute is pretty straight-forward. And in fact we might want to strictly specify how these values get generated for portability.

Its not as straight-forward if we want to allow UUID-based generation for a String id attribute. I mean in Hibernate we already handle this, but I mean in a general way.

Personally I suggest just adding a new @UUIDGenerator annotation to be used in conjunction with @GeneratedValue. E.g.

@Id
@GeneratedValue
UUID id;
@Id
@GeneratedValue
@UUIDGenerator
UUID id;
@Id
@GeneratedValue
@UUIDGenerator
String id;

On the issue, I even suggest a new @UUID annotation to control the storage if you want specific handling on the db.

To be clear and fwiw, I am not at all against some form of Hibernate's IdentifierGenerator contract making its way in to the spec as well. That would allow much more flexibility in id values can get generated and allow users to do that in a portable way. But I also think UUID is universal enough to warrant easy solution

@lukasj
Copy link
Contributor

lukasj commented Dec 6, 2021

I think it's better to postpone this to 4.0 and have it done properly. What do you think? Also, since this is open here for quite long time already, where to merge this to make it easier for others to experiment with this? Should we branch out 3.x from master and use master for 4.0 this OR should a branch for 4.0 be created and merge changes from there to master at the right time? Note that master is protected (at least one reviewer is required, while other branches are not), so using extra branch for the time being may speed thing up a bit...

@gavinking
Copy link
Contributor Author

Well, I guess I see it like this. There's things we could do for 3.x and there's things which we could do for 4.0. Some of the stuff I've described here would not really fit in a minor release. So let's set all that aside.

What could make sense in a minor release?

  1. In principle we could add support for the most minimal thing, requiring support for @Id @GeneratedValue UUID id in 3.x. But how much value does this bring? Implementations can already support this if they want (e.g. Hibernate already does). And it would feel pretty strange that there's a GenerationType for each of the other built-in strategies, but not for UUID. And it sucks that you can't use String to represent your UUIDs. And so it feels like half a solution.
  2. The changes I proposed in the original PR are what I would think of as the "minimal thing that feels whole". They have a really low impact, are easy to implement, but put UUID generation on the same footing as the other built-in strategies. Of course, the PR adds GenerationType.UUID which in principle breaks binary compatibility.

So it's really a question for you. How inflexibly are we applying the rules around binary compatibility?

My intuition is that adding an enum value to GenerationType is a change that's going to break approximately zero programs, and it's something I wouldn't hesitate to do in a minor release of our product.

If we thought that client programs did a lot of switch (generationType), then sure, adding an enum value would be a nasty breaking change. But I don't think they do. I think the nature of how this enum is used suggests that this isn't actually a breaking change.

@gavinking
Copy link
Contributor Author

Note that nothing in the current PR precludes any future extensions that "open up" @GeneratedValue to acknowledge the notion that custom id generators are a thing that exist. We can still do that in 4. (And we shouldn't do it in 3.)

@lukasj
Copy link
Contributor

lukasj commented Dec 7, 2021

How inflexibly are we applying the rules around binary compatibility?

I'm personally open to changes having minimal impact on clients and preserving binary compatibility (which is the case here) and therefore I would allow the change in GenerationType - the only affected clients I can think of here can be some user written code generators and/or IDEs which may be iterating through values of the enum values. But it is not about me and binary compatibility...

What does @jgrassel or @dazey3 think about this?

@lukasj
Copy link
Contributor

lukasj commented Dec 7, 2021

wrt orm schema update - old versions have to stay untouched and new version - 3.1 - needs to be created

@lukasj
Copy link
Contributor

lukasj commented Dec 10, 2021

@dazey3 or @jgrassel can you share your view on inclusion of this as is in 3.1? I've already asked @Tomas-Kraus to prepare a TCK test for this ...

The persistent fields or properties of an entity may be of the following
types: Java primitive types, _java.lang.String_, other Java serializable
types (including wrappers of the primitive types, _java.math.BigInteger_,
_java.math.BigDecimal_, _java.util.UUD_, _java.util.Date_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UUID, not UUD right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, nice catch.

The _AUTO_ value indicates that the
persistence provider should pick an appropriate strategy for the
particular database. The _AUTO_ generation strategy may expect a
database resource to exist, or it may attempt to create one. A vendor
may provide documentation on how to create such resources in the event
that it does not support schema generation or cannot create the schema
resource at runtime.
resource at runtime. In the case of a field or property of type
_java.util.UUID_, the _AUTO_ strategy is equivalent to _UUID_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering about the value of adding a UUID generator type enum value. It would only be applicable to persistent fields of type UUID, and of little value to integer types. If, for UUID types, AUTO == UUID, and with AUTO the default already for @GeneratedValue, then I would think that for the purpose of a generated value for a UUID type, just annotating it with @GeneratedValue would be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is conceptually incorrect about

@Id
@GeneratedValue
@Uuid
String id;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, along the lines of what the PR suggests

@Id
@GeneratedValue(UUID)
String id;

@jgrassel
Copy link
Member

jgrassel commented Dec 10, 2021

All of the persistent field types supported by JPA which are not (serializable) application-custom types are mapped by the JDBC specification to a SQL type, like int -> INTEGER, java.time.LocalDate -> DATE, etc. Looking at the JDBC 4.3 spec, nothing has been settled for java.util.UUID, which kind of makes the mapping a bit of a wild west exercise. Has there been any discussion about its type mapping among the JDBC standards group? Because JDBC 4.3 doesn't map the type, were there issues with standardizing a mapping that they couldn't resolve? I can see there being camps debating whether a UUID should be stored as a String, as a large numeric (if the database supports 128 bit numeric types), or as a raw binary.

I agree with Gavin's assertions about the existing @GeneratedValue annotation. It is not particularly flexible and future friendly. Adding a new field to the @GeneratedValue annotation, one of type Class (where the Class implements an Interface, or is loaded with annotations itself) would open a lot of possibilities. It also opens the door to a lot of extra complexities, like how to provide access to a persistence context's underlying JDBC resources (something an application, if they want to instrument their own version of table or sequence generation, or perhaps invoke a database stored procedure, would need), EntityManager/Persistence Context reentrancy, whether the custom generator should be able to inject CDI resources, etc. How to deal with generator failures and timeouts (like if the custom generator calls on a restful service to generate an identity and the request fails or times out).

I also wonder just what a "generic" UUID auto-generated value would look like? (we all know what identity|sequence|table-generated values for integers look like, it's just id++.) Would it just be a call to UUID.randomUUID()? I don't see that as a terribly valuable generator since that can conceivably create DuplicateKeyExceptions.

@sebersole
Copy link
Contributor

I also wonder just what a "generic" UUID auto-generated value would look like? (we all know what identity|sequence|table-generated values for integers look like, it's just id++.) Would it just be a call to UUID.randomUUID()? I don't see that as a terribly valuable generator since that can conceivably create DuplicateKeyExceptions.

People absolutely use UUID ids in the real world and there is no real good reason they cannot be generated. I think we all agree on that. So the question about how the generation is done is much more of a vendor value-add question in my opinion.

@sebersole
Copy link
Contributor

It also opens the door to a lot of extra complexities

Absolutely agree. I don't think it makes any sense for the actual generation contract to be defined by the spec (aka, having a new interface)

@jgrassel
Copy link
Member

jgrassel commented Dec 10, 2021

People absolutely use UUID ids in the real world and there is no real good reason they cannot be generated. I think we all agree on that. So the question about how the generation is done is much more of a vendor value-add question in my opinion.

Oh, I know people use UUIDs, that's not what I was talking about. It was about what kind of algorithm would generate it.
UUID.randomUUID() generates a value from the pseudo-random number generator. All of the supported generators (identity, sequence, table) pretty much guarantee that the primary key generated when the new entity is inserted into the database will not fail because a row with the same identity doesn't already exist in the database. Even though UUID's namespace is a tremendous 128 bits, a collision between two entities when a pseudo-random number generator is used is a non-zero possibility. Should that generate a PersistenceException, or generate new random value and retry?

(In essence, the core of my thought there was basically: whatever algorithm is used to generate a UUID, probably shouldn't involve invoking a random number generator)

@sebersole
Copy link
Contributor

Previously I had said

To be clear and fwiw, I am not at all against some form of Hibernate's IdentifierGenerator contract making its way in to the spec as well. That would allow much more flexibility in id values can get generated and allow users to do that in a portable way. But I also think UUID is universal enough to warrant easy solution

So to clarify, its not so much the actual how-do-i-generate-a-value method I have a concern about. It's the "supporting methods(s)" that would also be needed - how does a sequence-generator register a SEQUENCE for export? etc

But since they are tied together, I believe we should just not include one

@gavinking
Copy link
Contributor Author

I was talking about. It was about what kind of algorithm would generate it.

@jgrassel as stated in the proposed spec:

an RFC 4122 Universally Unique IDentifier

@lukasj
Copy link
Contributor

lukasj commented Jan 3, 2022

Draft of the TCK test for this change is available at jakartaee/platform-tck#793

@lukasj
Copy link
Contributor

lukasj commented Jan 12, 2022

Are we good with the current state of the proposal or is there anything pending to be resolved before merge of this?

@jgrassel
Copy link
Member

jgrassel commented Jan 12, 2022

Did you see my review comment on jakartaee/platform-tck#793 ? It's in "pending" so I'm not sure if the comment is visible.

@lukasj
Copy link
Contributor

lukasj commented Jan 12, 2022

no, your comment is not visible there. You should submit it somehow. Or just add a regular comment there

@jgrassel
Copy link
Member

Ok, I posted it as a normal comment instead of a code review comment, hope it's visible

@lukasj
Copy link
Contributor

lukasj commented Jan 12, 2022

yes, it is visible. Should that block this or is it a separate thing to resolve within TCK? I think it is the latter

@jgrassel
Copy link
Member

I wouldn't call it a blocker.

@gavinking
Copy link
Contributor Author

@lukasj sorry, I've been busy, and have not kept up with this. Is there anything you need from me here?

@gavinking
Copy link
Contributor Author

gavinking commented Jan 13, 2022

@lukasj sorry, I've been busy, and have not kept up with this. Is there anything you need from me here?

(For example: I think I promised some language to specify how these things are represented in a varchar column, right?)

@lukasj
Copy link
Contributor

lukasj commented Jan 13, 2022

The most important thing is to fix changes to orm schema - no changes to existing ones and create the new one for version 3.1; the diff between 3.0 and 3.1 is supposed to contain licence header, version and the UUID value in the enumeration.

I'll double check it one more time in the morning to see if there is anything else. I'd like to be able to produce RC2 build with these changes to allow completion of related tck test and adoption of this change in implementations. Whatever wording related can be tweaked later through additional PR. Unless there's anyone who disagrees with this flow...

@gavinking
Copy link
Contributor Author

OK, so I'll take a look at that tomorrow, OK?

@lukasj
Copy link
Contributor

lukasj commented Jan 13, 2022

Sounds good

@lukasj
Copy link
Contributor

lukasj commented Jan 14, 2022

What I guess we should do though, that is missing from this PR, is clearly state that when a UUID is persisted to a varchar column, then it should be persisted in the canonical representation [..]

Agree, this needs to be done. As @rbygrave pointed out, some DB vendors do provide native support for this and persistence providers should be free to use such mapping should they have good reason for it or are told so by user; so the wording should really be sth like If UUID is persisted to VARCHAR, then it should be done in UUID's canonical representation.

@gavinking
Copy link
Contributor Author

Yup.

@gavinking gavinking force-pushed the uuid branch 2 times, most recently from 16adc91 to 16dbe5b Compare January 14, 2022 15:08
@gavinking
Copy link
Contributor Author

@lukasj updated, please review.

Copy link
Contributor

@lukasj lukasj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (there are few minor things in the schema which I fix in the next PR)

Thank you all for the work!

@lukasj lukasj merged commit f716bc1 into jakartaee:master Jan 17, 2022
@lukasj lukasj linked an issue Jan 17, 2022 that may be closed by this pull request
@gavinking
Copy link
Contributor Author

Yay, excellent, thanks @lukasj!

@beikov
Copy link

beikov commented Feb 23, 2022

What is the exact expectation now for a persistent attribute like the following with respect to the used SQL type?

@Id
@GeneratedValue(strategy = GenerationType.UUID)
UUID id;

Does the JPA spec require the use of a VARCHAR datatype, or is the JPA provider allowed to use a different DDL type like binary or a native uuid type? The TCK change jakartaee/platform-tck@feb60a9 seems to suggest VARCHAR is expected, but I would like clarification, because Hibernate currently represents such a mapping as native uuid if possible and falls back to binary. I know that we can change the DDL for TCK runs, but was there a conclusion about a DDL type expectation?

@gavinking
Copy link
Contributor Author

@beikov As far as I know the JPA spec never mandates any particular column type for generated DDL, and that's no different here.

@beikov
Copy link

beikov commented Feb 24, 2022

Ok, just wanted to make sure since there was talk about cross provider compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Native support for UUID class GenerationType.UUID
7 participants