Create context product unique identifier #283

Closed
anilnatha opened this issue Jun 4, 2024 · 11 comments

Comments

@anilnatha

💡 Description

We are in need of creating a new unique identifier for our context products because we have agreed that we won't be using the LID as a unique identifier. These unique identifiers are needed so that the portal (or anything else) can:

  1. Easily search for context products in the registry
  2. Use the ID as a string parameter in our Portal's navigation scheme.

We need to start by identifying how these unique identifiers will be generated (preliminary discussion was to use the title), along with any formatting rules we need (e.g. convert spaces to dashes).

⚔️ Parent Epic / Related Tickets

NASA-PDS/portal-wp-tasks#70

@anilnatha anilnatha added this to B15.0 Jun 4, 2024
@github-project-automation github-project-automation bot moved this to Release Backlog in B15.0 Jun 4, 2024
@anilnatha anilnatha changed the title from "Create context product unique values" to "Create context product unique identifier" Jun 4, 2024
@tloubrieu-jpl
Member

I would like us to explore an OpenSearch feature that generates this field as a kind of "view" or "computed field".

@alexdunnjpl
Contributor

@anilnatha @tloubrieu-jpl @al-niessner is there some context (lol) for LIDVID being inappropriate?

@tloubrieu-jpl
Member

Hi @alexdunnjpl ,

We don't want to use the full lidvid because it is long and contains ":" characters, which are not suitable in a URL.

We also don't want to use a subset of the lidvid because, as a design principle, we don't want to assume the registry identifiers are meaningful or have an internal structure; we believe it would be wrong to build a system on that assumption. This is a bit dogmatic, but we mostly want to preserve our flexibility in the future if we want to move away from the lidvid as the identifier of the products.

@al-niessner

@alexdunnjpl @anilnatha @tloubrieu-jpl

Sorry, I do not understand the requirements or the goal. If you want a unique identifier and do not want a user to type it, then use md5 or sha - pick your favorite sha length. They both (md5 and sha) can compute the same checksum for different data, but I think there is no overlap due to how they are computed. If you can manage a synchronized integer and the user will not type it, then use an up counter like every database prior to this one. If you want the user to type it (which implies they know it a priori), then use the lidvid and live with it. You can change the : to __ or something else. Annoying, but less annoying than quoting or escaping in a URL.
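For illustration, a minimal sketch of the "hash it" option, assuming the input is a LIDVID string of the form lid::vid. The function name `hashed_id`, the choice of SHA-256, and the default length are all hypothetical and not something the thread settled on.

```python
import hashlib


def hashed_id(lidvid: str, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest.

    Hypothetical helper: the same input always yields the same id, but
    truncating the digest raises the chance that two inputs share an id.
    """
    digest = hashlib.sha256(lidvid.encode("utf-8")).hexdigest()
    return digest[:length]


# Example LIDVID reconstructed from the URL-friendly examples in this thread.
example = "urn:nasa:pds:insight_cameras::1.0"
short_id = hashed_id(example)
```

The result contains only the characters 0-9 and a-f, so it needs no escaping in a URL, at the cost of being unreadable and not reversible.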

@alexdunnjpl
Contributor

tl;dr "What Al said"

If the uuid isn't mapped from an attribute of the product and can be determined by the system ad-hoc, use an autogenerated sequential int or hashed uuid.

If the uuid is supposed to be a function of the product, our products' versioned logical IDs are that already - you can't have a standardised identifier which is semantically meaningful without committing to some kind of pattern. If IDs are expected to be hand-typed (why?), then learning to use %3A in URLs doesn't seem that onerous in the context of a long string. If copy/pasting from other sources is the issue, then using a URL-safe replacement like Al suggests seems like the way to go.

If the issue is that not all products have LIDVIDs (which I recall being mentioned), then there may be a sane way to give them a similarly-patterned pseudo-lidvid identifier.

@tloubrieu-jpl
Member

Thanks @al-niessner, interesting input. We don't want the id to be hand-typed; we want to retrieve it from the existing data (without intelligent parsing of the lidvid).

The requirements come from the web modernization, which needs a URL scheme that is user-friendly and SEO-friendly.

@anilnatha that brings us 3 options for this web id:
a) a short meaningless code made of numbers or letters: "a1e2", "ez45"... I did not check the minimum length we should go for, but this is the idea.
b) a dumb translation of the lidvid where : are replaced by '-' or '_', for example urn-nasa-pds-insight_cameras-1.0
c) a codification of the existing title, e.g. "InSight Cameras Bundle" becomes "insight-cameras-bundle", plus a possibly complicated way of making the string unique by adding "-1" or "-2" suffixes when needed.

(@alexdunnjpl I was writing that comment when you posted yours. It does not sound like it overlaps too much...)
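A rough sketch of what option (c) could look like, to show where the complication comes from. The names `slugify` and `unique_slug` are hypothetical, and the `taken` set stands in for whatever store would actually record the ids already in use.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one dash."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def unique_slug(title: str, taken: set[str]) -> str:
    """Return a slug not yet in `taken`, adding -1, -2, ... when needed.

    Assumption: `taken` holds every slug issued so far; it is updated in place.
    """
    base = slugify(title)
    candidate = base
    n = 1
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)
    return candidate


taken: set[str] = set()
first = unique_slug("InSight Cameras Bundle", taken)
second = unique_slug("InSight Cameras Bundle", taken)
```

Note that the suffix a product receives depends on the order in which products are processed, so the id is not a pure function of the product. That is the part that makes this option harder to keep stable.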

@alexdunnjpl
Contributor

@tloubrieu-jpl for what it's worth, I think there's some value in retaining semantic identifiers, if there's not a positive reason to switch to hashed ids. Makes it easier to keep track of when, for example, you're inspecting a handful of products open in browser tabs.

Replacing special characters with hyphens seems like the best way to achieve the goal, iff you're certain that this will never cause a collision (ex. some:kind-of:id and some:kind:of-id). Could be a safe assumption in practice, though the PDS Standards allow for .-_ within LIDVID fields.
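The collision is easy to reproduce. A minimal sketch, assuming the translation is a plain replacement of ":" with "-" (the helper name `hyphenate` is made up for the example):

```python
def hyphenate(identifier: str) -> str:
    """Replace every colon with a single hyphen (one-way, lossy)."""
    return identifier.replace(":", "-")


a = "some:kind-of:id"
b = "some:kind:of-id"

# Two distinct identifiers end up with the same URL form.
collides = a != b and hyphenate(a) == hyphenate(b)
```

Here both inputs become some-kind-of-id, so the original identifier cannot be recovered from the URL form alone.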

I'm still iffy on what "user-friendly" actually means in this context. What concrete user actions (current or foreseen) are made more difficult by the current identifier approach? Knowing these actions might make it easier to reason about the proposed solutions.

SEO-friendly, I imagine, is wanting the individual fields of the identifiers to be SE-parsable? But that doesn't comport with the idea that hashed ids would be acceptable.

@al-niessner

@alexdunnjpl @anilnatha @tloubrieu-jpl

I think you need to clearly pick: either the user is going to look at, understand, and then type the ID, or they are not. If you do not want it to be hand-typed, then make it unintelligible. If you want to do as @alexdunnjpl suggests and keep intelligible ones for double checks, then just stick with the lidvid.

If unintelligible, then go big. Short means collisions over time.

When substituting you have to be absurd. Use 3 underscores for a dash and 4 underscores for a colon. That way you do not collide with user choices and you can have a two-way map, which you will find you want/need after you implement it.

@jshughes

jshughes commented Jun 5, 2024

I tend to side with retaining semantic identifiers and replacing special characters. During the early design phases of PDS4, the requirements on the LID were that it be unique, opaque, and unchangeable, regarding the system. Both md5 and “up counter” values were suggested. However, the DDWG decided that the LID should be human readable (aka user friendly). The URN template was adopted, and LID formation rules were created primarily for managing the creation of new LIDs. Unfortunately, the formation rules quickly evolved from guidance into a standard. If the standard is now a problem, then in my opinion and as they say in architecture, once you have mastered the rules, you can break them.

@tloubrieu-jpl
Member

@anilnatha @jordanpadams We will discuss that at the breakout today.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Jun 6, 2024

We are going with an option where we dumbly translate the lid into URL-friendly characters, for example urn--nasa--pds--insight_cameras----1.0. In this case, no development is needed on the back-end side and I will close this ticket.
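As a sketch only, the translation could look like the following, assuming each ":" becomes "--" as the example suggests. The function names are hypothetical, and the reverse mapping is an assumption for illustration, not something this ticket commits to.

```python
def lidvid_to_url_id(lidvid: str) -> str:
    """Replace every colon with a double dash."""
    return lidvid.replace(":", "--")


def url_id_to_lidvid(url_id: str) -> str:
    """Naive reverse mapping: turn every double dash back into a colon.

    Assumption: the original identifier has no "--" and no "-" next to a ":".
    If it does, this does not return the original string.
    """
    return url_id.replace("--", ":")


lidvid = "urn:nasa:pds:insight_cameras::1.0"
url_id = lidvid_to_url_id(lidvid)
round_trip = url_id_to_lidvid(url_id)
```

For this example the round trip gives back the original string. An identifier such as a-:b would not survive it (it comes back as a:-b), so the reverse direction is only safe if identifiers never place a dash next to a colon.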

@github-project-automation github-project-automation bot moved this from Release Backlog to 🏁 Done in B15.0 Jun 6, 2024
Projects
Status: 🏁 Done
Development

No branches or pull requests

6 participants