Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Task] Archives shows identical labels for identically-named but distinct files #442

Closed
andrewazores opened this issue May 19, 2022 · 12 comments · Fixed by #473
Closed

[Task] Archives shows identical labels for identically-named but distinct files #442

andrewazores opened this issue May 19, 2022 · 12 comments · Fixed by #473
Assignees
Labels
bug Something isn't working needs-documentation

Comments

@andrewazores
Copy link
Member

Related to #398

cryostat-archive-labels-bug-2022-05-19_14 46 59

See gif. Reproduction steps:

  1. Select a target
  2. Start a recording from the Recordings > Active view
  3. Archive the recording by selecting the recording within the view and clicking the Archive button
  4. Observe that the labels present on the active recording are displayed on the archived recording as well in the Recordings > Archived view
  5. Download the archived recording
  6. Navigate to Archives view
  7. Observe that the expected labels on the archived recording are still present
  8. Upload the downloaded archived recording back into Cryostat from the Archives view
  9. Observe that both copies have identical names (Re-uploading a saved recording does not modify the second file name #398) as well as identical labels. This shouldn't happen - normally, an uploaded recording should have no labels by default.
  10. Navigate back to Recordings > Archived and delete the original copy (do it this way so you're sure which one you're deleting)
  11. Navigate back to Archives and observe that the second, re-uploaded copy now has no labels, as expected

Not sure if this is entirely a -web bug or if this also has bugged behaviour in the backend.

@andrewazores
Copy link
Member Author

@dfitzmau @kucollin new Known Issue for 2.1.0. I think this can be considered an extension of #398 for the purposes of the Known Issues document though.

@andrewazores
Copy link
Member Author

@jan-law I've assigned you to this issue since the recording labels are mostly your domain. Please let me know if you don't want to take this one and I'll look into it myself.

@jan-law
Copy link
Contributor

jan-law commented May 19, 2022

I can look into this.

Observe that both copies have identical names

The labels are mapped to file names with the assumption that filenames in Cryostat's archives have distinct names, which is causing the behaviour seen here.

Should we allow recording files stored in different subdirectories to have identical file names? If we do, the frontend will need a few changes because the archived recording views perform actions by only looking up the recording name. e.g. attempting to delete one of the recordings with the same name will delete both of them at the same time.

@andrewazores
Copy link
Member Author

I think we should only expect uniqueness within the same subdirectory, since that generally matches the same policy we apply to active recording names (unique to the target). #400 will make it more apparent to the user what exactly is happening here by surfacing those subdirectories in the UI. I think there could be some weird interactions if we applied the renaming policy to uploaded files if there is already an identically-named file belonging to some target elsewhere in the archives. For example if you deleted that archived file belonging to a target then the uploaded copy would have the .2 in its name, but then you would be able to upload the same file into the archives again and there would be no naming conflict, so it would retain its original name again.

@jan-law
Copy link
Contributor

jan-law commented Jun 1, 2022

The java.lang:type=Runtime bean has a Name attribute which resolves to a string <pid>@<hostname>, which I think should be unique between jvm instances. e.g. for the cryostat container, the runtime name is 1624143@localhost.localdomain. Multiple connectURLs mapping to the same jvm should also resolve to the same runtime name. I can use this string instead of the targetId when mapping target recording files to labels.

To distinguish between identically named uploaded recordings and recordings directly saved from a target, the RecordingMetadataLabelsPostHandler would need more info in the API request besides the recording name. I'm not sure if it would be better practice to add the source target as a path param or a form param. If I add a path param like POST /api/v2/recordings/:sourceTarget/:recordingName/metadata/labels, it follows a similar request pattern as the target post handler, but recordings uploaded to archives don't have a sourceTarget. Thoughts?

@andrewazores
Copy link
Member Author

Hmm. PIDs are indeed unique per "host", but I wonder if we can rely on every container instance having a unique hostname across all platforms. Processes are also often (usually) run as PID 1 within their own containers...

To distinguish between identically named uploaded recordings and recordings directly saved from a target, the RecordingMetadataLabelsPostHandler would need more info in the API request besides the recording name. I'm not sure if it would be better practice to add the source target as a path param or a form param. If I add a path param like POST /api/v2/recordings/:sourceTarget/:recordingName/metadata/labels, it follows a similar request pattern as the target post handler, but recordings uploaded to archives don't have a sourceTarget. Thoughts?

This makes sense.

For a future major version (Cryostat 3.0) we should really have a proper database, and assign unique IDs to archived recordings. The database would contain the information about which target the recording came from or if it was uploaded by a client, and the /:recordingName path parameters would be /:recordingId instead.

For now that's a much larger undertaking and well out of scope for this fix, or for the next 2.2 release.

POST /api/v2/recordings/:sourceTarget/:recordingName/metadata/labels seems okay. I would definitely prefer this information to be added as a path parameter rather than a form parameter. It's a little unfortunate that this ends up looking so much like the POST /api/beta/targets/:targetId/recordings/:recordingName/metadata/labels endpoint but just slightly shuffled around, but so be it.

For client-uploaded recordings which have no sourceTarget, maybe it's good enough to just use a special-case value here to indicate it. Just 0, or something, since presumably the empty string won't work as a path parameter.

@andrewazores
Copy link
Member Author

This problem of /api/v/recordings/:recordingName/etc endpoints being unable to differentiate between per-target archives and the uploaded archives is actually affecting more than just labels, too. Deleting an archived recording also runs into problems since the file is searched for by name, but can be found in two locations.

@andrewazores
Copy link
Member Author

@jan-law on the backend side we can grab:

JFRConnection#getHandle -> IConnectionHandle
IConnectionHandle#getServerDescriptor -> IServerDescriptor
IServerDescriptor#getGUID -> String

Perhaps that works instead of the RuntimeBean Name attribute?

@jan-law
Copy link
Contributor

jan-law commented Jun 1, 2022

Ah, thanks, that works.

@jan-law
Copy link
Contributor

jan-law commented Jun 1, 2022

Did some more testing with the draft PRs above, and it looks like the GUID appears different for aliased target URLs. Any ideas why?

Also sometimes, the JFRConnection returns a null GUID. I noticed this when using the runtime bean as well. Looking into it.

targetId: cryostat:9093, guid: 55bdbd87-da06-49ec-b4a9-c782c4059494
targetId: service:jmx:rmi:///jndi/rmi://cryostat:9093/jmxrmi, guid: 1bd5c886-8d35-42e5-85bb-9ecdeae3a7de

targetId: service:jmx:rmi:///jndi/rmi://cryostat:9091/jmxrmi, guid: ec355558-b5ca-4757-b403-3c007ff7e7aa
targetId: service:jmx:rmi:///jndi/rmi://localhost:0/jmxrmi, guid: 6f2d866a-dc4e-412a-999b-91fc739040a5

@andrewazores
Copy link
Member Author

Huh. Getting a null GUID or null attribute from the RuntimeBean is very unexpected.

I'll have to dig deeper into the IServerDescriptor and see how/when it computes the GUID. Perhaps in the end this is not actually giving us anything beyond a generated ID corresponding 1:1 to a JMX service URL.

Maybe in that case using the Runtime MXBean is actually the better way to go. I don't think pid@host is going to be sufficiently unique in many container deployment environments, but there are lot more attributes there that we can pick up.

Taking all of the classpath, "name", library path, input arguments, pid, VM vendor, and VM version together sounds like it gets us really close to being able to identify deployments of an application, but not necessarily unique instances - since all of those properties would be shared across replicas, possibly even the PID all being 1 in container deployments for example. But, add the Runtime MXBean's "startTime" attribute, which gives the start timestamp in Unix epoch milliseconds, and suddenly it's a lot closer to unique. Still perhaps not sufficient, though. If that was nanosecond resolution I would be satisfied.

@andrewazores
Copy link
Member Author

Maybe taking all that information from the Runtime MXBean, including the JVM start time, and putting that all into a structure is really "unique enough" when we consider that this only needs to be unique each time that we look at the same JVM across a handful of different JMX service URLs. It's possible that we have some collisions across that Runtime MXBean data tuple over all of our targets, but what are the odds that there's a collision among the small number of aliased URLs pointing to the same target JVM?

So I guess hash(<pid, name, classpath, vendor, version, startTime>) or so is a unique-enough identifier for our requirements in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-documentation
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants