As a user, I want to check that all Internal References are valid references to other PDS4 products within the current validating bundle #308

Closed
Tracked by #712
mit3ch opened this issue Mar 17, 2021 · 13 comments · Fixed by #347 or #762

Comments

@mit3ch

mit3ch commented Mar 17, 2021

  1. For more information on how to populate this new feature request, see the PDS Wiki on User Story Development: https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

  2. Do the best you can with the template. If it is too difficult to create a "story", just jot down as much info as you can.

Motivation

...so that I can ensure referential integrity of references within the bundle to other products within the same parent bundle

Additional Details

Need to confirm that every LID or LIDVID referenced in an Internal_Reference class exists.

Concatenate every LID/LIDVID in the Identification_Area from every label in a bundle. Then, for each XML label, verify that each LID/LIDVID outside the Identification_Area is included in the concatenated list. If it is not, give a warning that there is a missing product. Ideally, also check against all registered LIDs and LIDVIDs.
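The check described above could be sketched roughly as follows. This is only an illustration, not Richard's script and not how validate implements it. The function names are hypothetical, and the sketch assumes each label carries `logical_identifier` and `version_id` inside `Identification_Area` and that only `Internal_Reference` entries need checking.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def product_identifiers(root):
    """Return (lid, lidvid) read from a label's Identification_Area."""
    for area in root.iter():
        if local(area.tag) == "Identification_Area":
            fields = {local(child.tag): (child.text or "").strip() for child in area}
            lid = fields.get("logical_identifier", "")
            vid = fields.get("version_id", "")
            lidvid = f"{lid}::{vid}" if vid else lid
            return lid, lidvid
    return "", ""


def internal_references(root):
    """Yield every lid_reference / lidvid_reference inside an Internal_Reference."""
    for ref in root.iter():
        if local(ref.tag) == "Internal_Reference":
            for child in ref:
                if local(child.tag) in ("lid_reference", "lidvid_reference"):
                    yield (child.text or "").strip()


def find_missing_references(labels):
    """labels maps a label name to its XML text; returns (label, reference) pairs."""
    roots = {name: ET.fromstring(text) for name, text in labels.items()}
    # Pass 1: collect every LID and LIDVID declared by a product in the bundle.
    known = set()
    for root in roots.values():
        known.update(i for i in product_identifiers(root) if i)
    # Pass 2: report each reference that matches no collected identifier.
    return [
        (name, ref)
        for name, root in roots.items()
        for ref in internal_references(root)
        if ref not in known
    ]
```

Collecting both the LID and the LIDVID of each product in the first pass lets a single set lookup handle `lid_reference` and `lidvid_reference` alike.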

Check with Richard Chen; he has a Python script that does this checking.

Acceptance Criteria

Given a product that contains one or more Internal_References to product LID/LIDVIDs within the same parent bundle
When I perform validation of the bundle
Then I expect to validate that all LIDs/LIDVIDs to products within the bundle are valid references

@mit3ch mit3ch added enhancement New feature or request requirement-needed labels Mar 17, 2021
@jordanpadams jordanpadams changed the title referential integrity check As a user, I want to check that all Internal References exist in the archive Mar 17, 2021
@jordanpadams
Member

thanks @mit3ch thought we had a ticket for this already, but apparently not. definitely on our radar, but may be a little more complicated than Richard's script because it has to encompass the entire PDS4 archive. may have to wait until the Registries are installed and all PDS4 data is ingested

@mit3ch
Author

mit3ch commented Mar 17, 2021 via email

@jordanpadams jordanpadams changed the title As a user, I want to check that all Internal References exist in the archive As a user, I want to check that all Internal References are valid Mar 17, 2021
@jordanpadams jordanpadams changed the title As a user, I want to check that all Internal References are valid As a user, I want to check that all Internal References are valid references to other PDS4 products Mar 17, 2021
@mit3ch
Author

mit3ch commented Mar 17, 2021 via email

@mit3ch
Author

mit3ch commented Mar 19, 2021 via email

@NASA-PDS NASA-PDS deleted a comment from mit3ch Mar 19, 2021
@msbentley

Would such a warning/check fire only when the validation context was set to bundle? (most of the time I am validating product deliveries, so I wouldn't want warnings that referenced products were not found, simply because they were in a separate delivery, or had previously been delivered etc.)

@jordanpadams
Member

@msbentley great question. i wasn't thinking this would apply only to bundles, but maybe that would make more sense. we can maybe bring this to SWG for more clarification.

@jordanpadams jordanpadams changed the title As a user, I want to check that all Internal References are valid references to other PDS4 products As a user, I want to check that all Internal References are valid references to other PDS4 products within the current validating bundle Mar 27, 2021
@qchaupds
Contributor

@jordanpadams Is there a good representative bundle in our test resources? There's no test resources provided for this ticket.

@jordanpadams
Member

here is some test data. I will send you the path on our servers.

But there are only a few products in there that contain references. Here is a snippet from one of the examples:

pds4-compil-comet-v1.0/pds4-compil-comet-v1.0/polarimetry/data/dbcp.xml

  <Reference_List>
    <Internal_Reference>
      <lidvid_reference>urn:nasa:pds:compil-comet:polarimetry:filters::1.0</lidvid_reference>
      <reference_type>data_to_document</reference_type>
    </Internal_Reference>
  </Reference_List>

You really just need to take any test bundle we have out there, and add a reference similar to this to a LIDVID of another product in the bundle.

If the LIDVID does not exist anywhere in that bundle, we should throw an error.
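For illustration, matching a reference like the one in the snippet against the products of a bundle might look like the sketch below. The function names are hypothetical. It also assumes that a bare LID reference is satisfied by any version of that product, while a LIDVID reference needs the exact version; that matching rule is an assumption, not something settled in this thread.

```python
def split_lidvid(identifier):
    """Split 'lid::vid' into (lid, vid); vid is None for a bare LID."""
    lid, sep, vid = identifier.partition("::")
    return lid, (vid if sep else None)


def reference_resolves(reference, bundle_lidvids):
    """True if the reference matches one of the product LIDVIDs in the bundle."""
    ref_lid, ref_vid = split_lidvid(reference)
    for lidvid in bundle_lidvids:
        lid, vid = split_lidvid(lidvid)
        # Assumption: a LID-only reference accepts any version of the product.
        if lid == ref_lid and (ref_vid is None or ref_vid == vid):
            return True
    return False


bundle = {"urn:nasa:pds:compil-comet:polarimetry:filters::1.0"}
ok = reference_resolves("urn:nasa:pds:compil-comet:polarimetry:filters::1.0", bundle)
```

A reference for which `reference_resolves` returns `False` is the case that should be reported.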

@mit3ch
Author

mit3ch commented May 11, 2021 via email

@jordanpadams
Member

copy that @mit3ch . we should be able to handle that logic.

@qchaupds I think it shouldn't be too complicated to make this happen. the way I see it, we should do this as part of the referential integrity checking we already do with pds4.bundle validation. We should maintain some sort of object/data structure (or maybe we already have one) that contains any references within a product, and check those as well. we can talk more about this offline if we want to provide some more clarification here.

@qchaupds
Contributor

We have good success so far.

Running validate against a bundle we know has issues.

% validate -R pds4.bundle -r report_github308_bundle_invalid.json -s json -t src/test/resources/github308/invalid/bundle_kaguya_derived.xml >& t2

There are 3 warnings for 2 labels regarding a reference pointing to a non-existent logical identifier.
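A per-label count like this can also be obtained from the loaded report rather than with a text filter. The sketch below is hypothetical and assumes only the `label`, `messages` and `message` keys that appear in the report excerpts in this comment. Locating the list of product records inside the full JSON report is left to the caller, since the rest of the report layout is not shown here.

```python
def not_found_by_label(products):
    """Map each label to its 'not found in this bundle' messages."""
    found = {}
    for product in products:
        hits = [
            entry["message"]
            for entry in product.get("messages", [])
            if "not found in this bundle" in entry.get("message", "")
        ]
        if hits:
            found[product.get("label")] = hits
    return found
```

Summing the lengths of the returned lists gives the number of warnings, and the number of keys gives the number of affected labels.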

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 110 % egrep "label|not found" report_github308_bundle_invalid.json

  "label": "file:/home/qchau/sandbox/validate/src/test/resources/github308/invalid/bundle_kaguya_derived.xml",
      "message": "A LID reference urn:nasa:pds:kaguya_grs_spectra:document:kgrs_calibrated_spectra is referencing a logical identifier for a product not found in this bundle."
      "message": "A LID reference urn:nasa:pds:kaguya_grs_spectra:document:kgrs_ephemerides_doc is referencing a logical identifier for a product not found in this bundle."
  "label": "file:/home/qchau/sandbox/validate/src/test/resources/github308/invalid/data_spectra/kgrs_calibrated_spectra_per1.xml",
      "message": "A LID reference urn:nasa:pds:kaguya_grs_spectra:document:kgrs_calibrated_spectra is referencing a logical identifier for a product not found in this bundle."

The reference urn:nasa:pds:kaguya_grs_spectra:document:kgrs_calibrated_spectra does not occur anywhere as a logical_identifier:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate/src/test/resources/github308/invalid 119 % grep -rn "urn:nasa:pds:kaguya_grs_spectra:document:kgrs_calibrated_spectra" . | grep logical_identifier

The reference urn:nasa:pds:kaguya_grs_spectra:document:kgrs_ephemerides_doc does not occur anywhere as a logical identifier:

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate/src/test/resources/github308/invalid 120 % grep -rn "urn:nasa:pds:kaguya_grs_spectra:document:kgrs_ephemerides_doc" . | grep logical_identifier

There is a label src/test/resources/github308/invalid/VALID_odf07155_msgr_11.xml
but its logical identifier urn:nasa:pds:mess-rs-raw:data.odf:mess_rs_07155_156_60s_odf does not belong to the "urn:nasa:pds:kaguya_grs_spectra" bundle so the warning is not raised.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate/src/test/resources/github308/invalid 124 % grep logical_identifier /home/qchau/sandbox/validate/src/test/resources/github308/invalid/VALID_odf07155_msgr_11.xml
<logical_identifier>urn:nasa:pds:mess-rs-raw:data.odf:mess_rs_07155_156_60s_odf</logical_identifier>

However, the label does get a warning for not belonging to any collection, which is expected.

{
  "status": "PASS",
  "label": "file:/home/qchau/sandbox/validate/src/test/resources/github308/invalid/VALID_odf07155_msgr_11.xml",
  "messages": [
    {
      "severity": "WARNING",
      "type": "warning.integrity.unreferenced_member",
      "message": "Identifier 'urn:nasa:pds:mess-rs-raw:data.odf:mess_rs_07155_156_60s_odf::1.0' is not a member of any collection within the given target"
    }
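The bundle-membership test behind the behaviour described in this comment could be as simple as a prefix comparison on the LID. This is a hypothetical sketch under the assumption that a product belongs to a bundle when its LID is the bundle LID or extends it by further `:`-separated fields. It is not taken from BundleReferentialIntegrityRule.java.

```python
def belongs_to_bundle(lid, bundle_lid):
    """True if lid is the bundle itself or sits below it in the LID hierarchy."""
    lid = lid.split("::", 1)[0]  # drop a version suffix if one is present
    return lid == bundle_lid or lid.startswith(bundle_lid + ":")
```

Appending the `:` before the prefix test keeps a bundle whose name merely starts with the same characters from matching.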

qchaupds pushed a commit that referenced this issue May 18, 2021
…ucts within the current bundle

1. Add test resources for github308 to src/test/resources
2. Add functions to support parsing for lid_reference, lidvid_reference and logical_identifier tags and move some constants from function to private class variables for readability in LabelUtil.java
3. Add new check if a reference is pointing to logical_identifier not in the current bundle in BundleReferentialIntegrityRule.java
4. Add debugs to CollectionReferentialIntegrityRule.java
5. Add new tests and update github51 message count in validate.feature

Refs:

#308 As a user, I want to check that all Internal References are valid references to other PDS4 products within the current validating bundle
qchaupds pushed a commit that referenced this issue May 19, 2021
…eference or lidvid_reference to map to a logical_identifier

1. Add getIdentifiersCommon() function to refactoring in LabelUtil.java
2. Use lid_reference or lidvid_reference to map to a logical_identifier instead of using a filename in BundleReferentialIntegrityRule.java
3. Remove slash when checking for combination of two string to avoid confusion in BundleReferentialIntegrityRule.java

Refs:

#308 As a user, I want to check that all Internal References are valid references to other PDS4 products within the current validating bundle
jordanpadams added a commit that referenced this issue May 20, 2021
jordanpadams added a commit that referenced this issue May 28, 2021
Check that all internal references are valid references to other prod…
@rchenatjpl
Contributor

@qchaupds @jordanpadams val308b.zip

In the attached, validate should catch that the browse product's reference to a LID in this bundle doesn't exist. Eventually and maybe ideally, validate should catch that the data product's reference to a LID outside this bundle doesn't exist. Search for "xxx" in the .xml files. Validate now catches neither, though it does erroneously catch something related to validate#69

@jordanpadams
Member

thanks @rchenatjpl . I created a new ticket for the bug you found here: #432

per your comment about catching LIDs outside this bundle, that is in our plans for next build once we have the data ingested into the registry
