Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression where unlabeled files are not being identified #836

Merged
merged 4 commits into from
Feb 23, 2024

Conversation

al-niessner
Copy link
Contributor

@al-niessner al-niessner commented Feb 22, 2024

🗒️ Summary

Put some code back in that got lost during the saxon upgrade #734. Probably removed in an attempt to make something work but it was not the cause and then left it out since none of the regression tests failed.

⚙️ Test Data and/or Report

See automated tests of this PR as they should pass

♻️ Related Issues

Closes #822

Test for #6 was not sufficient for checking for unlabeled files, which is why some regression snuck into the code.

This commit also includes some cleanup of the test data since they are not needed for this test.

Refs #822
@al-niessner al-niessner self-assigned this Feb 22, 2024
@al-niessner al-niessner requested a review from a team as a code owner February 22, 2024 18:04
@al-niessner al-niessner marked this pull request as draft February 22, 2024 18:04
@jordanpadams jordanpadams changed the title unlabeld files not being identified Fix regression where unlabeled files are not being identified Feb 22, 2024
@al-niessner
Copy link
Contributor Author

@jordanpadams

Which 7 files do we expect to be unlabeled?

@jordanpadams
Copy link
Member

@al-niessner not positive if these are referenced in a label or not, but I don't have time to look through of the labels to verify. Here is my guess (it may only be 6...):

  • src/test/resources/github6/invalid//data_spectra/000DANGLING_FILE_WITH_EXTENSION.txt
  • src/test/resources/github6/invalid//data_spectra/000DANGLING_FILE_WITHOUT_EXTENSION
  • src/test/resources/github6/invalid//core
  • src/test/resources/github6/invalid//file_aux_should_be_ok.tab
  • src/test/resources/github6/invalid//core.vmac
  • src/test/resources/github6/invalid//file_with_bad_char_#_.txt

@al-niessner
Copy link
Contributor Author

@al-niessner not positive if these are referenced in a label or not, but I don't have time to look through of the labels to verify. Here is my guess (it may only be 6...):

  • src/test/resources/github6/invalid//data_spectra/000DANGLING_FILE_WITH_EXTENSION.txt
  • src/test/resources/github6/invalid//data_spectra/000DANGLING_FILE_WITHOUT_EXTENSION
  • src/test/resources/github6/invalid//core
  • src/test/resources/github6/invalid//file_aux_should_be_ok.tab
  • src/test/resources/github6/invalid//core.vmac
  • src/test/resources/github6/invalid//file_with_bad_char_#_.txt

@jordanpadams I am going to move away from this test to #597 and PR #615 because it is much cleaner and potentially related. Long post coming in near future saying no, but still cleaner for testing.

@al-niessner
Copy link
Contributor Author

The code responsible for throwing this error is in one place and one place only:

@ValidationTest
public void findUnreferencedTargets() {
LOG.debug("findUnreferencedTargets:getContext().getTarget(): {}", getContext().getTarget());
LOG.debug(
"findUnreferencedTargets:getContext().isRootTarget(),getContext().getAllowUnlabeledFiles() {},{}",
getContext().isRootTarget(), getContext().getAllowUnlabeledFiles());
LOG.debug("findUnreferencedTargets:getRegistrar().getUnreferencedTargets().size() {}",
getRegistrar().getUnreferencedTargets().size());
// Only run the test if we are the root target, to avoid duplicate errors.
if (getContext().isRootTarget() && !getContext().getAllowUnlabeledFiles()) {
HashSet<String> referencedBySkipped = this.buildReferencedFromSkipped();
for (String location : getRegistrar().getUnreferencedTargets()) {
LOG.debug("findUnreferencedTargets:location: {}", location);
try {
// issue42 - to skip Product level validation
if (!getContext().getSkipProductValidation() && !referencedBySkipped.contains(location)) {
reportError(PDS4Problems.UNLABELED_FILE, new URL(location), -1, -1);
}
} catch (MalformedURLException e) {
reportError(GenericProblems.UNCAUGHT_EXCEPTION, getContext().getTarget(), -1, -1,
e.getMessage());
}
}
}
}

Blames says that the only change was a year ago in #615. It is possible that we now have false negatives but need to be very careful. One, it is looking only for targets (labels) that it found but are not used by aggregates. It will not find any random extra file because of the loop:

for (String location : getRegistrar().getUnreferencedTargets()) {

The loop function in question is here:

https://github.com/NASA-PDS/validate/blame/918da20f5641443a8ceb146d97f327db2b1e0447/src/main/java/gov/nasa/pds/tools/validate/InMemoryRegistrar.java#L165

and has not changed significantly in 5 years (did code reformat 2 years ago). My next guess is that some code block that used to use that problem type was axed. Therefore I rolled back to v3.3.0 which the user reported as working. Still just the one location (FindUnreferencedFiles). Interestingly, v3.3.0 contains #615.

It means that either it never detected any random file or what we stuffed in as a target could have been regular files if not referenced by a label. Preparing an approach for this testing but it will not be simple.

@@ -48,6 +49,9 @@ public void registerTargets() {
try {
Crawler crawler = getContext().getCrawler();
WildcardOSFilter fileFilter = getContext().getFileFilters();
if (!"PDS4 Directory".equalsIgnoreCase(getContext().getRule().getCaption())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got removed in #734. Not sure why and did not leave myself a good comment. Without it, the filter is looking for just XML files.

@al-niessner al-niessner marked this pull request as ready for review February 23, 2024 00:11
@jordanpadams jordanpadams merged commit 58835e1 into main Feb 23, 2024
3 checks passed
@jordanpadams jordanpadams deleted the i822 branch February 23, 2024 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check for unlabeled files no longer works
3 participants