pds4.bundle option seems to not travel through enough subdirectories #444

Closed
Tracked by #30
lylehuber opened this issue Nov 12, 2021 · 5 comments · Fixed by #539

Comments

@lylehuber

lylehuber commented Nov 12, 2021

🐛 Describe the bug

Items that are clearly listed in collection inventories but are a few subdirectories below that inventory are flagged as not
being in a collection.

📜 To Reproduce

Go to the bundle at https://pds-atmospheres.nmsu.edu/PDS/data/PDS4/odya_bundle
Attached is the validate output. (Ignore the warning.file.not_referenced_in_label warnings because they were
originally PDS3 EXTRAS files.)

🕵️ Expected behavior

📚 Version of Software Used

2.0.7

🩺 Test Data / Additional context

🏞Screenshots


🦄 Related requirements

odya.txt

⚙️ Engineering Details

  • Test data can be found at pds-dev3:$TEST_DATA_HOME/registry
  • The 3277 files that throw the warning warning.integrity.unreferenced_member are all in the DATA directory. My guess is the bundle validation is looking for the collection inventory 1 level below the bundle, but that is not required.
@lylehuber lylehuber added bug Something isn't working needs:triage labels Nov 12, 2021
@jordanpadams
Member

@lylehuber is this data online somewhere that we can grab it to see what is happening?

@lylehuber
Author

lylehuber commented Nov 12, 2021

@jordanpadams
Member

@qchaupds if you are at least able to poke at this to maybe help track down what the issue is before you head out, that would be great! if not, no big deal.

@qchaupds
Contributor

File crawling is complicated and may need an overhaul.

I would recommend taking a closer look at all the calls to listFiles() in src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java
Be careful: there are two FileCrawler.java files. The one you want is at the path above.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 128 % grep listFiles ./src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java
for (File dir :Arrays.asList(directory.listFiles(directoryFilter))) {
Collection collections = FileUtils.listFiles(directory, fileFilter, null);
Collection collections = FileUtils.listFiles(directory, extensions, false);

For each call, check whether it is appropriate to add the recursive boolean flag. Note that not all of the current calls look right: the second call passes null as its last argument instead of a boolean. They should all take the recursive boolean flag.

The function refinedFoundList() also needs a much closer look.
Everything should be treated as suspect.

https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html#listFiles-java.io.File-java.lang.String:A-boolean-

public static Collection listFiles(File directory,
String[] extensions,
boolean recursive)
Finds files within a given directory (and optionally its subdirectories) which match an array of extensions.
Parameters:
directory - the directory to search in
extensions - an array of extensions, ex. {"java","xml"}. If this parameter is null, all files are returned.
recursive - if true all subdirectories are searched as well
Returns:
a collection of java.io.File with the matching files

To test this, you may need a bundle that is several subdirectories deep, similar to the one linked in this ticket, and you should make sure that all labels are picked up and not just those in the first 2 levels.
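A rough sketch of that kind of test follows. It is not validate code: DeepLabelListing, listLabels and createSampleBundle are hypothetical names, and java.nio's Files.walk stands in for the FileUtils.listFiles recursive flag so the example needs no extra dependency. It builds a throwaway tree with labels at depths 1, 2 and 4 and compares a shallow listing against a recursive one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of validate.
final class DeepLabelListing {

    // Lists *.xml files under root. When recursive is false, only files
    // directly inside root are returned (walk depth limited to 1).
    static List<Path> listLabels(Path root, boolean recursive) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".xml"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a temporary bundle-like tree with labels at depths 1, 2 and 4.
    // Directory and file names are made up for the example.
    static Path createSampleBundle() {
        try {
            Path root = Files.createTempDirectory("sample_bundle");
            Path dataDir = Files.createDirectories(root.resolve("data"));
            Path deepDir = Files.createDirectories(
                    dataDir.resolve("orbit01").resolve("day001"));
            Files.createFile(root.resolve("bundle.xml"));
            Files.createFile(dataDir.resolve("collection_data.xml"));
            Files.createFile(deepDir.resolve("product.xml"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleBundle();
        System.out.println("non-recursive labels: " + listLabels(root, false).size());
        System.out.println("recursive labels:     " + listLabels(root, true).size());
    }
}
```

If the recursive count is not larger than the shallow count on a tree like this, the traversal is stopping too early, which is the symptom described in this ticket.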

Another place to look is wherever the crawl() function is called. Check whether the correct recursive flag is passed in.
To make matters more complicated, there is a command-line flag (defined at line 77 of Flag.java, shown below) that needs to be taken into account if the target given on the command line is a directory.

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 168 % vi ./src/main/java/gov/nasa/pds/validate/commandline/options/Flag.java

77 /**
78 * Flag that disables recursion when traversing a target directory.
79 */
80 LOCAL("L", "local", "Validate files only in the target directory rather " + "than recursively traversing down the subdirectories."),

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 171 % vi src/main/java/gov/nasa/pds/validate/ValidateLauncher.java

646 if (config.containsKey(ConfigKey.LOCAL)) {
647 if (config.getBoolean(ConfigKey.LOCAL) == true) {
648 setTraverse(false);
649 } else {
650 setTraverse(true);
651 }
652 }
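As far as the excerpt shows, the LOCAL handling above reduces to "traverse is the negation of local whenever the key is present". A minimal sketch of that mapping, with hypothetical names (TraverseSetting, resolveTraverse) and a nullable Boolean standing in for the containsKey check; it is an illustration, not the ValidateLauncher implementation:

```java
// Hypothetical illustration of the LOCAL-to-traverse mapping; not validate code.
final class TraverseSetting {

    // localOption == null models "key absent from the config": keep the
    // current traverse value. Otherwise traverse is simply !local.
    static boolean resolveTraverse(Boolean localOption, boolean currentTraverse) {
        if (localOption == null) {
            return currentTraverse;
        }
        return !localOption;
    }

    public static void main(String[] args) {
        System.out.println(resolveTraverse(Boolean.TRUE, true));   // local set: no recursion
        System.out.println(resolveTraverse(Boolean.FALSE, false)); // local unset: recurse
        System.out.println(resolveTraverse(null, true));           // key absent: unchanged
    }
}
```

Whatever value comes out of this step is what each crawl() call site would need to honor, so it is worth checking that it actually reaches the crawler and is not overridden by a hard-coded false.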

{pds-dev3.jpl.nasa.gov}/home/qchau/sandbox/validate 162 % grep -rn "crawl(" src/main/java

src/main/java/gov/nasa/pds/tools/label/LocationValidator.java:200: ArrayList ignoreList = new ArrayList(); // List of items to be ignored from result of crawl() function.
src/main/java/gov/nasa/pds/tools/util/ReferentialIntegrityUtil.java:778: children = getContext().getCrawler().crawl(parentURL,false); // Get also the directories.
src/main/java/gov/nasa/pds/tools/util/ReferentialIntegrityUtil.java:838: children = getContext().getCrawler().crawl(crawlTarget,true); // Get also the directories.
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:121: children = crawler.crawl(url, regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:151: dirs = crawler.crawl(url, true);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:158: //kids = crawler.crawl(dir.getUrl(), regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:162: kids = crawler.crawl(dir.getUrl(), LABEL_EXTENSIONS_LIST, false, Constants.COLLECTION_NAME_TOKEN);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:245: dirs = crawler.crawl(url, true);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:252: //kids = crawler.crawl(dir.getUrl(), regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:256: kids = crawler.crawl(dir.getUrl(), LABEL_EXTENSIONS_LIST, false, Constants.COLLECTION_NAME_TOKEN);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:391: List ignoreBundleList = new ArrayList(); // List of items to be removed from result of crawl() function.
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:431: List ignoreCollectionList = new ArrayList(); // List of items to be removed from result of crawl() function.
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:495: //allFiles = crawler.crawl(new File(dirName).toURI().toURL(),regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:499: allFiles = crawler.crawl(new File(dirName).toURI().toURL(), LABEL_EXTENSIONS_LIST, false, BUNDLE_NAME_TOKEN);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:589: //allFiles = crawler.crawl(new File(dirName).toURI().toURL(),regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/BundleManager.java:593: allFiles = crawler.crawl(new File(dirName).toURI().toURL(), LABEL_EXTENSIONS_LIST, false, Constants.COLLECTION_NAME_TOKEN);
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:34: protected ArrayList ignoreList = new ArrayList(); // List of items to be removed from result of crawl() function.
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:42: // Function allow all item named to be removed from the crawl() function.
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:59: public List crawl(URL url) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:60: return crawl(url, true, this.fileFilter);
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:63: public List crawl(URL url, IOFileFilter fileFilter) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:64: return crawl(url, true, fileFilter);
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:67: public List crawl(URL url, boolean getDirectories) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:68: return crawl(url, getDirectories, this.fileFilter);
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:71: public List crawl(URL url, String[] extensions, boolean getDirectories, String nameToken) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:72: return crawl(url, extensions, getDirectories, nameToken);
src/main/java/gov/nasa/pds/tools/validate/crawler/Crawler.java:75: public abstract List crawl(URL url, boolean getDirectories, IOFileFilter fileFilter) throws IOException;
src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java:132: public List crawl(URL fileUrl, boolean getDirectories, IOFileFilter fileFilter) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java:166: public List crawl(URL fileUrl, String[] extensions, boolean getDirectories, String nameToken, boolean ignoreCaseFlag) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java:192: public List crawl(URL fileUrl, String[] extensions, boolean getDirectories, String nameToken) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/crawler/FileCrawler.java:195: return(this.crawl(fileUrl, extensions, getDirectories, nameToken, true));
src/main/java/gov/nasa/pds/tools/validate/crawler/URLCrawler.java:60: public List crawl(URL url, boolean getDirectories, IOFileFilter fileFilter) throws IOException {
src/main/java/gov/nasa/pds/tools/validate/rule/AbstractFileSubtreeWalker.java:52: List children = crawler.crawl(url);
src/main/java/gov/nasa/pds/tools/validate/rule/MarkSubdirectoriesReferenced.java:46: List targets = crawler.crawl(getTarget(), FalseFileFilter.INSTANCE);
src/main/java/gov/nasa/pds/tools/validate/rule/RegisterTargets.java:58: for (Target child : crawler.crawl(getTarget(), getContext().isRecursive(), fileFilter)) {
src/main/java/gov/nasa/pds/tools/validate/rule/pds3/VolumeValidationRule.java:115: List children = crawler.crawl(url);
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/BundleContentsNamingRule.java:85: List targets = crawler.crawl(getTarget());
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/BundleReferentialIntegrityRule.java:96: List children = getContext().getCrawler().crawl(getTarget());
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/CollectionInBundleRule.java:48: List dirs = crawler.crawl(getContext().getTarget(), FalseFileFilter.INSTANCE);
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/CollectionReferentialIntegrityRule.java:80: //List children = crawler.crawl(getTarget(), regexFileFilter);
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/CollectionReferentialIntegrityRule.java:85: List children = crawler.crawl(getTarget(), extensions, false, Constants.COLLECTION_NAME_TOKEN);
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/CollectionReferentialIntegrityRule.java:93: children.addAll(crawler.crawl(child.getUrl(), regexFileFilter));
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/FileAndDirectoryNamingRule.java:83: checkFileAndDirectoryNaming(crawler.crawl(getTarget()));
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/LabelInFolderRule.java:81: targetList = crawler.crawl(target, true, getContext().getFileFilters());
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/LabelInFolderRule.java:137: targetList = crawler.crawl(target, false, getContext().getFileFilters());
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/SubDirectoryRule.java:55: List dirs = crawler.crawl(getContext().getTarget(), FalseFileFilter.INSTANCE);
src/main/java/gov/nasa/pds/tools/validate/rule/pds4/SubdirectoryNamingRule.java:93: List targets = crawler.crawl(getTarget(), FalseFileFilter.INSTANCE);

@miguelp1986