Recent changes to the maintenance and installation procedures for the ISISDATA ancillary data area have had a noticeable impact on users. Of particular concern, the size of the ISISDATA installation has more than doubled rather suddenly. There are at least three reasons: 1) the OSIRIS-REx mission kernel set has been added to ISIS, 2) the way in which ISISDATA mission SPICE kernel archives are managed and downloaded has changed, and 3) moving the ISISDATA public download resource to Amazon's AWS S3 storage breaks symbolic links and results in additional copies of each linked file. In addition, users have reported problems processing ISIS mission data that are related to ISISDATA issues (#5107, #5103, #5093, #5024, etc.).
Some efforts are underway to address these issues. Because ISIS SPICE maintenance procedures now download and install mission SPICE kernels directly from the NAIF SPICE archives (and other archives) without filtering, the ISIS installation has grown significantly in both number of files and total size. I have developed a rather aggressive rclone filter file to exclude a large number of files that are not used by ISIS. There are numerous related issues (#5109, #5014, #5105).
I understand and share the reluctance to apply filtering without any way to confirm that it has not broken ISISDATA. There is an expressed need for tools that evaluate and verify ISISDATA installations, to assure users that they have a valid installation. And, because ISISDATA mission kernel installations come directly from NASA and other SPICE archives that typically contain far more SPICE data than ISIS needs, it would be most helpful to adopt a filtering process with assurances that the validity of ISISDATA is not compromised. To address these concerns, I have developed an ISIS application called isisdataeval that I hope will provide helpful data to support and manage ISISDATA resources.
isisdataeval Overview
isisdataeval reads the contents of an ISISDATA area and verifies them. It traverses the DATADIR directory and finds all kernels.????.db and kernels.????.conf files. These files contain the kernel file patterns that spiceinit uses to attach all required NAIF SPICE kernels to individual image cubes after ingestion into ISIS. Every File keyword found in a Selection group in these databases is expanded using the same process applied in spiceinit: (environment) variables are translated using special ISIS translation values (e.g., mission names), and numerical versioning is applied to resolve file naming patterns into absolute paths. In a valid ISISDATA setup, every File keyword value expands to the absolute path of an existing file.
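The version-pattern part of that expansion can be illustrated with a minimal sketch. It is not the ISIS implementation: `resolve_versioned` and the file names are hypothetical, and it assumes that each `?` stands for exactly one character and that the highest-sorting match is the one selected.

```python
import fnmatch
import os
import tempfile

def resolve_versioned(directory, pattern):
    """Return the highest-sorting file in `directory` matching `pattern`, or None.

    Assumption: each '?' in the pattern matches exactly one character,
    so a three-'?' spec cannot match a four-digit version number.
    """
    matches = sorted(name for name in os.listdir(directory)
                     if fnmatch.fnmatchcase(name, pattern))
    return os.path.join(directory, matches[-1]) if matches else None

# Hypothetical kernel names, created in a scratch directory for the demo.
with tempfile.TemporaryDirectory() as d:
    for name in ("example_v001.ti", "example_v002.ti", "other_v0003.ti"):
        open(os.path.join(d, name), "w").close()
    found = resolve_versioned(d, "example_v???.ti")
    missing = resolve_versioned(d, "other_v???.ti")

print(os.path.basename(found))  # example_v002.ti
print(missing)                  # None: four digits do not fit "???"
```

A spec that resolves to nothing, as in the second call, is the kind of result that would show up as a missing kernel.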
The second purpose of this application is to help assess problems encountered with user installations of ISISDATA. There are numerous occurrences of users having problems with local ISISDATA installations, and many of them are related to the integrity of the local installation. isisdataeval computes a hash value for each file in the DATADIR directory and all of its subdirectories, which means it also inspects the calibration files in the ISISDATA installation. It also calculates a volume hash, a running hash of all the files combined into a single value. If the TotalVolumeHash value does not match the value computed for another installation, particularly the USGS ISISDATA source, then it is likely that the installation failed and/or files are corrupted or missing.
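The idea of per-file hashes plus a running volume hash can be sketched as follows. This is an illustration only: `hash_tree` is a hypothetical helper, and the traversal order and the way file contents feed the volume hash are assumptions, so its output is not expected to reproduce isisdataeval's TotalVolumeHash.

```python
import hashlib
import os
import tempfile

def hash_tree(root, algorithm="md5", chunk_size=1 << 20):
    """Return ({relative_path: file_hash}, volume_hash) for every file under root."""
    volume = hashlib.new(algorithm)
    per_file = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the volume hash is repeatable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.new(algorithm)
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)   # hash of this file alone
                    volume.update(chunk)   # running hash over all files
            per_file[os.path.relpath(path, root)] = digest.hexdigest()
    return per_file, volume.hexdigest()

# Demo on a scratch directory: changing one file changes the volume hash.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "kernels"))
    with open(os.path.join(root, "kernels", "a.txt"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(root, "b.txt"), "wb") as f:
        f.write(b"first version")
    files_before, volume_before = hash_tree(root)
    with open(os.path.join(root, "b.txt"), "wb") as f:
        f.write(b"corrupted")
    files_after, volume_after = hash_tree(root)

print(volume_before != volume_after)  # True
```

Because every byte of every file is read, the cost scales with the size of the installation, and the per-file hashes show which file caused a volume mismatch.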
Initial Evaluation of ISISDATA Full Installation
The application has been tested on a full ISISDATA download made with the downloadIsisData script and the rclone.conf file in the ISIS repository. The following commands were used to download, install, and evaluate the complete ISISDATA installation on a remote mounted disk volume.
mkdir -p /opt/isisdatafull
./downloadIsisData all /opt/isisdatafull --config=rclone.conf -vv --log-file=isisdata_full.log
isisdataeval isisdata=/opt/isisdatafull \
datadir=/opt/isisdatafull \
toissues=isisdata_full_download_issues.csv \
toinventory=isisdata_full_download_inventory.csv \
toerrors=isisdata_full_download_inventory_errors.csv \
hash=md5 \
preference=IsisPreferences | tee -a isisdata_full_evaluation.log
The output of isisdataeval indicates issues with the ISISDATA install, which was generated on or about December 2, 2022.
The runtime for this dataset is recorded in the Accounting group of the print.prt log file. I include it here to indicate how expensive computing file hashes is. Note that when hashes are requested, isisdataeval reads the entire ISISDATA area, nearly 2TB.
Group = Accounting
ConnectTime = 06:24:35.7
CpuTime = 02:04:37.5
End_Group
This run indicates 45 validation issues and 10 inventory errors. Generally, all the kernels that have a missing status are truly missing, but for various reasons.
For example, one of the issues identifies a missing Clementine IAK kernel. In this case, the kernel DB entry is in /opt/isisdatafull/clementine1/kernels/iak/kernels.0002.db.
The problem is that the filenames for the LWIR IAK use 4 digits rather than the "???" of the file spec. The Smart1 FK is misconfigured in the same way.
In the case of inventory volume errors, there are 10 errors where the files apparently do not exist! What these files have in common is a + in their names. This is very bad because every filename that is input into any ISIS application is processed by the FileName class, which is designed to strip out ISIS cube file attributes that are indicated with a + at the end of the filename. In those 10 cases, the + and any remaining characters are stripped from the name when it is expanded, creating a bad file name. This is certainly an issue when computing hash values. These files could never have been read by ISIS because of this! Essentially, file names used by ISIS cannot contain a +, and such names must be banned.
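A quick way to locate such files in a local installation is sketched below. Both helpers are hypothetical: `find_plus_names` simply walks a directory tree, and `strip_attributes` is a deliberately simplified stand-in for the attribute stripping described above (it assumes everything from the first + onward is dropped), not the actual FileName class.

```python
import os
import tempfile

def find_plus_names(root):
    """Return sorted relative paths of files under root whose name contains '+'."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "+" in name:
                flagged.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(flagged)

def strip_attributes(name):
    """Simplified model (assumption): drop the first '+' and everything after it."""
    return name.partition("+")[0]

# Demo with hypothetical file names in a scratch directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("good.bsp", "bad+name.bsp"):
        open(os.path.join(root, name), "w").close()
    flagged = find_plus_names(root)

print(flagged)                           # ['bad+name.bsp']
print(strip_attributes("bad+name.bsp"))  # bad
```

Under this simplified model, the stripped name no longer refers to any file on disk, which is consistent with the "file does not exist" inventory errors.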
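The full results, less the full ISISDATA inventory file (which is ~40MB), are located here.
Comparison of Filtered ISISDATA via rclone
I then created the filtered version of ISISDATA using my rclone filter file, isisdata_rclone_filter_from.lis.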
mkdir -p /opt/isisdatafiltered
./downloadIsisData all /opt/isisdatafiltered --config=rclone.conf --filter-from=isisdata_rclone_filter_from.lis -vv --log-file=isisdata_filtered.log
isisdataeval isisdata=/opt/isisdatafiltered \
datadir=/opt/isisdatafiltered \
toissues=isisdata_filtered_download_issues.csv \
toinventory=isisdata_filtered_download_inventory.csv \
toerrors=isisdata_filtered_download_inventory_errors.csv \
hash=md5 \
preference=IsisPreferences | tee -a isisdata_filtered_download.log
And the Accounting group for the filtered install:
Group = Accounting
ConnectTime = 05:33:60.0
CpuTime = 01:10:03.6
End_Group
The Results group of that run identified the same number of issues and errors, involving the same files, as the full download of ISISDATA above. The files generated from this run are available here.
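One way to compare two installations file by file is sketched below. It assumes each inventory has already been loaded into a mapping of relative path to hash; the loading step is omitted because the inventory CSV layout is not described here, and `compare_inventories`, the paths, and the hash strings are all hypothetical placeholders.

```python
def compare_inventories(reference, candidate):
    """Compare two {relative_path: hash} mappings and report the differences."""
    ref_paths, cand_paths = set(reference), set(candidate)
    return {
        # present in the reference but absent from the candidate
        "missing": sorted(ref_paths - cand_paths),
        # present in the candidate only
        "extra": sorted(cand_paths - ref_paths),
        # present in both, but with different hashes
        "changed": sorted(p for p in ref_paths & cand_paths
                          if reference[p] != candidate[p]),
    }

# Placeholder data: "full" stands for an unfiltered install, "filtered" for a filtered one.
full = {"m/kernels/a.bsp": "h1", "m/kernels/b.bsp": "h2", "m/docs/readme.txt": "h3"}
filtered = {"m/kernels/a.bsp": "h1", "m/kernels/b.bsp": "hX"}

report = compare_inventories(full, filtered)
print(report["missing"])  # ['m/docs/readme.txt']
print(report["changed"])  # ['m/kernels/b.bsp']
```

For a filtered install, entries under "missing" are expected (they are what the filter excluded), while any "changed" entry would point at a corrupted or incomplete transfer.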
Summary
The isisdataeval documentation describes this tool in greater detail. It is available as a raw gist, so you will have to download it and render it locally (is there a better way to share an HTML or PDF file?).
I think this application may also specifically address some of the needs identified in #5105 and #5109. It can also be useful in assisting users once it becomes publicly available.
It's not clear to me whether any of the ISISDATA validation issues have been, or will be, addressed. These issues seem to be addressable only by USGS/Astro developers, so I will keep this open until they are addressed or someone else closes this issue.
ISIS version(s) affected: All
ISISDATA Evaluation and Verification