follow on to #118, registry-manager still unable to change archive status on bundle contents #136

Closed
plawton-umd opened this issue Dec 6, 2022 · 15 comments
Labels: B13.1 · bug (Something isn't working) · s.high (High severity)

Comments

@plawton-umd

plawton-umd commented Dec 6, 2022

🐛 Describe the bug

When changing a bundle's archive status to archived, the following error was displayed:
[ERROR] Expected BEGIN_ARRAY but was STRING at line 1 column 166 path $._source.ref_lid_collection

This is a reopen of #118

📜 To Reproduce

See #118

  1. Set OPENSEARCH_URL to
    https://search-sbnumd-prod-o5i2rnn265gnwmv2quk4n7uram.us-west-2.es.amazonaws.com:443
    or equivalent

  2. Set AUTH_FILE to your authorization credentials file

  3. Run a command similar to

./registry-manager-4.6.0-SNAPSHOT/bin/registry-manager set-archive-status -status archived -lidvid "urn:nasa:pds:epoxi_mri::1.0" -es $OPENSEARCH_URL -auth $AUTH_FILE -l change_status.log

  4. See the error:
    [ERROR] Expected BEGIN_ARRAY but was STRING at line 1 column 166 path $._source.ref_lid_collection

🕵️ Expected behavior

I expect the ops:Tracking_Meta/ops:archive_status to be changed from "staged" to "archived"
for all collections and products in the bundle.

📚 Version of Software Used

./registry-manager-4.6.0-SNAPSHOT/bin/registry-manager -V
Registry Manager version: 4.6.0-SNAPSHOT
Build time: 2022-12-02T17:06:38Z

🩺 Test Data / Additional context

The log file contents were

2022-12-06 11:54:11,404 [INFO] Setting product status. LIDVID = urn:nasa:pds:epoxi_mri::1.0, status = archived

The screen messages were

[INFO] Setting product status. LIDVID = urn:nasa:pds:epoxi_mri::1.0, status = archived
[ERROR] Expected BEGIN_ARRAY but was STRING at line 1 column 166 path $._source.ref_lid_collection

🏞Screenshots

🖥 System Info

  • OS: Linux

  • Browser: n/a

  • Version:


🦄 Related requirements

⚙️ Engineering Details

@jordanpadams
Member

@jimmie @tloubrieu-jpl @alexdunnjpl can we talk about this at the breakout today? do we need to update the SBN registry to make this work?

@jordanpadams jordanpadams transferred this issue from NASA-PDS/registry-mgr Dec 6, 2022
@jordanpadams jordanpadams changed the title registry-manager is not changing archive status on bundle contents follow on to #118, registry-manager still unable to change archive status on bundle contents Dec 6, 2022
@alexdunnjpl
Contributor

alexdunnjpl commented Dec 6, 2022

@jordanpadams not sure I understand - this is a duplicate of #118 (and registry-common #29 and registry-mgr #57), resolved in registry-common #30

@alexdunnjpl
Contributor

Regarding the existing data in the SBN registry, it will be necessary to re-harvest affected bundles with the --overwrite flag once the installation is updated to use a version of registry with the fix applied.

@jordanpadams
Member

@alexdunnjpl sorry for confusion. the re-ingesting part was what I didn't get from the other tickets.

@plawton-umd can you try re-ingesting the bundle with the --overwrite flag and then try setting the archive status?

@alexdunnjpl
Contributor

@plawton-umd in case it isn't clear, the re-ingestion will only work when using an updated version of the software.

@plawton-umd
Author

@jordanpadams @alexdunnjpl If I understand correctly, the SBN registry needs to be updated to the new version of the registry software (in addition to the changes needed to deal with the shards issue). Once that registry software update is done, then I should re-ingest the data and attempt to update the archive status. When is the EN controlled SBN UMD's registry's software being updated? (Note: My availability for the next 3 days is limited, so next week can work - I am not attending AGU.)

@jordanpadams
Member

@plawton-umd sorry for the confusion. we do not believe anything needs to be updated on our end. since you have the updated registry-mgr, you should just need to install the latest snapshot of harvest, and re-harvest / re-ingest the bundle that you are having difficulty with.

@jordanpadams
Member

apologies for the inconvenience.

@plawton-umd
Author

@jordanpadams Installed harvest-3.8.0-SNAPSHOT. Tried -O, and on a separate run tried --overwrite. Same results; summary below. Via OpenSearch, the products are still from the Oct 2022 load.

./harvest-3.8.0-SNAPSHOT/bin/harvest -O -o epoxi_mri/out -l epoxi_mri/harvest.log -c epoxi_mri/epoxi_mri.cfg
...
[WARN] Bundle urn:nasa:pds:epoxi_mri::1.0 already registered. Skipping.
...
[WARN] Collection urn:nasa:pds:epoxi_mri:hartley2_photometry::1.0 already registered. Skipping.
[INFO] Processing products...
[INFO] Skipping product ... epoxi_mri_v1.0_20210429_aip_v1.0.xml (LIDVID/LID is not in collection inventory or is already registered in Elasticsearch)
...
[SUMMARY] Summary:
[SUMMARY] Skipped files: 13
[SUMMARY] Loaded files: 0
[SUMMARY] Failed files: 0
[SUMMARY] Package ID: 9afdb467-e422-4984-ba67-edae5f21b7ce

Tried registry-manager anyway... same error.

./registry-manager-4.6.0-SNAPSHOT/bin/registry-manager set-archive-status -status archived -lidvid "urn:nasa:pds:epoxi_mri::1.0" -es $OPENSEARCH_URL -auth $AUTH_FILE -l mri_20221206c_status.log
[INFO] Setting product status. LIDVID = urn:nasa:pds:epoxi_mri::1.0, status = archived
[ERROR] Expected BEGIN_ARRAY but was STRING at line 1 column 166 path $._source.ref_lid_collection

Not sure what I am doing wrong.

@alexdunnjpl
Contributor

alexdunnjpl commented Dec 6, 2022

@plawton-umd it's unclear to me why harvest is ignoring --overwrite, but the skip warning messages you're seeing are subtly different to what I experience when trying to re-ingest without the --overwrite flag (included below for my own reference).

Could I trouble you to drop a copy of your epoxi_mri.cfg file here or in an email to me?

@jordanpadams @tloubrieu-jpl are there any differences that jump out at you with me running the main class directly from my IDE (see first line) vs the as-installed CLI command? I may have to do a proper installation from the snapshot to verify 1:1.

/usr/lib/java/openjdk/jdk-15.0.1/bin/java -javaagent:/snap/intellij-idea-ultimate/398/lib/idea_rt.jar=43749:/snap/intellij-idea-ultimate/398/bin -Dfile.encoding=UTF-8 -classpath /nomount/harvest/target/classes:/home/parallels/.m2/repository/commons-cli/commons-cli/1.4/commons-cli-1.4.jar:/home/parallels/.m2/repository/commons-lang/commons-lang/2.6/commons-lang-2.6.jar:/home/parallels/.m2/repository/commons-codec/commons-codec/1.15/commons-codec-1.15.jar:/home/parallels/.m2/repository/org/apache/tika/tika-core/1.23/tika-core-1.23.jar:/home/parallels/.m2/repository/com/google/code/gson/gson/2.8.9/gson-2.8.9.jar:/home/parallels/.m2/repository/org/json/json/20210307/json-20210307.jar:/nomount/registry-common/target/registry-common-1.4.0-SNAPSHOT.jar:/home/parallels/.m2/repository/org/elasticsearch/client/elasticsearch-rest-client/7.15.1/elasticsearch-rest-client-7.15.1.jar:/home/parallels/.m2/repository/org/apache/httpcomponents/httpclient/4.5.10/httpclient-4.5.10.jar:/home/parallels/.m2/repository/org/apache/httpcomponents/httpcore/4.4.12/httpcore-4.4.12.jar:/home/parallels/.m2/repository/org/apache/httpcomponents/httpasyncclient/4.1.4/httpasyncclient-4.1.4.jar:/home/parallels/.m2/repository/org/apache/httpcomponents/httpcore-nio/4.4.12/httpcore-nio-4.4.12.jar:/home/parallels/.m2/repository/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar:/home/parallels/.m2/repository/org/apache/logging/log4j/log4j-api/2.17.1/log4j-api-2.17.1.jar:/home/parallels/.m2/repository/org/apache/logging/log4j/log4j-core/2.17.1/log4j-core-2.17.1.jar:/home/parallels/.m2/repository/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar gov.nasa.pds.harvest.HarvestMain -v DEBUG -c /nomount/harvest/.idea/plawton_umd_test_data/harvest-bundles.xml
[SUMMARY] Reading configuration from /nomount/harvest/.idea/plawton_umd_test_data/harvest-bundles.xml
[SUMMARY] Output directory: /tmp/harvest/out
[SUMMARY] Elasticsearch URL: https://localhost:9200, index: registry
[INFO] Connecting to Elasticsearch
[INFO] Loading PDS to ES data type mapping from /nomount/harvest/target/classes/elastic/data-dic-types.cfg
[INFO] Processing directory: /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/SUPPORT/NSSDCA/epoxi_mri_v1.0_20210429_sip_v1.0.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/SUPPORT/NSSDCA/epoxi_mri_v1.0_20210429_aip_v1.0.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/document/epoxi_photometry_v5.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/document/hartley2_mri_anomaly.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/azav_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/azav_phot.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/aper_phot.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/profile.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/aper_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/data/prof_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/overview.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry2/collection.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/document/epoxi_photometry_v5.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/document/hartley2_mri_anomaly.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/azav_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/azav_phot.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/aper_phot.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/profile.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/aper_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/data/prof_err.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/overview.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/hartley2_photometry/collection.xml
[INFO] Processing /nomount/harvest/.idea/plawton_umd_test_data/pds4-epoxi_mri-v1.0/bundle.xml
[WARN] Skipping registered product urn:nasa:pds:system_bundle:product_sip_deep_archive:epoxi_mri_v1.0_20210429::1.0
[WARN] Skipping registered product urn:nasa:pds:system_bundle:product_aip:epoxi_mri_v1.0_20210429::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:epoxi_photometry_v5::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:hartley2_mri_anomaly::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:azav_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:azav_phot::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:aper_phot::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:profile::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:aper_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:prof_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:overview::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry2::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:epoxi_photometry_v5::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:hartley2_mri_anomaly::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:azav_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:azav_phot::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:aper_phot::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:profile::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:aper_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:prof_err::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry:overview::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri:hartley2_photometry::1.0
[WARN] Skipping registered product urn:nasa:pds:epoxi_mri::1.0
[SUMMARY] Summary:
[SUMMARY] Skipped files: 23
[SUMMARY] Loaded files: 0
[SUMMARY] Failed files: 0
[SUMMARY] Package ID: fc1792f7-71f4-4c1f-98f1-c60ad47e8254

@jordanpadams
Member

@plawton-umd @alexdunnjpl no idea. I would recommend installing and running from the CLI and seeing what happens.

sorry again @plawton-umd ! we will get to the bottom of this. thanks for being patient.

@tloubrieu-jpl @jimmie any chance you have some bandwidth to help test this bug and figure out what is happening here?

@tloubrieu-jpl
Member

@jordanpadams @plawton-umd I will look at that tomorrow.

@tloubrieu-jpl
Member

@plawton-umd I tried the --overwrite option on one of my harvest configuration files and that worked (it did not skip any file).

But I guess my configuration file is different from yours, and there might be a case that we overlooked while testing the --overwrite option.

Were you able to send your harvest configuration to @alexdunnjpl? If not, can you send it to me, or just attach it to this ticket (drag and drop works)?

Note that until you are able to reload the files into the registry (with harvest), you don't need to test registry-mgr again, since the error originally came from the way harvest loads the data into the registry.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Dec 8, 2022

@plawton-umd, @alexdunnjpl shared your configuration file with me. The harvest overwrite issue happens when you use the bundles tag to identify what needs to be harvested. It works when you use directories, as in the following example:

  <directories>
    <path>/tmp/data/</path>
  </directories>

So a work-around for this ticket would be for you to use directories in your configuration.
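A minimal bash sketch of that workaround, assuming the harvest config (hypothetically epoxi_mri/epoxi_mri.cfg) has already been edited to list the bundle directory under directories instead of bundles. It only chains the two commands already used in this thread (harvest with -O, then registry-manager set-archive-status); the run helper, the DRY_RUN switch and the default paths are illustrative assumptions, not features of either tool.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: re-harvest with overwrite, then retry the status change.
# Assumptions (hypothetical, adjust to your installation):
#   - HARVEST and REGISTRY_MGR point at the installed launchers
#   - CFG lists the bundle under <directories>, not <bundles>
#   - OPENSEARCH_URL and AUTH_FILE are exported, as in the reproduction steps

HARVEST="${HARVEST:-./harvest-3.8.0-SNAPSHOT/bin/harvest}"
REGISTRY_MGR="${REGISTRY_MGR:-./registry-manager-4.6.0-SNAPSHOT/bin/registry-manager}"
CFG="${CFG:-epoxi_mri/epoxi_mri.cfg}"
LIDVID="${LIDVID:-urn:nasa:pds:epoxi_mri::1.0}"

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reharvest_and_archive() {
  if [ -z "${OPENSEARCH_URL:-}" ] || [ -z "${AUTH_FILE:-}" ]; then
    echo "OPENSEARCH_URL and AUTH_FILE must be set" >&2
    return 1
  fi
  # Step 1: reload the bundle, replacing previously registered documents (-O / --overwrite).
  run "$HARVEST" -O -o epoxi_mri/out -l epoxi_mri/harvest.log -c "$CFG" || return 1
  # Step 2: retry the archive status change on the bundle.
  run "$REGISTRY_MGR" set-archive-status -status archived -lidvid "$LIDVID" \
    -es "$OPENSEARCH_URL" -auth "$AUTH_FILE" -l change_status.log
}
```

For example, DRY_RUN=1 reharvest_and_archive prints the two commands without executing them; unset DRY_RUN to run them. Before step 2, it is worth confirming that the harvest summary reports loaded rather than skipped files, since the earlier attempts skipped everything.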

We will work on the bundles/overwrite issue in a different ticket on harvest. See NASA-PDS/harvest#111

Sorry that you have to experience all these bugs.

@alexdunnjpl
Contributor

alexdunnjpl commented Dec 8, 2022

Fixed in NASA-PDS/harvest #113

5 participants