Registry Workshop 2022‐06‐28

thomas loubrieu edited this page Aug 22, 2023 · 1 revision

Walk through a basic procedure for publishing data to your PDS Node Registry.

Meeting Recording: https://jpl.webex.com/jpl/ldr.php?RCID=5e8a6ef8c4de62963b71ab22f39f0964

How to publish your data in PDS?

Prerequisites

Basic understanding of the architecture:

[Figure: Picture1, architecture diagram]

Install PDS Registry tools

See installation guide https://nasa-pds.github.io/registry/install/tools.html#tools

  • Registry Manager
  • Harvest

See Online Docs Here

Check your access to your OpenSearch database

Run this check from an authorized IP address on your institution's network.

For example:

% curl -u tloubrieu_en 'https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com'
Enter host password for user 'tloubrieu_en':
{
  "name" : "c297449108b402887f1dfbd4d66c2ea6",
  "cluster_name" : "445837347542:sbnpsi-prod",
  "cluster_uuid" : "GmDxA8ULQNy6JOdBBsRV5A",
  "version" : {
    "number" : "7.10.2",
    "build_type" : "tar",
    "build_hash" : "unknown",
    "build_date" : "2022-04-15T09:52:37.749040Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}
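The same check can be scripted instead of read by eye. The following is a minimal sketch, assuming the response looks like the banner above; the function name `is_opensearch_banner` is hypothetical, and it only looks for the OpenSearch tagline.

```shell
# Hypothetical helper: reads a cluster banner (JSON) on stdin and succeeds
# only if it carries the OpenSearch tagline shown in the example response.
is_opensearch_banner() {
  grep -q '"tagline"[[:space:]]*:[[:space:]]*"The OpenSearch Project'
}

# Usage (curl still prompts for the password, as in the example above):
#   curl -s -u <user> '<your OpenSearch URI>' | is_opensearch_banner && echo "access OK"
```

An authentication failure or a blocked IP returns something other than this banner, so the function exits non-zero in those cases.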

OpenSearch Endpoints

| Node | OpenSearch URI | OpenSearch Dashboard URI (aka Kibana) |
| --- | --- | --- |
| ATM | https://search-atm-prod-mkvgzojag2ta65bnotqdpopzju.us-west-2.es.amazonaws.com | https://search-atm-prod-mkvgzojag2ta65bnotqdpopzju.us-west-2.es.amazonaws.com/_dashboards |
| EN | https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com | https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/_dashboards |
| GEO | https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com | https://search-geo-prod-6iz6lwiw6luyffpsq52ndsrtbu.us-west-2.es.amazonaws.com/_dashboards |
| IMG | https://search-img-prod-tlnl5qlzgk5iknwhiemnc6aogy.us-west-2.es.amazonaws.com | https://search-img-prod-tlnl5qlzgk5iknwhiemnc6aogy.us-west-2.es.amazonaws.com/_dashboards |
| NAIF | https://search-naif-prod-pm7hsg36wqejex3whlnpj3d6ma.us-west-2.es.amazonaws.com | https://search-naif-prod-pm7hsg36wqejex3whlnpj3d6ma.us-west-2.es.amazonaws.com/_dashboards |
| RMS | https://search-rms-prod-hgkpolys7ww6cdoogbx5gsfy6m.us-west-2.es.amazonaws.com | https://search-rms-prod-hgkpolys7ww6cdoogbx5gsfy6m.us-west-2.es.amazonaws.com/_dashboards |
| SBN (PSI) | https://search-sbnpsi-prod-egowc5td43xn744siksghckq4i.us-west-2.es.amazonaws.com | https://search-sbnpsi-prod-egowc5td43xn744siksghckq4i.us-west-2.es.amazonaws.com/_dashboards |
| SBN (UMD) | https://search-sbnumd-prod-o5i2rnn265gnwmv2quk4n7uram.us-west-2.es.amazonaws.com | https://search-sbnumd-prod-o5i2rnn265gnwmv2quk4n7uram.us-west-2.es.amazonaws.com/_dashboards |
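Every Dashboard URI listed here is the node's OpenSearch URI with `/_dashboards` appended. A small sketch of that pattern follows; the function name `dashboard_url` is hypothetical and not part of any PDS tool.

```shell
# Hypothetical helper: derive the Dashboards URL from a node's OpenSearch URI,
# following the pattern in the endpoint list (same host, "/_dashboards" appended).
dashboard_url() {
  # Strip one trailing slash, if present, before appending the path.
  printf '%s/_dashboards\n' "${1%/}"
}

# Usage:
#   dashboard_url 'https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com'
```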

Identify the bundle you want to work with

Workshop

Create Your OpenSearch Authentication File

  • Create a new file named ‘auth.cfg’

    • This can be anywhere, but you will need this file anytime you register data, so your $HOME directory is probably a good spot, or somewhere else you can easily access.
  • Add the following to the file:

# true - trust self-signed certificates; false - don't trust.
trust.self-signed = true
user = <your personal user name>
password = <your personal password>
  • Note the path to that auth.cfg file for the next step, for example: /Users/loubrieu/Documents/pds/registry_workshop_20220628/auth.cfg
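Because auth.cfg holds a plain-text password, it may be worth creating it with owner-only permissions. The sketch below writes the three settings listed above; the function name `write_auth_cfg` and the permission tightening are additions for illustration, not part of the workshop procedure.

```shell
# Hypothetical helper: write an auth.cfg at the given path with the settings
# listed above, readable and writable only by the current user.
# $1 = path to auth.cfg, $2 = user name, $3 = password
write_auth_cfg() {
  cfg_path=$1
  cfg_user=$2
  cfg_password=$3
  (
    umask 077   # a newly created file gets no group/other permissions
    {
      printf '%s\n' "# true - trust self-signed certificates; false - don't trust."
      printf 'trust.self-signed = true\n'
      printf 'user = %s\n' "$cfg_user"
      printf 'password = %s\n' "$cfg_password"
    } > "$cfg_path"
  )
  chmod 600 "$cfg_path"   # also tighten a file that already existed
}

# Usage:
#   write_auth_cfg "$HOME/auth.cfg" '<your personal user name>' '<your personal password>'
```

Passing the password as an argument leaves it in the shell history, so editing the file by hand as described above remains a reasonable choice.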

See Online Docs Here

Prepare your harvest configuration file

  • Start from an example found in the harvest installation folder:
cp ${HARVEST_HOME}/conf/examples/bundles.xml  jobs/my-bundle-job.xml
  • Edit your job file jobs/my-bundle-job.xml:

  • Examples of base URLs:

| Node | Base URL |
| --- | --- |
| IMG Node | https://pds-imaging.jpl.nasa.gov/data/ |
| EN | https://pds.nasa.gov/data/ |

Note the path for the next step, for example: ./jobs/my-bundle-job.xml

Load the data in your registry

harvest -c jobs/my-bundle-job.xml
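A quick pre-flight check can catch a mistyped job path or a missing installation before harvest starts. This is a minimal sketch; the function name `harvest_preflight` is hypothetical, and it checks only that the job file is non-empty and that a `harvest` command is on the PATH.

```shell
# Hypothetical pre-flight check.
# Returns 1 if the job file is missing or empty,
# 2 if the harvest command cannot be found, 0 otherwise.
harvest_preflight() {
  job_file=$1
  if [ ! -s "$job_file" ]; then
    echo "job file missing or empty: $job_file" >&2
    return 1
  fi
  if ! command -v harvest >/dev/null 2>&1; then
    echo "harvest not found on PATH" >&2
    return 2
  fi
  return 0
}

# Usage:
#   harvest_preflight jobs/my-bundle-job.xml && harvest -c jobs/my-bundle-job.xml
```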

See Online Docs Here

Check that the data is available in the OpenSearch database

Get:

  • The URL of your OpenSearch database
  • The lidvid of your bundle and the URL of your bundle
curl -u tloubrieu_en 'https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q={_id:"urn:nasa:pds:insight_rad::2.1"}&pretty=true' \
                     | json_pp
  • Check that the files’ URLs are reachable
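The search request can be wrapped so that only the OpenSearch URL, the user name, and the lidvid change between runs. This is a sketch under assumptions: the function names are hypothetical, the braces around the query are dropped, and curl's `-G` and `--data-urlencode` options handle the URL encoding of the quotes and colons.

```shell
# Hypothetical helper: build the query-string expression that matches one
# registry document by its lidvid.
lidvid_query() {
  printf '_id:"%s"\n' "$1"
}

# Hypothetical wrapper around the search request shown above.
# $1 = OpenSearch URL, $2 = user name, $3 = lidvid
# -G sends the --data-urlencode values as URL query parameters.
search_by_lidvid() {
  curl -s -G -u "$2" "$1/registry/_search" \
    --data-urlencode "q=$(lidvid_query "$3")" \
    --data-urlencode 'pretty=true'
}

# Usage:
#   search_by_lidvid '<your OpenSearch URL>' '<user>' 'urn:nasa:pds:insight_rad::2.1'
```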

See Online Docs Here

Check that the data is visible in the PDS Search API

In your node's API:

curl https://pds.nasa.gov/api/search-en/1.0/products/urn:nasa:pds:insight_rad::2.1

See other Node API endpoints: https://nasa-pds.github.io/pds-api/search-api-user-guide/endpoints.html#endpoints

Across the PDS:

curl https://pds.nasa.gov/api/search/1.0/products/urn:nasa:pds:insight_rad::2.1

It is not!

Why? Because the archive_status is “staged”

Find the status in the OpenSearch request result.
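To pull the status out of a long search result, a filter like the one below can help. It is a sketch under two assumptions: the field's key ends with `archive_status`, and its value is a plain string on the same line as the key. The function name `archive_status` is hypothetical.

```shell
# Hypothetical filter: reads an OpenSearch search result (JSON) on stdin and
# prints the value of every field whose key ends with "archive_status".
archive_status() {
  grep -o '"[^"]*archive_status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Usage: pipe the output of the OpenSearch search request into the filter:
#   curl -s -u <user> '<search URL>' | archive_status
```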

See Online Docs Here

Update the archive status (registry-manager)

  • First explore registry manager options:
registry-manager --help
registry-manager set-archive-status -help
  • Set the status to ‘archived’:
registry-manager set-archive-status -es https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443 -auth auth.cfg -lidvid urn:nasa:pds:insight_rad::2.1 -status archived
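A lidvid is a logical identifier and a version identifier joined by `::`, so a mistyped value is easy to catch before it reaches registry-manager. The check below is a minimal sketch; the function name `is_lidvid` is hypothetical and the pattern is a loose approximation of the format, not a full validation.

```shell
# Hypothetical check: succeed only if the argument looks like a lidvid,
# i.e. a "urn:...:..." identifier followed by "::" and a major.minor version.
is_lidvid() {
  printf '%s\n' "$1" | grep -Eq '^urn:[^:]+:[^:]+:.+::[0-9]+\.[0-9]+$'
}

# Usage, with the same options as the command above:
#   lidvid='urn:nasa:pds:insight_rad::2.1'
#   is_lidvid "$lidvid" && registry-manager set-archive-status \
#     -es '<your OpenSearch URL>:443' -auth auth.cfg -lidvid "$lidvid" -status archived
```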

See Online Docs Here

Test access through the PDS API

On your Node’s API server:

curl https://pds.nasa.gov/api/search-en/1.0/products/urn:nasa:pds:insight_rad::2.1

On PDS API server:

curl https://pds.nasa.gov/api/search/1.0/products/urn:nasa:pds:insight_rad::2.1

or view through a web browser: https://pds.nasa.gov/api/search/1.0/products/urn:nasa:pds:insight_rad::2.1

Query someone else’s lidvid: share your lidvids in the Webex chat, then query for someone else's registered bundle product.

curl https://pds.nasa.gov/api/search/1.0/products/urn:nasa:pds:insight_rad::2.1

Other query examples:

Query for the Bundle’s Collections:

curl https://pds.nasa.gov/api/search/1.0/bundles/urn:nasa:pds:insight_rad::2.1/collections

Query for Bundle’s Products:

curl https://pds.nasa.gov/api/search/1.0/bundles/urn:nasa:pds:insight_rad::2.1/products
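The three request URLs used in this section differ only in their path, so they can be generated from a lidvid. The sketch below uses hypothetical function names and defaults to the PDS-wide API base URL from the examples; pass a node's base URL as the second argument to target a node API instead.

```shell
# Base URL of the PDS-wide Search API, as used in the examples above.
PDS_API_BASE='https://pds.nasa.gov/api/search/1.0'

# Hypothetical helpers: $1 = lidvid, $2 = optional API base URL.
product_url() {
  printf '%s/products/%s\n' "${2:-$PDS_API_BASE}" "$1"
}
bundle_collections_url() {
  printf '%s/bundles/%s/collections\n' "${2:-$PDS_API_BASE}" "$1"
}
bundle_products_url() {
  printf '%s/bundles/%s/products\n' "${2:-$PDS_API_BASE}" "$1"
}

# Usage:
#   curl "$(bundle_collections_url 'urn:nasa:pds:insight_rad::2.1')"
#   curl "$(product_url 'urn:nasa:pds:insight_rad::2.1' 'https://pds.nasa.gov/api/search-en/1.0')"
```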

See Search API User Guide for more details.

Delete the data (registry-manager delete-data):

Do this if the data is test data that you don’t want to leave in your registry.

  • Find the package-id in the OpenSearch results by querying the Registry again, and searching for "package-id".
    • Also known as a "run ID", this package ID can be used to access/remove past ingestions.
curl -u tloubrieu_en 'https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com/registry/_search?q={_id:"urn:nasa:pds:insight_rad::2.1"}' \
                     | json_pp
  • Then, using that package ID, delete the data:
registry-manager delete-data -es https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443 -auth auth.cfg -packageId 3e755f49-0cde-4d80-bfe8-020fa6537a36
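Finding the package ID in a long search result can be left to a filter. The sketch below makes two assumptions: the key contains "package" followed by "id" (with an optional `-` or `_` between them), and the ID is a UUID on the same line as the key, like the one in the command above. The function name `package_ids` is hypothetical.

```shell
# Hypothetical filter: reads a search result (JSON) on stdin and prints each
# distinct UUID found on a line whose text mentions a package ID.
package_ids() {
  grep -i 'package[-_]\{0,1\}id' \
    | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' \
    | sort -u
}

# Usage:
#   curl -s -u <user> '<search URL>' | json_pp | package_ids
```

If the filter prints more than one ID, check which ingestion each one belongs to before deleting anything.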
