OpenSearch mapping conflict issue when trying to change a type ([illegal_argument_exception]) #204

Closed
tloubrieu-jpl opened this issue Oct 28, 2024 · 10 comments · Fixed by NASA-PDS/registry-common#112

@tloubrieu-jpl
Member

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

Errors found in log:
While loading: https://pds-geosciences.wustl.edu/insight/urn-nasa-pds-insight_seis/data/xb/continuous_waveform/elyhk/2019/068/xb.elyhk.19.uk1.2019.068.6.a.xml

[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]

🕵️ Expected behavior

I expected the product to load without error.

📜 To Reproduce

Load the selected product in production.

🖥 Environment Info

No response

📚 Version of Software Used

harvest 4.0.2

🩺 Test Data / Additional context

See full log section provided by GEO node (Dan Scholes):

[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.19.uk1.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.19.uk1.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.20.uk2.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.20.uk2.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.21.uea.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1850.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.21.uea.2019.218.3.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd
[INFO] This LDD already loaded.
[INFO] Updating 'insight' LDD. Schema location: https://pds.nasa.gov/pds4/mission/insight/v1/PDS4_INSIGHT_1B00_1860.xsd
[INFO] This LDD already loaded.
[INFO] Updating Elasticsearch schema.
[ERROR] Request failed: [illegal_argument_exception] mapper [insight:Observation_Information/insight:release_number] cannot be changed from type [keyword] to [integer]
[INFO] Processing product \\isilon-pri-data\pds-san\data\insight\urn-nasa-pds-insight_seis\data\xb\continuous_waveform\elyhk\2019\218\xb.elyhk.22.uk2.2019.218.3.a.xml
[INFO] Updating LDDs.
[INFO] Updating 'pds' LDD. Schema location: https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1B00.xsd

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

@al-niessner
Contributor

@tloubrieu-jpl

I am not sure what the right response is here. The OpenSearch mapping already treats insight:Observation_Information/insight:release_number as a keyword. Once a field's mapping has been set, it cannot be changed without a remap, hence the error. Did you want harvest to remember who set insight:Observation_Information/insight:release_number to a keyword? Did you want harvest to do a remap?
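
As a rough illustration of the conflict described above, the following sketch compares the field types already present in an index with the types a new schema update requests. The function name `find_mapping_conflicts`, the flat field-to-type dictionaries, and the field `pds:Example/pds:new_field` are hypothetical and are not taken from harvest or from the OpenSearch client.

```python
def find_mapping_conflicts(existing, requested):
    """Return {field: (existing_type, requested_type)} for every field whose
    requested type differs from the type already set in the index."""
    return {
        field: (existing[field], new_type)
        for field, new_type in requested.items()
        if field in existing and existing[field] != new_type
    }


# Types already in the index (the field was first created as a keyword).
existing = {"insight:Observation_Information/insight:release_number": "keyword"}

# Types requested by the schema update; the second field is a made-up example.
requested = {
    "insight:Observation_Information/insight:release_number": "integer",
    "pds:Example/pds:new_field": "keyword",
}

conflicts = find_mapping_conflicts(existing, requested)
for field, (old, new) in conflicts.items():
    print(f"mapper [{field}] cannot be changed from type [{old}] to [{new}]")
```

New fields pass through without complaint; only a field that already has a different type is reported, which matches the single field named in the log.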

@tloubrieu-jpl
Member Author

Hi @al-niessner ,

I guess the LDD was not available at first, so the default keyword type was assigned to the property.

This case should occur frequently, but I believe sweeper or a central service of some sort should update the mapping and do the remap.

This is not preventing harvest from loading the product anyway, is it?

@al-niessner
Contributor

@tloubrieu-jpl

I have no idea how or why it started as a keyword; I thought strings were the default, with keywords used for select items that have id in them, but maybe not. As the mapping fills out, re-mappings should only result from changes in the PDS schema.

I do not know if there is a sweeper that does the remapping. Somebody used to do it by hand but it may have made it to a sweeper.

harvest cannot ingest material into OpenSearch if the types do not match. In this case we could probably get away with it, but in general we cannot. The most pathological case is this: revision_number starts out as an integer, then the choice is made to move to semantic versioning, so it becomes a string. There is no way to then push 1.2.3 into a field that OpenSearch thinks is an integer. Therefore, remapping has to take place before the document can be ingested.
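
The asymmetry in the paragraph above can be shown with a small sketch. The helper `fits_integer_field` is hypothetical and uses Python's `int()` only as a rough stand-in for the parsing an integer-mapped field applies; OpenSearch's own coercion rules are more detailed.

```python
def fits_integer_field(value: str) -> bool:
    """Rough stand-in: can this raw string value be read as an integer?"""
    try:
        int(value)
    except ValueError:
        return False
    return True


# A release number such as "7" fits both an integer field and a keyword field,
# which is why the keyword/integer mismatch in this issue is comparatively benign.
print(fits_integer_field("7"))

# A semantic version has no integer reading, so an integer field must be
# remapped before such a document can be ingested.
print(fits_integer_field("1.2.3"))
```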

@tloubrieu-jpl
Member Author

tloubrieu-jpl commented Oct 29, 2024

To be continued; this issue requires serious design work.
Options are:

  • don't load with default type
  • do everything as default in an initial import, refine type in a secondary index
  • ...

@jordanpadams changed the title from "OpenSearch mapping conflict" to "OpenSearch mapping conflict issue when trying to change a type ([illegal_argument_exception])" Nov 7, 2024
@github-project-automation github-project-automation bot moved this to Sprint Backlog in B15.1 Nov 7, 2024
@alexdunnjpl
Contributor

It seems like, in the revision_number case, a move from int to semver would justify using a differently-named field? If it's a problem elsewhere though, agreed.

From what @al-niessner is saying, there is no way to cast arbitrary field values to string-likes.

Changing a mapping is, at this point (AOSS), a significant undertaking: it requires an ad-hoc migration of all documents, since the reindex operation is not available. So whatever behaviour we implement should bias heavily toward ensuring that mappings only need to be altered when absolutely necessary (i.e. prefer not creating a mapping over creating a mapping with a default value).

So if harvest is applying defaults, that should be changed: the field should be omitted from the mappings submitted to OpenSearch, and the user should be made aware that someField will not be searchable until it is available in a published DD (at which point the reindexing sweeper will handle fixing that document).
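
A minimal sketch of that proposal, under stated assumptions: harvest itself is not written in Python, and the names `build_mapping_update`, `ldd_types`, the logger name, and the field `insight:Example/insight:unknown_field` are all hypothetical. Only the `{"properties": {field: {"type": ...}}}` shape follows the usual OpenSearch mapping body. Fields with a type in a loaded LDD go into the mapping update; fields without one are left out and reported with a warning.

```python
import logging

logger = logging.getLogger("harvest_sketch")


def build_mapping_update(fields, ldd_types):
    """Build a mapping body for the fields whose type is known from an LDD.

    Fields with no known type are omitted rather than defaulted to keyword,
    so no mapping is created that would later have to be changed.
    Returns (mapping_body, skipped_fields).
    """
    properties = {}
    skipped = []
    for field in fields:
        field_type = ldd_types.get(field)
        if field_type is None:
            logger.warning(
                "Field %s has no type in any loaded LDD; it will not be "
                "searchable until a published dictionary defines it.",
                field,
            )
            skipped.append(field)
        else:
            properties[field] = {"type": field_type}
    return {"properties": properties}, skipped


# Types known from loaded LDDs (illustrative values only).
ldd_types = {"insight:Observation_Information/insight:release_number": "integer"}

mapping, skipped = build_mapping_update(
    [
        "insight:Observation_Information/insight:release_number",
        "insight:Example/insight:unknown_field",  # made-up field with no LDD entry
    ],
    ldd_types,
)
```

Returning the skipped fields alongside the mapping leaves the caller free to decide how loudly to report them.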

@al-niessner
Contributor

I do not think OpenSearch will add 'field' to documents already in an index when 'field' is added to the mapping. In that case, it will still require a re-index.

I concur: ditch the harvest default. Quit until the LDD is updated, even if we read a local LDD, because that puts it on the user side.

@tloubrieu-jpl
Member Author

I see your point @al-niessner, but I would wait for @jordanpadams's feedback on that. If we go that route, I agree it makes things simpler on our side, but it will also slow the integration of products into our registry.

So it depends on whether we want to:

  1. load as many products as we can, even though they are not fully ready on the user side (and so not as searchable, with more dev work to handle these cases), or
  2. wait for the products to be fully validated.

@tloubrieu-jpl
Member Author

@al-niessner, as a conclusion on this ticket, we need to make sure harvest does not assign a default mapping type to fields which have no LDD, because it is very complicated to change the type of an indexed field in the OpenSearch we use.

Products with fields of unknown type should still be loaded, but those fields must not be added to the mapping.

@al-niessner
Contributor

@tloubrieu-jpl

Do you want the message telling the user that the field will not be searchable to be an error or a warning? Nobody reads warnings or below until it is too late, but for harvest it is clearly a warning, while for the DB behavior it is clearly an error.

@tloubrieu-jpl
Member Author

Hi @al-niessner ,

I would lean toward a warning in harvest, since it does not prevent the products from being loaded. That is acceptable behavior.

Thanks

tloubrieu-jpl added a commit to NASA-PDS/registry-common that referenced this issue Dec 11, 2024
NASA-PDS/harvest#204: Throw error when field is not found in LDD vs. defaulting to keyword field
@github-project-automation github-project-automation bot moved this from ToDo to 🏁 Done in EN Portfolio Backlog Dec 11, 2024
@github-project-automation github-project-automation bot moved this from ToDo to 🏁 Done in B15.1 Dec 11, 2024
Projects
Status: 🏁 Done
Status: 🏁 Done
5 participants