feat(strucutred-properties): structured properties delete and schema change support #10711

@david-leifker david-leifker commented Jun 14, 2024

Structured Properties Delete & Breaking Change Support

  • Structured property indexing change to include a version and type field

    • Backwards compatible with legacy non-versioned properties
    • Versions must increase monotonically under a case-insensitive lexicographic sort
    • Backwards-incompatible changes are allowed as long as the version is increased
    • Old values are not removed; on a breaking change, the target ES index field changes
    • Only the single latest version is usable in the ES index
  • Structured property hard delete implemented via MCP SideEffect post transaction

    • Relies on entity search to identify and generate Patch MCPs for existing values applied to entities
    • Hard delete on the structured property removes all versions and types
    • Index mapping clean-up requires a reindex after all non-active values have been removed
  • Bug Fixes

    • Filtering on string structured property values
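
The version ordering rule above can be illustrated with a minimal sketch. The function name `is_valid_next_version` is hypothetical and not taken from the DataHub codebase, and `str.casefold()` is only one assumed way to make the comparison case-insensitive. Because the sort is lexicographic, numeric-looking versions compare as strings, so `"10"` sorts before `"9"`; fixed-width versions such as date stamps avoid that surprise.

```python
from typing import Optional


def is_valid_next_version(current: Optional[str], proposed: str) -> bool:
    """Return True if `proposed` sorts strictly after `current`, ignoring case.

    Hypothetical helper: a legacy (non-versioned) property is modelled here as
    `current=None`, in which case any proposed version is accepted.
    """
    if current is None:
        return True
    # Lexicographic, case-insensitive comparison of the two version strings.
    return proposed.casefold() > current.casefold()


if __name__ == "__main__":
    print(is_valid_next_version("20240614", "20240615"))  # later date stamp
    print(is_valid_next_version("v2", "V2"))              # same version, different case
    print(is_valid_next_version("9", "10"))               # string order, not numeric order
```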

Old Index Field Path
structuredProperties.<qualified name>
New Index Field Path
structuredProperties._versioned.<qualified name>.<version>.<type>

<qualified name> - The structured property's qualified name, with "." replaced by "_" to comply with ES index field naming restrictions.
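
As a rough sketch of how the two field paths relate, the helper below builds either form from a qualified name. The function names, the example qualified name, and the version and type values are assumptions made for illustration; only the path templates and the "." to "_" substitution come from the description above.

```python
from typing import Optional

LEGACY_PREFIX = "structuredProperties"
VERSIONED_PREFIX = "structuredProperties._versioned"


def sanitize_qualified_name(qualified_name: str) -> str:
    """Replace "." with "_" so the name is usable as a single ES field segment."""
    return qualified_name.replace(".", "_")


def index_field_path(
    qualified_name: str,
    version: Optional[str] = None,
    value_type: Optional[str] = None,
) -> str:
    """Build the legacy path when no version is given, else the versioned path."""
    name = sanitize_qualified_name(qualified_name)
    if version is None:
        return f"{LEGACY_PREFIX}.{name}"
    if value_type is None:
        raise ValueError("value_type is required for a versioned field path")
    return f"{VERSIONED_PREFIX}.{name}.{version}.{value_type}"


if __name__ == "__main__":
    # Example inputs are illustrative only.
    print(index_field_path("io.example.retentionTime"))
    print(index_field_path("io.example.retentionTime", "20240614", "number"))
```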

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment smoke_test Contains changes related to smoke tests labels Jun 14, 2024
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from b16ea78 to f1c6cc6 Compare June 15, 2024 01:29
@david-leifker david-leifker marked this pull request as ready for review June 15, 2024 01:30
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from f1c6cc6 to acd7214 Compare June 15, 2024 13:36
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from acd7214 to 6437f63 Compare June 15, 2024 13:51
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from 6437f63 to 725c0a9 Compare June 17, 2024 22:28
@david-leifker david-leifker changed the title feat(strucutred-properties): structured properties schema change support feat(strucutred-properties): structured properties delete and schema change support Jun 17, 2024
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from 725c0a9 to 3154abf Compare June 17, 2024 23:12
* Structured property indexing change to include a version and type field
  * Backwards compatible with legacy non-versioned properties
  * Versions must increase monotonically under a case-insensitive lexicographic sort
  * Backwards-incompatible changes are allowed as long as the version is increased
  * Old values are not removed; on a breaking change, the target ES index field changes
  * Only the single latest version is usable in the ES index
* Structured property hard delete implemented via MCP SideEffect post transaction
  * Relies on entity search to identify and generate Patch MCPs for existing values applied to entities
  * Hard delete on the structured property removes all versions and types
  * Elasticsearch reindexing process can remove deleted structured property mappings
* Bug Fixes:
  * Filtering on string structured property values
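
The hard-delete flow described in this commit message (search for entities that carry the property, then emit one patch per entity) could look roughly like the sketch below. Everything here is an assumption for illustration: `removal_patches`, the `find_entities` callback, the example URNs, and the dictionary layout are hypothetical and do not reproduce DataHub's actual MCP schema or SideEffect API.

```python
from typing import Callable, Dict, Iterable, Iterator


def removal_patches(
    property_urn: str,
    find_entities: Callable[[str], Iterable[str]],
) -> Iterator[Dict]:
    """Yield one patch-style change proposal per entity that has the property.

    `find_entities` stands in for the entity search step: given a structured
    property URN, it returns the URNs of entities with a value for it.
    """
    for entity_urn in find_entities(property_urn):
        yield {
            "entityUrn": entity_urn,
            "aspectName": "structuredProperties",
            "changeType": "PATCH",
            # Illustrative JSON-Patch-style removal; the real path format may differ.
            "patch": [{"op": "remove", "path": f"/properties/{property_urn}"}],
        }


if __name__ == "__main__":
    def fake_search(urn: str) -> Iterable[str]:
        # Stand-in for a search index lookup.
        return ["urn:li:dataset:example-1", "urn:li:dataset:example-2"]

    for proposal in removal_patches("urn:li:structuredProperty:io.example.retentionTime", fake_search):
        print(proposal["entityUrn"], proposal["patch"])
```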
@david-leifker david-leifker force-pushed the structured-properties-schema-change branch from 3154abf to c5f5fb0 Compare June 18, 2024 00:07
@david-leifker david-leifker merged commit 333799c into datahub-project:master Jun 18, 2024
63 of 65 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
yoonhyejin pushed a commit that referenced this pull request Jul 16, 2024
@Nbagga14

@david-leifker
After this change we are seeing an issue with case-insensitive searches. Could you please help with a fix?
We are encountering an issue with the /entities?action=search endpoint, where results are returned only on an exact case match. For instance, if the indexed value is "Test", results are returned only when the search query is "Test"; queries such as "TEST" or "test" return no results. We tried setting the ELASTICSEARCH_QUERY_EXACT_MATCH_CASE_FACTOR environment variable to both 0.0 and 1.0, but this did not resolve the issue. Could you please advise how to configure search to be case-insensitive, so that results are returned regardless of the case of the indexed data? Thank you.

This pull request was closed.