[Design Revisit] Projection handling of $vector #461

Closed
maheshrajamani opened this issue Jul 10, 2023 · 3 comments

maheshrajamani commented Jul 10, 2023

The $vector field is designed to arrive as part of the JSON document. Since this data is stored as a separate field in the physical table, does it also need to be stored as part of the doc_json field? If it is not stored in doc_json, projection needs to handle it separately.
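
To make the two options concrete, here is a minimal sketch of the read path under each one. The row shape, the `vector_column` name, and both helper functions are hypothetical and only illustrate the question; they are not the project's actual code.

```python
import json


def read_document(row, vector_in_doc_json):
    """Rebuild the logical document from a stored row (hypothetical row shape)."""
    document = json.loads(row["doc_json"])
    if not vector_in_doc_json and row.get("vector_column") is not None:
        # Extra step needed only when doc_json omits $vector:
        # the value has to be merged back in from its own column.
        document["$vector"] = list(row["vector_column"])
    return document


def project(document, include):
    """Keep only the top-level fields named in `include`."""
    return {key: value for key, value in document.items() if key in include}


# Option A: $vector is duplicated inside doc_json.
row_full = {
    "doc_json": json.dumps({"_id": "a", "name": "x", "$vector": [0.1, 0.2]}),
    "vector_column": [0.1, 0.2],
}
# Option B: $vector lives only in its own column.
row_partial = {
    "doc_json": json.dumps({"_id": "a", "name": "x"}),
    "vector_column": [0.1, 0.2],
}

projected_full = project(read_document(row_full, True), {"_id", "$vector"})
projected_partial = project(read_document(row_partial, False), {"_id", "$vector"})
```

With option A, projection only ever touches doc_json. With option B, every read that may project $vector also has to fetch and merge the separate column.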


sync-by-unito bot commented Jul 27, 2023

➤ Mahesh Rajamani commented:

Aaron Morton, can you opine on this?

maheshrajamani changed the title from "[Design] Projection handling of $vector" to "[Design Revisit] Projection handling of $vector" on Jul 28, 2023

sync-by-unito bot commented Aug 18, 2023

➤ Mahesh Rajamani commented:

Aaron Morton, do we need to revisit this?


sync-by-unito bot commented Aug 21, 2023

➤ Aaron Morton commented:

I think we should keep it in the doc_json even though it is a duplication.

The reason is that if we drive a CDC process off the table, and the entire document is in the doc_json field, we can easily just grab that one field. There may be other situations where we want to be able to rebuild the document, and having the doc_json field store only part of the document would make that more error-prone.

With my long-term thinking hat on: we have a general rule that we only SELECT the doc_json field. The other fields, which store data extracted from doc_json, are only used in the WHERE (and ORDER BY) clauses, so we only use the SAI indexes on those fields. If we keep this rule, we can make storage optimizations that avoid storing the column value and store it only in the SAI index. We BREAK this rule with the current implementation of sorting, but it would be a great way to reduce the on-disk size.
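
As a rough illustration of that rule, the sketch below builds a read query whose select list is always just doc_json, while the extracted columns appear only in the WHERE and ORDER BY clauses. The helper, the table name, and the column names are hypothetical, and the `ANN OF` ordering is assumed to follow Cassandra's vector search syntax; this is not the project's actual query builder.

```python
def build_find_query(table, filter_columns, vector_column=None):
    """Build a CQL read that selects only doc_json (hypothetical helper).

    Extracted columns are referenced in WHERE / ORDER BY only, so their
    stored values are never read back, only their SAI indexes are used.
    """
    cql = f"SELECT doc_json FROM {table}"
    if filter_columns:
        # One bind marker per filtered column.
        cql += " WHERE " + " AND ".join(f"{col} = ?" for col in filter_columns)
    if vector_column is not None:
        # Vector similarity ordering; the vector itself is bound at execution.
        cql += f" ORDER BY {vector_column} ANN OF ?"
    return cql


plain_query = build_find_query("docs", ["status"])
vector_query = build_find_query("docs", ["status"], vector_column="vector_column")
```

Because no extracted column ever appears in the select list, a storage engine could in principle keep those values only in the index, which is the optimization described above.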

sync-by-unito bot closed this as completed Aug 21, 2023