Make doc values accessible from update scripts #29290
Comments
Hey @bra-fsn, thanks for opening this feature request. I agree that would be a big saver, but the update API can only update the document based on the unmodified _source. A normalizer may have modified the original content, so doc values can't be used as a replacement for the _source. Normalizers are not applied to numeric fields, but then you end up in cases where some doc value fields can be updated and some cannot, which makes it tricky. So this isn't something that we can support consistently in the near future with how we handle the original content of a document today. I'll leave this open for others to chime in, but even without the above concern this is a high hanging fruit.
Pinging @elastic/es-core-infra
Pinging @elastic/es-distributed
@martijnvg: Could you please give further details about what you've described above? Or are you talking about consistency issues (I don't know elastic internals), maybe that the doc values are updated differently than the doc itself, and a race condition could occur if more updates happen in parallel? (So for example in a painless script it's now guaranteed that the _source doesn't change while the script runs, but you can't guarantee this for doc values, so two subsequent calls to doc['field'] in the same script instance may give different results?)
Sure and the way you propose this it kind of already is up to the user by using the square bracket notation (
No, luckily that kind of problem will not happen. If a normalizer is configured for a field and it changes the content of a doc values field at index time, then the stored value is no longer the field value that was originally specified. If these doc values fields are then used to update a document, this can lead to an unexpected end result in the updated document. But besides this, with #29264 in, reading from doc values fields is no longer possible, because the _source may be read from the translog instead of the Lucene index, and doc values are only available in the Lucene index. Forcing a read from the Lucene index in order to support this is a bad idea, because that forces a refresh.
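To make the normalizer concern concrete, here is a minimal Python sketch. It is not Elasticsearch code: `lowercase_normalizer` and `index_document` are hypothetical stand-ins that only mimic the idea of keeping the original value in _source while a normalized value goes into doc values.

```python
def lowercase_normalizer(value):
    """Hypothetical normalizer: lowercases strings, leaves other types alone."""
    return value.lower() if isinstance(value, str) else value


def index_document(source):
    """Toy model of indexing: _source keeps the original values,
    doc values keep the normalized ones."""
    doc_values = {field: lowercase_normalizer(v) for field, v in source.items()}
    return {"_source": dict(source), "doc_values": doc_values}


indexed = index_document({"tag": "Foo-Bar", "counter": 1})

# The numeric field is identical in both places, the string field is not.
print(indexed["_source"]["tag"])     # original value as sent by the client
print(indexed["doc_values"]["tag"])  # normalized value
```

In this toy model, an update that rebuilt the document from `doc_values` would silently replace "Foo-Bar" with its lowercased form, while `counter` would survive unchanged. That is the inconsistency between field types described above.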
You mean this should work (at least in current GA versions)?
I think it's OK. If somebody uses this, this should be evident. The same is true of every other doc values operation which elastic currently allows (like in the search path).
First of all, this is great, we're suffering from slow scripted updates. Glad to hear that the previous optimization will work again (for updates at least).
Woops sorry I read the history of this conversation too quickly.
Oh no :), but that is the same syntax we use in the other places where scripts are allowed to access doc values.
@jpountz: Thanks for joining. :) Also, it could help reindexing as well. For example we have some fields which we can't store in _source, because the DB would be so big, but we want to index the field.
The fact that the update API doesn't allow the use of doc values is so important to keep complexity reasonable that I'm reluctant to add such a feature. I know it's less convenient, but it's always possible to implement the update logic on the client side by using versioning to make sure the right version of the document is being replaced.
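For future readers, the client-side approach with versioning can be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch client API: `VersionedStore`, `VersionConflict` and `update_with_retry` are hypothetical names, and the in-memory store only stands in for an index that rejects a write when the expected version no longer matches. How the client obtains the previous field value when it is not in _source is outside the scope of this sketch.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale document version."""


class VersionedStore:
    """Hypothetical in-memory stand-in for a versioned document index."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if someone else changed the document in between.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


def update_with_retry(store, doc_id, update_fn, max_retries=5):
    """Read, modify on the client, and write back only if the version is unchanged."""
    for _ in range(max_retries):
        version, source = store.get(doc_id)
        new_source = update_fn(source)
        try:
            return store.index(doc_id, new_source, expected_version=version)
        except VersionConflict:
            continue  # another writer won the race: re-read and try again
    raise VersionConflict(f"gave up on {doc_id} after {max_retries} attempts")


store = VersionedStore()
store.index("1", {"counter": 1})
new_version = update_with_retry(
    store, "1", lambda src: {**src, "counter": src["counter"] + 1}
)
```

The retry loop is the part the update API would otherwise do on the server: each attempt re-reads the document, so an increment is never applied on top of a stale value.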
We discussed this in FixItDay and agreed we do not want to introduce the complexity this would entail. I'm going to close this issue since there is technical consensus that we should not make such a change.
There's another aspect here that I didn't see mentioned but I think is important for future readers: allowing access to Lucene features (like doc values) collides with reading from the translog, and also prevents us from shifting to running the update scripts on the coordinating nodes (#8369), either as a dedicated API or as part of bulk.
Feature request:
Elasticsearch supports a very compact representation/storage of data: the doc values.
Sometimes it's useful to omit fields from _source and store them only in doc values; this can be a real space saver (#27374 (comment)).
Of course when you do this, you can't do partial updates on documents, because the previous value of the field is only stored in doc values.
This could be solved by making doc values accessible from update scripts, so you could do a scripted update like this:
ctx._source.counter=doc['counter']+1;
It would be nice if doc values could be accessed this way in update scripts along with the previous version of _source.