Docs for synthetic source #87416
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-analytics-geo (Team:Analytics)
Pinging @elastic/es-search (Team:Search)
This adds some basic docs for synthetic source both to get us started documenting it and to show how I'd like to get it documented - with a central section in the docs for `_source` and "satellite" sections in each of the supported field types that link back to the central section.
I've pushed docs for the remaining fields.
For the most part none of those things are true for synthetic source. The debugging one is a little complicated and so is the corruption. But for the most part, no, synthetic source exists to solve those problems.
I pushed some more stuff into the "text" page as well.
  "bar": 2,
  "baz": 3,
  "foo": 1
}
Reading this, as much as I understand why you mention it, key ordering is never a guarantee in JSON land. I wonder whether an example is needed, then. Could we shorten it and say "You should never rely on key ordering, but if you do, beware that you'll lose it with synthetic source, because what you get back is not 100% what you sent"?
I get that. I'm providing an example for everything so I like the.... balance? I don't know the right word. I could link to the spec and mention that ordering isn't supported by the spec.
I don't have a strong opinion but while the examples with arrays ought to be mentioned, taking this one out would make the docs page a little more compact without losing much?
We need a tie breaker. @romseygeek, break a tie!
No pressure :)
I'm with @javanna, we don't guarantee key ordering anywhere so I don't think an example is necessary here.
Democracy wins. I'll remove the example. I'll reduce this to a note.
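To make the key-ordering point concrete, here is a minimal Python sketch (not Elasticsearch code). It assumes the document was sent with the keys in the order `foo`, `bar`, `baz` (the original order is not shown in this thread) and uses `sort_keys=True` as a stand-in for a source that comes back with its keys in sorted order, as in the snippet above. The helper name `same_document` is made up for illustration.

```python
import json

# Hypothetical document as the client sent it (order assumed for illustration).
sent = '{"foo": 1, "bar": 2, "baz": 3}'

# Stand-in for a reconstructed source whose keys come back in sorted order.
returned = json.dumps(json.loads(sent), sort_keys=True)


def same_document(a: str, b: str) -> bool:
    """Compare two JSON documents by content, ignoring object key order."""
    return json.loads(a) == json.loads(b)


print(sent == returned)               # byte-for-byte comparison differs
print(same_document(sent, returned))  # parsed comparison ignores key order
```

Comparing parsed documents rather than raw strings is the safe habit here, since JSON objects carry no ordering guarantee in the first place.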
{
  "foo.bar.baz": 1
}
----
I wonder what is specific to synthetic source about how fields are mapped: isn't this all about dynamic mappings? Did you mean to say that synthetic `_source` recreates the object structure, or not, depending on how fields are mapped?
It recreates the structure precisely as the objects are mapped. At first I tried to explicitly create a field mapped as `foo.bar.baz`, but the mapping infrastructure unraveled it so I went with this. I'll make an example, one moment.
Yeah, check this out:
$ curl -uelastic:password -XDELETE localhost:9200/test
$ curl -uelastic:password -XPUT -HContent-Type:application/json localhost:9200/test -d'{
  "mappings": {
    "properties": {
      "foo.bar.baz": {
        "type": "keyword"
      }
    }
  }
}'
$ curl -uelastic:password localhost:9200/test/_mappings?pretty
{"acknowledged":true}{"acknowledged":true,"shards_acknowledged":true,"index":"test"}{
  "test" : {
    "mappings" : {
      "properties" : {
        "foo" : {
          "properties" : {
            "bar" : {
              "properties" : {
                "baz" : {
                  "type" : "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
Compare:
$ curl -uelastic:password -XDELETE localhost:9200/test
$ curl -uelastic:password -XPUT -HContent-Type:application/json localhost:9200/test -d'{
  "mappings": {
    "subobjects": false,
    "properties": {
      "foo.bar.baz": {
        "type": "keyword"
      }
    }
  }
}'
$ curl -uelastic:password localhost:9200/test/_mappings?pretty
{"acknowledged":true}{"acknowledged":true,"shards_acknowledged":true,"index":"test"}{
  "test" : {
    "mappings" : {
      "subobjects" : false,
      "properties" : {
        "foo.bar.baz" : {
          "type" : "keyword"
        }
      }
    }
  }
}
So I wrote the example this way because it was the shortest way to get objects with dots in the name. But maybe it's unclear?
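For readers who want the gist of the two transcripts without a running cluster, the unraveling can be sketched in a few lines of Python. This only models the observable result shown above; it is not how Elasticsearch's mapping code is written, and `expand_properties` is a made-up name.

```python
def expand_properties(properties: dict, subobjects: bool = True) -> dict:
    """Mimic how a dotted property name shows up in the returned mapping.

    With subobjects enabled, "foo.bar.baz" becomes foo -> bar -> baz.
    With subobjects disabled, the dotted name is kept as a single field.
    """
    if not subobjects:
        return dict(properties)
    root: dict = {}
    for name, field in properties.items():
        *parents, leaf = name.split(".")
        node = root
        for parent in parents:
            # Descend into (or create) the intermediate object's "properties".
            node = node.setdefault(parent, {}).setdefault("properties", {})
        node[leaf] = field
    return root


mapping = {"foo.bar.baz": {"type": "keyword"}}
print(expand_properties(mapping))                    # objects, as in the first transcript
print(expand_properties(mapping, subobjects=False))  # dotted name, as in the second
```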
Ok, I guess I was wondering why this needs to be specifically explained, isn't it the expected behaviour? :)
I figured it'd be useful to explain "this does the right thing with that thing Luca just built". But maybe it's not worth it because it's not something folks want much?
Scratch that, maybe this is not so obvious :)
One thing to mention could be that you always recreate the object structure even when dots in field names were provided in the first place (unless subobjects are disabled). Why not always flatten, by the way? After all, there are two ways to provide that document and both lead to the same mapping, so you have a 50% chance of picking the variant that was sent in the first place.
By the way we may start accepting docs with subobjects even when subobjects are disabled and treat them like dotted fields, but I think the current behaviour of "flattening" would still be good in that case.
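The "two ways to provide that document" point can be illustrated with a small Python sketch. It is an assumption-laden model, not Elasticsearch's implementation: `flatten` and `rebuild` are hypothetical helpers, and a path that is both a leaf and a parent of another leaf is not handled.

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Collapse nested objects into {dotted_path: value} leaves."""
    leaves = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            leaves.update(flatten(value, path + "."))
        else:
            leaves[path] = value
    return leaves


def rebuild(leaves: dict, subobjects: bool = True) -> dict:
    """Rebuild a document from leaf paths, following the mapping's shape."""
    if not subobjects:
        # Dotted names are kept as plain field names.
        return dict(sorted(leaves.items()))
    root: dict = {}
    for path in sorted(leaves):
        *parents, leaf = path.split(".")
        node = root
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = leaves[path]
    return root


dotted = {"foo.bar.baz": 1}
objects = {"foo": {"bar": {"baz": 1}}}

# Both variants collapse to the same leaf, so the original shape is lost.
print(flatten(dotted) == flatten(objects))
print(rebuild(flatten(dotted)))                     # object form
print(rebuild(flatten(objects), subobjects=False))  # dotted form
```

Because both inputs reduce to the same leaf path, the rebuilt shape has to be a choice, and following the mapping structure is the choice discussed in this thread.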
I've removed the section
Oh no! I removed the section!
> Why not always flatten by the way?
We asked some Kibana friends and they said folks liked things shaped this way instead of flattened. The objects feel right to folks. Also, I figured if we were guessing anyway, the "more objecty" approach was probably more likely to match what users sent. I figured flattened fields with dots in them are rarer.
Fine with me. I think you removed only the section with subobjects false, and you could make the remaining section about recreating objects shorter by saying that you follow the mapping structure, hence prefer nested objects over dots in field names, but the opposite when subobjects are disabled.
> I think you removed only the section with subobjects false and you could make the remaining section about recreating objects shorter by saying that you follow the mappings structure hence prefer nested objects over dots in fields names but the opposite when subobjects are disabled.
Two things:
- Don't use the word `nested` unless you mean it. I get scared.
- Do you want me to change the words above? Could you suggest a change? I think the "dynamic mappings work like BLAH" bit is useful because it explains where those objects come from. I don't mean to explain all of dynamic mappings, just enough to give folks a hint.
LGTM! 🚗
Thanks a bunch for adding these Nik. They look great.
Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
ifeval::["{release-state}"=="unreleased"]
In some cases, particularly where reducing the volume of your stored data is a
priority, you may consider either using <<synthetic-source,synthetic `_source`>>
or <<disable-source-field,disabling the `_source` field>> altogether.
Is it worth quickly describing the difference between these two options, or do people need to go to their corresponding pages to find out?
Though very handy to have around, the source field takes up a significant amount
of space on disk. Instead of storing source documents on disk exactly as you
send them, Elasticsearch can reconstruct source content on the fly. Enable this
Elasticsearch can reconstruct source content on the fly upon retrieval?
====== Fields named as they are mapped
Synthetic source names fields as they are named in the mapping. When used
with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
default, interpreted as multiple objects. For example:
, while dots in field names are preserved within objects that have subobjects disabled (with a link to the docs for the subobjects parameter)
I left a couple small comments, LGTM otherwise. Thanks!
🤘 I'll integrate.