
Docs for synthetic source #87416

Merged
merged 29 commits into from
Jun 9, 2022

nik9000
Member

@nik9000 nik9000 commented Jun 6, 2022

This adds some basic docs for synthetic source, both to get us started
documenting it and to show how I'd like it documented: a
central section in the docs for `_source` and "satellite" sections in
each of the supported field types that link back to the central section.

@nik9000 nik9000 added >docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics v8.4.0 labels Jun 6, 2022
@elasticmachine elasticmachine added Team:Docs Meta label for docs team Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 6, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000 nik9000 requested review from davidkyle and kilfoyle and removed request for davidkyle June 6, 2022 17:21
@nik9000
Member Author

nik9000 commented Jun 6, 2022

I've pushed docs for the remaining fields.

@nik9000 nik9000 mentioned this pull request Jun 6, 2022
50 tasks
@nik9000
Member Author

nik9000 commented Jun 7, 2022

> One thing I forgot to ask... near the bottom of the main _source field page there's the warning "Think before disabling the _source field". If those restrictions apply to synthetic source as well, we may want to move that warning up to the top of the page, or something like that.

For the most part, none of those restrictions apply to synthetic source; it exists to solve those problems. The debugging and corruption points are a little more complicated.

@nik9000
Member Author

nik9000 commented Jun 7, 2022

I pushed some more stuff into the "text" page as well.

@nik9000 nik9000 requested a review from kilfoyle June 7, 2022 15:15
"bar": 2,
"baz": 3,
"foo": 1
}
Member

Reading this, and as much as I understand why you mention it, key ordering is never guaranteed in JSON land. I wonder whether an example is needed at all. Could we shorten it and say "You should never rely on key ordering, but if you do, beware that you'll lose it with synthetic source, as what you get back is not 100% what you sent"?

Member Author

I get that. I'm providing an example for everything so I like the.... balance? I don't know the right word. I could link to the spec and mention that ordering isn't supported by the spec.

Member

I don't have a strong opinion, but while the array examples ought to be mentioned, taking this one out would make the docs page a little more compact without losing much?

Member Author

We need a tie breaker. @romseygeek, break a tie!

Member

No pressure :)

Contributor

I'm with @javanna, we don't guarantee key ordering anywhere so I don't think an example is necessary here.

Member Author

Democracy wins. I'll remove the example and reduce this to a note.
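For readers who want to see this for themselves, here is a minimal sketch written as shell functions in the same curl style as the sessions in this thread. It assumes the 8.4-era `_source.mode: synthetic` mapping option (check the `_source` docs for your version, since the option has changed in later releases); `es`, `ES_URL`, `ES_AUTH`, `DRY_RUN` and the other function names are hypothetical. It makes no claim about the exact order in which the keys come back, only that you should not depend on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send METHOD PATH [BODY] to ES_URL with curl,
# or only print the request when DRY_RUN=1.
es() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; return 0; fi
  curl -s -u "${ES_AUTH:-elastic:password}" -H 'Content-Type: application/json' \
    -X "$1" "${ES_URL:-localhost:9200}$2" ${3:+-d "$3"}
  echo  # responses have no trailing newline
}

# Create an index with synthetic _source (assumed 8.4-era `_source.mode` option).
create_index() { es PUT "/$1" '{"mappings":{"_source":{"mode":"synthetic"}}}'; }

# Send the keys in non-alphabetical order.
index_doc() { es PUT "/$1/_doc/1" '{"foo":1,"bar":2,"baz":3}'; }

# Read the document back; _source is rebuilt here, so key order may differ.
get_doc() { es GET "/$1/_doc/1"; }
```

Source the file, then run `create_index test`, `index_doc test` and `get_doc test`. Compare the returned `_source` by key (for example after parsing it as JSON) rather than as a string.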

{
"foo.bar.baz": 1
}
----
Member

I wonder what is specific to synthetic source about how fields are mapped: isn't this all about dynamic mappings? Did you mean to say that synthetic `_source` does or does not recreate the object structure depending on how fields are mapped?

Member Author

It recreates the structure precisely as the objects are mapped. At first I tried to explicitly create a field mapped as `foo.bar.baz`, but the mapping infrastructure unraveled it into objects, so I went with this. I'll make an example, one moment.

Member Author

Yeah, check this out:

$ curl -uelastic:password -XDELETE localhost:9200/test
{"acknowledged":true}
$ curl -uelastic:password -XPUT -HContent-Type:application/json localhost:9200/test -d'{
  "mappings": {
    "properties": {
      "foo.bar.baz": {
        "type": "keyword"
      }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}
$ curl -uelastic:password 'localhost:9200/test/_mappings?pretty'
{
  "test" : {
    "mappings" : {
      "properties" : {
        "foo" : {
          "properties" : {
            "bar" : {
              "properties" : {
                "baz" : {
                  "type" : "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

Compare:

$ curl -uelastic:password -XDELETE localhost:9200/test
{"acknowledged":true}
$ curl -uelastic:password -XPUT -HContent-Type:application/json localhost:9200/test -d'{
  "mappings": {
    "subobjects": false,
    "properties": {
      "foo.bar.baz": {
        "type": "keyword"
      }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}
$ curl -uelastic:password 'localhost:9200/test/_mappings?pretty'
{
  "test" : {
    "mappings" : {
      "subobjects" : false,
      "properties" : {
        "foo.bar.baz" : {
          "type" : "keyword"
        }
      }
    }
  }
}

So I wrote the example this way because it was the shortest way to get objects with dots in the name. But maybe it's unclear?

Member

Ok, I guess I was wondering why this needs to be specifically explained, isn't it the expected behaviour? :)

Member Author

I figured it'd be useful to explain "this does the right thing with that thing Luca just built". But maybe it's not worth it because it's not something folks want much?

Member

@javanna javanna Jun 7, 2022

Scratch that, maybe this is not so obvious :)
One thing worth mentioning is that you always recreate the object structure, even when dots in field names were provided in the first place (unless subobjects are disabled). Why not always flatten, by the way? After all, there are two ways to provide that document and both lead to the same mapping, so you have a 50% chance of picking the variant that was sent in the first place.

By the way, we may start accepting docs with subobjects even when subobjects are disabled and treat them like dotted fields, but I think the current "flattening" behaviour would still be good in that case.
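A sketch of the two spellings javanna describes, assuming the 8.4-era `_source.mode: synthetic` mapping option and default dynamic mapping. The helper and function names (`es`, `index_dotted`, `index_objects`, `show`) and the `ES_URL`/`ES_AUTH`/`DRY_RUN` variables are hypothetical. Both documents should produce the same `foo`/`bar`/`baz` object mapping, and per nik9000's explanation earlier in the thread, synthetic source rebuilds both in the object form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send METHOD PATH [BODY] to ES_URL with curl,
# or only print the request when DRY_RUN=1.
es() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; return 0; fi
  curl -s -u "${ES_AUTH:-elastic:password}" -H 'Content-Type: application/json' \
    -X "$1" "${ES_URL:-localhost:9200}$2" ${3:+-d "$3"}
  echo  # responses have no trailing newline
}

# Synthetic _source with default dynamic mapping (assumed `_source.mode` option).
create_index() { es PUT "/$1" '{"mappings":{"_source":{"mode":"synthetic"}}}'; }

# Two spellings of the same field; dynamic mapping expands the dotted name
# into the objects foo and foo.bar, with baz as the leaf field.
index_dotted()  { es PUT "/$1/_doc/1" '{"foo.bar.baz":1}'; }
index_objects() { es PUT "/$1/_doc/2" '{"foo":{"bar":{"baz":1}}}'; }

# Inspect the resulting mapping, then fetch both documents back.
show() {
  es GET "/$1/_mapping"
  es GET "/$1/_doc/1"
  es GET "/$1/_doc/2"
}
```

After sourcing, `create_index test; index_dotted test; index_objects test; show test` prints one mapping and two documents; the thread's expectation is that both `_source` bodies come back nested.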

Member Author

I've removed the section

Member Author

Oh no! I removed the section!

> Why not always flatten by the way?

We asked some Kibana friends and they said folks liked things shaped this way instead of flattened; the objects feel right to folks. Also, I figured that if we were guessing anyway, the "more objecty" approach was more likely to match what users sent. I figured flattened fields with dots in their names are rarer.

Member

Fine with me. I think you removed only the section with `subobjects: false`, and you could make the remaining section about recreating objects shorter by saying that you follow the mapping structure, hence prefer nested objects over dots in field names, but the opposite when subobjects are disabled.

Member Author

> I think you removed only the section with `subobjects: false`, and you could make the remaining section about recreating objects shorter by saying that you follow the mapping structure, hence prefer nested objects over dots in field names, but the opposite when subobjects are disabled.

Two things:

  1. Don't use the word nested unless you mean it. I get scared.
  2. Do you want me to change the words above? Could you suggest a change? I think the "dynamic mappings work like BLAH" bit is useful because it explains where those objects come from. I don't mean to explain all of dynamic mappings, just enough to give folks a hint.

Contributor

@kilfoyle kilfoyle left a comment

LGTM! 🚗
Thanks a bunch for adding these Nik. They look great.

nik9000 and others added 3 commits June 7, 2022 16:47
ifeval::["{release-state}"=="unreleased"]
In some cases, particularly where reducing the volume of your stored data is a
priority, you may consider either using <<synthetic-source,synthetic `_source`>>
or <<disable-source-field,disabling the `_source` field>> altogether.
Member

Is it worth quickly describing the difference between these two options, or do people need to go to the corresponding pages to find out?
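One way to sketch the difference, assuming the 8.4-era `_source.mode: synthetic` mapping option alongside the long-standing `_source.enabled: false`. Names such as `es`, `create_synthetic_source_index`, `create_disabled_source_index`, `ES_URL`, `ES_AUTH` and `DRY_RUN` are hypothetical. With synthetic source, a `GET` of a document still returns a `_source`, rebuilt from the indexed fields; with `_source` disabled, there is no `_source` to return.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send METHOD PATH [BODY] to ES_URL with curl,
# or only print the request when DRY_RUN=1.
es() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; return 0; fi
  curl -s -u "${ES_AUTH:-elastic:password}" -H 'Content-Type: application/json' \
    -X "$1" "${ES_URL:-localhost:9200}$2" ${3:+-d "$3"}
  echo  # responses have no trailing newline
}

# Synthetic _source: the original JSON is not stored as-is, but _source is
# reconstructed from the indexed fields on retrieval (assumed option name).
create_synthetic_source_index() {
  es PUT "/$1" '{"mappings":{"_source":{"mode":"synthetic"}}}'
}

# Disabled _source: nothing is stored or reconstructed, so GET returns no _source.
create_disabled_source_index() {
  es PUT "/$1" '{"mappings":{"_source":{"enabled":false}}}'
}
```

Creating one index of each kind, indexing the same document into both and running `GET <index>/_doc/<id>` against each is a quick way to see the trade-off.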


Though very handy to have around, the source field takes up a significant amount
of space on disk. Instead of storing source documents on disk exactly as you
send them, Elasticsearch can reconstruct source content on the fly. Enable this
Member

Elasticsearch can reconstruct source content on the fly upon retrieval?

====== Fields named as they are mapped
Synthetic source names fields as they are named in the mapping. When used
with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
default, interpreted as multiple objects. For example:
Member

, while dots in field names are preserved within objects that have subobjects disabled (with a link to the docs for the subobjects parameter)
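A sketch of the `subobjects: false` case from the comparison session earlier in the thread, combined with synthetic source. It assumes the 8.4-era `_source.mode: synthetic` option works together with root-level `subobjects: false` (the thread suggests it does, but verify on your version); the helper and function names and `ES_URL`/`ES_AUTH`/`DRY_RUN` are hypothetical. Here the mapping keeps `foo.bar.baz` as a single field name, so the dotted name is what synthetic source is expected to return.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send METHOD PATH [BODY] to ES_URL with curl,
# or only print the request when DRY_RUN=1.
es() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; return 0; fi
  curl -s -u "${ES_AUTH:-elastic:password}" -H 'Content-Type: application/json' \
    -X "$1" "${ES_URL:-localhost:9200}$2" ${3:+-d "$3"}
  echo  # responses have no trailing newline
}

# Root-level subobjects: false keeps "foo.bar.baz" as one leaf field name
# instead of expanding it into foo -> bar -> baz objects.
create_flat_index() {
  es PUT "/$1" '{"mappings":{"_source":{"mode":"synthetic"},"subobjects":false,"properties":{"foo.bar.baz":{"type":"keyword"}}}}'
}

index_doc() { es PUT "/$1/_doc/1" '{"foo.bar.baz":"x"}'; }
get_doc()   { es GET "/$1/_doc/1"; }
```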

Member

@javanna javanna left a comment

I left a couple small comments, LGTM otherwise. Thanks!

@nik9000
Member Author

nik9000 commented Jun 9, 2022

> I left a couple small comments, LGTM otherwise. Thanks!

🤘

I'll integrate.

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 9, 2022
@elasticsearchmachine elasticsearchmachine merged commit b18bafb into elastic:master Jun 9, 2022
@nik9000 nik9000 deleted the synthetic_source_docs_1 branch June 9, 2022 13:42

6 participants