-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the number of nested documents #27405
Limit the number of nested documents #27405
Conversation
bf33198
to
a5c68ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mayya-sharipova, I did a first round of review and left a couple of comments, mostly minor. The change looks good already, can you please also add test that show that the default values work as expected (e.g. intergration tests) and maybe a rest test for a small setting (like the ones in the unit test you already have)
@@ -305,6 +305,10 @@ public void addDynamicMapper(Mapper update) { | |||
|
|||
private SeqNoFieldMapper.SequenceIDFields seqID; | |||
|
|||
private final Long maxAllowedNumNestedDocs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can probably be a primitive long
@@ -305,6 +305,10 @@ public void addDynamicMapper(Mapper update) { | |||
|
|||
private SeqNoFieldMapper.SequenceIDFields seqID; | |||
|
|||
private final Long maxAllowedNumNestedDocs; | |||
|
|||
private Long numNestedDocs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can probably be a primitive long
@@ -321,6 +325,8 @@ public InternalParseContext(@Nullable Settings indexSettings, DocumentMapperPars | |||
this.version = null; | |||
this.sourceToParse = source; | |||
this.dynamicMappers = new ArrayList<>(); | |||
this.maxAllowedNumNestedDocs = indexSettings.getAsLong(MapperService.INDEX_MAPPING_NESTED_DOCS_LIMIT_SETTING.getKey(), 10000L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if there is a way to get the default value from the MapperService.INDEX_MAPPING_NESTED_DOCS_LIMIT_SETTING itself, since it seems to be defined there. Would be good to not have to define the constant in two places in case it needs changing in the future. I did some initial digging but need to understand this a bit better myself I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher thanks Christoph! I agree it is strange to define the constant in two places, but so far did not find the way. I will investigate more the way to define it a single place only.
@@ -366,6 +372,13 @@ public Document doc() { | |||
|
|||
@Override | |||
protected void addDoc(Document doc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, is this only called for nested documents? I see there is another ParseContext implementation with "addDoc()", I guess that one isn't used for parsing nested documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher Yes, this method only invoked to add nested documents. The Lucene document for the non nested parts of the document being indexed is added to the documents
list in the constructor of this class.
@@ -524,4 +523,95 @@ public void testParentObjectMapperAreNested() throws Exception { | |||
assertFalse(objectMapper.parentObjectMapperAreNested(mapperService)); | |||
} | |||
|
|||
public void testLimitNestedDocs() throws Exception{ | |||
// setting limit to allow only two nested objects in the whole doc | |||
Long nested_docs_limit = 2L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: long
, otherwise it gets outboxed later anyway
DocumentMapper docMapper = mapperService.documentMapperParser().parse("type", new CompressedXContent(mapping)); | ||
|
||
// parsing a doc with 2 nested objects succeeds | ||
SourceToParse source1 = SourceToParse.source("test1", "type", "1", XContentFactory.jsonBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion so make this kind of xContent setup easier to read and less prone to accidental reformatting. I sometimes like to do something a little bit more verbose like this:
XContentBuilder docBuilder = XContentFactory.jsonBuilder();
docBuilder.startObject();
{
docBuilder.startArray("nested1");
{
docBuilder.startObject().field("field1", "11").field("field2", "21").endObject();
docBuilder.startObject().field("field1", "12").field("field2", "22").endObject();
}
docBuilder.endArray();
}
docBuilder.endObject();
SourceToParse source1 = SourceToParse.source("test1", "type", "1", docBuilder.bytes(), XContentType.JSON);
It allows to add some indentation that won't break so easily when reformatting the source.
); | ||
} | ||
|
||
public void testLimitNestedDocsMultipleNestedFields() throws Exception{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting (and a great test by the way), because I just realized that the maximum number of nested docs check is applied across all nested fields in the document. I didn't immediately get that from the desciption in #26962 or the PR so far, maybe worth adding to the migration notest and docs as well.
The maximum number of `nested` json objects within a single document across | ||
all nested fields, defaults to 10000. Indexing one document with an array of | ||
100 objects within a nested field, will actually create 101 documents, as | ||
each nested object will be indexed as a separate hidden document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add something that this limit applies accros all nested fields in a document. (see above)
@cbuescher Christoph, thanks for your review. I have tried to address your comments in the 2nd commit, can you please continue/finalize the review when you have available time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayya-sharipova thanks a lot, this looks very close. I left one more comment about testing the default setting and two nits, scratch my previous comment about "migration notest", I didn't see they were already there.
"Indexing a doc with No. nested objects less or equal to index.mapping.nested_objects.limit should succeed": | ||
- skip: | ||
version: " - 6.99.99" | ||
reason: index.mapping.nested_objects setting was introduced from this version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you slightly rephrase so the version this feature was introduced is in the "reason" description (e.g. something like "the XYZ feature has been added in 7.0.0")? I think the skip "reason" is printed when running the tests and it might be useful to see the reason including the version at a glance.
"Indexing a doc with No. nested objects more than index.mapping.nested_objects.limit should fail": | ||
- skip: | ||
version: " - 6.99.99" | ||
reason: index.mapping.nested_objects setting was introduced from this version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
.endObject().endObject().endObject().string(); | ||
DocumentMapper docMapper = mapperService.documentMapperParser().parse("type", new CompressedXContent(mapping)); | ||
|
||
// parsing a doc with 3 nested objects succeeds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the case where #nested_docs < limit is already covered by the later tests, could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected? This way we can test the default value works if no explicit setting is there. I'd also check if that unit test then takes a lot more time (which I doubt, gut feeling check is okay here I think)
Add an index level setting `index.mapping.nested_objects.limit` to control the number of nested json objects that can be in a single document across all fields. Defaults to 10000. Throw an error if the number of created nested documents exceed this limit during the parsing of a document. Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control the number of nested json objects that can be in a single document across all fields. Defaults to 10000. Throw an error if the number of created nested documents exceed this limit during the parsing of a document. Closes elastic#26962
Add an index level setting `index.mapping.nested_objects.limit` to control the number of nested json objects that can be in a single document across all fields. Defaults to 10000. Throw an error if the number of created nested documents exceed this limit during the parsing of a document. Closes elastic#26962
4984450
to
e45f7bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mayya-sharipova thanks a lot, this looks amost ready, I'm just a bit confused about the new test that checks the default setting and why it works (see line comment). Maybe you can clarify that, other than that I think this PR is ready.
docBuilder.startObject().field("field1", "11").field("field2", "21").endObject(); | ||
docBuilder.startObject().field("field1", "12").field("field2", "22").endObject(); | ||
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject(); | ||
for(int i=0; i<=defaultMaxNoNestedDocs; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: some whitespaces around the operators etc... would be nice
docBuilder.startObject().field("field1", "12").field("field2", "22").endObject(); | ||
docBuilder.startObject().field("field1", "13").field("field2", "23").endObject(); | ||
for(int i=0; i<=defaultMaxNoNestedDocs; i++) { | ||
docBuilder.startObject().field("f", i).endObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suprised this triggers the no. fields warnings since "f" is not the field thats defined with the "nested" type above? I would have expected this to not trigger the limit but the exception tested below seems to indicate it triggers the setting. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher Please correct me if I misunderstood your previous comment: "could you change this to that the the document has more than the default number of nested docs (simple loop) and gets rejected".
This is what this new test is about.
We have a parent nested field nested1
under which we have many json objects with a field f
, and since the number of these json objects is more than the default number, we get an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I probably read the test to quickly and missed the startArray() part. You are absolutely right, this is how I expected the test. Sorry for the noise.
} | ||
|
||
public void testLimitNestedDocs() throws Exception{ | ||
// setting limit to allow only two nested objects in the whole doc | ||
long nested_docs_limit = 2L; | ||
long maxNoNestedDocs = 2L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just as feedback, nothing to change really, but I liked the previous variable name better ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher thanks for the feedback, I thought in the previous variable names I did not follow Java camel case standards for variable names (was more of python style)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, and you are right, camel case is preferred. I probably misread the "NoNestedDocs" part of the name as "no nested docs" and that confused me for a second, but either way is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbuescher thanks Christoph, you are rightNoNestedDocs
is confusing, I will avoid using No
for "number of" in the future.
Add an index level setting `index.mapping.nested_objects.limit` to control the number of nested json objects that can be in a single document across all fields. Defaults to 10000. Throw an error if the number of created nested documents exceed this limit during the parsing of a document. Closes elastic#26962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for bearing with me @mayya-sharipova ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add an index level setting
index.mapping.nested_objects.limit
to controlthe number of nested json objects that can be in a single document
across all fields. Defaults to 10000.
Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.
Closes #26962