-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Prevent fetching multi-field from source #48770
[ML] Prevent fetching multi-field from source #48770
Conversation
Aggregatable mutli-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results to failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates elastic#48756
Pinging @elastic/ml-core (:ml) |
@@ -283,10 +283,12 @@ public void testLookbackOnlyWithSourceDisabled() throws Exception { | |||
new LookbackOnlyTestHelper("test-lookback-only-with-source-disabled", "airline-data-disabled-source").execute(); | |||
} | |||
|
|||
@AwaitsFix(bugUrl = "This test uses painless which is not available in the integTest phase") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that since this test was written the test's environment is different and we have access to painless now. I re-enabled the test.
run elasticsearch-ci/1 (unrelated failure in SnapshotIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor nits around unnecessary hashset construction.
Overall, the interfaces seem cleaner.
private final T falseValue; | ||
|
||
BooleanMapper(ExtractedField field, T trueValue, T falseValue) { | ||
super(field.getName(), Collections.singleton(BooleanFieldMapper.CONTENT_TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like Collections.singleton(BooleanFieldMapper.CONTENT_TYPE)
could be a private static final field.
static final String TYPE = "geo_point"; | ||
|
||
public GeoPointField(String name) { | ||
super(name, Collections.singleton(TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#nit
this Collections.singleton(TYPE)
and the others that are simply passed up to super
could be private static final
so that a new HashSet isn't built on every ctor
call
run elasticsearch-ci/2 (unrelated failure) |
run elasticsearch-ci/default-distro |
1 similar comment
run elasticsearch-ci/default-distro |
@elasticmachine update branch |
Aggregatable mutli-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results to failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates elastic#48756 Backport of elastic#48770
Aggregatable mutli-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results to failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates elastic#48756 Backport of elastic#48770
In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes elastic#48756
In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes #48756
Aggregatable mutli-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results to failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates #48756 Backport of #48770
Aggregatable mutli-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results to failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates #48756 Backport of #48770
…48799) In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes elastic#48756 Backport of elastic#48799
…48799) In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that elastic#48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes elastic#48756 Backport elastic#48799
…48806) In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes #48756 Backport of #48799
…48807) In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worse consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes #48756 Backport #48799
Aggregatable mutli-fields are at the moment wrongly mapped
as normal doc_value fields and thus they support fetching from
source. However, they do not exist in the source. This results
to failure to extract such fields.
This commit fixes this bug. While a fix could be worked out
on top of the existing code, it is evident the extraction logic
has become difficult to understand and maintain. As we also
want to deduplicate multi-fields for data frame analytics,
it seemed appropriate to refactor the code to simplify and
better handle the extraction of multi-fields.
Relates #48756