Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Job and datafeed mappings with index template #32594

Closed
wants to merge 4 commits into from

Conversation

davidkyle
Copy link
Member

Mappings for job and datafeed configurations and the index template.
For moving the configuration classes out of the clusterstate

@davidkyle davidkyle added >feature review :ml Machine learning labels Aug 2, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments and questions.

* is stored
* @return The index name
*/
public static String jobConfigIndexName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just configIndexName(), as it will also store datafeed configs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

.field(TYPE, KEYWORD)
.endObject()
.startObject(Detector.CUSTOM_RULES_FIELD.getPreferredName())
.field(TYPE, NESTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the nested document functionality here? If not then OBJECT would be lighter-weight. (Maybe at this level NESTED is useful - I'm not sure.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom rules are an array which is why I used nested

.field(ENABLED, false)
.endObject()
.startObject(DetectionRule.CONDITIONS_FIELD.getPreferredName())
.field(TYPE, NESTED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this level NESTED is definitely overkill and should be OBJECT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again conditions are an array. It's arguable that the extensive mappings are overkill anyway, not all fields need to be searchable and these mappings come with a maintenance burden. Is it conceivable that someone will want to search for a job where RuleCondition.AppliesTo == ACTUAL && RuleCondition.Operator == GT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there will be relatively few documents in the index, so we can get away with high-cost mappings. We have NESTED in our results index where it is much more costly due to the higher document volume and it's never been found to be the root cause of any problem, so it's probably fine.

.endObject()
.endObject()
.startObject(Detector.DETECTOR_INDEX.getPreferredName())
.field(TYPE, LONG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is INTEGER in other mappings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -175,14 +277,19 @@
*/
public static boolean isValidFieldName(String fieldName) {
String[] segments = DOT_PATTERN.split(fieldName);
return !RESERVED_FIELD_NAMES.contains(segments[0]);
return RESERVED_RESULT_FIELD_NAMES.contains(segments[0]) == false && RESERVED_CONFIG_FIELD_NAMES.contains(segments[0]) == false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary for the config field names to be included here? It will prevent fields being added to results documents just because they clash with a config field name. But that's not necessary because the config and results are in different indices. Is there another reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

.startObject(Job.RESULTS_INDEX_NAME.getPreferredName())
.field(TYPE, KEYWORD)
.endObject()
.startObject(Job.DELETED.getPreferredName()) // TODO can this be removed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question. At the moment this is what other APIs use to determine that a job is in the process of being deleted. Maybe storing that flag in an index won't be sufficiently atomic in the future. An alternative might be to make the job deletion process a persistent task, and use the existence of that persistent task to determine whether a job is being deleted. This field is probably just one of several places where indices won't give the same ordering guarantee that cluster state gave us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the loose eventually consistent guarantees this field is useless, we have considered other ways to mark the fact that a job is in the process of deletion I'd like to remove this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the DELETED field from the mapping

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :ml Machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants