Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logs UI] Add category anomalies to anomalies page #70982

Merged

Conversation

Kerry350
Copy link
Contributor

@Kerry350 Kerry350 commented Jul 7, 2020

Summary

Implements #64755 and #65046.

Screenshot 2020-07-07 17 37 33

Screenshot 2020-07-07 17 37 19

Overview

  • Implements an API endpoint for retrieving anomalies.
  • Sorting and pagination has been moved from the client side to the anomalies API.
    • The logic around next / previous page is encapsulated in the client side hook.
    • The ES "hack" to reverse the sort order and use search_after is used to implement paging backwards.
  • Category anomalies are rendered in the table.
  • LogEntryRateExample resources have been renamed LogEntryExample as they're in no way specific to log rate.

Notes / caveats

  • I was unsure on the best route to take regarding the document tiebreaker for the sorting. Usually we would use the source configurations' tiebreaker property, but the anomalies are ML documents and not Filebeat documents. I'd considered the _id, but the docs advise strongly against this unless the _id is duplicated in another field that has doc value enabled, given we don't control that that wasn't possible. I've used _doc, which is our default. I know the doc properties have their own caveats, but wasn't sure what else could be used.

  • The auto refresh of the page really doesn't work very well with the pagination. Given the pagination and sorting is now determined server side, if the time range changes, then the pagination needs to be reset. If the time range shifts, the information we have is incorrect, and we need to re-evaluate. Two possible solutions:

    • Turn auto refresh off by default, and have it as an opt in thing.
    • Make the refresh rate a bit longer - it's currently every 30 seconds, maybe this could be every 5 minutes?

If somebody is really looking to stream their anomalies, rather than page through them and look at historical data, it's just a case of setting the sort (on start time) and staying on page 1.

  • Pagination options (page size) are technically all hooked up server and client side, but not exposed in the UI. I didn't know if we wanted this, or just to just stick with a sensible number (I've used 25).

  • On the design dataset and category are in bold in the rows. Given recommendations seem to be to use i18n.translate over FormattedMessage now I omitted the bold. If we think this is really important, it can be changed.

Next steps

  • We should add dataset filtering. This is technically in the design for the adding categories work, but wasn't part of the outlined work. I'd like to open up this PR now due to FF, and an already large changeset, but that would be the next logical step around this functionality. (PR open)

  • There is definitely some further cleanup that can be done, e.g. we can remove some of the alternate formattings of the log rate results data now.

Testing

  • It's useful to have varied data. E.g. both log rate and category anomalies, and have these from more than one dataset.

- Adds an anomalies API endpoint
- Adds server side sorting / pagination to anomalies endpoint
- Renders category examples in the anomalies table
- Renames log entry rate examples to log entry examples
@Kerry350 Kerry350 added release_note:enhancement v8.0.0 Feature:Logs UI Logs UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.9.0 labels Jul 7, 2020
@Kerry350 Kerry350 added this to the Logs UI 7.9 milestone Jul 7, 2020
@Kerry350 Kerry350 requested a review from a team July 7, 2020 16:33
@Kerry350 Kerry350 self-assigned this Jul 7, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 I'll review this over the next days with the intention to minimize the conflicts between this and the modules list I'm working on (#64757).

@sgrodzicki sgrodzicki linked an issue Jul 8, 2020 that may be closed by this pull request
Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm submitting this in several parts to shorten the feedback cycle. I hope it won't be too inconvenient that I'm jumping around between various aspects and comment on them as I notice them.

I'm seeing a weird effect that seems to be related to the pagination. This is how I can reproduce it:

  1. load the page
  2. disable the auto-refresh
  3. paginate to the second page
  4. click "refresh" besides the datepicker

This causes the "no data" screen to be displayed for me. Another click on "refresh" causes data to be shown again, but the pagination state is lost (i.e. I'm on page one again).

@Kerry350
Copy link
Contributor Author

Kerry350 commented Jul 9, 2020

@weltenwort

This causes the "no data" screen to be displayed for me. Another click on "refresh" causes data to be shown again, but the pagination state is lost (i.e. I'm on page one again).

Hmm, I'm struggling to replicate this.

Following the same steps I see the following:

pagination

So I'm not thrown incorrectly into the "no data" state. I am returned to page 1, which would be expected as clicking "refresh" would change the time range we're looking at, and as soon as that changes pagination would need to be reset.

I've tried a few different date expressions etc 🤔

Kerry350 and others added 7 commits July 9, 2020 13:17
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…entry_anomalies.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
@weltenwort
Copy link
Member

Ok, it seems I didn't control for all the variables 🙈 I can reproduce it if I expand a row on the second page and then click "refresh".

@Kerry350
Copy link
Contributor Author

Kerry350 commented Jul 9, 2020

@weltenwort

pagination-2

Still no luck reproducing with the expanded row, interesting 🤔

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing the page and reading the code I came a cross a few things that we might have to handle:

  • As mentioned below, the examples don't seem to be filtered by the category pattern.

  • The page doesn't work with only the categories job deployed. This is somewhat intertwined with the ml job list flyout, so it might make sense if I tackled this in [Logs UI] Show log analysis ML jobs in a list #71132 or even a follow-up PR. What would the page look like without log rate data?

  • When deploying the log rate job anew, there was quite a long period where the anomalies API reported results but the page showed the empty state prompt:

    image

    This is probably also a relic of the previous focus of the page on the log entry rate results. The "hasResults" checks and related conditions might need refining.

@Kerry350
Copy link
Contributor Author

Kerry350 commented Jul 9, 2020

As mentioned below, the examples don't seem to be filtered by the category pattern.
The example API used by the categories page does that. Should that API's functionality be merged into this one?

Hmm. Yeah, I missed this to be honest. Just applied the "examples around" like log rate anomalies. I'll get this working, probably with the outlook to merge what needs to be merged in a followup. I've merged certain things along the way, and left others separate.

The page doesn't work with only the categories job deployed. This is somewhat intertwined with the ml job list flyout, so it might make sense if I tackled this in #71132 or even a follow-up PR. What would the page look like without log rate data?

That would be great if you don't mind. I guess we'd want to show a callout in place of the chart without log rate data, telling the user to create the job.

When deploying the log rate job anew, there was quite a long period where the anomalies API reported results but the page showed the empty state prompt:

Ah, yes, need to tweak those has data checks (like you said).

Kerry350 and others added 14 commits July 13, 2020 11:28
…es.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…b.com:Kerry350/kibana into 64755-expand-anomalies-page-to-add-categories
…es.ts

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…b.com:Kerry350/kibana into 64755-expand-anomalies-page-to-add-categories
…/anomalies/table.tsx

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
…b.com:Kerry350/kibana into 64755-expand-anomalies-page-to-add-categories
@Kerry350
Copy link
Contributor Author

@weltenwort I think everything is addressed and working now. Main things:

  • I've refactored the anomalies hooks state to use a reducer, this makes it (imo) a lot more readable, as well as eradicating the bugs.

  • Empty datasets should now work fine. I'm still trying to verify this 100% (having trouble with the dev cluster), but the fix is there.

  • For the jobId extraction at the expanded row level I've actually added the jobId to the anomaly result returned from the API. We might want to revisit this, I'm not sure. Originally, the log rate module context was being utilised at that level. I didn't want to pull in the category module context there as that would almost definitely start treading on the toes of your work, which seemed unnecessary as this works fine. On the other hand given how unassuming the anomlies API is (it doesn't expect or require particular jobs to exist, it'll just try to gather everything) it could be useful to have that property.

  • Fixed an additional bug with the "View anomaly in machine learning" link - the category ID needed passing through for category anomalies.

We should probably rethink the philosophy when and how we check for jobs, when we use the stored custom_settings instead of the source config and what parameters we pass at some point.

Definitely. I think there's a few things we can improve once this all comes together and we have a chance to breathe and reassess.

Copy link
Member

@weltenwort weltenwort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done regarding the reducer, feels pretty consistent and much easier to read now.

Great job overall, this was definitely a challenging enhancement to pivot the tab's focus like that 👏

sourceConfiguration: { sourceId },
} = useLogEntryRateModuleContext();
}> = ({ anomaly, timeRange }) => {
const { sourceId } = useLogSourceContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this reduces the scope of the dependency 👍

tooltip={tooltipProps}
baseTheme={isDarkMode ? DARK_THEME : LIGHT_THEME}
xDomain={{ min: timeRange.startTime, max: timeRange.endTime }}
<>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ok then... I would have expected something like

return !isLoading && !series.length ? (
  <EuiEmptyPrompt ...>
) : (
  <LoadingOverlayWrapper ...>
);

to work as well.

@weltenwort
Copy link
Member

Forgot to mention, the empty category example retrieval now works for me 👍

And I think putting the jobId into the anomaly response is totally reasonable - it's not a leak of abstraction since anomalies are an ML concept in the first place.

@weltenwort
Copy link
Member

Another late thought, sorry 🙈 As the failing i18n check points out we lost a bit of accessibility of the loading spinner. Do we want to resurrect that LoadingOverlayContent?

@Kerry350
Copy link
Contributor Author

@weltenwort Thanks for the reviews 🙏

Pushed up those last little tweaks, hopefully this can go green in the background 🤞

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
infra 607 +1 606

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Kerry350 Kerry350 merged commit 4db5816 into elastic:master Jul 13, 2020
Kerry350 added a commit to Kerry350/kibana that referenced this pull request Jul 13, 2020
* Add category anomalies to anomalies page

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Kerry350 added a commit that referenced this pull request Jul 13, 2020
* Add category anomalies to anomalies page

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Logs UI Logs UI feature release_note:enhancement Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Logs UI] Show category anomalies in anomaly table
4 participants