Add instrumentation about volume of data parsed during resynchronization #1675

Closed
Mohan-Dhawan opened this issue Feb 20, 2024 · 10 comments · Fixed by #1676
@Mohan-Dhawan

The Spicy profiler provides no information about the time spent in the resynchronization code. It would be great to have metrics around the volume of data required to achieve resynchronization and the total time taken.

@rsmmr rsmmr self-assigned this Feb 21, 2024
rsmmr added a commit that referenced this issue Feb 21, 2024
With `--enable-profiling` the output for Spicy units/fields now
includes a new `volume` column, like this:

```
#name                                                   count       time      avg-%    total-%          volume
[...]
spicy/unit/test::A                                          1     285500      43.96      43.96               8
spicy/unit/test::A/__gap__                                  4       3167       0.12       0.49               0
spicy/unit/test::A/__synchronize__                          1      35500       5.47       5.47               4
spicy/unit/test::A::a                                       1      74833      11.52      11.52               -
spicy/unit/test::A::b                                       1      15333       2.36       2.36               1
spicy/unit/test::A::c                                       1      19125       2.94       2.94               1
spicy/unit/test::A::d                                       1       7583       1.17       1.17               1
spicy/unit/test::A::e                                       1       8042       1.24       1.24               1
```

Three different things here:

- The `volume` column for `spicy/unit/TYPE` and
  `spicy/unit/TYPE::FIELD` augments the already existing timing
  measurement and reports the total, aggregate number of bytes that
  this unit/field got to parse over the course of the processing.

- For units going into synchronization mode, there are now additional
  rows `spicy/unit/TYPE/__synchronize__` that report both CPU time and
  volume spent in synchronization while processing that unit.

- For units encountering input gaps during synchronization, there are
  now additional rows `spicy/unit/TYPE/__gap__` that report the total,
  aggregate gap size encountered while processing the unit.

All the volume measurements are taken as differences of two offsets
inside the input stream. For normal unit/field parsing, we subtract
the final offset after parsing an instance from the initial offset
where its parsing started.[1] For synchronization, it's the offset
where synchronization stopped successfully minus where it started.[2]
For gaps, it's the offset where we continued after the gap minus where
the gap started.[3] All these differences are then added up for each
row over the course of total input stream processing.

Note that volume isn't counted if parsing for some reason never
reaches the point where the end measurement would be taken (e.g., a
parser error prevents it from being reached; in the output above
that's the case for `spicy/unit/test::A::a`).

Closes #1675.

[1] This *includes* any ranges that the unit spent in synchronization
mode trying to recover from parse errors.

[2] This does *not* include any gaps encountered because they don't
affect stream offsets.

[3] Little glitch: these values can currently be off by one due to some
internal ambiguity.
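The offset-difference bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual Spicy runtime API: the `VolumeCounter` class and its method names are invented here to mirror the description (record the start offset when a measurement opens, add the delta when it closes, and add nothing if parsing never reaches the end point).

```python
# Sketch (hypothetical, not the Spicy runtime API) of per-row volume
# accounting as differences of two stream offsets.

class VolumeCounter:
    """Accumulates offset deltas per profiler row."""

    def __init__(self):
        self.volume = {}   # row name -> aggregated bytes
        self._starts = {}  # row name -> start offset of the open measurement

    def begin(self, row, offset):
        # Remember where parsing/synchronization for this row started.
        self._starts[row] = offset

    def end(self, row, offset):
        # Add the delta to the row's aggregate. If parsing never reaches
        # this end point (e.g., a parse error aborts the instance), end()
        # is never called and nothing is added for that instance.
        start = self._starts.pop(row, None)
        if start is not None:
            self.volume[row] = self.volume.get(row, 0) + (offset - start)

# Example mirroring the output above: unit A parses bytes 0..8, with
# synchronization covering offsets 2..6.
c = VolumeCounter()
c.begin("spicy/unit/test::A", 0)
c.begin("spicy/unit/test::A/__synchronize__", 2)
c.end("spicy/unit/test::A/__synchronize__", 6)
c.end("spicy/unit/test::A", 8)
```

With these calls, the unit row aggregates 8 bytes and the `__synchronize__` row 4, matching the `volume` column in the sample output.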
rsmmr commented Feb 21, 2024

@Mohan-Dhawan give #1676 a try.

@Mohan-Dhawan

Thanks, @rsmmr. I do get the volume stats in the output. It would also be nice to know whether a higher reported volume in gaps or synchronize is detrimental to performance.

rsmmr commented Feb 23, 2024

> Thanks, @rsmmr. I do get the volume stats in the output.

Can I see the output?

> It would also be nice to know whether a higher reported volume in gaps or synchronize is detrimental to performance.

I don't follow what you mean; can you elaborate on how the numbers could be improved?

@Mohan-Dhawan

```
spicy/unit/<pdu>/__gap__                   172043743 12695491160       0.00      14.53      2013841114
spicy/unit/<pdu>/__synchronize__               82553 3648394463       0.00       4.18             184
```

What I wanted to know is: what are acceptable limits, performance-wise, for the gap and synchronize volumes?

rsmmr commented Feb 23, 2024

There's no general answer to that. You need to put it in relation to the input volume / standard parsing.
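"Putting it in relation" might look like the rough sketch below. The `percent_of_input` helper is made up for illustration; the volumes come from the profiler rows quoted above, and the total input size is an assumption based on the ~698 MB trace mentioned elsewhere in this thread.

```python
# Hedged sketch: express reported gap/synchronize volumes as a fraction
# of the total input, to judge whether they are significant.
gap_volume = 2_013_841_114       # bytes, from the __gap__ row above
sync_volume = 184                # bytes, from the __synchronize__ row above
total_input = 698 * 1024 ** 2    # assumed total trace size in bytes

def percent_of_input(volume, total):
    """Return a volume as a percentage of the total input volume."""
    return 100.0 * volume / total

print(f"gaps:        {percent_of_input(gap_volume, total_input):.1f}% of input")
print(f"synchronize: {percent_of_input(sync_volume, total_input):.6f}% of input")
```

Note that aggregated gap volume can exceed the raw trace size, since it is summed over every unit instance that observes a gap rather than counted once per input byte.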

Mohan-Dhawan commented Feb 23, 2024

The context here is that I have a 698 MB trace with 8883 connections, but with gaps in the content. A flamegraph of its execution showed that close to 66% of samples were in the unit responsible for synchronization. About half of those samples came from MatchState::advance, and the majority of the calls from it went to the function jrx_regexec_partial_min. Given that the volume of bytes in the __synchronize__ entry is just 184, is the high volume of jrx_* calls indicative of an edge case?

rsmmr commented Feb 27, 2024

Can you send me the full output please?

rsmmr commented Mar 6, 2024

For the record, I never received the full output, so we need to take the measurement with a grain of salt for now.

Mohan-Dhawan commented Mar 6, 2024

Hi @rsmmr. Sorry, it completely slipped my mind. I've sent you the detailed output in the Zeek Slack DM.

@rsmmr rsmmr closed this as completed in d90f191 Mar 6, 2024
@bbannier

Somewhat related, I bumped #1133 into TODO.
