Add instrumentation about volume of data parsed during resynchronization #1675
With `--enable-profiling` the output for Spicy units/fields now includes a new `volume` column, like this:

```
#name                                 count    time   avg-%  total-%  volume
[...]
spicy/unit/test::A                        1  285500   43.96    43.96       8
spicy/unit/test::A/__gap__                4    3167    0.12     0.49       0
spicy/unit/test::A/__synchronize__        1   35500    5.47     5.47       4
spicy/unit/test::A::a                     1   74833   11.52    11.52       -
spicy/unit/test::A::b                     1   15333    2.36     2.36       1
spicy/unit/test::A::c                     1   19125    2.94     2.94       1
spicy/unit/test::A::d                     1    7583    1.17     1.17       1
spicy/unit/test::A::e                     1    8042    1.24     1.24       1
```

Three different things here:

- The `volume` column for `spicy/unit/TYPE` and `spicy/unit/TYPE::FIELD` rows augments the already existing timing measurements and reports the total, aggregate number of bytes that this unit/field got to parse over the course of processing.
- For units going into synchronization mode, there are now additional rows `spicy/unit/TYPE/__synchronize__` that report both the CPU time and the volume spent in synchronization while processing that unit.
- For units encountering input gaps during synchronization, there are now additional rows `spicy/unit/TYPE/__gap__` that report the total aggregate gap size encountered while processing the unit.

All volume measurements are taken as differences of two offsets inside the input stream. For normal unit/field parsing, we subtract the initial offset where an instance's parsing started from the final offset after parsing it.[1] For synchronization, it's the offset where synchronization stopped successfully minus the offset where it started.[2] For gaps, it's the offset where we continued after the gap minus the offset where the gap started.[3] All these differences are then added up per row over the course of processing the whole input stream. Note that volume isn't counted if parsing never reaches the point where the end measurement would be taken (e.g., because a parse error prevents it from being reached; in the output above that's the case for `spicy/unit/test::A::a`).

Closes #1675.
[1] This *includes* any ranges that the unit spent in synchronization mode trying to recover from parse errors.
[2] This does *not* include any gaps encountered, because they don't affect stream offsets.
[3] Little glitch: these values can currently be off by one due to some internal ambiguity.
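The offset arithmetic above can be sketched in a few lines. This is an illustrative Python model of how per-row volume counters could accumulate offset differences; the class and method names are hypothetical and not Spicy's actual internals.

```python
# Hypothetical model of per-row volume accumulation. Each measurement is
# the difference of two stream offsets, summed up per profiler row over
# the whole input stream. Names are illustrative only.

class VolumeProfiler:
    def __init__(self):
        self.volume = {}  # profiler row name -> accumulated bytes

    def record(self, row, start_offset, end_offset):
        # Only called when the end measurement is actually reached; if
        # parsing aborts before that point, no volume is counted.
        self.volume[row] = self.volume.get(row, 0) + (end_offset - start_offset)

prof = VolumeProfiler()
# Normal field parsing: final offset after parsing minus initial offset.
prof.record("spicy/unit/test::A::b", 7, 8)
# Synchronization: offset where sync succeeded minus where it started.
prof.record("spicy/unit/test::A/__synchronize__", 3, 7)
# A later sync attempt on the same row adds to the same counter.
prof.record("spicy/unit/test::A/__synchronize__", 10, 10)
print(prof.volume["spicy/unit/test::A/__synchronize__"])  # 4
```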
@Mohan-Dhawan, give #1676 a try.
Thanks @rsmmr. I do get the volume stats in the output. It would also be nice to know whether a higher reported volume in `__gap__` or `__synchronize__` is detrimental to performance.
Can I see the output? I don't follow what you mean; can you elaborate on how the numbers could be improved?
What I wanted to know is what the acceptable performance limits are for these numbers.
There's no general answer to that. You need to put it in relation to the input volume / standard parsing.
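Putting the reported volumes "in relation to the input volume" can be as simple as computing percentages. A minimal sketch, with made-up numbers (only the ~698MB trace size is taken from the discussion below; the sync and gap volumes are invented for illustration):

```python
# Relate synchronization/gap volume to the total input volume.
# All byte counts here are illustrative, not measured values.
total_input = 698 * 1024 * 1024   # e.g. a ~698MB trace
sync_volume = 35_000_000          # hypothetical __synchronize__ volume
gap_volume = 1_000_000            # hypothetical __gap__ volume

sync_pct = 100.0 * sync_volume / total_input
gap_pct = 100.0 * gap_volume / total_input
print(f"sync: {sync_pct:.1f}%  gaps: {gap_pct:.1f}%")  # sync: 4.8%  gaps: 0.1%
```

A parser spending a large fraction of its input in synchronization is usually worth investigating; small fractions are typically noise.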
The context here is that I have a 698MB trace.
Can you send me the full output, please?
For the record, I never received the full output, so we need to take the measurement with a grain of salt for now.
Hi @rsmmr. Sorry, it completely slipped my mind. I've sent you the detailed output in a Zeek Slack DM.
Somewhat related: I bumped #1133 into TODO.
The Spicy profiler provides no information about the time spent in the resynchronization code. It would be great to have metrics on the volume of data required to achieve resynchronization and the total time taken.