Allow timestamps to be in epoch ms #850
Wow, that's surprising. I know that parsing ISO datetimes can be a bottleneck (hence why we nailed down the format pretty strictly, so we can just parse it with a regex), but creating a datetime string as well? In Python, we more or less just call the libc implementation. To be honest, I'm not a huge fan of accepting multiple datetime formats. Also, an integer epoch doesn't carry any timezone information, so we can't be sure that it's UTC.
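(For illustration, a strictly pinned-down UTC format can indeed be matched with a single regular expression. The Java sketch below shows that general approach; it is not the actual pattern used by the Python agent or the APM Server.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictIsoTimestamp {
    // A strictly pinned-down UTC layout can be matched with one regex,
    // capturing the components directly instead of running a full datetime parser.
    private static final Pattern STRICT_ISO_UTC = Pattern.compile(
            "(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})(?:\\.(\\d{1,6}))?Z");

    public static void main(String[] args) {
        Matcher m = STRICT_ISO_UTC.matcher("2018-05-02T13:45:30.123456Z");
        if (m.matches()) {
            // Groups 1-6 are date/time components; group 7 is the optional fraction.
            System.out.println("year=" + m.group(1) + ", fraction=" + m.group(7));
        }
    }
}
```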
@beniwohli we could just make it a really clear requirement that it has to be UTC, but I agree that we increase the risk of getting it wrong if we don't state the timezone explicitly.
This change would reduce the allocation rate overhead by about 30%, which brings allocation rates down to under 1 KB per request (including the reporting of the transactions).
Yup, you're right, I retract that argument. Sorry about that. Still, my surprise remains. Compared to all the other things we do, transaction and error timestamps have a relatively low cardinality. Does your test suite do any non-trivial work or is it just serializing more or less empty transactions? I'd like to avoid complicating the API due to a case of premature optimization, is all.
In this benchmark, I'm measuring recording an HTTP request and a SQL query and reporting the transactions to a mock APM Server over HTTP. So there is no non-trivial work that the benchmark does, because the point is to only measure the overhead imposed by the agent. Allocation rates are only one metric of this benchmark, but that simple change would reduce the allocation rate overhead by about 30%.
Note that by default, the Java agent only captures stack traces when a span took longer than 5 ms. Otherwise that would probably be the dominating factor. Also, I implemented
My point is that if the agent doesn't do much more than creating an almost empty transaction, not doing those allocations looks like it has a huge impact. But if you measure a more realistic transaction with a couple dozen spans with contexts and frames and whatnot, the impact might suddenly look much smaller. All I'm saying is that we should be wary of taking such drastic steps as changing the intake API for something that might just be a micro-optimization.
Keep in mind that we do report all transactions, even when not sampling them. In order for this to work at scale, we have to bring the overhead for creating and reporting a non-sampled transaction down to the bare minimum, IMHO. Also, creating transactions and spans involves close to zero allocations, so it does not really matter how many spans there are in a transaction. Note that in this benchmark, the context is fully propagated, with HTTP context in the transaction and DB context in the span, including the SQL statement, so it is somewhat realistic.
This is not quite accurate as monitoring SQL queries currently does allocate some memory (see elastic/apm-agent-java#35)
That is indeed surprising. Googling a bit, it seems
There is a newer date API in Java. Might be that the date parser is more efficient in this API. But it is only available in Java 8+ and we support Java 7 applications. Not sure what you mean with that. Using an epoch timestamp in an API seems like quite a normal thing to do and I'd even consider it more robust than parsing strings. It would also be a backwards-compatible change if we support both in the JSON schema. Also, that doesn't seem to add a lot of complexity. So I can't see a lot of downsides to this approach.
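(As a minimal sketch of the two representations, using the newer java.time API referenced above, available since Java 8; the values are illustrative.)

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class TimestampRepresentations {
    public static void main(String[] args) {
        Instant now = Instant.now();

        // Current intake format: an ISO-8601 UTC string.
        // Formatting allocates at least the resulting String object.
        String iso = DateTimeFormatter.ISO_INSTANT.format(now);

        // Proposed alternative: epoch milliseconds as a 64-bit integer.
        // toEpochMilli() returns a primitive long, so nothing is allocated.
        long epochMs = now.toEpochMilli();

        System.out.println(iso + " <-> " + epochMs);
    }
}
```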
I think it's important to differentiate between whether generating ISO dates is an inherently costly operation or whether the Java formatter is inefficient. I'd like to avoid making changes to the API to work around performance problems in different languages' standard libraries. If it turns out more agents have the same problem, then it's another story. By the way, this is about formatting, not parsing, as that happens on the APM Server, right? @felixbarny
For reference, here's the difference in speed for generating an ISO string vs getting the current UTC time in milliseconds in Node.js/v8.
The test was run on my slow laptop. While it looks like ms is around 6 times faster than ISO, we have to remember that I'm still able to generate over a million ISO strings per second, so it's not worth optimizing for my sake.
Generating an ISO date string is inherently more costly than just having an epoch timestamp in probably any programming language. Generating a raw timestamp does not require heap allocations, in contrast to converting this timestamp to an ISO string representation.

Of course, it is debatable whether it actually makes a difference. But just looking at the average latency values does not tell the whole picture: the problem is that when we are allocating objects, we put additional pressure on the GC, which means that the GC will run more frequently, with potentially longer pauses. That, in turn, means we don't have a predictable overhead of X ms per request. Due to the added GC pressure, the user requests are more frequently affected by GC. You typically don't see this in average response time values but in the higher percentiles like the 99th or 99.99th percentile latency.

If we were just monitoring a batch job where only throughput is important, this would not matter. But latency is very important for a typical web application, as we don't want a portion of users having to wait for a much longer time when the application is monitored by our agents. That's the reason why I'm constantly arguing for having as few allocations as possible. It's not about having a low average overhead but rather about having a predictable overhead (even if the average would be slightly higher) which does not greatly affect the higher latency percentiles.
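(A sketch of how this allocation difference could be quantified on the JVM with JMH; this is not the benchmark referenced in the thread, and the class and method names are illustrative.)

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import org.openjdk.jmh.annotations.Benchmark;

// Run with JMH's GC profiler (e.g. `java -jar benchmarks.jar -prof gc`)
// to see gc.alloc.rate.norm, the bytes allocated per operation, for each variant.
public class TimestampAllocationBenchmark {

    @Benchmark
    public String isoString() {
        // Allocates an Instant plus the formatted String on every call.
        return DateTimeFormatter.ISO_INSTANT.format(Instant.now());
    }

    @Benchmark
    public long epochMillis() {
        // Returns a primitive long; no allocation on the hot path.
        return System.currentTimeMillis();
    }
}
```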
Just a few findings that might be relevant for this discussion: I was just reading up on the User Timing API in browsers, as we use some of those timings as the source for marks in the intake API. As far as I understand it, the User Timing API is based on the Performance Timeline spec by W3C (which actually also is implemented as an experimental API in Node.js). According to the W3C spec, a mark is an instance of `PerformanceEntry`, whose `startTime` attribute is a `DOMHighResTimeStamp`, a double representing milliseconds. So here they did decide on using a number to store the timestamp and not a string. Just thought that was interesting.
After merging elastic/apm-agent-java#69, this is basically the only source of object allocations during JSON serialization, and there is no way around allocations when formatting an ISO date string.
@felixbarny I'm not doubting that you see a significant impact in an artificial benchmark. What I'm wondering is if this impact carries over to a more real-world scenario. If there is a measurable impact when running the agent in an app that actually does something, then it might be worth changing the API over it. But if it vanishes, I'm not so sure anymore. I hope you don't take my hesitance over this API change the wrong way. I mostly want to avoid softening up the API spec for micro-optimizations in this or that agent, as any change will be set more or less in stone.
FWIW, I'm not a fan of the datetime string-formatting either, for the same reasons. The only benefit to string formatting I can see is that a human can more easily read the value, but I don't think that's a strong enough reason to impose additional processing/memory/GC overhead. Having said that, this is not a significant issue in Go, since there's a garbage-free way of formatting timestamps: `time.Time.AppendFormat`. I'd suggest writing a specialised/optimised formatter for Java, which deals only with the specific format we care about. If we introduce a binary format, let's please make the timestamp an integer.
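(Such a formatter was later contributed in elastic/apm-agent-java#77. The sketch below only illustrates the general technique of appending a fixed UTC layout into a caller-provided StringBuilder; it is not the actual implementation.)

```java
public class IsoUtcFormatter {
    private static final int[] POW_10 = {1, 10, 100, 1000};

    // Appends "yyyy-MM-ddTHH:mm:ss.SSSZ" (UTC) for the given epoch millis.
    // StringBuilder.append(int) writes digits straight into the builder's
    // buffer, so no intermediate objects are created.
    public static void appendIsoUtc(StringBuilder sb, long epochMs) {
        long secs = Math.floorDiv(epochMs, 1000);
        int millis = (int) Math.floorMod(epochMs, 1000);
        int secOfDay = (int) Math.floorMod(secs, 86400);

        // Civil-date conversion from epoch days (Howard Hinnant's algorithm).
        long z = Math.floorDiv(secs, 86400) + 719468;
        long era = Math.floorDiv(z, 146097);
        long doe = z - era * 146097;                                      // [0, 146096]
        long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // [0, 399]
        long year = yoe + era * 400;
        long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               // [0, 365]
        long mp = (5 * doy + 2) / 153;                                    // [0, 11]
        long day = doy - (153 * mp + 2) / 5 + 1;                          // [1, 31]
        long month = mp < 10 ? mp + 3 : mp - 9;                           // [1, 12]
        if (month <= 2) {
            year++;
        }

        pad(sb, (int) year, 4).append('-');
        pad(sb, (int) month, 2).append('-');
        pad(sb, (int) day, 2).append('T');
        pad(sb, secOfDay / 3600, 2).append(':');
        pad(sb, (secOfDay / 60) % 60, 2).append(':');
        pad(sb, secOfDay % 60, 2).append('.');
        pad(sb, millis, 3).append('Z');
    }

    // Appends the value left-padded with zeros to the given width.
    private static StringBuilder pad(StringBuilder sb, int value, int width) {
        for (int i = width - 1; i >= 1; i--) {
            if (value < POW_10[i]) {
                sb.append('0');
            }
        }
        return sb.append(value);
    }
}
```

Reusing a single StringBuilder per serialization pass keeps this allocation-free apart from occasional buffer growth.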
Sure, if the agent is deployed in an application which is rather wasteful in terms of memory, you probably won't notice the difference. But we don't have control over the application the agent is used in and I'm sure our agents will also be used in well-optimized applications. All we have control over is how much overhead the agent adds to the application per request (which is what the benchmark is measuring - just the bare overhead of the agent) and I think it's worth aiming at the lowest overhead possible. For most languages, it is non-trivial to create an ISO formatted date timestamp in a performant and garbage-free way, whereas just writing an epoch timestamp is the most simple thing to do.
I really don't - I appreciate your feedback and I know that API changes are not to be taken lightly. But I think that an epoch timestamp is a better choice in API design anyway and that we would only gain from this change.
With elastic/apm-agent-java#77 I wrote a custom formatter which can serialize ISO timestamps in a garbage-free and performant way. So I'm OK with leaving the timestamp as is, but would still suggest using epoch timestamps in future APIs like intake v2 or the metrics API.
I'm happy you found a way to work around it @felixbarny. I'm going to close this now because it's no longer a problem in the agents. If you run into problems again related to this, feel free to reopen.
To follow up, this has now been implemented via #1413
Currently, timestamps in the intake API have to be an ISO-formatted date string. Profiling has shown that this is actually the biggest cause of object allocations when serializing the payloads as JSON. Could we implement support for optionally transmitting the timestamp as a (64-bit) integer in epoch milliseconds?
This applies to `Transaction.timestamp` and `Error.timestamp`.
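(For concreteness, the same instant in both forms; the values are illustrative and assume the current millisecond-precision format. The integer variant would be the optional, backwards-compatible addition proposed here.)

```json
{"timestamp": "2018-05-02T13:45:30.123Z"}

{"timestamp": 1525268730123}
```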