Allow timestamps to be in epoch ms #850

Closed
felixbarny opened this issue Apr 17, 2018 · 24 comments

@felixbarny (Member) commented Apr 17, 2018

Currently, timestamps in the intake API have to be an ISO-formatted date string. Profiling has shown that this is actually the biggest cause of object allocations when serializing the payloads as JSON. Could we implement support for optionally transmitting the timestamp as a (64-bit) integer in epoch milliseconds?

This applies to Transaction.timestamp and Error.timestamp.
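
To make the proposal concrete, here is a minimal Java sketch contrasting the two representations (the method names are mine, purely illustrative):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class TimestampFormats {

    // Current intake format: an ISO date string such as "2018-04-17T09:02:40.123Z".
    // Formatting allocates at least the resulting String on every call.
    static String isoTimestamp(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Proposed alternative: write the epoch milliseconds as a 64-bit integer.
    // Appending a long to a reused buffer allocates nothing per call.
    static void appendEpochTimestamp(StringBuilder json, long epochMillis) {
        json.append("\"timestamp\":").append(epochMillis);
    }
}
```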

@beniwohli (Contributor) commented:

Wow, that's surprising. I know that parsing ISO datetimes can be a bottleneck (which is why we nailed the format down pretty strictly, so we can parse it with a regex), but creating a datetime string as well? In Python, we more or less just call the libc implementation of strftime, which is probably some of the most tuned code you can find.

To be honest, I'm not a huge fan of accepting multiple datetime formats. Also, an integer epoch doesn't carry any timezone information, so we can't be sure that it's UTC.
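
(@beniwohli is describing the Python agent, but the "nail down the format so a regex suffices" idea translates. A hedged Java sketch of the approach; the pattern here is illustrative, not the actual one:)

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictIsoParser {

    // Only one exact shape is accepted, e.g. "2018-04-17T09:02:40.123Z",
    // so a single regex match replaces a general-purpose datetime parser.
    private static final Pattern ISO_UTC = Pattern.compile(
        "(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})\\.(\\d{3})Z");

    static long parseEpochMillis(String timestamp) {
        Matcher m = ISO_UTC.matcher(timestamp);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected timestamp format: " + timestamp);
        }
        return LocalDateTime.of(
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)),
                Integer.parseInt(m.group(7)) * 1_000_000) // millis -> nanos
            .toInstant(ZoneOffset.UTC).toEpochMilli();
    }
}
```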

@watson (Contributor) commented Apr 17, 2018

@beniwohli we could just make it a really clear requirement that it has to be UTC, but I agree that we increase the risk of getting it wrong if we don't state the timezone explicitly.

@felixbarny (Member, Author) commented:

> Also, an integer epoch doesn't carry any timezone information, so we can't be sure that it's UTC.

An epoch timestamp is always in UTC by definition.

Here is a screenshot of an allocation profile of the Java agent where you can see that one of the biggest causes of allocations is parsing the date string during JSON serialization:
[screenshot: allocation profile with date string handling among the top allocation sites]

@felixbarny (Member, Author) commented Apr 17, 2018

This change would reduce the allocation rate overhead by about 30%, which brings allocation rates down to under 1 KB per request (including the reporting of the transactions).

@beniwohli (Contributor) commented:

> An epoch timestamp is always in UTC by definition.

Yup, you're right, I retract that argument. Sorry about that.

Still, my surprise remains. Compared to all the other things we do, transaction and error timestamps have a relatively low cardinality. Does your test suite do any non-trivial work or is it just serializing more or less empty transactions?

I'd like to avoid complicating the API due to a case of premature optimization, is all.

@felixbarny (Member, Author) commented Apr 17, 2018

> Does your test suite do any non-trivial work or is it just serializing more or less empty transactions?

In this benchmark, I'm measuring recording an HTTP request and a SQL query and reporting the transactions to a mock APM Server over HTTP. The benchmark deliberately does no other work, because the point is to measure only the overhead imposed by the agent. Allocation rates are just one metric of this benchmark, but that simple change would reduce the allocation rate overhead by about 30%.

@felixbarny (Member, Author) commented:

Note that by default, the Java agent only captures stack traces when a span takes longer than 5 ms; otherwise, that would probably be the dominating factor. Also, I implemented ignore_urls in a way that avoids allocations by not using regex pattern matching, which would otherwise allocate a lot of memory as well (a sketch of the idea follows). So there is not much left that allocates memory, and one of those things happens to be the construction of the ISO timestamp String.
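
(A simplified sketch of that idea, not the agent's actual implementation: match prefix/suffix wildcards by walking the characters directly, so no Matcher or substring objects are allocated per check:)

```java
public class UrlIgnoreMatcher {

    // Matches patterns of the form "/foo/*" (prefix) or "*/bar" (suffix)
    // by comparing characters in place, unlike java.util.regex, which
    // allocates a Matcher (and often substrings) per match.
    static boolean isIgnored(CharSequence url, String pattern) {
        if (pattern.endsWith("*")) {
            return regionMatches(url, 0, pattern, 0, pattern.length() - 1);
        }
        if (pattern.startsWith("*")) {
            int offset = url.length() - (pattern.length() - 1);
            return offset >= 0 && regionMatches(url, offset, pattern, 1, pattern.length() - 1);
        }
        return pattern.contentEquals(url);
    }

    private static boolean regionMatches(CharSequence url, int urlOffset,
                                         String pattern, int patternOffset, int length) {
        if (urlOffset + length > url.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (url.charAt(urlOffset + i) != pattern.charAt(patternOffset + i)) {
                return false;
            }
        }
        return true;
    }
}
```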

@beniwohli (Contributor) commented:

My point is that if the agent doesn't do much more than creating an almost empty transaction, not doing those allocations looks like it has a huge impact. But if you measure a more realistic transaction with a couple dozen spans with contexts and frames and whatnot, the impact might suddenly look much smaller.

All I'm saying is that we should be wary of taking such drastic steps as changing the intake API for something that might just be a micro-optimization.

@felixbarny (Member, Author) commented Apr 17, 2018

Keep in mind that we do report all transactions, even when not sampling them. In order for this to work at scale, we have to bring the overhead for creating and reporting a non-sampled transaction down to the bare minimum IMHO.

Also, creating transactions and spans involves close to zero allocations, so it does not really matter how many spans there are in a transaction. Note that in this benchmark, the context is fully populated, with HTTP context in the transaction and DB context in the span, including the SQL statement, so it is somewhat realistic.

@felixbarny (Member, Author) commented:

> Also, creating transactions and spans involves close to zero allocations, so it does not really matter how many spans there are in a transaction.

This is not quite accurate, as monitoring SQL queries currently does allocate some memory (see elastic/apm-agent-java#35).

@roncohen (Contributor) commented:

That is indeed surprising.

Googling a bit, it seems DateTimeFormatter is not very efficient. Could we investigate an alternative implementation?

@felixbarny (Member, Author) commented:

There is a newer date API in Java; the date parser might be more efficient there. But it is only available in Java 8+, and we support Java 7 applications.

Not sure what you mean by event.toISOString().

Using an epoch timestamp in an API seems like quite a normal thing to do, and I'd even consider it more robust than parsing strings. It would also be a backwards-compatible change if we support both in the JSON schema, and it doesn't seem to add a lot of complexity. So I can't see a lot of downsides to this approach.
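
(As a sketch of what backwards-compatible acceptance could look like on the receiving side, assuming a Jackson-style tree model; purely illustrative:)

```java
import com.fasterxml.jackson.databind.JsonNode;
import java.time.Instant;

public class TimestampField {

    // Accept either the existing ISO string or the proposed epoch-ms integer.
    // A JSON schema can express this as {"type": ["string", "integer"]}.
    static Instant readTimestamp(JsonNode node) {
        if (node.isNumber()) {
            return Instant.ofEpochMilli(node.asLong()); // epoch ms is UTC by definition
        }
        return Instant.parse(node.asText()); // e.g. "2018-04-17T09:02:40.123Z"
    }
}
```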

@roncohen (Contributor) commented:

I think it's important to differentiate between whether generating ISO dates is an inherently costly operation or whether the Java formatter is inefficient. I'd like to avoid making changes to the API to work around performance problems in different languages' standard libraries. If it turns out more agents have the same problem, then it's another story.

By the way, this is about formatting, not parsing, as that happens on the APM Server, right? @felixbarny

@watson (Contributor) commented Apr 18, 2018

For reference, here's the difference in speed for generating an ISO string vs. getting the current UTC time in milliseconds in Node.js/V8:

new Date().toISOString() x 1,391,237 ops/sec ±1.58% (80 runs sampled)
Date.now() x 8,644,451 ops/sec ±1.19% (82 runs sampled)
Fastest is Date.now()

The test was run on my slow laptop. While it looks like ms is around 6 times faster than ISO, we have to remember that I'm still able to generate over a million ISO strings per second, so it's not worth optimizing for my sake.
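
(For comparison, a rough Java analogue of the same measurement — a naive loop rather than a proper harness like JMH, so treat the numbers as indicative only:)

```java
import java.time.Instant;

public class TimestampBench {
    public static void main(String[] args) {
        long sink = 0;
        final int iterations = 1_000_000;

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += Instant.now().toString().length(); // allocates a String per call
        }
        System.out.printf("ISO string: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.currentTimeMillis(); // no allocation
        }
        System.out.printf("epoch ms:   %d ms%n", (System.nanoTime() - start) / 1_000_000);

        System.out.println("(ignore: " + sink + ")"); // keep the JIT from eliding the loops
    }
}
```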

@felixbarny (Member, Author) commented:

> I think it's important to differentiate between whether generating ISO dates is an inherently costly operation or whether the Java formatter is inefficient.

Generating an ISO date string is inherently more costly than just using an epoch timestamp, in probably any programming language: producing a raw timestamp requires no heap allocations, in contrast to converting that timestamp to an ISO string representation. Of course, it is debatable whether it actually makes a difference. But just looking at the average latency values does not tell the whole picture:

The problem is that when we are allocating objects, we put additional pressure on the GC which means that the GC will run more frequently with potentially longer pauses. That, in turn, means we don't have a predictable overhead of X ms per request. Due to the added GC pressure, the user requests are more frequently affected by GC. You typically don't see this in average response time values but in the higher percentiles like the 99th or 99.99th percentile latency. If we were just monitoring a batch job where only throughput is important, this would not matter. But latency is very important for a typical web application, as we don't want a portion of users having to wait for a much longer time when the application is monitored by our agents.

That's the reason why I'm constantly arguing for having as few allocations as possible. It's not about having a low average overhead, but rather about having a predictable overhead (even if the average were slightly higher) which does not greatly affect the higher latency percentiles.

@watson (Contributor) commented Apr 20, 2018

Just a few findings that might be relevant for this discussion:

I was just reading up on the User Timing API in browsers as we use some of those timings as the source for marks in the intake API.

As far as I understand it, the User Timing API is based on the Performance Timeline spec by W3C (which actually also is implemented as an experimental API in Node.js).

According to the W3C spec, a mark is an instance of PerformanceEntry, which in turn uses DOMHighResTimeStamp to store both absolute time and time deltas (startTime and duration respectively).

> The DOMHighResTimeStamp type is a double and is used to store a time value. The unit is milliseconds and should be accurate to 5 µs (microseconds).

So here they did decide on using a number to store the timestamp and not a string. Just thought that was interesting.

@felixbarny (Member, Author) commented:

After merging elastic/apm-agent-java#69, this is basically the only source of object allocations during JSON serialization, and there is no way around allocations when formatting an ISO date string.

@felixbarny (Member, Author) commented May 1, 2018

I have now created a dashboard with the benchmark results:
[screenshot: benchmark dashboard comparing allocation rates and throughput before and after the change]

Note the significant drop in allocations (from 473 KB/op to 82 KB/op) when serializing timestamps as an epoch timestamp. The ops/s (score) also increases.

@beniwohli (Contributor) commented:

@felixbarny I'm not doubting that you see a significant impact in an artificial benchmark. What I'm wondering is whether this impact carries over to a more real-world scenario. If there is a measurable impact when running the agent in an app that actually does something, then it might be worth changing the API over it. But if it vanishes, I'm not so sure anymore.

I hope you don't take my hesitance over this API change the wrong way. I mostly want to avoid softening up the API spec for micro-optimizations in this or that agent, as any change will be set more or less in stone.

@axw (Member) commented May 1, 2018

FWIW, I'm not a fan of the datetime string-formatting either, for the same reasons. The only benefit to string formatting I can see is that a human can more easily read the value, but I don't think that's a strong enough reason to impose additional processing/memory/GC overhead.

Having said that, this is not a significant issue in Go, since there's a garbage-free way of formatting timestamps: time.Time.AppendFormat. I'd suggest writing a specialised/optimised formatter for Java, which deals only with the specific format we care about.

If we introduce a binary format, let's please make the timestamp an integer.

@felixbarny (Member, Author) commented:

> If there is a measurable impact when running the agent in an app that actually does something, then it might be worth changing the API over it. But if it vanishes, I'm not so sure anymore.

Sure, if the agent is deployed in an application which is rather wasteful in terms of memory, you probably won't notice the difference. But we don't have control over the applications the agent is used in, and I'm sure our agents will also be used in well-optimized applications. All we have control over is how much overhead the agent adds to the application per request (which is what the benchmark is measuring: just the bare overhead of the agent), and I think it's worth aiming for the lowest overhead possible. For most languages, it is non-trivial to create an ISO-formatted timestamp in a performant and garbage-free way, whereas just writing an epoch timestamp is the simplest thing to do.

> I hope you don't take my hesitance over this API change the wrong way.

I really don't - I appreciate your feedback and I know that API changes are not to be taken lightly. But I think that an epoch timestamp is a better choice in API design anyway and that we would only gain from this change.

@felixbarny (Member, Author) commented:

With elastic/apm-agent-java#77, I wrote a custom formatter which can serialize ISO timestamps in a garbage-free and performant way (sketched below).
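
(A condensed sketch of what such a garbage-free formatter can look like, simplified relative to the actual PR: append the digits of a UTC epoch-ms timestamp directly into a reused StringBuilder, so no intermediate String, Date, or Calendar objects are created:)

```java
public class IsoTimestampWriter {

    private static final long MILLIS_PER_DAY = 24 * 60 * 60 * 1000L;

    // Appends "yyyy-MM-ddTHH:mm:ss.SSSZ" for a UTC epoch-ms timestamp into a
    // caller-provided (and typically reused) StringBuilder.
    static void appendIsoTimestamp(StringBuilder sb, long epochMillis) {
        appendDate(sb, epochMillis / MILLIS_PER_DAY);
        sb.append('T');
        long millisOfDay = epochMillis % MILLIS_PER_DAY;
        appendPadded(sb, millisOfDay / 3_600_000, 2);     // hours
        sb.append(':');
        appendPadded(sb, (millisOfDay / 60_000) % 60, 2); // minutes
        sb.append(':');
        appendPadded(sb, (millisOfDay / 1000) % 60, 2);   // seconds
        sb.append('.');
        appendPadded(sb, millisOfDay % 1000, 3);          // milliseconds
        sb.append('Z');
    }

    // Civil-date-from-days algorithm (Howard Hinnant); valid for timestamps >= epoch.
    private static void appendDate(StringBuilder sb, long daysSinceEpoch) {
        long z = daysSinceEpoch + 719_468;
        long era = z / 146_097;
        long doe = z - era * 146_097;
        long yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
        long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
        long mp = (5 * doy + 2) / 153;
        long day = doy - (153 * mp + 2) / 5 + 1;
        long month = mp + (mp < 10 ? 3 : -9);
        long year = yoe + era * 400 + (month <= 2 ? 1 : 0);
        appendPadded(sb, year, 4);
        sb.append('-');
        appendPadded(sb, month, 2);
        sb.append('-');
        appendPadded(sb, day, 2);
    }

    // Zero-pads and appends a non-negative value; StringBuilder.append(long)
    // writes the digits into the buffer without creating a String.
    private static void appendPadded(StringBuilder sb, long value, int digits) {
        long scale = 1;
        for (int i = 1; i < digits; i++) {
            scale *= 10;
        }
        for (; scale > 1 && value < scale; scale /= 10) {
            sb.append('0');
        }
        sb.append(value);
    }
}
```

The StringBuilder can be reset and reused across serializations, keeping the whole formatting path allocation-free.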

So I'm OK with leaving the timestamp as is, but would still suggest using epoch timestamps in future APIs like intake v2 or the metrics API.

@roncohen (Contributor) commented Jul 2, 2018

I'm happy you found a way to work around it @felixbarny. I'm going to close this now because it's no longer a problem in the agents. If you run into problems again related to this, feel free to reopen.

@roncohen roncohen closed this as completed Jul 2, 2018
@felixbarny (Member, Author) commented:

To follow up, this has now been implemented via #1413.
