
Add support for nanosecond-precision timestamps #12797

Merged

2 commits merged into elastic:master from nanoseconds-support on Oct 11, 2021

Conversation

@yaauie (Member) commented on Apr 2, 2021

Release notes

Adds support for nanosecond-precision timestamps on events.

  • timestamps generated by Logstash will have best-available granularity, depending on JVM and platform (in many cases this is microseconds)
  • timestamps parsed by Logstash will include all available granularity
  • timestamps will preserve nanosecond granularity throughout the Logstash pipeline
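The difference between parsed and generated granularity can be seen with plain `java.time` (a standalone illustration of the underlying mechanism, not the Logstash Event API itself):

```java
import java.time.Instant;

// Illustrates (outside Logstash) how java.time.Instant retains the
// sub-millisecond precision that the old joda-time wrapper truncated.
public class NanoPrecisionDemo {
    public static int parsedNanos() {
        // A parsed timestamp keeps all nine fractional digits.
        Instant parsed = Instant.parse("2021-10-11T16:22:00.123456789Z");
        return parsed.getNano(); // 123456789
    }

    public static void main(String[] args) {
        System.out.println("parsed nanos: " + parsedNanos());
        // Generated timestamps are limited by the clock's resolution; on
        // many JVMs/platforms Instant.now() yields microsecond granularity.
        System.out.println("now: " + Instant.now());
    }
}
```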

Adds support in the Event API's @timestamp formatters for Java-style format strings, using a new `%{{JAVA_FORMAT}}` syntax that aligns with Elasticsearch date math, while retaining support for the now-legacy joda-time format strings (`%{+JODA_FORMAT}`). This enables our users to tap into the improved formatting options available in their JVM, or to continue without modification.
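Conceptually, the inner pattern of a `%{{...}}` substitution is a `java.time` format string. A minimal sketch of what such a pattern resolves to (the helper method here is illustrative, not Logstash's actual substitution code):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: the JAVA_FORMAT portion of %{{...}} is a DateTimeFormatter
// pattern, which (unlike joda-time) can render all nine fractional digits.
public class JavaFormatDemo {
    public static String format(Instant ts, String javaPattern) {
        return DateTimeFormatter.ofPattern(javaPattern)
                .withZone(ZoneOffset.UTC)
                .format(ts);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-10-11T16:22:00.123456789Z");
        // Nine 'S' letters format the full nanosecond fraction.
        System.out.println(format(ts, "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"));
        // prints 2021-10-11T16:22:00.123456789
    }
}
```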

What does this PR do?

Changes our org.logstash.Timestamp to wrap Java 8+'s java.time.Instant, which allows us to maintain nanosecond-precision timestamps throughout processing. Access to an equivalent org.joda.time.DateTime is still available via the existing org.logstash.Timestamp#getTime() method, but has been deprecated in favor of org.logstash.Timestamp#getInstant().
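The shape of that change can be sketched as follows. This is not the actual `org.logstash.Timestamp` source; to stay dependency-free it uses an epoch-millis accessor as a stand-in for the deprecated joda `DateTime` path, which makes the precision loss on the legacy path explicit:

```java
import java.time.Instant;

// Minimal sketch of a Timestamp that wraps java.time.Instant while keeping
// a deprecated millisecond-precision accessor for backwards compatibility.
public class TimestampSketch {
    private final Instant instant;

    public TimestampSketch(Instant instant) {
        this.instant = instant;
    }

    /** Preferred accessor: retains full nanosecond precision. */
    public Instant getInstant() {
        return instant;
    }

    /**
     * Legacy-style accessor (stand-in for the joda-backed getTime()):
     * anything below millisecond granularity is truncated here.
     */
    @Deprecated
    public long toEpochMilli() {
        return instant.toEpochMilli();
    }
}
```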

Why is it important/What is the impact to the user?

As processing times speed up, millisecond granularity is not always enough. Inbound data increasingly has sub-millisecond granularity timestamps, and as long as we wrap org.joda.time.DateTime this granularity is truncated. This change-set allows the internal mechanisms of Logstash that hold moment-in-time data to have nanosecond granularity.

Timestamps produced here maintain their granularity through serialization to the PQ, the DLQ, toString, and JSON encoding, whether those timestamps were parsed from a string or generated by Logstash. Generated timestamps are limited to the JVM and platform's available granularity, which in many cases is microseconds.
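The property this relies on can be checked in isolation: serializing an `Instant` to its ISO-8601 string form (as toString and JSON encoding do) and parsing it back loses no precision.

```java
import java.time.Instant;

// Round-trip check: string serialization of java.time.Instant is lossless
// down to nanoseconds, so text-based formats (PQ/DLQ/JSON) retain precision.
public class RoundTripDemo {
    public static boolean roundTrips(Instant ts) {
        return Instant.parse(ts.toString()).equals(ts);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-10-11T16:22:00.123456789Z");
        System.out.println(roundTrips(ts)); // true
    }
}
```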

Users also gain access to Java time's improved formatters, which include support for ISO quarters, week-of-month, and a variety of timezone/offset-related format substitutions.
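For instance, the `Q` (quarter-of-year) and `W` (week-of-month) pattern letters are available through `java.time`'s formatter; this small standalone example (not Logstash code) shows them applied to the merge date:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Demonstrates java.time pattern letters beyond what the legacy joda-based
// formatting exposed: Q = quarter-of-year, W = week-of-month.
public class NewPatternsDemo {
    public static String quarterAndWeek(LocalDate date) {
        return DateTimeFormatter.ofPattern("Q/W", Locale.ENGLISH).format(date);
    }

    public static void main(String[] args) {
        // 2021-10-11 falls in quarter 4, week 3 of October.
        System.out.println(quarterAndWeek(LocalDate.of(2021, 10, 11)));
    }
}
```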

NOTE: Serialized events written by this code contain a higher granularity than can be safely parsed by releases of Logstash prior to these changes. As such, neither the PQ nor the DLQ would support a "downgrade" after upgrading to a release of Logstash that includes these changes. This is why I am not aiming to back-port these changes to Logstash 7.x.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

It is not yet clear whether changing org.logstash.Timestamp is enough.

  • serializing: do we serialize nanoseconds by default?
  • downstream: can downstream services handle nanosecond precision ISO8601 strings gracefully (whether by maintaining precision or losing it), or do they reject the input?
    • elasticsearch time fields
  • does precision survive being serialised into the PQ and deserialised back out by the workers?
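If a downstream service turns out to reject sub-millisecond ISO8601 input, one possible mitigation (hypothetical, not part of this changeset) is truncating before emitting:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical mitigation sketch: drop sub-millisecond precision before
// handing a timestamp to a consumer that cannot parse nanosecond strings.
public class TruncateDemo {
    public static Instant toMillis(Instant ts) {
        return ts.truncatedTo(ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2021-10-11T16:22:00.123456789Z");
        System.out.println(toMillis(ts)); // 2021-10-11T16:22:00.123Z
    }
}
```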

@andsel (Contributor) commented on Sep 9, 2021

If you can address the notes on the javadoc comments, then the PR is LGTM

@andsel self-requested a review on September 14, 2021

@andsel (Contributor) left a comment:

LGTM

@andsel (Contributor) commented on Sep 14, 2021

The red CI is resolved by aligning this branch with master, which contains PR #13209.

Migrates internals of `org.logstash.Timestamp` from legacy `org.joda.time.*`
which is limited to millisecond-precision to modern `java.time.Instant`,
allowing us to retain nanosecond granularity of `@timestamp` values.

Timestamps that are generated by Logstash (such as when creating an event that
does _not_ have a `@timestamp` field) will be generated at the highest precision
available to the JVM and/or platform (in many cases, this is microseconds).

Timestamps that are _parsed_ from user input will capture the entire provided
precision, up to and including nanosecond granularity.

Values retain all available precision throughout the pipeline, including
serialization to the PQ, the DLQ, and JSON.

BREAKING: This produces an effectively-breaking change to the serialization
          format of both the persistent queue (PQ) and dead-letter queue (DLQ),
          as the format serialized by this changeset contains a higher
          granularity of timestamp than previous releases of Logstash were
          capable of parsing without error.
          As such, it _MUST NOT_ be back-ported to the 7.x series.
@yaauie (Member, Author) commented on Oct 11, 2021

Jenkins test this again please

@yaauie yaauie merged commit 82081d8 into elastic:master Oct 11, 2021
@yaauie yaauie deleted the nanoseconds-support branch October 11, 2021 16:22
4 participants