-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nanosecond field mapper #37755
Add nanosecond field mapper #37755
Conversation
Pinging @elastic/es-search |
@elasticmachine run elasticsearch-ci/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @spinscale , I did a first pass and added some comments.
This means that it will accept dates with optional timestamps, which conform | ||
to the formats supported by | ||
<<strict-date-time,`strict_date_optional_time_nanos`>> or | ||
nanoseconds-since-the-epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it nanoseconds-since-the-epoch or epoch_millis
? I don't understand which one is used by default.
<1> The `date` field uses the default `format`. | ||
<2> This document uses a plain date. | ||
<3> This document includes a time. | ||
<4> This document uses milliseconds-since-the-epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, shouldn't this use the nanoseconds-since-the-epoch ?
<2> This document uses a plain date. | ||
<3> This document includes a time. | ||
<4> This document uses milliseconds-since-the-epoch. | ||
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be in nanoseconds since epoch if there is a single index (the field is declared as a nanosecond date) ?
import static org.elasticsearch.common.time.DateUtils.toLong; | ||
|
||
/** A {@link FieldMapper} for dates. */ | ||
public class NanosecondDateFieldMapper extends FieldMapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the nanosecond type could be included in DateFieldMapper
directly. The code is almost the same, only the parsing changes so I don't think we need a separate class ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valid, looking if that works out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have not worked on the other items but just on this one. Maybe you can take a look if my change makes sense. I created an enum, that has different methods for converting a string to a long for milliseconds and nanoseconds. Does that make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, it looks good
import static org.hamcrest.Matchers.arrayWithSize; | ||
import static org.hamcrest.Matchers.is; | ||
|
||
public class NanosecondIT extends ESIntegTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add rest tests rather than ESIntegTestCase
. These tests shouldn't bee too difficult to migrate to yaml ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo, great to see that all the work that has been done wrt java-time support now allows to support nanosecond timestamps with so little additional code. Some comments/questions:
- I'm not a fan of the
nanosecond
name because it doesn't tell whether this is a date/time or a duration. I'd rather like something liketimestamp
,timestamp_nanos
or evendate_nanos
. - I left a note about some computations that are wrong with negative timestamps (ie. before 1970). While we could fix it, I wonder that a better option might be to reject nanosecond timestamps before 1970 at index time? I can't think of a use-case that requires nanosecond precision before 1970.
- I agree with Jim it's a bit confusing that this new field supports nanoseconds but treats numbers as millis at index time.
- Does this field work with date_histogram aggregations already?
same mapping parameters as the `date` formatter can be used. | ||
|
||
NOTE: You can query across indices that have a date formatter and a | ||
nanosecond formatter. Internally the nanosecond based dates will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe s/formatter/field/? "date formatter" reads to me as "nanosecond field with a format that doesn't include nanos" but I think you mean date
fields?
|
||
long seconds = nanoSecondsSinceEpoch / 1_000_000_000; | ||
long nanos = nanoSecondsSinceEpoch % 1_000_000_000; | ||
return Instant.ofEpochSecond(seconds, nanos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect for negative timestamps? you might want to use Math.floorDiv/floorMod instead.
return 0; | ||
} | ||
|
||
return nanoSecondsSinceEpoch / 1_000_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect for negative times too.
I'd favor |
regarding treating the field differently with its default formatters: I do think that it makes sense to have it the same as the Also, there is no I am currently working on the fact that both can use the same default date formatter, so that the only difference would be in indexing, but not in configuration. Does that make sense? |
It might be indeed. We might want to add a big warning to the docs that this field will interpret numbers as millis.
Yes, thanks. |
If everyone is good with rejecting pre 1970 dates, I'm happy to add that :-) |
regarding aggregations: those are still fully in milliseconds (I already updated the docs with this limitation, but havent pushed it yet, working on Jim's review comments at the moments). |
This adds a dedicated field mapper that supports nanosecond resolution - at the price of a reduced date range. When using the date field mapper, the time is stored as milliseconds since the epoch in a long in lucene. This field mapper stores the time in nanoseconds since the epoch - which means its range is much smaller, ranging roughly from 1677 to 2262. If you are searching across indices where a date field is in nanoseconds and another date field in milliseconds, then an internal mechanism will convert the nanoseconds to milliseconds losing precision and then merge the search results. Relates elastic#27330
- fix docs - switch integration test to YAML test - fix missing numeric types/fielddata types - remove dead code in SearchPhaseController - rename mapper from nanosecond to date_nanos - only allow positive nanosecond dates (after 1970) - add more tests, ensure that doc value fields are working - add more tests, ensure that aggregations are working This is just a super dirty prototype I hashed together on friday afternoon, I will come up with a cleaner implementation next.
1599452
to
ec6ef2a
Compare
@jimczi I made some more changes that we talked about. One more open thing is to throw an exception when search_after is used in combination with nanosecond/millisecond indices. Can you point me to the code I should take a look at to do this? |
@jimczi I reverted the merging code, the custom nanosecond sort field, and kept the fieldmapper, the custom fielddata for script doc values and field doc values plus tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great @spinscale . I left some comments and questions but it's getting close ;).
server/src/main/java/org/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java
Outdated
Show resolved
Hide resolved
@@ -56,7 +56,27 @@ public long ramBytesUsed() { | |||
public final ScriptDocValues<?> getLegacyFieldValues() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is unused since #30831. I'll open a separate pr to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the code in here for now, but can also remove it, whatever you prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #38087
server/src/main/java/org/elasticsearch/index/fielddata/plain/SortedNumericDVIndexFieldData.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a minor comment but it looks ready to me.
Thanks for iterating @spinscale
@@ -181,21 +247,24 @@ public TypeParser() { | |||
} | |||
} | |||
|
|||
public static final class DateFieldType extends MappedFieldType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class can remain final ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, done
@elasticmachine run tests elasticsearch-ci/1 |
1 similar comment
@elasticmachine run tests elasticsearch-ci/1 |
This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.
When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1677 to 2262.
If you are searching across indices where a date field is in nanoseconds
and another date field in milliseconds, then an internal mechanism will
convert the nanoseconds to milliseconds losing precision and then merge
the search results.
Relates #27330
Closes #32601