Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nanosecond field mapper #37755

Merged
merged 35 commits into from
Feb 4, 2019

Conversation

spinscale
Copy link
Contributor

@spinscale spinscale commented Jan 23, 2019

This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1677 to 2262.

If you are searching across indices where a date field is in nanoseconds
and another date field in milliseconds, then an internal mechanism will
convert the nanoseconds to milliseconds losing precision and then merge
the search results.

Relates #27330
Closes #32601

@spinscale spinscale added >feature :Search Foundations/Mapping Index mappings, including merging and defining field types v7.0.0 labels Jan 23, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@spinscale
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @spinscale , I did a first pass and added some comments.

This means that it will accept dates with optional timestamps, which conform
to the formats supported by
<<strict-date-time,`strict_date_optional_time_nanos`>> or
nanoseconds-since-the-epoch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it nanoseconds-since-the-epoch or epoch_millis ? I don't understand which one is used by default.

<1> The `date` field uses the default `format`.
<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why, shouldn't this use the nanoseconds-since-the-epoch ?

<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be in nanoseconds since epoch if there is a single index (the field is declared as a nanosecond date) ?

import static org.elasticsearch.common.time.DateUtils.toLong;

/** A {@link FieldMapper} for dates. */
public class NanosecondDateFieldMapper extends FieldMapper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the nanosecond type could be included in DateFieldMapper directly. The code is almost the same, only the parsing changes so I don't think we need a separate class ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid, looking if that works out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have not worked on the other items but just on this one. Maybe you can take a look if my change makes sense. I created an enum, that has different methods for converting a string to a long for milliseconds and nanoseconds. Does that make sense to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, it looks good

import static org.hamcrest.Matchers.arrayWithSize;
import static org.hamcrest.Matchers.is;

public class NanosecondIT extends ESIntegTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add rest tests rather than ESIntegTestCase. These tests shouldn't bee too difficult to migrate to yaml ?

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo, great to see that all the work that has been done wrt java-time support now allows to support nanosecond timestamps with so little additional code. Some comments/questions:

  • I'm not a fan of the nanosecond name because it doesn't tell whether this is a date/time or a duration. I'd rather like something like timestamp, timestamp_nanos or even date_nanos.
  • I left a note about some computations that are wrong with negative timestamps (ie. before 1970). While we could fix it, I wonder that a better option might be to reject nanosecond timestamps before 1970 at index time? I can't think of a use-case that requires nanosecond precision before 1970.
  • I agree with Jim it's a bit confusing that this new field supports nanoseconds but treats numbers as millis at index time.
  • Does this field work with date_histogram aggregations already?

same mapping parameters as the `date` formatter can be used.

NOTE: You can query across indices that have a date formatter and a
nanosecond formatter. Internally the nanosecond based dates will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe s/formatter/field/? "date formatter" reads to me as "nanosecond field with a format that doesn't include nanos" but I think you mean date fields?


long seconds = nanoSecondsSinceEpoch / 1_000_000_000;
long nanos = nanoSecondsSinceEpoch % 1_000_000_000;
return Instant.ofEpochSecond(seconds, nanos);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect for negative timestamps? you might want to use Math.floorDiv/floorMod instead.

return 0;
}

return nanoSecondsSinceEpoch / 1_000_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect for negative times too.

@spinscale
Copy link
Contributor Author

I'd favor date_nanos here, easier to remember than anything else...

@spinscale
Copy link
Contributor Author

regarding treating the field differently with its default formatters: I do think that it makes sense to have it the same as the date field. Otherwise just switching from date to this one will suddenly index a completely different date when indexing just a number - I think that is more confusing?

Also, there is no epoch_nanos date formatter currently - I'd prefer people using human readable dates when indexing.

I am currently working on the fact that both can use the same default date formatter, so that the only difference would be in indexing, but not in configuration.

Does that make sense?

@jpountz
Copy link
Contributor

jpountz commented Jan 24, 2019

Otherwise just switching from date to this one will suddenly index a completely different date when indexing just a number - I think that is more confusing?

It might be indeed. We might want to add a big warning to the docs that this field will interpret numbers as millis.

Does that make sense?

Yes, thanks.

@spinscale
Copy link
Contributor Author

If everyone is good with rejecting pre 1970 dates, I'm happy to add that :-)

@spinscale
Copy link
Contributor Author

regarding aggregations: those are still fully in milliseconds (I already updated the docs with this limitation, but havent pushed it yet, working on Jim's review comments at the moments).

This adds a dedicated field mapper that supports nanosecond resolution -
at the price of a reduced date range.

When using the date field mapper, the time is stored as milliseconds since the epoch
in a long in lucene. This field mapper stores the time in nanoseconds
since the epoch - which means its range is much smaller, ranging roughly from
1677 to 2262.

If you are searching across indices where a date field is in nanoseconds
and another date field in milliseconds, then an internal mechanism will
convert the nanoseconds to milliseconds losing precision and then merge
the search results.

Relates elastic#27330
- fix docs
- switch integration test to YAML test
- fix missing numeric types/fielddata types
- remove dead code in SearchPhaseController
- rename mapper from nanosecond to date_nanos
- only allow positive nanosecond dates (after 1970)
- add more tests, ensure that doc value fields are working
- add more tests, ensure that aggregations are working

This is just a super dirty prototype I hashed together on friday
afternoon, I will come up with a cleaner implementation next.
@spinscale
Copy link
Contributor Author

@jimczi I made some more changes that we talked about.

One more open thing is to throw an exception when search_after is used in combination with nanosecond/millisecond indices. Can you point me to the code I should take a look at to do this?

@spinscale
Copy link
Contributor Author

@jimczi I reverted the merging code, the custom nanosecond sort field, and kept the fieldmapper, the custom fielddata for script doc values and field doc values plus tests

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great @spinscale . I left some comments and questions but it's getting close ;).

@@ -56,7 +56,27 @@ public long ramBytesUsed() {
public final ScriptDocValues<?> getLegacyFieldValues() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unused since #30831. I'll open a separate pr to remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the code in here for now, but can also remove it, whatever you prefer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #38087

docs/reference/mapping/types/date_nanos.asciidoc Outdated Show resolved Hide resolved
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment but it looks ready to me.
Thanks for iterating @spinscale

@@ -181,21 +247,24 @@ public TypeParser() {
}
}

public static final class DateFieldType extends MappedFieldType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class can remain final ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done

@spinscale
Copy link
Contributor Author

@elasticmachine run tests elasticsearch-ci/1

1 similar comment
@spinscale
Copy link
Contributor Author

@elasticmachine run tests elasticsearch-ci/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search Foundations/Mapping Index mappings, including merging and defining field types v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants