Bug in JavaDateFormatter when using DOY #89096

Closed
HayDegha0917 opened this issue Aug 3, 2022 · 16 comments · Fixed by #89693
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team team-discuss

Comments

@HayDegha0917

HayDegha0917 commented Aug 3, 2022

Elasticsearch Version

7.10.2

Installed Plugins

No response

Java Version

1.8

OS Version

CentOS

Problem Description

When defining a mapping with a field that uses a DOY (day-of-year) format, data is indexed correctly. However, we run into issues when the rounding parser is used. See the steps below for a test case.

We believe the issue is related to this section of code:

https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L48-L59

Steps to Reproduce

DELETE /test-doy-date

PUT /test-doy-date
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis"
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "test-doy-date", "_id" : "1" } }
{ "timestamp" : "2022-104T14:08:30.100" }
{ "index" : { "_index" : "test-doy-date", "_id" : "2" } }
{ "timestamp" : "2022-104T14:08:30.540Z" }
{ "index" : { "_index" : "test-doy-date", "_id" : "3" } }
{ "timestamp" : "2022-104T14:08:31.100111" }
{ "index" : { "_index" : "test-doy-date", "_id" : "4" } }
{ "timestamp" : "2022-104T14:08:31.234567Z" }

GET /test-doy-date/_search

GET /test-doy-date/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2022-104T14:08:30.293",
            "lte": "2022-104T14:08:31.355",
            "format": "yyyy-DDD'T'HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}

All steps complete without error except for the range filter, which outputs:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test-doy-date",
        "node" : "8QwQl8a5SvWteKZqObjW5g",
        "reason" : {
          "type" : "parse_exception",
          "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]]",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]",
            "caused_by" : {
              "type" : "date_time_parse_exception",
              "reason" : "date_time_parse_exception: Text '2022-104T14:08:31.355' could not be parsed: Conflict found: Field DayOfYear 1 differs from DayOfYear 104 derived from 2022-01-01",
              "caused_by" : {
                "type" : "date_time_exception",
                "reason" : "date_time_exception: Conflict found: Field DayOfYear 1 differs from DayOfYear 104 derived from 2022-01-01"
              }
            }
          }
        }
      }
    ]
  },
  "status" : 400
}

If we omit the explicit format in the filter, we instead get:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test-doy-date",
        "node" : "MB7SIp_pSrSComH5XlvIPQ",
        "reason" : {
          "type" : "parse_exception",
          "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]]",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]",
            "caused_by" : {
              "type" : "date_time_parse_exception",
              "reason" : "Failed to parse with all enclosed parsers"
            }
          }
        }
      }
    ]
  },
  "status" : 400
}

If we change lte to lt in the filter, we get a valid response:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.100111"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.234567Z"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:30.540Z"
        }
      }
    ]
  }
}

If we use a custom format in the field mapping that does not use DOY, everything works fine.

Logs (if relevant)

No response

@HayDegha0917 HayDegha0917 added >bug needs:triage Requires assignment of a team area label labels Aug 3, 2022
@HayDegha0917
Author

HayDegha0917 commented Aug 3, 2022

Note that we tried a few other variants to "make it work":

  • Moved the custom date format to the front of the format list in the field mapping. We still get the same error.
  • Dropped the custom formats from the field mapping and used an explicit format in the query. We still get the same error.

@nik9000
Member

nik9000 commented Aug 4, 2022

I'll debug this a bit and figure out where it lands.

@nik9000
Member

nik9000 commented Aug 4, 2022

Yeah, it does indeed look like it's https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L48-L59. All those defaults look wrong if you are using day of year. The reason lt works is that it turns off the "rounding" that you bumped into.

FWIW, if I comment out those two bits of rounding it doesn't crash. Not sure if that's the fix, but it doesn't crash.

Here's the test I'm running locally:

curl -uelastic:password -HContent-Type:application/json -XDELETE localhost:9200/test-doy-date

curl -uelastic:password -HContent-Type:application/json -XPUT localhost:9200/test-doy-date?pretty -d'
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-DDD'"'"'T'"'"'HH:mm:ss.SSS||yyyy-DDD'"'"'T'"'"'HH:mm:ss.SSSX||yyyy-DDD'"'"'T'"'"'HH:mm:ss.n||yyyy-DDD'"'"'T'"'"'HH:mm:ss.nX||strict_date_optional_time||epoch_millis"
      }
    }
  }
}
'

curl -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_bulk?pretty&refresh' -d'
{ "index" : { "_index" : "test-doy-date", "_id" : "1" } }
{ "timestamp" : "2022-104T14:08:30.100" }
{ "index" : { "_index" : "test-doy-date", "_id" : "2" } }
{ "timestamp" : "2022-104T14:08:30.540Z" }
{ "index" : { "_index" : "test-doy-date", "_id" : "3" } }
{ "timestamp" : "2022-104T14:08:31.100111" }
{ "index" : { "_index" : "test-doy-date", "_id" : "4" } }
{ "timestamp" : "2022-104T14:08:31.234567Z" }
'

curl -uelastic:password -HContent-Type:application/json -XGET localhost:9200/test-doy-date/_search?pretty

curl -uelastic:password -HContent-Type:application/json -XGET 'localhost:9200/test-doy-date/_search?pretty&error_trace' -d'
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2022-104T14:08:30.293",
            "lte": "2022-104T14:08:31.355",
            "format": "yyyy-DDD'"'"'T'"'"'HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}'

@nik9000 nik9000 added :Core/Infra/Core Core issues without another label team-discuss and removed needs:triage Requires assignment of a team area label labels Aug 4, 2022
@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Aug 4, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@nik9000
Member

nik9000 commented Aug 4, 2022

I've sent this to the core-infra folks who know this code much better than I do. This is specifically about the "round up" parser.

@HayDegha0917
Author

@nik9000 do you know the conditions that cause ES to use the "round up" parser? It would help us communicate the extent of the issue to our users.

Also, please note that this issue does not occur with ES6, presumably because it uses Joda time rather than Java time.

@nik9000
Member

nik9000 commented Aug 4, 2022

> Also, please note that this issue does not occur with ES6, presumably because it uses Joda time rather than Java time.

Yeah. I imagine that's what it was. That was a big change and lots of stuff snuck in, unfortunately.

> @nik9000 do you know the conditions that cause ES to use the "round up" parser? It would help us communicate the extent of the issue to our users.

It looks like it is used for gt and lte in range queries on date, date_nanos, and date_range fields. I think that's all of it. There are some other places that "round up", but I think they apply to non-dates. It's not 100% clear everywhere without a bunch of digging.
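
To illustrate why gt and lte bounds are parsed with "rounding up", here is a minimal Java sketch. It is not the Elasticsearch implementation: the class, the helper names, and the exact default values (23:59:59.999999999) are assumptions made for illustration. It shows that a bound given only to day precision gets its absent time fields filled with their maximum values, so that an lte bound covers the whole day.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class RoundUpPurpose {
    // Per the comment above: gt and lte bounds are the ones parsed with rounding up.
    static boolean roundsUp(String operator) {
        return operator.equals("gt") || operator.equals("lte");
    }

    // Hypothetical helper: time fields absent from the input default to their
    // maximum values (the concrete values are an assumption for this sketch).
    static DateTimeFormatter endOfUnit(String pattern) {
        return new DateTimeFormatterBuilder()
            .appendPattern(pattern)
            .parseDefaulting(ChronoField.HOUR_OF_DAY, 23L)
            .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 59L)
            .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 59L)
            .parseDefaulting(ChronoField.NANO_OF_SECOND, 999_999_999L)
            .toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        // A day-precision upper bound is extended to the end of day 104.
        LocalDateTime upper = LocalDateTime.parse("2022-104", endOfUnit("yyyy-DDD"));
        System.out.println(upper); // 2022-04-14T23:59:59.999999999
        System.out.println(roundsUp("lte") + " " + roundsUp("lt")); // true false
    }
}
```

Only time fields are defaulted here, so the day-of-year input is untouched. The failure reported in this issue appears once month and day-of-month defaults are added on top.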

@HayDegha0917
Author

@nik9000 I am surprised this was not reported before. In my industry, DOY searches are quite common because we have several systems that prefer to denote dates as DOY (Mars 2020, Deep Space Network, etc.). I assume that others have run into the issue but are simply converting DOY into MM/DD. Unfortunately, this is not a great option for us because we allow users to submit free-form queries, so we would have to parse and convert those queries.

How/when could we expect to have a timeframe for a fix?

@nik9000
Member

nik9000 commented Aug 9, 2022 via email

@pgomulka
Contributor

pgomulka commented Aug 9, 2022

It is the same family of bugs as #58986.

The cause is that when parsing with yyyy-DDD'T'HH:mm:ss.SSS we try to default missing fields such as Month: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L48-L59
So parsing 2022-104T14:08:30.293 produces this interim state, before the date is resolved:
{SecondOfMinute=30, DayOfMonth=1, YearOfEra=2022, DayOfYear=104, MinuteOfHour=8, MonthOfYear=1, HourOfDay=14, NanoOfSecond=293000000},null
Note that MonthOfYear and DayOfMonth have been defaulted. Java then resolves the date part to 2022-01-01, because MonthOfYear is checked before DayOfYear, and the parsed DayOfYear=104 conflicts with that date:
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/time/chrono/AbstractChronology.java#L439

I think the conclusion is the same as in the linked issue: #58986 (comment)
I am afraid I don't know a way to fix this. I will raise it with the core/infra team.
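
The conflict described above can be reproduced with plain java.time, outside Elasticsearch. The following is a minimal sketch, assuming the default (SMART) resolver style and assuming that the two parseDefaulting calls mirror the month and day defaults of the round-up parser; the class and method names are made up for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class DoyDefaultingConflict {
    static final String PATTERN = "yyyy-DDD'T'HH:mm:ss.SSS";

    // Plain parser: only the fields named in the pattern are populated.
    static DateTimeFormatter plain() {
        return new DateTimeFormatterBuilder()
            .appendPattern(PATTERN)
            .toFormatter(Locale.ROOT);
    }

    // Same pattern plus month and day-of-month defaults (assumed to mirror
    // the defaults that the round-up parser adds).
    static DateTimeFormatter withMonthAndDayDefaults() {
        return new DateTimeFormatterBuilder()
            .appendPattern(PATTERN)
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1L)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1L)
            .toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        String text = "2022-104T14:08:31.355";
        // Without defaults, year plus day-of-year resolves to April 14.
        System.out.println(LocalDateTime.parse(text, plain())); // 2022-04-14T14:08:31.355
        try {
            // With defaults, year/month/day resolves first to 2022-01-01,
            // which then conflicts with the parsed DayOfYear of 104.
            LocalDateTime.parse(text, withMonthAndDayDefaults());
        } catch (DateTimeParseException e) {
            System.out.println(e.getMessage()); // message contains "Conflict found"
        }
    }
}
```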

@HayDegha0917
Author

HayDegha0917 commented Aug 9, 2022

@pgomulka I don't pretend to understand the inner workings of ES, but I don't understand why you don't delegate the job of parsing to the JDK and then apply whatever rounding algorithm you want to the outcome. In fact, you could easily parse to the internal representation, then format back to a standard representation, and then apply your round-up logic (SimpleDateFormat.parse().format()). Or are you saying that the bug exists in OpenJDK?
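
For readers following along, the "parse first, round afterwards" idea could look roughly like the sketch below. It is only an illustration of the suggestion, not anything Elasticsearch does: the class and method names are hypothetical, and it handles just the simple case where the input resolves to a complete date with or without a complete time.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalQueries;
import java.util.Locale;

final class ParseThenRoundUp {
    // Hypothetical helper: let the JDK resolve whatever the pattern provides,
    // then fill in an end-of-day time only if no time was parsed at all.
    static LocalDateTime parseRoundedUp(String text, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        TemporalAccessor parsed = formatter.parse(text);
        LocalDate date = parsed.query(TemporalQueries.localDate());
        LocalTime time = parsed.query(TemporalQueries.localTime());
        if (date == null) {
            // Inputs such as a bare year are out of scope for this sketch.
            throw new DateTimeException("No complete date in: " + text);
        }
        return date.atTime(time != null ? time : LocalTime.MAX);
    }

    public static void main(String[] args) {
        System.out.println(parseRoundedUp("2022-104", "yyyy-DDD"));
        // 2022-04-14T23:59:59.999999999
        System.out.println(parseRoundedUp("2022-104T14:08:31.355", "yyyy-DDD'T'HH:mm:ss.SSS"));
        // 2022-04-14T14:08:31.355
    }
}
```

The sketch deliberately leaves out partial dates (year only, year and month) and partial times, which is where deciding what to round becomes harder without knowing which fields the format supplies.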

@pgomulka
Contributor

pgomulka commented Aug 9, 2022

@HayDegha0917 I did open an issue to the JDK regarding this https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8250514 but the problem comes down to ES not being able to know up front what format (and fields effectively) will be used. If we knew what format will be used, we could use the advice from the JDK issue.

The reason this worked previously in 6.x is that Joda had a parseInto method, which from what I understand was a "predecessor" of parseDefaulting in java.time. It turns out that it works differently in some cases.

I am open to ideas and PRs. Do read the original issue.

@HayDegha0917
Author

@pgomulka I see the advice in the JDK issue and, from their perspective, it makes complete sense.

Have you tried defaulting all of the available fields of ChronoField, following the existing pattern shown here?

builder -> builder.parseDefaulting(ChronoField.MONTH_OF_YEAR, 1L)

I assume that you have and that it did not work.

I also do not entirely understand why you are explicitly setting the default values. I have not programmed in Java for some time, so I am not sure what would happen if you simply omitted all the parseDefaulting calls, maybe in conjunction with switching to SMART parsing. From a user perspective, I can tell you that I would prefer the call to fail fast because I failed to supply a field rather than getting an error that prevents me from using a date format.

@HayDegha0917
Author

We did some additional investigation and found this additional issue. If you follow all the steps of the test case above and, instead of submitting the query in DSL format, you submit it as a query string, then Elasticsearch swallows the exception silently and returns incorrect results:

GET /test-doy-date/_search
{
  "query": {
    "query_string": {
      "query": "timestamp:[2022-104T14:08:30.293 TO 2022-104T14:08:31.355]"
    }
  }
}

The query above returns

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

In fact, it should return 3 results. If you modify the query as follows:

GET /test-doy-date/_search
{
  "query": {
    "query_string": {
      "query": "timestamp:[2022-104T14:08:30.293 TO 2022-104T14:08:31.355}"
    }
  }
}

then it returns 3 results:

{
  "took" : 143,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.100111"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.234567Z"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:30.540Z"
        }
      }
    ]
  }
}

Consequently, the user does not even have a notion that something went wrong.

@grcevski
Contributor

Hi @HayDegha0917, thanks for these additional instructions. We have a solution in mind and we are working on a fix that will handle this correctly.

@HayDegha0917
Author

@grcevski good to hear! Thanks for directing attention to this issue. Please let us know when a patch is available (or, I suppose, GitHub will let us know).

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Aug 29, 2022
Date rounding logic should take into account the fields that will be
parsed by a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth, as it will conflict with DayOfYear.

However, DateTimeFormatter does not have a public method to return
information about the fields that will be parsed. The hacky workaround is
to rely on the toString() implementation, which returns field info when
the formatter was defined with a textual pattern.

This commit introduced conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM

closes elastic#89096
closes elastic#58986
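
The workaround described in the commit message can be sketched as follows. This is not the code from the commit: the class and method names are hypothetical, only the DayOfYear condition is shown (the hour-of-am-pm conditions are omitted), and the default values are assumptions. It also relies on the text returned by DateTimeFormatter.toString(), which is a description rather than a documented contract.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class ConditionalRoundUp {
    // Hypothetical helper: wraps a parser with round-up defaults, skipping the
    // month and day-of-month defaults when the parser already reads DayOfYear.
    static DateTimeFormatter roundUpParser(DateTimeFormatter parser) {
        // toString() describes the parsed fields, e.g. "...Value(DayOfYear,3)...".
        String description = parser.toString();
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder().append(parser);
        if (!description.contains(ChronoField.DAY_OF_YEAR.toString())) {
            builder.parseDefaulting(ChronoField.MONTH_OF_YEAR, 1L);
            builder.parseDefaulting(ChronoField.DAY_OF_MONTH, 1L);
        }
        // Assumed time defaults; they apply only to fields missing from the input.
        builder.parseDefaulting(ChronoField.HOUR_OF_DAY, 23L);
        builder.parseDefaulting(ChronoField.MINUTE_OF_HOUR, 59L);
        builder.parseDefaulting(ChronoField.SECOND_OF_MINUTE, 59L);
        builder.parseDefaulting(ChronoField.NANO_OF_SECOND, 999_999_999L);
        return builder.toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        DateTimeFormatter doy = DateTimeFormatter.ofPattern("yyyy-DDD'T'HH:mm:ss.SSS", Locale.ROOT);
        // The upper bound from this issue now parses without a conflict.
        System.out.println(LocalDateTime.parse("2022-104T14:08:31.355", roundUpParser(doy)));
        // 2022-04-14T14:08:31.355
        DateTimeFormatter month = DateTimeFormatter.ofPattern("yyyy-MM", Locale.ROOT);
        // Patterns without DayOfYear still receive the day-of-month default.
        System.out.println(LocalDateTime.parse("2022-04", roundUpParser(month)));
        // 2022-04-01T23:59:59.999999999
    }
}
```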
pgomulka added a commit that referenced this issue Sep 5, 2022
closes #89096
closes #58986
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Sep 5, 2022
pgomulka added a commit that referenced this issue Sep 5, 2022
Date rounding logic should take into account the fields that will be
parsed by a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth, as it will conflict with DayOfYear.

However, DateTimeFormatter does not have a public method to return
information about the fields that will be parsed. The hacky workaround is
to rely on the toString() implementation, which returns field info when
the formatter was defined with a textual pattern.

This commit introduced conditional logic for DayOfMonth, MonthOfYear, ClockHourOfAMPM and HourOfAmPM

closes #89096
closes #58986
backports #89693
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Sep 5, 2022
pgomulka added a commit that referenced this issue Sep 5, 2022
closes #89096
closes #58986
backports #89693
5 participants