
sql: use micro instead of nano seconds for timestamps #6604

Merged
maddyblue merged 1 commit into cockroachdb:master on May 16, 2016

Conversation

maddyblue
Contributor

@maddyblue maddyblue commented May 10, 2016

Matches Postgres.

Fixes #6597


This change is Reviewable

@dt
Member

dt commented May 10, 2016

:lgtm:

Previously, mjibson (Matt Jibson) wrote…

sql: use micro instead of nano seconds for timestamps

Matches Postgres.

Fixes #6597


Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


acceptance/python_test.go, line 38 [r1] (raw file):

# verify we can parse a timestamp
cur = conn.cursor()
cur.execute("SELECT now()")

Could probably just throw this in the select above, and then slice it for the assertion?


Comments from Reviewable

@maddyblue
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


acceptance/python_test.go, line 38 [r1] (raw file):

Previously, dt (David Taylor) wrote…

Could probably just throw this in the select above, and then slice it for the assertion?


I'm inclined to leave it like this because it's clear that the comment is referring specifically to that statement and I think the slicing reduces the code clarity since it merges two logical tests into one statement.



@dt
Member

dt commented May 10, 2016

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


acceptance/python_test.go, line 38 [r1] (raw file):

Previously, mjibson (Matt Jibson) wrote…

I'm inclined to leave it like this because it's clear that the comment is referring specifically to that statement and I think the slicing reduces the code clarity since it merges two logical tests into one statement.


👍



@maddyblue
Contributor Author

@dt hints about how to fix the current test failure? It's due to that old-vs-new-binary testing thing. The format changed, so...

@dt
Member

dt commented May 10, 2016

@mjibson maybe change the SELECT in the acceptance test to just get unix time or something?

@maddyblue
Contributor Author

DO NOT MERGE

There is a pending design decision about whether or not we want to do this. See the discussion at https://gitter.im/cockroachdb/cockroach?at=57322aef12fa465406ebb182

@knz
Contributor

knz commented May 11, 2016

Reviewed 3 of 3 files at r1.
Review status: 2 of 4 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


sql/pgwire/types.go, line 211 [r1] (raw file):

const pgTimeStampFormatNoOffset = "2006-01-02 15:04:05.999999"
const pgTimeStampFormat = pgTimeStampFormatNoOffset + "-07:00"

Out of curiosity, is there a reason to fix the timezone to New York time? This feels wrong to me.



@dt
Member

dt commented May 11, 2016

Review status: 2 of 4 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


sql/pgwire/types.go, line 211 [r1] (raw file):

const pgTimeStampFormatNoOffset = "2006-01-02 15:04:05.999999"
const pgTimeStampFormat = pgTimeStampFormatNoOffset + "-07:00"

That isn't New York (-5) -- it's the reference time from Go.

The layout defines the format by showing how the reference time, defined to be
Mon Jan 2 15:04:05 -0700 MST 2006 would be interpreted if it were the value.

https://golang.org/pkg/time/#Parse



@dt
Member

dt commented May 13, 2016

:lgtm:

Previously, mjibson (Matt Jibson) wrote…

DO NOT MERGE

There is a pending design decision about whether or not we want to do this. See the discussion at https://gitter.im/cockroachdb/cockroach?at=57322aef12fa465406ebb182


Reviewed 2 of 3 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 8 of 8 files at r5.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.



@bdarnell
Contributor

:lgtm:

This is a nice solution (better than the others in the doc). We'll have to put a big "here be dragons" warning on the _ns functions.

Should we advise people to strip the nanoseconds from their existing timestamp data? System.lease has existing nanosecond data that is difficult to strip; going forward should we round all lease expirations to microseconds?

Previously, dt (David Taylor) wrote…

:lgtm:


Reviewed 1 of 3 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 8 of 8 files at r5.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


acceptance/python_test.go, line 36 [r5] (raw file):

v = cur.fetchall()
assert v == [(1, 5)]
# verify we can parse a timestamp

Not just any timestamp: inserting a hand-constructed timestamp and reading it back wouldn't trigger the bug. I'm not sure how to say that in a comment, though. We can't just say "a timestamp with non-zero nanoseconds" since that's no longer true. Maybe just a link back to the bug.


sql/parser/builtins.go, line 639 [r5] (raw file):

  },

  "format_timestamp_ns": {

This appears to be untested.


sql/parser/datum.go, line 826 [r5] (raw file):

// Format implements the NodeFormatter interface.
func (d *DTimestampTZ) Format(buf *bytes.Buffer, f FmtFlags) {
  buf.WriteString(d.UTC().Format(timestampNodeFormat))

Not new in this change, but are DTimestamp.Format and DTimestampTZ supposed to be identical?


sql/parser/eval.go, line 1998 [r5] (raw file):

const (
  dateFormat                            = "2006-01-02"
  timestampFormat                       = "2006-01-02 15:04:05"

Where do we use these second-resolution formats?


sql/testdata/datetime, line 375 [r5] (raw file):

# nanoseconds are truncated to microseconds
query I
SELECT extract(nanosecond from timestamp '2001-04-10 12:04:59.34565423')

Test/document the way this interacts with unique indexes: Inserting two timestamps that differ only in nanoseconds will fail with this syntax, while it works if parse_timestamp_ns is used.


sql/testdata/datetime, line 387 [r5] (raw file):

# current_timestamp_ns has nanoseconds
query B
SELECT extract(nanosecond from current_timestamp_ns()) % 1000 != 0

These tests will fail about 1 in 1000 times (or more often if the underlying clock is just padding out a less-precise timer to give a value in nanoseconds). The nightly stress runs go for at least 300 iterations on this package, so this will have an annoyingly high false positive rate. We need to generate multiple timestamps and verify that at least some of them have non-zero nanoseconds. (here and in the system.lease test below)



@maddyblue
Contributor Author

Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


acceptance/python_test.go, line 36 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Not just any timestamp: inserting a hand-constructed timestamp and reading it back wouldn't trigger the bug. I'm not sure how to say that in a comment, though. We can't just say "a timestamp with non-zero nanoseconds" since that's no longer true. Maybe just a link back to the bug.


Done.


sql/parser/builtins.go, line 639 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This appears to be untested.


Done.


sql/parser/datum.go, line 826 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Not new in this change, but are DTimestamp.Format and DTimestampTZ supposed to be identical?


That behavior is unspecified. However I agree with you and will propose a fix to this in a followup PR.


sql/parser/eval.go, line 1998 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Where do we use these second-resolution formats?


The subsecond precision part is only required for output, not parsing. See: https://play.golang.org/p/2yTGWPoEt4


sql/testdata/datetime, line 387 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

These tests will fail about 1 in 1000 times (or more often if the underlying clock is just padding out a less-precise timer to give a value in nanoseconds). The nightly stress runs go for at least 300 iterations on this package, so this will have an annoyingly high false positive rate. We need to generate multiple timestamps and verify that at least some of them have non-zero nanoseconds. (here and in the system.lease test below)


I'm not sure we can do this in the testdata tests. The timestamp data gets set at the beginning of each statement. Thus, we'd have to have multiple statements and OR the results between them. I think these would have to be moved to a _test.go file to properly test them. Do you know another way?



@maddyblue
Contributor Author

Review status: 8 of 10 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


sql/testdata/datetime, line 375 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Test/document the way this interacts with unique indexes: Inserting two timestamps that differ only in nanoseconds will fail with this syntax, while it works if parse_timestamp_ns is used.


Done.



@bdarnell
Contributor

Review status: 8 of 10 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


sql/testdata/datetime, line 387 [r5] (raw file):

Previously, mjibson (Matt Jibson) wrote…

I'm not sure we can do this in the testdata tests. The timestamp data gets set at the beginning of each statement. Thus, we'd have to have multiple statements and OR the results between them. I think these would have to be moved to a _test.go file to properly test them. Do you know another way?


You could do it in a testdata file with a table: INSERT INTO t VALUES (now()), repeated a few times, then SELECT 1 FROM t WHERE extract(nanosecond from value) % 1000 != 0 LIMIT 1



#6597 is caused because we send timestamps over pgwire with nanoseconds,
but the python psycopg2 driver expects only microseconds (other drivers
may as well). The postgres spec confirms microseconds are correct and we
should not be sending nanoseconds.

This change wraps most uses of parser.DTimestamp[TZ] in a function that
rounds nanoseconds to microseconds.

If this new code were run on existing data, we would have a problem. Since
nanoseconds are rounded, it would be impossible to successfully use equality
comparison with any existing timestamp columns. This is because the existing
data has nanoseconds, and thus won’t ever equate to a microsecond-only
datum. A user cannot `SELECT * from t`, then `SELECT * from t WHERE ts_pk
= $1` to fetch a specific timestamp row. Further, if a timestamp column
is a primary key, then a user could never select that single row using
the PK index. If a user tried to migrate data to this new format, they
could have uniqueness violations since two previously unique timestamps
(due to nanoseconds) are now equal.

To get around this, new functions have been added. Since, by default,
all parsed timestamps are rounded, add a new function `parse_timestamp_ns`
that does not round. Similarly, `format_timestamp_ns` formats a timestamp
as a string with nanosecond precision.

The three time sources (statement timestamp, transaction timestamp,
timeutil.Now()) have also been rounded. `current_timestamp_ns`
has been added to provide a non-rounding timestamp function from the
transaction timestamp. Note however that in order to see the nanoseconds
of `current_timestamp_ns`, either `extract(nanosecond from x)` or
`format_timestamp_ns` must be used.

The `system.lease` table, since it does not go over pgwire, retains
nanoseconds.

Fixes #6597
@maddyblue
Contributor Author

Review status: 6 of 10 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


sql/testdata/datetime, line 387 [r5] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

You could do it in a testdata file with a table: INSERT INTO t VALUES (now()), repeated a few times, then SELECT 1 FROM t WHERE extract(nanosecond from value) % 1000 != 0 LIMIT 1


Right!



@maddyblue maddyblue merged commit c25890a into cockroachdb:master May 16, 2016
@maddyblue maddyblue deleted the sql-micro-timestamp branch May 16, 2016 17:24
@jseldess
Contributor

Docs updated with cockroachdb/docs#305

craig bot pushed a commit that referenced this pull request Jan 31, 2019
34202: sql: remove nanoseconds from INTERVAL r=mjibson a=mjibson

Nanoseconds were not representable over pgwire binary mode and were being
truncated. We previously encountered this problem with timestamps (#6597)
and removed nanoseconds from timestamps at that time. We should have
done the same for intervals, since they have the same kind of problem,
but did not.

It is no longer possible to create intervals with nanosecond
precision. Parsing from string or converting from float or decimal will
round to the nearest microsecond. Similarly any arithmetic operation
(add, sub, mul, div) on intervals will also round to nearest micro. We
round instead of truncate because that's what Postgres does.

Existing on-disk intervals that contain nanoseconds will retain their
underlying value when doing encode/decode operations (so that indexes can
be correctly maintained). However, there is no longer any way to retrieve
the nanosecond part. Converting to string, float, or decimal will first
round and then convert.

The reasoning for this restriction on existing on-disk nanoseconds is
related to the original bug, where we were truncating nanos to micros over
binary pgwire. The problem there was that depending on how you queried
the data (text or binary mode), you would get a different result, and
one of them was wrong. Similarly, it would be wrong to have the results
of an interval -> string conversion return a different result than just
querying the interval.

It is unfortunate that upgrading from 2.1 -> 2.2 will completely remove
the ability for users to continue accessing their nanoseconds. Due to
that, we must describe this change in the major release notes. Users who
require nanoseconds to be present will have to modify their application
to use a different data type before upgrading. Further, applications
that do comparisons on intervals may have some edge-case errors due to
rounding and seeming equality. That is, some intervals with nanos will
be rounded up to the next microsecond, possibly changing the results of
an existing query. Also, it is not possible to compare equality to any
existing interval with on-disk nanos. We believe the number of users
affected by this will be very small, and that it is still a necessary
change because of the unavoidable pgwire binary mode bug above, which
may already have been unknowingly affecting them.

Other implementations were worked on, like one where the user could
specify the desired precision of each operation (similarly to how
timestamps work). This ended up being very tedious since there
are many operations and they all required the same microsecond
precision. Timestamps are different since there are some operations
that actually do need nanosecond precision, but intervals have no such
need. Thus, it was better to remove the precision argument and hard
code rounding. Another attempt was made to replace Nanos with Micros,
with an additional nanos field to hold on-disk nanoseconds. This had
difficult problems since all of our encoding infra uses nanoseconds
on disk. Converting the Micros field to nanos increased the possibility
of overflow due to multiplying by 1000. Handling the possibility of this
overflow in all possibly locations would require many large and risky
changes.

The implementation changes here are a bit odd and surprising at
first. This change leaves the duration.Nanos field, but (excepting the
Decode func) automatically rounds Nanos to nearest micro. This does leave
open the possible misuse of the Nanos field, since durations are created
directly instead of via a constructor. However, I think this problem is
less of a risk than the other attempts listed above.

See #6604 and #8864 for previous PRs and discussion about this problem
when we fixed it for timestamps.

Fixes #32143

Release note (sql change): INTERVAL values are now stored with microsecond
precision instead of nanoseconds. Existing intervals with nanoseconds
are no longer able to return their nanosecond part. An existing table t
with nanoseconds in intervals of column s can round them to the nearest
microsecond with `UPDATE t SET s = s + '0s'`. Note that this could
potentially cause uniqueness problems if the interval is a primary key.

Co-authored-by: Matt Jibson <matt.jibson@gmail.com>