-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: use micro instead of nano seconds for timestamps #6604
Conversation
|
Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. acceptance/python_test.go, line 38 [r1] (raw file):
|
Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. acceptance/python_test.go, line 38 [r1] (raw file):
|
@dt hints about how to fix the current test failure? It's due to that old-vs-new-binary testing thing. The format changed, so... |
@mjibson maybe change the |
DO NOT MERGE There is a pending design decision about whether or not we want to do this. See the discussion at https://gitter.im/cockroachdb/cockroach?at=57322aef12fa465406ebb182 |
Reviewed 3 of 3 files at r1. sql/pgwire/types.go, line 211 [r1] (raw file):
Out of curiosity, is there a reason to fix the timezone to New York time? This feels wrong to me. Comments from Reviewable |
Review status: 2 of 4 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful. sql/pgwire/types.go, line 211 [r1] (raw file):
That isn't New York (-5) -- it's the reference time from Go.
https://golang.org/pkg/time/#Parse Comments from Reviewable |
|
This is a nice solution (better than the others in the doc). We'll have to put a big "here be dragons" warning on the Should we advise people to strip the nanoseconds from their existing timestamp data? System.lease has existing nanosecond data that is difficult to strip; going forward should we round all lease expirations to microseconds?
|
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. acceptance/python_test.go, line 36 [r5] (raw file):
|
Review status: 8 of 10 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed. sql/testdata/datetime, line 375 [r5] (raw file):
|
Review status: 8 of 10 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. sql/testdata/datetime, line 387 [r5] (raw file):
|
#6597 is caused because we send timestamps over pgwire with nanoseconds, but the python psycopg2 driver expects only microseconds (other drives may as well). The postgres spec confirms microseconds are correct and we should not be sending nanoseconds. This change wraps most uses of parser.DTimestamp[TZ] in a function that rounds nanoseconds to microseconds. If this new code were run on existing data, we would have a problem. Since nanoseconds are rounded, it would be impossible to successfully use equality comparison with any existing timestamp columns. This is because the existing data has nanoseconds, and thus won’t ever equate to a microsecond-only datum. A user cannot `SELECT * from t`, then `SELECT * from t WHERE ts_pk = $1` to fetch a specific timestamp row. Further, if a timestamp column is a primary key, then a user could never select that single row using the PK index. If a user tried to migrate data to this new format, they could have uniqueness violations since two previously unique timestamps (due to nanoseconds) are now equal. To get around this, new functions have been added. Since, by default, all parsed timestamps are rounded, add a new function `parse_timestamp_ns` that does not round. Similarly, `format_timestamp_ns` formats a timestamp as a string with nanosecond precision. The three time sources (statement timestamp, transaction timestamp, timeutil.Now()) have also been rounded. `current_timestamp_ns` has been added to provide a non-rounding timestamp function from the transaction timestamp. Note however that in order to see the nanoseconds of `current_timestamp_ns`, either `extract(nanosecond from x)` or `format_timestamp_ns` must be used. The `system.lease` table, since it does not go over pgwire, retains nanoseconds. Fixes #6597
Review status: 6 of 10 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending. sql/testdata/datetime, line 387 [r5] (raw file):
|
Docs updated with cockroachdb/docs#305 |
34202: sql: remove nanoseconds from INTERVAL r=mjibson a=mjibson Nanoseconds were not representable over pgwire binary mode and were being truncated. We previously encountered this problem with timestamps (#6597) and removed nanoseconds from timestamps at that time. We should have done the same for intervals, since they have the same kind of problem, but did not. It is no longer possible to create intervals with nanosecond precision. Parsing from string or converting from float or decimal will round to the nearest microsecond. Similarly any arithmetic operation (add, sub, mul, div) on intervals will also round to nearest micro. We round instead of truncate because that's what Postgres does. Existing on-disk intervals that contain nanoseconds will retain their underlying value when doing encode/decode operations (so that indexes can be correctly maintained. However there is no longer any way to retrieve the nanosecond part. Converting to string, float, or decimal will first round and then convert. The reasoning for this restriction on existing on-disk nanoseconds is related to the original bug, where we were truncating nanos to micros over binary pgwire. The problem there was that depending on how you queried the data (text or binary mode), you would get a different result, and one of them was wrong. Similarly, it would be wrong to have the results of an interval -> string conversion return a different result than just querying the interval. It is unfortunate that upgrading from 2.1 -> 2.2 will completely remove the ability for users to continue accessing their nanoseconds. Due to that, we must describe in the major release notes this change. Users who require nanoseconds to be present will have to modify their application to use a different data type before upgrading. Further, applications that do comparisons on intervals may have some edge cases errors due to rounding and seeming equality. That is, some intervals with nanos will be rounded up to the next microsecond, possibly changing the results of an existing query. Also, it is not possible to compare equality to any existing interval with on-disk nanos. We believe the number of users affected by this will be very small, and that it is still a necessary change because of the unavoidable pgwire binary mode bug above, which may already have been unknowningly affecting them. Other implementations were worked on, like one where the user could specify the desired precision of each operation (similary to how timestamps work). This ended up being very tedious since there are many operations and they all required the same microsecond precision. Timestamps are different since there are some operations that actually do need nanosecond precision, but intervals have no such need. Thus, it was better to remove the precision argument and hard code rounding. Another attempt was made to replace Nanos with Micros, with an additional nanos field to hold on-disk nanoseconds. This had difficult problems since all of our encoding infra uses nanoseconds on disk. Converting the Micros field to nanos increased the possibilty of overflow due multiplying by 1000. Handling the possibility of this overflow in all possibly locations would require many large and risky changes. The implementation changes here are a bit odd and surprising at first. This change leaves the duration.Nanos field, but (excepting the Decode func) automatically rounds Nanos to nearist micro. This does leave open the possible misuse of the Nanos field, since durations are created directly instead of via a constructor. However, I think this problem is less of a risk as the other attempts listed above. See #6604 and #8864 for previous PRs and discussion about this problem when we fixed it for timestamps. Fixes #32143 Release note (sql change): INTERVAL values are now stored with microsecond precision instead of nanoseconds. Existing intervals with nanoseconds are no longer able to return their nanosecond part. An existing table t with nanoseconds in intervals of column s can round them to the nearest microsecond with `UPDATE t SET s = s + '0s'`. Note that this could potentially cause uniqueness problems if the interval is a primary key. Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Matches Postgres.
Fixes #6597
This change is