Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve common time and date crates support #745

Merged
merged 7 commits into from
Nov 10, 2023

Conversation

Anfid
Copy link
Contributor

@Anfid Anfid commented Jun 15, 2023

Fixes: #741

Goals

Goal of this MR is to improve the support of common time crates.

Changes

Breaking changes were introduced and a major version bump would be required.

  • Reworked scylla::frame::value date and time types:
    • Align structs with internal CQL representation;
    • Rename to add Cql- prefix to explicitly show direct relation to CQL types;
  • Moved chrono dependency of scylla-cql behind scylla-cql/chrono feature;
  • Added optional time dependency with full serialization support;
  • Implemented traits to convert between chrono/time types and CQL types;
  • Removed implementation of FromCqlValue<CqlValue> for chrono::Duration as it may be misleading. CqlTime and CqlTimestamp structs are no longer a chrono::Duration wrappers;
  • CqlValue::as_date function signature changed to return time::Date instead of chrono::NaiveDate for better naming consistency;
  • CqlValue::as_cql_time and CqlValue::as_cql_timestamp functions now ensure that row is of Time or Timestamp type correspondingly. Upstream behavior is to parse all bignum types as time or timestamp;

Points for discussion

  1. If we can guarantee that CQL time type never exceeds 23:59:59.999,999,999 extra guarantees can be given about CqlTime struct and implementations of TryFrom<CqlTime> for chrono::NaiveTime and time::Time can be replaced with From<CqlTime>. However, this would require restricting access to internal value of CqlTime (like implementing constructor and getter). I wasn't sure if we can give this guarantee, so it's not present in current implementation;
  2. Support for std::time::SystemTime was intentionally not implemented as it represents local time and not UTC time. This may and probably will lead to incorrect values in the database tables, as CQL timestamp is UTC;
  3. scylla::history module requires chrono dependency for scylla crate. I wasn't sure about how to best handle it nor its intended use case, so I didn't touch it. Potentially it could be nice to change history::TimePoint type based on what features were used.

Other concerns

I wasn't able to run test/dockerized/run.sh script without changes. On my machine no containers were found with grep cluster_scylla command on line 5. It worked after replacing it with grep cluster-scylla locally. I would need more info on behavior on machines other than mine to push this fix.

This MR will produce a lot of conflicts with #665, I'll tag here it so it won't go unnoticed.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Pre-merge checklist

  • Set up CI for feature flags combinations.

I would appreciate some help with setting up CI for feature flag combinations as added functionality is NOT tested with github CI.

@avelanarius avelanarius requested a review from wprzytula August 2, 2023 10:50
@wprzytula wprzytula added the API-breaking This might introduce incompatible API changes label Aug 3, 2023
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! This is a very valuable PR.
Only small nits here, mainly typos and old naming leftovers. Wonderful to have the time dependencies separated with a feature, and with proper test coverage.
Will you be able to address the comments?

Thanks a lot for contributing.

docs/source/data-types/date.md Outdated Show resolved Hide resolved
docs/source/data-types/timestamp.md Outdated Show resolved Hide resolved
scylla-cql/src/frame/value.rs Outdated Show resolved Hide resolved
Comment on lines 50 to 62
/// Wrapper used to differentiate between Time and Timestamp as sending values
/// Milliseconds since unix epoch
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Timestamp(pub Duration);
pub struct CqlTimestamp(pub i64);

/// Wrapper used to differentiate between Time and Timestamp as sending values
/// Nanoseconds since midnight
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the docstrings, too. We no longer have Time nor Timestamp types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since original docstrings don't make a lot of sense with this implementation, I decided to rewrite the entire docstring, rather than just update the type name

scylla-cql/src/frame/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/frame/response/cql_to_rust.rs Outdated Show resolved Hide resolved
Comment on lines 218 to 209
#[cfg(feature = "chrono")]
impl FromCqlVal<CqlValue> for DateTime<Utc> {
fn from_cql(cql_val: CqlValue) -> Result<Self, FromCqlValError> {
let timestamp = cql_val.as_bigint().ok_or(FromCqlValError::BadCqlType)?;
match Utc.timestamp_millis_opt(timestamp) {
chrono::LocalResult::Single(datetime) => Ok(datetime),
_ => Err(FromCqlValError::BadVal),
}
CqlTimestamp(timestamp)
.try_into()
.map_err(|_| FromCqlValError::BadVal)
}
}

#[cfg(feature = "time")]
impl FromCqlVal<CqlValue> for time::OffsetDateTime {
fn from_cql(cql_val: CqlValue) -> Result<Self, FromCqlValError> {
let timestamp = cql_val.as_bigint().ok_or(FromCqlValError::BadCqlType)?;
CqlTimestamp(timestamp)
.try_into()
.map_err(|_| FromCqlValError::BadVal)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR description, you wrote:

CqlValue::as_cql_time and CqlValue::as_cql_timestamp functions now ensure that row is of Time or Timestamp type correspondingly. Upstream behavior is to parse all bignum types as time or timestamp;

Great that you caught that! Could you ensure this here, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, not sure why I was hesitant to type check them too in the original commit

scylla/src/transport/cql_types_test.rs Outdated Show resolved Hide resolved
scylla/src/transport/cql_types_test.rs Outdated Show resolved Hide resolved
scylla/src/transport/cql_types_test.rs Outdated Show resolved Hide resolved
@wprzytula
Copy link
Collaborator

I would appreciate some help with setting up CI for feature flag combinations as added functionality is NOT tested with github CI.

How can we help you? Would you like us to provide a code snippet to put in CI config files that sets up tests for feature flag combinations?

@Anfid
Copy link
Contributor Author

Anfid commented Aug 3, 2023

Would you like us to provide a code snippet to put in CI config files that sets up tests for feature flag combinations?

Yes, that would help a lot. I believe you also have the permission to push commits directly to this MR, so just pushing an extra commit is also an option, if that's easier for you.

Alternatively, you could just say which feature combinations you want tested, but it may take me extra time to implement it in GitHub CI, as I'm not super familiar with it. My main concern was that testing feature flag combinations extensively may get quite costly, as each flag combination would require a recompilation and might mess up with cache. I wasn't able to decide on where would be a good balance between test exhaustiveness and CI execution time for this project, so I just left a comment asking for some help with it instead.

@wprzytula
Copy link
Collaborator

Yes, that would help a lot. I believe you also have the permission to push commits directly to this MR, so just pushing an extra commit is also an option, if that's easier for you.

Let's do so. Once you make the corrections I requested, I will add a commit with CI-related changes.

@Anfid Anfid force-pushed the time-serialization branch from 8504102 to 8808191 Compare August 23, 2023 00:22
@Anfid
Copy link
Contributor Author

Anfid commented Aug 23, 2023

Finally got some time to work on corrections, everything should be resolved now. After resolving conflicts with main, some unexpected issues came up in scylla/src/utils/pretty.rs. Math in formatting has to be reworked, so I'll work on that shortly.

Meanwhile, could you please also check if issue with test/dockerized/run.sh script I described is something local to my machine (macOS) or is it an issue that should be fixed for everyone? I can also push an update for this script, since I have already updated it locally to run the tests. (never mind, script was already fixed upstream)

@Anfid Anfid force-pushed the time-serialization branch from 8808191 to 00ef6d9 Compare August 23, 2023 12:47
@Anfid
Copy link
Contributor Author

Anfid commented Aug 23, 2023

Conflicts with the base branch were resolved, please also see additional changes to scylla/src/utils/pretty.rs. I decided to keep using chrono for rendering time types.

It should also be possible to remove scylla's dependency on chrono, but I believe it to be out of scope of this PR, because it would require a lot of other changes unrelated to time crates support.

Copy link
Collaborator

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay, I'll take over the review now and will give it more concentrated attention.

Thanks, I like the direction of the PR very much. I like the idea of CqlTime/CqlTimestamp/CqlDate which are independent from external crates. I appreciate comprehensive tests, implementations of conversion traits and the updated documentation (folks usually forget about the last one).

It's a lot of changes introduced at once, so I hope I didn't miss anything, but the implementation looks OK. I left a small number of asks, the biggest ones being:

  1. Use the new CqlXYZ types in the CqlValue. This will require mechanical changes in multiple places, but I believe this should simplify some of the code so that manually wrapping/unwrapping from/to these types is not needed.
  2. Make the new CqlXYZ types more hermetic and provide a constructor + getter for them. This would be nice to have but I think can be done in a followup.

Points for discussion

  1. If we can guarantee that CQL time type never exceeds 23:59:59.999,999,999 extra guarantees can be given about CqlTime struct and implementations of TryFrom<CqlTime> for chrono::NaiveTime and time::Time can be replaced with From<CqlTime>. However, this would require restricting access to internal value of CqlTime (like implementing constructor and getter). I wasn't sure if we can give this guarantee, so it's not present in current implementation;

I like the idea to restrict the access to the internal field of CqlTime/CqlTimestamp/CqlDate. I'm not sure what are your concerns about not being able to provide the guarantees, but to me it looks like we just need to do as you wrote: make the inner field private, introduce a constructor and a getter. Of course, the rest of the module that the types are defined in will have access to the internal field but you can reduce the risk of manipulating on the type directly by moving it to a dedicated module and then re-exporting it back in value.rs.

That said, I don't consider it a blocker for merging this PR, we can leave it for a followup if you want.

  1. Support for std::time::SystemTime was intentionally not implemented as it represents local time and not UTC time. This may and probably will lead to incorrect values in the database tables, as CQL timestamp is UTC;

Sure, makes perfect sense.

  1. scylla::history module requires chrono dependency for scylla crate. I wasn't sure about how to best handle it nor its intended use case, so I didn't touch it. Potentially it could be nice to change history::TimePoint type based on what features were used.

@cvybhu do you remember scylla::history uses chrono? AFAIR the history module was supposed to allow to track what the driver was doing during the query; I can see some time points being created from SystemTime::now() - would it be possible to switch to using SystemTime for that? Additionally, perhaps it makes sense to use Instants instead of SystemTime as the latter is not monotonic (https://doc.rust-lang.org/std/time/struct.SystemTime.html).


As for the CI checks, as the time/chrono features seem pretty orthogonal, I think it's sufficient to test whether the crate compiles with and without all serialization-related features, and only run tests with all serialization features enabled.

I pushed some commits to your branch that modify the existing Github workflows. The CI will most likely fail because of the new version of clippy that started to complain about some existing code (please rebase), but it should be a good starting point. They should work after rebase, but I can't tell for sure because I didn't test it - I'll correct them if there are any issues after rebase.

docs/source/data-types/data-types.md Outdated Show resolved Hide resolved
scylla-cql/src/frame/response/result.rs Outdated Show resolved Hide resolved
scylla-cql/src/frame/response/cql_to_rust.rs Outdated Show resolved Hide resolved
scylla-cql/src/frame/value.rs Show resolved Hide resolved
scylla/src/utils/pretty.rs Show resolved Hide resolved
Anfid and others added 6 commits November 4, 2023 17:02
When running scylla-cql tests separately, serde 'derive' feature is not
set explicitly. This results in failing builds when the feature is enabled.
Builds were not failing earlier due to 'scylla-cql' being used as a dependency
of a crate that sets 'serde/derive' feature.
Adds support of 'time' and 'chrono' types for CQL queries.

Changelist:
- Rework scylla-cql 'Time', 'Date' and 'Timestamp' types to align with
  CQL data representation;
- Add 'Cql'- prefix to builtin date and time types;
- Implement conversion traits between chrono, time and Cql- types;
- Implement CQL 'Value' trait for 'chrono' and 'time' types;
- Implement 'FromCqlVal' trait for 'chrono' and 'time' types.
This feature enables all other features related to serialization. It
will be useful in CI to enable and test all serialization-related
features at once.
The definitions of GitHub workflow files for Scylla and Cassandra tests
are adjusted so that:

- We check that the code compiles with and without serialization
  features,
- All serialization features are enabled when Scylla/Cassandra tests are
  running.
@Anfid Anfid force-pushed the time-serialization branch from e7b7de2 to 9b23a79 Compare November 5, 2023 11:29
@Anfid
Copy link
Contributor Author

Anfid commented Nov 5, 2023

Sorry for the huge delay on my side too :)

I've rebased the changes. There were a couple of conflicts I had to resolve, you might want to check those parts of code. In particular, there were conflicts with .github workflows, I've updated your implementation to run against --all-targets, as it is currently in upstream.

There was an issue with 2 of my unit tests where I used #[cfg(features = "chrono")] instead of #[cfg(feature = "chrono")]. Luckily updated Rust compiler was able to catch it and I fixed the tests accordingly.

All CQL time types in the CqlValue type were changed to wrapper types, as you've suggested. I've left their internal fields public for now, I'll try to update it in the follow-up PR.

I like the idea to restrict the access to the internal field of CqlTime/CqlTimestamp/CqlDate. I'm not sure what are your concerns about not being able to provide the guarantees

My concerns are the leap second representation within the internal database CQL type. I'm not sure if it's supported, so I'd prefer to see any explicit documentation on that before giving any guarantees to the users of the library. Regarding restricting access to the internal fields of wrapper types. I really like that idea, but I'd prefer to do it in a separate follow-up PR to avoid further delays.

@piodul
Copy link
Collaborator

piodul commented Nov 9, 2023

All CQL time types in the CqlValue type were changed to wrapper types, as you've suggested. I've left their internal fields public for now, I'll try to update it in the follow-up PR.

👍

I like the idea to restrict the access to the internal field of CqlTime/CqlTimestamp/CqlDate. I'm not sure what are your concerns about not being able to provide the guarantees

My concerns are the leap second representation within the internal database CQL type. I'm not sure if it's supported, so I'd prefer to see any explicit documentation on that before giving any guarantees to the users of the library. Regarding restricting access to the internal fields of wrapper types. I really like that idea, but I'd prefer to do it in a separate follow-up PR to avoid further delays.

I see. This is from the protocol spec:

6.16 time

  An 8 byte two's complement long representing nanoseconds since midnight.
  Valid values are in the range 0 to 86399999999999

It looks like leap seconds are not supported, so - provided that the internal field of CqlTime is made private - maybe it would make sense to change the conversion from CqlTime to chrono::NaiveTime to be infallible, but not the other way around. It's fine to move this to the followup which makes the internal fields private.

I'll re-review the PR now.

@piodul
Copy link
Collaborator

piodul commented Nov 9, 2023

It looks like leap seconds are not supported, so - provided that the internal field of CqlTime is made private - maybe it would make sense to change the conversion from CqlTime to chrono::NaiveTime to be infallible, but not the other way around. It's fine to move this to the followup which makes the internal fields private.

Sigh... Specification is one thing, but implementation is another. It looks like Scylla happily accepts an out-of-range time value.

Although we already have an 'if' on a deserialization path which returns an error when out-of-range values are returned from the database, I think it should be removed to take account for incorrectly inserted values into the database. Likewise, I take back what I said about the conversion from CqlTime to NaiveTime - it should be fallible.

Copy link
Collaborator

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The only thing that needs fixing is the lockfile in the MSRV check. As you modified the Cargo.toml files, you need to regenerate the Cargo.toml.msrv.

Our current MSRV is 1.66. Last time the lockfile was regenerated, all the newest versions of the dependencies worked with it except for the toml_datetime indirect dependency, but time may pull in some other problematic dependencies.

AFAIK the process is:

  • Generate the Cargo.lock locally
  • Pin versions for deps that won't compile in 1.66 to the last version which works, for toml_update it would be cargo update -p toml_datetime --precise 0.6.3
  • Rename to Cargo.lock.msrv

@Lorak-mmk please correct me if something is wrong in the above procedure. By the way - we should either document it or make it more convenient to generate Cargo.toml.msrv (a script maybe?).

@Anfid
Copy link
Contributor Author

Anfid commented Nov 10, 2023

Done. It seems that a lot of dependencies can be updated without breaking MSRV, but I decided to avoid being intrusive with changes unrelated to the purpose of this PR and pinned all versions to whatever was in the previous lockfile

@Anfid
Copy link
Contributor Author

Anfid commented Nov 10, 2023

Oh, my bad, I'll fix it shortly

@Anfid Anfid force-pushed the time-serialization branch from 1a54a84 to 1ad8d44 Compare November 10, 2023 07:03
@piodul piodul merged commit 6ed30ab into scylladb:main Nov 10, 2023
9 checks passed
piodul pushed a commit that referenced this pull request Nov 10, 2023
Adds support of 'time' and 'chrono' types for CQL queries.

Changelist:
- Rework scylla-cql 'Time', 'Date' and 'Timestamp' types to align with
  CQL data representation;
- Add 'Cql'- prefix to builtin date and time types;
- Implement conversion traits between chrono, time and Cql- types;
- Implement CQL 'Value' trait for 'chrono' and 'time' types;
- Implement 'FromCqlVal' trait for 'chrono' and 'time' types.
- Encapsulate CQL types in a wrapper

Closes #745
@piodul
Copy link
Collaborator

piodul commented Nov 10, 2023

(Forgot to squash before merging previously, I dropped the merge commit and pushed a squashed version)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-breaking This might introduce incompatible API changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Time crate serialization support
3 participants