Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support jiff as a date time lib backend #3487

Open
tisonkun opened this issue Sep 5, 2024 · 23 comments
Open

Support jiff as a date time lib backend #3487

tisonkun opened this issue Sep 5, 2024 · 23 comments
Labels
enhancement New feature or request

Comments

@tisonkun
Copy link
Contributor

tisonkun commented Sep 5, 2024

As in https://docs.rs/sqlx/0.8.2/sqlx/postgres/types/index.html for chrono and time, being integrated with jiff.

Since the abstractions are similar, it should be viable.

cc @BurntSushi

I may take a closer look later but I can't commit it :P

@tisonkun tisonkun added the enhancement New feature or request label Sep 5, 2024
@abonander
Copy link
Collaborator

I have no strong opinions on jiff yet but I think I'd like to give it some time to mature. It's still in its first release cycle.

See also this comment: BurntSushi/jiff#50 (comment)

@tisonkun

This comment was marked as outdated.

@tisonkun

This comment was marked as outdated.

@tisonkun
Copy link
Contributor Author

For reference, here is a wrapper that can be used to bridge jiff's Timestamp to Postgres' TIMESTAMPTZ:

use std::str::FromStr;

use jiff::SignedDuration;
use serde::Deserialize;
use serde::Serialize;
use sqlx::encode::IsNull;
use sqlx::error::BoxDynError;
use sqlx::postgres::types::Oid;
use sqlx::postgres::PgArgumentBuffer;
use sqlx::postgres::PgHasArrayType;
use sqlx::postgres::PgTypeInfo;
use sqlx::postgres::PgValueFormat;
use sqlx::Database;
use sqlx::Decode;
use sqlx::Encode;
use sqlx::Postgres;
use sqlx::Type;

/// A module for Jiff support of SQLx.

// TODO(tisonkun): either switch to the upstream [1] or spawn a dedicate open-source crate.
// [1] https://github.com/launchbadge/sqlx/pull/3511
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct Timestamp(pub jiff::Timestamp);

impl Type<Postgres> for Timestamp {
    fn type_info() -> PgTypeInfo {
        // 1184 => PgType::Timestamptz
        PgTypeInfo::with_oid(Oid(1184))
    }
}

impl PgHasArrayType for Timestamp {
    fn array_type_info() -> PgTypeInfo {
        // 1185 => PgType::TimestamptzArray
        PgTypeInfo::with_oid(Oid(1185))
    }
}

impl Encode<'_, Postgres> for Timestamp {
    fn encode_by_ref(&self, buf: &mut PgArgumentBuffer) -> Result<IsNull, BoxDynError> {
        // TIMESTAMP is encoded as the microseconds since the epoch
        let micros = self
            .0
            .duration_since(postgres_epoch_timestamp())
            .as_micros();
        let micros = i64::try_from(micros)
            .map_err(|_| format!("Timestamp {} out of range for Postgres: {micros}", self.0))?;
        Encode::<Postgres>::encode(micros, buf)
    }

    fn size_hint(&self) -> usize {
        size_of::<i64>()
    }
}

impl<'r> Decode<'r, Postgres> for Timestamp {
    fn decode(value: <Postgres as Database>::ValueRef<'r>) -> Result<Self, BoxDynError> {
        Ok(match value.format() {
            PgValueFormat::Binary => {
                // TIMESTAMP is encoded as the microseconds since the epoch
                let us = Decode::<Postgres>::decode(value)?;
                let ts = postgres_epoch_timestamp().checked_add(SignedDuration::from_micros(us))?;
                Timestamp(ts)
            }
            PgValueFormat::Text => {
                let s = value.as_str()?;
                let ts = jiff::Timestamp::from_str(s)?;
                Timestamp(ts)
            }
        })
    }
}

fn postgres_epoch_timestamp() -> jiff::Timestamp {
    jiff::Timestamp::from_str("2000-01-01T00:00:00Z")
        .expect("2000-01-01T00:00:00Z is a valid timestamp")
}

For full integration if we are back to here and continue, sfackler/rust-postgres#1164 is a good reference for data type mapping and edge cases (error handling).

@bionicles
Copy link

bionicles commented Feb 4, 2025

bump, it would be truly awesome if jiff zoned datetimes / timestamps could "just work" with postgres using sqlx ... i'm trying the sqlx chrono feature and somehow my 2025 timestamptz teleported all the way back to 1970!

@BurntSushi
Copy link

Yeah I'm working on it now. Note that it will require going through wrapper types, but that's the best we can do I think.

@abonander
Copy link
Collaborator

I'm tossing around the idea of adding a couple new traits, roughly like so:

pub trait DecodeAs<DB: Database, T> {
    type Error: Error + Send + Sync + 'static;

    fn decode_as(row: DB::Row) -> Result<T, Self::Error>;
}

pub trait EncodeAs<DB: Database, T> {
    type Error: Error + Send + Sync + 'static;

    fn encode_as<'a>(value: &T, args: DB::Arguments<'a>) -> Result<T, Self::Error>
}

impl<DB: Database, T> DecodeAs<DB, T> for T { /* ... */ }

impl<DB: Database, T> EncodeAs<DB, T> for T { /* ... */ }

// jiff-sqlx

pub struct ZonedAsTimestampTz;

impl DecodeAs<Postgres, Zoned> for ZonedAsTimestampTz { /* ... */ }

impl EncodeAs<Postgres, Zoned> for ZonedAsTimestampTz { /* ... */ }

For the macros, the user would specify ZonedAsTimestampTz as the override but they would emit code invoking the EncodeAs/DecodeAs impls.

With #3383, the user could just add global overrides (copy-pastable from jiff-sqlx's README) and this would be almost entirely transparent.

For non-macro usage, the FromRow derive could gain a control attribute, e.g. #[sqlx(as = "ZonedAsTimestampTz")], and the query builders would gain a bind_as() method (although that might be confused with named bind parameters).


However, that may not add much over what's currently supported.

query_as! already emits .into() calls when decoding, so as long as the wrapper types implement Into and the user utilizes global overrides from #3383, this would mostly work without any changes to the trait system.

EncodeAs plus global overrides would potentially improve typechecking of bind parameters, however that only applies to Postgres anyway.

And we can add a FromRow attribute regardless.

@BurntSushi
Copy link

Yeah the wrapper types will have bi-directional From impls. I don't think I have enough context to say much about DecodeAs or EncodeAs though.

Also, one thing to clarify here is that I'm not sure there will be a Zoned impl for timestamptz. The latter is really just a timestamp. No time zone. For Zoned to work, I think it would need to be for a regular text field encoded as a human readable RFC 9557 timestamp.

@abonander
Copy link
Collaborator

timestamptz is a little more complicated than that, because while it may be stored and transmitted in the binary protocol as UTC, in the text encoding it may include a UTC offset, and users occasionally want to preserve that.

@BurntSushi
Copy link

@abonander Hmmm. That seems quite fraught! How does the choice between binary and plain text get made? It seems like it would be very easy to shoot yourself in the foot if you just used jiff::Zoned for that.

I'll probably have to add something like this:

struct TimestampWithOffset {
    pub timestamp: jiff::Timestamp,
    pub offset: jiff::tz::Offset,
}

to the jiff-sqlx crate to support that use case in a way that doesn't involve huge footguns.

@BurntSushi
Copy link

The sqlx chrono integration here might be buggy. See:

PgValueFormat::Text => {
let s = value.as_str()?;
DateTime::parse_from_str(
s,
if s.contains('+') || s.contains('-') {
// Contains a time-zone specifier
// This is given for timestamptz for some reason
// Postgres already guarantees this to always be UTC
"%Y-%m-%d %H:%M:%S%.f%#z"
} else {
"%Y-%m-%d %H:%M:%S%.f"
},
)?
}

But chrono::DateTime::parse_from_str requires a "time zone" (just an offset in this case). Namely, all assertions pass here:

fn main() -> anyhow::Result<()> {
    let dt = parse("2025-02-20 17:00:00-05:00")?;
    assert_eq!(dt.to_string(), "2025-02-20 17:00:00 -05:00");

    // If you try to pass in something without an offset,
    // then parsing will fail.
    assert!(parse("2025-02-20 17:00:00").is_err());

    Ok(())
}

fn parse(s: &str) -> anyhow::Result<chrono::DateTime<chrono::FixedOffset>> {
    Ok(chrono::DateTime::parse_from_str(
        s,
        if s.contains('+') || s.contains('-') {
            // Contains a time-zone specifier
            // This is given for timestamptz for some reason
            // Postgres already guarantees this to always be UTC
            "%Y-%m-%d %H:%M:%S%.f%#z"
        } else {
            "%Y-%m-%d %H:%M:%S%.f"
        },
    )?)
}

I'm not sure how exactly to test this with PostgreSQL. (I've had a helluva time finding authoritative documentation on the actual format of types transmitted over the wire.)

@abonander
Copy link
Collaborator

Why is the use of Zoned an issue? Can it not convert to UTC when encoding? Yes, it loses the timezone information, but it likely would anyways.

The big problem is that timestamp encoding is very different between databases.

Postgres

I was mainly talking about Postgres before, which has both a binary protocol and a text protocol for every type.

The text protocol is mainly used when executing an non-prepared statement, i.e. the Simple Query flow. Values in rows returned by a simple query are always text-encoded, so we need to support parsing timestamps in a text format, but because simple queries don't support bind parameters, we don't need to worry about encoding them.

Text-encoded timestamps are returned with a timezone offset specified as a connection-level setting. SQLx sets this to UTC by default for consistency, but many users set this to a different timezone so date/time handling in SQL is in local time instead (certain functions rely on this, such as extract()), and then text-encoded timestamps are returned with that offset explicitly.

When preparing a statement for execution (Extended Query flow), you can specify for bind parameters and result columns whether you prefer text or binary encoding. For simplicity and efficiency, SQLx always specifies binary encoding.

Binary-encoded timestamps are always transmitted, stored, and returned in UTC, but they don't use the Unix epoch. They're specified as the number of microseconds (signed) from January 1, 2000, which you have to actually read the source code to find out (or learn this from reading someone else's implementation, like Diesel's): https://github.com/postgres/postgres/blob/306dc520b9dfd6014613961962a89940a431a069/src/include/datatype/timestamp.h#L235

For timestamp [without time zone], the encoding is identical, except Postgres ignores any time zone offset and stores and returns the timestamp as-is. Text-encoded timestamps are returned without an offset. For this, users should use an un-zoned type like chrono::NaiveDateTime.

MySQL

MySQL timestamps always use a binary protocol, but are encoded as date and time: https://dev.mysql.com/doc/dev/mysql-server/8.4.3/page_protocol_binary_resultset.html

The timezone in this case is like Postgres, a connection-level setting. It's not included in the encoding at all, so we default to UTC and assume this everywhere. Users also like to override this setting, so this is very much a "proceed at your own risk" situation. We just trust them to choose a date/time type that doesn't assume UTC.

SQLite

SQLite has no special date/time encoding, but it can handle ISO-8601. Timestamps are encoded and returned as text, with or without a timezone offset and are not modified in-flight (Z is inferred if no offset is specified): https://www.sqlite.org/lang_datefunc.html#time_values

@BurntSushi
Copy link

Thank you for all this information! I really appreciate it. This will increase the chances I get this right the first time. :-)

Why is the use of Zoned an issue? Can it not convert to UTC when encoding? Yes, it loses the timezone information, but it likely would anyways.

Oh it absolutely could. It's just a major footgun. Let's say you have 2024-03-09T17:30:00-05:00[America/New_York] as a Zoned value. Then you think, "oh cool, jiff-sqlx has support for this! I'll just drop it in there." But when you get it back, it's very likely going to be 2025-03-09T22:30:00+00:00[+00:00]. Oops, the time zone info is dropped. In the best case, you get an offset back, so it would be 2025-03-09T22:30:00-05:00[-05:00]. Better, but then you go to add 1 day to it, and bam, you get a bug:

use jiff::{ToSpan, Zoned};

fn main() -> anyhow::Result<()> {
    let zdt: Zoned = "2024-03-09T17:30:00-05:00[-05:00]".parse()?;
    let next_day = zdt.checked_add(1.day())?;
    assert_eq!(next_day.to_string(), "2024-03-10T17:30:00-05:00[-05:00]");

    let actual_next_day: Zoned =
        "2024-03-10T17:30:00[America/New_York]".parse()?;
    assert_eq!(
        actual_next_day.to_string(),
        "2024-03-10T17:30:00-04:00[America/New_York]",
    );
    Ok(())
}

Notice how the offset is wrong. So you're an hour off because your arithmetic didn't take DST into account. And that happened because your Zoned value got silently and lossily transmitted into PostgreSQL. Here's what it should look like:

use jiff::{ToSpan, Zoned};

fn main() -> anyhow::Result<()> {
    let zdt: Zoned = "2024-03-09T17:30:00-05:00[America/New_York]".parse()?;
    assert_eq!(
        zdt.checked_add(1.day())?.to_string(),
        "2024-03-10T17:30:00-04:00[America/New_York]",
    );
    Ok(())
}

But if you stored a Zoned as an RFC 9557 timestamp (the strings above with the brackets), then the time zone is retained and no data gets lost. The downside is that these aren't PostgreSQL values. They're just TEXT fields I guess. So it's sub-optimal in that respect, but I guess if you want to use PostgreSQL's native timestamps, then you should just use jiff::Timestamp. (Or the jiff::TimestampWithOffset type I mentioned above.)

(I haven't read the rest of your comment. Have to go make dinner.)

@abonander
Copy link
Collaborator

So you're an hour off because your arithmetic didn't take DST into account. And that happened because your Zoned value got silently and lossily transmitted into PostgreSQL.

Well, to be fair the actual bug seems to be that assuming adding 1 day is exactly equivalent to adding 24 hours in all circumstances. Or conversely, assuming that a timestamp with a UTC-5 offset should be equivalent to the same time in New York on March 10th. I'm not really sure how that's Postgres's fault, to be honest...

But yeah, this is the reason why I personally prefer to just handle everything as UTC. IMHO, time zones are the frontend's problem (converting pure data to something meaningful to the user). All this messy human stuff I just don't want to have to think about.

My gut feeling is that users probably won't want to deviate from the native types, because sometimes you have to do time-aware stuff in SQL and having to convert to/from a specific text encoding is going to be annoying.

@BurntSushi
Copy link

BurntSushi commented Feb 5, 2025

RE PostgreSQL format:

Thank you, that is incredibly useful information. I think I can use that to test the text encoding paths. I figured out most of the format details by reading source code (including in sqlx). I was mostly just venting about there not being official upstream docs for it. It definitely gives a feeling of "this isn't guaranteed and things can break between versions," even if, as I assume is the case, that isn't true in practice.

Well, to be fair the actual bug seems to be that assuming adding 1 day is exactly equivalent to adding 24 hours in all circumstances. Or conversely, assuming that a timestamp with a UTC-5 offset should be equivalent to the same time in New York on March 10th.

I think we're missing each other here. The problem is not any sort of assumption around how fixed offset datetimes behave, but rather, the implicit and silent dropping of data. A Zoned is a timestamp and a time zone. So when you go to serialize it and the time zone gets lost, that could be potentially surprising behavior! The assumption that fails here is not anything about datetime arithmetic, because you can absolutely rely on Jiff to do proper DST-aware arithmetic on Zoned values with the appropriate time zone. The assumption that fails here is that the time zone gets lost when you store your Zoned in your database.

I'm not really sure how that's Postgres's fault, to be honest...

Let's try to reframe this discussion. I'm not trying to blame anyone here. I mean, I have a pretty negative opinion about how PostgreSQL (and I guess more generally, SQL itself) handles datetimes. Particularly the names. But the point in question here is whether a Zoned should use PostgreSQL's TIMESTAMP WITH TIME ZONE type. I think it shouldn't be because it leads to footguns, as explained above. So if I made Zoned use TIMESTAMP WITH TIME ZONE, then that would be my fault.

But yeah, this is the reason why I personally prefer to just handle everything as UTC. IMHO, time zones are the frontend's problem (converting pure data to something meaningful to the user). All this messy human stuff I just don't want to have to think about.

Right! If you can use UTC, and you can for many things, then you absolutely should. And PostgreSQL's types are generally okay for that use case. And if you can use UTC, then you just use jiff::Timestamp. That's it. But if you're using Zoned, then you're specifically opting into "messy human stuff," then dropping the time zone silently is bad juju.

My gut feeling is that users probably won't want to deviate from the native types, because sometimes you have to do time-aware stuff in SQL and having to convert to/from a specific text encoding is going to be annoying.

Yup! I agree. Here's my order of preference:

1: Provide integration with jiff::Zoned that uses a TEXT type with RFC 9557 as the format.
2: Don't provide integration with jiff::Zoned.
9999999: Provide integration with jiff::Zoned in a way that silently drops the time zone.

I think it's reasonable to pick 1 or 2, but I don't think the last option is reasonable. It might be reasonable in the context of a specific application that is knowingly dropping data, but it's wildly inappropriate to provide as a general purpose wrapper as a paved path for jiff::Zoned integration with PostgreSQL.

A major point in the design of jiff, particularly over chrono, is that Jiff losslessly transmits zoned datetimes.

I do prefer to take the conservative approach here, so perhaps I'll start with option (2) and wait for real user feedback. Once I have an initial version of jiff-sqlx written, I'll share it here. Feedback will be very welcome. :-)

@tisonkun
Copy link
Contributor Author

tisonkun commented Feb 6, 2025

The sqlx chrono integration here might be buggy. See:

Yeah. I report it at #703 (comment) but it seems missed.

@tisonkun
Copy link
Contributor Author

tisonkun commented Feb 6, 2025

2: Don't provide integration with jiff::Zoned.

I'd support this option and this is how we're using it so far: store the timestamp and externally manage the timezone. As you noticed above, at least the Postgres data model doesn't match Zoned's design well.

@BurntSushi
Copy link

2: Don't provide integration with jiff::Zoned.

I'd support this option and this is how we're using it so far: store the timestamp and externally manage the timezone. As you noticed above, at least the Postgres data model doesn't match Zoned's design well.

Yeah. That really only works for times in the past though.

@abonander
Copy link
Collaborator

The sqlx chrono integration here might be buggy. See:

Yeah. I report it at #703 (comment) but it seems missed.

A PR is always welcome.

@BurntSushi
Copy link

@abonander How does one implement Type<Sqlite> for a datetime type outside of SQLx?

I see, for example, the chrono impl for NaiveDate:

impl Type<Sqlite> for NaiveDate {
fn type_info() -> SqliteTypeInfo {
SqliteTypeInfo(DataType::Date)
}
fn compatible(ty: &SqliteTypeInfo) -> bool {
matches!(ty.0, DataType::Date | DataType::Text)
}
}

But it's using internal APIs to build a SqliteTypeInfo. And I can't see any public APIs to build a SqliteTypeInfo other than using an existing impl to create it. For example: https://github.com/hasali19/camino/blob/056ae71c1173424e68985e9ada4009ff4e666d0f/src/sqlx_impls.rs#L19

But there aren't any existing impls for the date types, other than for the chrono and time features.

@BurntSushi
Copy link

And like, doing this sort of parsing based on the type outside of SQLx also seems tricky?

fn decode_datetime(value: SqliteValueRef<'_>) -> Result<DateTime<FixedOffset>, BoxDynError> {
let dt = match value.type_info().0 {
DataType::Text => decode_datetime_from_text(value.text()?),
DataType::Int4 | DataType::Integer => decode_datetime_from_int(value.int64()),
DataType::Float => decode_datetime_from_float(value.double()),
_ => None,
};
if let Some(dt) = dt {
Ok(dt)
} else {
Err(format!("invalid datetime: {}", value.text()?).into())
}
}

I think I could use TypeInfo::name, but it seems unfortunate to need to do string matching when the crate-internal impls can use a simple enum.

@BurntSushi
Copy link

And I guess I have a similar problem with MySQL:

impl Type<MySql> for NaiveDate {
fn type_info() -> MySqlTypeInfo {
MySqlTypeInfo::binary(ColumnType::Date)
}
}

It kinda looks like PostgreSQL is somewhat of the odd one out, as it but not SQLite or MySQL has constructors for building type info: https://docs.rs/sqlx/latest/sqlx/postgres/struct.PgTypeInfo.html

BurntSushi added a commit to BurntSushi/jiff that referenced this issue Feb 7, 2025
This PR adds a new `jiff-sqlx` crate. It defines wrapper types for
`Timestamp`, `DateTime`, `Date`, `Time` and `Span`. For each wrapper
type, the SQLx encoding traits are implemented. (Except, with `Span`,
only the decoding trait is implemented.)

This is similar to #141, but organizes things a bit differently. This
also comes with SQLite support. MySQL support is missing since it seems,
at present, to require exposing APIs in SQLx for a correct
implementation.

This initial implementation also omits `Zoned` entirely. I've left a
comment in the source code explaining why. The quick summary is that, at
least for PostgreSQL, I don't see a way to provide support for it
without either silently losing data (the time zone) or just storing it
as an RFC 9557 timestamp in a `TEXT` field. The downside of the latter
is that it doesn't use PostgreSQL native datetime types. (Becuase we
can't. Because PostgreSQL doesn't support storing anything other than
civil time and timestamps with respect to its datetime types.) I do
personally lean toward just using RFC 9557 as a `TEXT` type, but I'd
like to collect real use cases first to make sure that's the right way
to go.

Ref #50, Closes #141
Ref launchbadge/sqlx#3487
@BurntSushi
Copy link

I have a PR up here to add jiff-sqlx to the Jiff repo: BurntSushi/jiff#240

I'd love for feedback if anyone feels up to it. But otherwise I'm happy to iterate on it after a release too. I'm sure I won't get it right the first time.

BurntSushi added a commit to BurntSushi/jiff that referenced this issue Feb 7, 2025
This PR adds a new `jiff-sqlx` crate. It defines wrapper types for
`Timestamp`, `DateTime`, `Date`, `Time` and `Span`. For each wrapper
type, the SQLx encoding traits are implemented. (Except, with `Span`,
only the decoding trait is implemented.)

This is similar to #141, but organizes things a bit differently. This
also comes with SQLite support. MySQL support is missing since it seems,
at present, to require exposing APIs in SQLx for a correct
implementation.

This initial implementation also omits `Zoned` entirely. I've left a
comment in the source code explaining why. The quick summary is that, at
least for PostgreSQL, I don't see a way to provide support for it
without either silently losing data (the time zone) or just storing it
as an RFC 9557 timestamp in a `TEXT` field. The downside of the latter
is that it doesn't use PostgreSQL native datetime types. (Becuase we
can't. Because PostgreSQL doesn't support storing anything other than
civil time and timestamps with respect to its datetime types.) I do
personally lean toward just using RFC 9557 as a `TEXT` type, but I'd
like to collect real use cases first to make sure that's the right way
to go.

Ref #50, Closes #141
Ref launchbadge/sqlx#3487
BurntSushi added a commit to BurntSushi/jiff that referenced this issue Feb 7, 2025
This PR adds a new `jiff-sqlx` crate. It defines wrapper types for
`Timestamp`, `DateTime`, `Date`, `Time` and `Span`. For each wrapper
type, the SQLx encoding traits are implemented. (Except, with `Span`,
only the decoding trait is implemented.)

This is similar to #141, but organizes things a bit differently. This
also comes with SQLite support. MySQL support is missing since it seems,
at present, to require exposing APIs in SQLx for a correct
implementation.

This initial implementation also omits `Zoned` entirely. I've left a
comment in the source code explaining why. The quick summary is that, at
least for PostgreSQL, I don't see a way to provide support for it
without either silently losing data (the time zone) or just storing it
as an RFC 9557 timestamp in a `TEXT` field. The downside of the latter
is that it doesn't use PostgreSQL native datetime types. (Becuase we
can't. Because PostgreSQL doesn't support storing anything other than
civil time and timestamps with respect to its datetime types.) I do
personally lean toward just using RFC 9557 as a `TEXT` type, but I'd
like to collect real use cases first to make sure that's the right way
to go.

Ref #50, Closes #141
Ref launchbadge/sqlx#3487
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants