Consider improvements for passing values to queries #463

Closed · 3 tasks done
cvybhu opened this issue Jul 1, 2022 · 8 comments
Labels: API-breaking (this might introduce incompatible API changes)

Comments

@cvybhu (Contributor) commented Jul 1, 2022

There have been some ideas and discussions about improving the way values are passed to queries.
The purpose of this issue is to gather all of them in one place and decide what to do.

Unsigned values

As described in #409, our driver doesn't support passing unsigned integer values like u64 in a query.
This could be solved by implementing the Value trait for them, but we aren't sure if that's the best way to go about it.

When given a u64 value, the driver should probably cast it to i64 and send that to the database, but it isn't exactly clear what should happen if the u64 value doesn't fit in an i64.

  • We could throw a runtime error, but that could be a hidden trap for users who happily passed a u64 thinking nothing could go wrong.
  • Or we could send it as a BigDecimal (as one of the users on Slack suggested), but that's far less intuitive.

Another problem is that this makes our API even less strongly typed - someone might pass a u64 by mistake and assume that the compiler wouldn't allow anything like that.
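To make the tradeoff concrete, here is a minimal sketch of the runtime-error approach. The `Value` trait and `ValueTooBig` below are simplified stand-ins for the driver's types, not the real scylla API, and the wire layout assumed is the CQL convention of a 4-byte length prefix followed by the big-endian value body:

```rust
// Simplified stand-in for the driver's Value trait (illustration only).
#[derive(Debug, PartialEq)]
pub struct ValueTooBig;

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

impl Value for u64 {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        // Fallible downcast: reject values that do not fit in a CQL bigint (i64)
        // instead of silently wrapping around.
        let v = i64::try_from(*self).map_err(|_| ValueTooBig)?;
        buf.extend_from_slice(&8i32.to_be_bytes()); // 4-byte length prefix
        buf.extend_from_slice(&v.to_be_bytes());    // big-endian bigint body
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    assert!(42u64.serialize(&mut buf).is_ok());
    assert_eq!(buf.len(), 12); // 4-byte prefix + 8-byte body
    // Out-of-range values surface as a runtime error:
    assert_eq!(u64::MAX.serialize(&mut Vec::new()), Err(ValueTooBig));
}
```

This is exactly the "hidden trap" variant: the error only shows up at runtime, and only for values above i64::MAX.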

New interface for Value?

It might be a good idea to extend the interface of Value trait.
Currently it looks like this:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

We could add a parameter specifying the target column type:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), ValueTooBig>;
}

This way serialize would know what kind of column it's serializing for.
Given a big u64 value it could decide to output either i64 or BigDecimal depending on the target column type.
Additionally, this would help avoid mistakes with passing incorrectly sized values, e.g. passing an i32 instead of an i64.
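A hedged sketch of how such a type-aware serialize could branch on the target type. `ColumnType`, `SerializeError`, and the encoding details are illustrative stand-ins (length prefixes omitted, only the value body is written), not the driver's actual API:

```rust
// Hypothetical target-type enum; the real driver has a richer one.
pub enum ColumnType {
    BigInt,
    Varint,
}

#[derive(Debug, PartialEq)]
pub enum SerializeError {
    ValueTooBig,
}

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), SerializeError>;
}

impl Value for u64 {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), SerializeError> {
        match target {
            // For a bigint column, fail loudly if the value does not fit in i64.
            ColumnType::BigInt => {
                let v = i64::try_from(*self).map_err(|_| SerializeError::ValueTooBig)?;
                buf.extend_from_slice(&v.to_be_bytes());
            }
            // For a varint column, any u64 fits: emit big-endian two's-complement
            // bytes, with a leading 0x00 when the high bit is set so the number
            // stays non-negative.
            ColumnType::Varint => {
                let bytes = self.to_be_bytes();
                let start = bytes.iter().position(|&b| b != 0).unwrap_or(7);
                if bytes[start] & 0x80 != 0 {
                    buf.push(0x00);
                }
                buf.extend_from_slice(&bytes[start..]);
            }
        }
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    u64::MAX.serialize(&mut buf, &ColumnType::Varint).unwrap();
    assert_eq!(buf, vec![0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]);
    assert_eq!(
        u64::MAX.serialize(&mut Vec::new(), &ColumnType::BigInt),
        Err(SerializeError::ValueTooBig)
    );
}
```

The same value thus serializes differently (or fails) depending on the declared column type, which is the point of passing the target type in.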

Should user pass column types along with values?

To make the new interface possible we would have to get the type of columns from somewhere.

One way would be to just force the user to specify types along with the values. That would make the API even more strongly typed, but harder and more verbose to use.

Another way would be to use metadata returned from the database after preparing a statement. PreparedMetadata seems to contain the information we need. For unprepared queries we could just prepare them under the hood, they aren't performance sensitive anyway.

AFAIR the Python driver does some local type checking; maybe they have already figured this out and we could replicate their solution.

Tasks

  1. (sub-task) assignee: sylwiaszunejko
  2. (sub-task, 2 of 2 done) assignees: Lorak-mmk, piodul
  3. (sub-task, area/proc-macros) assignee: piodul
@psarna (Contributor) commented Jul 1, 2022

New interface for Value?

Or, a new trait.

I also definitely agree that we should use prepared statement metadata if available - additional validation is one of the reasons for its existence in the protocol. We should probably make this opt-in or opt-out, so that performance-sensitive workloads can skip the additional runtime checks.

@piodul (Collaborator) commented Aug 24, 2023

I sat and pondered on this task for a little while, and tried to specify it a bit more and split it into subtasks. I see it like this:

Getting rid of the Session::query problem

For unprepared statements, we don't have a way to determine the column names and types of the bind markers. Obtaining this info requires parsing the statement and knowing the current schema - which we could theoretically do in the driver, but it would be a huge effort and I'm not sure it can be done without introducing races. In the case of prepared statements, this information is available because it is computed by the DB and returned in the response to the prepare request.

I see two ways out:

  1. Before sending an unprepared statement, quietly prepare it first to obtain the information about the bind markers (it's only necessary to prepare on one node). This doubles the latency of unprepared queries, but using unprepared queries is a performance antipattern anyway. We could provide an optimization: if the list of values to be bound is empty, we just send it as an unprepared query.
  2. Remove the values argument from Session::query and only allow queries without bind markers. This is simpler to implement than 1. If the user wants to send an unprepared query with some bind markers, they will have to prepare it beforehand - latency-wise it will be the same as 1., though 1. does not have to prepare the statement on every node, which would be a win if somebody wants to execute the query only once.
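The control flow of option 1 could look roughly like this. `FakeSession` and every method name below are made up purely to illustrate the branching; this is not the driver's internals:

```rust
// Toy stand-in for a session, counting prepare round-trips for demonstration.
struct FakeSession {
    prepares: u32,
}

struct Prepared(#[allow(dead_code)] String);

impl FakeSession {
    fn prepare(&mut self, stmt: &str) -> Prepared {
        self.prepares += 1;
        Prepared(stmt.to_string())
    }
    fn execute_prepared(&mut self, _p: &Prepared, _values: &[i64]) {}
    fn execute_unprepared(&mut self, _stmt: &str) {}

    /// Entry point mirroring Session::query under option 1: prepare
    /// under the hood only when there are bind values.
    fn query(&mut self, stmt: &str, values: &[i64]) {
        if values.is_empty() {
            // Optimization from the comment above: no bind markers to
            // type-check, so skip the extra prepare round-trip.
            self.execute_unprepared(stmt);
        } else {
            let prepared = self.prepare(stmt);
            self.execute_prepared(&prepared, values);
        }
    }
}

fn main() {
    let mut s = FakeSession { prepares: 0 };
    s.query("SELECT * FROM t", &[]);
    assert_eq!(s.prepares, 0); // no values: sent unprepared, no extra latency
    s.query("INSERT INTO t (a) VALUES (?)", &[1]);
    assert_eq!(s.prepares, 1); // values present: quiet prepare happens first
}
```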

Add new serialization traits

Let's use the convention introduced in the not-yet-done deserialization refactor and let's name the new traits SerializeCql and SerializeRow:

pub trait SerializeRow {
    fn serialize(&self, ctx: &RowSerializationContext<'_>, out: &mut impl std::io::Write) -> Result<(), std::io::Error>;
}

pub trait SerializeCql {
    fn serialize(&self, typ: &CqlType, out: &mut impl std::io::Write) -> Result<(), std::io::Error>;
}

The RowSerializationContext should contain information about the bind markers: column names and their types. Later, we may add more stuff there.
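A rough, self-contained sketch of how these pieces could fit together. A generic `W: Write` is used instead of `impl Write` for brevity; `CqlType` and both example impls are hypothetical, and the real design may differ:

```rust
use std::io::Write;

// Hypothetical, heavily simplified type enum.
#[derive(PartialEq)]
pub enum CqlType { BigInt, Text }

/// Bind-marker info the row serializer can consult: column names and types.
pub struct RowSerializationContext<'a> {
    pub columns: &'a [(&'a str, CqlType)],
}

pub trait SerializeCql {
    fn serialize<W: Write>(&self, typ: &CqlType, out: &mut W) -> Result<(), std::io::Error>;
}

pub trait SerializeRow {
    fn serialize<W: Write>(&self, ctx: &RowSerializationContext<'_>, out: &mut W) -> Result<(), std::io::Error>;
}

impl SerializeCql for i64 {
    fn serialize<W: Write>(&self, typ: &CqlType, out: &mut W) -> Result<(), std::io::Error> {
        // Type checking happens here, at serialization time.
        if *typ != CqlType::BigInt {
            return Err(std::io::Error::new(std::io::ErrorKind::InvalidInput, "expected bigint"));
        }
        out.write_all(&self.to_be_bytes())
    }
}

// A pair serializes each element against the corresponding bind-marker type
// (assumes the context has at least two columns).
impl<A: SerializeCql, B: SerializeCql> SerializeRow for (A, B) {
    fn serialize<W: Write>(&self, ctx: &RowSerializationContext<'_>, out: &mut W) -> Result<(), std::io::Error> {
        self.0.serialize(&ctx.columns[0].1, out)?;
        self.1.serialize(&ctx.columns[1].1, out)
    }
}

fn main() {
    let ctx = RowSerializationContext { columns: &[("a", CqlType::BigInt), ("b", CqlType::BigInt)] };
    let mut out = Vec::new();
    (1i64, 2i64).serialize(&ctx, &mut out).unwrap();
    assert_eq!(out.len(), 16);
    // A type mismatch is caught before anything is sent to the DB:
    let bad = RowSerializationContext { columns: &[("a", CqlType::Text), ("b", CqlType::BigInt)] };
    assert!((1i64, 2i64).serialize(&bad, &mut Vec::new()).is_err());
}
```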

As this change will affect all Session::query and Session::execute calls, which are likely ubiquitous in user codebases, we should make migrating to the new API as easy as possible, preferably doable step by step.

The current serialization API is heavily based on the SerializedValues type, which is an untyped container for serialized values. Session::{execute,query} receive the argument list as an impl ValueList - a trait that allows converting the type to SerializedValues. In general, we should implement SerializeRow for each type that implements ValueList, which also includes SerializedValues.

  • Calls like session.execute(&prepared, (1, 2, 3)) should work automatically out of the box as we will implement SerializeRow for types that implement SerializedValues, and SerializeCql for types that implement Value. This should be the most common case.
  • In a generic context like session.execute(&prepared, generic_list) where generic_list: impl ValueList, the code should be adjusted so that generic_list.serialized()? is called instead - this converts to SerializedValues, which does implement the new SerializeRow.
  • For user impls of ValueList and Value, we can provide macros (not necessarily procedural) that generate a SerializeRow/SerializeCql implementation based on the existing ValueList/Value.
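One possible shape for such a non-procedural bridge macro, sketched with simplified stand-ins for both the old and new traits (the new method is named serialize_cql here only to avoid a name clash; the real design may differ):

```rust
#[derive(Debug)]
pub struct ValueTooBig;

// Stand-in for the existing, type-unaware trait.
pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

pub enum CqlType { BigInt }

// Stand-in for the proposed type-aware trait.
pub trait SerializeCql {
    fn serialize_cql(&self, typ: &CqlType, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

/// Generate a SerializeCql impl that forwards to an existing Value impl,
/// ignoring the type information - so old user impls keep working unchanged.
macro_rules! impl_serialize_cql_via_value {
    ($t:ty) => {
        impl SerializeCql for $t {
            fn serialize_cql(&self, _typ: &CqlType, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
                Value::serialize(self, buf)
            }
        }
    };
}

// A user type with only the legacy impl...
struct MyUserType(i64);

impl Value for MyUserType {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        buf.extend_from_slice(&self.0.to_be_bytes());
        Ok(())
    }
}

// ...gets the new-trait impl for free.
impl_serialize_cql_via_value!(MyUserType);

fn main() {
    let mut buf = Vec::new();
    MyUserType(7).serialize_cql(&CqlType::BigInt, &mut buf).unwrap();
    assert_eq!(buf.len(), 8);
}
```

The forwarding impl obviously cannot perform the new type checks, which is the price of zero-effort migration.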

It will be necessary to update some examples and documentation, but I don't expect it to be much work compared to the deserialization refactor.

Macros for the new traits

The new SerializeRow and SerializeCql traits will need to have their corresponding procedural macros implemented. Those macros should match struct fields to columns/UDT fields by name. This should be done by default, but there should be an attribute that allows disabling it.
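Conceptually, the derived code could match fields to columns by name along these lines. This is a hand-written illustration of the idea, not actual macro output, and all names are hypothetical:

```rust
struct Row {
    id: i64,
    name: String,
}

// What a derive-macro expansion might do for Row: look up each requested
// column by name, so column order doesn't need to match field order, and
// an unknown column is an explicit error rather than a silent mismatch.
fn serialize_row(row: &Row, columns: &[&str], out: &mut Vec<Vec<u8>>) -> Result<(), String> {
    for &col in columns {
        match col {
            "id" => out.push(row.id.to_be_bytes().to_vec()),
            "name" => out.push(row.name.as_bytes().to_vec()),
            other => return Err(format!("no field matching column `{other}`")),
        }
    }
    Ok(())
}

fn main() {
    let row = Row { id: 1, name: "a".into() };
    let mut out = Vec::new();
    // Columns in a different order than the struct fields still match by name.
    serialize_row(&row, &["name", "id"], &mut out).unwrap();
    assert_eq!(out.len(), 2);
    assert!(serialize_row(&row, &["missing"], &mut Vec::new()).is_err());
}
```

An attribute disabling name matching would make the expansion positional instead, pairing the i-th field with the i-th column.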


@cvybhu @havaker @wprzytula @Lorak-mmk thoughts? If there are no objections I will create sub-tasks.

If we go with way 2. in "Getting rid of the Session::query problem" (which I prefer) then we can probably work on it in parallel with the others. "Macros for the new traits" should be done after "Add new serialization traits".

@wprzytula (Collaborator)

We could provide an optimization: if the list of values to be bound is empty, we just send it as an unprepared query.

This is a great idea if we settle on implementing the 1st solution. I am also for the 2nd, though.

for types that implement SerializedValues

You most probably meant ValueList.

My thoughts are mostly positive. Let's do it!

@Lorak-mmk (Collaborator)

One possible problem with the 2nd solution: it may encourage people to perform textual parameter substitution, as they can't use query parameters - leading to security vulnerabilities (NoSQL injection).

@piodul (Collaborator) commented Aug 25, 2023

One possible problem with the 2nd solution: it may encourage people to perform textual parameter substitution, as they can't use query parameters - leading to security vulnerabilities (NoSQL injection).

Hmm, right. Maybe the 1st option is better after all. Another quite convincing argument in its favor that I can see right now is that it will just break less of the existing code.

@piodul (Collaborator) commented Aug 25, 2023

I've created the sub-tasks. If there are some subtask-specific issues to discuss, then let's do it on the subtasks.

@Lorak-mmk (Collaborator)

@piodul Can we close this now, or is there more to implement?

@piodul (Collaborator) commented Jan 3, 2024

To be fair, I think we reduced the scope of the issue a little while working on the 0.11 release. We did everything necessary to ensure type safety, but the issue actually starts by describing a different problem, and improving type safety was a side effect: namely, it suggests making it possible to pass integers of one size into a database column of another size and let the driver handle down/upcasting in a safe way.

We already have #409, so I think we can reuse that issue to discuss whether we should implement support for such casts at all (personally, I'm not a big fan of the idea).

I'll close this issue.

@piodul piodul closed this as completed Jan 3, 2024
5 participants