Consider improvements for passing values to queries #463

Closed · 3 tasks done
cvybhu opened this issue Jul 1, 2022 · 8 comments
Labels: API-breaking (this might introduce incompatible API changes)

Comments

@cvybhu (Contributor) commented Jul 1, 2022

There have been some ideas and discussions about improving the way values are passed to queries.
The purpose of this issue is to gather all of them in one place and decide what to do.

Unsigned values

As described in #409, our driver doesn't support passing unsigned integer values like u64 in a query.
This could be solved by implementing the Value trait for them, but we aren't sure if that's the best way to go about it.

When given a u64 value, the driver should probably cast it to i64 and send that to the database, but it isn't exactly clear what should happen if the u64 value doesn't fit in an i64.

  • We could throw a runtime error, but that could be a hidden trap for users who happily passed a u64 thinking nothing could go wrong.
  • Or we could send it as a BigDecimal (as one of the users on Slack suggested), but that's far less intuitive.

Another problem is that this makes our API even less strongly typed - someone might pass a u64 by mistake and assume that the compiler wouldn't allow anything like that.
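To make the tradeoff concrete, here is a minimal sketch of the runtime-error approach. The `Value` trait and `ValueTooBig` below are simplified stand-ins for the driver's types, not the real scylla API, and the wire layout assumed is the CQL convention of a 4-byte length prefix followed by the big-endian value body:

```rust
// Simplified stand-in for the driver's Value trait (illustration only).
#[derive(Debug, PartialEq)]
pub struct ValueTooBig;

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

impl Value for u64 {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        // Fallible downcast: reject values that do not fit in a CQL bigint (i64)
        // instead of silently wrapping around.
        let v = i64::try_from(*self).map_err(|_| ValueTooBig)?;
        buf.extend_from_slice(&8i32.to_be_bytes()); // 4-byte length prefix
        buf.extend_from_slice(&v.to_be_bytes());    // big-endian bigint body
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    assert!(42u64.serialize(&mut buf).is_ok());
    assert_eq!(buf.len(), 12); // 4-byte prefix + 8-byte body
    // Out-of-range values surface as a runtime error:
    assert_eq!(u64::MAX.serialize(&mut Vec::new()), Err(ValueTooBig));
}
```

This is exactly the "hidden trap" variant: the error only shows up at runtime, and only for values above i64::MAX.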

New interface for Value?

It might be a good idea to extend the interface of Value trait.
Currently it looks like this:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

We could add a parameter specifying the target column type:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), ValueTooBig>;
}

This way serialize would know what kind of column it's serializing for.
Given a big u64 value it could decide to output either i64 or BigDecimal depending on the target column type.
Additionally, this would help avoid mistakes with passing incorrectly sized values, e.g. passing an i32 instead of an i64.
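A hedged sketch of how such a type-aware serialize could branch on the target type. `ColumnType`, `SerializeError`, and the encoding details are illustrative stand-ins (length prefixes omitted, only the value body is written), not the driver's actual API:

```rust
// Hypothetical target-type enum; the real driver has a richer one.
pub enum ColumnType {
    BigInt,
    Varint,
}

#[derive(Debug, PartialEq)]
pub enum SerializeError {
    ValueTooBig,
}

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), SerializeError>;
}

impl Value for u64 {
    fn serialize(&self, buf: &mut Vec<u8>, target: &ColumnType) -> Result<(), SerializeError> {
        match target {
            // For a bigint column, fail loudly if the value does not fit in i64.
            ColumnType::BigInt => {
                let v = i64::try_from(*self).map_err(|_| SerializeError::ValueTooBig)?;
                buf.extend_from_slice(&v.to_be_bytes());
            }
            // For a varint column, any u64 fits: emit big-endian two's-complement
            // bytes, with a leading 0x00 when the high bit is set so the number
            // stays non-negative.
            ColumnType::Varint => {
                let bytes = self.to_be_bytes();
                let start = bytes.iter().position(|&b| b != 0).unwrap_or(7);
                if bytes[start] & 0x80 != 0 {
                    buf.push(0x00);
                }
                buf.extend_from_slice(&bytes[start..]);
            }
        }
        Ok(())
    }
}

fn main() {
    let mut buf = Vec::new();
    u64::MAX.serialize(&mut buf, &ColumnType::Varint).unwrap();
    assert_eq!(buf, vec![0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff]);
    assert_eq!(
        u64::MAX.serialize(&mut Vec::new(), &ColumnType::BigInt),
        Err(SerializeError::ValueTooBig)
    );
}
```

The same value thus serializes differently (or fails) depending on the declared column type, which is the point of passing the target type in.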

Should user pass column types along with values?

To make the new interface possible we would have to get the type of columns from somewhere.

One way would be to just force the user to specify types along with the values. That would make the API even more strongly typed, but harder and more verbose to use.

Another way would be to use metadata returned from the database after preparing a statement. PreparedMetadata seems to contain the information we need. For unprepared queries we could just prepare them under the hood, they aren't performance sensitive anyway.

AFAIR the Python driver does some local type checking; maybe they have already figured this out and we could replicate their solution.

Tasks

  1. (sub-task) assignee: sylwiaszunejko
  2. (sub-task, 2 of 2 done) assignees: Lorak-mmk, piodul
  3. (sub-task, area/proc-macros) assignee: piodul
@psarna (Contributor) commented Jul 1, 2022

New interface for Value?

Or, a new trait.

I also definitely agree that we should use prepared statement metadata if available - additional validation is one of the reasons for its existence in the protocol. We should probably make this opt-in or opt-out, so that performance-sensitive workloads can skip the additional runtime checks.

@piodul (Collaborator) commented Aug 24, 2023

I sat and pondered on this task for a little while, and tried to specify it a bit more and split it into subtasks. I see it like this:

Getting rid of the Session::query problem

For unprepared statements, we don't have a way to determine the column names and types of the bind markers. Obtaining this info requires parsing the statement and knowing the current schema - which we could theoretically do in the driver, but it would be a huge effort and I'm not sure it can be done without introducing races. In the case of prepared statements, this information is available because it is computed by the DB and returned in the response to the prepare request.

I see two ways out:

  1. Before sending an unprepared statement, quietly prepare it first to obtain the information about the bind markers (it's only necessary to prepare on one node). This doubles the latency of unprepared queries, but using unprepared queries is a performance antipattern anyway. We could provide an optimization: if the list of values to be bound is empty, we just send it as an unprepared query.
  2. Remove the values argument from Session::query and only allow queries without bind markers. This is simpler to implement than 1. If the user wants to send an unprepared query with some bind markers, they will have to prepare it beforehand - latency-wise it will be the same as 1., though 1. does not have to prepare the statement on every node, which would be a win if somebody wants to execute the query only once.
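The control flow of option 1 could look roughly like this. `FakeSession` and every method name below are made up purely to illustrate the branching; this is not the driver's internals:

```rust
// Toy stand-in for a session, counting prepare round-trips for demonstration.
struct FakeSession {
    prepares: u32,
}

struct Prepared(#[allow(dead_code)] String);

impl FakeSession {
    fn prepare(&mut self, stmt: &str) -> Prepared {
        self.prepares += 1;
        Prepared(stmt.to_string())
    }
    fn execute_prepared(&mut self, _p: &Prepared, _values: &[i64]) {}
    fn execute_unprepared(&mut self, _stmt: &str) {}

    /// Entry point mirroring Session::query under option 1: prepare
    /// under the hood only when there are bind values.
    fn query(&mut self, stmt: &str, values: &[i64]) {
        if values.is_empty() {
            // Optimization from the comment above: no bind markers to
            // type-check, so skip the extra prepare round-trip.
            self.execute_unprepared(stmt);
        } else {
            let prepared = self.prepare(stmt);
            self.execute_prepared(&prepared, values);
        }
    }
}

fn main() {
    let mut s = FakeSession { prepares: 0 };
    s.query("SELECT * FROM t", &[]);
    assert_eq!(s.prepares, 0); // no values: sent unprepared, no extra latency
    s.query("INSERT INTO t (a) VALUES (?)", &[1]);
    assert_eq!(s.prepares, 1); // values present: quiet prepare happens first
}
```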

Add new serialization traits

Let's use the convention introduced in the not-yet-done deserialization refactor and let's name the new traits SerializeCql and SerializeRow:

pub trait SerializeRow {
    fn serialize(&self, ctx: &RowSerializationContext<'_>, out: &mut impl std::io::Write) -> Result<(), std::io::Error>;
}

pub trait SerializeCql {
    fn serialize(&self, typ: &CqlType, out: &mut impl std::io::Write) -> Result<(), std::io::Error>;
}

The RowSerializationContext should contain information about the bind markers: column names and their types. Later, we may add more stuff there.
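A rough, self-contained sketch of how these pieces could fit together. A generic `W: Write` is used instead of `impl Write` for brevity; `CqlType` and both example impls are hypothetical, and the real design may differ:

```rust
use std::io::Write;

// Hypothetical, heavily simplified type enum.
#[derive(PartialEq)]
pub enum CqlType { BigInt, Text }

/// Bind-marker info the row serializer can consult: column names and types.
pub struct RowSerializationContext<'a> {
    pub columns: &'a [(&'a str, CqlType)],
}

pub trait SerializeCql {
    fn serialize<W: Write>(&self, typ: &CqlType, out: &mut W) -> Result<(), std::io::Error>;
}

pub trait SerializeRow {
    fn serialize<W: Write>(&self, ctx: &RowSerializationContext<'_>, out: &mut W) -> Result<(), std::io::Error>;
}

impl SerializeCql for i64 {
    fn serialize<W: Write>(&self, typ: &CqlType, out: &mut W) -> Result<(), std::io::Error> {
        // Type checking happens here, at serialization time.
        if *typ != CqlType::BigInt {
            return Err(std::io::Error::new(std::io::ErrorKind::InvalidInput, "expected bigint"));
        }
        out.write_all(&self.to_be_bytes())
    }
}

// A pair serializes each element against the corresponding bind-marker type
// (assumes the context has at least two columns).
impl<A: SerializeCql, B: SerializeCql> SerializeRow for (A, B) {
    fn serialize<W: Write>(&self, ctx: &RowSerializationContext<'_>, out: &mut W) -> Result<(), std::io::Error> {
        self.0.serialize(&ctx.columns[0].1, out)?;
        self.1.serialize(&ctx.columns[1].1, out)
    }
}

fn main() {
    let ctx = RowSerializationContext { columns: &[("a", CqlType::BigInt), ("b", CqlType::BigInt)] };
    let mut out = Vec::new();
    (1i64, 2i64).serialize(&ctx, &mut out).unwrap();
    assert_eq!(out.len(), 16);
    // A type mismatch is caught before anything is sent to the DB:
    let bad = RowSerializationContext { columns: &[("a", CqlType::Text), ("b", CqlType::BigInt)] };
    assert!((1i64, 2i64).serialize(&bad, &mut Vec::new()).is_err());
}
```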

As this change will affect all Session::query and Session::execute calls, which are likely ubiquitous in user codebases, we should make migrating to the new API as easy as possible, preferably doable step by step.

The current serialization API is heavily based on the SerializedValues type, which is an untyped container for serialized values. Session::{execute,query} receive the argument list as an impl ValueList - a trait that allows converting the type to SerializedValues. In general, we should implement SerializeRow for each type that implements ValueList, which also includes SerializedValues.

  • Calls like session.execute(&prepared, (1, 2, 3)) should work automatically out of the box as we will implement SerializeRow for types that implement SerializedValues, and SerializeCql for types that implement Value. This should be the most common case.
  • In a generic context like session.execute(&prepared, generic_list) where generic_list: impl ValueList, the code should be adjusted so that generic_list.serialized()? is called instead - this converts to SerializedValues, which does implement the new SerializeRow.
  • For user impls of ValueList and Value, we can provide macros (not necessarily procedural) that generate a SerializeRow/SerializeCql implementation based on the existing ValueList/Value.
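One possible shape for such a non-procedural bridge macro, sketched with simplified stand-ins for both the old and new traits (the new method is named serialize_cql here only to avoid a name clash; the real design may differ):

```rust
#[derive(Debug)]
pub struct ValueTooBig;

// Stand-in for the existing, type-unaware trait.
pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

pub enum CqlType { BigInt }

// Stand-in for the proposed type-aware trait.
pub trait SerializeCql {
    fn serialize_cql(&self, typ: &CqlType, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
}

/// Generate a SerializeCql impl that forwards to an existing Value impl,
/// ignoring the type information - so old user impls keep working unchanged.
macro_rules! impl_serialize_cql_via_value {
    ($t:ty) => {
        impl SerializeCql for $t {
            fn serialize_cql(&self, _typ: &CqlType, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
                Value::serialize(self, buf)
            }
        }
    };
}

// A user type with only the legacy impl...
struct MyUserType(i64);

impl Value for MyUserType {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> {
        buf.extend_from_slice(&self.0.to_be_bytes());
        Ok(())
    }
}

// ...gets the new-trait impl for free.
impl_serialize_cql_via_value!(MyUserType);

fn main() {
    let mut buf = Vec::new();
    MyUserType(7).serialize_cql(&CqlType::BigInt, &mut buf).unwrap();
    assert_eq!(buf.len(), 8);
}
```

The forwarding impl obviously cannot perform the new type checks, which is the price of zero-effort migration.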

It will be necessary to update some examples and documentation, but I don't expect it to be much work compared to the deserialization refactor.

Macros for the new traits

The new SerializeRow and SerializeCql traits will need to have their corresponding procedural macros implemented. Those macros should match struct fields to columns/UDT fields by name. This should be done by default, but there should be an attribute that allows disabling it.
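Conceptually, the derived code could match fields to columns by name along these lines. This is a hand-written illustration of the idea, not actual macro output, and all names are hypothetical:

```rust
struct Row {
    id: i64,
    name: String,
}

// What a derive-macro expansion might do for Row: look up each requested
// column by name, so column order doesn't need to match field order, and
// an unknown column is an explicit error rather than a silent mismatch.
fn serialize_row(row: &Row, columns: &[&str], out: &mut Vec<Vec<u8>>) -> Result<(), String> {
    for &col in columns {
        match col {
            "id" => out.push(row.id.to_be_bytes().to_vec()),
            "name" => out.push(row.name.as_bytes().to_vec()),
            other => return Err(format!("no field matching column `{other}`")),
        }
    }
    Ok(())
}

fn main() {
    let row = Row { id: 1, name: "a".into() };
    let mut out = Vec::new();
    // Columns in a different order than the struct fields still match by name.
    serialize_row(&row, &["name", "id"], &mut out).unwrap();
    assert_eq!(out.len(), 2);
    assert!(serialize_row(&row, &["missing"], &mut Vec::new()).is_err());
}
```

An attribute disabling name matching would make the expansion positional instead, pairing the i-th field with the i-th column.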


@cvybhu @havaker @wprzytula @Lorak-mmk thoughts? If there are no objections I will create sub-tasks.

If we go with way 2. in "Getting rid of the Session::query problem" (which I prefer) then we can probably work on it in parallel with the others. "Macros for the new traits" should be done after "Add new serialization traits".

@wprzytula (Collaborator)

We could provide an optimization: if the list of values to be bound is empty, we just send it as an unprepared query.

This is a great idea if we settle on implementing the 1st solution. I am also for the 2nd, though.

for types that implement SerializedValues

You most probably meant ValueList.

My thoughts are mostly positive. Let's do it!

@Lorak-mmk (Collaborator)

One possible problem with the 2nd solution: it may encourage people to perform textual parameter substitution, as they can't use query parameters - leading to security vulnerabilities (NoSQL injection).

@piodul (Collaborator) commented Aug 25, 2023

One possible problem with the 2nd solution: it may encourage people to perform textual parameter substitution, as they can't use query parameters - leading to security vulnerabilities (NoSQL injection).

Hmm, right. Maybe the 1st option is better after all. Another quite convincing argument in its favor that I can see right now is that it will just break less of the existing code.

@piodul (Collaborator) commented Aug 25, 2023

I've created the sub-tasks. If there are some subtask-specific issues to discuss, then let's do it on the subtasks.

@Lorak-mmk (Collaborator)

@piodul Can we close this now, or is there more to implement?

@piodul (Collaborator) commented Jan 3, 2024

To be fair, I think we reduced the scope of the issue a little while working on the 0.11 release. We did everything necessary to ensure type safety, but the issue actually starts by describing a different problem, and improving type safety was a side effect: namely, it suggests making it possible to pass integers of one size into a database column of another size and let the driver handle down/upcasting in a safe way.

We already have #409, so I think we can reuse that issue to discuss whether we should implement support for such casts at all (personally, I'm not a big fan of the idea).

I'll close this issue.

@piodul piodul closed this as completed Jan 3, 2024
5 participants