proposal: database/sql: add methods to scan an entire row into one value #61637
Comments
Maybe we should be more strict about struct name matching to prevent misspelling errors. Unfortunately that interferes with the ability to use the same struct for different partial views of its table.
Maybe a snake_case match would be better as a default. Also, most projects I have seen already use a 'db' key in struct tags instead of 'sql'. If this could use the same, there would be less work to convert existing projects. Edit: meant snake_case (camelCase columns would already match with the case-insensitive match)
@doggedOwl
Case insensitivity might be problematic. If you have a database collation set to case-sensitive, I believe SQL Server and MySQL will consider column names with two casings to be different columns. This doesn't apply to all databases, or to all usages of the databases it does affect. Also, you can already alias each field in your select statement, or you can use tags as an exact match to be specific. I don't think trying to be clever about case insensitivity would be a good idea for databases that act differently from each other.
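For illustration, the aliasing workaround mentioned above might look something like this (the table, columns, and quoting are hypothetical; quoted identifiers preserve case in PostgreSQL, for example):

```go
package example

import "database/sql"

// queryUsers shows the aliasing workaround: rename columns in the SQL
// itself so they match the Go field names exactly, sidestepping any
// case-mapping rules on the Go side.
func queryUsers(db *sql.DB) (*sql.Rows, error) {
	return db.Query(`SELECT user_id AS "UserID", full_name AS "Name" FROM users`)
}
```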
Another benefit of "sql" as the tag is matching the package it comes from. That seems both concise and intuitive.
Personally, I'd far prefer scanning into a struct to scan the columns into fields in order, or at least having the option to. I don't particularly like the mechanism of using struct tags to control unmarshaling anyway and generally prefer my exported types to be tag-free. This would also allow scanning unnamed columns into structs. Even if the iterator is not part of the proposal, given that it's part of the rationale, I'd also mention that I find the semantics of returning an error as an iteration value a bit confusing. It is not enforced that iteration stops if an error is returned. Arguably, that's a good thing: it might be that a specific row can't be scanned into a value, but for some use cases it might still make sense to iterate over the rest of the query. On the other hand, the ambiguity will potentially have people write incorrect code if they assume one semantic or the other.
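A minimal sketch of what that in-order scanning could look like, as a hypothetical helper (not anything proposed in this thread), assuming the destination is a pointer to a struct whose fields are all exported:

```go
package example

import (
	"database/sql"
	"reflect"
)

// scanInOrder fills struct fields in declaration order, ignoring names
// and tags entirely. It assumes dest is a pointer to a struct with only
// exported fields (Addr().Interface() panics on unexported fields).
func scanInOrder(rows *sql.Rows, dest any) error {
	v := reflect.ValueOf(dest).Elem()
	ptrs := make([]any, v.NumField())
	for i := range ptrs {
		ptrs[i] = v.Field(i).Addr().Interface()
	}
	return rows.Scan(ptrs...)
}
```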
@Merovius Isn't that rather error-prone? If you change the order of fields in the select but forget to change the struct (which might be defined a ways away), it'll break.
Out of interest, how do you unmarshal JSON with this approach? Or do you unmarshal into a non-exported struct with tags, and then copy to an exported struct without tags? In terms of the proposal itself, I really like it -- I've wanted this for many years (and have used sqlx too). Code like
@jba For structs, will embedded fields be supported, like they are in sqlx? What will
@benhoyt I don't know if it's error-prone. I don't necessarily "buy" theoretical arguments about error-proneness. But AIUI,
As for how I unmarshal JSON without tags: I declare an
If we're going to add
@Merovius Unless they are doing fine-grained memory optimization, people think of structs as unordered sets of fields. As does the language: re-ordering fields is never a breaking change. As for iterators, there doesn't seem to be any appetite for error handling over on #61405, so I think having
@benhoyt Embedded fields will be supported. The implementation will use reflect.VisibleFields. Personally, I think ScanRow should fail if there are ambiguous column names, but that's open for discussion.
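For reference, a small demonstration of how reflect.VisibleFields surfaces promoted fields (the Base/User types here are hypothetical):

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical types: ID is promoted from the embedded Base.
type Base struct{ ID int64 }

type User struct {
	Base
	Name string
}

func main() {
	// VisibleFields reports the embedded Base itself as well as its
	// promoted ID field, so a matcher built on it can map a column named
	// "ID" to User.Base.ID, and can detect ambiguous promotions.
	for _, f := range reflect.VisibleFields(reflect.TypeOf(User{})) {
		fmt.Println(f.Name, f.Index)
	}
}
```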
@jba Not to derail this, but: re-ordering fields is (sometimes) a breaking change. It affects convertibility. And FWIW, the language also considers order for unkeyed literals (which the Go 1 compatibility promise makes an exception for) and comparisons.
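To spell out the convertibility point, a tiny example (the types are illustrative):

```go
package main

type V1 struct{ X, Y int }
type V2 struct{ Y, X int } // same field set, different order

func main() {
	// Unkeyed composite literals depend on declaration order: this sets
	// X=1, Y=2 today, and silently means something else if the fields
	// are re-ordered.
	v := V1{1, 2}
	_ = v

	// Convertibility also requires identical field order; uncommenting
	// the next line is a compile error because V2's fields are swapped.
	// _ = V1(V2{})
}
```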
@derekperkins I would implement
@Merovius good point about re-ordering fields. I will have to revisit apidiff...
@jba I can't tell from your response, are you opposed to adding a
@derekperkins I am opposed because I think it's a one-liner given
While I was unaware of
FWIW I don't think that if we have iterators, collecting all rows into a slice will be the primary use. I think it might be today, but mainly because slices are the most convenient way to return ordered collections.
@Merovius I'm aware of your disagreement from our discussion about the removal of Keys and Values from the maps package, but I think it bears repeating, with the acknowledgment that you are "technically" correct: iterators being better in many cases doesn't make them preferred. There are countless places in code, in languages with iterators, where people choose a list-like structure when performance doesn't matter, simply because it's easier for a lot of people to think about. For instance, people need to worry about whether an iterator is repeatable. They need to worry about whether they are iterating in multiple places and should actually have dropped the results into a list/slice already, because now they are being less efficient. And since most computing these days (outside libraries) happens in boring IT departments writing code without a lot of load, I'm guessing performance isn't a big issue.
My thought exactly. For the vast majority of CRUD-style / db-to-API transformations, there's not going to be a db rows iterator => json encoder => http.ResponseWriter chain, even if that might be the most efficient. That spans multiple logical tiers, and makes it harder to integrate with logging / error-reporting toolchains. IMO, the general expectation is a reasonable number of bounded rows that will just get put into a slice, though obviously that's just my projection.
My experience in Python was that iterators were slower than using lists, at least for small lists. I don't know if that experience would carry over to Go. It's worth benchmarking to get a rough sense of it. It might also have different variability, which can be more important than pure performance per se.
Shouldn't this be "are left alone"? If they already have a nonzero value, they should keep it, unless there are explicit values for them.
@willfaught That was deliberate. Leaving the existing state sounds error-prone to me, especially since you may use the same struct for various queries that only select some of the fields. What's a good use case for leaving the existing values alone?
Should this be configurable? Personally I would prefer this to error, because that probably means there was a typo in a field mapping. Your suggestion makes sense in the context of
This might be another place for configuration, as both options have their place. Reasons I can think of to leave them alone:
It's not clear from the json docs, but are all fields zeroed out during unmarshaling?
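For the record, encoding/json merges into the existing value rather than zeroing absent fields, as a quick check shows:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type T struct{ A, B int }

func main() {
	v := T{A: 1, B: 2}
	// B does not appear in the JSON, so Unmarshal leaves it alone.
	if err := json.Unmarshal([]byte(`{"A":10}`), &v); err != nil {
		panic(err)
	}
	fmt.Println(v) // {10 2}: absent fields keep their prior values.
}
```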
@jba are there plans to expose this API somewhere like
@PaluMacil, I changed the proposal to make field matching case-sensitive. If anyone disagrees, please address the arguments in the above comment.
@mvdan, we obviously can't add a ScanRow method, but it should be possible to write it as a function. I'll give that a shot. Stay tuned.
@PaluMacil I would like to understand the case-sensitivity concern better. Can you elaborate on the problematic scenario you envision here? IIUC, you're worried that a database will have two columns with names that differ only in their case, and then this new method won't know how to do the scan? The way the
If that's consistent with serialization for JSON, then I think that seems like a great approach. I would expect this to be an extremely uncommon issue.
I think the only reason for the preference would be if there were two fields or columns that differed only by case. That seems wildly unlikely for DB columns. I would prefer being case-insensitive by default, and requiring an exact match for names specified with a field tag.
I've been thinking about adding options to this API, as @derekperkins suggested above. In Go 1.10, the
So let's anticipate that by changing the signature to:
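The signature itself was shown as code in the comment; a hypothetical sketch of the variadic-options shape being discussed might look like this (ScanRowOption and the option names are illustrative, not the proposal's API):

```go
package rowscan // hypothetical; illustration only

import "database/sql"

// ScanRowOption is an illustrative option type; the concrete option set
// was still under discussion in this thread.
type ScanRowOption int

const (
	// DisallowUnknownColumns would make the scan fail when a result
	// column matches no struct field.
	DisallowUnknownColumns ScanRowOption = iota + 1
	// DisallowUnmatchedFields would make the scan fail when a struct
	// field matches no result column.
	DisallowUnmatchedFields
)

// ScanRow is written as a free function here because methods cannot be
// added to *sql.Rows from outside the standard library.
func ScanRow(rows *sql.Rows, dest any, opts ...ScanRowOption) error {
	_ = dest // reflection-based matching would go here
	_ = opts
	return rows.Err()
}
```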
At this point everyone says "I hate passing
The one option I'm thinking about would be the same as for
We could also add an option for unmatched struct fields. But I'm less sure about that, because we've seen that there are many valid uses for having additional fields, and if the field is unmatched because of a typo,
I just pushed a change to my table package that allows querying into struct slices and converting table buffers into struct slices: https://github.com/golang-sql/table/blob/master/struct.go Still a few TODOs. One thing to note: it may be better to opt into disallowing unknown struct fields globally, and make it a practice, if you want to just ignore a few fields, to use a tag such as
If the options are important, then maybe introducing a RowScanner and attaching options to it would be better than specifying options on each call of ScanRow. I can't think of any scenario where the options would change from one ScanRow to the next while looping through all the rows, so that extra parameter creates too much noise for the value it provides. That said, I'm not convinced of this particular need at all. Especially if the default is to disallow unknown fields, it will force people into more opportunities for typing errors: instead of "select *" they will need to write a long list of exact field matches, which is in fact more likely to contain errors.
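A minimal sketch of that RowScanner idea, with all names hypothetical:

```go
package rowscan // hypothetical; all names are illustrative

import "database/sql"

// RowScanner carries the options so they are configured once and reused
// for every row, instead of being passed on each ScanRow call.
type RowScanner struct {
	DisallowUnknownColumns bool
}

// ScanRow scans the current row into dest using the configured options.
func (s *RowScanner) ScanRow(rows *sql.Rows, dest any) error {
	_ = dest // matching logic elided in this sketch
	return rows.Err()
}
```

Callers would then construct the scanner once before the loop and call s.ScanRow(rows, &u) for each row.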
I'd support setting this at the
I don't think so. Think of the list of columns from the query and the struct as two distinct tables. When joining them, there may be unused columns from the query, and unused fields in the struct. We don't actually care about unused columns from the query: if you query
Requiring struct fields not in the returned column set to be annotated assumes that a struct is fetched from a database in only one way, and that's not how we personally have been doing it with
I feel exactly the opposite, and anything we can do to disincentivize
Disincentivizing "select *", if that is important to you, can be done by your linter. No need to add API complexity for that.
Sorry that my tongue-in-cheek remark didn't come across. This isn't about query patterns or whether you use
There's no obvious "correct" behavior in the absence of an exact match of fields to destinations. Some people will want an error for one or both cases, and some people will want to silently succeed, like the default json configuration. Today, both error in stdlib usage, because you have to manually map fields to destinations, and only exact matches are allowed. The real question is what the right API design is to let users make those choices, and what the default should be. Probably the most expected default is to choose whatever happens in
I like the sound of defaults being consistent with
I would say that what is needed is a relatively simple interface for iterating over column values for each row. In fact, if direct iteration over raw column values were supported, advanced use cases would be trivial to implement in third-party libraries. The pgx library has an additional method on the
I think if users have advanced needs, they should just use Scan. |
In the modern cloud, databases and networks are very fast. But on the Go side, the driver and wrappers take their sweet time doing a dozen allocations simply to forward a "select" result set to the user. :-) For a "server" language, Go's SQL facilities are surprisingly limited.
This I fully agree with. In order to get values out of Scan today, I have to do this:
I do agree that the
Either way, a []any slice can be obtained from the Scan (see link), so this would be an optimization. |
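The code being referenced did not survive here, but the standard pattern for getting a []any out of Scan looks roughly like this (the helper name is illustrative):

```go
package example

import "database/sql"

// scanValues collects the current row into a []any without knowing the
// column types up front. Scan requires pointers, so each slot of vals is
// paired with a pointer in ptrs.
func scanValues(rows *sql.Rows) ([]any, error) {
	cols, err := rows.Columns()
	if err != nil {
		return nil, err
	}
	vals := make([]any, len(cols))
	ptrs := make([]any, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	if err := rows.Scan(ptrs...); err != nil {
		return nil, err
	}
	return vals, nil
}
```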
For what it's worth, here is a cut-down version of
@jba shall we wake up this proposal for 1.23?
Sure. |
Could this proposal be added to the list of active proposals for review? It's well thought out and has lots of upvotes. |
Link to "what would a Go 2 version of database/sql look like?" #22697 |
@jba the iterator-based model you showed as an example looks really similar to what I put together in this package a few months ago: https://github.com/achille-roussel/sqlrange In particular, the
I'm just sharing here because it seemed relevant to the discussion. Thanks for driving this proposal!
Edited: struct field names are matched to columns case-sensitively.
Edited: untouched storage is ignored rather than zeroed.
I propose adding the method `ScanRow` to the `Row` and `Rows` types. `ScanRow` attempts to populate `dest` with all the columns of the current row. `dest` can be a pointer to an array, slice, map or struct.

Motivation

`ScanRow` makes it more convenient to populate a struct from a row. Evidence that this is a desirable feature comes from the github.com/jmoiron/sqlx package, which has 9,800 importers, about 1,000 forks and about 14,000 GitHub stars. (To be fair, `sqlx` provides several other enhancements to `database/sql` as well.) The `ScanRow` method brings `database/sql` closer to parity with `encoding/json`, whose ability to unmarshal into structs and other data types is widely used.

Another motivation comes from the likely addition of iterators to the language. Without something like `ScanRow`, an iterator over a DB query would have to return a `Row` or `Rows`, since the `Scan` method is variadic. An iterator like that still improves on using a `Rows` directly, because it makes error handling more explicit and always calls `Close`. But we could do better with `ScanRow`, because the iterator could deliver a single value holding the entire row. This proposal doesn't include that iterator; I show it merely for motivation.
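A sketch of what such an iterator might look like, assuming Go 1.23 range-over-func iterators and the proposed ScanRow method (the query function's name and shape are illustrative, not part of the proposal):

```go
package example

import (
	"context"
	"database/sql"
	"iter"
)

// query yields one fully scanned value per row, plus any error. It assumes
// the proposed ScanRow method, which does not exist in database/sql today,
// so this sketch will not compile as-is.
func query[T any](ctx context.Context, db *sql.DB, q string, args ...any) iter.Seq2[T, error] {
	return func(yield func(T, error) bool) {
		var zero T
		rows, err := db.QueryContext(ctx, q, args...)
		if err != nil {
			yield(zero, err)
			return
		}
		defer rows.Close()
		for rows.Next() {
			var t T
			err := rows.ScanRow(&t) // proposed method
			if !yield(t, err) {
				return
			}
		}
		if err := rows.Err(); err != nil {
			yield(zero, err)
		}
	}
}
```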
Details

`ScanRow` acts as if each part of its argument were passed to `Scan`.

If the value is an array pointer, successive array elements are scanned from the corresponding columns. Excess columns are dropped. Excess array elements are left untouched.

If the value is a slice pointer, the slice is resized to the number of columns and slice elements are scanned from the corresponding columns.

If the value is a map pointer, the underlying key type must be `string`. For each column, a map entry is created whose key is the column name and whose value is scanned from the column. Unnamed columns are assigned unique keys of the form `_%d` for integers 0, 1, .... Other entries in the map are left untouched.

If the value is a struct pointer, its exported visible fields are matched by name to the column names and the field values are scanned from the corresponding columns. The name matching is done as follows:

Unassigned columns are dropped. Unassigned fields are left untouched.
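Under these rules, typical struct scanning would presumably look like this (the User type, the table, and the quoted column aliases are hypothetical; ScanRow is the proposed method, not part of database/sql today):

```go
package example

import "database/sql"

// User's exported fields are matched to column names under the rules above.
type User struct {
	ID    int64
	Name  string
	Email string
}

func listUsers(db *sql.DB) ([]User, error) {
	rows, err := db.Query(`SELECT id AS "ID", name AS "Name", email AS "Email" FROM users`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var users []User
	for rows.Next() {
		var u User
		if err := rows.ScanRow(&u); err != nil { // scans the whole row into u
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}
```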