as_array() on varchar-based array_agg() column - and semicolon separator #590
Comments
I think that basically covers it. The array parser has no idea of the type of the value it's operating on and can't access the type info to get the separator, so I think the separator needs to be an argument to the parser so that it can be configured. If it defaulted to comma, existing code would still work in almost all cases, and only the odd cases where a semicolon was needed would have to change. That would also allow custom types that use other separators to use it by configuring the separator. |
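To make the proposal concrete, here is a rough sketch of a splitter that takes the separator as an argument and defaults to a comma. `split_sql_array` is a made-up name and is not part of libpqxx. It only handles one-dimensional arrays, treats NULL as ordinary text, and assumes an encoding such as UTF-8 in which the bytes for `"`, `\` and the separator never occur inside a multi-byte character.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, NOT a libpqxx function: split the text form of a
// one-dimensional SQL array into its elements.  The separator is an
// argument and defaults to a comma, as suggested above.
std::vector<std::string>
split_sql_array(std::string_view text, char separator = ',')
{
  std::vector<std::string> elements;
  // Strip the enclosing braces, if present.
  if (text.size() >= 2 && text.front() == '{' && text.back() == '}')
    text = text.substr(1, text.size() - 2);
  if (text.empty())
    return elements;

  std::string current;
  bool in_quotes = false;
  for (std::size_t i = 0; i < text.size(); ++i)
  {
    char const c = text[i];
    if (in_quotes)
    {
      if (c == '\\' && i + 1 < text.size())
        current += text[++i]; // Backslash escapes the next character.
      else if (c == '"')
        in_quotes = false;    // Closing quote.
      else
        current += c;         // Separators inside quotes are plain text.
    }
    else if (c == '"')
    {
      in_quotes = true;       // Opening quote.
    }
    else if (c == separator)
    {
      elements.push_back(current);
      current.clear();
    }
    else
    {
      current += c;           // Includes ';' when the separator is ','.
    }
  }
  elements.push_back(current);
  return elements;
}
```

With the default comma, the payload from the report splits into 9 elements and `left|through;right` stays in one piece. Passing `';'` splits a `box[]` value such as `{(1,1),(0,0);(3,3),(2,2)}` into two boxes.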
You're right, that's always been a problem waiting to happen... I think at the time I thought that the backend would quote such strings. I also just discovered another thing I thought about array quoting was wrong: I thought there was such a thing as single-quoted array elements. See #587. So we do need to pass a separator, which can default to a comma. But it gets more difficult when an array combines different types, with different separators. I think the only way that can happen though is in a multi-dimensional array where the ultimate elements use semicolon as the separator. Do you agree? If so, we may want a separate provision for multi-dimensional arrays. TBH array handling in libpqxx is still in its infancy. For the existing |
There has been no activity on this ticket. Consider closing it. |
I don't think this should be closed, as it's still a bug that prevents using text[] arrays where there is no constraint excluding semicolons.
Checking pg_type, the only type that uses something other than , as a delimiter is the box type. Can you give an example of another situation where a ; would be the delimiter?
Of course, other types could be added with different delimiters. Notably, PostGIS has types with : as a separator. It's always possible for a user to create custom types with other delimiters, but this is unlikely.
CREATE TYPE silly;
create function sillyin(cstring) returns silly language internal IMMUTABLE PARALLEL SAFE STRICT as $function$textin$function$;
create function sillyout(silly) returns cstring language internal IMMUTABLE PARALLEL SAFE STRICT AS $function$textout$function$;
CREATE TYPE silly (INPUT = sillyin, OUTPUT = sillyout, LIKE = text, DELIMITER = 'a');
select ARRAY['b','c']::silly[];
gives {bac} as the text representation for an array with two elements. |
Absolutely, this ticket should stay open until fixed. GitHub will take the
ongoing conversation as a hint.
As I mentioned, I'm thinking of making the separator character a template
argument, which means that you'll need to know this character at compile
time. Does anyone have a problem with that? (I'll have to move some code
around, but an important part of the encodings puzzle for this just came
together.)
Also, would I be correct in assuming that an array can have at most two
different separator characters: the element type's separator, and in the
multidimensional case, the comma between sub-arrays?
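As a rough illustration of what "separator as a template argument" could look like, here is a minimal sketch. `split_elements` is a hypothetical name, not the libpqxx interface. It expects the text between the braces of a one-dimensional array and, to stay short, ignores quoting and nesting.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch, NOT the libpqxx API: the separator is a non-type
// template parameter, so it must be known at compile time.
// `body` is the array text without its enclosing braces.
template<char SEPARATOR = ','>
std::vector<std::string_view> split_elements(std::string_view body)
{
  std::vector<std::string_view> out;
  std::size_t start = 0;
  while (true)
  {
    auto const end = body.find(SEPARATOR, start);
    if (end == std::string_view::npos)
    {
      // Last element: everything after the final separator.
      out.push_back(body.substr(start));
      break;
    }
    out.push_back(body.substr(start, end - start));
    start = end + 1;
  }
  return out;
}
```

Usage would look like `split_elements<';'>("(1,1),(0,0);(3,3),(2,2)")` for a box array, or `split_elements<'a'>("bac")` for the `silly` type from earlier in the thread. The trade-off is that the separator can no longer be looked up in `pg_type.typdelim` at run time.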
…On Fri, Sep 30, 2022, 07:34 Paul Norman ***@***.***> wrote:
I don't think this should be closed, as it's still a bug that prevents
using text[] arrays where there is no constraint excluding semicolons.
I think the only way that can happen though is in a multi-dimensional
array where the ultimate elements use semicolon as the separator.
Checking pg_type, the only type that uses something other than , as a
delimiter is the box type. Can you give an example of another situation
where a ; would be the delimiter?
postgres=# SELECT typname, typdelim FROM pg_type WHERE typdelim <> ',';
typname | typdelim
---------+----------
box | ;
_box | ;
Of course, other types could be added with different delimiters. Notably,
PostGIS has types with : as a separator.
It's always possible for a user to create custom types with other
delimiters, but this is unlikely.
CREATE TYPE silly;
create function sillyin(cstring) returns silly language internal IMMUTABLE PARALLEL SAFE STRICT as $function$textin$function$;
create function sillyout(silly) returns cstring language internal IMMUTABLE PARALLEL SAFE STRICT AS $function$textout$function$;
CREATE TYPE silly (INPUT = sillyin, OUTPUT = sillyout, LIKE = text, DELIMITER = 'a');
select ARRAY['b','c']::silly[];
gives {bac} as the text representation for an array with two elements.
|
The delimiter character is always the one defined before the type, including multidimensional arrays. e.g. Making it a template argument prevents one from querying for the separator character from the DB, but this is probably fine. |
Github bot, what are you doing!? I'm still working on this one! |
Maybe try removing the no-issue-activity label? |
Ah, I hadn't noticed the label, thanks. The button to remove it showed up below the conversation (where there's a second "Labels" section. Not great UI work.) |
I've come to the conclusion that I can't fix this properly in So I'm changing |
The `array_parser` was seriously broken (#590): SQL arrays may contain elements that have a semicolon in them... _and the back-end won't put them in quotes._ The parser would always see that as a field separator. Same thing for commas in e.g. the SQL "box" type, which uses the semicolon as its separator but uses commas inside an object. So I'm limiting `array_parser` for use with comma-separated types only, and planning a better, friendlier, faster, more flexible API for parsing arrays. At the same time, I did manage to specialise `array_parser` internally to different encodings, which should make it considerably faster. These changes will also benefit the future array parsing API.
There has been no activity on this ticket. Consider closing it. |
Please remove the no activity label so it doesn't autoclose |
Thanks for the reminder. For the record, there has been activity on the ticket but it's been moving slowly - #609 is a new, very different array parser that converts an SQL array into something like a C++ container. |
Can this be re-opened, as it's not yet fixed? |
Yes, and I'll merge the fix right away. Which I think will automatically close it again. :-) |
The fix, by the way, is a new I still need to sort out iteration though. It's got indexing, but iteration gets a little complicated for multi-dimensional arrays. |
New array parser: `pqxx::array`. Parses an SQL array (in string form) and keeps the values internally as an array of converted values. This should address #590. Still need to sort out iteration though.
Further to the work on #590. Implementing simple array iteration. And when I say "simple," I really mean the simplest I could do: storage-order iteration, going through all dimensions. The alternative would be to define something like a multidimensional subrange, and iterate over those. In the two-dimensional case, that would be like iterating a `result` to get `row` elements, and then iterating each `row` to get the `field` elements. But would it really be all that useful? It's a lot of code overhead and cognitive load, both on the libpqxx side and on the application side. I don't want all that complexity again — streaming queries have gotten us away from that for query results and it's been wonderful. Plus, in practice, all you may want is to iterate over a one-dimensional array. Or you may just want to visit all elements and not care at all about the order. It may become easier once we have `std::mdspan`. Depending on the conventions that form, we may have a problem on our hands with `size()`: is that going to be the size including all elements, or is it just the size along the outer of the dimensions? The latter seems almost arbitrary. We'll have to cross that bridge when we come to it.
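To illustrate what storage-order iteration means here, the following is a small sketch of a multi-dimensional array kept as one flat vector. `flat_array` is a hypothetical class written for this illustration and is not the `pqxx::array` interface. It assumes row-major storage (last dimension varies fastest), and its `size()` returns the total element count, which is just one of the two options discussed above.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration only; NOT the pqxx::array interface.
// Elements live in one flat vector in row-major order (an assumption).
template<typename T, std::size_t DIMENSIONS> class flat_array
{
public:
  flat_array(
    std::array<std::size_t, DIMENSIONS> extents, std::vector<T> values) :
          m_extents{extents}, m_values{std::move(values)}
  {}

  // Indexing: fold the per-dimension indexes into one flat offset.
  T const &at(std::array<std::size_t, DIMENSIONS> index) const
  {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < DIMENSIONS; ++d)
      offset = offset * m_extents[d] + index[d];
    return m_values.at(offset); // Throws if the offset is out of range.
  }

  // Storage-order iteration: visit every element across all dimensions.
  auto begin() const { return m_values.begin(); }
  auto end() const { return m_values.end(); }

  // Total number of elements, not the extent of the outer dimension.
  std::size_t size() const { return m_values.size(); }

private:
  std::array<std::size_t, DIMENSIONS> m_extents;
  std::vector<T> m_values;
};
```

For a 2x3 array built as `flat_array<int, 2> a({2, 3}, {1, 2, 3, 4, 5, 6})`, a range-based `for` loop visits 1 through 6 in order without the caller dealing with rows, while `a.at({1, 0})` still reaches an individual element.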
So we're not supposed to be using |
@alexolog I think that's right. The whole thing has flushed out of my mental cache by now but... you can use So perhaps we should just deprecate it. |
We've experienced an issue when using `as_array()` on a varchar-based `array_agg()` column:
(`psql_array_to_vector` is a simple wrapper method calling the libpqxx-provided array parser)
Source:
For our payload `{use_sidepath,secondary,3,1,yes,50,"Rijksweg Noord",asphalt,left|through;right}`, we expected the libpqxx array parser to extract 9 values, rather than 10.
As shown in the payload above, the last element in the array (`asphalt,left|through;right`) is both unquoted and contains a semicolon character.
libpqxx seems to treat this semicolon as a separator character, although the underlying datatype is a varchar. We didn't find any way to influence which characters should be treated as separators. As PostgreSQL wouldn't escape a string containing semicolons, it would be good not to consider the semicolon a separator here.
zerebubuth/openstreetmap-cgimap#276 (comment) has a bit more discussion
Relevant query
(based on https://github.com/zerebubuth/openstreetmap-cgimap/blob/master/test/structure.sql)
Downstream issue: zerebubuth/openstreetmap-cgimap#276
Thanks for looking into this!
@tomhughes, @pnorman: please chime in, in case I forgot something.