Streaming runtime-sized row with stream_from #357
There's no option to treat empty strings as nulls, but you can stream null values. One way to get null string values into a stream is to use … (A C-style …) About the second part: yes, if you want to stream values into an object using …
I am writing a simple function that uploads a generic CSV file into a generic table (so the row size is not known at compile time, although the table schema is). I would like to migrate without refactoring the whole application. To solve the escaping issue, an ability to write …
Shouldn't be too hard to add a member function in …
I assume the data will be coming in escaped form, as it is returned by …
The stream would un-escape the strings. So parsing them wouldn't be very hard.
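For readers unfamiliar with the escaping discussed here: PostgreSQL's COPY text format separates fields with tabs, writes NULL as `\N`, and backslash-escapes special characters. The sketch below is a hypothetical `parse_copy_line` helper (not libpqxx code), assuming the stream carries lines in that text format. It shows roughly what un-escaping one line into nullable fields involves, and deliberately handles only the single-character escapes, not octal or hex ones.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: decode the backslash escapes of one COPY text field.
// Handles \b \f \n \r \t \v, plus "backslash + other char" (e.g. \\ -> \).
// Octal (\123) and hex (\x41) escapes are NOT decoded by this sketch.
std::string unescape_copy_field(std::string_view raw)
{
  std::string out;
  for (std::size_t i = 0; i < raw.size(); ++i)
  {
    char const c = raw[i];
    if (c != '\\' || i + 1 == raw.size())
    {
      out += c;  // ordinary character (or a lone trailing backslash)
      continue;
    }
    char const next = raw[++i];
    switch (next)
    {
    case 'b': out += '\b'; break;
    case 'f': out += '\f'; break;
    case 'n': out += '\n'; break;
    case 'r': out += '\r'; break;
    case 't': out += '\t'; break;
    case 'v': out += '\v'; break;
    default: out += next; break;
    }
  }
  return out;
}

// Hypothetical helper: split one line (without its trailing newline) on raw
// tabs. A tab inside a value is written as \t, so raw tabs are delimiters.
// A field that is exactly \N becomes std::nullopt (SQL NULL).
std::vector<std::optional<std::string>> parse_copy_line(std::string_view line)
{
  std::vector<std::optional<std::string>> fields;
  std::size_t start = 0;
  while (true)
  {
    std::size_t const tab = line.find('\t', start);
    std::size_t const len =
      (tab == std::string_view::npos) ? std::string_view::npos : tab - start;
    std::string_view const raw = line.substr(start, len);
    if (raw == "\\N")
      fields.push_back(std::nullopt);
    else
      fields.push_back(unescape_copy_field(raw));
    if (tab == std::string_view::npos) break;
    start = tab + 1;
  }
  return fields;
}
```

Note that an empty field and `\N` come out differently: the first is an engaged optional holding `""`, the second is `std::nullopt`.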
Lets you read rows of data from a `stream_from` without knowing at compile time how many columns the stream has. See #357.
@georgthegreat could you have a look at the latest …
I will try to take a look and return in a couple of days.
It looks like … Upon changing
to
I am getting the following exception:
I have added some debug printing to check whether the table exists.
It says:
Schema-qualified names should be supported by …
The following constructor works fine:
Now my code fails when it tries to convert an empty string to `int64_t`. How should I handle …
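One way to handle the failing conversion on the client side, sketched under stated assumptions: each field arrives as a `std::string_view` where a null `data()` pointer means SQL NULL (the convention this thread describes for `zview`). NULL maps to `std::nullopt`. An empty non-null string is rejected by default, or, if the caller opts in, also treated as NULL. `to_nullable_int64` and its `empty_is_null` flag are hypothetical names, not libpqxx API.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical client-side helper. Assumes data() == nullptr means SQL NULL.
std::optional<std::int64_t>
to_nullable_int64(std::string_view field, bool empty_is_null = false)
{
  if (field.data() == nullptr) return std::nullopt;         // SQL NULL
  if (empty_is_null && field.empty()) return std::nullopt;  // opt-in CSV rule

  std::int64_t value = 0;
  char const *const end = field.data() + field.size();
  auto const [ptr, ec] = std::from_chars(field.data(), end, value);
  // Reject unparseable input (including "") and trailing garbage.
  if (ec != std::errc{} || ptr != end)
    throw std::invalid_argument{"not an int64: '" + std::string{field} + "'"};
  return value;
}
```

Keeping the empty-string rule as an explicit flag keeps the NULL/empty distinction intact unless the application asks to collapse it.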
The documentation for … I think the problem with the schema name happens because the constructor quotes and escapes your table name as a whole, whereas the schema and table parts each need their own pair of quotes. To handle this properly I'm going to need a new constructor with a "schema" parameter. This is a royal pain given how many overloads there already are. It all gets easier with C++20, but even then, the old constructors will still be in the way. I hope the language will give us keyword arguments soon! What I can do for now is document that pitfall.
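To illustrate the quoting pitfall, here is a sketch of PostgreSQL's identifier-quoting rule: wrap the name in double quotes and double any embedded double quote. Quoting `myschema.mytable` as one unit produces a single identifier that merely contains a dot, so PostgreSQL would look for a table literally named `myschema.mytable`. Quoting each part separately gives the intended schema-qualified name. `quote_identifier` and `quote_qualified` are hypothetical helpers, not libpqxx functions.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: quote one identifier the way PostgreSQL expects.
std::string quote_identifier(std::string_view name)
{
  std::string out{"\""};
  for (char const c : name)
  {
    if (c == '"') out += '"';  // "" inside a quoted identifier means one "
    out += c;
  }
  out += '"';
  return out;
}

// Hypothetical helper: schema and table each get their own pair of quotes.
std::string quote_qualified(std::string_view schema, std::string_view table)
{
  return quote_identifier(schema) + "." + quote_identifier(table);
}
```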
C++ explicitly forbids creating a std::string_view with …
So this way of handling NULL values does not allow distinguishing between a NULL value in a text column and an empty string. Such a limitation is OK in C++ (where a null string and an empty string are considered to be the same), but it will not work in SQL.
Also, something is wrong with the classic pattern:
While my test data contains only 2 rows, this loop is executed at least three times according to debug printing. It looks like the reader changes its state upon invocation of … So, finally I was able to fix all the issues. However, I would not call the new API very convenient, as it adds at least two sources of errors.
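A likely explanation for the extra iteration (an assumption, since the original loop did not survive on this page) is the classic test-before-read pitfall. A stream typically only learns that it has reached the end when a read is attempted, so a loop that checks the stream's state *before* reading runs once more than there are rows. The hypothetical `toy_reader` below reproduces this in plain standard C++. It is not libpqxx's reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy reader (hypothetical): it only notices that the data has run out
// when a read past the last row is attempted.
class toy_reader
{
public:
  explicit toy_reader(std::vector<std::string> rows) : m_rows{std::move(rows)} {}

  // Stays true until a read has been attempted after the last row.
  explicit operator bool() const { return not m_finished; }

  // Returns the next row, or an empty string once the data is exhausted.
  std::string read_row()
  {
    if (m_next == m_rows.size())
    {
      m_finished = true;
      return {};
    }
    return m_rows[m_next++];
  }

private:
  std::vector<std::string> m_rows;
  std::size_t m_next = 0;
  bool m_finished = false;
};

// Test-then-read: the body runs once more after the last real row and
// "processes" the empty result of the failed read.
int naive_iterations(toy_reader r)
{
  int iterations = 0;
  while (r)
  {
    r.read_row();
    ++iterations;
  }
  return iterations;
}

// Read-then-test: only rows that were actually read get counted.
int checked_iterations(toy_reader r)
{
  int iterations = 0;
  while (true)
  {
    r.read_row();
    if (not r) break;  // this read hit the end of the data
    ++iterations;
  }
  return iterations;
}
```

With two rows, the naive loop body runs three times and the checked loop runs twice. This is the same reason `while (std::getline(in, line))` is preferred over `while (!in.eof())`.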
Changing the `read_row` API to return … would allow:

    while (auto maybeRow = stream.read_row()) {
        // inside the loop maybeRow is never null
        for (auto field: *maybeRow) {
            // do something with field
        }
    }
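The proposed loop shape can be sketched with a hypothetical `optional_reader` whose `read_row()` returns `std::optional<row>`, with an empty optional meaning "no more rows". This only illustrates the proposal. It is not the interface libpqxx ended up with.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using row = std::vector<std::string>;

// Hypothetical reader sketching the std::optional-returning proposal.
class optional_reader
{
public:
  explicit optional_reader(std::vector<row> rows) : m_rows{std::move(rows)} {}

  // Empty optional signals that every row has been read.
  std::optional<row> read_row()
  {
    if (m_next == m_rows.size()) return std::nullopt;
    return m_rows[m_next++];  // copies the row into the optional
  }

private:
  std::vector<row> m_rows;
  std::size_t m_next = 0;
};

// Counts all fields using the proposed loop shape.
std::size_t count_optional_fields(optional_reader &stream)
{
  std::size_t total = 0;
  while (auto maybe_row = stream.read_row())
  {
    // Inside the loop, maybe_row always holds a row.
    for (auto const &field : *maybe_row)
    {
      (void)field;  // "do something with field"
      ++total;
    }
  }
  return total;
}
```

The end-of-stream check and the read happen in one expression, so the extra-iteration problem cannot occur.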
When the data pointer is null, the size is 0. I figured that was obvious enough to be left implied, exactly for this reason, but I can make it more explicit if you think it's not clear.
I don't see what you mean here... A null string and an empty string are not considered the same thing, whether in C++ or in SQL.
How do you represent an empty, non-null text field in a zview?
An empty field has a pointer pointing to a zero byte, yes. A default-constructed zview has a null data pointer. So it would correspond to a null string, not an empty one.
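The two states can be shown with plain `std::string_view` standing in for `zview` (an assumption that zview behaves the same way in these two cases). The standard guarantees that a default-constructed view has `data() == nullptr` and `size() == 0`, while a view of `""` points at a terminating zero byte. `empty()` is true for both, so only the `data()` check tells them apart. `is_null` is a hypothetical helper.

```cpp
#include <cassert>
#include <string_view>

// Default-constructed: null data pointer, size 0. Represents NULL here.
constexpr std::string_view null_field{};
// View of "": non-null pointer to a zero byte, size 0. An empty value.
constexpr std::string_view empty_field{""};

// Hypothetical helper: nullness is decided by the data pointer alone.
constexpr bool is_null(std::string_view v) { return v.data() == nullptr; }
```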
The … Here's a different way we could slice it:
The different way would cause problems if … Any ideas about solving the same problem for …
This is indeed contradictory. As far as I can see, the `empty()` method was not overridden, so both a NULL-representing zview and an empty zview would report being `empty()`.
This will be a common pattern with coroutines in C++20, as it'll be needed for asynchronous generators. It'll just be very slightly different: `while (auto maybeRow = co_await stream.read_row()) { /* ... */ }`
@jtv, could we finish the work on this issue? At the moment I am stuck in the middle of the upgrade. I would like to conclude the long list of comments above and collapse it into the following:

1. Adapt `pqxx::zview` by explicitly adding null-specific predicates to the public interface:

       bool is_null() const { return data() == nullptr; }
       bool empty() const { return data() != nullptr && size() == 0; }

   The latter would hide the inherited `std::string_view::empty()`.
2. Change the `stream_from` interface to return `std::optional` (an empty optional would be returned once all the data has been read from the stream).
3. Add a similar interface to `stream_to`.

I can submit the PR for 1., but I will need your help with 2 and 3.

I'm currently traveling, so there's not much I can do until I get home tonight (assuming I make it; it's becoming a little iffy). I'll try to do the `std::optional` change then.

I don't see a need for a matching change in `stream_to`: it already supports dynamic rows, as discussed, and it doesn't need to detect end-of-stream.

I don't like the changes under 1. To me those belong in the client code, not in zview. The only purpose of zview is to add that one constraint to string_view: that if non-null, it be zero-terminated.

To test whether a value is null, you ask whether `v.data() == nullptr`. _Once you know that it's non-null,_ you can start looking at its contents, such as by asking whether `v.empty()`. But if the value is null, it's sort of pointless to query its contents. Certainly calling a null value nonempty seems no more appropriate than calling it empty.
As for `stream_to`, it is completely unclear how to write a NULL value into the stream. As far as I can see from the code, there is no text type capable of holding null values:

    internal/conversions.hxx:425:template<std::size_t N> struct nullness<char[N]> : no_null<char[N]>
    internal/conversions.hxx:458:template<> struct nullness<std::string> : no_null<std::string>
    internal/conversions.hxx:498:template<> struct nullness<std::string_view> : no_null<std::string_view>
    internal/conversions.hxx:525:template<> struct nullness<zview> : no_null<zview>
    internal/conversions.hxx:542:template<> struct nullness<std::stringstream> : no_null<std::stringstream>

(With zview this looks like a bug again, as most of the discussion above was about zview's ability to hold null values.)

UPD: I was able to make it work with …
One way to produce a null …
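For reference, PostgreSQL's COPY text format has its own NULL marker, `\N`, which is distinct from an empty field. This is how a stream can carry a NULL text value at all. The sketch below is a hypothetical `format_copy_line` helper, not libpqxx's code. It writes a disengaged `std::optional` as `\N` and escapes backslashes, tabs, and line breaks in values. It assumes the default tab delimiter and the text (not CSV) format.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: format one row as a COPY text-format line.
// std::nullopt -> \N (NULL); "" -> an empty field (empty string).
std::string format_copy_line(std::vector<std::optional<std::string>> const &row)
{
  std::string line;
  for (std::size_t i = 0; i < row.size(); ++i)
  {
    if (i > 0) line += '\t';  // field delimiter
    if (!row[i])
    {
      line += "\\N";
      continue;
    }
    // Escape characters that would otherwise be read as delimiters,
    // line ends, or the start of an escape sequence.
    for (char const c : *row[i])
    {
      switch (c)
      {
      case '\\': line += "\\\\"; break;
      case '\t': line += "\\t"; break;
      case '\n': line += "\\n"; break;
      case '\r': line += "\\r"; break;
      default: line += c; break;
      }
    }
  }
  line += '\n';
  return line;
}
```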
You can now use `read_row()` in loops like: `while (auto fields = stream.read_row()) process(*fields);` See #357.
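The loop above works with any `read_row()` whose result converts to `bool`. A pointer return is one such shape. The hypothetical `view_reader` below returns a pointer to a buffer of `std::string_view` fields that it owns, or `nullptr` at end of stream. That lets it reuse one buffer across rows instead of handing out a fresh container each time, at the cost that the fields are only valid until the next call. This is a sketch, not libpqxx's implementation, and it makes no claim about why the pointer design was chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical reader with a pointer-returning read_row().
class view_reader
{
public:
  explicit view_reader(std::vector<std::vector<std::string>> rows)
      : m_rows{std::move(rows)}
  {}

  // Returns nullptr at end of stream. The pointed-to views stay valid
  // only until the next call to read_row().
  std::vector<std::string_view> const *read_row()
  {
    if (m_next == m_rows.size()) return nullptr;
    m_fields.clear();  // reuse the same buffer for every row
    for (auto const &f : m_rows[m_next]) m_fields.emplace_back(f);
    ++m_next;
    return &m_fields;
  }

private:
  std::vector<std::vector<std::string>> m_rows;
  std::vector<std::string_view> m_fields;
  std::size_t m_next = 0;
};

// Same loop shape as above: the pointer converts to bool.
std::size_t count_view_fields(view_reader &stream)
{
  std::size_t total = 0;
  while (auto fields = stream.read_row()) total += fields->size();
  return total;
}
```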
Okay, I've changed … I didn't want to return …
The new interface works fine, thanks. I would also like to propose this additional small change: #365. I also have an idea for speeding up `stream_to` by allowing the user to write individual fields, but that is the subject of …
JFYI, I ran some benchmarks comparing the old streams against the new ones:
Was (pqxx6 / `tablereader`): …
Now (pqxx7 / `stream_from`): …
So the new streams are about 30% faster. These numbers should not be considered a thorough benchmark, though.
Hey, thanks for the benchmark numbers. It may be anecdotal, but it's music to my ears. Looks to me as if the table reader is significantly faster and the parsing is sligggghtly slower, but the difference there is small enough to be random measurement jitter, or the cost of additional overflow checks and such. If you're happy with the new API, I guess we can close this ticket. If any more ideas for improvement come up, feel free of course to open a new ticket.
Thanks! We shall continue the discussion in further PRs / issues then.
I am trying to switch my application from pqxx6 to pqxx7.
For the scope of this issue, we can assume that my application uploads CSV files to PostgreSQL and vice versa. I would like to treat empty CSV values as SQL NULLs.
While `pqxx::stream_to` can stream a `std::vector<std::string>` right into the table, this API does not have any option for writing NULL values (the same problem can appear with integers, though PostgreSQL seems capable of unquoting them properly).

Migration to `stream_from` is even more complicated, as it does not support runtime-sized rows (only compile-time `std::tuple`-like classes are supported).

Are my assumptions correct? Which migration options do I have?
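On the client side, the "empty CSV value means NULL" rule can be applied before the data reaches the stream, by converting each row of strings into a row of `std::optional<std::string>`. The sketch assumes that a disengaged optional is how the application represents NULL. Whether and how `stream_to` accepts such a row is not settled in this post. `empty_as_null` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: map empty CSV fields to std::nullopt (SQL NULL),
// keeping every non-empty field as-is.
std::vector<std::optional<std::string>>
empty_as_null(std::vector<std::string> const &row)
{
  std::vector<std::optional<std::string>> out;
  out.reserve(row.size());
  for (auto const &field : row)
  {
    if (field.empty())
      out.emplace_back(std::nullopt);
    else
      out.emplace_back(field);
  }
  return out;
}
```

Doing the mapping in application code keeps the policy explicit: a different CSV source that uses, say, the literal text `NULL` would only need a different predicate.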