Mapping Blob/Clob to ByteBuffer/String is not supported #32
I think this is an edge case. Sure, it's great that very large BLOBs can be handled by both R2DBC and JDBC, but BLOBs are mostly useful to store images or PDFs and the like (KBs, not GBs), in which case the blocking nature of the buffering can be ignored, just like for strings.

Frankly, if I had to store GBs, I'm not convinced I'd store them in the database. A file system seems just as good. I think the tradeoff here should be in favour of usability. It's already very hard to correctly stream LOBs with JDBC (while correctly managing resources), and I doubt most people are doing it, let alone doing it right (e.g. I've met only a few people who are even aware of Blob.free()).

I'm saying this from an adoption perspective. If users have to write so much boilerplate code every time a small-ish LOB has to be read, adoption will suffer. We're now discussing the case of a single Blob in a single column. What if we have a result set with 10 Blobs/Clobs? A simple SQL select will turn into a 100-line mess of streaming infrastructure boilerplate logic.
It's great that the edge cases are kept in mind, no doubt. But let's also keep in mind that the maximum LOB sizes being discussed here are themselves edge cases. Thinking of all the things that can go wrong when an inexperienced developer deserialises a stream of ByteBuffers by hand, I'd expect the boilerplate to be the bigger source of bugs.
Well, I'll tell you what will happen :) Everyone will implement blocking ByteBuffer calls nonetheless, just like in the old days when ojdbc didn't support reading LOBs directly into byte[]/String. jOOQ, for example, doesn't distinguish much between LOB types and their non-LOB counterparts: by default, a BLOB is simply fetched as byte[] and a CLOB as String.
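Concretely, that blocking workaround might look something like this (a sketch; the class and helper names are mine, not anything the driver offers):

```java
import io.r2dbc.spi.Blob;
import reactor.core.publisher.Flux;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

class BlockingWorkaround {

  // Drains the Blob's stream with a blocking iterator and reassembles the
  // bytes into one buffer: exactly the kind of call reactive code is meant
  // to avoid, but also exactly what most users will write.
  static ByteBuffer readBlobBlocking(Blob blob) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (WritableByteChannel channel = Channels.newChannel(bytes)) {
      for (ByteBuffer buffer : Flux.from(blob.stream()).toIterable()) {
        channel.write(buffer); // blocks the calling thread on each chunk
      }
    }
    catch (IOException ioException) {
      throw new UncheckedIOException(ioException);
    }
    return ByteBuffer.wrap(bytes.toByteArray());
  }
}
```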
I'm happy with that suggestion from a usability perspective (including the exceptions if > 2GB); I can't comment much on the blocking scenario.
A major insight for me is that LOBs are rarely larger than 1GB. I confirmed this with several members of the Oracle JDBC team as well. If it's uncommon for LOBs to be larger than 1GB, then we don't need to be as concerned about errors arising unexpectedly when LOBs exceed the capacity of ByteBuffer or String. Although it can happen, it will not be a common occurrence. When programmers are dealing with multi-GB LOBs, I think we can assume they are aware of that, and that they know to use an appropriate mapping for this case: Blob/Clob rather than ByteBuffer/String.

So the only concern left is the blocking database call. If the LOB prefetch size is large enough, then this concern can be alleviated as well. If we say that 1GB is the maximum amount of prefetched data, this will be large enough to avoid a blocking call when dealing with LOBs of typical sizes. A call to Row.get(..., ByteBuffer.class) can return a prefetched ByteBuffer without having to perform additional database calls.

So with a default prefetch size of 1GB, and allowing smaller non-default sizes to be configured as well, I think we end up with a satisfactory solution. By default, there are no unexpected errors or blocking database calls when a LOB value of a typical size is mapped to ByteBuffer/String.

I'll plan to start implementing this new behavior next week, but additional feedback is always welcome. Thanks to @lukaseder, @Douglas-Surber, @Kuassim for looking into this with me.
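If this behavior lands, configuring a smaller, non-default prefetch size might look something like the sketch below. It assumes Oracle R2DBC accepts Oracle JDBC connection properties supplied as extended Options (the property name is Oracle JDBC's oracle.jdbc.defaultLobPrefetchSize); the host and service names are placeholders:

```java
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import io.r2dbc.spi.ConnectionFactoryOptions;
import io.r2dbc.spi.Option;

class PrefetchConfig {

  // Builds a ConnectionFactory with a 64MB LOB prefetch size instead of
  // the proposed 1GB default.
  static ConnectionFactory create() {
    return ConnectionFactories.get(
      ConnectionFactoryOptions.parse("r2dbc:oracle://db.example.com:1521/XEPDB1")
        .mutate()
        .option(Option.valueOf("oracle.jdbc.defaultLobPrefetchSize"),
          Integer.toString(64 * 1024 * 1024))
        .build());
  }
}
```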
Not supporting BLOB/CLOB to ByteBuffer/String mapping is a known limitation of Oracle R2DBC. Only io.r2dbc.spi.Blob/Clob mapping is supported. This is noted in the documentation, but the reasoning is not explained in depth. So I’d like to share the full extent of my thoughts on this, and see if we can get to a better solution.
When a query returns a LOB value, Oracle Database returns a locator for that value. For the purpose of this discussion, we can think of the locator as a pointer to the actual value. To get the actual value into a buffer, a database driver makes another database call requesting to read the value which a locator points to.
Although the LOB read requires a database call, Oracle R2DBC could still support the ByteBuffer mapping without Row.get(int/String, ByteBuffer.class) having to make a blocking database call. Before emitting the Row to the mapping function, the driver could execute non-blocking database calls to read the LOB content into a buffer. Once the content has been buffered, the Row can be passed to the mapping function, and the ByteBuffer is ready to go.
Of course, if the LOB exceeded 2GB, then it would not fit into a ByteBuffer and the driver would need to handle that. But we can ignore this case for the moment, as it doesn't completely prevent Oracle R2DBC from supporting the ByteBuffer mapping.
So, pre-buffering the LOB content is one option to consider. However, this approach seems to devalue the case where user code wants to use io.r2dbc.spi.Blob. Rather than have Blob.stream() respond to backpressure from a Subscriber, the stream() implementation would have allocated memory for the entire content of the LOB before a Subscriber has even subscribed.
On the other hand, if Oracle R2DBC only supports the io.r2dbc.spi.Blob mapping, and user code wants to map that into a ByteBuffer, it still has the freedom to implement that mapping itself. If the user code knows that the BLOB value won’t exhaust memory, or exceed the 2GB ByteBuffer capacity, then it can map the BLOB into a ByteBuffer like this:
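A minimal sketch of such a mapping, using Project Reactor’s reduce operator (the BlobMappings/toByteBuffer names are illustrative; production code would also want to release buffers on cancellation or error):

```java
import io.r2dbc.spi.Blob;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import java.nio.ByteBuffer;

class BlobMappings {

  // Concatenates the Blob's stream of buffers into a single ByteBuffer.
  // Assumes the total content fits in memory and within the 2GB capacity
  // of a ByteBuffer; a larger LOB fails with a runtime exception here.
  static Mono<ByteBuffer> toByteBuffer(Blob blob) {
    return Flux.from(blob.stream())
      .reduce(ByteBuffer.allocate(0), (accumulated, next) -> {
        ByteBuffer combined = ByteBuffer.allocate(
          accumulated.remaining() + next.remaining());
        combined.put(accumulated).put(next);
        combined.flip();
        return combined;
      });
  }
}
```

User code could then apply this after row mapping, e.g. Flux.from(result.map((row, metadata) -> row.get(0, Blob.class))).flatMap(BlobMappings::toByteBuffer).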
As shown above, the ability for user code to implement a Blob to ByteBuffer mapping is what ultimately led to the decision for Oracle R2DBC to only support the Blob mapping. With Blob, user code still has the option to map it into a ByteBuffer if it wants to, but it can also choose to process the Blob as a stream of smaller buffers if it wants to do that instead.
So far, we’ve only considered solutions that represent two extremes. Either: A) Buffer everything, or B) Buffer nothing. Option C might look like this: support the ByteBuffer/String mappings by buffering LOB content up to a configured prefetch size. If a value fits within the prefetch size, Row.get returns it without any additional database call. If a value exceeds the prefetch size, Row.get makes a blocking database call to read the remainder. And if a value exceeds the 2GB capacity of ByteBuffer, Row.get throws an exception.
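As a sketch of the user-visible contract under Option C (hypothetical behavior, not something Oracle R2DBC implements today; the class and column names are illustrative):

```java
import io.r2dbc.spi.Result;
import reactor.core.publisher.Flux;

import java.nio.ByteBuffer;

class OptionCSketch {

  // Under Option C, Row.get(..., ByteBuffer.class) would:
  //  - return already-prefetched content without any database call,
  //  - make a *blocking* database call if the value exceeds the prefetch size,
  //  - throw an exception if the value exceeds ByteBuffer's 2GB capacity.
  static Flux<ByteBuffer> values(Result result) {
    return Flux.from(result.map((row, metadata) ->
      row.get("data", ByteBuffer.class)));
  }
}
```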
I find the solution described above to be problematic because the cases where errors and blocking database calls occur seem like pitfalls that are easy to miss. It seems too likely that a system would be verified by tests that miss the case where a LOB exceeds 2GB, and then fail in production when the >2GB case occurs. And blocking database calls are really hard to detect unless you have something like Java Mission Control to measure socket read time.
Although having to implement a ByteBuffer mapping with something like the reduce operator above puts some burden on user code, it seemed like a better alternative than introducing the pitfalls I’ve described.
Of course, it would be excellent if Oracle R2DBC could support the ByteBuffer mapping. I’m happy to discuss new solutions with anyone that wants to explore this further.