
Batching / Request Buffering feature review #148

Closed
andrewdodd opened this issue Feb 17, 2016 · 6 comments

@andrewdodd
Contributor

Hi all (sorry for the length),

I think there are some key features that need to be discussed before a few of the issues / PRs can be resolved. These are:

Issue #99 - OutOfMemoryError
Issue #107 - Data retention options
Issue #118 - BatchProcessor stop execution
Issue #126 - Allow to write lines directly (as strings) -> Related due to design implications
Issue #138 - Enable point batching by default
Issue #143 - Batch schedule stops working after connection issue (dup of #118)

PR #108 - Data retention enhancement (by @andrewdodd)
PR #119 - Batch processor keeps data points queued on failed write (by @PaulWeakley)
PR #137 - Allow to write lines directly to the database (by @slomek) -> Related due to design implications
PR #144 - BatchProcessor exception handling (by @mmatloka)
PR #146 - Add support for async requests (by @TopShelfSolutions) -> Related due to design implications
PR #147 - Put should always be async (by @jazdw) -> Related due to design implications

All of these issues have to do with (or impact the future of) the 'automatic collection & batch sending of points' feature.

My summary of the key issues:

For me, the questions that need to be resolved are:

  1. Should this library be able to perform 'auto-batching' of singly-written points at all?
  2. If yes, should the library have the ability to let the user choose what to do when an error occurs?

Option 1 - Yes, it should allow buffering
Although it is not the most beautiful thing in the world, I believe that my PR #108 is the only solution here that allows the user to decide (see the sketch at the end of this option):

  • Buffer until a limit or buffer until system resources are exhausted
  • When the limit is reached, discard the newest, discard the oldest, or throw an exception

Unfortunately, it has the following issues:

  • It is unclear to me how it should work in conjunction with an 'async' request, such as those provided in #146 (Add support for async requests)
    • Should it buffer the request? Should it send right away? Should it use a different connection if the current one is busy, or should it wait?
  • It is unclear how the feature in PR #137 (Allow to write lines directly to the database) should work in conjunction with the batching behaviour.
    • Should the 'direct' write use its own connection? Should the batch processor parse the 'direct' writes and buffer them like other writes (thus defeating the purpose of the change)?
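
To make the discard behaviour concrete, the policy I am describing looks roughly like the following (an illustrative sketch only; the class and enum names are invented for this example and are not the names used in PR #108):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a capacity-bounded buffer with a pluggable behaviour for when
// the limit is hit. Names are invented for illustration.
public class BoundedPointBuffer<T> {
    public enum LimitPolicy { DROP_OLDEST, DROP_NEWEST, THROW }

    private final Deque<T> buffer = new ArrayDeque<>();
    private final int capacity;
    private final LimitPolicy policy;

    public BoundedPointBuffer(int capacity, LimitPolicy policy) {
        this.capacity = capacity;
        this.policy = policy;
    }

    public synchronized void add(T point) {
        if (buffer.size() < capacity) {
            buffer.addLast(point);
            return;
        }
        switch (policy) {
            case DROP_OLDEST:
                buffer.removeFirst();  // sacrifice the oldest queued point
                buffer.addLast(point);
                break;
            case DROP_NEWEST:
                break;                 // silently drop the incoming point
            case THROW:
                throw new IllegalStateException("point buffer full (capacity " + capacity + ")");
        }
    }
}
```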

Option 2 - No, let's forget this buffering thing
This could potentially make a lot of these issues go away. However, it would really just push this (very common) issue back onto the users of the library. This option means:

  • The interaction between async writes and the batch processor vanishes
  • The interaction between 'direct' writes and the batch processor vanishes
  • The batch processor can be removed
  • Potentially a 'containing' class could be created that uses the async calls to provide a batch-like, non-async interface (rough sketch below)?
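
For what it is worth, the kind of 'containing' class I have in mind would look something like this (rough illustration only; writeBatchAsync is a placeholder for whatever asynchronous call #146 / #147 end up providing, not an existing method):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: callers keep a simple single-point write() while the wrapper
// flushes accumulated points through one (assumed) asynchronous batch call.
public class BatchingWriter {
    public interface AsyncClient {
        void writeBatchAsync(List<String> lines); // placeholder, not a real library method
    }

    private final List<String> pending = new ArrayList<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AsyncClient client;
    private final int maxBatchSize;

    public BatchingWriter(AsyncClient client, int maxBatchSize, long flushIntervalMs) {
        this.client = client;
        this.maxBatchSize = maxBatchSize;
        scheduler.scheduleAtFixedRate(this::flush, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
    }

    public synchronized void write(String lineProtocolPoint) {
        pending.add(lineProtocolPoint);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        client.writeBatchAsync(new ArrayList<>(pending)); // hand off a copy, then reset
        pending.clear();
    }
}
```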

My opinion
I think the batch processor is a good feature. I believe this because the process of 'buffering and sending' to InfluxDB seems to be a pretty common usage pattern. The batch processor keeps the 'write' operations in my applications simple (i.e. I can treat everything as a single write), while still giving the improved write performance of a batched write and (with PR #108) guaranteeing my data won't be lost in an unexpected way.
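
For context, the usage pattern I mean looks roughly like this (method names as in recent versions of this library; treat the exact arguments as illustrative rather than definitive):

```java
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class BatchedWriteExample {
    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "user", "pass");
        // Flush after 2000 buffered points or every 100 ms, whichever comes first.
        influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

        // Application code treats every write as a single point...
        influxDB.write("mydb", "default", Point.measurement("cpu")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .addField("idle", 90L)
                .build());
        // ...while the BatchProcessor groups points into batched writes behind the scenes.
    }
}
```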

I would love to hear what other people using this library have to say on the issue.

@mmatloka
Contributor

I would propose to start with #144, so that the processor can still send data to InfluxDB after a temporary failure, and then go for buffering with a limit, discarding the newest/oldest or throwing an exception.

@tagliola

I would opt for buffering; without it I would discard the whole library and just do a String concat myself.

Losing data is a bad thing, as the data is often not easily (or at all) reproducible. My expectation would be that the library should be able to survive at least an InfluxDB restart or a few network glitches. An initial step might be (I don't know if that happens right now) to retry a pending POST a few times; this would have minimal memory overhead, as the request is already created. IIRC this is available in OkHttp?
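
Something along these lines is what I mean by retrying the pending request (a hypothetical sketch assuming OkHttp 3's application-interceptor API; OkHttp's own retryOnConnectionFailure only covers connection-level failures, not error responses from InfluxDB):

```java
import java.io.IOException;

import okhttp3.Interceptor;
import okhttp3.Response;

// Hypothetical sketch: re-send the already-built write request a few times
// before giving up, so short outages need no extra point buffering.
public class RetryInterceptor implements Interceptor {
    private final int maxAttempts;

    public RetryInterceptor(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Response intercept(Chain chain) throws IOException {
        IOException lastFailure = null;
        Response response = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                response = chain.proceed(chain.request());
                if (response.isSuccessful()) {
                    return response;
                }
                if (attempt < maxAttempts) {
                    response.body().close(); // drop the failed response before retrying
                }
            } catch (IOException e) {
                lastFailure = e;
            }
        }
        if (response != null) {
            return response; // hand back the last (unsuccessful) response
        }
        throw lastFailure != null ? lastFailure : new IOException("no attempts made");
    }
}
```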

The next step would be to add buffering, up to a limit. Only after this is breached could it discard entries according to a user-specified strategy, e.g. oldest/newest/random/throw. Make it possible to extend the strategy with a callback on discard, so that if people want to add disk buffering or have a local disk dump as a last-resort backup, they can.

So in short: yes to buffering, and 1. prevent OOM and 2. prevent loss of data at all costs, in that order.

@andrewdodd
Contributor Author

@mmatloka I certainly agree that #144 is the quickest / simplest change. Any chance you could have a go with the changes in PR #108 to see if they do what you need?

@andrewdodd
Contributor Author

@mmatloka Any luck?

@anthonywebb

Sad we still don't have buffering; losing data sucks.

@andrewdodd
Contributor Author

It has been ages since I posted this issue, and we solved this in our own application. I tend to think there is not likely to be a simple way to solve all of these issues in the client library, as the right solution usually depends on the use case. I am closing the issue.
