
Batching / Request Buffering feature review #148

Closed
andrewdodd opened this issue Feb 17, 2016 · 6 comments

@andrewdodd
Contributor

Hi all (sorry for the length),

I think there are some key features that need to be discussed before a few of the issues / PRs can be resolved. These are:

Issue #99 - OutOfMemoryError
Issue #107 - Data retention options
Issue #118 - BatchProcessor stop execution
Issue #126 - Allow to write lines directly (as strings) -> Related due to design implications
Issue #138 - Enable point batching by default
Issue #143 - Batch schedule stops working after connection issue (dup of #118)

PR #108 - Data retention enhancement (by @andrewdodd)
PR #119 - Batch processor keeps data points queued on failed write (by @PaulWeakley)
PR #137 - Allow to write lines directly to the database (by @slomek) -> Related due to design implications
PR #144 - BatchProcessor exception handling (by @mmatloka)
PR #146 - Add support for async requests (by @TopShelfSolutions) -> Related due to design implications
PR #147 - Put should always be async (by @jazdw) -> Related due to design implications

All of these issues have to do with (or impact the future of) the 'automatic collection & batch sending of points' feature.

My summary of the key issues:

For me, the questions that need to be resolved are:

  1. Should this library be able to perform 'auto-batching' of singly-written points at all?
  2. If yes, should the library have the ability to let the user choose what to do when an error occurs?

Option 1 - Yes, it should allow buffering
Although it is not the most beautiful thing in the world, I believe that my PR #108 is the only solution here that allows the user to decide (see the sketch at the end of this option):

  • Buffer until a limit or buffer until system resources are exhausted
  • When the limit is reached, discard the newest, discard the oldest, or throw an exception

Unfortunately, it has the following issues:

  • It is unclear to me how it should work in conjunction with an 'async' request, such as those provided in #146 (Add support for async requests)
    • Should it buffer the request? Should it send right away? Should it use a different connection if the current one is busy, or should it wait?
  • It is unclear how the feature in PR #137 (Allow to write lines directly to the database) should work in conjunction with the batching behaviour.
    • Should the 'direct' write use its own connection? Should the batch processor parse the 'direct' writes and buffer them like other writes (thus defeating the purpose of the change)?
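
To make the discard behaviour concrete, the policy I am describing looks roughly like the following (an illustrative sketch only; the class and enum names are invented for this example and are not the names used in PR #108):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a capacity-bounded buffer with a pluggable behaviour for when
// the limit is hit. Names are invented for illustration.
public class BoundedPointBuffer<T> {
    public enum LimitPolicy { DROP_OLDEST, DROP_NEWEST, THROW }

    private final Deque<T> buffer = new ArrayDeque<>();
    private final int capacity;
    private final LimitPolicy policy;

    public BoundedPointBuffer(int capacity, LimitPolicy policy) {
        this.capacity = capacity;
        this.policy = policy;
    }

    public synchronized void add(T point) {
        if (buffer.size() < capacity) {
            buffer.addLast(point);
            return;
        }
        switch (policy) {
            case DROP_OLDEST:
                buffer.removeFirst();  // sacrifice the oldest queued point
                buffer.addLast(point);
                break;
            case DROP_NEWEST:
                break;                 // silently drop the incoming point
            case THROW:
                throw new IllegalStateException("point buffer full (capacity " + capacity + ")");
        }
    }
}
```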

Option 2 - No, let's forget this buffering thing
This could potentially make a lot of these issues go away. However, it would really just push this (very common) issue back onto the users of the library. This option means:

  • The interaction between async writes and the batch processor vanishes
  • The interaction between 'direct' writes and the batch processor vanishes
  • The batch processor can be removed
  • Potentially a 'containing' class could be created that uses the async calls to provide a batch-like, non-async interface (rough sketch below)?
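
For what it is worth, the kind of 'containing' class I have in mind would look something like this (rough illustration only; writeBatchAsync is a placeholder for whatever asynchronous call #146 / #147 end up providing, not an existing method):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: callers keep a simple single-point write() while the wrapper
// flushes accumulated points through one (assumed) asynchronous batch call.
public class BatchingWriter {
    public interface AsyncClient {
        void writeBatchAsync(List<String> lines); // placeholder, not a real library method
    }

    private final List<String> pending = new ArrayList<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AsyncClient client;
    private final int maxBatchSize;

    public BatchingWriter(AsyncClient client, int maxBatchSize, long flushIntervalMs) {
        this.client = client;
        this.maxBatchSize = maxBatchSize;
        scheduler.scheduleAtFixedRate(this::flush, flushIntervalMs, flushIntervalMs, TimeUnit.MILLISECONDS);
    }

    public synchronized void write(String lineProtocolPoint) {
        pending.add(lineProtocolPoint);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        client.writeBatchAsync(new ArrayList<>(pending)); // hand off a copy, then reset
        pending.clear();
    }
}
```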

My opinion
I think the batch processor is a good feature. I believe this because the process of 'buffering and sending' to InfluxDB seems to be a pretty common usage pattern. The batch processor keeps the 'write' operations in my applications simple (i.e. I can treat everything as a single write), while still giving the improved write performance of a batched write and (with PR #108) guaranteeing my data won't be lost in an unexpected way.
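
For context, the usage pattern I mean looks roughly like this (method names as in recent versions of this library; treat the exact arguments as illustrative rather than definitive):

```java
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class BatchedWriteExample {
    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "user", "pass");
        // Flush after 2000 buffered points or every 100 ms, whichever comes first.
        influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

        // Application code treats every write as a single point...
        influxDB.write("mydb", "default", Point.measurement("cpu")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .addField("idle", 90L)
                .build());
        // ...while the BatchProcessor groups points into batched writes behind the scenes.
    }
}
```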

I would love to hear what other people using this library have to say on the issue.

@mmatloka
Contributor

I would propose to start with #144, so that the processor can still send data to InfluxDB after a temporary failure, and then go for buffering with a limit, discarding the newest/oldest or throwing an exception.

@tagliola

I would opt for buffering; without it I would discard the whole library and just do a String concat myself.

Losing data is a bad thing, as the data is often not easily (or at all) reproducible. My expectation would be that the library should be able to survive at least an InfluxDB restart or a few network glitches. An initial step might be (I don't know if that happens right now) to retry a pending POST a few times; this would have minimal memory overhead, as the request is already created. IIRC this is available in OkHttp?
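
Something along these lines is what I mean by retrying the pending request (a hypothetical sketch assuming OkHttp 3's application-interceptor API; OkHttp's own retryOnConnectionFailure only covers connection-level failures, not error responses from InfluxDB):

```java
import java.io.IOException;

import okhttp3.Interceptor;
import okhttp3.Response;

// Hypothetical sketch: re-send the already-built write request a few times
// before giving up, so short outages need no extra point buffering.
public class RetryInterceptor implements Interceptor {
    private final int maxAttempts;

    public RetryInterceptor(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Response intercept(Chain chain) throws IOException {
        IOException lastFailure = null;
        Response response = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                response = chain.proceed(chain.request());
                if (response.isSuccessful()) {
                    return response;
                }
                if (attempt < maxAttempts) {
                    response.body().close(); // drop the failed response before retrying
                }
            } catch (IOException e) {
                lastFailure = e;
            }
        }
        if (response != null) {
            return response; // hand back the last (unsuccessful) response
        }
        throw lastFailure != null ? lastFailure : new IOException("no attempts made");
    }
}
```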

The next step would be to add buffering, up to a limit. Only after this is breached could it discard entries according to a user-specified strategy, e.g. oldest/newest/random/throw. Make it possible to extend the strategy with a callback on discard, so that if people want to add disk buffering or have a local disk dump as a last-resort backup, they can.

So in short: yes to buffering, and 1. prevent OOM and 2. prevent loss of data at all costs, in that order.

@andrewdodd
Contributor Author

@mmatloka I certainly agree that #144 is the quickest / simplest change. Any chance you could have a go with the changes in PR #108 to see if they do what you need?

@andrewdodd
Contributor Author

@mmatloka Any luck?

@anthonywebb

Sad we still don't have buffering; losing data sucks.

@andrewdodd
Contributor Author

It has been ages since I posted this issue, and we solved this in our own application. I tend to think there is not likely to be a simple way to solve all of these issues in the client library, as the right solution usually depends on the use case. I am closing the issue.
