-
Notifications
You must be signed in to change notification settings - Fork 641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Kinesis batch publisher with retries #408
Conversation
Travis reports 4 compile errors when building with Scala 2.11 - might be good to look into? |
Fixed :) |
@aserrallerios Today I had to create something very similar to be used in my pipeline, unfortunately, only the https://gist.github.com/thiagoandrade6/4fb8684bb2409995f781151c42e349ac It is completely based on the |
Or instead you can review the PR and ask for changes if you think something is missing/wrong :) Your Flow lacks parallelism and retries, both implemented in this PR. This PR also includes throttling and other configuration parameters. |
Don't worry, I just like the way But indeed, my flow is just simple enough to run in my application today, not really production ready for everybody, I just wanted to show because I couldn't wait for your PR to be approved :) |
Yeah, but adding features to a stage makes it more complex and not quite as comparable. If you cannot wait, please review :), also @raboof and @areyouspiffy feedback would be great. I'll try to add the missing tests/docs as soon as possible. |
|
||
import scala.util.control.NoStackTrace | ||
|
||
object KinesisFlowErrors { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we renamed this KinesisErrors
I could also put KinesisSourceErrors
in here as well, but it's up to you if you think they're better off in separate files. Same with settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
About the settings, I'd rather have them separated, it could be a mess mixing all those companion objects, validations and constructors. And think that we may have more Kinesis Sources/Flows in the future with different settings.
👍 Good work! |
I think it's feature-complete, ready to be reviewed. Feel free to ask for changes or new stuff. |
+1 for this, good work. |
} | ||
|
||
private def checkForCompletion() = | ||
if (inFlight <= 0 && pendingRequests.isEmpty && waitingRetries.isEmpty && isClosed(in)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should inFlight
ever be able to become < 0
? Perhaps log a warning then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it cannot happen. I'll change it to == 0
.
docs/src/main/paradox/kinesis.md
Outdated
@@ -60,10 +60,10 @@ Java | |||
The `KinesisSource` creates one `GraphStage` per shard. Reading from a shard requires an instance of `ShardSettings`. | |||
|
|||
Scala | |||
: @@snip (../../../../kinesis/src/test/scala/akka/stream/alpakka/kinesis/scaladsl/Examples.scala) { #settings } | |||
: @@snip (../../../../kinesis/src/test/scala/akka/stream/alpakka/kinesis/scaladsl/Examples.scala) { #source-settings } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add the root of the project as a variable like we do for akka, but that's for another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, including tests and documentation. Haven't worked with Kinesis myself though so can't really comment on specifics.
91c06a0
to
4440968
Compare
Fixed-up commits. |
@aserrallerios I took the liberty of reviewing your code, and it looks great, in general there is not much difference from what I was doing, but there is something I would not do, and its the retries, I do not think it should be a responsibility of the flow. You even put in some nice things like batching and throttling, So Im not sure would be a good decision to make retries part of the flow. Other than that I would just add a KinesisSink to complete the whole picture like in SQS |
Thnx for the review @eduardojld. I'll add a About the retires, note that the code doesn't retry any "internal" or "unknown" server error, just partial errors i.e. records that failed to publish among other records that were successfully published. For example, if you PUT 500 records, the first 100 may succeed and the remaining 400 may fail. We have found that handling this partials errors is mandatory if you really need to get your PUT throughput near the available throughput limit (depending on the number of shards), as they are quite common. Removing the retries logic makes it nearly unusable if high throughput is required. An alternative to internal retries would be to publish failed records downstream, forcing the user to decide what to do with failed records. But honestly I cannot think in any other scenario than the user wanting to feed the same "publish records flow" again, to retry those records, producing a cycle in the graph. I'd rather do it internally unless the community find strong arguments for it, as it would increase the complexity of the end-user graphs (with graph cycles!). You can always disable retries with settings parameter :D |
4440968
to
89ca54c
Compare
Added sink version. |
Guys, more reviews, anything before accepting it? |
It looks like there's some conflicts w/ your branch @aserrallerios - Those likely have to be resolved prior to a merge. |
89ca54c
to
912fb2c
Compare
Thanks @etspaceman 😅 Cherry-picked |
Hi @aserrallerios, the merge of #535, which changed the formatting of many files including KinesisSourceErrors, is now causing a conflict since you have renamed that file to KinesisErrors. Do you need any help with this? I am keen on getting this for the next release :) |
912fb2c
to
9a667f8
Compare
Rebased & formatted, thanks @nvrs! |
Hi @raboof , do you think this needs more reviewers before accepting the PR? |
Guys, I've been testing with Graphs that self-restart on failures, and I have one question: do the initialization code of the If the answer is that only the code on |
It didn't harm, anyway, so I've done it: change initialization of Stage's variables & some minor stuff to improve code clearness in the Stage. |
3629d6d
to
218f7ff
Compare
218f7ff
to
79c2654
Compare
This adds a Kinesis publisher with batching and retries. It does not ensure publish order (due to batching and retries; I plan to add more publishers that ensure total ordering or ordering by partition key).
More tests and documentation will be added and the stage logic may be refactored/simplified.
Feedback will be much appreciated!