Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for naming windowed join topics and other internal topics #178

Open
kidpollo opened this issue Aug 9, 2019 · 3 comments
Open

Support for naming windowed join topics and other internal topics #178

kidpollo opened this issue Aug 9, 2019 · 3 comments

Comments

@kidpollo
Copy link
Contributor

kidpollo commented Aug 9, 2019

The current stable version of Kafka streams (2.3) supports naming some internal topics per KIP's.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Repartition+Topics+for+Joins+and+Grouping
https://cwiki.apache.org/confluence/display/KAFKA/KIP+230%3A+Name+Windowing+Joins

It seems that it is possible (in 2.3) to name some internal topics now but not windowed join topics. We actually discovered that in trunk it is implemented but not released yet. https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java#L1109

Being able to name joins is important for long lived JOIN windows so that changes in the topology dont change the internal topic name and ignore the join history upon topology restart.

In the meant time we are gettin by a custom build 2.2 of kafka with support for Named Joins.
https://github.com/FundingCircle/kafka/pull/5/files#diff-5142e1d4a6410459d6bf6df98828e5afR920-R921

And a patch to join impl functions:

(defn join-windowed
  "Combines the values of two streams that share the same key using a windowed
  inner join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.join (kstream* this-kstream)
          (kstream* other-kstream)
          (value-joiner value-joiner-fn)
          windows
          (Joined/with key-serde this-value-serde other-value-serde join-name))))

(defn left-join-windowed
  "Combines the values of two streams that share the same key using a windowed
  left join. Adds the `join-name` parameter, which is used to name the internal
  storage topics. Requires patched version of the kafka streams jar."
  [this-kstream other-kstream value-joiner-fn windows
   {key-serde :key-serde this-value-serde :value-serde}
   {other-value-serde :value-serde}
   join-name]
  (clj-kstream
   (.leftJoin (kstream* this-kstream)
              (kstream* other-kstream)
              (value-joiner value-joiner-fn)
              windows
              (Joined/with key-serde this-value-serde other-value-serde join-name))))

Supporting naming of windowed joins seems quite critical as explained above but looking into supporting other internal topic custom naming support should also be looked at.

@cddr
Copy link
Contributor

cddr commented Aug 9, 2019

Good summary of the issue. Thanks @kidpollo

Can we make sure the functions proposed above maintain the existing behavior if called without a join-name?

@kidpollo
Copy link
Contributor Author

Yes they do I believe @99-not-out tested this by building trunk

@cddr
Copy link
Contributor

cddr commented Aug 14, 2019

Yes they do I believe @99-not-out tested this by building trunk

Not sure I made myself clear. What I mean is that users of jackdaw should continue to be able to call e.g. the join-windowed function with only 6 args (as opposed to the 7 args required by the definition above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants