Issues with SimpleConsumer when using auto commit #18
I also noticed these problems with offset commit and offset fetch. In consumer.py, I uncommented the lines for 0.8.1, which fetch the offset information for the consumer. This resulted in an error in the broker. If I enable auto-commit or do a manual commit with consumer.commit(), the broker crashes with an error. I will update the exact error messages in a while.
When I do consumer.commit(), the following error is seen at the broker end.
The python-kafka error is as follows:
@atinsood - after your fix, did auto-commit work? I am seeing an issue where the broker throws an exception.
@mahendra I haven't pulled in the changes that you made; I will try to fork your code. But after my changes the commit still does not work. I am able to commit successfully without any exception, but the consumer still seems to be pulling out all the messages. I ended up referring to https://github.com/mumrah/kafka-python/blob/master/kafka/NOTES.md and changed my code to:
But the offset always seems to be {0: 0}, and now I am able to pull out messages from the offset that I specified (10 in this case) to the end.
Made a few more changes in the base code and had some more luck.
So basically, after uncommenting the code mentioned for 0.8.1, we need to comment out the next for loop. My consumer code looks like:
So I am able to get messages now and the offset is correctly increasing, but I am seeing the following exception after every 48 messages for some reason :)
@atinsood one quick question: which version of kafka are you running? I am following the instructions here: https://cwiki.apache.org/KAFKA/kafka-08-quick-start.html
@mahendra I've been using master so far, but I will look into the instructions you mentioned and use that branch.
@atinsood I have a branch in my repo which has all the fixes. You can give it a try: https://github.com/mahendra/kafka-python/tree/affirm
@atinsood @mumrah I also noticed the buffer underflow errors. I think this is happening because itertools is not thread safe. For example, self._next_id() uses itertools.count(). When the commit thread runs, it is possible that the same correlation id is used for two different requests. I saw some fetch request data going to the commit request decoder (and vice versa). I averted this problem by protecting self._next_id() with a lock, but then saw the same problem happening in other calls. I am looking into the issue. A temporary workaround is to use auto_commit without an auto-timer.
This way, the commit code will run within the main thread. Once I did this, the errors disappeared. We need to ensure that the protocol/client code is thread safe.
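The lock-based work-around described above can be sketched roughly as follows. The class and method names here are illustrative, not kafka-python's actual internals; the point is only that the correlation-id generator is serialized so the fetch loop and the auto-commit timer thread can never hand the same id to two requests.

```python
import itertools
import threading

class Connection:
    """Sketch: a correlation-id counter guarded by a lock (names are
    illustrative, not kafka-python's real conn class)."""

    def __init__(self):
        self._id_gen = itertools.count()
        self._id_lock = threading.Lock()

    def _next_id(self):
        # Hold the lock so two threads (e.g. the fetch loop and the
        # auto-commit timer) never race on the same correlation id.
        with self._id_lock:
            return next(self._id_gen)

conn = Connection()
ids = []

def worker():
    for _ in range(1000):
        ids.append(conn._next_id())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All 4000 ids are unique: no two requests shared a correlation id.
assert len(set(ids)) == 4000
```

As the thread notes, this only fixes id generation; interleaved reads and writes on the shared socket remain unsafe, which is why the discussion moves on to multiplexing.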
@atinsood @mumrah - Well, I just went through the code in detail and noticed that the correlation id is not being used for multiplexing requests as of now. In that case, invoking auto_commit via a timer (thread) is inherently unsafe. We need to look at alternate approaches. One very easy approach I can think of is to use the multiprocessing module instead of threading: the connection is then in a separate process space and the commit can happen in parallel. Still thinking on it; I will try it out and send a patch if it works well.
Wow, a lot to catch up on from this weekend. @mahendra, you're totally correct that the library is not thread safe right now. I believe the only thing required to make it so is to implement the request multiplexing in conn.py. The request encoding/decoding should be thread safe, since it does not use any instance variables during the decoding process (notice that everything in protocol.py is marked as such). If the commits are happening on a separate thread/process, there needs to be a way to optionally synchronize when a manual commit happens. There are some use cases where you need to be sure that your offsets have been committed before proceeding.
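The multiplexing idea being discussed can be sketched as below. In the Kafka wire protocol, every response echoes back the 4-byte correlation id of its request, so a connection can route each incoming response to whichever caller issued it instead of assuming in-order delivery. The class and method names here are hypothetical, purely to illustrate the routing step, not conn.py's actual API.

```python
import struct

class MultiplexingConn:
    """Sketch: demultiplex responses by correlation id, so a commit
    issued from a timer thread cannot steal a fetch response."""

    def __init__(self):
        # correlation id -> kind of pending request
        self._pending = {}

    def register(self, correlation_id, kind):
        self._pending[correlation_id] = kind

    def route(self, raw_response):
        # Kafka responses begin with a 4-byte big-endian correlation id.
        (corr_id,) = struct.unpack('>i', raw_response[:4])
        kind = self._pending.pop(corr_id)
        return kind, raw_response[4:]

conn = MultiplexingConn()
conn.register(7, 'fetch')
conn.register(8, 'commit')

# Responses arrive out of order; routing still matches them correctly.
kind, payload = conn.route(struct.pack('>i', 8) + b'commit-payload')
assert kind == 'commit' and payload == b'commit-payload'
kind, payload = conn.route(struct.pack('>i', 7) + b'fetch-payload')
assert kind == 'fetch'
```

A real implementation would also need to lock the socket around each send and frame each response off the wire, but the dictionary lookup above is the essence of what makes concurrent fetch and commit requests safe to interleave.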
@mumrah I agree that implementing request multiplexing in conn.py is a better option than implementing it as a separate process. I have an early version working with the commit running in a separate process (using shared memory). I will explore request multiplexing tomorrow and see how to do it.
Cleaning up old issues. This seems resolved.
I was trying to use SimpleConsumer to do auto commit, but ran into a number of issues.

If I use

```python
SimpleConsumer(kafka, "my-group", TOPIC_NAME, True)
```

then the auto commit does not work.

If I use

```python
SimpleConsumer(kafka, "my-group", TOPIC_NAME, True, auto_commit_every_n=100)
```

then the code fails with "unable to find reference for ReentrantTimer".

If I use

```python
SimpleConsumer(kafka, "my-group", TOPIC_NAME, True, 100, 5000)
```

the code fails with "unable to find reference to send_offset_commit_request", which is called incorrectly:

```python
self.send_offset_commit_request(self.group, reqs)
```

With

```python
SimpleConsumer(kafka, "my-group", TOPIC_NAME, True, auto_commit_every_n=100)
```

I can proceed further.
But now I am stuck at:
Code diff to work around the issues mentioned above: