8-bit transparency under Python 3 #60

Open
seveas opened this issue May 11, 2015 · 3 comments

@seveas
Contributor

seveas commented May 11, 2015

Pull request #57 adds Python 3 compatibility, but it always tries to decode job bodies. Given that arbitrary byte strings may be put in the body, this is suboptimal: a body that is not valid text in the chosen encoding cannot be decoded.

For other protocol messages, only ASCII content is accepted, so encoding/decoding automatically is fine.

I propose the following:

  • Add an 'encoding' argument to Connection.__init__, defaulting to sys.getdefaultencoding()

For put:

  • When you put a bytes object, it's put as-is
  • Otherwise it's encoded with the encoding above

And for reserve/peek etc. (_read_body):

  • When a body is read:
    • If encoding is None, return a bytes object
    • Otherwise decode with the specified encoding and return a string
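
The rules above could be sketched roughly as follows. This is only an illustration of the proposal, not code from the pull request: the helper names `encode_body` and `decode_body` are hypothetical, and the sketch does not settle what should happen when a string is put while the encoding is None.

```python
import sys

# Hypothetical helpers illustrating the proposal; neither name exists in beanstalkc.

def encode_body(body, encoding=sys.getdefaultencoding()):
    """Return the bytes that would be sent to the server for a job body."""
    if isinstance(body, bytes):
        return body  # bytes objects are put as-is
    # Anything else is assumed to be a str and is encoded with the
    # connection's encoding. The encoding=None case is left open here.
    return body.encode(encoding)


def decode_body(raw, encoding=sys.getdefaultencoding()):
    """Return what reserve/peek would hand back for a raw body."""
    if encoding is None:
        return raw  # no encoding configured: return the bytes untouched
    return raw.decode(encoding)
```

With `encoding=None`, a body such as `b'\xff\x00'` makes the round trip unchanged, which is the 8-bit transparency this issue asks for.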

If this design is acceptable, I'll rebase and amend the pull request to follow this design.

@earl changed the title from "8-bit transparency under python 3" to "8-bit transparency under Python 3" on May 11, 2015
@svisser
Contributor

svisser commented May 13, 2015

If encoding or decoding fails, should we raise Python's built-in exceptions or wrap them in a BeanstalkcException? Similarly, when encoding is None and a Unicode string is passed to put, what exception would we raise?

I think wrapping them with a BeanstalkcException is a clean solution (similar to how socket problems are wrapped) but preserving exception information is only elegantly supported in Python 3.x:

raise SubclassBeanstalkcException from exc
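
A minimal sketch of that wrapping under Python 3 is shown here. `BeanstalkcException` is redefined as a stand-in so the example runs on its own, and `BodyDecodeError` and `decode_body` are hypothetical names chosen for illustration, not part of beanstalkc.

```python
class BeanstalkcException(Exception):
    """Stand-in for the library's base exception so this sketch is self-contained."""


class BodyDecodeError(BeanstalkcException):
    """Hypothetical subclass for decoding failures; not part of beanstalkc."""


def decode_body(raw, encoding):
    """Decode a job body, wrapping decode failures in a library exception."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # 'from exc' stores the original error on __cause__, so the
        # traceback still shows where and why decoding failed.
        raise BodyDecodeError("cannot decode job body as %s" % encoding) from exc
```

A caller can then catch `BeanstalkcException` alone and still reach the underlying `UnicodeDecodeError` through the `__cause__` attribute of the raised exception.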

@seveas
Contributor Author

seveas commented Aug 17, 2015

@earl poke. Any comments?

@johnchristopherjones

I think it might be incorrect to support string encoding/decoding for beanstalkc. Beanstalkd is a lot like a socket; it has no real knowledge (as far as I know) about the contents of a job; it just stashes bytes. There's no real way (or need) for beanstalkd to know the encoding of a string job, or even that the job is a string.

What if beanstalkc only speaks bytes? The user would have to explicitly encode/decode on put/reserve, but I'm not sure that's a significant burden. We're implicitly relying on us-ascii/utf-8 being the same thing as byte strings most of the time in the Python 2 interface. Acknowledging that explicitly seems like the way to go. That sidesteps the 8-bit byte transparency issue entirely. Adding automatic encoding/decoding for strings is a convenience that can be added later.
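
To give a feel for the caller's side of a bytes-only interface, here is a rough sketch. `BytesOnlyQueue` is a hypothetical in-memory stand-in, not the beanstalkc Connection class, and rejecting non-bytes with a TypeError is an assumption of this sketch rather than something decided in this thread.

```python
import json


class BytesOnlyQueue:
    """Hypothetical in-memory stand-in for a connection; no beanstalkd required."""

    def __init__(self):
        self._jobs = []

    def put(self, body):
        # Only bytes are accepted; encoding is the caller's responsibility.
        if not isinstance(body, bytes):
            raise TypeError("job body must be bytes, got %s" % type(body).__name__)
        self._jobs.append(body)

    def reserve(self):
        # Bodies come back exactly as they were put, as bytes.
        return self._jobs.pop(0)


queue = BytesOnlyQueue()

# The caller encodes explicitly on put ...
queue.put(json.dumps({"task": "resize"}).encode("utf-8"))

# ... and decodes explicitly after reserve.
job = json.loads(queue.reserve().decode("utf-8"))
```

The extra cost to the caller is one `.encode()` on put and one `.decode()` on reserve, and binary payloads need no special handling.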
