Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore: fix utf-8 encoding returning buffers #51

Closed
wants to merge 1 commit into from
Closed

Restore: fix utf-8 encoding returning buffers #51

wants to merge 1 commit into from

Conversation

vweevers
Copy link
Member

@vweevers vweevers commented Jun 21, 2019

History:

  • The utf8 encoding is supposed to return a string. That's always been the case AFAIK and makes perfect sense. Stores like leveldown however, may return data as a buffer and we've had to account for this. Let's call this the maybe-string problem. The primary solution has been to pass leveldown a boolean *asBuffer option. If false, leveldown returns data as a string.
  • When we separated the encoding logic from levelup into encoding-down, we didn't take the *asBuffer options into account (we didn't realize that at the time). Some stores would return a buffer instead of a string. To deal with that, coercion to string was added to level-codec in Fix/utf8 decoding #12 (7.0.0).
  • Coercion to string was removed in Revert "fix utf-8 encoding returning buffers" #23 (8.0.0) because the *asBuffer logic was restored in asBuffer fix encoding-down#19; we thought coercion was no longer necessary.

This PR restores the coercion, to work around an ecosystem quirk: leveldown and memdown handle strings and buffers differently. While leveldown stores both types as a byte array (meaning you can put a buffer and get back a string if so desired, and vice versa), memdown stores them as-is (meaning if you put a buffer, you'll get back a buffer; if you put a string, you'll get back a string - simplified). This leads to unexpected behavior.

Another issue (which won't be fixed by this PR but is very relevant) is that memdown isn't able to compare a string key to a buffer key (or any other type for that matter); you can only safely use one key type in your db. Possible solutions are discussed in Level/memdown#186. Let's call this the mixed-type problem. It is relevant because:

  • One proposed solution will make memdown behave like leveldown and thus remove the need for this level-codec PR. Before you say "that sounds like the simplest solution", wait...
  • Another proposed solution will make memdown behave like level-js which doesn't have the maybe-string problem either, albeit for a different reason. It treats strings and buffers as distinct keys and values, even if their bytes are the same. Arguably - especially when viewed outside of the historical context of Level - this is the least-surprising behavior because you get back what you store. Working with binary data is a distinctly different use case from working with utf8 strings. You'll only sometimes have the need to process utf8 data as binary, which you can still do.

So, fixing the mixed-type problem might also fix the maybe-string problem, but we could still choose to merge this PR as a short- to medium-term solution.

@vweevers vweevers added the semver-major Changes that break backward compatibility label Jun 21, 2019
@vweevers vweevers self-assigned this Jun 21, 2019
@vweevers vweevers removed their assignment Jun 22, 2019
@vweevers
Copy link
Member Author

Updated the description with context.

@achingbrain
Copy link
Contributor

I just hit this too while trying to upgrade some really old modules to the latest versions of level*, it would be great if this could be merged. Robustness principle, etc.

@vweevers
Copy link
Member Author

@achingbrain With memdown, or other (would be good to know)?

This PR is not my preferred solution, it's a workaround. If it were semver-patch we could release it right away but it's semver-major (that bubbles up to dependents) so I don't want to rush this. Feedback on Level/memdown#186 is most welcome.

@achingbrain
Copy link
Contributor

Yes - with level-mem specifically.

@vweevers
Copy link
Member Author

Hm, let's reconsider the assertion that this would be semver-major, because I only based that on the fact that the previous back-and-forth changes were semver-major.

If we can all agree that the utf8 encoding is supposed to return a string, and that this has always been the case, then this PR can be considered a bug fix. If anyone somehow relied on the bug (which I doubt) or made a workaround for it, they should have done so with a typeof x !== 'string' check. Even if they didn't, they most likely did something like x = String(x) which will still work.

@vweevers
Copy link
Member Author

Closing in favor of Level/memdown#191.

@vweevers vweevers closed this Aug 14, 2019
@vweevers
Copy link
Member Author

vweevers commented Sep 6, 2019

@achingbrain I just released level-mem@5.0.0 which upgraded to memdown@5 which removes the need for this level-codec workaround. Let us know if it doesn't resolve your issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-major Changes that break backward compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants