Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.5.1 breaks base64 -d on Alpine Linux (musl libc) #126

Closed
jirutka opened this issue Nov 19, 2023 · 15 comments
Closed

v0.5.1 breaks base64 -d on Alpine Linux (musl libc) #126

jirutka opened this issue Nov 19, 2023 · 15 comments
Assignees
Milestone

Comments

@jirutka
Copy link

jirutka commented Nov 19, 2023

base64 v0.5.0 works fine, v0.5.1 is broken on Alpine Linux Edge (both with gcc and cmake):

cmake -B build -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=/usr \
        -DBUILD_SHARED_LIBS=ON \
        -DBASE64_BUILD_CLI=ON \
        -DBASE64_BUILD_TESTS=ON \
        -DBASE64_WITH_OpenMP=OFF \
        -DBASE64_WITH_AVX512=OFF \
        -DBASE64_WITH_AVX2=OFF \
        -DBASE64_WITH_AVX=OFF
cmake --build build

 ctest --test-dir build --output-on-failure
Test project base64-0.5.1/build
    Start 1: test_base64
    Start 2: benchmark
1/2 Test #1: test_base64 ......................   Passed    0.00 sec
2/2 Test #2: benchmark ........................   Passed    1.04 sec

100% tests passed, 0 tests failed out of 2
echo foo | ./build/bin/base64 | ./build/bin/base64 -d
./build/bin/base64: stdin: decoding error
execve("./build/bin/base64", ["./build/bin/base64", "-d"], 0x7ffc6def0e48 /* 19 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fc7571dcb08) = 0
set_tid_address(0x7fc7571dcf70)         = 32459
brk(NULL)                               = 0x55b6d3d97000
brk(0x55b6d3d99000)                     = 0x55b6d3d99000
mmap(0x55b6d3d97000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x55b6d3d97000
open("build/bin/libbase64.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=67656, ...}) = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960) = 960
mmap(NULL, 69632, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc75712a000
mmap(0x7fc75712b000, 40960, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x7fc75712b000
mmap(0x7fc757135000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED, 3, 0xb000) = 0x7fc757135000
mmap(0x7fc757139000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0xe000) = 0x7fc757139000
close(3)                                = 0
mprotect(0x7fc757139000, 4096, PROT_READ) = 0
mprotect(0x7fc7571d9000, 4096, PROT_READ) = 0
mprotect(0x55b6d2d08000, 4096, PROT_READ) = 0
mmap(NULL, 1048596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc757029000
mmap(NULL, 1398137, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc756ed3000
readv(0, [{iov_base="Zm9vCg==\n", iov_len=1048575}, {iov_base="", iov_len=1024}], 2) = 9
readv(0, [{iov_base="", iov_len=1048566}, {iov_base="", iov_len=1024}], 2) = 0
writev(2, [{iov_base="./build/bin/base64: stdin: decod"..., iov_len=42}, {iov_base=NULL, iov_len=0}], 2./build/bin/base64: stdin: decoding error
) = 42
munmap(0x7fc757029000, 1052672)         = 0
munmap(0x7fc756ed3000, 1400832)         = 0
close(0)                                = 0
close(1)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++
@aklomp aklomp closed this as completed Nov 20, 2023
@aklomp aklomp reopened this Nov 20, 2023
@aklomp
Copy link
Owner

aklomp commented Nov 20, 2023

Sorry, fat-fingered the post button.

@aklomp
Copy link
Owner

aklomp commented Nov 20, 2023

Hi, this is caused by a change in behavior of the bin/base64 utility since 0.5.0. In order to be bug-compatible with "standard" base64, it now always outputs a newline after the output. Since \n is not a valid base64 char, an error is thrown when the output is piped back into the decoder.

The fix is to run it as echo foo | ./build/bin/base64 -w0 | ./build/bin/base64 -d. The -w0 option suppresses line-buffered output.

@jirutka
Copy link
Author

jirutka commented Nov 21, 2023

I hope you don’t consider this totally unexpected and confusing behaviour a feature. Since newline is not a valid character in base64, you can easily strip the trailing newline in the input to the decoder.

@aklomp
Copy link
Owner

aklomp commented Nov 22, 2023

The base64 utility is intended as a demo project for the base64 library, to show how to integrate the library into a codebase. I don't consider it a core feature. Before the last release, the base64 library was very simple and behaved as a dumb pipe. stdout could be piped to stdin and all was well.

But over the years a few issues have been raised (#8 is the one I could find quickly) that were caused by the fact that the behavior was not identical to the system base64. So in issue #115 I decided to fix that complaint and go for bug compatibility. So now the program always emits a trailing newline and always line-breaks the output by default. Gross, I agree, because I like dumb pipes, but it upholds the principle of least surprise.

But, and this is where you have a point, it seems like I didn't do a perfect job, because upon checking, the system base64 does seem to eat newlines, as demonstrated by:

echo foo | base64 | base64 -d

So I guess this issue is now about stripping newlines from the input.

@aklomp
Copy link
Owner

aklomp commented Nov 22, 2023

Possible connection to #33.

@sergeevabc
Copy link

Windows 7 x64 user here. Looking forward to test a fixed version.

@aklomp
Copy link
Owner

aklomp commented Jan 9, 2024

I pushed an issue126 branch which fixes this issue, but in benchmarks it's about twice as slow as the system base64's decoder on the same large input file. Probably because the algorithm is really naive and checks the input bytewise for \n.

I guess I can merge it because a working but slow program is preferable over a non-working program, but it sucks to admit defeat to the system base64 :)

@sergeevabc
Copy link

@aklomp, you mean I have to download that branch and compile it myself? o_O

@aklomp
Copy link
Owner

aklomp commented Jan 10, 2024

@sergeevabc Well yes, how else do you get a binary? This project does not publish any binaries.

But anyway, I think I'll merge the branch today because it fixes the issue, and create a follow-up issue to deal with the slow decoding. The fix works, it's pretty trivial and I tested it locally. It's just quite slow because it scans the input twice.

@aklomp aklomp added this to the v0.5.2 milestone Jan 10, 2024
@aklomp
Copy link
Owner

aklomp commented Jan 10, 2024

Found the speed regression. Don't write to stdout in the inner loop :) Anyway, now the timing is competitive again. Will merge soon.

@aklomp aklomp closed this as completed in 8a1b7e8 Jan 10, 2024
@sergeevabc
Copy link

sergeevabc commented Jan 11, 2024

Testing base64 0.5.2 built with w64devkit on Windows 7 x64. Oh my, the executable turned out to be almost 10 times larger than the source (116658 vs 12913 bytes).

$ busybox echo -n Z2l0aHVi | base64.exe -d
github

$ busybox echo -n Z2l0aHViIAo= | base64.exe -d
github

$ busybox echo -n Z2l0aHViCg== | base64.exe -d
github

$ busybox echo -n Z2l0aHViIA0K | base64.exe -d
github

I guess the devil is in the details like spaces, carriage returns and line feeds?

@aklomp
Copy link
Owner

aklomp commented Jan 11, 2024

@sergeevabc I don't quite understand your comment. Are you reporting a bug?

Those strings seem to decode fine for me, including the whitespace, as shown with xxd:

$ echo -n Z2l0aHVi | bin/base64 -d | xxd
00000000: 6769 7468 7562                           github
$ echo -n Z2l0aHViIAo= | bin/base64 -d | xxd
00000000: 6769 7468 7562 200a                      github .
$ echo -n Z2l0aHViCg== | bin/base64 -d | xxd
00000000: 6769 7468 7562 0a                        github.
$ echo -n Z2l0aHViIA0K | bin/base64 -d | xxd
00000000: 6769 7468 7562 200d 0a                   github ..

About the binary size, not sure why it's so much larger, but I think static linking might have something to do with it.

@sergeevabc
Copy link

sergeevabc commented Jan 28, 2024

There's an issue indeed, @aklomp.

@aklomp
Copy link
Owner

aklomp commented Jan 28, 2024

@sergeevabc This issue has been closed. Please open a new issue if you want to file a bug report. By the way, I can't reproduce this in Linux; are you sure you are using the latest version?

@sergeevabc
Copy link

sergeevabc commented Jan 28, 2024

@aklomp, by the way, you should consider displaying the version so that we don't have to guess, redownload and recompile the source code. Yes, I have the latest version (0.5.2). And I suspect that issue is related to handling CRLF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants