Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msgpack unit test failures on ppc64 arch #524

Closed
dkopecek opened this issue Mar 19, 2017 · 45 comments
Closed

msgpack unit test failures on ppc64 arch #524

dkopecek opened this issue Mar 19, 2017 · 45 comments
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON kind: bug solution: proposed fix a fix for the issue has been proposed and waits for confirmation state: help needed the issue needs help to proceed

Comments

@dkopecek
Copy link
Contributor

Hi, I was trying to build[1] the newest release on multiple architectures (Fedora build architectures). There seems to be some msgpack related unit test failure on ppc64. See the attached build log:

json-ppc64-build.log.txt

[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=18476755

@nlohmann
Copy link
Owner

Thanks for the report. Could this be an endianess issue?

@dkopecek
Copy link
Contributor Author

Yeah, that would be my guess too. Consider:

src/unit-msgpack.cpp:210: FAILED:
  CHECK( json::from_msgpack(result) == j )
with expansion:
  1 == 256

which looks like 0x0001 == 0x0100

@dkopecek
Copy link
Contributor Author

ppc64 is big endian

@nlohmann nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Mar 28, 2017
@user1095108
Copy link

if you decide to do the swaps don't forget about bit_cast<>().

@nlohmann
Copy link
Owner

I currently do not have the means to test this. Is there a way to have a big endian system with Vagrant?

@nlohmann nlohmann added the state: help needed the issue needs help to proceed label Mar 31, 2017
@user1095108
Copy link

user1095108 commented Apr 1, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 3, 2017

In the end, I need a test setup to at least verify a fix.

I could get qemu to work, but I failed to get a recent Linux with a recent compiler to work, let alone properly integrate it into my network. Hints greatly appreciated!

@user1095108
Copy link

user1095108 commented Apr 3, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

Hey @dkopecek, could you please try https://github.com/nlohmann/json/blob/feature/manual_lexer/src/json.hpp ? I could not test it with a big-endian system myself.

It is a reimplementation of the binary readers/writers which should respect the byte order.

The main functions are:

// from http://stackoverflow.com/a/1001328/266378
static bool little_endianess()
{
    int num = 1;
    return (*reinterpret_cast<char*>(&num) == 1);
}
// writes a number of type T to vector v
template<typename T>
void write_number(T n)
{
    std::array<uint8_t, sizeof(T)> vec;
    std::memcpy(vec.data(), &n, sizeof(T));

    for (size_t i = 0; i < sizeof(T); ++i)
    {
        // reverse byte order prior to conversion if necessary
        if (is_little_endian)
        {
            v.push_back(vec[sizeof(T) - i - 1]);
        }
        else
        {
            v.push_back(vec[i]);
        }
    }
}
// reads a number of type T from input
template<typename T>
T get_number()
{
    std::array<uint8_t, sizeof(T)> vec;
    for (size_t i = 0; i < sizeof(T); ++i)
    {
        get();
        check_eof();

        // reverse byte order prior to conversion if necessary
        if (is_little_endian)
        {
            vec[sizeof(T) - i - 1] = static_cast<uint8_t>(current);
        }
        else
        {
            vec[i] = static_cast<uint8_t>(current);
        }
    }

    T result;
    std::memcpy(&result, vec.data(), sizeof(T));
    return result;
}

@nlohmann nlohmann added the solution: proposed fix a fix for the issue has been proposed and waits for confirmation label Apr 5, 2017
@user1095108
Copy link

user1095108 commented Apr 5, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

I could not find such a constant in the standard, and I would like to avoid dozens of #ifdefs to cover all compilers. I'm curious whether this fix works in the first place.

@user1095108
Copy link

user1095108 commented Apr 5, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

Is that macro portable?

@user1095108
Copy link

user1095108 commented Apr 5, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

This was exactly what I wanted to avoid.

@user1095108
Copy link

user1095108 commented Apr 5, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

I wanted to avoid "dozens of #ifsefs".

@dkopecek
Copy link
Contributor Author

dkopecek commented Apr 5, 2017

@nlohmann Cannot confirm that the fix works, because there are some compilation errors. See the attached
build.log.txt.

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

Sorry, I forgot I also changed the test suite. You need to check out the whole branch.

@dkopecek
Copy link
Contributor Author

dkopecek commented Apr 5, 2017

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

Thanks for the feedback. I really need a test system with ppc64.

@dkopecek
Copy link
Contributor Author

dkopecek commented Apr 5, 2017

@nlohmann Check out virt-builder:
https://developer.fedoraproject.org/tools/virt-builder/about.html
http://manpages.ubuntu.com/manpages/yakkety/man1/virt-builder.1.html

There are templates for Fedora ppc64, ppc64le, armv7, ... etc. (use virt-builder -l to list available templates).

@nlohmann
Copy link
Owner

nlohmann commented Apr 5, 2017

This has nothing to do with the actual error, but the build script is odd:

+ doxygen doc/Doxyfile
warning: tag INPUT: input source `../src/json.hpp' does not exist
warning: tag INPUT: input source `index.md' does not exist
warning: source examples is not a readable file or directory... skipping.
warning: source images is not a readable file or directory... skipping.
warning: source ../src/json.hpp is not a readable file or directory... skipping.
warning: source index.md is not a readable file or directory... skipping.
sh: dot: command not found
error: Problems running dot: exit code=127, command='dot', arguments='"/builddir/build/BUILD/json-feature-manual_lexer/html/graph_legend.dot" -Tsvg -o "/builddir/build/BUILD/json-feature-manual_lexer/html/graph_legend.svg"'
error: Problem extracting size from SVG file /builddir/build/BUILD/json-feature-manual_lexer/html/graph_legend.svg
error: Style sheet 'css/mylayout.css' specified by HTML_EXTRA_STYLESHEET does not exist!

Instead of calling doxygen directly, one should call make -Cdoc instead.

@nlohmann nlohmann removed the solution: proposed fix a fix for the issue has been proposed and waits for confirmation label Apr 8, 2017
@nlohmann
Copy link
Owner

nlohmann commented Apr 8, 2017

I managed to set up a Debian 3.16.39 with powerpc architecture using qemu. With GCC 4.9.2 I could compile all tests. CBOR/MessagePack are still failing, but at least I can now debug myself ;)

@user1095108
Copy link

user1095108 commented Apr 8, 2017 via email

@nlohmann
Copy link
Owner

nlohmann commented Apr 8, 2017

It seems so:
be

@nlohmann
Copy link
Owner

nlohmann commented Apr 8, 2017

With the manual_lexer branch, there is only one test failing for CBOR and MessagePack, respectively:

msgpack64

It seems that the test itself assumes little endianess. I shall add a fix.

@nlohmann
Copy link
Owner

nlohmann commented Apr 8, 2017

With the last commit (6f99d5b) all tests succeed.

@nlohmann nlohmann added the solution: proposed fix a fix for the issue has been proposed and waits for confirmation label Apr 8, 2017
@nlohmann
Copy link
Owner

@dkopecek Could you please check whether the fix works for you?

@user1095108
Copy link

user1095108 commented Apr 10, 2017 via email

@dkopecek
Copy link
Contributor Author

@nlohmann Oops, sorry, I was planning to but totally forgot :-) I'll check it out tomorrow (CEST timezone) and let you know.

@nlohmann
Copy link
Owner

What's wrong with MIPS?

@user1095108
Copy link

user1095108 commented Apr 10, 2017 via email

@nlohmann
Copy link
Owner

Yes, and I proposed a fix for big endianess. I assume it also works for MIPS, but I already had troubles getting PPC to work.

@user1095108
Copy link

user1095108 commented Apr 10, 2017 via email

@nlohmann
Copy link
Owner

Sorry, i don't, and would appreciate help on this issue.

@dkopecek
Copy link
Contributor Author

@nlohmann results:

x86_64: ok
ppc64le: ok
aarch64: ok
ppc64: FAIL
i686: ok
armv7hl: ok

but this time it's a different fail. see the attached ppc64-build.log.txt

@nlohmann
Copy link
Owner

Thanks for reporting. Could you rerun the tests with

make json_unit
test/json_unit "*"

This runs all tests and does not abort after the first failure.

@dkopecek
Copy link
Contributor Author

Interesting. With that testing procedure, the tests don't fail on ppc64 and do fail on armv7hl. So there seems to be something non-deterministic in the tests.

armv7hl-build.log.2.txt
ppc64-build.log.2.txt

@nlohmann
Copy link
Owner

That is indeed strange. The tests should be (of course) deterministic, but since the ARM error is a SIGSEGV, maybe just a different call could have triggered it. Without debugging, I cannot tell much about the issue...

Could I ask you to try the following?

make test-conversions -Ctest
test/test-conversions -s

This should give us more information on when the test crashes.

@nlohmann
Copy link
Owner

(On the bright side, it seems as if the fix for the endianess issue seems to work)

@dkopecek
Copy link
Contributor Author

Could I ask you to try the following?

make test-conversions -Ctest
test/test-conversions -s

Doesn't fail at all. Tried several times.

@nlohmann
Copy link
Owner

Strange... Could you try to execute the tests with Valgrind?

@dkopecek
Copy link
Contributor Author

Here are the logs. x86_64 is ok but I've attached it anyway.

armv7hl-build.3.log.txt
ppc64-build.3.log.txt
x86_64-build.3.log.txt

@nlohmann
Copy link
Owner

I think I need to close this issue. I cannot reproduce the problem, and I can do little with the provided logs.

@nlohmann nlohmann closed this as completed Jul 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aspect: binary formats BSON, CBOR, MessagePack, UBJSON kind: bug solution: proposed fix a fix for the issue has been proposed and waits for confirmation state: help needed the issue needs help to proceed
Projects
None yet
Development

No branches or pull requests

3 participants