Binary schema data is not reproducible. #226

Closed
pmeiyu opened this issue Oct 31, 2018 · 9 comments

Comments

@pmeiyu

pmeiyu commented Oct 31, 2018

Hello,

I am building rime-data on Guix, a functional package manager that requires the binary output of every package to be reproducible. However, the binary schema data files produced by rime_deployer do not seem to be reproducible, so the binary output of rime-data is not reproducible either.

Developers of librime, can you confirm that the binary schema files produced by rime_deployer are not reproducible? Is this by design?

@nameoverflow
Member

What exactly does deterministic mean?

@pmeiyu
Author

pmeiyu commented Oct 31, 2018

@nameoverflow Deterministic means Reproducible Builds.

A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.

@lotem
Member

lotem commented Nov 3, 2018

@nameoverflow
I think they meant that the data files are not the same across rebuilds, not the rime_deployer binary.

@pmeiyu
Author

pmeiyu commented Nov 3, 2018

> @nameoverflow
> I think they meant that the data files are not the same across rebuilds, not the rime_deployer binary.

Yes. The binary schema data files produced by rime_deployer are not reproducible. Sorry I didn't make myself clear.

@pmeiyu changed the title from "The output of rime_deployer is not deterministic." to "Binary schema data is not reproducible." on Nov 3, 2018
@lotem
Member

lotem commented Nov 3, 2018

Given the same set of input files, the build outputs are identical on the same machine.
Here's how I tested it:

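# Build librime and its third-party dependencies.
make -f Makefile.xcode thirdparty
make -f Makefile.xcode
cd xbuild/bin
# Build the data files twice, keeping each result.
LD_LIBRARY_PATH=../lib/Release Release/rime_deployer --build
mv build build.1
LD_LIBRARY_PATH=../lib/Release Release/rime_deployer --build
mv build build.2
# Compare each file in build.1 with its counterpart in build.2 (bash syntax).
# Braces, not parentheses: break inside a ( ) subshell would not leave the loop.
for x in build.1/*; do diff "$x" "${x/build.1/build.2}" || { echo "diff found"; break; }; done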
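
A variant of the check above, as a rough sketch: instead of diffing file pairs, write a checksum manifest for each tree and compare the manifests. This also catches files that exist in only one tree, and the manifest text can be shared when the two builds live on different machines. The function names `manifest` and `same_build` are made up for illustration and are not part of librime. `cksum` is a CRC, which is enough to spot accidental differences but is not a cryptographic hash.

```shell
# manifest DIR: print "checksum size path" for every regular file under DIR,
# sorted by path so that the listing itself is stable.
manifest() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k 3)
}

# same_build DIR_A DIR_B: succeed only if both trees contain the same
# relative paths with the same content.
same_build() {
    [ "$(manifest "$1")" = "$(manifest "$2")" ]
}
```

With the directories from the test above, this would be used as `same_build build.1 build.2 || echo "diff found"`.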

If the issue lies in reproducibly fetching the set of data files with rime/plum, then librime is not the responsible component.

We've tried to make binary data files work for different CPU architectures. It turns out that data files built on different machines are compatible, but not strictly identical.
#121 (comment)

@pmeiyu
Author

pmeiyu commented Nov 3, 2018

> We've tried to make binary data files work for different CPU architectures.

That's not necessary.

Could it be caused by the YAML schema files' filesystem timestamps? Do the binary schema files contain timestamp information?

@pmeiyu
Author

pmeiyu commented Nov 3, 2018

I tested this on my computer: I touched all the YAML files and rebuilt, and the result was different. So this is indeed caused by the YAML files' timestamps.
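
The experiment relies on `touch` changing only a file's modification time and never its content. A minimal sketch of that property, using a throwaway stand-in file rather than a real Rime schema (the file names and the date are arbitrary assumptions):

```shell
work=$(mktemp -d)
yaml="$work/demo.schema.yaml"          # stand-in file, not a real Rime schema
printf 'schema:\n  schema_id: demo\n' > "$yaml"

touch -t 201810310000 "$yaml"          # give the file an old mtime
cp -p "$yaml" "$work/before.yaml"      # -p preserves that old mtime
sum_before=$(cksum < "$yaml")

touch "$yaml"                          # same operation as in the test above
sum_after=$(cksum < "$yaml")

# The content checksum is the same, yet the file now looks newer.
[ "$sum_before" = "$sum_after" ] && echo "content unchanged"
[ "$yaml" -nt "$work/before.yaml" ] && echo "mtime changed"
```

Any output that depends on input mtimes will therefore change after a touch even though no input byte has changed.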

@lotem lotem added the wontfix label Jan 14, 2019
@lotem
Member

lotem commented Jan 14, 2019

I'm aware of the issue with timestamps, but I have decided to leave it as is a little longer.
Timestamps provide a quick way to detect user modifications to any YAML config file. This matters for performance, because the check runs on every startup.

However, the situation may change in the future. I plan to make the data deployment process fully separate from the IME and run only on user demand.
Then, in offline deployment, we can afford a less time-critical checksum comparison against the input files.
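
To illustrate the idea of comparing checksums instead of timestamps, here is a rough sketch in shell. It is not how librime does or will do it; the function names and the stamp file are hypothetical. It reads every input file on each check, which is the extra cost that makes it better suited to offline deployment than to a startup check.

```shell
# inputs_checksum DIR: checksum listing of all *.yaml files under DIR,
# sorted by path. Only file content and paths matter, not mtimes.
inputs_checksum() {
    (cd "$1" && find . -type f -name '*.yaml' -exec cksum {} + | LC_ALL=C sort -k 3)
}

# record_inputs DIR STAMP: remember the current state of the inputs.
record_inputs() {
    inputs_checksum "$1" > "$2"
}

# inputs_changed DIR STAMP: succeed if there is no stamp yet or if the
# content of the inputs differs from what the stamp recorded.
inputs_changed() {
    [ ! -f "$2" ] || [ "$(inputs_checksum "$1")" != "$(cat "$2")" ]
}
```

A deployment step could then rebuild only when `inputs_changed` succeeds and call `record_inputs` afterwards; touching a file without editing it would not trigger a rebuild.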

@lotem lotem self-assigned this Jan 14, 2019
@eagleoflqj
Member

Fixed in #720
