feat: Include dockerized protoc #404

Merged
merged 24 commits into main from feature/protoc-docker
Nov 27, 2021

Conversation

boukeversteegh
Collaborator

@boukeversteegh boukeversteegh commented Nov 22, 2021

@stephenh I managed to dockerize protoc pretty neatly using a generic protoc.Dockerfile, and a docker-compose.yml file that configures it for use with ts-proto.

This will help developers working on ts-proto, and can be used during integration.

Usage

. aliases.sh

protoc --plugin=//ts-proto/protoc-gen-ts_proto --ts_proto_out=$OUT_DIR -I $PROTO_ROOT $PROTO_FILES
# or
ts-protoc --ts_proto_out=$OUT_DIR -I $PROTO_ROOT $PROTO_FILES
# Example
ts-protoc --ts_proto_out=out -I integration/value integration/value/value.proto

protoc ARGS is equivalent to:

docker-compose -f docker-compose.yml run --rm protoc ARGS
  • The path to the docker-compose file is made absolute, so you can use dockerized protoc anywhere on your system (see the sketch below).
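
Roughly, the alias is just a thin wrapper around docker-compose. A minimal sketch of the idea (illustrative only, not the literal contents of aliases.sh):

# aliases.sh (sketch): resolve the compose file relative to this script, so the
# wrapper works from any directory on the host.
TS_PROTO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

protoc() {
  docker-compose -f "$TS_PROTO_ROOT/docker-compose.yml" run --rm protoc "$@"
}

# Convenience wrapper that pre-configures the ts-proto plugin inside the container.
ts-protoc() {
  protoc --plugin=/ts-proto/protoc-gen-ts_proto "$@"
}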

Notes

  • The ts-proto root directory is mounted into the Docker container as /ts-proto
  • The current directory from which protoc is launched is mounted as /host in the container
    • The container's working directory will be /host
    • Because of this, you can use protoc in any directory and refer to host files using relative paths (see the examples below)
    • You cannot refer to files outside of the current directory (e.g. ../foo.proto), as the parent folder is not mounted into the container.
  • When you launch protoc from the ts-proto root folder, the working directory contains the protoc-gen-ts_proto file, so you won't need to specify the --plugin parameter. For some reason, I cannot get the internal protoc to recognize the ts-proto plugin binary from the PATH variable, even though the binary is globally accessible from within the container.
  • For debugging the container, you can run protoc-sh, which launches a shell inside the container.
  • Use protoc-build to rebuild the container when you are improving it.
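To make the /host mount behavior concrete, a few illustrative invocations (paths match the integration folder used in the usage example above):

# From the repo root: relative paths resolve against /host, i.e. this directory.
ts-protoc --ts_proto_out=out -I integration/value integration/value/value.proto
# From a subdirectory: still fine, as long as everything stays below the current directory.
cd integration/value
ts-protoc --ts_proto_out=. -I . value.proto
# Not possible: ../ escapes the mounted directory, so the container cannot see it.
ts-protoc --ts_proto_out=. -I .. ../foo.proto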

Todo

As I'm not familiar with the actual integration and test process, I hope someone would be willing to do that part. In theory it should be as simple as sourcing the aliases.sh file, which overrides protoc with the dockerized version.

@boukeversteegh boukeversteegh changed the title Include dockerized protoc feat: Include dockerized protoc Nov 22, 2021
@sgielen
Contributor

sgielen commented Nov 22, 2021

I ran into the same issue you reported in the other PR (that all .bin files were modified), and on my end I resolved it by running only yarn setup in a Docker container based on node:14-bullseye, with apt-get update && apt-get -y install protobuf-compiler run beforehand. We could, in theory, add a yarn setup:docker script to do this automatically. Then not only protoc but also the codegen itself would run in Docker against a stable environment.
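
Roughly something like this, I mean (the setup:docker name is just a suggestion; only the image and packages are what I actually used):

# Hypothetical setup:docker: run yarn setup inside node:14-bullseye with
# protobuf-compiler installed, so the .bin output is the same on every machine.
docker run --rm -v "$PWD":/ts-proto -w /ts-proto node:14-bullseye \
  bash -c "apt-get update && apt-get -y install protobuf-compiler && yarn setup"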

@boukeversteegh
Collaborator Author

boukeversteegh commented Nov 22, 2021

@sgielen Yeah, we could definitely dockerize update-bins.sh, since that is the culprit. I think codegen.sh is not unstable, but if we dockerized it, then devs could run the tools without protoc.

However, dockerizing does replace the protoc dependency with docker. To be fair, it's more reliable and straightforward to depend on docker, rather than protoc, but it will exclude some developers who cannot install docker for some reason.

My idea would be that by sourcing the aliases.sh file, the user can opt in to the dockerized protoc, whereas the scripts would otherwise use the system's protoc command.

However, there would be no option to switch this behavior when using the run scripts.
Perhaps multiple versions of each run script could be available, like setup, setup:docker, etc.
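
Something like this is what I have in mind for the opt-in behavior (the variable and function names here are placeholders, not part of the PR):

# Scripts could call this wrapper instead of protoc directly: it falls back to the
# system protoc unless the dockerized version has been enabled.
protoc_cmd() {
  if [ -n "$USE_DOCKER_PROTOC" ]; then
    docker-compose -f "$TS_PROTO_ROOT/docker-compose.yml" run --rm protoc "$@"
  else
    command protoc "$@"
  fi
}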

@stephenh
Owner

@boukeversteegh this looks great!

Yeah, good point on the aliasing letting developers opt-in / out of needing to have docker installed. I guess I'm kind of biased, I've been using docker for awhile now so generally take it for granted. I wouldn't mind making that a simplifying function, if only for things like "updating the bin files".

I pulled this down locally to poke around (I like it!) and, per @sgielen's musing, I pushed up a small commit to this PR (hope that's okay!) that adds ~3-4 yarn setup:... scripts. I think they could replace today's overly blunt yarn setup with an approach that is both more granular and runs entirely within the docker protoc service, so it should have cross-machine deterministic output.

And, yeah, personally I'd be fine with these docker-enabled commands being the primary/only-ish way we provide, at least via the package.json scripts. I suppose really only update-bins needs to run in the docker service; after that, things like codegen.sh are stable and can be run directly on the host machine. Which is nice b/c I think that's the more frequent iteration cycle, while iterating on changes and seeing what the latest output is.

So, dunno, at least as a 1st/baby step, I could see removing the current setup script, updating the readme to mention each of the setup:docker/pbjs/bins/codegen commands, and then calling that good/great for now? Wdyt?

@boukeversteegh
Collaborator Author

@stephenh thank you for your positive reaction :)

command names

I like the separate run commands you made. The big setup command takes quite some time to run, so I started running things separately, but I often ended up messing it up by not being careful about what to run, when, and in what order. So in the end it's good to have a single combined command, or it should be clear how the separate commands should be used.

What could help is command names that reflect the process a bit better. For example:

  • proto2bin
  • bin2pbjs
  • bin2ts

I think these names will make it clearer. Although I would personally also be happy with a proto2ts command. Or better, let the tests run the conversion, but that's another rabbit hole.

But in general replacing the big 'setup' script with separate commands running in docker, and updating the readme sounds like a valuable improvement, i.e. a good step forward.
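
Day-to-day usage could then look something like this (command names as proposed above; the actual package.json wiring is hypothetical):

yarn proto2bin   # run protoc (in docker) to regenerate the integration *.bin files
yarn bin2pbjs    # presumably regenerates the pbjs reference output from the .bin files
yarn bin2ts      # run the ts-proto codegen against the .bin files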

making docker primary

Well, it would have benefits, and I'm not against it. Getting update-bins.sh to work on Windows was quite a disaster. In the end the solution was not complicated, but it took a long time to figure out: I had to create a protoc-gen-dump.bat file that contains @bash protoc-gen-dump and use that in the script instead. The way to reference it was also not trivial.

small extra commit

When running the npm commands from WebStorm (my IDE), no $PWD is passed, which gave an error, so I updated the compose file to use . as the default working dir, which works well.
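
For reference, Compose supports the same ${VAR:-default} fallback syntax as the shell, which is presumably why the . default kicks in when the IDE doesn't pass $PWD. A quick shell illustration of the fallback behavior (the variable name is just for the demo):

unset DEMO_DIR
echo "${DEMO_DIR:-.}"    # prints "." because DEMO_DIR is unset
DEMO_DIR=/some/dir
echo "${DEMO_DIR:-.}"    # prints "/some/dir"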

@stephenh
Owner

proto2bin bin2pbjs bin2ts

Sure, I like those names. And yeah, we could probably have both a single yarn setup that did all three of those, as well as the individual commands for running more incrementally.

Getting the update-bins.sh to work on Windows was quite a disaster

Ah shoot, that sucks. Out of curiosity, would the docker-based approach (yarn proto2bin running update-bins.sh in docker) solve this, b/c Windows developers wouldn't have to invoke it directly? If so, that seems like a big win. In general, how is docker on Windows? Is it pretty easy / does the new yarn proto2bin-runs-docker-compose approach just work?

@boukeversteegh
Collaborator Author

boukeversteegh commented Nov 23, 2021

Cool! Yeah let's keep a single command as well.

The new commands that work in docker do away with the Windows compatibility issue immediately.

Docker on Windows runs really well once it's installed, and you can be sure the commands will work without issue.

However, there was an issue with Hyper-V, an operating-system feature for hardware virtualization: VirtualBox used to not work when Hyper-V was enabled. A quick search shows that nowadays VirtualBox is compatible with Hyper-V, but it is slower and may require reconfiguring existing VMs after enabling it. I also remember some colleagues at my past job who could not get Docker to work on their work laptops; this is more likely related to security restrictions put in place by the company's sysadmins. Another issue is that Hyper-V is not available on the Home editions of Windows, although there's now a way to install Docker through the integrated Linux environment (Windows Subsystem for Linux), which then doesn't require Windows Professional.

So in short, in some cases installing Docker is not straightforward, and I think some users will be happy if they don't need Docker to contribute to ts-proto.

I can update the scripts to be compatible with running natively on Windows, probably sometime tomorrow. Cheers!

@boukeversteegh
Collaborator Author

boukeversteegh commented Nov 25, 2021

Hello! I've made some updates. Please have a look if you're okay with these changes.

Changes:

  • Compatibility of codegen with Windows
  • Split up and renamed the yarn tasks
  • Added yarn tasks that use a locally installed protoc
  • Expanded the readme with the development workflow and how/when to use the commands
  • Allow codegen to run in multiple directories and accept multiple directory name arguments (see the example below)
  • Avoid one warning previously issued by update-bins, caused by directly compiling proto files inside nested folders in one integration test. The script now only handles integration/*/*.proto, not integration/**/*.proto
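
For example (assuming codegen.sh keeps its current invocation style; the directory names are from the integration folder):

./codegen.sh                       # regenerate the TypeScript output for all integration tests
./codegen.sh value type-registry   # regenerate only the named integration directories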

I took the liberty of renaming setup to build:test, to better indicate that this command needs to be used more than once (unlike setup, which is generally a one-time procedure), and to clarify that it is related to the tests rather than to using the package itself.

Feedback welcome. If all is good, I think it can be merged. There are no loose ends as far as I'm aware.

PS: With this PR we can consistently generate the .bin files. I currently referenced protoc 3.19.1. All .bin files come out different when I run update-bins, also when running it with the version suggested by @sgielen (3.12.4). Should we update all the bins as well, with the latest protoc, as part of this PR?

@stephenh
Owner

@stephenh stephenh left a comment

@boukeversteegh great, this lgtm! And yeah, I think just include the latest/3.19 updated bins in this PR as well. That way everything should be clean on main after this lands.

@boukeversteegh
Collaborator Author

@boukeversteegh great, this lgtm! And yeah, I think just include the latest/3.19 updated bins in this PR as well. That way everything should be clean on main after this lands.

Thank you for your review and approval! I've updated the bins, one test case, and added some notes in the readme about the dockerized protoc.

Feel free to squash/merge if tests pass.

@boukeversteegh
Collaborator Author

(or should I merge, so that my name shows up in the commit?)

@boukeversteegh
Collaborator Author

boukeversteegh commented Nov 27, 2021

Not sure why, but there's a difference in the .ts files generated by codegen.sh and those generated by yarn pb2ts, which runs codegen.sh from within docker.

As far as I understand this script does not rely on protoc, so there should not be any difference.

I'm trying again running from the host, hopefully that resolves the build error.

This kind of means that we should switch the build scripts to use docker as well, otherwise we are pointing developers to use the dockerized scripts, whose output will - in the end - not be accepted by the pipeline. Any help would be appreciated.

@stephenh
Owner

@boukeversteegh I made a few small tweaks, and then ran yarn bin2ts, and it had a few extra files that maybe just didn't get included? Not sure, but it's green now! Going to hit merge. Thanks! Getting these bin files stabilized is huge!

@stephenh stephenh merged commit 7564a78 into main Nov 27, 2021
@stephenh stephenh deleted the feature/protoc-docker branch November 27, 2021 16:26
stephenh pushed a commit that referenced this pull request Nov 27, 2021
# [1.91.0](v1.90.1...v1.91.0) (2021-11-27)

### Bug Fixes

* use Long.fromValue instead of Long.fromString for improved robustness regarding already parsed objects (#405) (7bdc3ee)

### Features

* Include dockerized protoc (#404) (7564a78)
@stephenh
Owner

🎉 This PR is included in version 1.91.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@stephenh
Owner

@boukeversteegh okay, so there was actually some non-determinism here; it turns out that imported files like timestamp.ts were actually being generated twice, i.e. for the integration/type-registry test that had multiple *.proto files (that each imported timestamp), and hence multiple *.bin files (and each .bin file had its own copy of the Timestamp proto type).

But only non-triple-nested *.bin files were getting updated, so we ended up having mixed versions of the google.protobuf.Timestamp type, and the parallelism inside of codegen.sh was just randomly picking which bin file we'd use first, so which version of timestamp.ts we got was effectively random.
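
Schematically, the race was something like this (the real loop in codegen.sh looks different; this is just to illustrate why the output depended on scheduling):

# Each .bin under the test dir carries its own copy of google.protobuf.Timestamp,
# and the per-file generation runs in parallel, so whichever job writes the shared
# google/protobuf/timestamp.ts last "wins".
for bin in integration/type-registry/*.bin; do
  generate_ts_from "$bin" &   # stand-in for the real per-bin codegen step
done
wait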

I ended up fixing this in #418 but only after several other "smashing the hammer on random things" changes. :-)

@boukeversteegh
Collaborator Author

@boukeversteegh okay, so there was actually some non-determinism here;

I ended up fixing this in #418 but only after several other "smashing the hammer on random things" changes. :-)

Wow, I'm amazed you were able to figure this out. I will be careful in future PRs not to change too many things at once.
