toolchain/gem binary quirks #72

Closed
RyanDraves opened this issue Jan 30, 2024 · 10 comments

@RyanDraves
Contributor

RyanDraves commented Jan 30, 2024

Follow-up for #41.

I've been using 0.6.0 for a bit, and the exposed gem binaries seem to follow weird patterns. Here are a few behaviors I noticed:

  • The gem binary targets aren't immediately available. For example, if I run or depend on @ruby//:jekyll after cleaning my Bazel cache, I receive an error like:
ERROR: /home/dravesr/.cache/bazel/_bazel_dravesr/8e8f77e8714e317f7e314b9271c50fc0/external/rules_ruby~override~ruby~ruby/BUILD: no such target '@@rules_ruby~override~ruby~ruby//:jekyll': target 'jekyll' not declared in package '' defined by /home/dravesr/.cache/bazel/_bazel_dravesr/8e8f77e8714e317f7e314b9271c50fc0/external/rules_ruby~override~ruby~ruby/BUILD (Tip: use `query "@ruby//:*"` to see all the targets in that package)
ERROR: /home/dravesr/src/BUILD:10:12: no such target '@@rules_ruby~override~ruby~ruby//:jekyll': target 'jekyll' not declared in package '' defined by /home/dravesr/.cache/bazel/_bazel_dravesr/8e8f77e8714e317f7e314b9271c50fc0/external/rules_ruby~override~ruby~ruby/BUILD (Tip: use `query "@ruby//:*"` to see all the targets in that package) and referenced by '//:site'
ERROR: Analysis of target '//:site' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.541s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
ERROR: Build failed. Not running target

If I instead first run bazel run @ruby//:bundle -- install, the target becomes available. This looks like a hermeticity problem with the setup.

  • More generally, all of the exposed gems run from somewhere in the Bazel cache/runfiles directories. This lets one command affect another and makes it harder to export files; e.g., if I want to run jekyll new with bazel run @ruby//:jekyll, I need to fish the output files out of bazel-out/.
  • The exposed binaries are hard to depend on because of how runfiles are managed. Maybe I'm not supposed to depend on them, but I set up a neat Bazel rule that wraps jekyll build and jekyll serve around @ruby//:jekyll, and to get the binary to run properly under bazel run, I needed to wrap it in a script:
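    # Under `bazel run`, the working directory is <name>.runfiles/_main, so ../ is the runfiles root
    executable = "export RUNFILES_DIR=$(readlink -f ../)\n"
    # Call the wrapped jekyll binary with the rule's args, then forward any extra command-line args
    executable += ctx.attr.jekyll.files_to_run.executable.short_path + " " + " ".join(args) + " $@\n"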
  • bazel run @ruby//:bundle -- install doesn't respect bundle_fetch settings and defaults to //:Gemfile. It also doesn't play nicely with path arguments, e.g. if I run bazel run @ruby//:bundle -- install --gemfile dir/Gemfile, I'll get:
[!] There was an error parsing `Gemfile`: No such file or directory @ rb_sysopen - /home/dravesr/.cache/bazel/_bazel_dravesr/8e8f77e8714e317f7e314b9271c50fc0/execroot/_main/bazel-out/k8-fastbuild/bin/external/rules_ruby~override~ruby~ruby/bundle.runfiles/_main/dir/Gemfile. Bundler cannot continue.

I snooped through the Bazel cache a bit and found that bundle.runfiles/_main is empty and Bundler defaults to reaching out to ../../../../../../../Gemfile (seven parents up, back in execroot/_main, which is symlinked to the original workspace). Similarly, Gemfile.lock is ignored.
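As a sanity check on the seven-parents observation and on the --gemfile error above, the path arithmetic can be reproduced lexically. This is a minimal Python sketch, not part of rules_ruby or Bundler: it uses `posixpath.normpath`, which collapses `..` segments without following symlinks, so it models only the string arithmetic, not how Bundler actually locates the Gemfile.

```python
import posixpath

# Output base and runfiles path taken from the error message above.
OUTPUT_BASE = "/home/dravesr/.cache/bazel/_bazel_dravesr/8e8f77e8714e317f7e314b9271c50fc0"
RUNFILES_MAIN = posixpath.join(
    OUTPUT_BASE,
    "execroot/_main/bazel-out/k8-fastbuild/bin",
    "external/rules_ruby~override~ruby~ruby/bundle.runfiles/_main",
)


def resolve_lexically(start: str, relative: str) -> str:
    """Join `relative` onto `start` and collapse `..` segments (no symlink resolution)."""
    return posixpath.normpath(posixpath.join(start, relative))


# Seven `..` hops leave _main, bundle.runfiles, rules_ruby~override~ruby~ruby,
# external, bin, k8-fastbuild and bazel-out, landing back in execroot/_main.
default_gemfile = resolve_lexically(RUNFILES_MAIN, "../" * 7 + "Gemfile")

# A relative --gemfile is resolved against the (empty) runfiles directory instead.
relative_gemfile = resolve_lexically(RUNFILES_MAIN, "dir/Gemfile")

print(default_gemfile)   # .../execroot/_main/Gemfile
print(relative_gemfile)  # .../bundle.runfiles/_main/dir/Gemfile
```

The second result matches the path in the "No such file or directory" error, which is why a relative --gemfile argument cannot work from inside the runfiles tree.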

For some extra details, here's a sample from my MODULE.bazel:

bazel_dep(name = "rules_ruby", version = "0.6.0")
# TODO: Wait for 0.6.0 to be published and remove Git override
git_override(
    module_name = "rules_ruby",
    remote = "https://github.com/bazel-contrib/rules_ruby",
    commit = "81bf18ecf7de001a6aa5b46e420f3a9b98866ad5",
)

ruby = use_extension("@rules_ruby//ruby:extensions.bzl", "ruby")
ruby.toolchain(
    name = "ruby",
    version = "3.0.6",
)
use_repo(ruby, "ruby")
ruby.bundle_fetch(
    name = "bundle",
    gemfile_lock = "//:Gemfile.lock",
    gemfile = "//:Gemfile",
)
use_repo(ruby, "bundle", "ruby_toolchains")
register_toolchains("@ruby_toolchains//:all")
@p0deje
Member

p0deje commented Jan 30, 2024

The way you use the ruleset is very specific and not supported at the moment. Let's start with:

bazel run @ruby//:bundle -- install

This, of course, ignores all the configuration done inside the bundle_fetch rule, which sets specific locations for BUNDLE_PATH and BUNDLE_GEMFILE. Without them, Bundler installs gems into the Ruby installation itself; this is why @ruby//:jekyll appears among your binaries after running it. However, that is not how the ruleset is meant to be used. You shouldn't depend on bazel run to produce anything. The expected workflow is to make sure Jekyll is in your Gemfile and then run:

bazel build @bundle
bazel run @bundle//bin:jekyll -- new

The latter binary does not have direct access to your working directory, so you may need to pass an absolute path. For example, this is how we do it for Rails:

bazel run @ruby//:rails -- new $(pwd)/new_app
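If a wrapper should accept paths relative to where the user typed the command, instead of requiring `$(pwd)`, one option is to resolve them against `BUILD_WORKING_DIRECTORY`, which `bazel run` sets to the directory the command was invoked from. A minimal Python sketch; the helper name `resolve_user_path` is hypothetical and nothing like it ships with rules_ruby:

```python
import os
from typing import Mapping, Optional


def resolve_user_path(path: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: interpret `path` relative to where `bazel run` was invoked.

    Under `bazel run` the binary's own working directory is inside its runfiles
    tree, so a bare relative path like `new_app` would otherwise land there.
    """
    env = os.environ if environ is None else environ
    if os.path.isabs(path):
        return path
    # Set by `bazel run`; fall back to the process cwd when launched some other way.
    base = env.get("BUILD_WORKING_DIRECTORY") or os.getcwd()
    return os.path.normpath(os.path.join(base, path))


if __name__ == "__main__":
    import sys

    for arg in sys.argv[1:]:
        print(resolve_user_path(arg))
```

Absolute paths pass through unchanged, so `$(pwd)/new_app` keeps working as before.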

Another option would be to define a BUILD rule that puts all the files into the inputs/runfiles and then delegates to the binary. I haven't tested this, but it should look something like:

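load("@rules_ruby//ruby:defs.bzl", "rb_binary")

# Untested: ship the site sources as inputs and delegate to the Jekyll binary from @bundle
rb_binary(
  name = "jekyll",
  srcs = ["my_jekyll_files"],  # placeholder for the site's source files
  deps = ["@bundle"],
  main = "@bundle//bin:jekyll",  # `main` takes a single label, not a list
)
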
bazel run :jekyll -- build
bazel run :jekyll -- serve

@RyanDraves
Contributor Author

Thanks for the quick reply! I see I'm using these targets wrong. I was hoping to get a fully cached Bazel build of my Jekyll site, but I haven't been able to get there using the @bundle//bin:jekyll target. I didn't have any success with rb_binary. After a quick whitespace fix in #73, I switched my current implementation to use @bundle, but I can't seem to get past multiple copies of the gems being present.

I made a pretty simple Bazel rule that attempts to build the site with the @bundle//bin:jekyll binary:
https://github.com/RyanDraves/nlb/blob/7a5d9c7af6f475f8ceab4a89861e2454582cb010/rules_jekyll/jekyll.bzl#L2-L23

The target referencing this rule looks like:

jekyll_site(
    name = "site",
    srcs = [":sources"],
    jekyll = "@bundle//bin:jekyll",
    config = ":_config.yml",
)

However, I get a lot of errors and warnings like the following:

/tmp/bazel-execroot/_main/bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/rules_ruby~override~ruby~bundle/vendor/bundle/ruby/3.0.0/gems/net-http-0.4.1/lib/net/http.rb:725: warning: already initialized constant Net::HTTP::VERSION
/tmp/bazel-working-directory/_main/bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/rules_ruby~override~ruby~bundle/bin/jekyll.runfiles/rules_ruby~override~ruby~bundle/vendor/bundle/ruby/3.0.0/gems/net-http-0.4.1/lib/net/http.rb:725: warning: previous definition of VERSION was here

This ultimately leads to an error that fails the build:

  Conversion error: Jekyll::Converters::Markdown encountered an error while converting '_posts/2024-01-30-bazel-jekyll-site.md':
                    A parser with the name atx_header_gfm already exists!

Inspecting the build environment in bazel-out/k8-opt-exec-ST-13d3ddad9198/bin/external/rules_ruby~override~ruby~bundle, I can see a vendor/bundle/ folder with my installed gems in ruby/3.0.0/gems/; however, there is another copy of them in bin/jekyll.runfiles/rules_ruby~override~ruby~bundle/vendor/bundle/ and yet another in bin/jekyll.runfiles/_main/external/rules_ruby~override~ruby~bundle/vendor/bundle/. I spent a while playing around with the runfiles tree that @bundle//bin:jekyll sets up, but I couldn't prune it without Jekyll failing to find its dependent gems.
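To list which gems are duplicated across those trees, a small diagnostic can group every directory sitting directly under a `gems/` folder by its `<name>-<version>` basename. This is a hypothetical Python helper, not part of rules_ruby; it assumes the `.../ruby/3.0.0/gems/<gem>-<version>` layout visible in the paths above, and it follows symlinks because runfiles trees are built from them (a symlink cycle would make the walk loop).

```python
import os
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def list_gem_dirs(root: str) -> Iterator[str]:
    """Yield every directory that sits directly inside a `gems/` directory under `root`."""
    for dirpath, dirnames, _ in os.walk(root, followlinks=True):
        if os.path.basename(dirpath) == "gems":
            for name in dirnames:
                yield os.path.join(dirpath, name)
            dirnames[:] = []  # prune: no need to walk into the gems themselves


def duplicate_gems(gem_dirs: Iterable[str]) -> Dict[str, List[str]]:
    """Group gem directories by `<name>-<version>` and keep only those found more than once."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in gem_dirs:
        groups[os.path.basename(path.rstrip("/"))].append(path)
    return {gem: sorted(paths) for gem, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Pass the directory to inspect, e.g. the rules_ruby~override~ruby~bundle output dir.
    for gem, paths in sorted(duplicate_gems(list_gem_dirs(sys.argv[1])).items()):
        print(gem)
        for path in paths:
            print("  " + path)
```

For the paths quoted above, net-http-0.4.1 would be reported once per copy, which lines up with the "already initialized constant Net::HTTP::VERSION" warnings.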

Am I on the right track for running a gem with some args and passed in source files in a build?

Reference tree -L 3 output: [screenshot not preserved]

@p0deje
Member

p0deje commented Feb 1, 2024

I've tried playing around with Jekyll and I managed to build an example using the following setup:

  1. Set up modules:
# MODULE.bazel
bazel_dep(name = "aspect_bazel_lib", version = "2.3.0")
bazel_dep(name = "rules_ruby", version = "0.6.0")

ruby = use_extension("@rules_ruby//ruby:extensions.bzl", "ruby")
ruby.toolchain(
    name = "ruby",
    version = "3.2.2",
)
ruby.bundle_fetch(
    name = "bundle",
    gemfile = "//:Gemfile",
    gemfile_lock = "//:Gemfile.lock",
)
use_repo(ruby, "ruby", "bundle", "ruby_toolchains")
register_toolchains("@ruby_toolchains//:all")
  2. Set up BUILD targets:
# BUILD
load("@aspect_bazel_lib//lib:run_binary.bzl", "run_binary")

run_binary(
    name = "site",
    srcs = glob([
        "_posts/*",
        "*.html",
        "*.markdown",
    ]) + ["_config.yml"],
    args = [
        "build",
        "-d",
        "$(GENDIR)/_site",
    ],
    execution_requirements = {"no-sandbox": "1"},
    out_dirs = [
        "_site",
    ],
    tool = "@bundle//bin:jekyll",
)
  3. Make sure Jekyll doesn't attempt to process the toolchain:
# _config.yml
exclude:
  - bazel-out/
  - external/
  4. Build the Jekyll website:
$ bazel build :site
...
Target //:site up-to-date:
  bazel-bin/_site

$ tree -L 3 bazel-bin/_site
bazel-bin/_site
├── 404.html
├── BUILD -> /Users/p0deje/my-awesome-site/BUILD
├── MODULE.bazel -> /Users/p0deje/my-awesome-site/MODULE.bazel
├── MODULE.bazel.lock -> /Users/p0deje/my-awesome-site/MODULE.bazel.lock
├── about
│   └── index.html
├── assets
│   ├── main.css
│   ├── main.css.map
│   └── minima-social-icons.svg
├── feed.xml
├── index.html
└── jekyll
    └── update
        └── 2024

5 directories, 10 files

I haven't tried to serve yet, but I believe you can attempt to do it in a similar way. Does this look reasonable?

@p0deje
Member

p0deje commented Feb 1, 2024

Also, to make the build cache more efficient, you can define targets per post like this:

[
    run_binary(
        name = post[7:-9],
        srcs = [
            "404.html",
            "_config.yml",
            "about.markdown",
            "index.markdown",
            post,
        ],
        outs = ["_site/jekyll/update/{year}/{month}/{day}/{title}.html".format(
            day = post[7:-9].split("-", 3)[2],
            month = post[7:-9].split("-", 3)[1],
            title = post[7:-9].split("-", 3)[3],
            year = post[7:-9].split("-", 3)[0],
        )],
        args = [
            "build",
            "-d",
            "$(GENDIR)/_site",
        ],
        tool = "@bundle//bin:jekyll",
    )
    for post in glob(["_posts/*"])
]
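Because Starlark's string slicing and `split` behave like Python's here, the `name` and `outs` derivation in the comprehension above can be checked in plain Python. A sketch of that computation (the function name `post_target` is illustrative), with an added guard, since `post[7:-9]` silently assumes every file is `_posts/<date>-<title>.markdown`:

```python
from typing import Dict


def post_target(post: str) -> Dict[str, str]:
    """Mirror the per-post `name` and `outs` computation from the BUILD snippet."""
    prefix, suffix = "_posts/", ".markdown"  # 7 and 9 characters: hence post[7:-9]
    if not (post.startswith(prefix) and post.endswith(suffix)):
        raise ValueError("expected _posts/<date>-<title>.markdown, got " + post)
    stem = post[len(prefix):-len(suffix)]
    # "2024-01-01-welcome-to-jekyll2" -> ["2024", "01", "01", "welcome-to-jekyll2"]
    year, month, day, title = stem.split("-", 3)
    return {
        "name": stem,
        "out": "_site/jekyll/update/{year}/{month}/{day}/{title}.html".format(
            year=year, month=month, day=day, title=title,
        ),
    }


print(post_target("_posts/2024-01-01-welcome-to-jekyll2.markdown")["out"])
# _site/jekyll/update/2024/01/01/welcome-to-jekyll2.html
```

A post using the `.md` extension (like `_posts/2024-01-30-bazel-jekyll-site.md` earlier in the thread) would be mis-sliced by the bare `post[7:-9]`; the guard turns that into an explicit error. The `jekyll/update` segment is hard-coded, so posts whose permalinks land elsewhere would need a different `outs` path.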
$ tree _posts
_posts
├── 2024-01-01-welcome-to-jekyll2.markdown
└── 2024-02-01-welcome-to-jekyll.markdown

$ bazel query "//:*"
//:2024-01-01-welcome-to-jekyll2
//:2024-02-01-welcome-to-jekyll
...

$ bazel build ...
$ tree bazel-bin/_site
bazel-bin/_site
├── 404.html
├── BUILD -> /Users/p0deje/my-awesome-site/BUILD
├── MODULE.bazel -> /Users/p0deje/my-awesome-site/MODULE.bazel
├── MODULE.bazel.lock -> /Users/p0deje/my-awesome-site/MODULE.bazel.lock
├── about
│   └── index.html
├── assets
│   ├── main.css
│   ├── main.css.map
│   └── minima-social-icons.svg
├── feed.xml
├── index.html
└── jekyll
    └── update
        └── 2024
            ├── 01
            │   └── 01
            │       └── welcome-to-jekyll2.html
            └── 02
                └── 01
                    └── welcome-to-jekyll.html

Update: this is not enough, since index.html comes from whichever target was built last, but I think there should be a way to override it with a separate generic target.

@RyanDraves
Contributor Author

RyanDraves commented Feb 2, 2024

Wow, incredible! run_binary is exactly the kind of thing I was trying to set up; it really helped simplify things, thank you. execution_requirements = {"no-sandbox": "1"} was also very helpful: I got my site building with your example, but the errors reappear if I comment it out.

I haven't investigated finer-grained caching yet, but I did get jekyll serve working after translating the rest of my rule back into skylib helpers.

Here's my full example:

load("@aspect_bazel_lib//lib:run_binary.bzl", "run_binary")
load("@bazel_skylib//rules:write_file.bzl", "write_file")

filegroup(
    name = "sources",
    srcs = glob([
        "_posts/**/*",
        "_layouts/**/*",
    ]) + [
        "404.html",
        "about.markdown",
        "index.markdown",
    ],
)

run_binary(
    name = "site_build",
    srcs = [
        ":_config.yml",
        ":sources",
    ],
    args = [
        "build",
        "--source",
        package_name(),  # `package_name` shenanigans seem to handle the site not being at the repo root
        "--destination",
        "$(GENDIR)/{0}/_site".format(package_name()),
        "--config",
        "$(location :_config.yml)",
    ],
    env = {
        "LC_ALL": "C.UTF-8",
        "LANG": "en_US.UTF-8",
        "LANGUAGE": "en_US.UTF-8",
    },
    execution_requirements = {"no-sandbox": "1"},
    out_dirs = [
        "_site",
    ],
    tool = "@bundle//bin:jekyll",
)

write_file(
    name = "site_serve_file",
    out = "site_serve_file.sh",
    content = [
        "#!/bin/bash",
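        # rules_ruby needs RUNFILES_DIR to be set; under `bazel run` the working
        # directory is site_serve.runfiles/_main, so its parent is the runfiles root
        "export RUNFILES_DIR=$(readlink -f ../)",
        "EXEC_ROOT=$(pwd)",
        # $1 is the jekyll binary's path; forward the remaining args with quoting intact
        'exec "$EXEC_ROOT/$1" "${@:2}"',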
    ],
)

sh_binary(
    name = "site_serve",
    srcs = [
        ":site_serve_file",
    ],
    args = [
        "$(location @bundle//bin:jekyll)",
        "serve",
        "--destination",
        "{0}/_site".format(package_name()),
        "--skip-initial-build",
        "--config",
        "$(location :_config.yml)",
    ],
    data = [
        ":_config.yml",
        ":site_build",
        "@bundle//bin:jekyll",
    ],
)

I still find the RUNFILES_DIR behavior strange, but it looks like my previous solution for that translated fine.
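For reference, the RUNFILES_DIR wrapper logic can be spelled out explicitly. Under `bazel run`, the working directory is the binary's `<name>.runfiles/_main` directory, so `readlink -f ../` yields the runfiles root. A minimal Python sketch of the same computation; `build_invocation` is a hypothetical name, and the premise that rules_ruby's launcher needs RUNFILES_DIR comes from the comment in the BUILD file above, not from the ruleset's documentation:

```python
import os
import sys
from typing import Dict, List, Mapping, Optional, Tuple


def build_invocation(
    cwd: str, argv: List[str], environ: Optional[Mapping[str, str]] = None
) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and command line the shell wrapper would use."""
    env = dict(os.environ if environ is None else environ)
    # `readlink -f ../`: parent of the working directory, with symlinks resolved.
    env["RUNFILES_DIR"] = os.path.realpath(os.path.join(cwd, os.pardir))
    # `$EXEC_ROOT/$1 ${@:2}`: first arg is a path relative to cwd, the rest pass through.
    command = [os.path.join(cwd, argv[0]), *argv[1:]]
    return env, command


if __name__ == "__main__":
    env, command = build_invocation(os.getcwd(), sys.argv[1:])
    os.execve(command[0], command, env)  # replace this process, like `exec` in bash
```

Keeping the computation in a pure function makes it easy to check what RUNFILES_DIR would be for a given runfiles layout without actually launching Jekyll.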

I'm also seeing files that aren't in my sources end up in the output site (like BUILD); I'm guessing that's a consequence of no-sandbox. Clearing my Bazel cache and adding them to Jekyll's exclude config fixes this, so it's not a big deal.

@p0deje
Member

p0deje commented Feb 2, 2024

Great, any chance you could strip down your blog so I could put it in examples/jekyll and run it on CI? This would let me ensure I don't accidentally break your setup.

@RyanDraves
Contributor Author

Yep, opened #74

@p0deje
Copy link
Member

p0deje commented Feb 11, 2024

Is there anything else to do regarding this issue, or can we close it, @RyanDraves?

@p0deje
Member

p0deje commented Feb 23, 2024

I'll consider this done for now.

p0deje closed this as completed Feb 23, 2024
@RyanDraves
Contributor Author

All set, thank you for the help!
