Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TruffleRuby head to CI #1246

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

petergoldstein
Copy link
Contributor

@petergoldstein petergoldstein commented Feb 9, 2022

This is a clean version of #1189 against current master head. Thanks to @eregon for the original submission.

I have two concerns with this PR:

  1. Truffleruby appears to produce a different hash than either MRI ruby or JRuby for the manual
  2. Truffleruby takes ~7 minutes to run specs that take ~10 seconds in MRI. That seems excessive.

Posting this PR for discussion. @eregon @pointlessone @gettalong any thoughts are welcome.

@pointlessone
Copy link
Member

I would rather not add TruffleRuby unless I have commitment from someone to support it. TruffleRuby is an exciting development but I don't know if it has a solid userbase to warrant the effort. I don't know anyone who uses it in production and the CI overhead is substantial.

@petergoldstein
Copy link
Contributor Author

@pointlessone I think that's reasonable. @eregon are you interested in signing up for that? Do you have any idea why the tests are so slow for Truffleruby?

@eregon
Copy link

eregon commented Feb 10, 2022

The TruffleRuby team will monitor the CI status via https://github.com/eregon/truffleruby-gem-tracker and help if it fails (like I mentioned on #1189 (comment)).

I don't know why the tests are slower. It'd be interesting to run RSpec in a mode where it measures each test's time, to see if it's a particular test being slower. Could also produce a flamegraph by passing --cpusampler=flamegraph to TruffleRuby.
@bjfish Maybe you can help and look into this?

@eregon eregon mentioned this pull request Feb 10, 2022
@johnnyshields
Copy link
Contributor

If there are concerns about support, TruffleRuby could still be added to CI as experimental (failures allowed for now)

@pointlessone
Copy link
Member

  1. GitHub Actions actions is not great for optional build steps. We can't have a green build with some steps failing. It makes the whole contribution UX much worse.
  2. Waiting for the slow non-essential step is suboptimal. GH is adamant about waiting for the build completion to switch PR UI.

In short, adding TruffleRuby doesn't benefit Prawn much.

@eregon
Copy link

eregon commented Feb 14, 2022

If the CI time is a concern (it's 6min30s in this run, still sounds fairly fast):

  • We could try to make it faster by investigating and e.g. using --engine.Mode=latency
  • It could be a daily or weekly job instead, so then it doesn't affect CI time for PRs

@petergoldstein
Copy link
Contributor Author

@eregon The big concern is the comparative time. Note that the truffleruby "Run tests" section takes over 5 minutes to run. As compared to ~15s for each of the MRI rubies. That's especially odd given that one of Truffle's main selling points is speed.

The scheduled job is a good suggestion though.

@eregon
Copy link

eregon commented Feb 14, 2022

It's expected more optimizing Ruby JITs run test suites slower (e.g., similar on JRuby), because tests constantly run different code and so rarely run any existing & JIT-compiled code. And the interpreter for a Ruby with a JIT needs to collect profiling data which costs some time, as well as the JIT consuming CPU time (especially on machines with few cores).
There are ways to improve this, including persistent the JITed code, and using TRUFFLERUBYOPT=--engine.Mode=latency which optimizes startup/warmup over peak performance.
The slowdown shouldn't be that big though, and @bjfish is currently looking into that.

I think the scheduled job is a good solution, especially for truffleruby-head, it doesn't get in the way but still regularly produces CI results and we can easily find out if it got red and regressed somehow.

@petergoldstein
Copy link
Contributor Author

@eregon So I can understand that, but as you note it really shouldn't be that big. A factor of 20 is pretty substantial. Would be interested to hear the cause. I've been unable to get truffleruby running locally, so I haven't been able to narrow down the slowdown.

@pointlessone If it sounds good to you, I can change this PR to create a scheduled CI job (daliy? weekly?) that includes truffleruby-head. Let me know.

@eregon
Copy link

eregon commented Feb 15, 2022

From the flamegraph it seems most of time (31.2% using Search) is spent in Prawn::Images::PNG#split_image_data and specifically in StringIO#read and String#byteslice called from there. The main cost is probably scanning the substring to compute the correct coderange (7-bit or valid for BINARY strings).
This should improve when TruffleRuby has adopted TruffleString (WIP).
It also indicates that in this case it might be better to compute the coderange lazily, since it's one of the rare cases where the coderange doesn't seem needed (for color.write data.read(color_bytes)).
From a wider perspective it's an unfortunate case of Ruby not having a real "byte array" structure, and String being a relatively poor fit for it.
Let's track this as a TruffleRuby issue: oracle/truffleruby#2599

@pointlessone
Copy link
Member

@petergoldstein We already have a weekly schedule for CI. I'm fine including TruffleRuby in it. I'd skip it in PR builds until its run time would be comparable to at least JRuby.

@eregon Thank you for looking into it. Feel free to propose improvement that might also benefit other implementations too.

@petergoldstein
Copy link
Contributor Author

@pointlessone This splits out truffleruby into a scheduled (and also manual, upon request) workflow. There's some duplication because we're reusing the push/pull_request CI config for the normal scheduled one. So I simply added a truffleruby only config in another file.

Comment on lines +7 to +11
{
'ruby' => '2c0279e0bff2a9120494a52aa46216c1871902b5e66a3537bd4d3cbd66db0096b43b6e1ae0e4e189b561c4db9fa1afacb6c41f260e3aaf942faae2fee352d35b',
'jruby' => '51baf6440907e9e38d22f50deafa91572aec1174e621c044ae077cfe3d4361982a505dae5f013dd06f64f38cb9b3a38d5a3f8f0903849591774e298a3c91d39a',
'truffleruby' => '9cab8cd0736f239058ce8dc4b2915fe014e1461d434de456d9fbf140c2b69553f39accbe9acb38b974efd1e7d3e5dfddf77e99e0bdba56f43a8746a0e5fcf37b'
}.freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wander why you changed it to a hash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pointlessone To satisfy Rubocop

with:
ruby-version: ${{ matrix.ruby }}
- name: Gems cache
uses: actions/cache@v2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to not just bundler-cache: true? I might have forgotten.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah to force the path to be ~/gems and not vendor/bundle maybe?

gems-${{ matrix.ruby }}-
- name: Install dependencies
run: |
gem install bundler
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant, setup-ruby already install Bundler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware of that, but there was a desire by the maintainer to be consistent with ci.yml. And as I recall a previous suggestion for removing that line from ci.yml was not accepted.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK then fair enough

Copy link

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@eregon
Copy link

eregon commented Dec 12, 2022

I took another look at the performance issue (oracle/truffleruby#2599) and I think I've found the culprit now. I'll try to fix it soon.

@petergoldstein Could you rebase this PR and address #1246 (comment) ?

The test suite runs in 1min24s for me locally so I feel it's more than fast enough for a scheduled CI job.
So maybe we can merge this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants