Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gauge primitive RubyVM::YJIT.runtime_stats, if YJIT is enabled #2711

Merged
merged 2 commits into from
Jun 21, 2023

Conversation

HeyNonster
Copy link
Contributor

@HeyNonster HeyNonster commented Mar 22, 2023

What does this PR do?

Adds YJIT runtime stats to the Ruby Runtime Metrics.

Motivation
YJIT was deemed production ready in Ruby 3.2. YJIT includes its own runtime stats, which would be useful to gauge.

Of particular use is the OBJECT_SHAPE_COUNT metric. Ruby has a ceiling on the total number of unique shapes but we currently don't have a convenient way of keeping track of those figures to determine if optimizations need to be made.

Additional Notes

The comments in lib/datadog/core/environment/yjit.rb I mostly copied directly from the YJIT source code.

Primitive Stats vs Extended Stats

This PR tracks YJIT stats when running Ruby with the --yjit flag, but currently only the primitive stats (not the additional stats that are available when the --yjit-stats flag is also set) are gauged.

It's limited to the primitive stats because YJIT, by default, has the following runtime stats:

$ RUBYOPT='--yjit' ruby -e 'pp RubyVM::YJIT.runtime_stats' =>
{
  :inline_code_size=>0,
  :outlined_code_size=>116,
  :freed_page_count=>0,
  :freed_code_size=>0,
  :live_page_count=>1,
  :code_gc_count=>0,
  :code_region_size=>16384,
  :object_shape_count=>236
}

However, with the --yjit-stats flag, it returns 325 separate stats:

$ RUBYOPT="--yjit --yjit-stats" ruby -e 'pp RubyVM::YJIT.runtime_stats.keys.length' => 325

And not all of them are numeric:

$ RUBYOPT="--yjit --yjit-stats" ruby -e 'pp RubyVM::YJIT.runtime_stats.values.select { |value| !value.kind_of?(Numeric) }.any?' => true

How to test the change?

# Using Ruby 3.2+
RUBYOPT="--yjit" bundle exec rake spec:yjit

bundle exec rake spec:yjit # shows the YJIT specs get skipped

@github-actions github-actions bot added the core Involves Datadog core libraries label Mar 22, 2023
@HeyNonster HeyNonster marked this pull request as ready for review March 22, 2023 13:14
@HeyNonster HeyNonster requested a review from a team March 22, 2023 13:14
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great contribution, thank you!

For background information, for each runtime metric, we have to provide a human readable title and a short description, as well as choosing if they belong to a the same graph or different graphs:

Screenshot 2023-03-22 at 2 30 11 PM

For example, the "Slots" graph above contains all these 4 metrics:

runtime.ruby.gc.heap_available_slots
runtime.ruby.gc.heap_live_slots
runtime.ruby.gc.heap_marked_slots
runtime.ruby.gc.heap_free_slots

This presentation work is done by us internally after we merge the Ruby Tracer changes.

In order to provide this information in the UI, we have to know what runtime metrics we are collecting, and be able to articulate what they are and if they should be directly correlated with other runtime metrics.

For this PR, I suggest declaring explicitly the new runtime metrics that are being created, instead of iterating over the results of RubyVM::YJIT.runtime_stats.

@@ -78,6 +79,9 @@ def flush
)
end
end

# Only on Ruby >= 3.2
try_flush { yjit_metrics.each { |metric, value| gauge(metric, value) } if Core::Environment::YJIT.available? }
Copy link
Member

@marcotc marcotc Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should explicitly declare all the values that it reports, which today would be:

[:inline_code_size,
 :outlined_code_size,
 :freed_page_count,
 :freed_code_size,
 :live_page_count,
 :code_gc_count,
 :code_region_size,
 :object_shape_count]

Copy link
Contributor Author

@HeyNonster HeyNonster Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the iteration to use explicit methods instead. The comments for each method I lifted directly from the YJIT source code.


# Only on Ruby >= 3.2
try_flush do
if Core::Environment::YJIT.available?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this if check outside of try_flush to be consistent with line 120?

https://github.com/DataDog/dd-trace-rb/pull/2711/files#diff-7818d2faf59ace61f4f77f9a9dda171e178bd43fe6b61cb3793cf92ac2e4d8f3R120

Why don't we group this logic under a single if check?

try_flush do
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_CODE_GC_COUNT,
    Core::Environment::YJIT.code_gc_count
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_CODE_REGION_SIZE,
    Core::Environment::YJIT.code_region_size
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_FREED_CODE_SIZE,
    Core::Environment::YJIT.freed_code_size
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_FREED_PAGE_COUNT,
    Core::Environment::YJIT.freed_page_count
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_INLINE_CODE_SIZE,
    Core::Environment::YJIT.inline_code_size
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_LIVE_PAGE_COUNT,
    Core::Environment::YJIT.live_page_count
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_OBJECT_SHAPE_COUNT,
    Core::Environment::YJIT.object_shape_count
  )
  gauge_if_not_nil(
    Core::Runtime::Ext::Metrics::METRIC_YJIT_OUTLINED_CODE_SIZE,
    Core::Environment::YJIT.outlined_code_size
  )

  yjit_metrics.each { |metric, value| gauge(metric, value)
end if Core::Environment::YJIT.available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GustavoCaso I removed line 120 because it was mistakenly left over from a previous attempt at this.

I opted not to move the conditional check inline with the end to keep the style consistent with the rest of this file e.g. lines 57-81:

          try_flush do
            if Core::Environment::VMCache.available?
              # Only on Ruby < 3.2
              gauge_if_not_nil(
                Core::Runtime::Ext::Metrics::METRIC_GLOBAL_CONSTANT_STATE,
                Core::Environment::VMCache.global_constant_state
              )
              ...

Copy link
Member

@GustavoCaso GustavoCaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@GustavoCaso
Copy link
Member

@HeyNonster

Thanks for the changes. I noticed that rubocop is failing. I usually run bundle exec rubocop -a locally to try fixing those. Sometimes it has to be fixed manually since the tool doesn't know how to fix it.

Feel free to let me know if you need any help with that.

Another thing, if you don't mind rebasing master into your branch? That would fix all the broken CI steps 🔨

cc @marcotc if you want to have one last look before merging

@HeyNonster
Copy link
Contributor Author

HeyNonster commented Jun 2, 2023

@Gustavo

I've rebased onto the latest master! :)

As for Rubocop, the one complaint was that the #flush method is now too long. I'm not sure how y'all would prefer to handle that. As one possible solution, I pushed a fixup! commit that moves the YJIT stats flushing into a separate #dump_yjit_stats private method that #flush calls.

If you like that approach, I can squash the fixup! commit; if not, I can drop it and try something else 😄

In the fixup! I also removed !::RubyVM::YJIT.stats_enabled? check from the available? method. stats_enabled? returned true if the extended YJIT stats were enabled. Originally, this PR was iterating through every YJIT runtime stat (which would have been unwieldy with the 325 extended stats). Now that we're explicitly declaring which stats we want to record, instead of iterating over each of them, we can allow the extended stats to be on without causing problems.

@GustavoCaso
Copy link
Member

@HeyNonster I like the approach from the last commit 😄

Also, sorry for the delay with the reply.

@marcotc are we good to go and merge it?

@HeyNonster
Copy link
Contributor Author

HeyNonster commented Jun 16, 2023

@HeyNonster I like the approach from the last commit 😄

Also, sorry for the delay with the reply.

@marcotc are we good to go and merge it?

Thanks, @GustavoCaso, I'll keep an eye on this and squash the fixup when y'all give me the final greenlight! 🙂

@HeyNonster
Copy link
Contributor Author

I've rebased this PR to resolve one conflict on main. :)

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much, @HeyNonster! 🙇

YJIT was deemed production ready in Ruby 3.2. YJIT includes its own
[runtime
stats](https://github.com/ruby/ruby/blob/master/doc/yjit/yjit.md#other-statistics)
which would be useful to guage.

YJIT, by default, has the following runtime stats:

```
$ RUBYOPT='--yjit' ruby -e 'pp RubyVM::YJIT.runtime_stats' =>
{
  :inline_code_size=>0,
  :outlined_code_size=>116,
  :freed_page_count=>0,
  :freed_code_size=>0,
  :live_page_count=>1,
  :code_gc_count=>0,
  :code_region_size=>16384,
  :object_shape_count=>236
}

With the `--yjit-stats` flag, it returns 325 separate stats:

```
$ RUBYOPT="--yjit --yjit-stats" ruby -e 'pp RubyVM::YJIT.runtime_stats.keys.length' =>
325
```

And not all of them are numeric:

```
$ RUBYOPT="--yjit --yjit-stats" ruby -e 'pp
RubyVM::YJIT.runtime_stats.keys.select { |k| !kind_of?(Numeric) }.any?' =>
true
```

This adds the default YJIT stats, but not the extended ones.
@HeyNonster
Copy link
Contributor Author

Thank you so much, @HeyNonster! 🙇

Thank you, @marcotc and @GustavoCaso! I've squashed the fixup! and rebased the branch onto main. :)

@marcotc marcotc merged commit 70d472f into DataDog:master Jun 21, 2023
@github-actions github-actions bot added this to the 1.13.0 milestone Jun 21, 2023
HeyNonster added a commit to HeyNonster/dd-trace-rb that referenced this pull request Jul 12, 2023
In DataDog#2711, I forgot to include an update to the CHANGELOG.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants