Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use HTML5 sanitizer to serve huge payloads through Speedscope remote viewer #102

Merged
merged 3 commits into from
Nov 7, 2023

Conversation

davidstosik
Copy link
Contributor

@davidstosik davidstosik commented Oct 24, 2023

Fixes #92.

When we run the app profiler on Shopify core's boot, it is frequent that the generated profile JSON file exceeds 10MB.

Rails::Html::Sanitizer is unable to process such huge payloads (because libxml2 limits text nodes to 10MB by default?).

The rails-html-sanitizer gem introduced HTML5 parsing in its 1.6.0 version, which lifts the text node size limit.

This change intends to use the new HTML5 parser when it's available to sanitize the Speedscope remote viewer's HTML, allowing bigger payloads to be served successfully.

I wasn't sure how to implement such variable superclass (depending on the version of rails-html-sanitizer used), and I was even less sure of how I could implement some tests to avoid regressions. All suggestions will be appreciated! 🙏🏻

…viewer

When we run the app profiler on Shopify core's boot, it is frequent that
the generated profile JSON file exceeds 10MB.

`Rails::Html::Sanitizer` is unable to process such huge payloads
(because libxml2 limits text nodes to 10MB by default?).

The `rails-html-sanitizer` introduced HTML5 parsing in its 1.6.0
version, which lifts the text node size limit.

This change intends to use the new HTML5 parser when it's available
to sanitize the Speedscope remote viewer's HTML, allowing bigger
payloads to be served successfully.
rack (2.2.6.3)
rack-test (1.1.0)
rack (>= 1.0, < 3)
rails-dom-testing (2.0.3)
activesupport (>= 4.2.0)
nokogiri (>= 1.6)
rails-html-sanitizer (1.5.0)
loofah (~> 2.19, >= 2.19.1)
rails-html-sanitizer (1.6.0)
Copy link
Contributor Author

@davidstosik davidstosik Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was necessary to upgrade rails-html-sanitizer to 1.6.0 in Gemfile.lock in order to make it available when running tests.

Note that this is not a gem dependency anyway, and the gem is still meant to work with rails-html-sanitizer 1.5.0.

Ideally, I'd want a way to run the same test suite with different versions of rails-html-sanitizer (one before and one after 1.6.0, which is the version that originally defines Rails::HTML – all caps – and implements the HTML5 sanitizer), but I had no idea how I could achieve that. (Also not sure this would be worth the effort.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important to support RHS 1.5.x? I'd prefer if you just updated the dependency.

@@ -60,6 +60,21 @@ class MiddlewareTest < TestCase
assert_match(%r(<script type="text/javascript">), html)
end

test "#call show can serve huge payloads" do
frames = { "1" => { name: "a" * 1e7 } }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10MB, or about 10 million (1e7) characters, is the size limit the old HTML4 sanitizer wasn't able to exceed.
Reverting rails-html-sanitizer to 1.5.0 in Gemfile.lock, one can run this test and confirm that it fails.

Comment on lines +73 to +74
html[-200..-1],
message: "The generated HTML was incomplete"
Copy link
Contributor Author

@davidstosik davidstosik Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm using a substring because html is huge (10MB+), and this assertion would flood the terminal if it fails.
I also added an error message to clarify what is likely happening.

html = html.first

assert_match(
%r{'Flamegraph for .*'\);\n</script>},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the end of the "rendered" payload, as seen here:

iframe.setAttribute('src', '/app_profiler/viewer/index.html#profileURL=' + objUrl + '&title=' + 'Flamegraph for #{name}');
</script>

If this was rendered, then there likely was no problem rendering the whole text node.

@@ -6,7 +6,13 @@ module AppProfiler
module Viewer
class SpeedscopeRemoteViewer < BaseViewer
class BaseMiddleware
class Sanitizer < Rails::Html::SafeListSanitizer
sanitizer_superclass = if Rails::Html::Sanitizer::VERSION >= "1.6.0" && Rails::HTML::Sanitizer.html5_support?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rails::Html::Sanitizer::VERSION >= "1.6.0" was necessary because HTML (all caps) and #html5_support? are not defined in older versions of the gem.
This line could also be:

Suggested change
sanitizer_superclass = if Rails::Html::Sanitizer::VERSION >= "1.6.0" && Rails::HTML::Sanitizer.html5_support?
sanitizer_superclass = if Rails::Html.respond_to?(:html5_support?) && Rails::Html::Sanitizer.html5_support?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can/should use Rails::HTML::Sanitizer.best_supported_vendor for this if you pin to RHS >= 1.6.0

sanitizer_superclass = if Rails::Html::Sanitizer::VERSION >= "1.6.0" && Rails::HTML::Sanitizer.html5_support?
Rails::HTML5::SafeListSanitizer
else
Rails::Html::SafeListSanitizer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rails::Html::SafeListSanitizer is the only sanitizer available in old versions of rails-html-sanitizer, and equivalent to Rails::HTML4::SafeListSanitizer in 1.6.0 and later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments on my recommendation to use 1.6.0

Copy link

@flavorjones flavorjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to gently push on the assumption that we need to support Rails::HTML::Sanitizer v1.5.x. v1.6.0 should be backwards-compatible and supports Ruby 2.7 and later.

@@ -6,7 +6,13 @@ module AppProfiler
module Viewer
class SpeedscopeRemoteViewer < BaseViewer
class BaseMiddleware
class Sanitizer < Rails::Html::SafeListSanitizer
sanitizer_superclass = if Rails::Html::Sanitizer::VERSION >= "1.6.0" && Rails::HTML::Sanitizer.html5_support?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can/should use Rails::HTML::Sanitizer.best_supported_vendor for this if you pin to RHS >= 1.6.0

rack (2.2.6.3)
rack-test (1.1.0)
rack (>= 1.0, < 3)
rails-dom-testing (2.0.3)
activesupport (>= 4.2.0)
nokogiri (>= 1.6)
rails-html-sanitizer (1.5.0)
loofah (~> 2.19, >= 2.19.1)
rails-html-sanitizer (1.6.0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it important to support RHS 1.5.x? I'd prefer if you just updated the dependency.

sanitizer_superclass = if Rails::Html::Sanitizer::VERSION >= "1.6.0" && Rails::HTML::Sanitizer.html5_support?
Rails::HTML5::SafeListSanitizer
else
Rails::Html::SafeListSanitizer

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments on my recommendation to use 1.6.0

@davidstosik
Copy link
Contributor Author

davidstosik commented Oct 25, 2023

@flavorjones Thank you for your comments!
I'm unsure, but it looks like rails-html-sanitizer was not declared as a direct dependency of the app_profiler Gem, so I made the assumption that the gem was expected to work in any Rails application it's dropped in (including older Rails apps that might still be using rails-html-sanitizer 1.5.x).

I also guess we might not want to declare rails-html-sanitizer as a dependency in the gemspec because this gem could totally profile an app and produce a profile JSON file without it?

If we're happy to stop supporting older versions of rails-html-sanitizer, then I'll be happy to simplify the code, though I'm unsure how we'd gracefully ask users to update rails-html-sanitizer. 🤔

For reference, Rails 7.0.8 has a pretty loose version constraint on rails-html-sanitizer:

  s.add_dependency "rails-html-sanitizer", "~> 1.0", ">= 1.2.0"

@flavorjones
Copy link

@davidstosik I understand your points, and you're right about the historical expectation. But what I'm suggesting is that we don't care anymore about supporting old versions of RHS (because the new version is backwards compatible) and that the right thing to do is to declare a dependency on RHS >= 1.6.0 and simplify this code.

@@ -1,12 +1,13 @@
# frozen_string_literal: true

gem "rails-html-sanitizer", ">= 1.6.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flavorjones To enforce the version dependency, I had to add a peer dependency here. (There is no dependency to that gem in the gemspec.)

@davidstosik
Copy link
Contributor Author

@shioyama: I paired with @gmcgibbon this morning and we confirmed that this change also fixes the double escaping issue you mentioned to me yesterday! 🎉

Before After
image image

@davidstosik davidstosik merged commit 641f562 into main Nov 7, 2023
5 checks passed
@davidstosik davidstosik deleted the sto/serve-huge-payloads branch November 7, 2023 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speedscope viewer fails for big payloads
3 participants