Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support 'chevron' library for templating as jinja alternative #11617

Closed
wants to merge 2 commits into from

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Nov 9, 2020

SUMMARY

chevron is a python implementation of mustache.js, and seems much safer by scope (project tagline is "Logic-less templates" !) than jinja2 could ever be.

While the library is not super active lately, it appears to be feature complete and is easy to review from a security standpoint as it's a few short-ish modules. Overall the whole library is <1k lines.

This PR adds support for the chevron library behind a feature flag.

SHORTCOMINGS

  • one shortcoming is the html escaping that we have to work around using triple curlies ie:{{{ ... }}}. I submitted a PR to enable turning this off in our context here allow bypassing html_escaping noahmorrison/chevron#81
  • accessing a position in an array seems impossible in the release using the undocumented {{mylist.0}} mustache feature, but works in chevron's master, highlighting the fact that the lib hasn't released to PyPI in a long time
  • given the security risks here, we should do a full audit of chevron's [small] codebase, and pin the lib. Bumping the lib should force a deep analysis of the changelog/changeset to make sure there are no security regressions

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen Shot 2020-11-08 at 10 10 55 PM

(chevron)[https://github.com/noahmorrison/chevron]
is a python implementation of mustache.js, and seems much safer
by scope than jinja could ever be.

While the library is not super active lately, it appears to be
feature complete and is easy to review from a security standpoint as
it's a few short-ish modules.

This PR adds support for the chevron library behind a feature flag.
@codecov-io
Copy link

codecov-io commented Nov 9, 2020

Codecov Report

Merging #11617 (ed78947) into master (92a9acd) will increase coverage by 0.42%.
The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11617      +/-   ##
==========================================
+ Coverage   62.26%   62.69%   +0.42%     
==========================================
  Files         873      873              
  Lines       42238    43307    +1069     
  Branches     3959     4079     +120     
==========================================
+ Hits        26301    27151     +850     
- Misses      15757    15976     +219     
  Partials      180      180              
Flag Coverage Δ
javascript 62.94% <ø> (+0.09%) ⬆️
python 62.51% <60.00%> (+0.60%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/config.py 90.11% <ø> (ø)
superset/jinja_context.py 79.83% <60.00%> (-1.99%) ⬇️
...dashboard/components/FiltersBadge/DetailsPanel.tsx 17.80% <0.00%> (-0.80%) ⬇️
...set-frontend/src/views/CRUD/welcome/EmptyState.tsx 87.14% <0.00%> (-0.67%) ⬇️
...ntend/src/views/CRUD/annotation/AnnotationList.tsx 81.74% <0.00%> (-0.40%) ⬇️
...ontend/src/components/ListViewCard/ImageLoader.tsx 86.20% <0.00%> (-0.16%) ⬇️
superset-frontend/src/components/Menu/SubMenu.tsx 100.00% <0.00%> (ø)
...-frontend/src/common/components/common.stories.tsx 0.00% <0.00%> (ø)
.../src/components/dataViewCommon/TableCollection.tsx 100.00% <0.00%> (ø)
superset/db_engine_specs/kylin.py 94.73% <0.00%> (+0.61%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92a9acd...ed78947. Read the comment docs.

@@ -338,6 +347,8 @@ def get_template_processor(
template_processor = template_processors.get(
database.backend, BaseTemplateProcessor
)
elif feature_flag_manager.is_feature_enabled("CHEVRON_TEMPLATE_PROCESSING"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to nest this conditional under if ENABLE_TEMPLATE_PROCESSING, sort of as a sub option after the user turns tempate processing on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rename ENABLE_TEMPLATE_PROCESSING to JINJA_TEMPLATE_PROCESSING instead (albeit a breaking change)?

Copy link
Member Author

@mistercrunch mistercrunch Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, clearly there's some cleanup to be done here, but also need to maintain backward compatibility. I think that's the easy portion of this PR, the real question is around the viability of chevron as a secure templating backend.

As a solution to this particular topic though I'd like to add JINJA_TEMPLATE_PROCESSING as True by default and nest the condition under ENABLE_TEMPLATE_PROCESSING.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose calling the feature flag TEMPLATE_PROCESSING, and then adding a config parameter TEMPLATE_PROCESSOR: Optional[TemplateProcessorType] = TemplateProcessorType.CHEVRON. Similar to how other feature flagged options do (see e.g. THUMBNAIL_SELENIUM_USER, THUMBNAIL_CACHE_CONFIG etc).

@robdiciuccio
Copy link
Member

This looks good upon cursory review. I'll look into the lib a bit more deeply.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and no harm in supporting multiple template processors. I'm slightly concerned by the staleness of the library, but that's a different topic (not a problem before chevron is potentially enabled by default).

@@ -338,6 +347,8 @@ def get_template_processor(
template_processor = template_processors.get(
database.backend, BaseTemplateProcessor
)
elif feature_flag_manager.is_feature_enabled("CHEVRON_TEMPLATE_PROCESSING"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose calling the feature flag TEMPLATE_PROCESSING, and then adding a config parameter TEMPLATE_PROCESSOR: Optional[TemplateProcessorType] = TemplateProcessorType.CHEVRON. Similar to how other feature flagged options do (see e.g. THUMBNAIL_SELENIUM_USER, THUMBNAIL_CACHE_CONFIG etc).

@mistercrunch
Copy link
Member Author

FYI, offered help to maintain chevron here: noahmorrison/chevron#82

@ktmud
Copy link
Member

ktmud commented Nov 10, 2020

On another note, have you looked into JinjaSQL? I've used it in some Python scripts before, but not sure whether it is safer than the raw jinja.

@robdiciuccio
Copy link
Member

@ktmud Thanks for the suggestion! It seems with JinjaSQL "You can use the full power of Jinja templates," which is what we're trying to move away from (the ability to execute arbitrary python code in user-generated input).

@mistercrunch
Copy link
Member Author

@ktmud didn't spend much time but that lib seems to be doing very little on top of jinja as highlighted in the single, very simple <200 LoC module here:
https://github.com/hashedin/jinjasql/blob/master/jinjasql/core.py

@pull-request-size pull-request-size bot added size/M and removed size/S labels Nov 11, 2020
@ktmud
Copy link
Member

ktmud commented Nov 11, 2020

It seems the only thing it does it to parameterize SQL queries, which at least reduces the risk of SQL injections (if there's ever such risk within Superset).

@ktmud
Copy link
Member

ktmud commented Nov 12, 2020

Not sure if we really want to move away from Jinja. Other than the unintentional exposure of unsafe objects, what are other potential risks with Jinja?

One thing we could to is to be more careful with what we expose to Jinja context---and the first thing we should probably do is to always expose raw functions instead of class objects (i.e., replace hive.latest_partition and presto.latest_partition with latest_partition).

@robdiciuccio
Copy link
Member

One concern with chevron is that it allows loading of "partials" from the filesystem. ex:

chevron.render('Config: {{> superset_config }}', {}, '.', 'py')
>>> 'Config: import os\nfrom superset.stats_logger import DummyStatsLogger\nfrom cachelib.file...'

While this can be mitigated since we control how chevron.render is called, it does not appear that it can be disabled.

@robdiciuccio
Copy link
Member

@ktmud The intention is the prevent security issues like CVE-2020-13948. We should absolutely move away from exposing class objects to the Jinja context, but this doesn't prevent access to Python internals. We're relying heavily on Jijna's Sandboxing to prevent RCE, which feels very brittle.

@ktmud
Copy link
Member

ktmud commented Nov 12, 2020

@ktmud The intention is the prevent security issues like CVE-2020-13948. We should absolutely move away from exposing class objects to the Jinja context, but this doesn't prevent access to Python internals. We're relying heavily on Jijna's Sandboxing to prevent RCE, which feels very brittle.

I think CVE-2020-13948 has the same root cause as the other security bug we had, which is exposing class objects or modules that allow unsafe chained access to things we don't want to expose. As long as we stop doing that, is there anything else we should worry about?

A lot of OSS use Jinja as their template engine, including Airflow and dbt, is there anything we can learn from them to make Jinja more secure?

@robdiciuccio
Copy link
Member

@ktmud The difference here is that we're processing user-generated input, while Airflow and dbt are operating on templates defined in code. Unclear how the hosted versions of these apps (Astronomer and dbt Cloud) handle Jinja security.

@ktmud
Copy link
Member

ktmud commented Nov 12, 2020

To be fair dbt is also very much user-generated input. Anyone can login to the dbt UI and write templated code in the IDE: https://docs.getdbt.com/docs/running-a-dbt-project/using-the-dbt-ide

@mistercrunch
Copy link
Member Author

Looks like jinja changed their narrative around the sandboxed mode recently (?). I'm pretty sure they used to advise against using Jinja to execute untrusted strings not that long ago.

Screen Shot 2020-11-12 at 1 40 05 PM

@mistercrunch
Copy link
Member Author

Out of curiosity we could look at DBT's code to see how they secure their sandbox.

@mistercrunch
Copy link
Member Author

mistercrunch commented Nov 12, 2020

Clearly the compatibility with DBT AND Airflow through Jinja is a really important factor for many. People want to be able to iterate and go back and forth between these tools.

@robdiciuccio
Copy link
Member

I'll take a stab at making the jinja implementation more secure in a separate branch.

@robdiciuccio
Copy link
Member

Attempt at making Jinja more secure: #11704

@mistercrunch
Copy link
Member Author

Superseded by #11704

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants