Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: improve mask/unmask encrypted_extra #29943

Merged
merged 3 commits into from
Aug 22, 2024

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Aug 14, 2024

SUMMARY

When the user edits a database, the password in the SQLAlchemy URI is masked (replaced with XXXXXXXXXX), so that the sensitive information is not transmitted down the wire unneedlessly. If the user saves the database edit, the mask is replaced with the original password, unless it has been changed.

The encrypted_extra field in a database may also contain sensitive fields. Instead of masking the whole field, in #21248 we introduced logic to mask only selected fields, so we can provide a better editing experience for users. For BigQuery, eg, the credentials look like this:

{
  "type": "service_account",
  "project_id": "your project id",
  "private_key_id": "your private key id",
  "private_key": "-----BEGIN PRIVATE KEY-----...-----END PRIVATE KEY-----\n",
  ...
}

When editing, it will show in database modal as:

{
  "type": "service_account",
  "project_id": "your project id",
  "private_key_id": "your private key id",
  "private_key": "XXXXXXXXXX",
  ...
}

The user can then selectively change some of the fields, and when the database is edited the private_key will be replaced with the old value if it's equal to XXXXXXXXXX.

Currently this functionality is only implemented for BigQuery and GSheets, but it should be implemented in all databases that store sensitive information under encrypted_extra. To simplify the process and remove duplicate code, in this PR I updated the methods in the base class to work with JSON paths. For BigQuery, for example, all that is needed now is:

encrypted_extra_sensitive_fields = ["$.credentials_info.private_key"]

This PR also adds the logic for Snowflake.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Confirmed existing tests work, added a few more tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added data:connect:googlebigquery Related to BigQuery data:connect:googlesheets Related to Google Sheets labels Aug 14, 2024
@betodealmeida betodealmeida force-pushed the simplify-encrypted-extra-masking branch from d4b73ad to e2a8519 Compare August 14, 2024 14:22
@betodealmeida betodealmeida requested a review from eschutho August 14, 2024 14:28
@betodealmeida betodealmeida force-pushed the simplify-encrypted-extra-masking branch from 2ca5e83 to 0328c48 Compare August 14, 2024 14:39
@michael-s-molina michael-s-molina self-requested a review August 14, 2024 19:20
@michael-s-molina michael-s-molina added review:checkpoint Last PR reviewed during the daily review standup and removed review:checkpoint Last PR reviewed during the daily review standup labels Aug 14, 2024
Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few optional optional comments for consideration, otherwise LGTM

# list of JSON path to fields in `encrypted_extra` that should be masked when the
# database is edited
# pylint: disable=invalid-name
encrypted_extra_sensitive_fields: list[str] = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] usually we put class-level attributes at the top above all the methods, but I can see how it's nice to have it there right by the method that uses this one... Personally I'd expect this being at the top

Copy link
Member

@mistercrunch mistercrunch Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] another approach here would be to assume all keys in encrypted_extra are sensitive and allow-list the ones that we know are not sensitive. Seems it would be more cautious overall if it's not more work (?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point about the default behavior. We could have the attribute in the base class default to:

["$"]

This way everything would be masked.

@@ -45,6 +45,7 @@
from flask import current_app, g, url_for
from flask_appbuilder.security.sqla.models import User
from flask_babel import gettext as __, lazy_gettext as _
from jsonpath_ng import parse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] jsonpath_ng.parse could be wrapped in utils/json.py, maybe something like def redact_sensitive(json, sensitive_paths)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good point! I've had issues with JSONPath libraries in the past, wrapping them in a helper method will help in the future.

@rusackas
Copy link
Member

rusackas commented Aug 22, 2024

Giphy needs more "merge" GIFs.

@betodealmeida betodealmeida merged commit 4b59e42 into master Aug 22, 2024
33 of 34 checks passed
@mistercrunch
Copy link
Member

mp

@mistercrunch
Copy link
Member

... may cause seizures ...

@sadpandajoe sadpandajoe added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Aug 27, 2024
sadpandajoe pushed a commit that referenced this pull request Aug 27, 2024
@rusackas rusackas deleted the simplify-encrypted-extra-masking branch September 27, 2024 20:54
@github-actions github-actions bot added 🍒 4.1.0 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels data:connect:googlebigquery Related to BigQuery data:connect:googlesheets Related to Google Sheets preset-io size/XL v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch 🍒 4.1.0 🍒 4.1.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants