Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logic in DeltaSharingFileSystem to conditionally downgrade https requests to http #559

Merged
merged 11 commits into from
Aug 14, 2024

Conversation

taiga-db
Copy link
Collaborator

@taiga-db taiga-db commented Aug 8, 2024

Conditionally allow https requests (to access storage buckets) in DeltaSharingFileSystem to be downgraded to http when a proxy is specified (introduced in this PR) through a new spark config, spark.delta.sharing.never.use.https

verified using unit test, DeltaSharingFileSystemSuite and integration test, table1 with storage proxy. The latter validates that the pre-signed url file-access code path is successful when requests go through a proxy server (where the request is upgraded from http -> https).

@taiga-db taiga-db marked this pull request as draft August 8, 2024 05:52
@@ -69,7 +70,31 @@ private[sharing] class DeltaSharingFileSystem extends FileSystem with Logging {
val proxy = new HttpHost(proxyConfig.host, proxyConfig.port)
clientBuilder.setProxy(proxy)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below is a HttpRequestExecutor that is conditionally (spark config) set to the client that effectively downgrades https requests to http by returning a wrappedRequest with a modified uri.

  • new code path is only activated by spark config
  • any failures during downgrade are swallowed and a log is emitted

Copy link
Collaborator

@wchau wchau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linzhou-db For DS to use it, we need to release this in DS kernel, too, right?

@@ -69,7 +70,31 @@ private[sharing] class DeltaSharingFileSystem extends FileSystem with Logging {
val proxy = new HttpHost(proxyConfig.host, proxyConfig.port)
clientBuilder.setProxy(proxy)

if (proxyConfig.noProxyHosts.nonEmpty) {
val neverUseHttps = getConf.getBoolean("spark.delta.sharing.never.use.https", false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are only doing https degrading if proxy is active? Looking at the other names of the DS confs, maybe spark.delta.sharing.network.never.use.https?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, only downgrading if proxy is provided. If this is an unnecessary restriction, i can remove.

@taiga-db taiga-db requested a review from wchau August 12, 2024 17:00
@taiga-db taiga-db changed the title [WIP] - Add logic in DeltaSharingFileSystem to conditionally downgrade https requests to http Add logic in DeltaSharingFileSystem to conditionally downgrade https requests to http Aug 12, 2024
@taiga-db taiga-db marked this pull request as ready for review August 12, 2024 17:02
@@ -89,6 +89,9 @@ object ConfUtils {
"spark.delta.sharing.oauth.tokenRenewalThresholdInSeconds"
val OAUTH_EXPIRATION_THRESHOLD_SECONDS_DEFAULT = 10 * 60 /* 10 mins */

val NEVER_USE_HTTPS = "spark.delta.sharing.never.use.https"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.delta.sharing.network.never.use.https?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah okay, will update

@wchau wchau requested a review from linzhou-db August 12, 2024 18:58
@@ -89,6 +89,9 @@ object ConfUtils {
"spark.delta.sharing.oauth.tokenRenewalThresholdInSeconds"
val OAUTH_EXPIRATION_THRESHOLD_SECONDS_DEFAULT = 10 * 60 /* 10 mins */

val NEVER_USE_HTTPS = "spark.delta.sharing.network.never.use.https"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shall we we have a better name? or you prefer never?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this naming mirrors existing settings (like this one)

@@ -69,7 +70,31 @@ private[sharing] class DeltaSharingFileSystem extends FileSystem with Logging {
val proxy = new HttpHost(proxyConfig.host, proxyConfig.port)
clientBuilder.setProxy(proxy)

if (proxyConfig.noProxyHosts.nonEmpty) {
val neverUseHttps = ConfUtils.getNeverUseHttps(getConf)
if (neverUseHttps) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this orthogonal with the proxy change? or could it only be applied when a proxy is set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while it's possible to keep the logic separate from the proxy change, it's not realistic (at least from what I can tell) to downgrade https requests to access storage buckets without one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it could only be applied when a proxy is set?

Copy link
Collaborator Author

@taiga-db taiga-db Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. that is the plan (and how it is implemented here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you guide me in which line is that enforced?
And what's the code path when there's no proxy set (and it will ignore the neverUseHttps config)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if proxy configs (host and port) are not defined, we do not enter this block. This block contains all of the net-new code introduced in this PR. On top of that, the new code path for attaching the HttpRequestExecutor will never get executed if there's a proxy set but neverUseHttps: false

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks!

@@ -69,7 +70,31 @@ private[sharing] class DeltaSharingFileSystem extends FileSystem with Logging {
val proxy = new HttpHost(proxyConfig.host, proxyConfig.port)
clientBuilder.setProxy(proxy)

if (proxyConfig.noProxyHosts.nonEmpty) {
val neverUseHttps = ConfUtils.getNeverUseHttps(getConf)
if (neverUseHttps) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you guide me in which line is that enforced?
And what's the code path when there's no proxy set (and it will ignore the neverUseHttps config)?

@linzhou-db linzhou-db merged commit e26d824 into delta-io:main Aug 14, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants