Adds support to connections with Thrift over HTTP transport. #325
joaopedroantonio
commented
Apr 4, 2020
- Supports HTTP transport for Thrift protocol.
- Supported authentication types: NONE, NOSASL, BASIC and KERBEROS.
- BASIC authentication is useful when the Thrift HTTP interface is behind a proxy (e.g. in Azure HDInsight clusters).
This PR solves issue #69. Let me know what you think, and if it looks good to you I'll invest time in the tests that are missing in this PR. :)
- Supports HTTP transport for Thrift protocol.
- Three types of authentication supported: NONE, BASIC and KERBEROS.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #325      +/-   ##
==========================================
+ Coverage   93.23%   93.67%   +0.44%
==========================================
  Files          14       14
  Lines        1523     1677     +154
  Branches      165      185      +20
==========================================
+ Hits         1420     1571     +151
+ Misses         75       74       -1
- Partials       28       32       +4
☔ View full report in Codecov by Sentry.
Will take a pass on it next week; unfortunately I can't get to this earlier.
# TODO
# Setting the Cookie in the headers should be implemented in the thrift library.
# We'll keep this here until that change is available in there.
class TCookieHttpClient(thrift.transport.THttpClient.THttpClient):
In the meantime, this commit was merged into the Thrift master branch; we just have to wait for a new Thrift release to get rid of this TCookieHttpClient:
It looks like this fix has been included in the 0.13.0 release.
@joaopedroantonio I'm aligned with the approach here; let's add the unit tests and the test plan to the PR description, if you don't mind.
Thank you for the feedback @bkyryliuk, I will start working on the tests between this weekend and the start of next week.
Any update on this?
Oh well, tbh this got lost in my backlog, thanks for the bump. :) Will try to wrap it up this week.
Added unit tests for different HTTP scenarios and fixed a few issues in the process. @bkyryliuk the codecov checks failed, but I guess that's mostly because I moved the binary transport implementation and there are a few line/branch misses. Do you think it's relevant to cover those misses, or can we move forward as it is?
@bkyryliuk, any idea when you might be able to pick this up? :)
@bkyryliuk bump. :)
Hello again everyone! Can you give some feedback on this PR? Thanks! (cc @bkyryliuk)
if http_path is None:
    http_path = '/'

socket = TCookieHttpClient('http://{}:{}{}'.format(host, port, http_path))
It would be great if we could also pass through an http_protocol, either http or https, and have this set.
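The suggestion above could be sketched roughly as follows. This is only an illustration and not code from the PR: the helper name `build_thrift_http_url` is hypothetical, `http_protocol` is the parameter name proposed in this comment, and the '/' default for `http_path` simply mirrors the snippet under review.

```python
def build_thrift_http_url(host, port, http_path=None, http_protocol='http'):
    """Hypothetical helper: build the URL handed to the Thrift HTTP client."""
    # Only plain HTTP and HTTP over TLS make sense for this transport.
    if http_protocol not in ('http', 'https'):
        raise ValueError(
            "http_protocol must be 'http' or 'https', got {!r}".format(http_protocol)
        )
    if http_path is None:
        http_path = '/'  # same default as the snippet under review
    if not http_path.startswith('/'):
        http_path = '/' + http_path  # tolerate paths given without a leading slash
    return '{}://{}:{}{}'.format(http_protocol, host, port, http_path)
```

With this sketch, `build_thrift_http_url('my-hiveserver2.com', 443, '/cliservice', 'https')` yields `'https://my-hiveserver2.com:443/cliservice'`, which could then be passed to the HTTP client in place of the hard-coded `'http://...'` string.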
I've tested this PR and this is working nicely for our HiveServer2 service running over
For our particular use-case we need a few more features:
I don't think these should block this PR though. I'd be happy to attempt to add these features once this initial version is in, unless of course you want to add them :)
if auth is None:
    auth = 'NONE'
if http_path is None:
    http_path = '/'
It could be better to default this to /cliservice, as that seems to be the default path HiveServer2 provides.
@gthomas-slack sorry for the late reply! I'm glad this was helpful! I'll take a better look at your comments over the weekend and I'm happy to include your suggestions if they aren't too complex. :)
For thrift=0.13.0 (current version in conda/pip) it doesn't work with the included TCookieHttpClient wrapper class, and I get the following error in the connect constructor when executing the first "use default" query:
After replacing TCookieHttpClient with the original THttpClient it seems to work OK for HTTP connections (no SSL support included).
I believe the parameter names for HTTP Thrift should be renamed to match the (simpler) names used in the original JDBC Thrift URI. Also, 'binary', 'http' and 'sasl' are not called "protocols" but transport "modes" according to the JDBC/Hive documentation.
Example Thrift JDBC over HTTPS URI:
Based on this it seems the names should be:
The remaining parameters needed to support HTTPS (HTTP over SSL) should be:
Hi @joaopedroantonio. Bumping this. Thank you for working on this! Could you please look at the new feedback? Thank you.
I've tested this with my Spark Thrift HiveServer2 deployment behind a load balancer on Kubernetes and it works great.
For anyone who needs to use http/https HiveServer2 in production right now, it's actually possible with the current release of
Here is a basic example using HTTPS and adding a custom authentication header:
from pyhive import hive
import base64
import thrift.transport.THttpClient


def thrift_http_transport():
    # Point the Thrift HTTP client at the HiveServer2 HTTP endpoint.
    transport = thrift.transport.THttpClient.THttpClient(
        uri_or_host='https://my-hiveserver2.com:443/cliservice'
    )
    # Encode "username:password" for the HTTP Basic Authorization header.
    auth_credentials = '{}:{}'.format('test', 'test').encode('UTF-8')
    auth_credentials_base64 = base64.standard_b64encode(auth_credentials).decode('UTF-8')
    transport.setCustomHeaders(
        {
            'Authorization': 'Basic {}'.format(auth_credentials_base64),  # HiveServer2 BASIC auth
            'X-Auth-Token': 'xxx'  # Custom header to auth with some kind of middleware
        }
    )
    return transport


# Hand the pre-configured transport to PyHive instead of host/port arguments.
conn = hive.connect(thrift_transport=thrift_http_transport())
cursor = conn.cursor()
cursor.execute("""SELECT SUM(1) from model.dim_date""")
data = cursor.fetchall()
cursor.close()
conn.close()
print(data)
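The header-building step in the example above can be checked on its own, without a HiveServer2 instance. The sketch below uses only the standard library; the helper name `basic_auth_header` is hypothetical, and the `'test'` credentials and `'xxx'` token are the same placeholders used in the example, not real values.

```python
import base64


def basic_auth_header(username, password):
    """Hypothetical helper: build an HTTP Basic 'Authorization' header dict."""
    # HTTP Basic auth is the base64 encoding of "username:password".
    credentials = '{}:{}'.format(username, password).encode('UTF-8')
    token = base64.standard_b64encode(credentials).decode('UTF-8')
    return {'Authorization': 'Basic {}'.format(token)}


headers = basic_auth_header('test', 'test')
headers['X-Auth-Token'] = 'xxx'  # placeholder for a middleware-specific header
```

The resulting `headers` dict is the kind of value passed to `setCustomHeaders` in the example. Keeping the encoding in a small function makes it easy to unit test separately from the connection itself.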
Can we use the HTTP protocol with SQLAlchemy?
Does anyone have an example of an HTTP connection with Kerberos auth?
FYI there is currently a bug using HTTP transport connections when using thrift
Joao Antonio seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Hi @joaopedroantonio are there any updates on this?