
Connect to Apache Hive endpoint in OCI BDS #22316

Closed
3 tasks done
davidkhala opened this issue Dec 2, 2022 · 14 comments
Assignees
Labels
#bug (Bug report), data:connect:hive (Related to Hive), data:connect:oracle (Related to Oracle)

Comments

@davidkhala

davidkhala commented Dec 2, 2022

I have set up an Oracle Cloud Big Data Service to provide a Hive endpoint, and I want to connect to it.

How to reproduce the bug

  1. Go to 'Connect a database'
  2. Choose 'Apache Hive' from 'SUPPORTED DATABASES' dropdown
  3. Input the SQLALCHEMY URI as hive://hive@168.138.166.53:10000
  4. Click "Test Connection"; it will succeed
  5. Click "Connect"
  6. See error

Expected results

Silent success

Actual results

An error popup
An error occurred while fetching databases: Object of type bytes is not JSON serializable


Screenshots

(screenshot omitted)

Environment

(please complete the following information):

  • browser type and version: MS Edge
  • superset version: master
  • python version: as bundled in the Docker image
  • node.js version: as bundled in the Docker image
  • any feature flags active:

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

python stacktraces

superset_app          | 2022-12-02 12:45:56,017:INFO:pyhive.hive:SHOW SCHEMAS
superset_app          | 2022-12-02 12:45:56,017:DEBUG:pyhive.hive:TExecuteStatementReq(sessionHandle=TSessionHandle(sessionId=THandleIdentifier(guid=b'\xff\xeeQ\xdc0\xe2F\xc7\xbe\xd7hU\xad\x103\xb1', secret=b'\xdeaVP\xc6\x07G\xc3\x823DOG\x9b\xc9\x02')), statement='SHOW SCHEMAS', confOverlay=None, runAsync=False, queryTimeout=0)
superset_app          | 2022-12-02 12:45:56,110:DEBUG:pyhive.hive:TExecuteStatementResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), operationHandle=TOperationHandle(operationId=THandleIdentifier(guid=b'\xefx\\\x88\xa1\xd8@m\x9a=\x89\x1f\xd6I\x95\x81', secret=b'-K\x01\x93G\x11N\x91\x8d\xc3\xce\x87\xd9\x19\x14\xfc'), operationType=0, hasResultSet=True, modifiedRowCount=None))
superset_app          | 2022-12-02 12:45:56,111:DEBUG:pyhive.hive:TGetResultSetMetadataResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), schema=TTableSchema(columns=[TColumnDesc(columnName='database_name', typeDesc=TTypeDesc(types=[TTypeEntry(primitiveEntry=TPrimitiveTypeEntry(type=7, typeQualifiers=None), arrayEntry=None, mapEntry=None, structEntry=None, unionEntry=None, userDefinedTypeEntry=None)]), position=1, comment='from deserializer')]))
superset_app          | 2022-12-02 12:45:56,113:DEBUG:pyhive.hive:TFetchResultsResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=False, results=TRowSet(startRowOffset=0, rows=[], columns=[TColumn(boolVal=None, byteVal=None, i16Val=None, i32Val=None, i64Val=None, doubleVal=None, stringVal=TStringColumn(values=['ctm', 'default', 'information_schema', 'sys'], nulls=b'\x00'), binaryVal=None)], binaryColumns=None, columnCount=None))
superset_app          | 2022-12-02 12:45:56,113:DEBUG:pyhive.hive:TFetchResultsResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None), hasMoreRows=False, results=TRowSet(startRowOffset=0, rows=[], columns=[TColumn(boolVal=None, byteVal=None, i16Val=None, i32Val=None, i64Val=None, doubleVal=None, stringVal=TStringColumn(values=[], nulls=b'\x00'), binaryVal=None)], binaryColumns=None, columnCount=None))
superset_app          | 2022-12-02 12:45:56,123:DEBUG:pyhive.hive:TCloseOperationResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None))
superset_app          | 2022-12-02 12:45:56,126:DEBUG:pyhive.hive:TCloseSessionResp(status=TStatus(statusCode=0, infoMessages=None, sqlState=None, errorCode=None, errorMessage=None))
superset_app          | 2022-12-02 12:45:56,165:ERROR:root:Object of type bytes is not JSON serializable
superset_app          | Traceback (most recent call last):
superset_app          |   File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 86, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 114, in wraps
superset_app          |     raise ex
superset_app          |   File "/app/superset/views/base_api.py", line 111, in wraps
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/core.py", line 1604, in time_function
superset_app          |     response = func(*args, **kwargs)
superset_app          |   File "/app/superset/utils/log.py", line 265, in wrapper
superset_app          |     value = f(*args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 84, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/databases/api.py", line 281, in post
superset_app          |     return self.response(201, id=new_model.id, result=item)
superset_app          |   File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 704, in response
superset_app          |     _ret_json = jsonify(kwargs)
superset_app          |   File "/usr/local/lib/python3.8/site-packages/flask/json/__init__.py", line 361, in jsonify
superset_app          |     f"{dumps(data, indent=indent, separators=separators)}\n",
superset_app          |   File "/usr/local/lib/python3.8/site-packages/flask/json/__init__.py", line 139, in dumps
superset_app          |     rv = _json.dumps(obj, **kwargs)
superset_app          |   File "/usr/local/lib/python3.8/json/__init__.py", line 234, in dumps
superset_app          |     return cls(
superset_app          |   File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
superset_app          |     chunks = self.iterencode(o, _one_shot=True)
superset_app          |   File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
superset_app          |     return _iterencode(o, 0)
superset_app          |   File "/usr/local/lib/python3.8/site-packages/flask/json/__init__.py", line 57, in default
superset_app          |     return super().default(o)
superset_app          |   File "/usr/local/lib/python3.8/json/encoder.py", line 179, in default
superset_app          |     raise TypeError(f'Object of type {o.__class__.__name__} '
superset_app          | TypeError: Object of type bytes is not JSON serializable
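The traceback above boils down to the stock JSON encoder rejecting `bytes` values. A minimal, stdlib-only sketch of the failure and the fix (the `b"thrift"` value is a stand-in for what PyHive's dialect exposed at the time):

```python
import json

# Stand-in for the bytes value PyHive's HiveDialect exposed as `driver`
payload = {"driver": b"thrift"}

try:
    json.dumps(payload)
except TypeError as exc:
    print(exc)  # Object of type bytes is not JSON serializable

# Decoding the bytes first (which is effectively what the eventual
# PyHive fix does) lets serialization succeed:
payload["driver"] = payload["driver"].decode("utf-8")
print(json.dumps(payload))  # {"driver": "thrift"}
```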
@davidkhala davidkhala added the #bug Bug report label Dec 2, 2022
@davidkhala davidkhala changed the title Apache Hive in OCI BDS Connect to Apache Hive endpoint in OCI BDS Dec 2, 2022
@rusackas
Member

rusackas commented Dec 2, 2022

@bkyryliuk uses Hive, if I'm not mistaken. Have you run into anything like this?

@davidkhala
Author

The same error also occurs with a homebrew Hive deployment.

I also verified that it happens even with:

cd superset
git checkout 1.4.0
TAG=1.4.0 docker compose -f docker-compose-non-dev.yml up

I strongly suspect it was introduced by the Hive driver, since it appeared quite recently, within the last few days.

@doglex

doglex commented Jan 10, 2023

same

@ljyf5593

same

@kashifjavedaddo
Hi everyone,

We have also started facing this issue after upgrading to Apache Superset 2.1.0.

Is there any resolution or workaround to connect to Apache Hive from Superset? The connection was working fine before the upgrade, in version 2.0.1.

We would appreciate an update, as there has been no reply on the GitHub thread regarding a possible solution or workaround for almost 4 months now.

Regards,

Kashif Javed Rana

@rusackas rusackas added the data:connect:hive Related to Hive label Apr 20, 2023
Usiel added a commit to Usiel/PyHive that referenced this issue Apr 28, 2023
PyHive's HiveDialect usage of bytes for the name and driver fields is not the norm and is causing issues upstream: apache/superset#22316
Even other dialects within PyHive use strings. SQLAlchemy does not strictly require a string, but all the stock dialects return one, so I figure it is heavily implied.

I think the risk of breaking something upstream with this change is low (but it is there, of course). I figure in most cases we just make someone's `str(dialect.driver)` expression redundant.

Examples for some of the other stock sqlalchemy dialects (name and driver fields using str):
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/pysqlite.py#L501
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1891
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/base.py#L2383
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/mysqldb.py#L113
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/pymysql.py#L59
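In code terms, the change this commit describes amounts to switching the dialect's identifying fields from bytes to str. A hypothetical before/after sketch (class and field names simplified for illustration; this is not PyHive's actual source):

```python
class HiveDialectBefore:
    # bytes identifiers: SQLAlchemy itself tolerates them, but they leak
    # into Superset's API response, where json.dumps rejects bytes
    name = b"hive"
    driver = b"thrift"


class HiveDialectAfter:
    # str identifiers, matching SQLAlchemy's stock dialects
    name = "hive"
    driver = "thrift"


# The str fields are what downstream JSON serialization expects:
assert isinstance(HiveDialectAfter.name, str)
assert isinstance(HiveDialectAfter.driver, str)
```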
@Usiel
Contributor

Usiel commented Apr 28, 2023

The issue is related to this newly added line and the fact that PyHive returns bytes instead of a string for the dialect's driver field, which in turn breaks the JSON serialization. I figure we can handle this oddness on our side if needed, but I opened a PR in the meantime (dropbox/PyHive#450).

A less-than-ideal workaround that might work: use HTTP to connect to your Hive server with hive+http:// or hive+https:// instead of the default Thrift driver.

bkyryliuk pushed a commit to dropbox/PyHive that referenced this issue May 9, 2023
@welljs

welljs commented May 29, 2023

The issue is related to this newly added line and the fact that PyHive returns bytes instead of a string for the dialect's driver field, which then in turn messes with the JSON serialization. I figure we can handle this oddness on our side if needed, but I opened a PR in the meantime (dropbox/PyHive#450).

A less than ideal workaround that might work: Use http to connect to your Hive server with hive+http:// or hive+https:// instead of the default thrift driver.

It doesn't work for me. I see this error:

An error occurred while creating databases: (builtins.NoneType) None
[SQL: Authentication is not valid use one of:BASIC, NOSASL, KERBEROS, NONE]
(Background on this error at: https://sqlalche.me/e/14/dbapi)

@Usiel
Contributor

Usiel commented Jun 5, 2023

The issue is related to this newly added line and the fact that PyHive returns bytes instead of a string for the dialect's driver field, which then in turn messes with the JSON serialization. I figure we can handle this oddness on our side if needed, but I opened a PR in the meantime (dropbox/PyHive#450).
A less than ideal workaround that might work: Use http to connect to your Hive server with hive+http:// or hive+https:// instead of the default thrift driver.

it doesn't work for me. I see error:

An error occurred while creating databases: (builtins.NoneType) None
[SQL: Authentication is not valid use one of:BASIC, NOSASL, KERBEROS, NONE]
(Background on this error at: https://sqlalche.me/e/14/dbapi)

Hi @welljs, this seems unrelated to this issue. Probably a problem with the SQLAlchemy URI you used (the auth param?).
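The error message lists the auth modes the HTTP transport accepts. PyHive generally forwards query-string parameters from the SQLAlchemy URI to its connection call, so one way to supply the mode is in the URI itself. A stdlib-only sketch of the URI shape (the host, port, and auth value here are illustrative assumptions, not values from this thread):

```python
from urllib.parse import urlparse, parse_qs

# Illustrative URI; the auth value must be one of the modes the error
# lists (BASIC, NOSASL, KERBEROS, NONE)
uri = "hive+https://hive@168.138.166.53:10001/default?auth=BASIC"

parsed = urlparse(uri)
print(parsed.scheme)           # hive+https
print(parse_qs(parsed.query))  # {'auth': ['BASIC']}
```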

@wesleygoi-liftoff

This needs to be fixed.

@Usiel
Contributor

Usiel commented Jun 15, 2023

Once PyHive releases the fix we can update the dependency on Superset. Currently, 0.7.0 is a pre-release: https://pypi.org/project/PyHive/#history

@terrancesnyder

+1 please fix

@rusackas rusackas added the data:connect:oracle Related to Oracle label Jul 24, 2023
@bkyryliuk
Member

0.7.0 is released; give it a try.

@rusackas
Member

rusackas commented Mar 8, 2024

It sounds like this is likely fixed by now, and this issue is pretty out of date if not. If people are still encountering this in current versions (3.x), please open a new issue with updated context, or a PR to address the problem. Thanks!

@rusackas rusackas closed this as completed Mar 8, 2024
betodealmeida added a commit to preset-io/PyHive that referenced this issue Aug 8, 2024
* feat: add HTTP and HTTPS to hive (dropbox#385)

* feat: add https protocol

* support HTTP

* fix: make hive https py2 compat (dropbox#389)

* fix: make hive https py2 compat

* fix lint

* Update README.rst (dropbox#423)

* chore: rename Trino entry point (dropbox#428)

* Support for Presto decimals (dropbox#430)

* Support for Presto decimals

* lower

* Use str type for driver and name in HiveDialect (dropbox#450)

PyHive's HiveDialect usage of bytes for the name and driver fields is not the norm is causing issues upstream: apache/superset#22316
Even other dialects within PyHive use strings. SQLAlchemy does not strictly require a string, but all the stock dialects return a string, so I figure it is heavily implied.

I think the risk of breaking something upstream with this change is low (but it is there ofc). I figure in most cases we just make someone's `str(dialect.driver)` expression redundant.

Examples for some of the other stock sqlalchemy dialects (name and driver fields using str):
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/pysqlite.py#L501
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/sqlite/base.py#L1891
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/base.py#L2383
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/mysqldb.py#L113
https://github.com/zzzeek/sqlalchemy/blob/main/lib/sqlalchemy/dialects/mysql/pymysql.py#L59

* Correcting Iterable import for python 3.10 (dropbox#451)

* changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)

* Revert "changing drivers to support hive, presto and trino with sqlalchemy>=2.0 (dropbox#448)" (dropbox#452)

This reverts commit b0206d3.

* Update __init__.py (dropbox#453)

dropbox@1c1da8b

dropbox@1f99552

* use pure-sasl with python 3.11 (dropbox#454)

* minimal changes for sqlalchemy 2.0 support (dropbox#457)

* update readme to reflect recent changes (dropbox#459)

* Update README.rst (dropbox#475)

* Update README.rst (dropbox#476)

* feat: JWT support

* Add CI to build package

---------

Co-authored-by: Daniel Vaz Gaspar <danielvazgaspar@gmail.com>
Co-authored-by: Bogdan <b.kyryliuk@gmail.com>
Co-authored-by: serenajiang <serena.jiang@airbnb.com>
Co-authored-by: Usiel Riedl <usiel.riedl@gmail.com>
Co-authored-by: Multazim Deshmukh <57723564+mdeshmu@users.noreply.github.com>
Co-authored-by: nicholas-miles <nicholas.miles6@gmail.com>