Issue with migrating directly from AWS Glue to Hive #103

Open
hlmiao opened this issue Nov 11, 2021 · 2 comments

@hlmiao

hlmiao commented Nov 11, 2021

I am trying to migrate the Glue Data Catalog to the Hive metastore of an EMR cluster (I use an external MySQL database as the Hive metastore).

I followed all the steps to migrate directly from AWS Glue to Hive, but I get the error "'str' object has no attribute '_jdf'" when I run the Glue ETL job. The full error message is below:

2021-11-11 09:33:53,573 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/export_from_datacatalog.py", line 138, in <module>
main()
File "/tmp/export_from_datacatalog.py", line 134, in main
connection=glue_context.extract_jdbc_conf(connection_name)
File "/tmp/export_from_datacatalog.py", line 38, in datacatalog_migrate_to_hive_metastore
transform_databases_tables_partitions(sc, sql_context, hive_metastore, databases, tables, partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1445, in transform_databases_tables_partitions
.transform(hms=hive_metastore, databases=databases, tables=tables, partitions=partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1227, in transform
(ms_sds, ms_tbls, ms_partitions) = self.extract_sds(ms_tbls, ms_partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1018, in extract_sds
.drop_columns(['ID', 'type'])
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 182, in drop_columns
df = df.drop(col)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2519, in drop
jdf = self._jdf.drop(self._jseq(cols))
AttributeError: 'str' object has no attribute '_jdf'

@vinceRicchiuti

Hi @hlmiao,
I'm trying to do the opposite of what you are doing. I looked into this error and found that the problem is the binding of methods like drop_columns to the DataFrame class. This binding does not work as expected. I modified the script to remove these bindings, and the script now gets past this point.

I still have other bugs in the script, but I hope this workaround fixes your problem.
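
For anyone landing here, a rough sketch of what that workaround could look like. It is not the actual patch to hive_metastore_migration.py: StubFrame is a hypothetical stand-in for pyspark.sql.DataFrame so the snippet runs without Spark, and the body of drop_columns is inferred from the traceback (the `df = df.drop(col)` line), not copied from the script. The last part only shows one way the reported message can arise in plain Python; it is not a confirmed diagnosis of the Glue job.

```python
class StubFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame, for illustration only."""

    def __init__(self, columns):
        # Mimics the private handle that the real DataFrame.drop() reads.
        self._jdf = list(columns)

    def drop(self, col):
        # Return a new frame without the named column.
        return StubFrame([c for c in self._jdf if c != col])

    @property
    def columns(self):
        return list(self._jdf)


def drop_columns(df, columns_to_drop):
    """Plain helper: drop each named column and return the resulting frame."""
    for col in columns_to_drop:
        df = df.drop(col)
    return df


# Bound style described in the comment above (assumed shape):
#   DataFrame.drop_columns = drop_columns
#   frame.drop_columns(['ID', 'type'])
# Workaround style: call the helper directly and pass the frame explicitly.
frame = StubFrame(['ID', 'type', 'location'])
result = drop_columns(frame, ['ID', 'type'])
print(result.columns)

# One way the reported message can arise: if drop() runs with a string in
# the `self` position, the lookup of `_jdf` happens on the string.
try:
    StubFrame.drop('ID', 'type')
except AttributeError as err:
    message = str(err)
    print(message)
```

With real PySpark the same idea would mean replacing calls such as `some_df.drop_columns([...])` with `drop_columns(some_df, [...])` throughout the script and deleting the lines that attach the helpers to DataFrame.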

@hlmiao

hlmiao commented Jan 31, 2022

That sounds good. I'll try it later.
