We have a Glue Catalog in our dev AWS account, and I am trying to migrate this Glue Catalog to the Hive Metastore of an EMR cluster. (I need to do this to replace the Hive Metastore contents with the Glue Catalog metadata so that I can track our Glue Catalog data lineage using Apache Atlas, which is installed on the EMR cluster.)
I followed all the steps in the procedure to migrate the Glue Catalog directly to the Hive Metastore, but I get the Duplicate entry 'default' for key 'UNIQUE_DATABASE' error every time. I have tried various iterations, but I still get the same error whenever I run the Glue ETL job. See the full error message below:
py4j.protocol.Py4JJavaError: An error occurred while calling o1025.jdbc.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 77.0 failed 4 times, most recent failure: Lost task 1.3 in stage 77.0 (TID 1298, ip-00-00-000-000.ec2.internal, executor 32): java.sql.BatchUpdateException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1815)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1277)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:642)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:783)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2069)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry 'default' for key 'UNIQUE_DATABASE'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:971)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2141)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
As per the migration procedure, we need to provide the "--database-names" parameter as a key/value pair, so I first tried specifying a list of databases separated by semicolons (;), and I also tried specifying just one database, such as "default", but neither works. The error above was thrown when I used only the default database.
Is anyone familiar with this error? Is there a workaround for this issue? Please let me know if I am missing something. Any help is appreciated.
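One possible explanation, just a guess on my part: an EMR Hive metastore is initialized with a built-in 'default' database, so appending another row named 'default' to the metastore's DBS table would violate its UNIQUE_DATABASE key. If that is the cause, excluding 'default' (and any other database that already exists in the target metastore) from the value passed via "--database-names" might avoid the collision. Here is a minimal sketch of that filtering, assuming the semicolon-separated format the procedure describes; filter_database_names is a hypothetical helper, not part of the migration scripts:

```python
def filter_database_names(raw_names, existing=("default",)):
    """Split a semicolon-separated --database-names value and drop
    databases assumed to already exist in the target Hive metastore
    (a sketch; 'existing' would need to be queried from the metastore)."""
    names = [n.strip() for n in raw_names.split(";") if n.strip()]
    return [n for n in names if n not in existing]


print(filter_database_names("default;sales;analytics"))
# ['sales', 'analytics'] -- 'default' is dropped before the JDBC append
```

I have not confirmed that the migration job tolerates an empty result when only 'default' is requested, so this is only a direction to investigate, not a verified fix.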