[fix][hive-source][bug] fix An error occurred reading an empty directory #5427

Merged
merged 4 commits into from
Sep 12, 2023

Contributor

@zhilinli123 zhilinli123 commented Sep 5, 2023

close #5416

@zhilinli123
Contributor Author

PTAL: @FlechazoW @TyrantLucifer

+ "please check the configuration parameters such as: [file_filter_pattern]");
if (this.fileNames.isEmpty()) {
log.error("The current directory is empty " + path);
this.fileNames.add(path);
Member

This place looks strange. If there are no files in the directory, is it more appropriate to return an empty file list? I think the reader of the file list should handle the empty file list.
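
The suggestion above can be illustrated with a minimal sketch. It is not the connector's actual code: the class `EmptyDirListing` and the method `listFiles` are hypothetical names, and the local file system (`java.nio.file`) stands in for the Hadoop `FileSystem` API so that the example runs on its own. The point is only that an empty directory produces an empty list, and the directory path itself is never added to it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; it uses the local file system
// instead of the Hadoop FileSystem API used by the real connector.
class EmptyDirListing {

    /** Returns the regular files directly under dir, or an empty list if there are none. */
    static List<String> listFiles(Path dir) {
        List<String> fileNames = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)   // skip subdirectories
                   .map(Path::toString)
                   .sorted()
                   .forEach(fileNames::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (fileNames.isEmpty()) {
            // Report the condition, but do not add the directory itself to the list.
            System.err.println("The current directory is empty: " + dir);
        }
        return fileNames;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an empty partition directory such as year=2023/month=7.
        Path emptyPartition = Files.createTempDirectory("month-7-");
        System.out.println(listFiles(emptyPartition).size());
    }
}
```

With this shape, the decision about what an empty list means is left to the caller rather than hidden inside the listing method.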

Contributor Author

I made some changes to address this; please review.

@zhilinli123
Contributor Author

PTAL: @EricJoy2048

@liugddx liugddx merged commit de7b86a into apache:dev Sep 12, 2023
@zhilinli123
Contributor Author

zhilinli123 commented Sep 12, 2023

Hive Source table

CREATE TABLE `default.my_table`(
  `id` int, 
  `name` string)
PARTITIONED BY ( 
  `year` int, 
  `month` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/my_table'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'transient_lastDdlTime'='1693883163');

MySQL Sink table

CREATE TABLE `my_table_hive` (
  `id` int NOT NULL,
  `name` varchar(25) DEFAULT NULL,
  `year` int DEFAULT NULL,
  `month` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Spark Demo Conf


env {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  #job.mode = BATCH
  job.name = "SeaTunnel"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}


source {
  Hive {
    table_name = "default.my_table"
    metastore_uri = "thrift://localhost:9083"
  }
}
transform {

}

sink {
  jdbc {
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "12345678"
    query = "insert into my_table_hive (id, name, year,month) values (?,?,?,?)"
  }
}
## HDFS Dir
drwxr-xr-x   - mac supergroup          0 2023-09-05 11:32 hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=7
drwxr-xr-x   - mac supergroup          0 2023-09-05 11:07 hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=8
drwxr-xr-x   - mac supergroup          0 2023-09-05 11:06 hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=9
mac@zhilinliMac ~ % hadoop fs -ls hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=7
2023-09-12 15:35:09,997 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mac@zhilinliMac ~ % hadoop fs -ls hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=8
2023-09-12 15:36:42,740 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop fs -ls hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=9
2023-09-12 15:35:40,602 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 mac supergroup         14 2023-09-05 11:06 hdfs://localhost:9000/user/hive/warehouse/my_table/year=2023/month=9/000000_0
## Hive query data
select * from default.my_table_hive;
1       Alice   2023    9
2       Bob     2023    9

For this PR I have left some evidence of the fix above, and I will try to add an e2e test next. @EricJoy2048 @liugddx @TyrantLucifer

+ "please check the configuration parameters such as: [file_filter_pattern]");
if (this.fileNames.isEmpty()) {
log.error("The current directory is empty " + path);
this.fileNames.add(path);

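Why is the directory name still being added? `fileNames` holds the names of readable files. It is enough to make sure the collection returned by this method is not empty, and to throw an exception if it is.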

Member

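Why is the directory name still being added? `fileNames` holds the names of readable files. It is enough to make sure the collection returned by this method is not empty, and to throw an exception if it is.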

cc @zhilinli123

Member

It looks like the up-to-date code does not add the path; it only logs the error. @ponyliuh
Also, we can't throw an exception when the list is empty: if the table is empty, we still want the job to run successfully and simply sync nothing, rather than fail with an exception.
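
As a rough sketch of the behaviour described here, the consumer of the file list can treat an empty list as "zero rows" instead of an error. The class `EmptyTableSync` and its methods `readRows` and `sync` are hypothetical and do not correspond to SeaTunnel's real reader classes; plain local files are used in place of HDFS so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical consumer of a file list, for illustration only.
class EmptyTableSync {

    /** Reads every line of every listed file; an empty list yields no rows. */
    static List<String> readRows(List<String> fileNames) {
        List<String> rows = new ArrayList<>();
        for (String name : fileNames) {
            try {
                rows.addAll(Files.readAllLines(Paths.get(name)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return rows;
    }

    /** Returns the number of rows synced; an empty table gives 0 without throwing. */
    static int sync(List<String> fileNames) {
        if (fileNames.isEmpty()) {
            // Log and carry on: an empty table is a valid input, not a failure.
            System.err.println("No files to read; nothing will be synced");
        }
        return readRows(fileNames).size();
    }

    public static void main(String[] args) {
        System.out.println(sync(Collections.<String>emptyList())); // prints 0
    }
}
```

Genuine I/O failures still surface as exceptions in this sketch; only the "no files" case is treated as a successful no-op.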

EricJoy2048 added a commit that referenced this pull request Sep 14, 2023
TyrantLucifer pushed a commit that referenced this pull request Sep 15, 2023
gnehil pushed a commit to gnehil/seatunnel that referenced this pull request Oct 12, 2023
…ory (apache#5427)

* [fix][hive-source][bug] fix An error occurred reading an empty directory

* [fix][hive-source][bug] fix An error occurred reading an empty directory
gnehil pushed a commit to gnehil/seatunnel that referenced this pull request Oct 12, 2023
Development

Successfully merging this pull request may close these issues.

[Bug] [hive source mysql target] File list is empty Error
5 participants