[Feature](tvf) Support using tvf to read sequence_file/rc_file in local/hdfs/s3 #41080

Merged: 25 commits, Sep 23, 2024


@0130w 0130w commented Sep 20, 2024

Proposed changes

Issue Number: #30669

This change adds support for reading rcbinary, rctext, and sequence files as external file tables via the JNI connector.

To-do list (all items completed):

  • Support reading rc_binary files with the local tvf
  • Support reading rc_text and sequence files with the local tvf
  • Support the s3 and hdfs tvfs

Example:

sequence file:
input:

select * from local( "file_path" = "test/test.seq", "format" = "sequence", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");

output:

+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
| k1   | k2   | k3   | k4          | k5   | k6    | k7    | k8    | k9         | k10     | k11  | k12                 | k13        | k14             | k15                  | k16                       |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
|    7 |   13 |   74 | 13000000000 | 6.15 | 4.376 | 57.30 | world | Char       | Varchar |    1 | 2022-01-01 10:00:00 | 2022-01-01 | ["A", "B", "C"] | {"key2":2, "key1":1} | {"name":"John", "age":30} |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
1 row in set (0.07 sec)
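
The `hive_schema` property packs all column definitions into one string. Going only by the examples in this PR, columns appear to be separated by `;` and each column is written as `name:type`, where the type may itself contain `:` or `,` (for example `struct<name:string,age:int>` or `decimal(10,2)`). The Python sketch below illustrates that layout. `parse_hive_schema` is a hypothetical helper, not code from this PR, and it assumes `;` never occurs inside a type.

```python
from typing import List, Tuple


def parse_hive_schema(schema: str) -> List[Tuple[str, str]]:
    """Split a 'name:type;name:type' string into (name, type) pairs.

    Assumption: ';' only separates columns and never appears inside a type.
    """
    columns = []
    for field in schema.split(";"):
        field = field.strip()
        if not field:
            continue  # tolerate a trailing ';'
        # Only the first ':' separates the column name from its type;
        # later ':' characters belong to nested types such as struct<...>.
        name, _, col_type = field.partition(":")
        columns.append((name.strip(), col_type.strip()))
    return columns


if __name__ == "__main__":
    schema = (
        "k1:tinyint;k7:decimal(10,2);k14:array<string>;"
        "k15:map<string,int>;k16:struct<name:string,age:int>"
    )
    for name, col_type in parse_hive_schema(schema):
        print(f"{name} -> {col_type}")
```

Splitting on the first `:` only is what keeps nested struct fields attached to their parent column.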

rc_binary file:
input:

select * from local( "file_path" = "test/test.rcbinary", "format" = "rc_binary", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");

output:

+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
| k1   | k2   | k3   | k4          | k5   | k6   | k7     | k8   | k9         | k10       | k11  | k12                 | k13        | k14             | k15              | k16                           |
+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
|    1 |    2 |    3 | 10000000000 | 1.23 | 3.14 | 100.50 | you  | are        | beautiful |    0 | 2023-10-29 02:00:00 | 2023-10-29 | ["D", "E", "F"] | {"k2":5, "k1":3} | {"name":"chandler", "age":54} |
+------+------+------+-------------+------+------+--------+------+------------+-----------+------+---------------------+------------+-----------------+------------------+-------------------------------+
1 row in set (0.12 sec)

rc_text file:
input:

select * from local( "file_path" = "test/test.rctext", "format" = "rc_text", "backend_id" = "10011", "hive_schema"="k1:tinyint;k2:smallint;k3:int;k4:bigint;k5:float;k6:double;k7:decimal(10,2);k8:string;k9:char(10);k10:varchar(20);k11:boolean;k12:timestamp;k13:date;k14:array<string>;k15:map<string,int>;k16:struct<name:string,age:int>");

output:

+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
| k1   | k2   | k3   | k4          | k5   | k6    | k7    | k8    | k9         | k10     | k11  | k12                 | k13        | k14             | k15                  | k16                       |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
|    7 |   13 |   74 | 13000000000 | 6.15 | 4.376 | 57.30 | world | Char       | Varchar |    1 | 2022-01-01 10:00:00 | 2022-01-01 | ["A", "B", "C"] | {"key2":2, "key1":1} | {"name":"John", "age":30} |
+------+------+------+-------------+------+-------+-------+-------+------------+---------+------+---------------------+------------+-----------------+----------------------+---------------------------+
1 row in set (0.06 sec)
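
The three queries above differ only in `file_path` and `format`. A small Python sketch can generate them from one template, which is handy when writing regression cases. `build_local_tvf_query` is a hypothetical helper, not part of this PR. The property names and the three format values are taken from the examples above, and the function does no quoting or escaping, so it is only suitable for trusted test inputs.

```python
# Format values as they appear in the examples of this PR.
SUPPORTED_FORMATS = ("sequence", "rc_binary", "rc_text")


def build_local_tvf_query(file_path: str, file_format: str,
                          backend_id: str, hive_schema: str) -> str:
    """Render a `select * from local(...)` statement as a string.

    Assumption: values contain no double quotes (no escaping is done).
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unexpected format: {file_format}")
    props = {
        "file_path": file_path,
        "format": file_format,
        "backend_id": backend_id,
        "hive_schema": hive_schema,
    }
    rendered = ", ".join(f'"{key}" = "{value}"' for key, value in props.items())
    return f"select * from local({rendered});"


if __name__ == "__main__":
    schema = "k1:tinyint;k2:smallint;k3:int"
    for path, fmt in [("test/test.seq", "sequence"),
                      ("test/test.rcbinary", "rc_binary"),
                      ("test/test.rctext", "rc_text")]:
        print(build_local_tvf_query(path, fmt, "10011", schema))
```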

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@0130w 0130w marked this pull request as ready for review September 21, 2024 17:34
@0130w 0130w changed the title support rcbinary/rctext/sequence file for external file table in jni connector [draft](tvf) support rcbinary/rctext/sequence file for external file table in jni connector Sep 21, 2024
0130w commented Sep 21, 2024

run buildall

@0130w 0130w changed the title [draft](tvf) support rcbinary/rctext/sequence file for external file table in jni connector [Feature](tvf) support rcbinary/rctext/sequence file for external file table in jni connector Sep 22, 2024
@0130w 0130w changed the title [Feature](tvf) support rcbinary/rctext/sequence file for external file table in jni connector [Feature](tvf) support using tvf to read sequence_file/rc_file in local/hdfs/s3 Sep 22, 2024
@0130w 0130w changed the title [Feature](tvf) support using tvf to read sequence_file/rc_file in local/hdfs/s3 [Feature](tvf) Support using tvf to read sequence_file/rc_file in local/hdfs/s3 Sep 22, 2024
0130w commented Sep 22, 2024

run buildall

@morningman morningman self-assigned this Sep 23, 2024
@morningman morningman merged commit 6980200 into apache:branch-seq_rc_file Sep 23, 2024
8 of 10 checks passed
morningman pushed a commit that referenced this pull request Sep 24, 2024
…al/hdfs/s3 (#41080)
