
feat(hbase): support gen HFile for hbase v2 (BETA) #358

Merged: 40 commits into apache:master, Nov 10, 2022

Conversation

haohao0103 (Contributor) commented Nov 7, 2022

close #357

1. Support writing vertices/edges directly to KV storage.
2. Only customString and customNumber IDs are supported for now.
3. Submit the loader code that bypasses the server for HBase writing.
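For context on point 1, HFile generation for bulk load requires the KeyValues to be written in ascending row-key order, so a direct loader must serialize IDs into order-preserving byte keys and sort them first. A minimal, hypothetical sketch (in Python, not this PR's actual Java code) of how customString and customNumber IDs could map to row keys:

```python
import struct

def rowkey_from_string_id(vid: str) -> bytes:
    # Custom string IDs: UTF-8 bytes already sort lexicographically.
    return vid.encode("utf-8")

def rowkey_from_number_id(vid: int) -> bytes:
    # Custom number IDs (non-negative): big-endian fixed width makes
    # numeric order match byte order, which sorted HFiles require.
    return struct.pack(">q", vid)

def build_sorted_kvs(pairs):
    # pairs: iterable of (rowkey_bytes, value_bytes); an HFile writer
    # expects the stream of KeyValues in ascending row-key order.
    return sorted(pairs, key=lambda kv: kv[0])
```

All function names here are illustrative; the loader's real serialization lives in the builder/direct-loader classes listed in the coverage report above.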

imbajin (Member) commented Nov 7, 2022

@JackyYangPassion Is this an improved part?

codecov bot commented Nov 7, 2022

Codecov Report

Merging #358 (e3c8a90) into master (c893f50) will decrease coverage by 2.37%.
The diff coverage is 6.92%.

@@             Coverage Diff              @@
##             master     #358      +/-   ##
============================================
- Coverage     64.82%   62.44%   -2.38%     
- Complexity     1851     1864      +13     
============================================
  Files           255      260       +5     
  Lines          9081     9462     +381     
  Branches        837      874      +37     
============================================
+ Hits           5887     5909      +22     
- Misses         2810     3169     +359     
  Partials        384      384              
Impacted Files Coverage Δ
...om/baidu/hugegraph/loader/builder/EdgeBuilder.java 67.74% <0.00%> (-25.60%) ⬇️
...baidu/hugegraph/loader/builder/ElementBuilder.java 89.71% <ø> (ø)
.../baidu/hugegraph/loader/builder/VertexBuilder.java 61.29% <0.00%> (-21.32%) ⬇️
...com/baidu/hugegraph/loader/constant/Constants.java 75.00% <ø> (ø)
...u/hugegraph/loader/direct/loader/DirectLoader.java 0.00% <0.00%> (ø)
...egraph/loader/direct/loader/HBaseDirectLoader.java 0.00% <0.00%> (ø)
...aidu/hugegraph/loader/direct/util/SinkToHBase.java 0.00% <0.00%> (ø)
...ugegraph/loader/metrics/LoadDistributeMetrics.java 0.00% <0.00%> (ø)
...u/hugegraph/loader/spark/HugeGraphSparkLoader.java 0.00% <0.00%> (ø)
...m/baidu/hugegraph/loader/executor/LoadOptions.java 70.40% <30.00%> (-4.60%) ⬇️
... and 5 more


@haohao0103 changed the title from "bypass server for hbase writing hugegraph-loader (BETA)" to "feat(hbase): support gen HFile for hbase(BETA)" on Nov 7, 2022
JackyYangPassion (Contributor) commented:

> @JackyYangPassion Is this an improved part?

1. It supports bulkload from Hive with the client-bypass-server feature.
2. This feature has already been launched; it solves the problem that importing large amounts of data through the API affects queries.

@imbajin added the "enhancement" (New feature or request) and "todo" labels on Nov 7, 2022
imbajin (Member) commented Nov 7, 2022

OK, I'll also mark it as to be reviewed.

And could you handle the third-party dependency check?

haohao0103 (Contributor, Author) commented:

1. The code style has been adjusted.
2. The third-party dependencies have been added to known-dependencies.txt.

@JackyYangPassion @javeme @imbajin

imbajin (Member) commented Nov 8, 2022

> 1. The code style has been adjusted.
> 2. The third-party dependencies have been added to known-dependencies.txt.

Thanks, the 3rd-party check seems to have failed. Need some help?

javeme (Contributor) left a comment


Thanks for your contribution~
Please also address the other comments: https://github.com/apache/incubator-hugegraph-toolchain/pull/358/files (search for "ago"), and also address the file LoadOptions.java.

imbajin previously approved these changes Nov 9, 2022
haohao0103 (Contributor, Author) commented:

@imbajin Hi, I can help solve the loader CI check failure.

imbajin (Member) commented Nov 9, 2022

> @imbajin Hi, I can help solve the loader CI check failure.

Thanks, I have already adopted the basic code; the current difference is:

expected:

{
    "version":"2.0",
    "structs":[
        {
            "id":"1",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"users.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "Gender",
                    "Age",
                    "Occupation",
                    "Zip-code"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[
                {
                    "label":"user",
                    "skip":false,
                    "id":null,
                    "unfold":false,
                    "field_mapping":{
                        "UserID":"id"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Occupation",
                        "Zip-code",
                        "Gender",
                        "Age"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ],
            "edges":[

            ]
        },
        {
            "id":"2",
            "skip":false,
            "input":{
                "type":"FILE",
                "path":"ratings.dat",
                "file_filter":{
                    "extensions":[
                        "*"
                    ]
                },
                "format":"TEXT",
                "delimiter":"::",
                "date_format":"yyyy-MM-dd HH:mm:ss",
                "time_zone":"GMT+8",
                "skipped_line":{
                    "regex":"(^#|^//).*|"
                },
                "compression":"NONE",
                "batch_size":500,
                "header":[
                    "UserID",
                    "MovieID",
                    "Rating",
                    "Timestamp"
                ],
                "charset":"UTF-8",
                "list_format":null
            },
            "vertices":[

            ],
            "edges":[
                {
                    "label":"rating",
                    "skip":false,
                    "source":[
                        "UserID"
                    ],
                    "unfold_source":false,
                    "target":[
                        "MovieID"
                    ],
                    "unfold_target":false,
                    "field_mapping":{
                        "UserID":"id",
                        "MovieID":"id",
                        "Rating":"rate"
                    },
                    "value_mapping":{

                    },
                    "selected":[

                    ],
                    "ignored":[
                        "Timestamp"
                    ],
                    "null_values":[
                        ""
                    ],
                    "update_strategies":{

                    },
                    "batch_size":500
                }
            ]
        }
    ]
}

actual: identical to the expected JSON above, except for one additional top-level field at the end of the object:

    "backendStoreInfo":null

It seems "backendStoreInfo":null is new; you could fix the other problems~
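The test above fails only because the serialized output gained a new null-valued field. One way such a fixture comparison can tolerate newly added null fields is to compare parsed JSON with nulls stripped, rather than raw strings. A hypothetical sketch, not the project's actual test code:

```python
import json

def json_equal_ignoring_null_fields(expected: str, actual: str) -> bool:
    # Hypothetical helper: parse both documents and drop every key whose
    # value is null before comparing, so an extra "backendStoreInfo":null
    # in the actual output does not break the assertion.
    def strip_nulls(obj):
        if isinstance(obj, dict):
            return {k: strip_nulls(v) for k, v in obj.items() if v is not None}
        if isinstance(obj, list):
            return [strip_nulls(v) for v in obj]
        return obj

    return strip_nulls(json.loads(expected)) == strip_nulls(json.loads(actual))
```

Note this also ignores nulls present in the expected document (e.g. "list_format":null), which is symmetric and keeps the comparison stable.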

haohao0103 (Contributor, Author) commented:

The storage-layer configuration that bulkLoad depends on is specified in struct.json, which is why backendStoreInfo was added. A follow-up iteration will obtain the storage-layer configuration from the server instead.
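For illustration, a struct.json carrying such storage-layer information might look like the fragment below. The field names inside backendStoreInfo are hypothetical placeholders, not confirmed by this PR:

```json
{
    "version": "2.0",
    "structs": [],
    "backendStoreInfo": {
        "vertex_tablename": "hbase_vertex_table",
        "edge_tablename": "hbase_edge_table",
        "hbase_zookeeper_quorum": "zk1,zk2,zk3",
        "hbase_zookeeper_property_clientPort": "2181"
    }
}
```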

imbajin (Member) commented Nov 9, 2022

> The storage-layer configuration that bulkLoad depends on is specified in struct.json, which is why backendStoreInfo was added. A follow-up iteration will obtain the storage-layer configuration from the server.

It's fine, just adopt it in the test 😄 (likewise for any other test problems, if they exist)

javeme previously approved these changes Nov 9, 2022
imbajin (Member) left a comment


Thanks, we could handle the 3rd-party dependencies together before the release (to avoid wasting a lot of time on them).

haohao0103 (Contributor, Author) commented:

> seems "backendStoreInfo":null is newly, other problems u could fix it~

Do I need to fix the failing 3rd-party dependency check? I believe many of the problems are caused by the hadoop-common upgrade from 3.2.4 to 3.3.1.

> thanks, we could handle the 3rd dependencies together before release (to avoid waste a lot time on it)

OK.

haohao0103 (Contributor, Author) commented Nov 10, 2022

> thanks, we could handle the 3rd dependencies together before release (to avoid waste a lot time on it)

Are many of the problems caused by the hadoop-common upgrade from 3.2.4 to 3.3.1?

@simon824 could you exclude it in the pom? (like #363)

@imbajin changed the title from "feat(hbase): support gen HFile for hbase(BETA)" to "feat(hbase): support gen HFile for hbase v2 (BETA)" on Nov 10, 2022
@imbajin merged commit a622f98 into apache:master on Nov 10, 2022
simon824 (Member) commented:

> many of the problems are caused by the hadoop-common upgrade from 3.2.4 to 3.3.1 ?
>
> @simon824 could u exclude it in pom? (like #363)

We can downgrade the version if necessary; the hadoop dependency seemingly cannot be excluded, since the loader needs it to read HDFS files.

haohao0103 (Contributor, Author) commented:

> We can downgrade the version if necessary, hadoop dependency seems can not be excluded, loader needs it to read hdfs files.

Yes, the loader needs the hadoop dependency. Internally, we read data from HDFS and load it into the graph.

Labels: enhancement (New feature or request), todo
Projects: none (Status: Done)
6 participants