feat(hbase): support gen HFile for hbase v2 (BETA) #358

Merged: 40 commits, Nov 10, 2022

Changes from all commits (40 commits)
0a87143
Merge pull request #1 from apache/master
haohao0103 Sep 16, 2022
22a8c88
Fix bug: SchemaCache init and after query schema then put the latest …
haohao0103 Sep 16, 2022
d75f263
Fix bug: SchemaCache init and after query schema then put the latest …
haohao0103 Sep 20, 2022
d09a1f7
Fix bug: flink-loader SchemaCache init
haohao0103 Sep 20, 2022
cf86be5
Merge branch 'apache:master' into master
haohao0103 Oct 21, 2022
2550ab8
Merge branch 'apache:master' into master
haohao0103 Oct 27, 2022
41e14d9
Merge branch 'apache:master' into master
haohao0103 Nov 2, 2022
6f57a3d
Merge branch 'apache:master' into master
haohao0103 Nov 4, 2022
2384a94
Merge branch 'schemaCache-optimize'
haohao0103 Nov 4, 2022
9cef789
First version of the bulkload code submitted to the community
Oct 31, 2022
3c3f5c7
Merge branch 'master' into schemaCache-optimize
haohao0103 Nov 4, 2022
7eca671
Merge branch 'apache:master' into master
haohao0103 Nov 7, 2022
eda1413
Merge branch 'master' of https://github.com/haohao0103/incubator-huge…
haohao0103 Nov 7, 2022
f986dc2
Merge branch 'master' into schemaCache-optimize
haohao0103 Nov 7, 2022
7266c4d
Merge pull request #2 from haohao0103/master
haohao0103 Nov 7, 2022
2575b39
Merge branch 'schemaCache-optimize' of https://github.com/haohao0103/…
haohao0103 Nov 7, 2022
cc4130c
Pin HBase version to 2.2.3
haohao0103 Nov 7, 2022
2fd7d1e
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 7, 2022
5e1e9bb
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 7, 2022
f0d391a
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
2422c3f
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
b9fb58c
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
077b315
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
e73944f
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
4491db7
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
2bc568b
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
286da97
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
1d5fe80
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
b5a9a2e
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
32035f7
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
6629019
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 8, 2022
b31b355
feature: spark-loader bypass server for hbase writing hugegraph-loader
haohao0103 Nov 9, 2022
831ad71
Merge branch 'apache:master' into schemaCache-optimize
haohao0103 Nov 9, 2022
78efb78
adjust code style LoadOptions
haohao0103 Nov 9, 2022
84ad319
Merge remote-tracking branch 'origin/schemaCache-optimize' into schem…
haohao0103 Nov 9, 2022
8c8d4d5
Merge branch 'master' into schemaCache-optimize
imbajin Nov 9, 2022
6f66634
adopt the apache commons v1.0
imbajin Nov 9, 2022
dd7e290
adjust code style LoadOptions,SinkToHBase,HBaseDirectLoader...
haohao0103 Nov 9, 2022
62a2e66
adjust code style HugrGraphSparkLoader,MappingUtil
haohao0103 Nov 9, 2022
e3c8a90
MappingConverterTest add backendstoreinfo,fix hugegraph-loader-ci che…
haohao0103 Nov 10, 2022
27 changes: 27 additions & 0 deletions hugegraph-dist/scripts/dependency/known-dependencies.txt
@@ -179,6 +179,24 @@ hadoop-yarn-client-3.1.1.jar
hadoop-yarn-common-3.1.1.jar
hamcrest-2.1.jar
hamcrest-core-1.3.jar
hbase-client-2.2.3.jar
hbase-common-2.2.3.jar
hbase-hadoop-compat-2.2.3.jar
hbase-hadoop2-compat-2.2.3.jar
hbase-http-2.2.3.jar
hbase-mapreduce-2.2.3.jar
hbase-metrics-2.2.3.jar
hbase-metrics-api-2.2.3.jar
hbase-procedure-2.2.3.jar
hbase-protocol-2.2.3.jar
hbase-protocol-shaded-2.2.3.jar
hbase-replication-2.2.3.jar
hbase-server-2.2.3.jar
hbase-shaded-client-byo-hadoop-2.2.3.jar
hbase-shaded-miscellaneous-2.2.1.jar
hbase-shaded-netty-2.2.1.jar
hbase-shaded-protobuf-2.2.1.jar
hbase-zookeeper-2.2.3.jar
hibernate-validator-6.0.17.Final.jar
hive-classification-3.1.2.jar
hive-classification-3.1.3.jar
@@ -225,6 +243,7 @@ hk2-utils-3.0.1.jar
hppc-0.7.2.jar
htrace-core4-4.0.1-incubating.jar
htrace-core4-4.1.0-incubating.jar
htrace-core4-4.2.0-incubating.jar
httpclient-4.5.13.jar
httpclient-4.5.2.jar
httpclient-4.5.9.jar
@@ -578,3 +597,11 @@ zookeeper-3.4.9.jar
zookeeper-3.6.2.jar
zookeeper-jute-3.6.2.jar
zstd-jni-1.5.0-4.jar
disruptor-3.3.6.jar
findbugs-annotations-1.3.9-1.jar
jamon-runtime-2.4.1.jar
javax.el-3.0.1-b12.jar
javax.servlet.jsp-2.3.2.jar
javax.servlet.jsp-api-2.3.1.jar
jcodings-1.0.18.jar
joni-2.1.11.jar
2 changes: 1 addition & 1 deletion hugegraph-loader/assembly/static/bin/get-params.sh
@@ -27,7 +27,7 @@ function get_params() {
--incremental-mode | --failure-mode | --batch-insert-threads | --single-insert-threads | \
--max-conn | --max-conn-per-route | --batch-size | --max-parse-errors | --max-insert-errors | \
--timeout | --shutdown-timeout | --retry-times | --retry-interval | --check-vertex | \
--print-progress | --dry-run | --help)
--print-progress | --dry-run | --sink-type | --vertex-partitions | --edge-partitions | --help )
HUGEGRAPH_PARAMS="$HUGEGRAPH_PARAMS $1 $2"
shift 2
;;
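The hunk above lets get-params.sh forward three new options, --sink-type, --vertex-partitions and --edge-partitions, to the loader along with their values. As a rough sketch of the receiving side (hugegraph-loader declares its options with JCommander; the field names, defaults and descriptions below are assumptions for illustration, not the PR's exact code):

    import com.beust.jcommander.Parameter;

    public class LoadOptions {
        // Assumed semantics: true = load through the HugeGraph server API,
        // false = bypass the server and generate HBase HFiles directly
        @Parameter(names = {"--sink-type"}, arity = 1,
                   description = "Load through the server (true) or " +
                                 "write HFiles directly (false)")
        public boolean sinkType = true;

        // Spark partition counts for the vertex/edge HFile generation stages
        @Parameter(names = {"--vertex-partitions"}, arity = 1,
                   description = "Partitions used when generating vertex HFiles")
        public int vertexPartitions = 10;

        @Parameter(names = {"--edge-partitions"}, arity = 1,
                   description = "Partitions used when generating edge HFiles")
        public int edgePartitions = 30;
    }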
@@ -0,0 +1,4 @@
{"source_name": "marko", "target_id": 1, "date": "2017-12-10", "weight": 0.4}
{"source_name": "josh", "target_id": 1, "date": "2009-11-11", "weight": 0.4}
{"source_name": "josh", "target_id": 2, "date": "2017-12-10", "weight": 1.0}
{"source_name": "peter", "target_id": 1, "date": "2017-03-24", "weight": 0.2}
@@ -0,0 +1,2 @@
{"source_name": "marko", "target_name": "vadas", "date": "20160110", "weight": 0.5}
{"source_name": "marko", "target_name": "josh", "date": "20130220", "weight": 1.0}
33 changes: 33 additions & 0 deletions hugegraph-loader/assembly/static/example/spark/schema.groovy
@@ -0,0 +1,33 @@
// Define schema
schema.propertyKey("name").asText().ifNotExist().create();
schema.propertyKey("age").asInt().ifNotExist().create();
schema.propertyKey("city").asText().ifNotExist().create();
schema.propertyKey("weight").asDouble().ifNotExist().create();
schema.propertyKey("lang").asText().ifNotExist().create();
schema.propertyKey("date").asText().ifNotExist().create();
schema.propertyKey("price").asDouble().ifNotExist().create();

schema.vertexLabel("person")
.properties("name", "age", "city")
.primaryKeys("name")
.nullableKeys("age", "city")
.ifNotExist()
.create();
schema.vertexLabel("software")
.properties("name", "lang", "price")
.primaryKeys("name")
.ifNotExist()
.create();

schema.edgeLabel("knows")
.sourceLabel("person")
.targetLabel("person")
.properties("date", "weight")
.ifNotExist()
.create();
schema.edgeLabel("created")
.sourceLabel("person")
.targetLabel("software")
.properties("date", "weight")
.ifNotExist()
.create();
57 changes: 57 additions & 0 deletions hugegraph-loader/assembly/static/example/spark/struct.json
@@ -0,0 +1,57 @@
{
"vertices": [
{
"label": "person",
"input": {
"type": "file",
"path": "example/spark/vertex_person.json",
"format": "JSON",
"header": ["name", "age", "city"],
"charset": "UTF-8",
"skipped_line": {
"regex": "(^#|^//).*"
}
},
"id": "name",
"null_values": ["NULL", "null", ""]
},
{
"label": "software",
"input": {
"type": "file",
"path": "example/spark/vertex_software.json",
"format": "JSON",
"header": ["id","name", "lang", "price","ISBN"],
"charset": "GBK"
},
"id": "name",
"ignored": ["ISBN"]
}
],
"edges": [
{
"label": "knows",
"source": ["source_name"],
"target": ["target_name"],
"input": {
"type": "file",
"path": "example/spark/edge_knows.json",
"format": "JSON",
"date_format": "yyyyMMdd",
"header": ["source_name","target_name", "date", "weight"]
},
"field_mapping": {
"source_name": "name",
"target_name": "name"
}
}
],
"backendStoreInfo":
{
"edge_tablename": "hugegraph:g_oe",
"vertex_tablename": "hugegraph:g_v",
"hbase_zookeeper_quorum": "127.0.0.1",
"hbase_zookeeper_property_clientPort": "2181",
"zookeeper_znode_parent": "/hbase"
}
}
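The new backendStoreInfo block gives the loader direct HBase coordinates (the vertex/edge table names plus the ZooKeeper quorum, client port and znode parent), so it can write to HBase without going through the HugeGraph server. A minimal sketch of turning those fields into an HBase client Configuration; the configuration keys are the standard HBase client ones, but the helper itself is illustrative, not code from this PR:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public final class BackendStoreInfoSketch {
        // Maps the struct.json fields above onto standard HBase client keys
        public static Configuration toHBaseConf(String quorum, String clientPort,
                                                String znodeParent) {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", quorum);                  // "127.0.0.1"
            conf.set("hbase.zookeeper.property.clientPort", clientPort); // "2181"
            conf.set("zookeeper.znode.parent", znodeParent);             // "/hbase"
            return conf;
        }
    }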
@@ -0,0 +1,6 @@
{"name": "marko", "age": "29", "city": "Beijing"}
{"name": "vadas", "age": "27", "city": "Hongkong"}
{"name": "josh", "age": "32", "city": "Beijing"}
{"name": "peter", "age": "35", "city": "Shanghai"}
{"name": "li,nary", "age": "26", "city": "Wu,han"}
{"name": "tom", "age": "null", "city": "NULL"}
@@ -0,0 +1,2 @@
{ "name": "lop", "lang": "java","price": "328","ISBN": "ISBN978-7-107-18618-5"}
{ "name": "ripple", "lang": "java","price": "199","ISBN": "ISBN978-7-100-13678-5"}
127 changes: 127 additions & 0 deletions hugegraph-loader/pom.xml
@@ -173,6 +173,117 @@
<version>30.0-jre</version>
</dependency>

<!--hbase -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>com.github.stephenc.findbugs</groupId>
<artifactId>findbugs-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-server</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-json</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<artifactId>hadoop-auth</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-shaded-client-byo-hadoop</artifactId>
<!--<artifactId>hbase-client</artifactId>-->
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-ipc</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-mapreduce</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<artifactId>hadoop-auth</artifactId>
<groupId>org.apache.hadoop</groupId>
</exclusion>
<exclusion>
<artifactId>javax.ws.rs-api</artifactId>
<groupId>javax.ws.rs</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-client</artifactId>
<groupId>org.glassfish.jersey.core</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-common</artifactId>
<groupId>org.glassfish.jersey.core</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-server</artifactId>
<groupId>org.glassfish.jersey.core</groupId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-distcp</artifactId>
</exclusion>
</exclusions>
</dependency>

<!-- hadoop dependency -->
<dependency>
<groupId>org.apache.hadoop</groupId>
@@ -342,10 +453,18 @@
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop-bundle</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
@@ -354,6 +473,14 @@
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>*</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-distcp</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
@@ -37,6 +37,7 @@
import com.baidu.hugegraph.structure.schema.VertexLabel;
import org.apache.hugegraph.util.E;
import com.google.common.collect.ImmutableList;
import org.apache.spark.sql.Row;

public class EdgeBuilder extends ElementBuilder<Edge> {

@@ -109,6 +110,53 @@ public List<Edge> build(String[] names, Object[] values) {
}
return edges;
}

@Override
public List<Edge> build(Row row) {
String[] names = row.schema().fieldNames();
Object[] values = new Object[row.size()];
for (int i = 0; i < row.size(); i++) {
values[i] = row.get(i);
}
if (this.vertexIdsIndex == null ||
!Arrays.equals(this.lastNames, names)) {
this.vertexIdsIndex = this.extractVertexIdsIndex(names);
}

this.lastNames = names;
EdgeKVPairs kvPairs = this.newEdgeKVPairs();
kvPairs.source.extractFromEdge(names, values,
this.vertexIdsIndex.sourceIndexes);
kvPairs.target.extractFromEdge(names, values,
this.vertexIdsIndex.targetIndexes);
kvPairs.extractProperties(names, values);

List<Vertex> sources = kvPairs.source.buildVertices(false);
List<Vertex> targets = kvPairs.target.buildVertices(false);
if (sources.isEmpty() || targets.isEmpty()) {
return ImmutableList.of();
}
E.checkArgument(sources.size() == 1 || targets.size() == 1 ||
sources.size() == targets.size(),
"The elements number of source and target must be: " +
"1 to n, n to 1, n to n");
int size = Math.max(sources.size(), targets.size());
List<Edge> edges = new ArrayList<>(size);
for (int i = 0; i < size; i++) {
Vertex source = i < sources.size() ?
sources.get(i) : sources.get(0);
Vertex target = i < targets.size() ?
targets.get(i) : targets.get(0);
Edge edge = new Edge(this.mapping.label());
edge.source(source);
edge.target(target);
// Add properties
this.addProperties(edge, kvPairs.properties);
this.checkNonNullableKeys(edge);
edges.add(edge);
}
return edges;
}

private EdgeKVPairs newEdgeKVPairs() {
EdgeKVPairs kvPairs = new EdgeKVPairs();
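The new build(Row) overload consumes a Spark SQL Row directly, so the Spark loader can hand rows to the builder without first splitting them into name/value arrays itself. Note that it calls row.schema(), so the rows must carry a schema. A hedged usage sketch: GenericRowWithSchema is the plain Spark way to attach one, and edgeBuilder stands in for an EdgeBuilder configured from a LoadContext and InputStruct as elsewhere in the loader; neither appears in this PR's tests.

    import java.util.List;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class BuildRowSketch {
        // A row matching the "knows" edge mapping in struct.json above
        static Row knowsRow() {
            StructType schema = new StructType()
                    .add("source_name", DataTypes.StringType)
                    .add("target_name", DataTypes.StringType)
                    .add("date", DataTypes.StringType)
                    .add("weight", DataTypes.DoubleType);
            return new GenericRowWithSchema(
                    new Object[]{"marko", "vadas", "20160110", 0.5}, schema);
        }
        // List<Edge> edges = edgeBuilder.build(knowsRow());
    }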
@@ -54,6 +54,7 @@
import org.apache.hugegraph.util.E;
import org.apache.hugegraph.util.LongEncoding;
import com.google.common.collect.ImmutableList;
import org.apache.spark.sql.Row;

public abstract class ElementBuilder<GE extends GraphElement> {

@@ -75,6 +76,8 @@ public ElementBuilder(LoadContext context, InputStruct struct) {

public abstract List<GE> build(String[] names, Object[] values);

public abstract List<GE> build(Row row);

public abstract SchemaLabel schemaLabel();

protected abstract Collection<String> nonNullableKeys();
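build(Row) is added as a second abstract build method, so every concrete builder (vertex and edge) now has to accept Spark rows. A hypothetical adapter, not from the PR, mirroring the unpacking that EdgeBuilder does above; a subclass with no Row-specific logic could simply unpack the Row and delegate to the existing array-based overload:

    @Override
    public List<GE> build(Row row) {
        // Unpack the Row into the (names, values) shape the old overload expects
        String[] names = row.schema().fieldNames();
        Object[] values = new Object[row.size()];
        for (int i = 0; i < row.size(); i++) {
            values[i] = row.get(i);
        }
        return this.build(names, values);
    }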