Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: writer with spark engine and nebula sink #16

Merged
merged 1 commit into from
Mar 1, 2023

Conversation

wey-gu
Copy link
Owner

@wey-gu wey-gu commented Mar 1, 2023

  • API docs polished
  • writer added
  • 0.1.9 rc

close: #2

Doc

NebulaWriter

ngdi.NebulaWriter writes the computed or queried data to different sinks.
Supported sinks include:

  • NebulaGraph(Spark Engine, NebulaGraph Engine)
  • CSV(Spark Engine, NebulaGraph Engine)
  • S3(Spark Engine, NebulaGraph Engine), not yet implemented.

Functions

  • ngdi.NebulaWriter.options() sets the options for the sink.
  • ngdi.NebulaWriter.write() writes the data to the sink.
  • ngdi.NebulaWriter.show_options() shows the options for the sink.

Examples

Spark Engine

  • NebulaGraph sink

Assume that we have a Spark DataFrame df_result computed with df.algo.louvain() with the following schema:

df_result.printSchema()
# result:
root
 |-- _id: string (nullable = false)
 |-- louvain: string (nullable = false)

We created a TAG louvain in NebulaGraph on same space with the following schema:

CREATE TAG IF NOT EXISTS louvain (
    cluster_id string NOT NULL
);

Then, we could write the louvain result to NebulaGraph, map the column louvain to cluster_id with the following code:

from ngdi import NebulaWriter
from ngdi.config import NebulaGraphConfig

config = NebulaGraphConfig()

properties = {
    "louvain": "cluster_id"
}

writer = NebulaWriter(data=df_result, sink="nebulagraph_vertex", config=config, engine="spark")
writer.set_options(
    tag="louvain",
    vid_field="_id",
    properties=properties,
    batch_size=256,
    write_mode="insert",
)
writer.write()

Then we could query the result in NebulaGraph:

MATCH (v:louvain)
RETURN id(v), v.louvain.cluster_id LIMIT 10;

- API docs polished
- writer added
- 0.1.9 rc
@wey-gu wey-gu merged commit bebf5eb into main Mar 1, 2023
@wey-gu wey-gu deleted the writer_sparkengine_nebula_sink branch March 1, 2023 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spark engine: writer to sink to files or NebulaGraph
1 participant