feat: phase 1 of networkx/nebula engine, writer design

- nebula engine reader, algo passed
- API docs
- writer design proposed in examples

partially-implement: #28
wey-gu · Mar 26, 2023 · 2691dc4
1 parent e47972d
Showing 9 changed files with 383 additions and 75 deletions.
diff --git a/docs/API.md b/docs/API.md
@@ -81,13 +81,31 @@ df = reader.read() # this will take some time
 df.show(10)
 ```
 
+#### NebulaGraph Engine (NetworkX)
+
+```python
+from ng_ai import NebulaReader
+from ng_ai.config import NebulaGraphConfig
+# read data with nebula engine (NetworkX-based), query mode
+config_dict = {
+    "graphd_hosts": "127.0.0.1:9669",
+    "user": "root",
+    "password": "nebula",
+    "space": "basketballplayer",
+}
+config = NebulaGraphConfig(**config_dict)
+reader = NebulaReader(engine="nebula", config=config)
+reader.query(edges=["follow", "serve"], props=[["degree"],[]])
+g = reader.read()
+g.show(10)
+g.draw()
+```
+
 ## engines
 
 - `ng_ai.engines.SparkEngine` is the Spark Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter` and `ng_ai.NebulaAlgorithm`.
 
-- `ng_ai.engines.NebulaEngine` is the NebulaGraph Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter`.
-
-- `ng_ai.engines.NetworkXEngine` is the NetworkX Engine for `ng_ai.NebulaAlgorithm`.
+- `ng_ai.engines.NebulaEngine` is the NebulaGraph Engine for `ng_ai.NebulaReader`, `ng_ai.NebulaWriter` and `ng_ai.NebulaAlgorithm`, which is based on NetworkX and Nebula-Python.
 
 ## `NebulaDataFrameObject`
 

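The `reader.query(edges=..., props=...)` call documented above takes two parallel lists: the i-th entry of `props` names the properties to read for the i-th edge type, and an empty list means no properties. A minimal sketch of that positional pairing in plain Python follows; `pair_edges_with_props` is a hypothetical helper for illustration only and is not part of `ng_ai`.

```python
from typing import Dict, List


def pair_edges_with_props(
    edges: List[str], props: List[List[str]]
) -> Dict[str, List[str]]:
    """Map each edge type to the properties requested for it.

    Hypothetical helper, not an ng_ai API: it only illustrates that
    `edges` and `props` are matched by position.
    """
    if len(edges) != len(props):
        raise ValueError(
            f"edges has {len(edges)} entries but props has {len(props)}; "
            "pass one (possibly empty) property list per edge type"
        )
    return dict(zip(edges, props))


edge_props = pair_edges_with_props(["follow", "serve"], [["degree"], []])
print(edge_props)  # {'follow': ['degree'], 'serve': []}
```

Checking the lengths up front turns a silent mismatch (which `zip` would truncate) into an explicit error.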
diff --git a/examples/networkx_engine.ipynb b/examples/networkx_engine.ipynb
@@ -0,0 +1,304 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a54fe998",
+   "metadata": {},
+   "source": [
+    "![image](https://user-images.githubusercontent.com/1651790/221876073-61ef4edb-adcd-4f10-b3fc-8ddc24918ea1.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f46fdd40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# install ng_ai in the first run\n",
+    "!pip install ng_ai[networkx]"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "5b4e4143",
+   "metadata": {},
+   "source": [
+    "## AI Suite NetworkX Engine Examples\n",
+    "### Read data with NetworkX engine, query mode"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "f17abcf8",
+   "metadata": {},
+   "source": [
+    "In this example, we are leveraging the NetworkX Engine of NebulaGraph AI Suite, with the GraphD Query mode.\n",
+    "\n",
+    "#### Step 1, get the graph by querying NebulaGraph\n",
+    "\n",
+    "We first scan all edges of types `follow` and `serve`, with the `degree` property from `follow` and no properties from `serve`, and load them as the graph `g`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e158440f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ng_ai import NebulaReader\n",
+    "from ng_ai.config import NebulaGraphConfig\n",
+    "\n",
+    "# read data with nebula engine (NetworkX-based), query mode\n",
+    "config_dict = {\n",
+    "    \"graphd_hosts\": \"graphd:9669\",\n",
+    "    \"user\": \"root\",\n",
+    "    \"password\": \"nebula\",\n",
+    "    \"space\": \"basketballplayer\",\n",
+    "}\n",
+    "config = NebulaGraphConfig(**config_dict)\n",
+    "reader = NebulaReader(engine=\"nebula\", config=config)\n",
+    "reader.query(edges=[\"follow\", \"serve\"], props=[[\"degree\"], []])\n",
+    "g = reader.read()\n",
+    "g.show(10)\n",
+    "g.draw()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3617de5f",
+   "metadata": {},
+   "source": [
+    "#### Step 2, run PageRank Algorithm"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "90069aaf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pr_result = g.algo.pagerank(reset_prob=0.15, max_iter=10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "66e70ca0",
+   "metadata": {},
+   "source": [
+    "#### Step 3, check results of the algorithm\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "abbce2fa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "+---------+-------------------+\n",
+      "|      _id|           pagerank|\n",
+      "+---------+-------------------+\n",
+      "|player133|0.18601069183310504|\n",
+      "|player126|0.18601069183310504|\n",
+      "|player130|  1.240071278887367|\n",
+      "|player108|0.18601069183310504|\n",
+      "|player102| 1.6602373739502536|\n",
+      "+---------+-------------------+\n",
+      "only showing top 5 rows\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "pr_result"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "49becbdb",
+   "metadata": {},
+   "source": [
+    "#### Step 2, run Connected Components Algorithm"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cfbcda82",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cc_result = g.algo.connected_components(max_iter=10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38181d45",
+   "metadata": {},
+   "source": [
+    "#### Step 3, check results of the algorithm\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bed14375",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cc_result"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "3d088006",
+   "metadata": {},
+   "source": [
+    "### Write back algo result to NebulaGraph\n",
+    "\n",
+    "Assume that we have a result `graph_result` computed with `g.algo.pagerank()`:\n",
+    "\n",
+    "```python\n",
+    "{'player102': 0.014770646980811417,\n",
+    " 'player100': 0.02878478843123552,\n",
+    " 'player101': 0.020163880830622937,\n",
+    " 'player129': 0.012381302535422786,\n",
+    " 'player116': 0.015041184157101154,\n",
+    " 'player121': 0.012178909379871223,\n",
+    " 'player128': 0.010197889677928056,\n",
+    "...\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "Let's write these scores back to the `pagerank` property of a tag named `pagerank`. First, create the TAG `pagerank` in the same NebulaGraph space with the following schema:\n",
+    "\n",
+    "```ngql\n",
+    "CREATE TAG IF NOT EXISTS pagerank (\n",
+    "    pagerank double NOT NULL\n",
+    ");\n",
+    "```\n",
+    "\n",
+    "Then we can write the PageRank result to NebulaGraph, into tag `pagerank` with property `pagerank`:\n",
+    "\n",
+    "```python\n",
+    "properties = [\"pagerank\"]\n",
+    "```\n",
+    "Then pass it to `NebulaWriter` with the `nebula` engine and the `nebulagraph_vertex` sink:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b43261f",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Run pagerank Algorithm\n",
+    "graph_result = g.algo.pagerank()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "c5bbf9e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ng_ai import NebulaWriter\n",
+    "from ng_ai.config import NebulaGraphConfig\n",
+    "\n",
+    "config = NebulaGraphConfig()\n",
+    "writer = NebulaWriter(\n",
+    "    data=graph_result, sink=\"nebulagraph_vertex\", config=config, engine=\"nebula\"\n",
+    ")\n",
+    "\n",
+    "# properties to write\n",
+    "properties = [\"pagerank\"]\n",
+    "\n",
+    "writer.set_options(\n",
+    "    tag=\"pagerank\",\n",
+    "    vid_field=\"_id\",\n",
+    "    properties=properties,\n",
+    "    batch_size=256,\n",
+    "    write_mode=\"insert\",\n",
+    ")\n",
+    "# write back to NebulaGraph\n",
+    "writer.write()"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "9da30271",
+   "metadata": {},
+   "source": [
+    "Then we can query the result in NebulaGraph:\n",
+    "\n",
+    "```cypher\n",
+    "MATCH (v:pagerank)\n",
+    "RETURN id(v), v.pagerank.pagerank LIMIT 10;\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5bcb02e2",
+   "metadata": {},
+   "source": [
+    "## How to run other algorithm examples"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff5a866d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# lpa_result = g.algo.label_propagation()\n",
+    "# louvain_result = g.algo.louvain()\n",
+    "# k_core_result = g.algo.k_core()\n",
+    "# degree_statics_result = g.algo.degree_statics()\n",
+    "# betweenness_centrality_result = g.algo.betweenness_centrality()\n",
+    "# coefficient_centrality_result = g.algo.coefficient_centrality()\n",
+    "# bfs_result = g.algo.bfs()\n",
+    "# hanp_result = g.algo.hanp()\n",
+    "# jaccard_result = g.algo.jaccard()\n",
+    "# strong_connected_components_result = g.algo.strong_connected_components()\n",
+    "# triangle_count_result = g.algo.triangle_count()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
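The write-back cell in the notebook above passes `tag="pagerank"`, `properties=["pagerank"]`, `batch_size=256` and `write_mode="insert"` to the writer. One way such a writer could turn a PageRank result into batched nGQL statements is sketched below. This is an assumption for illustration, not the `ng_ai` or `ng_nx` implementation: `build_insert_statements` is a hypothetical helper, and it assumes string vertex IDs and a `{vid: score}` dict like the `graph_result` shown in the notebook.

```python
from typing import Dict, Iterator, List, Tuple


def build_insert_statements(
    result: Dict[str, float],
    tag: str = "pagerank",
    prop: str = "pagerank",
    batch_size: int = 256,
) -> Iterator[str]:
    """Yield one INSERT VERTEX statement per batch of vertices.

    Hypothetical helper, not part of ng_ai or ng_nx. Assumes string VIDs
    that contain no double quotes.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    items: List[Tuple[str, float]] = list(result.items())
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        values = ", ".join(f'"{vid}":({score!r})' for vid, score in batch)
        yield f"INSERT VERTEX {tag}({prop}) VALUES {values};"


# A few entries taken from the notebook's example result.
graph_result = {
    "player102": 0.014770646980811417,
    "player100": 0.02878478843123552,
    "player101": 0.020163880830622937,
}
for statement in build_insert_statements(graph_result, batch_size=2):
    print(statement)
```

Batching keeps each statement bounded in size, so a large result is sent as several moderate requests rather than one very large one.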
diff --git a/ng_ai/engines.py b/ng_ai/engines.py
@@ -121,15 +121,18 @@ def __init__(self, config=None):
         import networkx as nx
         import ng_nx
         from ng_nx import NebulaReader as NxReader
-        from ng_nx import NxScanReader, NxWriter
-        from ng_nx.utils import NxConfig, result_to_df
+        from ng_nx import NebulaScanReader as NxScanReader
+        from ng_nx import NebulaWriter as NxWriter
+        from ng_nx.utils import NebulaGraphConfig as NxConfig
+        from ng_nx.utils import result_to_df
 
         self.nx = nx
         self.ng_nx = ng_nx
         self.nx_reader = NxReader
         self.nx_writer = NxWriter
         self.nx_scan_reader = NxScanReader
         self._nx_config = NxConfig
+        self.nx_config = None
 
         self.result_to_df = result_to_df
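The hunk above keeps the `ng_nx` imports inside `__init__` and aliases the renamed classes (for example `NebulaWriter as NxWriter`), so the attribute names used by the rest of the engine stay unchanged; presumably the imports are deferred so that the optional NetworkX dependencies are only needed when this engine is instantiated. The same pattern is sketched below with a standard-library module as a stand-in so that it runs anywhere. `LazyEngine` and its attribute names are hypothetical and not part of `ng_ai`.

```python
class LazyEngine:
    """Toy engine that defers its imports until instantiation.

    Hypothetical stand-in for illustration; not the ng_ai NebulaEngine.
    """

    def __init__(self):
        # Imported here rather than at module top level, so importing this
        # file does not require the backend until an engine is created.
        import json
        # Alias the class so the attribute name below stays stable even if
        # the upstream class is renamed later.
        from json import JSONDecoder as Decoder

        self.json = json
        self.decoder = Decoder
        # Placeholder filled in later, like `self.nx_config = None` in the diff.
        self.config = None


engine = LazyEngine()
print(engine.decoder.__name__)  # JSONDecoder
```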