This repository has been archived by the owner on May 3, 2022. It is now read-only.
Graph-Based Modeling for Anti-Gaming and Coverage Analysis #23
Labels: 2.accept (accepted, move to contracting), cat.Tools/Infrastructure (category of application: Tools/Infrastructure)
Data Analytics
Project:
Adaptive Network Modeling using Graph-Based Representations
Elevator Pitch:
Helium's Blockchain API is an effective way to view historical data stored on-chain, but the ledger-based format is less useful for feeding directly into network models. In this project, we propose to build a framework for a graph-based representation of blockchain activity, including Proof of Coverage and Token Flow. By capturing the natural adjacency between hotspots and accounts, we will be able to build machine learning models to, for instance, identify likely "gaming" behavior and predict coverage maps based on hotspot placement.
Total fiat/hnt ask:
18750 USD
Name and Address:
Please provide your legal name and a link to the submitted issue to grants@dewi.org.
This will streamline the contract process and KYC. A lack of this information will delay the contract.
Team or projects social: (optional)
LinkedIn
About the Applicant:
Evan is a graduate student with years of experience applying machine learning to messy datasets. A longtime member of the Helium Ecosystem, his team won the Grand Prize in the Hackster.io #IoTForGood contest for their predictive beehive monitoring system. He also maintains py-helium-console-client, a Python wrapper for the Console HTTP API. Evan fully embraces open source development and documents his projects in Medium publications like Towards Data Science and Better Programming.
Github (evandiewald)
Project Details:
The goal of this project is to create a dynamic, graph-based representation of the Helium Network and develop a preliminary suite of real-time analysis tools to characterize concepts like token flow, coverage mapping, and anomalous hotspot activity. Because Network Graphs natively capture the adjacency between nodes, they are widely used in a variety of applications, including search engines, social media platforms, and even biology. This data structure is also advantageous for the Helium Blockchain, which contains a number of connected elements, such as:
With this representation in place, we can leverage decades of research in graph theory to extract insights about network behavior. For example, Betweenness Centrality, which uses shortest path metrics to identify the nodes that uniquely connect disparate portions of a graph, has been used to identify Reddit communities with the most influence on pop culture. In the context of Proof of Coverage, betweenness can help us find the hotspots that - through witness paths - connect distinct neighborhoods in a city (see below).
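As a minimal sketch of the betweenness idea, the snippet below uses NetworkX on a toy witness graph. The hotspot names and witness edges are invented for illustration and are not taken from chain data.

```python
import networkx as nx

# Hypothetical witness paths: two tightly connected neighborhoods
# that are joined only through the hotspot named "bridge".
witness_edges = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),   # neighborhood A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),   # neighborhood B
    ("a3", "bridge"), ("bridge", "b1"),         # the only link between A and B
]

graph = nx.Graph()
graph.add_edges_from(witness_edges)

# For each node: the normalized fraction of shortest paths between
# all other node pairs that pass through it.
centrality = nx.betweenness_centrality(graph, normalized=True)

# Rank hotspots from most to least "connecting".
ranked = sorted(centrality.items(), key=lambda item: item[1], reverse=True)
for hotspot, score in ranked:
    print(f"{hotspot:>6}  {score:.3f}")
```

In this toy graph the "bridge" hotspot ranks highest, because every shortest path between the two neighborhoods runs through it.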
In addition to position, we can also attach relevant features to each node, such as local elevation and PoCv11 antenna characteristics, and to each edge, such as the reported RSSI of that witness path. As demonstrated in this blog post, we can use these features to train Graph Neural Networks for tasks such as anomaly detection and predictive modeling.
The interpretability of Proof of Coverage is a double-edged sword. On one hand, mining rewards incentivize productive participants to optimize network coverage through well-defined criteria for hotspot placement and configuration. On the other hand, these rules also provide convenient thresholds for malicious actors to work around. AI-based approaches, by contrast, can identify nonlinear decision boundaries that are more difficult to circumvent. They also have the benefit of real-time optimization when trained on continuously evolving datasets. While we are not proposing that such a scheme be implemented in the core consensus protocol, it may be useful for analytics, including gaming detection and predictive modeling. For example, given a certain layout of hotspots in a region, what can we expect the coverage map to look like?
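To make the role of node features concrete, here is a rough NumPy sketch of a single graph-convolution step (the normalized propagation rule commonly used in GCN-style models). It is not the proposed model: the feature values, the feature names, and the random weight matrix are all assumptions, and a real network would be trained in a framework such as PyTorch Geometric.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    degree = a_hat.sum(axis=1)                       # degree including self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(degree))
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights
    return np.maximum(propagated, 0.0)               # ReLU

# Three hotspots connected in a line by witness paths: 0 - 1 - 2.
adjacency = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Hypothetical node features: [elevation in meters, antenna gain in dBi].
features = np.array([
    [120.0, 1.2],
    [300.0, 5.8],
    [95.0, 2.3],
])

# Untrained, randomly initialized weights mapping 2 features to 4 hidden units.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 4))

embeddings = gcn_layer(adjacency, features, weights)
print(embeddings.shape)
```

Each output row mixes a hotspot's own features with those of its witness neighbors, which is what lets downstream layers reason about a hotspot in the context of its surroundings. Edge features such as RSSI could enter as edge weights in the adjacency matrix; that variant is not shown here.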
From the perspective of Helium's economics, graphs can also inherently capture concepts like token flow between wallets and exchanges, as well as hotspot ownership. While this information can be extracted from the official Helium API, by storing the data in a native graph database platform (such as the open-source ArangoDB), adjacency is expressed directly, which simplifies analytics and visualization tools.
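The token-flow idea can be sketched with an in-memory directed graph, standing in for the edges a graph database would store. The wallet names, the exchange name, and the amounts below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical payment records: (payer, payee, amount in HNT).
payments = [
    ("wallet_a", "wallet_b", 5.0),
    ("wallet_b", "exchange_x", 4.0),
    ("wallet_c", "exchange_x", 2.5),
    ("wallet_a", "wallet_c", 1.0),
]

def build_flow_graph(records):
    """Aggregate payments into a directed adjacency map: payer -> payee -> total."""
    graph = defaultdict(lambda: defaultdict(float))
    for payer, payee, amount in records:
        graph[payer][payee] += amount
    return graph

def net_flow(graph):
    """Net HNT received minus HNT sent for every account in the graph."""
    balance = defaultdict(float)
    for payer, payees in graph.items():
        for payee, amount in payees.items():
            balance[payer] -= amount
            balance[payee] += amount
    return dict(balance)

flow_graph = build_flow_graph(payments)
balances = net_flow(flow_graph)
for account, amount in sorted(balances.items()):
    print(f"{account:>10}  {amount:+.2f}")
```

Because adjacency is stored directly, questions like "which accounts feed this exchange?" become a lookup over incoming edges rather than a scan of the ledger.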
Technical Objectives:
Graph Database and Extraction Toolkit: Establish a scalable & modular pipeline for generating and storing the graphs in a database, likely ArangoDB. We will create an API with methods for common queries (e.g. get the graph for a given city), as well as an open-source Python library to transform the extracted graphs into analytics-friendly formats, such as NetworkX and PyTorch Geometric. We want to provide Helium and the community with all the tools they need to leverage the dataset in their own analysis pipelines.
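A rough sketch of what the transformation step could look like, assuming the extraction API returns node and edge documents in the ArangoDB style (`_key` on nodes, `_from` and `_to` handles of the form `collection/key` on edges). The function names, the `hotspots` collection name, the attribute names, and the sample documents are all hypothetical.

```python
import networkx as nx

def to_networkx(node_docs, edge_docs):
    """Build a directed NetworkX graph from ArangoDB-style documents."""
    graph = nx.DiGraph()
    for doc in node_docs:
        # Keep user attributes, drop system fields such as _key.
        attrs = {k: v for k, v in doc.items() if not k.startswith("_")}
        graph.add_node(doc["_key"], **attrs)
    for doc in edge_docs:
        # Handles look like "collection/key"; keep only the key.
        source = doc["_from"].split("/", 1)[1]
        target = doc["_to"].split("/", 1)[1]
        attrs = {k: v for k, v in doc.items() if not k.startswith("_")}
        graph.add_edge(source, target, **attrs)
    return graph

def city_subgraph(graph, city):
    """Return an independent copy of the subgraph of hotspots in one city."""
    nodes = [n for n, data in graph.nodes(data=True) if data.get("city") == city]
    return graph.subgraph(nodes).copy()

# Hypothetical extracted documents.
node_docs = [
    {"_key": "hs1", "city": "Pittsburgh", "elevation_m": 230},
    {"_key": "hs2", "city": "Pittsburgh", "elevation_m": 310},
    {"_key": "hs3", "city": "Cleveland", "elevation_m": 200},
]
edge_docs = [
    {"_from": "hotspots/hs1", "_to": "hotspots/hs2", "rssi": -95},
    {"_from": "hotspots/hs2", "_to": "hotspots/hs3", "rssi": -118},
]

graph = to_networkx(node_docs, edge_docs)
pittsburgh = city_subgraph(graph, "Pittsburgh")
print(pittsburgh.nodes(data=True))
print(pittsburgh.edges(data=True))
```

Keeping node and edge attributes on the NetworkX graph means the same object can later be handed to other tooling without another round trip to the database.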
Graph Development: Explore different ideas for the graphs themselves, regarding the scale and nature of the adjacency matrix. In the demo implementation, the global Helium Network was segmented on a city-to-city basis, but it may also make sense to try kRings or other, more localized representations.
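To illustrate the k-ring idea without depending on a particular library version, the sketch below computes k-rings on a generic hexagonal grid in axial coordinates. This is a stand-in for the H3 cells used on the Helium Network, not the H3 library itself; the hotspot-to-cell assignments are invented, and converting latitude and longitude into a cell is omitted.

```python
def hex_distance(a, b):
    """Grid distance between two hex cells given in axial (q, r) coordinates."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def k_ring(center, k):
    """All cells within grid distance k of the center, including the center."""
    q0, r0 = center
    cells = set()
    for dq in range(-k, k + 1):
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.add((q0 + dq, r0 + dr))
    return cells

# Hypothetical assignment of hotspots to hex cells.
hotspot_cells = {
    "hs1": (0, 0),
    "hs2": (1, -1),
    "hs3": (2, 0),
    "hs4": (5, -3),
}

# Localized graph segment: every hotspot within 2 rings of hs1.
ring = k_ring(hotspot_cells["hs1"], 2)
nearby = sorted(name for name, cell in hotspot_cells.items() if cell in ring)
print(nearby)
```

Segmenting by ring size rather than by city would let the graph scale adapt to hotspot density, at the cost of producing overlapping subgraphs that must be reconciled later.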
Feature Engineering: With the advent of PoCv11, we can also incorporate local regulations and antenna setups into the feature set, which will help us characterize regional differences and improve our ability to spot anomalous activity. A stretch goal would be to incorporate features that are not stored on-chain, but would be useful for modeling, e.g. local geography and/or elevation.
Anomaly Detection: Develop a proof-of-concept, real-time anomaly detection model. We will explore Graph Neural Network-based architectures, as well as more conventional dimensionality-reduction approaches, like PCA and t-SNE (the idea being to capture the main distribution of "nominal" hotspots, so that outliers fall somewhere outside that main cluster).
Modeling Coverage Maps: Develop a predictive model that, given a certain arrangement of hotspots, generates the expected coverage map, reward scales, and/or witness paths.
Dashboard: Finally, we will present these results to the community with a preliminary visualization tool. This would be a live dashboard indicating, for instance, how many "outlier" hotspots we are detecting at any given moment and how much HNT is being lost due to these bad actors, alongside a graph of real-time token flow. In addition to useful metrics, this should give us a sense of the scalability and stability of the ETL pipeline.
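A minimal sketch of the PCA variant of this idea, using NumPy on synthetic data: fit the principal directions of the "nominal" distribution, then score each hotspot by how poorly it is reconstructed from them. The three feature names, their distributions, and the injected outlier are assumptions made purely for illustration. t-SNE and GNN-based detectors are not shown.

```python
import numpy as np

def pca_reconstruction_error(features, n_components):
    """Per-row distance between standardized features and their PCA reconstruction."""
    standardized = (features - features.mean(axis=0)) / features.std(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    components = vt[:n_components]
    reconstructed = standardized @ components.T @ components
    return np.linalg.norm(standardized - reconstructed, axis=1)

rng = np.random.default_rng(0)
n_nominal = 200

# Hypothetical per-hotspot features for well-behaved hotspots.
witness_count = rng.normal(10.0, 2.0, n_nominal)
mean_rssi = rng.normal(-100.0, 5.0, n_nominal)
# A third, made-up feature that nominally tracks the first two.
reward_proxy = 0.1 * witness_count + 0.02 * mean_rssi + rng.normal(0.0, 0.01, n_nominal)
nominal = np.column_stack([witness_count, mean_rssi, reward_proxy])

# One synthetic "gaming" hotspot: ordinary witness stats, implausible reward proxy.
outlier = np.array([[10.0, -100.0, 1.0]])
features = np.vstack([nominal, outlier])

errors = pca_reconstruction_error(features, n_components=2)

# Rank hotspots from most to least anomalous and keep the top few for review.
suspects = np.argsort(errors)[::-1][:5]
print(suspects)
```

The injected hotspot looks unremarkable on each feature's typical range for the first two columns, but it breaks the relationship between features, which is exactly what the reconstruction error measures.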
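As a point of comparison for any learned model, a naive physics baseline can already turn a hotspot layout into a coverage grid. The sketch below uses a log-distance path-loss formula. It is not the predictive model proposed here, and every numeric parameter (transmit power, path-loss exponent, reference loss, receiver sensitivity, grid size) is an illustrative assumption rather than a measured or regulatory value.

```python
import math

def predicted_rssi_dbm(distance_m, tx_power_dbm=27.0, path_loss_exponent=3.0,
                       reference_loss_db=40.0, reference_distance_m=1.0):
    """Log-distance path loss: signal strength falls off with log10 of distance."""
    d = max(distance_m, reference_distance_m)  # avoid log of values below the reference
    loss_db = reference_loss_db + 10.0 * path_loss_exponent * math.log10(d / reference_distance_m)
    return tx_power_dbm - loss_db

def coverage_grid(hotspots, size, cell_m, sensitivity_dbm=-120.0):
    """Boolean grid: True where at least one hotspot is predicted to be receivable."""
    grid = []
    for row in range(size):
        cells = []
        for col in range(size):
            x, y = col * cell_m, row * cell_m
            best = max(predicted_rssi_dbm(math.hypot(x - hx, y - hy))
                       for hx, hy in hotspots)
            cells.append(best >= sensitivity_dbm)
        grid.append(cells)
    return grid

# Hypothetical layouts on a 10 km x 10 km area with 1 km cells (coordinates in meters).
single = coverage_grid([(0.0, 0.0)], size=10, cell_m=1000.0)
pair = coverage_grid([(0.0, 0.0), (9000.0, 9000.0)], size=10, cell_m=1000.0)

for row in pair:
    print("".join("#" if covered else "." for covered in row))
```

A learned model would be expected to beat this baseline by accounting for terrain, antenna setup, and observed witness behavior, which the formula above ignores entirely.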
Roadmap: