Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses

The Elliptic++ dataset consists of 203k Bitcoin transactions and 822k wallet addresses. It supports both the detection of fraudulent transactions and the detection of illicit addresses (actors) in the Bitcoin network using graph data.

If you have any questions or create something with this dataset, please let us know by email: yelmougy3@gatech.edu.

DATASET CAN BE FOUND HERE: Google Drive

Dataset Summary

The Elliptic++ dataset contains a transactions dataset and an actors (wallet addresses) dataset.

Elliptic++ Transactions Dataset:

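Number of nodes (transactions): 203,769
Number of edges (money flow): 234,355
Number of time steps: 49
Number of illicit transactions (class-1): 4,545
Number of licit transactions (class-2): 42,019
Number of unknown transactions (class-3): 157,205
Number of features: 183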

Elliptic++ Actors (Wallet Addresses) Dataset:

Number of wallet addresses: 822,942
Number of nodes (temporal interactions): 1,268,260
Number of edges (addr-addr): 2,868,964
Number of edges (addr-tx-addr): 1,314,241
Number of time steps: 49
Number of illicit addresses (class-1): 14,266
Number of licit addresses (class-2): 251,088
Number of unknown addresses (class-3): 557,588
Number of features: 56
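
The class counts in the two summaries above imply a strong class imbalance, which is worth keeping in mind when training classifiers. As a quick check of the arithmetic, here is a minimal sketch in plain Python. The counts are copied from the summaries; the helper names `class_shares` and `illicit_share_of_labeled` are illustrative and not part of the dataset tooling.

```python
# Counts copied from the dataset summaries above.
TRANSACTIONS = {"illicit": 4_545, "licit": 42_019, "unknown": 157_205}
ACTORS = {"illicit": 14_266, "licit": 251_088, "unknown": 557_588}


def class_shares(counts):
    """Return each class's fraction of the total (hypothetical helper)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def illicit_share_of_labeled(counts):
    """Fraction of illicit samples among labeled (illicit + licit) samples."""
    labeled = counts["illicit"] + counts["licit"]
    return counts["illicit"] / labeled


if __name__ == "__main__":
    for name, counts in (("transactions", TRANSACTIONS), ("actors", ACTORS)):
        shares = {k: f"{v:.1%}" for k, v in class_shares(counts).items()}
        labeled = f"{illicit_share_of_labeled(counts):.1%}"
        print(name, sum(counts.values()), shares, "illicit among labeled:", labeled)
```

The per-class counts sum to the stated totals (203,769 transactions and 822,942 wallet addresses). In both datasets the unknown class is the majority, and illicit samples make up under 10% of the labeled data.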

Dataset Tutorials

We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks are available for both datasets and cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.

Transactions dataset statistics: overall transactions data statistics.

Actors dataset statistics: overall actors data statistics.

Transactions graph visualization: visualizations of the Money Flow Transaction graph (tx-tx graph).

Actors graph visualization (Actor Interaction): visualizations of the Actor Interaction graph (addr-addr graph).

Actors graph visualization (Address-Transaction): visualizations of the Address-Transaction graph (addr-tx-addr graph).

Transactions classification: model training and classification on the transactions data.

Actors classification: model training and classification on the actors data.

Transactions case analysis: unique case (EASY, HARD, AVERAGE) analysis using the transactions data.

Transactions feature analysis: feature importance analysis of the transactions data.

Actors feature analysis: feature importance analysis of the actors data.

Top-Level Directory Organization

The folder structure of this dataset repository is as follows:

.
├── Transactions Dataset                                    # Contains csv files and tutorial notebooks for the Elliptic++ Transactions Dataset
│   ├── txs_features.csv                                    # Feature data for all transactions
│   ├── txs_classes.csv                                     # Class data for all transactions
│   ├── txs_edgelist.csv                                    # Transaction-Transaction graph edgelist
│   ├── Elliptic++ Transactions Dataset Statistics.ipynb    # Tutorial notebook: dataset statistics
│   ├── Elliptic++ Transactions Graph Visualization.ipynb   # Tutorial notebook: transaction-transaction graph visualization
│   ├── Elliptic++ Transactions Classification.ipynb        # Tutorial notebook: model training and classification
│   ├── Elliptic++ Transactions Case Analysis.ipynb         # Tutorial notebook: unique case (EASY, HARD, AVERAGE) analysis
│   └── Elliptic++ Transactions Feature Analysis.ipynb      # Tutorial notebook: feature importance analysis
├── Actors Dataset                                          # Contains csv files and tutorial notebooks for the Elliptic++ Actors Dataset
│   ├── wallets_features.csv                                # Feature data for all actors
│   ├── wallets_classes.csv                                 # Class data for all actors
│   ├── AddrAddr_edgelist.csv                               # Address-Address graph edgelist
│   ├── AddrTx_edgelist.csv                                 # Address-Transaction graph edgelist
│   ├── TxAddr_edgelist.csv                                 # Transaction-Address graph edgelist
│   ├── Elliptic++ Actors Dataset Statistics.ipynb          # Tutorial notebook: dataset statistics
│   ├── Elliptic++ Actors ActorInteraction Graph Viz.ipynb  # Tutorial notebook: address-address graph visualization
│   ├── Elliptic++ Actors AddrTx Graph Viz.ipynb            # Tutorial notebook: address-transaction-address graph visualization
│   ├── Elliptic++ Actors Classification.ipynb              # Tutorial notebook: model training and classification
│   └── Elliptic++ Actors Feature Analysis.ipynb            # Tutorial notebook: feature importance analysis
└── README.md
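
To get started with the CSV files listed above, a minimal loading sketch using only the Python standard library might look like the following. The column names `class`, `txId1`, and `txId2` are assumptions about the file headers and should be checked against the actual CSVs. The function names are illustrative. The class labels follow the 1 = illicit, 2 = licit, 3 = unknown convention from the dataset summary.

```python
import csv
from collections import Counter, defaultdict

# Label convention taken from the dataset summary (class-1, class-2, class-3).
CLASS_NAMES = {"1": "illicit", "2": "licit", "3": "unknown"}


def count_classes(csv_file, class_column="class"):
    """Count rows per class label in an open classes CSV.

    `class_column` is an assumed header name; adjust it to the real file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(
        CLASS_NAMES.get(row[class_column], row[class_column]) for row in reader
    )


def load_edges(csv_file, src_column="txId1", dst_column="txId2"):
    """Build a directed adjacency map from an open edge-list CSV.

    The source/destination header names are assumptions.
    """
    adjacency = defaultdict(list)
    for row in csv.DictReader(csv_file):
        adjacency[row[src_column]].append(row[dst_column])
    return dict(adjacency)
```

Both functions take an open file object, for example `open("Transactions Dataset/txs_classes.csv", newline="")`, so they can also be tried on a small in-memory sample via `io.StringIO` before running on the full files.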

Citation

If you use our dataset in your work, please cite our paper.

Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3580305.3599803

For a longer version of the paper, please refer to the arXiv version (arXiv:2306.06108):

@article{elmougy2023demystifying,
  title={Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics},
  author={Elmougy, Youssef and Liu, Ling},
  journal={arXiv preprint arXiv:2306.06108},
  year={2023}
}

Acknowledgement

Released by: Youssef Elmougy, Ling Liu

School of Computer Science, Georgia Institute of Technology

If you have any questions or create something with this dataset, please let us know by email: yelmougy3@gatech.edu.
