Merge pull request #57 from shenyangHuang/andy
update
shenyangHuang authored Sep 27, 2023
2 parents 20ac5f0 + ccfcecd commit 6d859be
Showing 18 changed files with 207 additions and 1,261 deletions.
45 changes: 29 additions & 16 deletions README.md
@@ -1,20 +1,20 @@
<!-- # TGB -->
![TGB logo](imgs/logo.png)

+**Temporal Graph Benchmark for Machine Learning on Temporal Graphs** (NeurIPS 2023 Datasets and Benchmarks Track)
<h4>
<a href="https://arxiv.org/abs/2307.01026"><img src="https://img.shields.io/badge/arXiv-pdf-yellowgreen"></a>
<a href="https://pypi.org/project/py-tgb/"><img src="https://img.shields.io/pypi/v/py-tgb.svg?color=brightgreen"></a>
<a href="https://tgb.complexdatalab.com/"><img src="https://img.shields.io/badge/website-blue"></a>
<a href="https://docs.tgb.complexdatalab.com/"><img src="https://img.shields.io/badge/docs-orange"></a>
</h4>
-Temporal Graph Benchmark for Machine Learning on Temporal Graphs
-</h4>


Overview of the Temporal Graph Benchmark (TGB) pipeline:
-- TGB includes large-scale and realistic datasets from five different domains with both dynamic link prediction and node property prediction tasks
+- TGB includes large-scale and realistic datasets from five different domains with both dynamic link prediction and node property prediction tasks.
- TGB automatically downloads datasets and processes them into `numpy`, `PyTorch` and `PyG compatible TemporalData` formats.
- Novel TG models can be easily evaluated on TGB datasets via reproducible and realistic evaluation protocols.
-- TGB provides public and online leaderboards to track recent developments in temporal graph learning domain
+- TGB provides public and online leaderboards to track recent developments in the temporal graph learning domain.

![TGB dataloading and evaluation pipeline](imgs/pipeline.png)
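
To make the loading step concrete, here is a minimal sketch following the `numpy`-based pattern used in the EdgeBank example scripts further down in this diff (the import path is assumed from the `py-tgb` package layout; `tgbl-wiki` stands in for any dataset name):

```python
# Minimal sketch of TGB's numpy-based data loading, mirroring the
# EdgeBank example scripts in this commit.
from tgb.linkproppred.dataset import LinkPropPredDataset

dataset = LinkPropPredDataset(name="tgbl-wiki", root="datasets", preprocess=True)
data = dataset.full_data         # dict of numpy arrays: 'sources', 'destinations', 'timestamps', ...
metric = dataset.eval_metric     # the official evaluation metric for this dataset
train_mask = dataset.train_mask  # chronological train/val/test split masks
```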

@@ -24,22 +24,23 @@ Overview of the Temporal Graph Benchmark (TGB) pipeline:

### Announcements

-**Please update to version `0.8.0`**
+**Excited to announce that TGB has been accepted to the NeurIPS 2023 Datasets and Benchmarks Track!**

#### version `0.8.0`
Thanks to everyone for your help in improving TGB! We will continue to improve TGB based on your feedback and suggestions.

Fixed a metric computation issue in the node property prediction task; the `tgbn` leaderboard results have been updated to reflect the change.
Please refer to the `examples/nodeproppred/` example folders for how to compute the metric correctly. There are no changes to the `linkproppred` datasets.
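
As a rough sketch of the evaluation pattern those folders demonstrate (the module path, metric key, and input-dict keys below are assumptions based on the example scripts, not confirmed by this diff):

```python
import numpy as np

# Sketch of the node-property-prediction evaluation pattern; the module
# path and input-dict keys are assumptions taken from the
# examples/nodeproppred/ scripts -- check those scripts for the exact API.
from tgb.nodeproppred.evaluate import Evaluator

evaluator = Evaluator(name="tgbn-trade")  # stand-in dataset name
y_true = np.random.rand(16, 255)          # dummy per-node label vectors
y_pred = np.random.rand(16, 255)          # dummy model predictions
input_dict = {
    "y_true": y_true,
    "y_pred": y_pred,
    "eval_metric": ["ndcg"],              # the tgbn metric (NDCG@10), assumed key
}
score = evaluator.eval(input_dict)["ndcg"]
print(f"NDCG: {score:.4f}")
```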

+**Please update to version `0.9.0`**

-#### version `0.7.5`
+#### version `0.9.0`

-the negative samples for the `tgbl-wiki` and `tgbl-review` dataset has been updated and redownload of the dataset would be needed (will be prompted automatically in this version when you use the dataloader)
+Added the large `tgbn-token` dataset with 72 million edges to the `nodeproppred` datasets.

+Fixed errors in `tgbl-coin` and `tgbl-flight` where a small set of edges were not sorted chronologically. Please update these datasets to version 2 (you will be prompted in the terminal).


### Pip Install

-You can install TGB via [pip](https://pypi.org/project/py-tgb/)
+You can install TGB via [pip](https://pypi.org/project/py-tgb/). **Requires Python >= 3.9.**
```
pip install py-tgb
```
@@ -63,6 +64,22 @@ if the website is inaccessible, please use [this link](https://tgb-website.pages.dev/) instead
- For the dynamic node property prediction task, see the [`examples/nodeproppred`](https://github.com/shenyangHuang/TGB/tree/main/examples/nodeproppred) folder for example scripts to run TGN, DyRep and EdgeBank on TGB datasets.
- For all other baselines, please see the [TGB_Baselines](https://github.com/fpour/TGB_Baselines) repo.

+### Acknowledgments
+We thank the [OGB](https://ogb.stanford.edu/) team for their support throughout this project and for sharing their website code, which was used to build the [TGB website](https://tgb.complexdatalab.com/).


+### Citation
+
+If code or data from this repo is useful for your project, please consider citing our paper:
+```
+@article{huang2023temporal,
+  title={Temporal graph benchmark for machine learning on temporal graphs},
+  author={Huang, Shenyang and Poursafaei, Farimah and Danovitch, Jacob and Fey, Matthias and Hu, Weihua and Rossi, Emanuele and Leskovec, Jure and Bronstein, Michael and Rabusseau, Guillaume and Rabbany, Reihaneh},
+  journal={Advances in Neural Information Processing Systems},
+  year={2023}
+}
+```
<!--
### Install dependency
Our implementation works with python >= 3.9 and can be installed as follows
@@ -152,8 +169,4 @@ torch-sparse==0.6.17
torch-spline-conv==1.2.2
pandas==1.5.3
clint==0.5.1
-```
-
-
-### Acknowledgments
-We thank the [OGB](https://ogb.stanford.edu/) team for their support throughout this project and sharing their website code for the construction of [TGB website](https://tgb.complexdatalab.com/).
+``` -->
5 changes: 5 additions & 0 deletions examples/linkproppred/tgbl-coin/edgebank.py
@@ -126,6 +126,11 @@ def get_args():
hist_dst = np.concatenate([data['destinations'][train_mask]])
hist_ts = np.concatenate([data['timestamps'][train_mask]])


+# #! check if edges are sorted
+# sorted = np.all(np.diff(data['timestamps']) >= 0)
+# print(" INFO: Edges are sorted: ", sorted)

# Set EdgeBank with memory updater
edgebank = EdgeBankPredictor(
hist_src,
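
For reference, the commented-out check above can be run standalone to verify chronological ordering; a small sketch reusing the loader from this script (`is_sorted` avoids shadowing Python's built-in `sorted`):

```python
import numpy as np
from tgb.linkproppred.dataset import LinkPropPredDataset

# Verify that edge timestamps are non-decreasing (chronologically sorted);
# this is the property that the tgbl-coin / tgbl-flight v2 fix restores.
dataset = LinkPropPredDataset(name="tgbl-coin", root="datasets", preprocess=True)
ts = dataset.full_data["timestamps"]
is_sorted = bool(np.all(np.diff(ts) >= 0))
print(" INFO: Edges are sorted:", is_sorted)
```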
37 changes: 32 additions & 5 deletions examples/linkproppred/tgbl-flight/dyrep.py
@@ -314,21 +314,42 @@ def test(loader, neg_sampler, split_mode):
dataset.load_val_ns()

val_perf_list = []
+train_times_l, val_times_l = [], []
+free_mem_l, total_mem_l, used_mem_l = [], [], []
start_train_val = timeit.default_timer()
for epoch in range(1, NUM_EPOCH + 1):
    # training
    start_epoch_train = timeit.default_timer()
    loss = train()
+    end_epoch_train = timeit.default_timer()
    print(
-        f"Epoch: {epoch:02d}, Loss: {loss:.4f}, Training elapsed Time (s): {timeit.default_timer() - start_epoch_train: .4f}"
+        f"Epoch: {epoch:02d}, Loss: {loss:.4f}, Training elapsed Time (s): {end_epoch_train - start_epoch_train: .4f}"
    )
+    # checking GPU memory usage
+    free_mem, used_mem, total_mem = 0, 0, 0
+    if torch.cuda.is_available():
+        print("DEBUG: device: {}".format(torch.cuda.get_device_name(0)))
+        free_mem, total_mem = torch.cuda.mem_get_info()
+        used_mem = total_mem - free_mem
+        print("------------Epoch {}: GPU memory usage-----------".format(epoch))
+        print("Free memory: {}".format(free_mem))
+        print("Total available memory: {}".format(total_mem))
+        print("Used memory: {}".format(used_mem))
+        print("--------------------------------------------")
+
+    train_times_l.append(end_epoch_train - start_epoch_train)
+    free_mem_l.append(float((free_mem * 1.0) / 2**30))  # in GB
+    used_mem_l.append(float((used_mem * 1.0) / 2**30))  # in GB
+    total_mem_l.append(float((total_mem * 1.0) / 2**30))  # in GB

    # validation
    start_val = timeit.default_timer()
    perf_metric_val = test(val_loader, neg_sampler, split_mode="val")
+    end_val = timeit.default_timer()
    print(f"\tValidation {metric}: {perf_metric_val: .4f}")
-    print(f"\tValidation: Elapsed time (s): {timeit.default_timer() - start_val: .4f}")
+    print(f"\tValidation: Elapsed time (s): {end_val - start_val: .4f}")
    val_perf_list.append(perf_metric_val)
+    val_times_l.append(end_val - start_val)

    # check for early stopping
    if early_stopper.step_check(perf_metric_val, model):
@@ -353,14 +374,20 @@ def test(loader, neg_sampler, split_mode):
test_time = timeit.default_timer() - start_test
print(f"\tTest: Elapsed Time (s): {test_time: .4f}")

-save_results({'model': MODEL_NAME,
-              'data': DATA,
+save_results({'data': DATA,
+              'model': MODEL_NAME,
              'run': run_idx,
              'seed': SEED,
+              'train_times': train_times_l,
+              'free_mem': free_mem_l,
+              'total_mem': total_mem_l,
+              'used_mem': used_mem_l,
+              'max_used_mem': max(used_mem_l),
+              'val_times': val_times_l,
              f'val {metric}': val_perf_list,
              f'test {metric}': perf_metric_test,
              'test_time': test_time,
-              'tot_train_val_time': train_val_time
+              'train_val_total_time': np.sum(np.array(train_times_l)) + np.sum(np.array(val_times_l)),
              },
             results_filename)

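The GPU bookkeeping added here to `dyrep.py` (and below to `tgn.py`) boils down to the following standalone sketch; `torch.cuda.mem_get_info()` returns a `(free, total)` tuple in bytes for the current CUDA device:

```python
import torch

# Per-epoch GPU memory snapshot, as in the training loops above;
# dividing by 2**30 converts bytes to GiB.
if torch.cuda.is_available():
    free_mem, total_mem = torch.cuda.mem_get_info()
    used_mem = total_mem - free_mem
    print(f"Used GPU memory: {used_mem / 2**30:.2f} GiB of {total_mem / 2**30:.2f} GiB")
```
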
1 change: 1 addition & 0 deletions examples/linkproppred/tgbl-flight/edgebank.py
@@ -121,6 +121,7 @@ def get_args():
val_mask = dataset.val_mask
test_mask = dataset.test_mask


# data for memory in edgebank
hist_src = np.concatenate([data['sources'][train_mask]])
hist_dst = np.concatenate([data['destinations'][train_mask]])
37 changes: 32 additions & 5 deletions examples/linkproppred/tgbl-flight/tgn.py
@@ -303,21 +303,42 @@ def test(loader, neg_sampler, split_mode):
dataset.load_val_ns()

val_perf_list = []
+train_times_l, val_times_l = [], []
+free_mem_l, total_mem_l, used_mem_l = [], [], []
start_train_val = timeit.default_timer()
for epoch in range(1, NUM_EPOCH + 1):
    # training
    start_epoch_train = timeit.default_timer()
    loss = train()
+    end_epoch_train = timeit.default_timer()
    print(
-        f"Epoch: {epoch:02d}, Loss: {loss:.4f}, Training elapsed Time (s): {timeit.default_timer() - start_epoch_train: .4f}"
+        f"Epoch: {epoch:02d}, Loss: {loss:.4f}, Training elapsed Time (s): {end_epoch_train - start_epoch_train: .4f}"
    )
+    # checking GPU memory usage
+    free_mem, used_mem, total_mem = 0, 0, 0
+    if torch.cuda.is_available():
+        print("DEBUG: device: {}".format(torch.cuda.get_device_name(0)))
+        free_mem, total_mem = torch.cuda.mem_get_info()
+        used_mem = total_mem - free_mem
+        print("------------Epoch {}: GPU memory usage-----------".format(epoch))
+        print("Free memory: {}".format(free_mem))
+        print("Total available memory: {}".format(total_mem))
+        print("Used memory: {}".format(used_mem))
+        print("--------------------------------------------")
+
+    train_times_l.append(end_epoch_train - start_epoch_train)
+    free_mem_l.append(float((free_mem * 1.0) / 2**30))  # in GB
+    used_mem_l.append(float((used_mem * 1.0) / 2**30))  # in GB
+    total_mem_l.append(float((total_mem * 1.0) / 2**30))  # in GB

    # validation
    start_val = timeit.default_timer()
    perf_metric_val = test(val_loader, neg_sampler, split_mode="val")
+    end_val = timeit.default_timer()
    print(f"\tValidation {metric}: {perf_metric_val: .4f}")
-    print(f"\tValidation: Elapsed time (s): {timeit.default_timer() - start_val: .4f}")
+    print(f"\tValidation: Elapsed time (s): {end_val - start_val: .4f}")
    val_perf_list.append(perf_metric_val)
+    val_times_l.append(end_val - start_val)

    # check for early stopping
    if early_stopper.step_check(perf_metric_val, model):
@@ -342,14 +363,20 @@ def test(loader, neg_sampler, split_mode):
test_time = timeit.default_timer() - start_test
print(f"\tTest: Elapsed Time (s): {test_time: .4f}")

-save_results({'model': MODEL_NAME,
-              'data': DATA,
+save_results({'data': DATA,
+              'model': MODEL_NAME,
              'run': run_idx,
              'seed': SEED,
+              'train_times': train_times_l,
+              'free_mem': free_mem_l,
+              'total_mem': total_mem_l,
+              'used_mem': used_mem_l,
+              'max_used_mem': max(used_mem_l),
+              'val_times': val_times_l,
              f'val {metric}': val_perf_list,
              f'test {metric}': perf_metric_test,
              'test_time': test_time,
-              'tot_train_val_time': train_val_time
+              'train_val_total_time': np.sum(np.array(train_times_l)) + np.sum(np.array(val_times_l)),
              },
             results_filename)

3 changes: 3 additions & 0 deletions examples/linkproppred/tgbl-review/edgebank.py
@@ -111,11 +111,14 @@ def get_args():

MODEL_NAME = 'EdgeBank'



# data loading with `numpy`
dataset = LinkPropPredDataset(name=DATA, root="datasets", preprocess=True)
data = dataset.full_data
metric = dataset.eval_metric


# get masks
train_mask = dataset.train_mask
val_mask = dataset.val_mask
Expand Down
