[Benchmark] Adding inference benchmark suite #4915
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4915      +/-   ##
==========================================
- Coverage   82.83%   82.82%   -0.02%
==========================================
  Files         331      331
  Lines       18047    18069      +22
==========================================
+ Hits        14949    14965      +16
- Misses       3098     3104       +6
Adding inference folder. Adding an inference benchmark for different hyperparameters and batch sizes. Datasets: reddit, ogbn-products, ogbn-mag. Models: gcn, gat, edge_conv, pna_conv, to_hetero(gat), to_hetero(graphsage), using NeighborLoader. Co-authored-by: kgajdamo, dszwicht
for more information, see https://pre-commit.ci
NeighborLoader takes a long time to initialize, and there is no need to re-create it every time the number of layers changes. So, to speed up the benchmark evaluation, it can be moved to an outer loop.
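A minimal, self-contained sketch of that hoisting (the graph and the numbers here are synthetic placeholders, not the benchmark's actual configuration):

import copy
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Tiny synthetic homogeneous graph, just to make the sketch runnable:
data = Data(x=torch.randn(100, 16),
            edge_index=torch.randint(0, 100, (2, 400)))

# Created once, outside the loop over layer counts:
subgraph_loader = NeighborLoader(
    copy.copy(data),
    num_neighbors=[-1],
    batch_size=32,
    shuffle=False,
)

for num_layers in [2, 3]:  # mirrors args.num_layers in the benchmark
    # Build the model for this depth and run inference here,
    # re-using the same loader instead of re-creating it:
    for batch in subgraph_loader:
        pass  # placeholder for the per-batch inference step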
Thank you! This looks good. I think it would be great to re-use some existing code from torch_geometric.nn.models.basic_gnn (instead of writing new models for every benchmark). Happy to add an "inference-mode" to these models directly within PyG.
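As a rough sketch of that re-use (channel sizes below are arbitrary placeholders standing in for the benchmark's params dict), the existing torch_geometric.nn.models classes can be instantiated directly instead of defining benchmark-specific models:

import torch
from torch_geometric.nn.models import GCN, GAT

# Hypothetical sizes, not taken from the benchmark:
in_channels, hidden_channels, out_channels, num_layers = 128, 64, 47, 3

gcn = GCN(in_channels, hidden_channels, num_layers, out_channels)
gat = GAT(in_channels, hidden_channels, num_layers, out_channels, heads=2)

x = torch.randn(10, in_channels)
edge_index = torch.randint(0, 10, (2, 40))
with torch.no_grad():
    out = gcn(x, edge_index)  # shape: [10, out_channels]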
@@ -169,3 +170,19 @@ def __repr__(self):
        return (f'{self.__class__.__name__}({self.in_channels}, '
                f'{self.out_channels}, towers={self.towers}, '
                f'edge_dim={self.edge_dim})')

    @staticmethod
Can we briefly test this in nn/conv/test_pna_conv?
Done
Thank you very much for the updates! Left some comments, please let us know if you have any questions.
hetero = True if dataset_name == 'ogbn-mag' else False
mask = ('paper', None) if hetero else None
degree = None

data = dataset[0].to(device)
inputs_channels = data.x_dict['paper'].size(
    -1) if hetero else dataset.num_features
Lines 29 and 33-34 seem specific to ogbn-mag (instead of hetero), and will likely break if we add more heterogeneous datasets in the future. Can we condition on if dataset_name == 'ogbn-mag' instead?
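Something along the lines of the following sketch; the attribute names simply mirror the excerpt above, and the surrounding variables (dataset_name, data, dataset) are assumed to be defined as in that excerpt:

if dataset_name == 'ogbn-mag':
    mask = ('paper', None)
    inputs_channels = data.x_dict['paper'].size(-1)
else:
    mask = None
    inputs_channels = dataset.num_features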
Good point. Done
mask = ('paper', None) if hetero else None
degree = None

data = dataset[0].to(device)
Can we return dataset[0] from get_dataset, so this logic can be handled properly for datasets that may override __getitem__?
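For instance, a hedged sketch of what that could look like, assuming a get_dataset() helper similar to the one in benchmark/inference/utils.py (the body shown here is illustrative only, not the PR's actual helper):

from torch_geometric.datasets import Reddit

def get_dataset(name, root):
    if name == 'reddit':
        dataset = Reddit(root + '/Reddit')
    else:
        raise ValueError(f'Dataset {name} not handled in this sketch')
    # Resolve indexing inside the helper, so callers also work for datasets
    # that override __getitem__:
    return dataset, dataset[0]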
Done
subgraph_loader = NeighborLoader(
    copy.copy(data),
    num_neighbors=[-1],
    input_nodes=mask,
    batch_size=batch_size,
    shuffle=False,
    num_workers=args.num_workers,
)
subgraph_loader.data.n_id = torch.arange(data.num_nodes)

for layers in args.num_layers:
    if hetero:
        subgraph_loader = NeighborLoader(
            copy.copy(data),
            num_neighbors=[args.hetero_num_neighbors] * layers,
            input_nodes=mask,
            batch_size=batch_size,
            shuffle=False,
            num_workers=args.num_workers,
        )
Not sure I understand the differences for NeighborLoader between homogeneous and heterogeneous graphs, as well as the interplay between num_neighbors and num_layers. For the purposes of this benchmark, it should be the case that len(num_neighbors) == len(num_layers) (for both homogeneous and heterogeneous graphs). If we want to be able to specify the number of neighbors for each layer, we can consolidate this logic and just have num_neighbors = args.num_neighbors. Does that make sense?
@mananshah99 Maybe I will point out a few things to make the situation clearer:
- num_layers is a benchmark parameter: a list of layer counts to benchmark. Each value is passed to the model to define how many layers to create in basic_gnn and hetero_*, so in the loop we benchmark models with different numbers of layers separately.
- Inference for basic_gnn is performed "layer-wise", while inference for the hetero models is performed "batch-wise". In the layer-wise approach we always work only on the nearest neighborhood (the NeighborLoader's num_neighbors list always has length 1). In the batch-wise approach we need to define num_neighbors as deep as the number of layers.
- The hetero_num_neighbors argument was added only because batch-wise inference for the hetero models currently takes very long; it gives the user/CI an easy way to reduce num_neighbors and run a smaller benchmark.
- Creating loaders takes time, and the loop runs many benchmarks for different setups. To save time for homogeneous workloads, where the loader does not depend on the number of layers, we create the loader before the num_layers loop; for hetero models we need to know the number of layers at the moment the NeighborLoader is created, so we create it inside the num_layers loop.
Let me know if that explains the situation and approaches.
For homogeneous graphs, we do inference via layer-wise computation. As such, I think the code is correct. We should nonetheless document this properly here to avoid confusion.
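To make the layer-wise vs. batch-wise distinction concrete, here is a hedged sketch of the two loader setups; hom_data and hetero_data are placeholder names for the homogeneous Data object (e.g. Reddit) and the ogbn-mag HeteroData object, and the numeric values are not the benchmark's defaults:

import copy
from torch_geometric.loader import NeighborLoader

num_layers = 3     # placeholder, one of args.num_layers
batch_size = 1024  # placeholder, one of args.eval_batch_sizes

# Layer-wise inference (homogeneous models): every layer is evaluated over the
# full graph one 1-hop neighborhood at a time, so a single fan-out entry of -1
# is enough regardless of model depth.
layer_wise_loader = NeighborLoader(
    copy.copy(hom_data),
    num_neighbors=[-1],
    batch_size=batch_size,
    shuffle=False,
)

# Batch-wise inference (hetero models): each mini-batch must be sampled as
# deep as the model, so num_neighbors needs one fan-out value per layer.
batch_wise_loader = NeighborLoader(
    copy.copy(hetero_data),
    num_neighbors=[args.hetero_num_neighbors] * num_layers,
    input_nodes=('paper', None),
    batch_size=batch_size,
    shuffle=False,
)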
default=['edge_conv', 'gat', 'gcn', 'pna_conv', 'rgat',
         'rgcn'], type=str)
Can we automatically obtain this from supported_sets.values()?
The idea of this default parameter is to give the user an easy way to change which benchmark suites are run, either from the CLI or by editing the script directly, which is convenient while checking things. I would personally prefer to keep it written out explicitly as a list; pulling it from supported_sets would not give that flexibility.
WDYT?
I think we would need to get the union of all values. +1
Probably okay to leave it as it is for now :)
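For reference, a hedged sketch of what deriving both defaults from supported_sets could look like; the dictionary contents below are assumed from the datasets and models listed in this PR, not copied from the actual script:

supported_sets = {
    'ogbn-mag': ['rgat', 'rgcn'],
    'ogbn-products': ['edge_conv', 'gat', 'gcn', 'pna_conv'],
    'reddit': ['edge_conv', 'gat', 'gcn', 'pna_conv'],
}

# Union of all model names across datasets:
default_models = sorted({m for models in supported_sets.values() for m in models})
# All dataset names:
default_datasets = sorted(supported_sets)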
argparser.add_argument('--datasets', nargs='+',
                       default=['ogbn-mag', 'ogbn-products',
                                'reddit'], type=str)
Can we automatically obtain this from supported_sets.keys()?
Please look into the comment above.
+1.
argparser.add_argument(
    '--hetero-num-neighbors', default=-1, type=int,
    help='number of neighbors to sample per layer for hetero workloads')
argparser.add_argument('--num-workers', default=2, type=int)
Is this a reasonable default (instead of cpu_count // 2, for example)?
In short: for now it should be good enough for NeighborLoader; we can always change it once we analyze the workloads in more detail and improve the software stack.
In more detail: data loading is tightly coupled with the model inference/training loop, so the "producing" efficiency of the dataloader interacts with the "consuming" efficiency of the model. Empirical experience shows that a given num_workers value can be optimal across a range of CPU core counts; the exact values vary and depend on things like hardware (e.g. single vs. multi socket, memory) and OS/environment settings.
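For completeness, a small sketch of the cpu_count-based alternative mentioned above; the argument name matches the excerpts in this PR, but the heuristic itself is only illustrative:

import os

# Fall back gracefully if os.cpu_count() returns None:
default_workers = max(1, (os.cpu_count() or 4) // 2)
argparser.add_argument('--num-workers', default=default_workers, type=int)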
    '--models', nargs='+',
    default=['edge_conv', 'gat', 'gcn', 'pna_conv', 'rgat',
             'rgcn'], type=str)
argparser.add_argument('--root', default='../../data', type=str)
Can we document this parameter?
Done
argparser.add_argument('--root', default='../../data', type=str)
argparser.add_argument('--eval-batch-sizes', nargs='+',
                       default=[512, 1024, 2048, 4096, 8192], type=int)
argparser.add_argument('--num-layers', nargs='+', default=[2, 3], type=int)
Same as comments above, let's unify this (along with hetero num neighbors) for both homogeneous and heterogeneous graphs.
Please look into the comment above.
I think this looks mostly great. Left a few minor comments.
benchmark/inference/utils.py
Outdated
print(f'Model {name} not supported!')

if name == 'rgat':
    model = model_type(metadata, params['hidden_channels'],
-    model = model_type(metadata, params['hidden_channels'],
+    return model_type(metadata, params['hidden_channels'],
It is recommended to return immediately. The following elif can be converted to if.
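A generic illustration of that early-return pattern (not the benchmark's actual code; the function and strings are made up):

def describe(name):
    if name == 'rgat':
        return 'relational GAT'
    if name == 'rgcn':  # a plain `if` suffices once each branch returns
        return 'relational GCN'
    return f'unknown model: {name}'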
Done
benchmark/inference/utils.py
Outdated

def get_model(name, params, metadata=None):
    try:
        model_type = models_dict[name]
-        model_type = models_dict[name]
+        model_type = models_dict.get(name, None)
to avoid the try/catch? Personal preference though, I guess.
Done
benchmark/inference/utils.py
Outdated

def get_model(name, params, metadata=None):
    try:
        model_type = models_dict[name]
-        model_type = models_dict[name]
+        Model = models_dict[name]
Can we make this uppercase so that it is clear that it returns a class?
Good point - done.
benchmark/inference/utils.py
Outdated
        kwargs['heads'] = params['num_heads']
    model = model_type(params['inputs_channels'],
                       params['hidden_channels'], params['num_layers'],
                       params['output_channels'], **kwargs)
-                       params['output_channels'], **kwargs)
+                       params['output_channels'], heads=params['num_heads'],)
I think this is cleaner than first constructing a dictionary.
Done
    shuffle=False,
    num_workers=args.num_workers,
)
subgraph_loader.data.n_id = torch.arange(data.num_nodes)
Can be dropped IMO.
Done
        shuffle=False,
        num_workers=args.num_workers,
    )
    subgraph_loader.data.n_id = torch.arange(
Drop as well?
Done
Please also add your PR to the CHANGELOG.md :)
Adding an inference benchmark for different num_layers, hidden_channels, and batch_sizes.
Datasets: reddit, ogbn-products, ogbn-mag
Models: gcn, gat, edge_conv, pna_conv, rgat as to_hetero(gat), rgcn as to_hetero(graphsage)
Loader: NeighborLoader
Modes: measure time for graph sampling + inference; pure GNN mode - prepare the batches in advance and measure only model inference time
Co-authored-by: @kgajdamo, @dszwicht