DGLError Expected data to have %d rows, got %d. occurs at large batch size #4512
Comments
So the issue seems to occur when …
It seems that the …
Surely it's a bug, right? Because …
Hi, I am also facing this problem. The seed_nodes I input contain some duplicated IDs, but I need the embeddings for these duplicates. Is there any solution for now?
I tried the following:

import torch
import dgl

src = torch.LongTensor(
    [0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10,
     1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11])
dst = torch.LongTensor(
    [1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11,
     0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10])
g = dgl.graph((src, dst))
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)

# Sample on CPU
idx = torch.LongTensor([8, 8])
src_nodes, dst_nodes, mfgs = sampler.sample_blocks(g, idx)
print(dst_nodes)                 # tensor([8, 8])
print(mfgs[-1].num_dst_nodes())  # 1
print(mfgs[-1].dstdata)          # {'_ID': tensor([8, 8])}
# Inconsistent: 2 seed nodes but only 1 destination node in the block

# Sample on GPU
device = torch.device('cuda:0')
src_nodes, dst_nodes, mfgs = sampler.sample_blocks(g.to(device), idx.to(device))
print(dst_nodes)                 # tensor([8, 8], device='cuda:0')
print(mfgs[-1].num_dst_nodes())  # 2
print(mfgs[-1].dstdata)          # {'_ID': tensor([8, 8], device='cuda:0')}
# Consistent: 2 seed nodes, 2 destination nodes
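For reference, a minimal workaround sketch for the CPU case above, reusing g, idx, and sampler from the snippet: deduplicate the seeds with torch.unique before sampling and keep the inverse indices to expand the result back to one row per original seed. This is an assumption about a workaround, not behavior promised by DGL; the hidden size and the stand-in embedding tensor are illustrative.

# Workaround sketch: deduplicate seeds before sampling, keep the inverse mapping
unique_idx, inverse = torch.unique(idx, return_inverse=True)  # unique_idx = tensor([8]), inverse = tensor([0, 0])
src_nodes, dst_nodes, mfgs = sampler.sample_blocks(g, unique_idx)
print(mfgs[-1].num_dst_nodes())  # 1, consistent with the single unique seed
# Stand-in for the per-unique-seed output of a model (hidden size 4 is arbitrary)
emb = torch.randn(mfgs[-1].num_dst_nodes(), 4)
emb_per_seed = emb[inverse]      # one row per original (duplicated) seed
print(emb_per_seed.shape)        # torch.Size([2, 4])

This gives every duplicated seed its own (identical) embedding row without relying on the sampler accepting duplicates.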
BTW, it seems that sampling on GPU still can't solve the problem of duplicated nodes in a heterograph:

import torch
import dgl
import dgl.nn.pytorch as dglnn

src = torch.LongTensor(
    [0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10,
     1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11])
dst = torch.LongTensor(
    [1, 2, 3, 3, 3, 4, 5, 5, 6, 5, 8, 6, 8, 9, 8, 11, 11, 10, 11,
     0, 0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 10])
graph_data = {
    ('user', 'plays', 'game'): (src, dst),
    ('user', 'follows', 'user'): (torch.LongTensor([0, 1, 2, 3]), torch.LongTensor([5, 6, 7, 8]))
}
g = dgl.heterograph(graph_data)
g.nodes['user'].data['h'] = torch.ones(g.num_nodes('user'), 16)
g.nodes['game'].data['h'] = torch.ones(g.num_nodes('game'), 16)
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)

uid = torch.LongTensor([0, 0, 2, 2, 4, 4])
device = torch.device('cuda:0')
src_nodes, dst_nodes, mfgs = sampler.sample_blocks(g.to(device), {'user': uid.to(device)})
print(dst_nodes)                 # {'user': tensor([0, 0, 2, 2, 4, 4], device='cuda:0')}
print(mfgs[-1].num_dst_nodes())  # 6

conv1 = dglnn.HeteroGraphConv({
    'plays': dglnn.SAGEConv(16, 32, 'gcn'),
    'follows': dglnn.SAGEConv(16, 32, 'gcn')
}, 'sum').to(device)
conv2 = dglnn.HeteroGraphConv({
    'plays': dglnn.SAGEConv(32, 32, 'gcn'),
    'follows': dglnn.SAGEConv(32, 32, 'gcn')
}, 'sum').to(device)

out = mfgs[0].srcdata['h']
print(mfgs[0].num_dst_nodes())   # 3
print(len(out['game']))          # 0
print(len(out['user']))          # 3
out = conv1(mfgs[0], out)
print(mfgs[1].num_dst_nodes())   # 6
print(len(out['game']))          # 0
print(len(out['user']))          # 3
out = conv2(mfgs[1], out)        # Error
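For reference, the same deduplicate-and-expand idea can be applied per node type to the heterograph seed dict above. This is a sketch reusing g, uid, sampler, and device from the snippet; the names seeds, unique_seeds, and inverses are illustrative, and the expected count in the comment is an assumption based on the three unique user IDs.

# Deduplicate the seed tensor of every node type and remember the inverse indices
seeds = {'user': uid.to(device)}
unique_seeds, inverses = {}, {}
for ntype, nids in seeds.items():
    unique_seeds[ntype], inverses[ntype] = torch.unique(nids, return_inverse=True)
src_nodes, dst_nodes, mfgs = sampler.sample_blocks(g.to(device), unique_seeds)
print(mfgs[-1].num_dst_nodes('user'))  # expected: 3 (one per unique user seed)
# After running conv1/conv2 on these MFGs, expand back to one row per original seed:
# out_user_per_seed = out['user'][inverses['user']]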
Sorry, we currently don't support duplicate values in the seed nodes for the sampler. We've added it to our backlog to be prioritized against other feature requests on our roadmap.
I also ran into this error on the paper100M dataset. Has this bug been fixed yet? Are there any other potential solutions?
It hasn't been solved yet. We suggest that users explicitly deduplicate the seed nodes.
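One common way to follow this suggestion in mini-batch training is to deduplicate the seed tensor before it ever reaches the sampler, for example when building the dataloader. The sketch below assumes the dgl.dataloading.DataLoader API from DGL 0.8+; train_nids and batch_size are illustrative names, not values from this issue.

# Drop duplicate seed IDs up front so every batch the sampler sees is unique
train_nids = torch.unique(train_nids)
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    batch_size=1024, shuffle=True, drop_last=False)
for input_nodes, output_nodes, blocks in dataloader:
    ...  # blocks[-1].num_dst_nodes() now matches len(output_nodes)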
🐛 Bug
DGLError('Expected data to have %d rows, got %d.') occurs at large batch_size and doesn't occur at smaller batch_size. The larger the batch_size, the larger the difference in rows. Feels like a rounding error somewhere.
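A quick way to check whether a batch is hitting the duplicate-seed case discussed in the comments, rather than a rounding issue, is to test the batch's seed tensor for duplicates before sampling. This is a sketch; batch_seeds is an illustrative name for whatever tensor of seed node IDs is handed to the sampler.

# Hypothetical check: does the current batch of seed node IDs contain duplicates?
n_total = batch_seeds.numel()
n_unique = torch.unique(batch_seeds).numel()
if n_unique != n_total:
    print(f"{n_total - n_unique} duplicated seed IDs in this batch")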
To Reproduce
Expected behavior
Shouldn't raise a DGLError.
Environment
Additional context
model:
Error stack: