This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

The provided AGP_Pruner example fails with TypeError: unsupported operand type(s) for *: 'Tensor' and 'dict' #2035

Closed
yeliang2258 opened this issue Feb 11, 2020 · 10 comments

yeliang2258 commented Feb 11, 2020

The error is:
[image: screenshot of the error]
The first epoch runs, but the second epoch raises the error.

The code is from the sample main_torch_pruner.py (https://github.com/microsoft/nni/blob/master/examples/model_compress/main_torch_pruner.py).

code:

from nni.compression.torch import AGP_Pruner
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms


class Mnist(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
        self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
        self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
        self.fc2 = torch.nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        # Report progress every 100 batches.
        if batch_idx % 100 == 0:
            print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)

    print('Loss: {}  Accuracy: {}%\n'.format(
        test_loss, 100 * correct / len(test_loader.dataset)))


def main():
    torch.manual_seed(0)
    device = torch.device('cpu')

    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=True, download=True, transform=trans),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('data', train=False, transform=trans),
        batch_size=1000, shuffle=True)

    model = Mnist()
    model.to(device)

    '''you can change this to LevelPruner to implement it
    pruner = LevelPruner(configure_list)
    '''
    configure_list = [{
        'initial_sparsity': 0,
        'final_sparsity': 0.8,
        'start_epoch': 0,
        'end_epoch': 10,
        'frequency': 1,
        'op_types': ['default']
    }]

    pruner = AGP_Pruner(model, configure_list)
    model = pruner.compress()

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(10):
        pruner.update_epoch(epoch)
        print('# Epoch {} #'.format(epoch))
        train(model, device, train_loader, optimizer)
        test(model, device, test_loader)
    pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28])


if __name__ == '__main__':
    main()

Thanks!
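
For readers unfamiliar with the configure_list fields in the script above, here is a minimal sketch of the cubic sparsity schedule from the automated gradual pruning paper (Zhu and Gupta). The function name agp_target_sparsity is hypothetical, the 'frequency' field is ignored, and NNI's own implementation may round or step differently, so treat this as an illustration and not as the library's code.

```python
def agp_target_sparsity(epoch, initial_sparsity, final_sparsity, start_epoch, end_epoch):
    """Cubic AGP schedule: sparsity rises quickly at first, then levels off.

    Hypothetical helper for illustration; not NNI's implementation.
    """
    if epoch <= start_epoch:
        return initial_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    # Fraction of the pruning window that has elapsed, in (0, 1).
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Values taken from the configure_list in the issue.
schedule = [agp_target_sparsity(e, 0.0, 0.8, 0, 10) for e in range(11)]
for epoch, sparsity in enumerate(schedule):
    print('epoch {:2d}: target sparsity {:.3f}'.format(epoch, sparsity))
```

Under this schedule the target sparsity changes between epochs, which is consistent with the report that the first epoch ran and the failure only appeared in the second.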

Contributor

Cjkkkk commented Feb 11, 2020

Hi @yeliang2258, thanks for bringing up this issue! Could you try using nni v1.4? This is fixed in the latest version.

Author

yeliang2258 commented Feb 12, 2020

Hello. On CPU, AGP works normally, but on CUDA it reports an error; the error message is as follows. The code is the provided example, with cpu changed to cuda. Thanks!
[image: screenshot of the error]

Contributor

Cjkkkk commented Feb 12, 2020

Hi @yeliang2258, could you try adding model = model.to(device) after the line model = pruner.compress() and see if it works? It seems some buffers registered by the pruner are not transferred to CUDA, which causes the error. Thanks!
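
To illustrate why registration matters here: in PyTorch, Module.to() converts parameters and registered buffers, but not tensors stored as plain attributes. The sketch below uses a hypothetical MaskedLinear wrapper (not NNI's implementation) and, since a GPU may not be available, a dtype change as a stand-in for a device change; Module.to() treats both the same way for registered tensors.

```python
import torch
import torch.nn.functional as F


class MaskedLinear(torch.nn.Module):
    """Hypothetical pruning wrapper for illustration; not NNI's code."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        # Registered buffer: converted by .to() and saved in state_dict.
        self.register_buffer('mask', torch.ones_like(self.linear.weight))
        # Plain tensor attribute: ignored by .to().
        self.stale_mask = torch.ones_like(self.linear.weight)

    def forward(self, x):
        # The mask must match the weight's device and dtype for this product.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


layer = MaskedLinear(4, 2).to(torch.float64)
registered_dtype = layer.mask.dtype          # follows the module
unregistered_dtype = layer.stale_mask.dtype  # left behind
out = layer(torch.randn(3, 4, dtype=torch.float64))
print(registered_dtype, unregistered_dtype, tuple(out.shape))
```

A mask that is created after the module has already been moved, or that is kept outside the buffer registry, would stay on its original device in the same way stale_mask keeps its original dtype here.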

Author

yeliang2258 commented Feb 12, 2020

It is still not working; the error message is as follows:
[image: screenshot of the error]

The pruner.export_model() function in the example also has no effect. Thanks!

Contributor

Cjkkkk commented Feb 12, 2020

Hi @yeliang2258, could you change pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28]) into pruner.export_model('model.pth', 'mask.pth', 'model.onnx', [1, 1, 28, 28], device)?
The default device for export_model is cpu, which causes the error.
If it works, you are welcome to submit a PR for this outdated example. Thanks!
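
As a general PyTorch illustration of the device mismatch described above: a dummy input built from a shape such as [1, 1, 28, 28] has to live on the same device as the model before it can be passed through it. The helper dummy_input_for below is hypothetical and is not part of NNI; it simply reads the device from the model's parameters.

```python
import torch


def dummy_input_for(model, input_shape):
    """Hypothetical helper: random input on the same device as the model."""
    device = next(model.parameters()).device
    return torch.randn(*input_shape, device=device)


# Falls back to CPU when CUDA is not available, so the sketch runs anywhere.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Conv2d(1, 20, 5, 1).to(device)

dummy = dummy_input_for(model, [1, 1, 28, 28])
output = model(dummy)
print(dummy.device, tuple(output.shape))
```

Passing the device explicitly, as suggested for export_model, achieves the same goal by a different route.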

Author

yeliang2258 commented Feb 12, 2020

I modified it and found two problems. First, the pruner.export_model() function did not generate the corresponding files. Second, CUDA still could not be used, and the same error was reported. My torch version is 1.2.0 and my nni version is v1.4.

Contributor

Cjkkkk commented Feb 12, 2020

Hi @yeliang2258, after some debugging, it turns out there is a bug in the code that transfers buffers between devices. Since most examples place the original buffers on CUDA, the bug was not spotted. I will fix this and let you know once the fix is tested.

Author

OK, thank you very much! Also, the pruner.export_model() function in AGP does not seem to work: it does not generate the corresponding files.

Contributor

Cjkkkk commented Feb 12, 2020

Hi @yeliang2258, the files are generated in the directory from which you run the python command. For example, with python a/b/example.py, the files are in directory a. The files are generated as expected on my machine. Could you check whether the files are in another directory? Thanks!
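
One way to check where relative output names such as 'model.pth' end up: in Python, a relative path is resolved against the process's current working directory, not against the script's location. The sketch below assumes export_model writes to the relative paths exactly as given, which the thread suggests but does not confirm; the helper name output_location is hypothetical.

```python
from pathlib import Path


def output_location(filename):
    """Hypothetical helper: where a relative filename would be written."""
    return Path.cwd() / filename


for name in ('model.pth', 'mask.pth', 'model.onnx'):
    path = output_location(name)
    print('{} -> {} (exists: {})'.format(name, path, path.exists()))
```

Running this from the same shell and directory used to launch the training script shows the folder to look in.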

Member

Closing as the original problem is fixed. Thanks, @yeliang2258 and @Cjkkkk.
