model.eval() #2083
Comments
@svekars I believe
/assigntome
@JaejinCho As you mentioned, torch.no_grad() only disables gradient tracking and does not take care of dropout or batch normalization layers; that is the role of model.eval(). While model.eval() does not change the result in this particular tutorial, I would tend to agree that for beginners to learn best practices, they should use both. Do you think what is needed is updating the example with comments to clarify the use case of these two modes, and updating the example to include model.eval()?
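(For reference, a minimal sketch of why the two mechanisms cover different things; this snippet is an illustration and not part of the tutorial. model.eval() switches module behavior such as dropout and batch norm, while torch.no_grad() only disables gradient tracking; neither one does the other's job.)

```python
import torch
from torch import nn

# Toy model with a dropout layer so the training/eval mode actually matters
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))
x = torch.randn(1, 4)

with torch.no_grad():
    print(model.training)           # True  - still in training mode, dropout is still active
    print(torch.is_grad_enabled())  # False - gradient tracking is off

model.eval()
print(model.training)               # False - dropout/batchnorm now use eval behavior
print(torch.is_grad_enabled())      # True  - model.eval() does not disable gradients
print(model(x).requires_grad)       # True  - a forward pass outside no_grad still tracks gradients
```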
@JaejinCho Would the following additions to the tutorial be sufficient?

Current test loop:

```python
def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```

Addition of comments and model.eval():

```python
def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```
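(For context, a hedged usage sketch showing the proposed test_loop running end to end. The model, synthetic data, and loader below are made up for illustration and are not taken from the tutorial; the only assumption is that the test_loop definition above is already in scope.)

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny model with a dropout layer, so model.eval() actually changes behavior
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

# Synthetic evaluation data standing in for the tutorial's FashionMNIST test set
X = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
test_dataloader = DataLoader(TensorDataset(X, y), batch_size=16)

test_loop(test_dataloader, model, loss_fn)  # uses the test_loop defined above
```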
In the tutorial below, wouldn't it be better to have model.eval() for more general cases in addition to the torch.no_grad() context manager, or at least to have a brief explanation of the difference between the two? I think no_grad() does not take care of dropout or batch norm. Although not having model.eval() is fine in this tutorial, it seems generally necessary for evaluation.
tutorials/beginner_source/basics/optimization_tutorial.py, line 172 (commit 5152270)
cc @suraj813 @jerryzh168 @z-a-f @vkuzo
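(As a quick check of the claim above, a small sketch showing that torch.no_grad() alone leaves dropout active, while .eval() disables it. This snippet is an illustration, not code from the tutorial.)

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

with torch.no_grad():
    print(drop(x))   # some elements are zeroed, the rest scaled by 1/(1-p): dropout is still active

drop.eval()
with torch.no_grad():
    print(drop(x))   # identical to the input: dropout is disabled in eval mode
```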