Why does a PyTorch model perform poorly after calling model.eval()?

Did you solve this problem? I have run into the same issue recently. With model.train() I get very high performance at test time, even in the first iteration, but performance drops significantly when I use model.eval().

I also believe this is caused by the BatchNorm layer. When I leave out model.eval() in the validation phase, the validation loss is very similar to the training loss (I used MSE loss). However, when model.eval() is active, performance becomes very bad. Sometimes it even gets worse and worse as the epochs go on (quite interesting).

I encountered the same problem here: test performance gets worse and worse as the epochs go on. I've set model.train() before optimizer.zero_grad(), with the same results.

I also tried some other CNN tools, such as MatConvNet and TensorFlow. They both work fine, and with them performance is better in evaluation mode during the test phase. (I think it is related to the BatchNorm module in my network. I hoped for higher performance when using the running mean and variance, but the results are the opposite.)

Here is my network.

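import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        # initialize nn.Module first so that submodules can be registered
        super(Net, self).__init__()
        self.layers = 17
        self.layer_m = self.make_h_layers()
        self.layer_f = nn.Conv2d(64, 1, 3, padding=1)

    def make_h_layers(self):
        # first block: conv + ReLU, no batch norm
        layer_1 = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True))
        # hidden block: conv + batch norm + ReLU
        layer_m = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        layers = [layer_1]
        for i in range(self.layers - 2):
            # this appends the same layer_m instance every time, so all hidden
            # blocks share one set of weights and BatchNorm running statistics
            layers.append(layer_m)
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.layer_m(x)
        x = self.layer_f(x)
        return x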
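The gap between train mode and eval mode described above can be reproduced with a single BatchNorm layer. The following is a minimal sketch, not a diagnosis of the network above: the input data, its scale, and the number of passes are made-up values chosen only to make the effect visible. In train mode the layer normalizes with the statistics of the current batch; in eval mode it uses the running estimates, which start at mean 0 and variance 1 and move slowly (default momentum 0.1).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)  # defaults: momentum=0.1, running_mean=0, running_var=1
# Hypothetical data whose statistics are far from the initial running stats
x = torch.randn(16, 3, 8, 8) * 5.0 + 10.0

bn.train()
y_train = bn(x).detach()       # normalized with this batch's mean and variance

bn.eval()
y_eval_early = bn(x).detach()  # normalized with running stats after one update

# Let the running statistics catch up with more train-mode passes
bn.train()
with torch.no_grad():
    for _ in range(200):
        bn(x)

bn.eval()
y_eval_late = bn(x).detach()   # running stats now close to the batch stats

gap_early = (y_train - y_eval_early).abs().max().item()
gap_late = (y_train - y_eval_late).abs().max().item()
print(f"max |train - eval| after 1 update:    {gap_early:.3f}")
print(f"max |train - eval| after 201 updates: {gap_late:.3f}")
```

If eval-mode output only matches train-mode output once the running statistics have converged, then anything that keeps them from converging (few updates, very small batches, or train and test data with different statistics) is a candidate explanation worth checking.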

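If you want the model to work in eval mode, set track_running_stats to False on all batch norm layers. The intent is for these layers to normalize with the statistics of the current batch rather than the running averages: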

# model.modules() walks every submodule, however deeply nested
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.track_running_stats = False
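Whether flipping the flag alone is enough appears to depend on the PyTorch version: in recent releases a BatchNorm layer in eval mode may still use its stored running_mean and running_var buffers as long as they exist. The sketch below is one way to make the behavior explicit, under that assumption. The helper name use_batch_stats_in_eval is hypothetical, and clearing the buffers discards the accumulated statistics.

```python
import torch
import torch.nn as nn

BATCHNORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def use_batch_stats_in_eval(model: nn.Module) -> int:
    """Make BatchNorm layers normalize with batch statistics even in eval mode.

    Hypothetical helper: returns the number of layers that were changed.
    """
    changed = 0
    for module in model.modules():
        if isinstance(module, BATCHNORM_TYPES):
            module.track_running_stats = False
            # With no stored statistics, the layer falls back to batch statistics
            module.running_mean = None
            module.running_var = None
            changed += 1
    return changed


torch.manual_seed(0)
net = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU())
x = torch.randn(8, 1, 16, 16)

net.train()
y_train = net(x).detach()

n_changed = use_batch_stats_in_eval(net)
net.eval()
y_eval = net(x).detach()  # now computed from the statistics of this batch
```

This trades one problem for another: predictions then depend on the composition and size of each test batch, and a batch with a single sample has very little to normalize with. It is better treated as a diagnostic than as a fix.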

Suggestion : 2
for m in model.modules():
    if m.__class__.__name__.startswith('Dropout'):
        ...
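The loop above selects Dropout modules by class name. As an illustration only, and as an assumption about how such a loop might be used, the sketch below switches just those layers to a chosen mode while leaving every other module untouched. The helper name set_dropout_mode is hypothetical.

```python
import torch.nn as nn


def set_dropout_mode(model: nn.Module, training: bool) -> int:
    """Put only Dropout-like layers into train or eval mode.

    Hypothetical helper: returns the number of layers that were switched.
    """
    switched = 0
    for m in model.modules():
        if m.__class__.__name__.startswith('Dropout'):
            m.train(training)
            switched += 1
    return switched


toy = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5), nn.BatchNorm1d(8))

# Keep the model in train mode (BatchNorm uses batch statistics),
# but disable the randomness of Dropout.
toy.train()
n_switched = set_dropout_mode(toy, training=False)
```

Calling the helper with training=True after model.eval() does the opposite: Dropout stays active while BatchNorm uses its running statistics.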

Suggestion : 3

During and after training we need a way to evaluate our models, to make sure they are not overfitting while training and that they generalize well to unseen or real-world data. There are generally two stages of evaluation: validation and testing. To some degree they serve the same purpose, to make sure models work on real data, but they have some practical differences.

trainer.test() performs one evaluation epoch over the test set. It is separated from fit to make sure you never run on your test set until you want to.

You can still run inference on a test dataset even if the test_dataloader() method hasn't been defined within your LightningModule instance. This would be the case when your test data is not available at the time your model was declared.

Testing is usually done once we are satisfied with the training and only with the best model selected from the validation metrics.

# run full training
trainer.fit(model)

# (1) load the best checkpoint automatically (Lightning tracks this for you)
trainer.test(ckpt_path="best")

# (2) load the last available checkpoint
trainer.test(ckpt_path="last")

# (3) test using a specific checkpoint
trainer.test(ckpt_path="/path/to/my_checkpoint.ckpt")

# (4) test with an explicit model (will use this model and not load a checkpoint)
trainer.test(model)

# test several models with the same trainer
model1 = LitModel()
model2 = GANModel()

trainer = Trainer()
trainer.test(model1)
trainer.test(model2)

# load a pre-trained model from a checkpoint
model = MyLightningModule.load_from_checkpoint(
    checkpoint_path="/path/to/pytorch_checkpoint.ckpt",
    hparams_file="/path/to/experiment/version/hparams.yaml",
    map_location=None,
)

# init trainer with whatever options
trainer = Trainer(...)

# test (pass in the model)
trainer.test(model)

# setup your data loader
test_dataloader = DataLoader(...)

# test (pass in the loader)
trainer.test(dataloaders=test_dataloader)

class MyDataModule(pl.LightningDataModule):

    def test_dataloader(self):
        return DataLoader(...)

# setup your datamodule
dm = MyDataModule(...)

# test (pass in datamodule)
trainer.test(datamodule=dm)
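For comparison, one evaluation epoch over a test set can also be written in plain PyTorch. This is a minimal sketch, not what Lightning does internally: the evaluate helper, the toy linear model, and the random data are all hypothetical, and MSE is used only because it is the loss mentioned in the question.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model: nn.Module, loader: DataLoader) -> float:
    """Return the mean squared error over one full pass of `loader`."""
    criterion = nn.MSELoss(reduction="sum")
    was_training = model.training
    model.eval()                    # BatchNorm/Dropout switch to inference behavior
    total, count = 0.0, 0
    with torch.no_grad():           # no gradients are needed for evaluation
        for inputs, targets in loader:
            outputs = model(inputs)
            total += criterion(outputs, targets).item()
            count += targets.numel()
    model.train(was_training)       # restore whatever mode the caller had set
    return total / count


# Toy usage with made-up data
torch.manual_seed(0)
toy_model = nn.Linear(4, 1)
toy_data = TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
toy_loss = evaluate(toy_model, DataLoader(toy_data, batch_size=8))
print(f"test MSE: {toy_loss:.4f}")
```

Running the same function once with model.eval() and once with that line commented out, on the same loader, is a quick way to measure how much of a performance gap comes from the mode switch alone.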