RuntimeError: cuda runtime error (59) : device-side assert triggered

Hi, I'm building a JetBot with the SparkFun Jetson Nano 2GB kit. The V01-00 image works, but training the model doesn't work properly, and the problem seems to be related to CUDA. These are the errors I have been getting:

1. RuntimeError: CUDA error: device-side assert triggered (on one of my attempts to run the program)
2. RuntimeError: cuda runtime error (59) : device-side assert triggered at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THC/generic/THCTensorMath.cu:16
3. RuntimeError: cuda runtime error (59) : device-side assert triggered at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THC/generic/THCTensorMath.cu:26
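CUDA kernels run asynchronously, so a device-side assert is often reported by a later operation (which is why the same assert shows up at different lines of THCTensorMath.cu) rather than by the kernel that actually failed. A common debugging step is to set the CUDA_LAUNCH_BLOCKING environment variable to 1 before CUDA is initialized, which makes kernel launches synchronous so the traceback points at the failing operation. A minimal sketch, assuming it runs in the first cell of the notebook on a freshly restarted kernel (after a device-side assert the CUDA context is unusable until the kernel restarts). It slows training down, so it is only meant for debugging:

```python
import os

# Make CUDA kernel launches synchronous so a device-side assert is raised by the
# operation that actually failed instead of a later one such as loss.backward().
# The variable is read when CUDA is initialized; setting it before importing
# torch is the safe choice.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and run the remaining notebook cells after this line
```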

This is my code:

```python
import torch
import torch.nn.functional as F
import torch.optim as optim

# model, device, train_loader, test_loader and test_dataset are assumed to be
# defined in earlier cells of the train_model notebook.

NUM_EPOCHS = 30
BEST_MODEL_PATH = 'best_model.pth'
best_accuracy = 0.0

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# for i, data in enumerate(all_dataloader):  # disabled: all_dataloader is not defined in this code

for epoch in range(NUM_EPOCHS):

    # Training pass
    for images, labels in iter(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = F.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()

    # Evaluation pass
    test_error_count = 0.0
    for images, labels in iter(test_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        test_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))

    test_accuracy = 1.0 - float(test_error_count) / float(len(test_dataset))
    print('%d: %f' % (epoch, test_accuracy))

    # Keep the best checkpoint seen so far
    if test_accuracy > best_accuracy:
        torch.save(model.state_dict(), BEST_MODEL_PATH)
        best_accuracy = test_accuracy
```
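The accuracy line in this loop counts errors as |label - prediction|, which equals the number of wrong predictions only when there are exactly two classes (labels 0 and 1). With three or more classes, a single wrong prediction can add 2 or more. A minimal pure-Python sketch comparing that formula with a plain mismatch count; the two helpers are illustrative, and with tensors the mismatch count would be something like `float((labels != outputs.argmax(1)).sum())`:

```python
def abs_diff_errors(labels, preds):
    """Error count as computed in the notebook: sum of |label - prediction|."""
    return sum(abs(y - p) for y, p in zip(labels, preds))


def mismatch_errors(labels, preds):
    """Error count valid for any number of classes: number of wrong predictions."""
    return sum(1 for y, p in zip(labels, preds) if y != p)


# Two classes: both formulas agree.
print(abs_diff_errors([0, 1, 1, 0], [1, 1, 0, 0]), mismatch_errors([0, 1, 1, 0], [1, 1, 0, 0]))  # → 2 2

# Three classes: label 0 predicted as 2 is counted twice by the abs-diff formula.
print(abs_diff_errors([0, 2, 1], [2, 2, 0]), mismatch_errors([0, 2, 1], [2, 2, 0]))  # → 3 2
```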

This is the error:

```
RuntimeError                              Traceback (most recent call last)
in
     13         outputs = model(images)
     14         loss = F.cross_entropy(outputs, labels)
---> 15         loss.backward()
     16         optimizer.step()

/usr/local/lib/python3.6/dist-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to False.
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    109     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag

RuntimeError: cuda runtime error (59) : device-side assert triggered at /media/nvidia/WD_BLUE_2.5_1TB/pytorch-v1.1.0/aten/src/THC/generic/THCTensorMath.cu:26
```
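The traceback stops in loss.backward(), but because CUDA reports errors asynchronously, the failing kernel may have been launched earlier, for example inside F.cross_entropy. On the GPU, cross_entropy triggers this kind of device-side assert when a target label is negative or not smaller than the number of model outputs (on the CPU the same mistake raises a readable error instead). A minimal sketch of a check that could run on each batch before the loss, assuming the labels are taken as a Python list with `labels.tolist()` and that the model has 2 outputs as in the JetBot collision-avoidance notebook; `check_label_range` is a hypothetical helper, not a PyTorch API:

```python
def check_label_range(labels, num_classes):
    """Raise ValueError if any label is outside [0, num_classes).

    F.cross_entropy expects every target to be a class index in this range;
    on CUDA an out-of-range target surfaces as a device-side assert.
    """
    bad = sorted({y for y in labels if not 0 <= y < num_classes})
    if bad:
        raise ValueError(
            f"labels {bad} are out of range for a model with {num_classes} outputs"
        )


NUM_OUTPUTS = 2  # assumption: the final layer of the JetBot model has 2 outputs

# Inside the training loop, before F.cross_entropy:
#   check_label_range(labels.tolist(), NUM_OUTPUTS)
check_label_range([0, 1, 1, 0], NUM_OUTPUTS)  # valid batch: no output
try:
    check_label_range([0, 2, 1], NUM_OUTPUTS)
except ValueError as err:
    print(err)  # → labels [2] are out of range for a model with 2 outputs
```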

Try this solution and see how it goes: https://github.com/NVIDIA-AI-IOT/jetbot/issues/111
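One possible cause of this assert in the JetBot training notebook is an extra folder inside dataset/. The notebook loads images with torchvision's ImageFolder, which treats every subdirectory of the root as a class (hidden directories included) and assigns labels in sorted name order. If a .ipynb_checkpoints folder, which Jupyter can create, ends up inside dataset/, there are three classes instead of two: it sorts first and takes label 0, blocked gets 1, free gets 2, and the 2-output model then hits the out-of-range assert. The sketch below only mimics that class discovery with the standard library (it is not torchvision's own code) so the folder can be inspected; the path "dataset" is an assumption based on the notebook's defaults:

```python
import os


def image_folder_classes(root):
    """Map class name -> label the way torchvision's ImageFolder does:
    every subdirectory of `root` is a class (hidden ones included), and
    labels are assigned in sorted order of the directory names."""
    with os.scandir(root) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    return {name: label for label, name in enumerate(classes)}


def hidden_class_dirs(root):
    """Hidden subdirectories (such as .ipynb_checkpoints) that become extra classes."""
    return [name for name in image_folder_classes(root) if name.startswith(".")]


# Usage in a notebook cell ("dataset" is assumed to be the notebook's dataset folder):
#   print(image_folder_classes("dataset"))
#   print(hidden_class_dirs("dataset"))
```

If the output lists anything besides blocked and free, moving that folder out of dataset/ and restarting the kernel (needed anyway after a device-side assert) before rebuilding the dataset would be the next thing to try.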

Thanks, that seems to have solved that problem, but I still can't run the following code in the train_model notebook:

```python
import torch
import torch.nn.functional as F
import torch.optim as optim

# model, device, train_loader, test_loader and test_dataset are assumed to be
# defined in earlier cells of the train_model notebook.

NUM_EPOCHS = 30
BEST_MODEL_PATH = 'best_model.pth'
best_accuracy = 0.0

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(NUM_EPOCHS):

    # Training pass
    for images, labels in iter(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = F.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()

    # Evaluation pass
    test_error_count = 0.0
    for images, labels in iter(test_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        test_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))

    test_accuracy = 1.0 - float(test_error_count) / float(len(test_dataset))
    print('%d: %f' % (epoch, test_accuracy))

    # Keep the best checkpoint seen so far
    if test_accuracy > best_accuracy:
        torch.save(model.state_dict(), BEST_MODEL_PATH)
        best_accuracy = test_accuracy
```
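If the kernel dies before the first torch.save, best_model.pth is simply not updated; if it dies while torch.save is writing, the file can be left truncated. A common pattern against the second case, sketched here as an assumption and not part of the JetBot notebook, is to write the checkpoint to a temporary file in the same directory and then move it into place with os.replace, which replaces the target atomically on POSIX filesystems. `save_atomically` is a hypothetical helper; torch.save accepts a file path, so it can be passed in as shown in the comment:

```python
import os
import tempfile


def save_atomically(write_fn, path):
    """Write a file via write_fn(tmp_path), then atomically move it to `path`.

    The temporary file lives in the same directory as `path`, so os.replace is
    a rename within one filesystem and readers never see a half-written file.
    If write_fn raises, the temporary file is removed and `path` is unchanged.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # write_fn reopens the file by name
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Possible use in the training loop (model, torch and BEST_MODEL_PATH come from the notebook):
#   save_atomically(lambda p: torch.save(model.state_dict(), p), BEST_MODEL_PATH)
```

If the process is killed outright mid-write, a stray .tmp file may be left behind, but the previous best_model.pth stays intact.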

I get one of two different errors every time I try to run this code:

Server Connection Error - A connection to the Jupyter server could not be established. JupyterLab will continue trying to reconnect. Check your network connection or Jupyter server configuration.

Kernel Restarting - The kernel for Notebooks/collision_avoidance/train_model.ipynb appears to have died. It will restart automatically.
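A kernel that dies without a Python traceback is consistent with the process being killed for running out of memory, which is easy to do on the 2GB Nano. That is a guess, not something these messages confirm. One way to check is to watch MemAvailable in /proc/meminfo (Linux only) before and during training. A minimal standard-library sketch; both helper names are hypothetical:

```python
def mem_available_mib(meminfo_text):
    """Parse the contents of /proc/meminfo and return MemAvailable in MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # /proc/meminfo reports sizes in kB (KiB)
            return kib / 1024
    raise ValueError("MemAvailable not found in meminfo text")


def read_mem_available_mib(path="/proc/meminfo"):
    """Read /proc/meminfo and return the available memory in MiB."""
    with open(path) as f:
        return mem_available_mib(f.read())


# Example, e.g. at the start of each epoch:
#   print(f"{read_mem_available_mib():.0f} MiB available")
```

If available memory drops close to zero just before the kernel restarts, a smaller batch_size in the DataLoaders, running the test loop inside torch.no_grad() so no autograd graph is kept, and shutting down the kernels of other open notebooks are the usual ways to reduce memory pressure.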

I believe this means best_model.pth (shown in the file browser on the left) doesn't get updated, which causes the following error when I try to run the live demo:

IncompatibleKeys(missing_keys=, unexpected_keys=)
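For context: in PyTorch 1.1 (the version in the error paths above), model.load_state_dict(...) returns a named tuple with missing_keys and unexpected_keys, and Jupyter displays it as the cell's output. With the default strict=True a mismatch raises a RuntimeError instead, so if this line is printed and both lists are empty, every key matched and the line on its own is not an error. The comparison it reports can be sketched with plain sets; `diff_state_dict_keys` is a hypothetical helper, and the commented usage assumes model and torch from the live-demo notebook:

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists in the load_state_dict sense:
    missing    = expected by the model but absent from the checkpoint,
    unexpected = present in the checkpoint but not used by the model."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Usage in the live-demo notebook (assumes model and torch are already set up):
#   state = torch.load("best_model.pth")
#   print(diff_state_dict_keys(model.state_dict().keys(), state.keys()))
print(diff_state_dict_keys(["features.0.weight", "classifier.6.bias"],
                           ["features.0.weight", "classifier.6.bias"]))  # → ([], [])
```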

I would appreciate any further help.

Thanks