SPI Transfer Error 4

I am using the Arduino development environment to perform a bazillion SPI transfers to an LCD display. It takes a lot of operations to transfer a pixel image to a display one pixel at a time! A fair number of the transfers are failing with the following error:

got an error on _transfer: 4

This message gets printed to my Arduino Serial connection. It’s not from my code, so I’m thinking that it must come from somewhere inside the Arduino SPI driver.

Whenever one of these errors occurs, the drawing process freezes up for a second or so, and leaves a dark spot instead of my pixel. It makes me think that there is some sort of timeout involved. I am not sure why there would be a timeout though. It’s not like SPI needs to wait for anything unless the SPI transactions are being queued up inside the driver.

Does anyone know what this error means?

I figured out that this is indeed a timeout error, but I still have no idea why I should be getting it. Here is where the message is getting generated inside the Sparkfun Arduino core for SPI:

void SPIClass::_transfer(void *buf_out, void *buf_in, size_t count)
{
  ...
  retVal32 = am_hal_iom_blocking_transfer(_handle, &iomTransfer);
  ...
  if (retVal32 != 0)
  {
    Serial.printf("got an error on _transfer: %d\n", retVal32);
  }
}

The specific error 4 is AM_HAL_STATUS_TIMEOUT in the HAL's status enum. The enumerators count up from 0, so TIMEOUT is the fifth entry:

typedef enum
{
    AM_HAL_STATUS_SUCCESS,
    AM_HAL_STATUS_FAIL,
    AM_HAL_STATUS_INVALID_HANDLE,
    AM_HAL_STATUS_IN_USE,
    AM_HAL_STATUS_TIMEOUT,
    AM_HAL_STATUS_OUT_OF_RANGE,
    AM_HAL_STATUS_INVALID_ARG,
    AM_HAL_STATUS_INVALID_OPERATION,
    AM_HAL_STATUS_MEM_ERR,
    AM_HAL_STATUS_HW_ERR,
    AM_HAL_STATUS_MODULE_SPECIFIC_START = 0x08000000,
} am_hal_status_e;
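None of the enumerators before AM_HAL_STATUS_MODULE_SPECIFIC_START has an explicit value, so the compiler numbers them from 0. That is why the 4 printed by _transfer() means AM_HAL_STATUS_TIMEOUT. The C++ sketch below restates the listing above and adds a hypothetical hal_status_name() helper (not part of the HAL) that could make such messages self-explanatory. It builds with a desktop toolchain and does not touch any hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same enumerators as the am_hal_status_e listing above. Unnumbered
// enumerators count up from 0, so TIMEOUT (the fifth entry) has value 4.
typedef enum
{
    AM_HAL_STATUS_SUCCESS,              // 0
    AM_HAL_STATUS_FAIL,                 // 1
    AM_HAL_STATUS_INVALID_HANDLE,       // 2
    AM_HAL_STATUS_IN_USE,               // 3
    AM_HAL_STATUS_TIMEOUT,              // 4  <- "got an error on _transfer: 4"
    AM_HAL_STATUS_OUT_OF_RANGE,         // 5
    AM_HAL_STATUS_INVALID_ARG,          // 6
    AM_HAL_STATUS_INVALID_OPERATION,    // 7
    AM_HAL_STATUS_MEM_ERR,              // 8
    AM_HAL_STATUS_HW_ERR,               // 9
    AM_HAL_STATUS_MODULE_SPECIFIC_START = 0x08000000,
} am_hal_status_e;

static_assert(AM_HAL_STATUS_TIMEOUT == 4, "error 4 is the HAL timeout status");

// Hypothetical helper, not part of the HAL: map the raw number printed by
// _transfer() back to the name of its enumerator.
std::string hal_status_name(uint32_t status)
{
    switch (status)
    {
        case AM_HAL_STATUS_SUCCESS:           return "SUCCESS";
        case AM_HAL_STATUS_FAIL:              return "FAIL";
        case AM_HAL_STATUS_INVALID_HANDLE:    return "INVALID_HANDLE";
        case AM_HAL_STATUS_IN_USE:            return "IN_USE";
        case AM_HAL_STATUS_TIMEOUT:           return "TIMEOUT";
        case AM_HAL_STATUS_OUT_OF_RANGE:      return "OUT_OF_RANGE";
        case AM_HAL_STATUS_INVALID_ARG:       return "INVALID_ARG";
        case AM_HAL_STATUS_INVALID_OPERATION: return "INVALID_OPERATION";
        case AM_HAL_STATUS_MEM_ERR:           return "MEM_ERR";
        case AM_HAL_STATUS_HW_ERR:            return "HW_ERR";
        default:
            // Values at or above the module-specific base belong to individual
            // HAL modules; anything else is not defined in this listing.
            return status >= static_cast<uint32_t>(AM_HAL_STATUS_MODULE_SPECIFIC_START)
                       ? "MODULE_SPECIFIC"
                       : "UNKNOWN";
    }
}
```

For example, a diagnostic like the one in _transfer() could print hal_status_name(retVal32) next to the number, so "4" would read as "TIMEOUT".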

Why should the HAL be timing out? I am sending tons of sequential SPI operations, but they are being sent via a blocking transfer mechanism, so they should not be queuing up anywhere.

OK, either I don’t know how to use the SPI interface, or the Artemis Arduino SPI implementation is broken. To replicate the issue, try out this minimal Arduino sketch:

#include <SPI.h>

void setup() {
  SPI.begin();
  Serial.begin(115200);
}

void loop() {
  uint32_t cnt=0;
  while (1) {
    cnt++;
    if ((cnt%1000)==0) Serial.println(cnt);
    SPI.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
    for (int i=0; i<4; i++) {
      SPI.transfer(i);
    }
    SPI.endTransaction();
  }
}

I don’t care about asserting SS: it is fine for the SPI data to just vanish into the ether. All I care about is that I should be able to begin a transaction, send 4 bytes, and end the transaction. Repeat forever.

Interestingly, this test fails very roughly 1 out of every 2000 times. Here is some output from a typical test run:

1000
2000
3000
4000
5000
6000
7000
got an error on _transfer: 4
8000
9000
10000
got an error on _transfer: 4
11000
got an error on _transfer: 4
12000
got an error on _transfer: 4
got an error on _transfer: 4
13000
14000
got an error on _transfer: 4
15000
got an error on _transfer: 4
got an error on _transfer: 4
16000
17000
18000
19000
got an error on _transfer: 4
20000
21000

These errors occur randomly. Here is another run:

got an error on _transfer: 4
1000
2000
got an error on _transfer: 4
3000
got an error on _transfer: 4
4000
got an error on _transfer: 4
5000
6000
7000
got an error on _transfer: 4
8000
got an error on _transfer: 4
9000
10000
got an error on _transfer: 4
11000
12000
13000
14000
15000
16000
17000
got an error on _transfer: 4
got an error on _transfer: 4
18000
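Tallying the two runs above is a quick sanity check on the "roughly 1 in 2000" figure. The first run shows 9 error lines by the 21000 mark (about 1 per 2300 loop iterations), and the second shows 9 by the 18000 mark (about 1 per 2000). The helper below is a hypothetical way to do that tally on a saved Serial log. summarize_log(), RunSummary and iterations_per_error() are illustrative names, and the result is only approximate because the counter is printed every 1000 iterations.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Summary of one captured run of the test sketch: how many error lines
// appeared and the last loop counter that was printed.
struct RunSummary
{
    uint32_t errors = 0;
    uint32_t lastCount = 0;
};

// Hypothetical log scanner: counts "got an error on _transfer" lines and
// remembers the most recent all-digit progress line (printed every 1000 loops).
RunSummary summarize_log(const std::string &log)
{
    RunSummary s;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line))
    {
        if (line.rfind("got an error on _transfer", 0) == 0)
            s.errors++;
        else if (!line.empty() && line.find_first_not_of("0123456789") == std::string::npos)
            s.lastCount = static_cast<uint32_t>(std::stoul(line));
    }
    return s;
}

// Loop iterations per observed error, e.g. 21000 / 9 for the first run above.
// Returns 0 when no errors were seen.
double iterations_per_error(const RunSummary &s)
{
    return s.errors ? static_cast<double>(s.lastCount) / s.errors : 0.0;
}
```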

From a user perspective, this is an undetectable error: the Artemis SPI driver may generate an error message to the Serial port, but the Arduino transfer() mechanisms do not support returning error information to the user’s application code, so the driver has no choice but to throw the error on the ground. Sadly, ignorance is not bliss in this case.

Finally, experimentation shows that if the inner loop transmits 4 or more bytes in the transaction, there will be errors. If I change that to 3 bytes or fewer, all the errors go away.

On a whim, I tried the SPI transfer test program on another Redboard I had lying around. Interestingly, I could not replicate the SPI errors on that second board. So I tried a third board I had made myself with a bare Artemis module on it, and let it run the SPI transfer test overnight. As of this morning, it had logged no SPI transfer errors in over 440 million transfers. That would seem to suggest that the processor on my original board misbehaves under rare circumstances. In my experience, that is a profoundly unusual occurrence, but maybe that’s what it is. I would be interested to hear if anyone else sees errors when running the same test.
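To put the overnight run in perspective: if the third board failed at the original board's rate of roughly one error per 2000 loop iterations, 440 million clean transfers would be essentially impossible. Assuming independent failures (a Poisson model, which is an assumption) and counting each of the 440 million as one loop iteration:

```latex
\lambda = \frac{4.4\times 10^{8}}{2000} = 2.2\times 10^{5}\ \text{expected errors},
\qquad
P(\text{no errors}) = e^{-\lambda} = e^{-220\,000} \approx 0
```

Even if the 440 million counts individual bytes (four per loop iteration, so 1.1 × 10^8 iterations), the expected count is still λ = 5.5 × 10^4 errors.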

Thanks for writing this up, Robin. That’s surprising behavior, to be sure. I also did not know that that debugging printf statement had snuck its way into a release.

I am totally OK with the error message! It sure beats silently having a problem with a transfer.

I agree, but I also have reservations about forcing users to see it if they do not want to. Perhaps we can add a way to route some errors to a chosen serial port… or we could extend the Arduino SPI API so that there is an accessible report of the last status. I’ve logged an issue on GitHub: https://github.com/sparkfun/Arduino_Apollo3/issues/196
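One way to read the second suggestion is to keep transfer()'s Arduino signature and record the HAL status on the side. The class below is a hypothetical sketch: StatusTrackingSpi, lastStatus() and errorCount() are invented names, not part of the SparkFun core, and not necessarily what issue #196 will lead to. The HAL call is abstracted as a callback so the idea can be exercised off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch, not the SparkFun Apollo3 SPI API: a transfer wrapper
// that keeps the HAL status of the most recent transfer so application code
// can check it, instead of the status only being printed to Serial.
class StatusTrackingSpi
{
public:
    // Stand-in for the HAL call: performs one transfer and returns a status
    // code (0 = success, 4 = timeout, following am_hal_status_e).
    using HalTransferFn = std::function<uint32_t(const void *out, void *in, size_t count)>;

    explicit StatusTrackingSpi(HalTransferFn hal) : hal_(std::move(hal)) {}

    // Arduino-style single-byte transfer: the return value is still the
    // received byte, so the status is reported through lastStatus() instead.
    uint8_t transfer(uint8_t data)
    {
        uint8_t rx = 0;
        lastStatus_ = hal_(&data, &rx, 1);
        if (lastStatus_ != 0)
            errorCount_++;
        return rx;
    }

    uint32_t lastStatus() const { return lastStatus_; }  // status of the latest transfer
    uint32_t errorCount() const { return errorCount_; }  // failed transfers so far

private:
    HalTransferFn hal_;
    uint32_t lastStatus_ = 0;
    uint32_t errorCount_ = 0;
};
```

Application code would keep calling transfer() as usual and check lastStatus() or errorCount() after, say, a batch of pixels, redrawing if anything failed.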

I can generate transfer error 4 timeout situations at will now. What follows is one way to do it.

I modified the Sparkfun SPI _transfer() method to print out whether it was being invoked to TX, RX or both. This is not critical to generating the bug, but it helps explain the test output that follows:

void SPIClass::_transfer(void *buf_out, void *buf_in, size_t count)
{
    // Debug trace: show whether this call transmits, receives, or both
    Serial.printf("_transfer(%s,%s,%u)\n", buf_out ? "TX" : "", buf_in ? "RX" : "", (unsigned)count);
...

It took a surprising amount of time to figure out that it could be generated in a trivially simple fashion:

    #include <SPI.h>

    uint8_t buffer[256];
    uint32_t testIter = 0;

    void setup() {
      Serial.begin(115200);
      SPI.begin();
      SPISettings settings = SPISettings();
      settings.clockFreq = 16000000;
      SPI.beginTransaction(settings);   // one transaction for the whole test
    }

    void loop() {
      Serial.println(testIter++);
      SPI.transfer(0);                                 // bidirectional 1-byte transfer (TX+RX)
      SPI.transferOut((void*)buffer, sizeof(buffer));  // unidirectional 256-byte write (TX only)
    }

Output:

0
_transfer(TX,RX,1)
_transfer(TX,,256)
got an error on _transfer: 4
1
_transfer(TX,RX,1)
got an error on _transfer: 4

The timeout error 4 shows up when there is a bidirectional transfer() followed by a unidirectional write transferOut().

Final Notes:

  • If I remove the bidirectional transfer(0) from the test code, the 256-byte transferOut() write runs perfectly forever.
  • If I change the bidirectional transfer(0) call to a 1-byte unidirectional transferOut(buffer, 1) call, then both transfers run perfectly forever.
  • At the moment, I do not know whether this issue is in the Sparkfun Arduino SPI driver, the Ambiq HAL, or an interaction between the two. But I’m pretty sure it’s not my code for a change :)

    Cool, I will try to keep an eye on this forum topic in case you find anything out. If you reach the point of being willing to call it a HAL issue, you could of course report it here:

    https://github.com/sparkfun/AmbiqSuiteS … dated-desc

    That’s where we are trying to improve the AmbiqSuiteSDK.

    Is there any news? I’ve run into some strange behavior:

    SPISettings settings(24000000, MSBFIRST, SPI_MODE0);
    SPIEth.begin();   // SPIEth: an SPIClass instance declared elsewhere (not shown)
    uint8_t b[256] = {0x00};
    while (1) {
      SPIEth.beginTransaction(settings);
      //Serial.println("write");
      SPIEth.transferOut((void*)b, 256);
      //Serial.println("\nread");
      //for (int i = 0; i < 100; i++) {
      SPIEth.transferIn((void*)b, 256);
      //}
      SPIEth.endTransaction();
      delay(4);
    }

    With the delay removed, every SPI send or receive fails with error 4 after about 700 transfers.

    With the delay set to 4 ms, it only takes about 40 transfers before errors start appearing. (Once they appear, the error persists for all later SPI transfers.)

    With the delay set to 10 ms, it takes only 24 transfers until the error appears…

    With the delay set to 20 ms, it was stable… puzzling.

    The timeout error comes from am_hal_flash_delay_status_check(…) in am_hal_flash.c, even though this function is already called with a 0.5 s blocking wait time (AM_HAL_IOM_MAX_BLOCKING_WAIT).
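The 0.5 s budget also helps explain how a blocking call can time out at all. A blocking HAL transfer typically waits by polling a completion flag for a bounded time, and if the hardware never reports completion within that time, the call gives up and returns a timeout status after stalling for the whole wait. (That stall is in the same ballpark as the pause of a second or so described at the top of the thread.) The sketch below is a hypothetical model of such a bounded poll, not Ambiq's code: poll_until(), the STATUS_* constants and the 1 µs-per-poll assumption are illustrative, and the register read and delay are passed in as callbacks so it runs off-target.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Status values matching am_hal_status_e listed earlier in the thread.
constexpr uint32_t STATUS_SUCCESS = 0;
constexpr uint32_t STATUS_TIMEOUT = 4;

// Hypothetical model of a bounded status poll, in the spirit of
// am_hal_flash_delay_status_check() but NOT Ambiq's implementation:
// re-read a status register until (value & mask) == expected, giving up
// after maxPolls attempts. delayOneUs() stands in for a 1 us busy-wait.
uint32_t poll_until(const std::function<uint32_t()> &readReg,
                    uint32_t mask, uint32_t expected,
                    uint32_t maxPolls,
                    const std::function<void()> &delayOneUs)
{
    for (uint32_t i = 0; i < maxPolls; i++)
    {
        if ((readReg() & mask) == expected)
            return STATUS_SUCCESS;
        delayOneUs();
    }
    // The flag never reached the expected value within the budget, so even a
    // "blocking" call returns a timeout, after spending the entire wait.
    return STATUS_TIMEOUT;
}
```

With a 0.5 s budget and 1 µs per poll, maxPolls would be about 500,000. The open question in this thread is why the IOM sometimes never raises its completion flag, which a model like this cannot answer.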

    I never bothered debugging it further after figuring out the workaround described above that resolved my specific issue.

    I did see this a couple of days ago: https://support.ambiqmicro.com/hc/en-us … e-release-

    The gist is that there is a fix to 2.4.2 that deals with problems involving blocking full-duplex SPI transfers. That was right in the area of the issues I was seeing, but it is not clear that it would cover what you are seeing. That said, Ambiq has made improvements to the SPI driver and it might be worth applying the patch at that link to see if it improves your situation. Please let us know what you find!

    Update: It’s only occurring on some Artemis MCUs. The Artemis Redboard gives the errors described above. On another Artemis module there were no errors at all.

    I used the same hal_sdk and only non-duplex SPI transfers.

    That was my experience too. However, that does not mean it is a silicon issue. It feels like a driver problem involving something like a race condition.

    Today, I saw that Ambiq released SDK 2.5.1. It claims to have the SPI fixes inside it that were available as patches before now. I tried it out, and now my projects don’t even build. Boo. It appears to me that they introduced a bug into a header file so that C++ projects won’t compile. I filed a report with Ambiq. Of course, it wouldn’t be the first time that I thought I found a bug that turned out to be my own problem :oops:

    I’ve been running into this error and getting quite frustrated. I’m still fixing a few things in my own code and am still using core 1.2.0, but the following might be useful to someone.

    Here are a few things that seem to have helped my cause:

    • Fixing a memory leak caused by not closing files

    • Removing some of my debugging statements, which were doing a lot of Serial printing, and also logging to a large buffer, which was eventually written to the SD card

    Another good way to find the “Transfer Error 4” is to try to open a file on the SD card while the card is absent.

    I had a similar problem with a PIC32’s hardware SPI interface (https://911electronic.com/spi-communica … spi-works/) driving a simple graphic LCD.

    I realize this thread is a year old at this point, but I have been having the same SPI transfer Error 4 issue on both SDK 2.4.2 (with the update) and 2.5.1. Both the full-duplex transfer and the receive functions continuously give me the error condition. I’m working in my own program, compiled with IAR rather than Arduino, and the problem is definitely somewhere in the HAL. Have any other workarounds turned up yet?

    Thanks in advance.

    If you can avoid the use of full-duplex transfers, things seem to work OK. I never debugged it deeper than that. I wasn’t that excited about debugging the HAL.