STM32F1 SDIO layout error

Hi all,

I probably made a mistake in my first custom layout for an STM32F103RE, in the routing of the SDIO lines to a microSD card:

  • all my lines:

D0: 47K pull-up (I tried 33K too)

D1: 47K pull-up (I tried 33K too)

CMD: 47K pull-up (I tried 33K and 10K too)

CLK

D2: 47K pull-up (I tried 33K too)

D3: 47K pull-up (I tried 33K too)

…are very short (< 35 mm)

  • all lines on the same layer (top), with a quasi continuous ground plane on the bottom

  • the ground plane is only cut for 2 vias mandatory (one for D0 and one D1)

  • I placed 2 bypass capacitors (100nF and 22uF ceramic) near the card’s supply pin, and my VDD appears to be really stable.

The only “strange” thing I did: the CMD line jumps over the CLK line through an R0 strap jumper (I tried replacing that R0 with an R22 Ohm resistor, with the same results).

With this layout, the standard ST library driving SDIO under FatFs, and a 4GB Sandisk class 4 card, I can only operate reliably at 2 MHz.

Anywhere between 2 and 24 MHz, I get (rare) random errors, especially in read operations, but not only.

So it works, but not as expected…

Unfortunately, I only own a DSO Nano oscilloscope with 1 MHz bandwidth, so I can’t confirm anything about the signal quality.

What would you advise to fix my problem?

When I was playing around with SDIO it was with the STM32F4 Discovery board and I used jumpers (15cm or so) to a micro-SD socket on a breakout board in a breadboard. So not the best possible connection and far worse than what you have. It worked just fine although I never pushed the clock frequency too hard. A quick check of my code shows I used 48MHz/5 or 9.6MHz. I tested with an old slow 2GB micro-SD card and managed to write at a sustained 2MB/s.

The resistor pullups don’t matter much except in the very early part of the initialization process. At that point the CMD line uses open drain drivers and requires a pullup. (Thus the limit of a 400 kHz clock at that point.) But after that things go push-pull and the only reason to have a weak pullup is to put a known value on the lines if nothing is happening.
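As a rough sketch of the clock arithmetic mentioned above: on these parts the SDIO card clock is SDIOCLK / (CLKDIV + 2), which is where 48 MHz / 5 = 9.6 MHz comes from (CLKDIV = 3). The helper names below are made up for illustration, and the 72 MHz SDIOCLK in the usage note is an assumption for an STM32F103 running from a 72 MHz PLL.

```c
#include <stdint.h>

/* SDIO_CK = SDIOCLK / (CLKDIV + 2), with CLKDIV an 8-bit register field. */
static uint32_t sdio_ck_hz(uint32_t sdioclk_hz, uint8_t clkdiv)
{
    return sdioclk_hz / ((uint32_t)clkdiv + 2u);
}

/* Smallest CLKDIV whose resulting clock does not exceed target_hz.
   Returns -1 if the target cannot be reached with an 8-bit divider.
   (Hypothetical helper, not part of any ST library.) */
static int sdio_clkdiv_for(uint32_t sdioclk_hz, uint32_t target_hz)
{
    if (target_hz == 0u)
        return -1;
    /* ceil(sdioclk / target) is the total division ratio, i.e. CLKDIV + 2 */
    uint32_t ratio = (sdioclk_hz + target_hz - 1u) / target_hz;
    if (ratio < 2u)
        ratio = 2u;                 /* CLKDIV = 0 is the fastest divided setting */
    if (ratio - 2u > 255u)
        return -1;
    return (int)(ratio - 2u);
}
```

Under that assumption, `sdio_clkdiv_for(72000000, 400000)` gives 178 for the identification phase and `sdio_clkdiv_for(72000000, 18000000)` gives 2.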

Maybe it is something in the library you used. I wouldn’t know anything about it because I wrote my own code.

Thanks for your answer David,

I thought about sending ACMD42 to disable the card’s internal pull-up on D3, but from what you said, I understand it probably won’t change anything?

I haven’t told everything: one point makes me think it is a layout problem. The VSS pin on my microSD socket (top layer) was probably using too small a via (to the ground plane on the bottom), because initialization did not complete at first.

Then I soldered an extra wire to my ground plane and it worked. But with the limitation I have now: stable at 2 MHz, but not above…

And my bypass capacitors (top layer) near the socket probably suffer from the same disease on their VSS: too small a via to my ground plane (and a bad routing method?)…

Could it come from this hardware issue?

Thanks again :wink:

It has been more than a year since I was working on my code so I don’t recall if using ACMD42 makes a difference. I can tell you that it is in my code to switch to 4 bit mode.

With only written descriptions of the hardware design it is difficult to say if there are any problems with it. Schematics and PCB designs would help.

If you suspect power supply trouble, put that scope on the power pins of the SD socket.

Thank you David,

The ST lib also switches to 4-bit mode after the init step.

Using my little oscilloscope (only 1MHz bandwidth), the VSS is perfectly still, but perhaps it’s not exactly the best scope to be sure.

Here is my layout :

http://imageshack.us/a/img833/9127/21qc.jpg

The small vias are 0.6 mm diameter with a 0.25 mm drill; the bigger ones (like the one on the GND pin of the socket) are 0.8/0.4 mm.

That’s probably not enough, what do you think?

I don’t see anything in your layout that looks obviously wrong. It doesn’t mean much because I am no expert.

You did watch this: https://www.sparkfun.com/news/1280 I hope.

It has a discussion of vias. It and Part 1 are good introductions to PCB layout.

One gripe I have with it is that vias do not have as much copper as discussed. Board fab typically starts with a board with 1/2 oz. copper and the plating process adds another 1/2 oz. So while the board has 1 oz. copper the vias have only 1/2 oz. But there is still more copper in the vias than the traces typically attached to them.

How are your GPIO pins configured? Their drive capability is programmable and I used the 50MHz settings:

  // STM32F4 registers: PC8..PC11 = D0..D3, PC12 = CLK, PD2 = CMD
  GPIOC->OTYPER = 0;                  // Push/Pull (this clears OTYPER for every pin on port C)
  GPIOC->PUPDR = (GPIOC->PUPDR&0xff00ffff)|0x00550000;  // pull up on data lines
  GPIOD->PUPDR = (GPIOD->PUPDR & 0xffffffcf) | 0x10; // pull up on CMD
  GPIOC->OSPEEDR = (GPIOC->OSPEEDR & 0xfc00ffff) | 0x02aa0000; // high speed (50MHz)
  GPIOD->OSPEEDR = (GPIOD->OSPEEDR & 0xffffffcf) | 0x10; // CMD: field value 01, whereas port C above uses 10 (0x20 here would match)
  GPIOD->MODER = (uint32_t)0x0020L | (GPIOD->MODER & 0xffffffcf); // alt func
  GPIOC->MODER = (uint32_t)0x2aa0000L | (GPIOC->MODER & 0xfc00ffff); //alt func
  GPIOD->AFR[0] = (GPIOD->AFR[0] & 0xfffff0ff) | 0xc00;  // AF12
  GPIOC->AFR[1] = (GPIOC->AFR[1] & 0xfff00000) | 0x000ccccc;  // AF12
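The magic masks in that register setup all follow one pattern: clear the field that belongs to a pin, then OR in the new value. A minimal sketch of that pattern, working on plain integers so it can be checked off-target (the helper names are hypothetical, not from any ST header):

```c
#include <stdint.h>

/* Replace the 2-bit field for `pin` (0..15), as used by MODER, PUPDR and OSPEEDR. */
static uint32_t set_field2(uint32_t reg, unsigned pin, uint32_t value)
{
    unsigned shift = pin * 2u;
    return (reg & ~((uint32_t)0x3u << shift)) | ((value & 0x3u) << shift);
}

/* Replace the 4-bit alternate-function field for a pin within one AFR word
   (pin_in_word is 0..7: pins 0..7 live in AFR[0], pins 8..15 in AFR[1]). */
static uint32_t set_field4(uint32_t reg, unsigned pin_in_word, uint32_t af)
{
    unsigned shift = pin_in_word * 4u;
    return (reg & ~((uint32_t)0xFu << shift)) | ((af & 0xFu) << shift);
}
```

For example, `set_field2(reg, 2, 1)` is the same operation as `(reg & 0xffffffcf) | 0x10`, and `set_field4(reg, 2, 12)` is the same as `(reg & 0xfffff0ff) | 0xc00`.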

Hi David,

I just watched this video, which is a great source for sure (Sparkfun is a must for DIYers). From it, I don’t see any problem with my via configuration… A microSD card is 150 mA max…

The only problem I see in my routing is the lack of a “top layer trace” between my 100 nF + 22 uF capacitors and the GND pin on the socket: the return path goes through the 2 existing vias, which adds inductance, and with such a “long” path it could probably explain bad decoupling at high frequency?

But I still have to find out why it works with the extra GND wire and why it doesn’t without. Crosstalk, perhaps?

Without the extra wire, I saw (but I’m not sure, because of my poor scope) a LOW level that gradually rose during the init phase on the CMD line (probably during the SD response), and it seems to disappear with the extra wire.

Thank you for your pin configuration, mine is configured for 50MHz too:

 RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOC | RCC_APB2Periph_GPIOD , ENABLE);

  /*!< Configure PC.08, PC.09, PC.10, PC.11, PC.12 pin: D0, D1, D2, D3, CLK pin */
  GPIO_InitStructure.GPIO_Pin = GPIO_Pin_8 | GPIO_Pin_9 | GPIO_Pin_10 | GPIO_Pin_11 | GPIO_Pin_12;
  GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
  GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
  GPIO_Init(GPIOC, &GPIO_InitStructure);

  /*!< Configure PD.02 CMD line */
  GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
  GPIO_Init(GPIOD, &GPIO_InitStructure);

I don’t use internal pull-ups, but could that matter in any way?

Another question: in the initialization step, how did you deal with the 1.8V feature of the SD?

Thanks again for your help, I appreciate it 8)

neoirto:
Another question: in the initialization step, how did you deal with the 1.8V feature of the SD?

What 1.8V feature? If you mean 1.8V signaling, you have to request that the card switch to that mode by sending the right commands. Which you wouldn’t because your SDIO interface doesn’t support it.

Yes, 1.8V signaling. Disabled to operate with the STM32…

So I am still looking for a hardware misconfiguration. :doh:

Do you think it could come from badly sized load capacitors on my external HSE crystal? A bad clock in some way?

neoirto:
Do you think it could come from badly sized load capacitors on my external HSE crystal? A bad clock in some way?

The only way that this could conceivably be a problem is if you didn’t use the PLL. Otherwise all the CPU clocks are derived from that nice conditioned PLL clock.

What errors are you getting? CRC errors on the SDIO interface? Errors in the responses? Something else?
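One way to answer that question is to look at which error bits are set in the SDIO status register after a failed transfer. The sketch below sorts a status word into three coarse classes. The bit positions match the SDIO_FLAG_* values in the ST library; the class names and the function itself are made up for illustration.

```c
#include <stdint.h>

/* SDIO_STA error bits (same values as the ST library's SDIO_FLAG_* constants). */
#define STA_CCRCFAIL  (1u << 0)  /* command response CRC failed */
#define STA_DCRCFAIL  (1u << 1)  /* data block CRC failed */
#define STA_CTIMEOUT  (1u << 2)  /* command response timeout */
#define STA_DTIMEOUT  (1u << 3)  /* data timeout */
#define STA_TXUNDERR  (1u << 4)  /* transmit FIFO underrun */
#define STA_RXOVERR   (1u << 5)  /* receive FIFO overrun */

typedef enum { ERR_NONE, ERR_CRC, ERR_TIMEOUT, ERR_FIFO } sd_err_class;

/* Hypothetical triage helper: CRC failures point at the bus signals,
   timeouts at a busy card or a short timeout, FIFO errors at the MCU
   not keeping up with the data stream. */
static sd_err_class classify_sta(uint32_t sta)
{
    if (sta & (STA_CCRCFAIL | STA_DCRCFAIL)) return ERR_CRC;
    if (sta & (STA_CTIMEOUT | STA_DTIMEOUT)) return ERR_TIMEOUT;
    if (sta & (STA_TXUNDERR | STA_RXOVERR))  return ERR_FIFO;
    return ERR_NONE;
}
```

Logging the class (or the raw status word) for every failure makes it easier to tell a wiring problem from a timing or software problem.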

OK David, first, I use the PLL to clock at 72 MHz, so as you say, it’s probably something else…

It appears I fixed part of the errors with a better writing scheme, but the write speed is still very slow.

My errors occurred mostly during read operations at 18 MHz on the Sandisk 4GB (FAT32): the card randomly enters a BUSY state for longer than expected or, very rarely, returns a timeout error.

But on another card (a cheap no-name 1GB card, FAT format), even at 2 MHz, I sometimes got (after a lot of successful multi-block and single-block writes) the same TIMEOUT errors in multi-block writes:

SDIO_GetFlagStatus(SDIO_FLAG_DTIMEOUT)

I fixed this by allowing more time for writes, changing the timeout to:

SDIO_DataInitStructure.SDIO_DataTimeOut = 0x001FFFFF;

instead of 0x000FFFFF.

Now I can write (it looks stable) with an SDIO clock of 18 MHz on both the 1 GB and 4 GB cards, but neither works at 24 MHz. I now write blocks of 4096 bytes in 15 ms (273 KB/s), which is a bit disappointing, but because of the random busy state, I can’t sustain better than 32 KB/s continuously on either card…
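To put those figures in perspective, here is a small sketch of the arithmetic. It compares the raw payload limit of the bus with a measured rate; the function names are made up, and the bus limit deliberately ignores command, CRC and busy overhead.

```c
#include <stdint.h>

/* Upper bound on payload rate: one bit per data line per clock,
   ignoring command, CRC and busy overhead. */
static uint32_t bus_limit_bytes_per_s(uint32_t sdio_ck_hz, unsigned bus_width_bits)
{
    return (uint32_t)((uint64_t)sdio_ck_hz * bus_width_bits / 8u);
}

/* Measured rate from a byte count and an elapsed time in microseconds. */
static uint32_t measured_bytes_per_s(uint32_t bytes, uint32_t elapsed_us)
{
    return (uint32_t)((uint64_t)bytes * 1000000u / elapsed_us);
}
```

A 4-bit bus at 18 MHz can move at most 9,000,000 bytes/s, so 4096 bytes need under half a millisecond on the wire. The measured 4096 bytes in 15 ms works out to about 273,000 bytes/s, which suggests most of the time is spent waiting rather than transferring.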

I think I will still need advice!

There can be a lot of variability in SD card response times. If you are getting DTIMEOUT errors it probably means that the card was busy and you didn’t wait long enough. I have forgotten most of this but fortunately, I put comments in my code:

 /* Setup data path controller for 512 byte transfer
     Timeout is 100ms for read and 500ms for write. These are the
     recommended values for SDHC and XC cards. Well, 250ms is the 
     recommended value for write but it could be up to 500ms. 
     Especially at the end of a multi block write.
     See 4.6.2 of the SD Physical Layer Spec. (Ver. 3.01)
  */
  SDIO->DTIMER = (uint32_t)((48000000/(HIGH_SPEED_DIVIDER+2)) *
                            ( dir ? 0.5 : 0.1));
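The data timer counts card bus clock periods, so the same register value means a different wall-clock timeout at each SDIO clock. A minimal integer-only sketch of the conversion in both directions (helper names are hypothetical):

```c
#include <stdint.h>

/* Convert a timeout in milliseconds to DTIMER ticks (card bus clock periods).
   Saturates at the 32-bit register maximum. */
static uint32_t dtimer_ticks(uint32_t sdio_ck_hz, uint32_t timeout_ms)
{
    uint64_t ticks = (uint64_t)sdio_ck_hz * timeout_ms / 1000u;
    return ticks > 0xFFFFFFFFu ? 0xFFFFFFFFu : (uint32_t)ticks;
}

/* Convert a DTIMER value back to milliseconds (rounded down). */
static uint32_t dtimer_ms(uint32_t sdio_ck_hz, uint32_t ticks)
{
    return (uint32_t)((uint64_t)ticks * 1000u / sdio_ck_hz);
}
```

With an 18 MHz card clock, 0x000FFFFF is only about 58 ms and 0x001FFFFF about 116 ms, while a 250 ms write timeout would need 4,500,000 ticks (0x0044AA20).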

Well…

Well, 250ms is the recommended value for write but it could be up to 500ms. Especially at the end of a multi block write.

...thanks for that.

I can confirm the value… It’s too bad!

But for my “top writing speed” (only 273 KB/s) on multi-block writes, I probably have to find a “hardware reason”.

Is the extra inductance due to the vias a good candidate to explain that BIG FAIL?

If you aren’t getting CRC errors then your PCB layout has absolutely nothing to do with it.

I don’t know what you mean by top writing speed. When I performed write speed testing I noticed that while most blocks wrote quickly, some were slower. If you use the slowest time then it will be slow. But if you have a buffer for your data then the overall speed can be a lot faster. I think I measured over 2MB/s with my old slow 2GB micro-SD card with a SD clock of less than 10MHz.

(Don’t ask about the details of how that was produced. I found it skulking in a directory with stuff I produced when testing write speed almost two years ago.)

Testing write speed was easy. I filled a block with copies of the time (based on CPU clock) and wrote as fast as possible. Then process the resulting file to get timing. Something like:

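      for(i=0; i < 5000; i++)
        {
          uint64_t time;
          // timestamp in CPU cycles: ticks presumably counts SysTick periods
          // of 168000 cycles; SysTick->VAL counts down within the current one
          time = ((uint64_t)ticks)*168000 + (168000-SysTick->VAL);
          // fill NUMBLOCKS blocks of 512 bytes (64 x 8 bytes) with the timestamp
          for(j=0;j <64*NUMBLOCKS;j++)
            buffer[j]= time;
          if(fat_write((uint8_t *)buffer, NUMBLOCKS))
            {
              usart2_puts("file write failed\r\n");
              while(1)
                ;
            }
          // progress indicator every 100 writes
          if((i%100) == 0)
            usart2_send('.');
        }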
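The post-processing step could look something like the sketch below. It is not the original tool, just an illustration: it assumes the timestamps have already been pulled out of the file (one per write call) and reports the spread of intervals between consecutive writes. Each interval includes the time spent filling the buffer as well as the write itself.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t min_cycles;
    uint64_t max_cycles;
    uint64_t mean_cycles;
} write_stats;

/* stamps[i] is the timestamp stored by write i, in CPU cycles.
   Returns 0 on success, -1 if fewer than two stamps are supplied. */
static int analyze_stamps(const uint64_t *stamps, size_t n, write_stats *out)
{
    if (n < 2)
        return -1;
    uint64_t min = UINT64_MAX, max = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t d = stamps[i] - stamps[i - 1];  /* start of write i-1 to start of write i */
        if (d < min) min = d;
        if (d > max) max = d;
    }
    out->min_cycles = min;
    out->max_cycles = max;
    out->mean_cycles = (stamps[n - 1] - stamps[0]) / (n - 1);
    return 0;
}

/* Convert cycles to microseconds for a given CPU clock (168 MHz in the loop above).
   Fine for test-length runs; very large cycle counts would overflow the multiply. */
static uint64_t cycles_to_us(uint64_t cycles, uint32_t cpu_hz)
{
    return cycles * 1000000u / cpu_hz;
}
```

The gap between the minimum and maximum interval is the number that matters for buffering: it shows how long the slowest write stalls compared with a typical one.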

Typical FAT libraries will be much slower than my code because I defer updating the FAT until file close time.

Thank you again David,

I really appreciate your help.

If you aren’t getting CRC errors then your PCB layout has absolutely nothing to do with it.

No CRC errors means that my signal integrity is definitely OK, if I understand correctly?

So the extra wire I needed is probably related to supplying the card with enough current, and may not be signal related. What do you think of that diagnosis?

In your example, what value is “NUMBLOCKS” defined as? Is it 512, to write a complete cluster (a lot of RAM… But that’s the fastest way to write to SD)?

So I understand that 3.5 ms is the mean time a “fat_write()” takes to execute (which is a multi-block write operation).

Is that true?

My case is different: I have to sustain a constant write speed. Using a big buffer is good practice to deal with the “random” busy state of the card.

About the write speed, I measure the time an “f_write()” takes with FatFs. And you’re right: in FatFs, many single and/or multi-block disk_read() and/or disk_write() calls can occur in a single f_write() on FAT or FAT32, depending on clusters and fragmentation of the card. So the sustained write speed depends a lot on the size of your buffer in RAM.

Perhaps I can increase this speed a bit by reducing the busy time with better current management in my layout…

neoirto:
No CRC errors means that my signal integrity is definitely OK, if I understand correctly?

Correct.

So the extra wire I needed is probably related to supplying the card with enough current, and may not be signal related. What do you think of that diagnosis?

Evidence is lacking. You said that you looked at the power rails using a scope and saw nothing. If you really measured this at the SD card then you have no power problems.

In your example, what value is “NUMBLOCKS” defined as? Is it 512, to write a complete cluster (a lot of RAM… But that’s the fastest way to write to SD)?

It is a variable and I tested with various values. Multi-block writes were always faster than single block. That should be the case for you as well so long as the code uses multi-block writes. If it uses a bunch of 512 byte block writes then it will not see the speed improvement.

So I understand that 3.5 ms is the mean time a “fat_write()” takes to execute (which is a multi-block write operation).

Is that true?

More or less. As I said I have forgotten the details of the conditions that produced that graph.

About the write speed, I measure the time an “f_write()” takes with FatFs. And you’re right: in FatFs, many single and/or multi-block disk_read() and/or disk_write() calls can occur in a single f_write() on FAT or FAT32, depending on clusters and fragmentation of the card. So the sustained write speed depends a lot on the size of your buffer in RAM.

There are some truly horrible examples of FAT file systems running around and how well your particular choice performs will limit your speed. I recall one that read the entire FAT chain in order to allocate a new cluster. Twice! (There was a discussion around here somewhere a while back.) In order to compensate for that you must have a set of write buffers to use as a FIFO. Then while you are waiting for one write to complete your data can be filling the other buffers.
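A minimal sketch of such a FIFO of block buffers follows. The names and the depth of 8 blocks are assumptions for illustration. It is written for a single context: if the producer runs in an interrupt and the card writer in the main loop, the shared `count` field needs protection (for example by briefly disabling the interrupt).

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   512u
#define FIFO_BLOCKS  8u   /* hypothetical depth: size it as data rate x worst-case card stall */

typedef struct {
    uint8_t  data[FIFO_BLOCKS][BLOCK_SIZE];
    unsigned head;    /* next slot the producer fills */
    unsigned tail;    /* next slot the card writer drains */
    unsigned count;   /* number of filled slots */
} block_fifo;

static void fifo_init(block_fifo *f)
{
    f->head = f->tail = f->count = 0;
}

/* Producer side: copy one block in. Returns -1 if the FIFO is full (data would be lost). */
static int fifo_push(block_fifo *f, const uint8_t *block)
{
    if (f->count == FIFO_BLOCKS)
        return -1;
    memcpy(f->data[f->head], block, BLOCK_SIZE);
    f->head = (f->head + 1u) % FIFO_BLOCKS;
    f->count++;
    return 0;
}

/* Consumer side: copy the oldest block out. Returns -1 if the FIFO is empty. */
static int fifo_pop(block_fifo *f, uint8_t *block)
{
    if (f->count == 0)
        return -1;
    memcpy(block, f->data[f->tail], BLOCK_SIZE);
    f->tail = (f->tail + 1u) % FIFO_BLOCKS;
    f->count--;
    return 0;
}
```

A full FIFO is the signal that the buffer is too small for the card's worst stall at the current data rate.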

If you really want speed you have to avoid dealing with the FAT. Allocating a cluster requires reading a block and then writing it back. While you can buffer that block to prevent most of the reads, once every 128 clusters (FAT32) you will have to read it. Dealing with fragmentation is much worse because the number of reads can be very large.
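The figure of 128 comes from the entry size: a FAT32 entry is 4 bytes, so a 512 byte sector holds 128 of them. A short sketch of locating the entry for a given cluster (the struct and function names are made up; `fat_start_sector` would come from the boot sector):

```c
#include <stdint.h>

#define SECTOR_SIZE      512u
#define FAT32_ENTRY_SIZE 4u

typedef struct {
    uint32_t sector;   /* absolute sector holding the entry */
    uint32_t offset;   /* byte offset of the entry inside that sector */
} fat_entry_pos;

/* Locate the FAT32 entry for `cluster`, given the first sector of the FAT. */
static fat_entry_pos fat32_entry_pos(uint32_t fat_start_sector, uint32_t cluster)
{
    uint32_t byte_off = cluster * FAT32_ENTRY_SIZE;
    fat_entry_pos p = { fat_start_sector + byte_off / SECTOR_SIZE,
                        byte_off % SECTOR_SIZE };
    return p;
}
```

A cached FAT sector stays valid as long as `sector` does not change between consecutive allocations, which is why a sequential write only has to fetch a new FAT sector every 128 clusters.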

My code (http://home.earthlink.net/~schultdw/logOmatic/), which has not been updated for a while and only does FAT16, scans the FAT looking for a large continuous free area. (It starts at the end on the assumption that this will find the largest free area.) It then begins writing at the start of that region and continues until it is finished. Only then does it go back and update the FAT. This of course leaves the file system in a bad state if the file is never closed but it is the trade required for speed.
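As an illustration of the idea only (this is not the linked code), a backwards scan over an in-memory copy of a FAT16 table might look like the sketch below. It returns the free run closest to the end of the table; the names are hypothetical, and a real implementation would read the FAT sector by sector instead of holding it all in RAM.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    size_t start;    /* first free cluster of the run */
    size_t length;   /* number of consecutive free clusters */
} free_run;

/* fat[] is an in-memory copy of a FAT16 table; an entry of 0 marks a free cluster.
   Clusters 0 and 1 are reserved, so the scan never goes below index 2. */
static free_run find_free_run_from_end(const uint16_t *fat, size_t entries)
{
    free_run run = { 0, 0 };
    size_t i = entries;
    /* skip allocated clusters at the very end of the table */
    while (i > 2 && fat[i - 1] != 0)
        i--;
    size_t end = i;                    /* one past the last free cluster of the run */
    /* walk back over the free clusters */
    while (i > 2 && fat[i - 1] == 0)
        i--;
    run.start = i;
    run.length = end - i;
    return run;
}
```

A length of 0 means no free cluster was found. Writing then proceeds sequentially from `start`, and the chain for those clusters is only written into the FAT when the file is closed.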