WAV Trigger limiting factors, comparison to SmartWAV 2

It seems like the robertsonics WAV Trigger and the Vizic SmartWAV 2 are the two main products in this space of serial-controllable polyphonic sound modules. Both are also based on ARM Cortex processors.

I am really curious:

  - If it's not an industry secret: what is the limiting factor when processing 14 stereo voices (14 × 2 channels × 44,100 Hz = 1,234,800 samples/s, or about 2.47 MB/s (2.36 MiB/s) at 16 bits per sample)? Is it the speed of the processor? The speed of reading from the SD card? The amount of memory? Something else? All of the above?
  - How do the two products compare? The obvious differences are that the WAV Trigger is about $20 cheaper, can report track and voice state, and arguably has better documentation. On the other hand, the SmartWAV 2 is physically more compact, which is valuable in an embedded product. Are there any other important differences that aren't as obvious?

    While this doesn't directly answer your questions (I have no experience with the SmartWAV 2, and its documentation doesn't mention some critical performance specs), you might find this article informative.

    https://www.robertsonics.com/blog/2021/ … rosd-cards
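For reference, the aggregate data rate quoted in the question can be reproduced in a few lines of C. This assumes 44.1 kHz, 16-bit stereo WAV files; the question doesn't state the format, but 1,234,800 / 28 channels works out to exactly 44,100. The function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Aggregate PCM sample rate across all voices:
   voices x channels per voice x sample rate (Hz). */
uint32_t total_samples_per_sec(uint32_t voices, uint32_t channels,
                               uint32_t rate_hz) {
    return voices * channels * rate_hz;
}

/* Aggregate byte rate for uncompressed PCM,
   e.g. bytes_per_sample = 2 for 16-bit audio (assumed format). */
uint32_t total_bytes_per_sec(uint32_t voices, uint32_t channels,
                             uint32_t rate_hz, uint32_t bytes_per_sample) {
    return total_samples_per_sec(voices, channels, rate_hz) * bytes_per_sample;
}
```

total_samples_per_sec(14, 2, 44100) gives 1,234,800, and total_bytes_per_sec(14, 2, 44100, 2) gives 2,469,600 bytes/s, i.e. about 2.47 MB/s or 2.36 MiB/s (2,469,600 ÷ 1,048,576). The sustained rate itself is modest. The difficulty, as the answer later in the thread explains, is that each voice's data sits in a different place on the card, so it arrives as many random block reads rather than one sequential stream.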

    Very interesting, nice post! I didn't know about your blog. I've done a lot of high-level programming and a decent bit of hobby Arduino programming, but my knowledge has a big gap right in the middle, and that gap is exactly where your product's niche lies. I was aware of double buffering in graphics interfaces, but for some reason I never realized the same mechanism is at work in audio buffers. That also explains why things like voice groups (https://solhsa.com/soloud/voicegroups.html) matter when the buffer is longer. I can imagine that in worst-case scenarios, 14 voices × 1 ms per read could add up to a noticeable delay, and depending on your SD read buffer size, you could run into memory management issues too. It sounds like challenging but fun work. Thanks for the response!
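The double-buffering idea discussed above can be sketched for a single voice in C. This is a minimal, single-threaded simulation, not Robertsonics' firmware: fake_sd_read, HALF_SAMPLES and the voice_t layout are hypothetical. On real hardware the consumer would run in the audio interrupt/DMA path, so refill_half would be shared with the main loop and would need to be volatile (or atomic), with care taken over ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny half-buffer size so the behaviour is
   easy to follow; real firmware would use many SD blocks per half. */
#define HALF_SAMPLES 8

typedef struct {
    int16_t buf[2][HALF_SAMPLES]; /* two halves: one plays, one refills */
    int play_half;                /* half currently feeding the output */
    size_t play_pos;              /* next sample index within play_half */
    int refill_half;              /* drained half awaiting refill, or -1 */
    uint32_t file_pos;            /* next sample to read from the "file" */
    unsigned underruns;           /* times playback reached an unrefilled half */
} voice_t;

/* Stand-in for an SD card read (hypothetical): fills dst with a ramp so
   each sample value reveals the file position it came from. */
static void fake_sd_read(uint32_t pos, int16_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(pos + i);
}

void voice_init(voice_t *v) {
    v->file_pos = 0;
    for (int h = 0; h < 2; h++) {   /* prime both halves before playing */
        fake_sd_read(v->file_pos, v->buf[h], HALF_SAMPLES);
        v->file_pos += HALF_SAMPLES;
    }
    v->play_half = 0;
    v->play_pos = 0;
    v->refill_half = -1;
    v->underruns = 0;
}

/* Consumer side: called once per output sample. Swaps halves when the
   current one is exhausted and marks the drained half for refilling. */
int16_t voice_next_sample(voice_t *v) {
    int16_t s = v->buf[v->play_half][v->play_pos++];
    if (v->play_pos == HALF_SAMPLES) {
        /* If the half we are switching into was never refilled, playback
           will repeat stale data: the audible click or pop. */
        if (v->refill_half >= 0)
            v->underruns++;
        v->refill_half = v->play_half;
        v->play_half ^= 1;
        v->play_pos = 0;
    }
    return s;
}

/* Producer side: called from the main loop; performs the slow SD read
   into whichever half has drained. */
void voice_service(voice_t *v) {
    if (v->refill_half >= 0) {
        fake_sd_read(v->file_pos, v->buf[v->refill_half], HALF_SAMPLES);
        v->file_pos += HALF_SAMPLES;
        v->refill_half = -1;
    }
}
```

Calling voice_service often enough keeps playback continuous (the output is an unbroken ramp 0, 1, 2, ...). If a whole half-buffer period passes without a refill, underruns becomes 1 and the next sample is stale. With 14 voices, one main loop has to keep all of them serviced, which is where a worst case like 14 × 1 ms comes in.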

    To answer your original question, the limiting factor in streaming polyphonic audio from microSD, at least for Robertsonics players, is the SDIO interface to the microSD card and the response time of the cards. This can be mitigated by increasing the read buffer size and doing fewer, larger random SD block reads, but then you bump up against the available SRAM in the MCU. Each voice requires its own buffers, so any increase is multiplied by the number of voices, and you run out of SRAM very quickly. I've spent a lot of time tuning the balance of buffers, SD block read sizes and number of voices to ensure that there are never any overruns (i.e. clicks and pops) with a properly behaving microSD card.
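The buffer-versus-SRAM trade-off described above can be put into rough numbers. The sketch below is a simplified model, not Robertsonics' actual tuning. It assumes 44.1 kHz 16-bit stereo, treats the 1 ms per read from the earlier post as a fixed latency per refill (real cards vary, and larger reads take somewhat longer, which this ignores), and assumes the worst case where every voice needs a refill at the same moment and the reads are serviced one after another. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed format: 44.1 kHz, 16-bit stereo, so 4 bytes per frame. */
#define SAMPLE_RATE_HZ  44100u
#define BYTES_PER_FRAME 4u
#define SD_BLOCK_BYTES  512u   /* SD data block size */

/* Smallest half-buffer (in whole SD blocks) whose playback time covers the
   simplified worst case: all voices drain a half at once and their refills
   run back to back, each taking read_us microseconds. */
uint32_t min_half_buffer_bytes(uint32_t voices, uint32_t read_us) {
    /* Playback time needed = voices * read_us microseconds. Convert it to
       bytes, rounding up so the half never drains before its refill. */
    uint64_t num = (uint64_t)voices * read_us * SAMPLE_RATE_HZ * BYTES_PER_FRAME;
    uint64_t bytes = (num + 999999u) / 1000000u;
    uint64_t blocks = (bytes + SD_BLOCK_BYTES - 1) / SD_BLOCK_BYTES;
    return (uint32_t)(blocks * SD_BLOCK_BYTES);
}

/* Double buffering: two halves per voice, and every voice needs its own. */
uint32_t total_buffer_sram(uint32_t voices, uint32_t half_bytes) {
    return voices * 2u * half_bytes;
}
```

With 14 voices and 1 ms per refill, each half must hold at least 14 ms of audio (about 2,470 bytes), which rounds up to 2,560 bytes (5 blocks), for 71,680 bytes of buffer SRAM in total. If card latency doubles to 2 ms, the half grows to 5,120 bytes and the total to 143,360 bytes, which illustrates how quickly slower cards or extra voices consume SRAM.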