IIR-filter optimization

monstrum · February 10, 2010, 5:00pm

I’m currently working on a set of IIR filtering functions, and although it works, it is a little too time-consuming for my SAM7.

I run the core clock at about 55 MHz, and I have timed one filter update at 5.5 µs. This is for an order 4 filter.

That might sound quite fast, but at 55 MHz it comes to about 300 clock cycles.

I have written my own fixed-point math routines, and I have checked that a multiplication uses no more than a single SMULL instruction and a shift.
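For readers following along, here is a minimal sketch of what such a multiply might look like in C. The Q16.16 format and the names fix_t, fix_mul and fix_dot are assumptions for illustration, not the poster's actual routines. The second function shows one possible saving: accumulating the raw 64-bit products and shifting once per sum instead of once per product.

```c
#include <stdint.h>

/* Assumed format: Q16.16 (16 integer bits, 16 fractional bits). */
#define Q_FRAC_BITS 16

typedef int32_t fix_t;   /* hypothetical fixed-point type */

/* 32x32 -> 64-bit multiply, then scale back to Q16.16.
 * Relies on arithmetic right shift of negative values, which GCC provides. */
static inline fix_t fix_mul(fix_t a, fix_t b)
{
    return (fix_t)(((int64_t)a * b) >> Q_FRAC_BITS);
}

/* Dot product that keeps the 64-bit accumulator unshifted and
 * scales only once at the end. */
static inline fix_t fix_dot(const fix_t *x, const fix_t *c, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * c[i];
    return (fix_t)(acc >> Q_FRAC_BITS);
}
```

Whether the compiler actually emits a multiply-accumulate for the loop depends on the toolchain and optimization level, so it is worth checking the generated assembly.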

So the pseudo code for one filter update is something like this:

B and A are the pre-generated numerator and denominator coefficients of the filter.

inbuffer[0] = inbuffer[1];
inbuffer[1] = inbuffer[2];
inbuffer[2] = inbuffer[3];
inbuffer[3] = in_signal;

sum = inbuffer[0] * B[0] + 
          inbuffer[1] * B[1] + 
          inbuffer[2] * B[2] + 
          inbuffer[3] * B[3] -
          feedback[0] * A[0] -
          feedback[1] * A[1] -
          feedback[2] * A[2];

feedback[0] = feedback[1];
feedback[1] = feedback[2];
feedback[2] = sum;

signal_out = sum;

Of course, my routines are a little more structured, but they are inlined so it should compile into something very similar to this. Should this really take 300 cycles to execute?

monstrum · February 10, 2010, 6:12pm

Update: it seems that WinARM is not very good at handling inline functions.

I moved the actual code in by hand so that it looks very much like the pseudo code above. It made quite a difference. I’m now down to about 170 cycles per filter update. However, I still think there should be some room for improvement.

lehmanna · February 12, 2010, 2:31pm

Have you considered using a ring buffer instead of copying all the array elements? This should considerably reduce the (slow) memory accesses.
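A rough sketch of the ring-buffer idea, assuming Q16.16 samples and a buffer length that is a power of two so the index wraps with a mask. The names iir_ring_t and iir_ring_update are made up for this example. Note that the coefficient order differs from the pseudo code above: here b[0] multiplies the newest input, and a[0] is unused.

```c
#include <stdint.h>

#define TAPS 4             /* power of two, so indices wrap with a mask */
#define MASK (TAPS - 1)
#define FRAC 16            /* assumed Q16.16 */

typedef struct {
    int32_t x[TAPS];       /* recent inputs */
    int32_t y[TAPS];       /* recent outputs */
    unsigned head;         /* slot holding the newest sample */
} iir_ring_t;

/* b[k] multiplies x[n-k] for k = 0..3; a[k] multiplies y[n-k] for k = 1..3. */
static int32_t iir_ring_update(iir_ring_t *s, const int32_t b[TAPS],
                               const int32_t a[TAPS], int32_t in)
{
    unsigned h = (s->head + 1) & MASK;   /* advance instead of copying */
    int64_t acc = 0;

    s->x[h] = in;
    for (unsigned k = 0; k < TAPS; k++)
        acc += (int64_t)b[k] * s->x[(h - k) & MASK];
    for (unsigned k = 1; k < TAPS; k++)
        acc -= (int64_t)a[k] * s->y[(h - k) & MASK];

    s->y[h] = (int32_t)(acc >> FRAC);    /* scale once per update */
    s->head = h;
    return s->y[h];
}
```

The masking adds a little index arithmetic per tap, so whether this beats a handful of plain copies on a filter this short is something to measure rather than assume.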

jerhee · April 27, 2010, 10:36pm

You are using a Direct Form I structure. Direct Form I requires 2N storage elements for an order N filter. Consider using a Direct Form II or Transposed Direct Form II structure, both of which use only N storage elements. Direct Form II has the same computational complexity as Direct Form I, but because it needs fewer storage elements you should be able to save a few cycles by eliminating assignment operations. Also consider using a cascade of second-order sections, which is less sensitive to coefficient quantization and roundoff errors than the other forms. For this, check out the MATLAB function ‘zp2sos’.
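To make the suggestion concrete, here is a minimal sketch of a cascade of second-order sections in Transposed Direct Form II. It assumes Q16.16 samples and coefficients, with a0 normalised to 1 and the same sign convention as the rows returned by zp2sos ([b0 b1 b2 1 a1 a2]). The type and function names are invented for the example. Each section keeps two state values, so an order 4 filter needs four in total.

```c
#include <stdint.h>

#define FRAC 16   /* assumed Q16.16 for samples and coefficients */

typedef struct { int32_t b0, b1, b2, a1, a2; } biquad_coef_t;  /* a0 == 1 */
typedef struct { int64_t s1, s2; } biquad_state_t;  /* kept at product scale */

/* One Transposed Direct Form II section:
 *   y  = b0*x + s1
 *   s1 = b1*x - a1*y + s2
 *   s2 = b2*x - a2*y
 */
static int32_t biquad_df2t(const biquad_coef_t *c, biquad_state_t *st, int32_t x)
{
    int64_t acc = (int64_t)c->b0 * x + st->s1;
    int32_t y = (int32_t)(acc >> FRAC);

    st->s1 = (int64_t)c->b1 * x - (int64_t)c->a1 * y + st->s2;
    st->s2 = (int64_t)c->b2 * x - (int64_t)c->a2 * y;
    return y;
}

/* Run the sample through nsec sections in series. */
static int32_t sos_update(const biquad_coef_t *c, biquad_state_t *st,
                          int nsec, int32_t x)
{
    for (int i = 0; i < nsec; i++)
        x = biquad_df2t(&c[i], &st[i], x);
    return x;
}
```

Keeping the states as unshifted 64-bit sums avoids an extra rounding step, at the price of 64-bit loads and stores. On an ARM7 it may be worth trying 32-bit states instead and comparing both speed and accuracy.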
