IIR-filter optimization

I’m currently working on a set of IIR filtering functions, and although it works, I find it a little too time-consuming for my SAM7.

I run the core clock at about 55 MHz, and I have timed one filter update at 5.5 µs. This is for a 4th-order filter.

That might sound quite fast, but it is in fact about 300 clock cycles.

I have written my own fixed-point math routines, and I have checked that they use no more than a single SMULL instruction and a shift to do a multiplication.
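For reference, a multiply of that kind might look like the sketch below. The Q16.16 format and the names FIX_SHIFT, fix_t and fix_mul are assumptions for illustration; the post does not say which format or names the real routines use.

```c
#include <stdint.h>

/* Assumed Q16.16 format (hypothetical; not stated in the post). */
#define FIX_SHIFT 16

typedef int32_t fix_t;

/* 32x32 -> 64-bit signed product followed by one shift.
 * In ARM mode a compiler can usually map the widening multiply to SMULL.
 * Right-shifting a negative value is implementation-defined in C;
 * GCC performs an arithmetic shift, which is what this sketch relies on. */
static inline fix_t fix_mul(fix_t a, fix_t b)
{
    return (fix_t)(((int64_t)a * (int64_t)b) >> FIX_SHIFT);
}
```

With this format, 1.5 is 0x18000 and 2.0 is 0x20000, and fix_mul(0x18000, 0x20000) gives 0x30000, i.e. 3.0.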

So the pseudo code for one filter update is something like this:

B and A are the pre-generated numerator and denominator coefficients of the filter.

inbuffer[0] = inbuffer[1];
inbuffer[1] = inbuffer[2];
inbuffer[2] = inbuffer[3];
inbuffer[3] = in_signal;

sum = inbuffer[0] * B[0] + 
          inbuffer[1] * B[1] + 
          inbuffer[2] * B[2] + 
          inbuffer[3] * B[3] -
          feedback[0] * A[0] -
          feedback[1] * A[1] -
          feedback[2] * A[2];

feedback[0] = feedback[1];
feedback[1] = feedback[2];
feedback[2] = sum;

signal_out = sum;

Of course, my routines are a little more structured, but they are inlined so it should compile into something very similar to this. Should this really take 300 cycles to execute?

Update: it seems that WinARM is not very good at handling inline functions.

I moved the actual code directly into the update routine so that it looks very much like the above pseudo code. It made quite a difference. I’m now down to about 170 cycles per filter update. However, I still think there should be some room for improvement.

Have you considered using a ring buffer instead of copying the array elements on every update? This should considerably reduce the number of (slow) memory accesses.
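A minimal sketch of the ring buffer idea, keeping the coefficient ordering of the pseudo code (B[3] and A[2] apply to the newest samples). The Q16.16 format, the 64-bit accumulator with a single shift at the end, and all names here (iir_ring_t, iir_ring_update, BUF_LEN) are assumptions, not the poster's actual code. For buffers this short the index masking has a cost of its own, so it would need to be timed against the copying version.

```c
#include <stdint.h>

#define Q_SHIFT  16               /* assumed Q16.16 coefficients and samples */
#define NB       4                /* feedforward taps, as in the pseudo code */
#define NA       3                /* feedback taps, as in the pseudo code */
#define BUF_LEN  4                /* power of two, >= NB and > NA */
#define BUF_MASK (BUF_LEN - 1u)

typedef struct {
    int32_t  x[BUF_LEN];          /* input history */
    int32_t  y[BUF_LEN];          /* output history */
    unsigned pos;                 /* index of the newest sample */
} iir_ring_t;

/* One filter update. Instead of shifting the arrays, only the write
 * index moves; older samples are found by masking the index. */
int32_t iir_ring_update(iir_ring_t *f, const int32_t *B, const int32_t *A,
                        int32_t in)
{
    int64_t  acc = 0;
    unsigned i;
    int32_t  out;

    f->pos = (f->pos + 1u) & BUF_MASK;
    f->x[f->pos] = in;

    /* B[NB-1] multiplies the newest input, B[0] the oldest. */
    for (i = 0; i < NB; i++)
        acc += (int64_t)f->x[(f->pos - (NB - 1u - i)) & BUF_MASK] * B[i];

    /* A[NA-1] multiplies the most recent previous output. */
    for (i = 0; i < NA; i++)
        acc -= (int64_t)f->y[(f->pos - (NA - i)) & BUF_MASK] * A[i];

    out = (int32_t)(acc >> Q_SHIFT);   /* single shift for the whole sum */
    f->y[f->pos] = out;
    return out;
}
```

The state must be zeroed before the first call, for example with `iir_ring_t f = {0};`. Note that shifting once at the end rounds differently from shifting after every multiplication.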

You are using a Direct Form I implementation structure. Direct Form I requires 2N storage elements for an order-N filter. Consider using a Direct Form II or Transposed Direct Form II structure, which uses only N storage elements. Direct Form II has the same computational complexity as Direct Form I, but it requires fewer storage elements, so you should be able to save a few cycles by eliminating assignment operations. Also, consider using a cascade of 2nd-order sections, which is more resistant to coefficient quantization and roundoff errors than other forms. For this, check out the MATLAB function ‘zp2sos’.
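As a rough sketch of that suggestion, a cascade of Transposed Direct Form II sections could look like this in fixed point. The Q16.16 format, the 64-bit state variables and the names biquad_t, biquad_update and sos_update are assumptions for illustration. The coefficients are assumed to be normalised so that a0 = 1, as in the rows that zp2sos returns.

```c
#include <stdint.h>

#define Q_SHIFT 16                /* assumed Q16.16 coefficients and samples */

typedef struct {
    int32_t b0, b1, b2;           /* numerator coefficients */
    int32_t a1, a2;               /* denominator coefficients, a0 == 1 */
    int64_t s1, s2;               /* the two state variables, kept at
                                     full product precision */
} biquad_t;

/* One Transposed Direct Form II update:
 *   y  = b0*x + s1
 *   s1 = b1*x - a1*y + s2
 *   s2 = b2*x - a2*y
 * Only two state values per section, and no history to shift. */
static int32_t biquad_update(biquad_t *s, int32_t x)
{
    int32_t y = (int32_t)(((int64_t)s->b0 * x + s->s1) >> Q_SHIFT);

    s->s1 = (int64_t)s->b1 * x - (int64_t)s->a1 * y + s->s2;
    s->s2 = (int64_t)s->b2 * x - (int64_t)s->a2 * y;
    return y;
}

/* Run the sample through n sections in series;
 * a 4th-order filter needs n = 2. */
int32_t sos_update(biquad_t *sections, unsigned n, int32_t x)
{
    unsigned i;

    for (i = 0; i < n; i++)
        x = biquad_update(&sections[i], x);
    return x;
}
```

Each section costs five multiplications, so a 4th-order filter is ten multiplications per sample. This sketch does not guard against overflow of the intermediate values; section ordering and scaling would have to be chosen for the actual filter.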