Hello, new member here. I’ve been reading for a while, and now I have an issue.
Has anyone else seen GCC not produce a ‘mul’ asm mnemonic when multiplying a variable by a constant?
In my case it produces a whole lot of code with movs, shifts, etc.
Does anyone know how to avoid this behavior (other than writing an ASM instruction)?
Thank you.
Depending on the ARM processor, that can actually be faster [using the mov / shift sequence]!
IIRC there is an -mcpu / -march flag which controls which ARM you’re targeting, and that should cause it to pick the fastest sequence for that particular ARM variant. However, I don’t quite recall the exact flags, so take these as a starting point.
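To illustrate the mov / shift idea, here is a minimal sketch of how a multiply by 1000 can be broken into shifts, one subtract and one add. The function names are made up for the example, and this is not necessarily the sequence GCC emits; that depends on the compiler version and flags. In ARM mode each of the first two steps can map to a single instruction, because the barrel shifter applies the shift as part of the add or subtract.

```c
#include <stdint.h>

/* Hypothetical helper: computes x * 1000 without a multiply.
 * 1000 = 125 * 8, 125 = 31 * 4 + 1, 31 = 32 - 1. */
uint32_t mul1000_shift_add(uint32_t x)
{
    uint32_t x31  = (x << 5) - x;     /* 31 * x   */
    uint32_t x125 = (x31 << 2) + x;   /* 125 * x  */
    return x125 << 3;                 /* 1000 * x */
}

/* Reference version that leaves the choice to the compiler. */
uint32_t mul1000_plain(uint32_t x)
{
    return x * 1000u;
}
```

Both functions agree for every input, including wrap-around, since unsigned arithmetic is modulo 2^32.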
Cheers,
–David Carne
Hm, that makes some sense.
But I read somewhere that a mul takes at most 4 cycles; is this correct? Even 4 cycles would be faster than the 8 instructions I get now. Or am I wrong?
A mul instruction on an ARM7TDMI-S takes up to 5 cycles, depending on the operand size, but as David already mentioned you have to tell GCC that it should optimize code for this processor. The default is probably an older target like ARM6 where a multiply could take up to 17 cycles.
You should use the -march=armv4t and -mcpu=arm7tdmi-s switches to have GCC output code for your architecture and core.
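To check what the compiler actually does with these switches, one option is to compile a tiny test function to assembly and read the result. A sketch, assuming a cross toolchain installed as arm-elf-gcc (the prefix, the file name and the function name are placeholders; only the -march / -mcpu values come from this thread):

```c
/* scale.c: hypothetical test file for inspecting the generated assembly.
 *
 * Assumed invocation (the toolchain prefix may differ on your system):
 *   arm-elf-gcc -O2 -march=armv4t -mcpu=arm7tdmi-s -S scale.c
 * -S stops after compilation and writes the assembly to scale.s.
 */
int scale_by_1000(int x)
{
    return x * 1000;   /* look for mul vs. a shift/add sequence in scale.s */
}
```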
Regards,
Dominic
Ok thanks for the replies.
I’m sure I have the -mcpu=arm7tdmi-s switch; I’m not sure about the first one, though.
The operand is fairly small (1000), so I think a mul would be faster.
I’ll check some more.
Just out of curiosity: how does the compiler determine when to implement literals as a bunch of 8-bit values shifted and added together and when to simply load it from a memory location?
Is there any awareness in the compiler of how effective flash read operations are at all?
On the LPC21xx I think this can make quite a difference, but it can become quite complicated, especially with the MAM.
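Some background on the literal question: an ARM-mode data-processing instruction can only encode an immediate as an 8-bit value rotated right by an even number of bits. Any other constant has to be built from several instructions or loaded from memory. Here is a minimal sketch of that encodability test. The function name is made up, and it says nothing about how GCC weighs the two options beyond this first check.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if v fits the ARM-mode immediate
 * encoding (8-bit value rotated right by an even amount), else 0. */
int is_arm_immediate(uint32_t v)
{
    unsigned r;
    for (r = 0; r < 32; r += 2) {
        /* Rotate left by r to undo a rotate right by r;
         * the "& 31" avoids an undefined shift by 32 when r == 0. */
        uint32_t rotated = (v << r) | (v >> ((32 - r) & 31));
        if (rotated <= 0xFFu)
            return 1;
    }
    return 0;
}
```

For example, 1000 (0x3E8) passes, since it is 0xFA rotated right by 30 bits, while 0x101 fails because its set bits span nine bit positions.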