8- and 16-Bit Integer Operations on a 32-Bit Platform

Hello all,

Some time ago, I decided to shift from 8-bit AVRs to the 32-bit ARM Cortex-M3 core MCUs. After struggling with Eclipse etc. for a long time, I’ve now been successfully writing code and programming the LPC1768 with the evaluation version of the Keil MDK.

Anyway, on the 8-bit AVRs, operations on integers wider than 8 bits (uint16_t, uint32_t, etc.) are emulated in software with multiple 8-bit operations and are therefore slower. I’m sure it’s something similar with uint64_t on the 32-bit ARM.

But what about smaller integers, say uint8_t, on a 32-bit platform? I was wondering if there are any drawbacks to using uint8_t instead of uint32_t when I only need 8 bits. Would the hardware store it as 8 bits, access 8 bits, convert to 32 bits, perform the operation, and convert it back to 8 bits? Or can it operate on it directly?

The answer could likely be found if I understood ASM code, but unfortunately, I still can’t! It’d be really great if someone could shed some light on this. Thank you!

The compiler will take care of most conversions between char, short, and int, which in the ARM world are 8, 16, and 32 bits. It will also allocate memory according to the declaration.

One thing that often results in errors is whether you define those as signed or unsigned. Signed types need the sign bit extended into the upper bits: -1 is 0xFF, 0xFFFF, or 0xFFFFFFFF depending on the size of the variable. Again, the compiler will do much of this for you, but if you mix signed and unsigned it can lead to errors.
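A minimal C sketch of both points, assuming the usual type sizes (the names are made up for illustration):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t  s8  = -1;   /* stored as 0xFF                        */
    int16_t s16 = s8;   /* sign-extended to 0xFFFF, still -1     */
    int32_t s32 = s8;   /* sign-extended to 0xFFFFFFFF, still -1 */

    printf("%d %d %d\n", s8, s16, s32);   /* prints: -1 -1 -1 */

    /* The mixed signed/unsigned trap: for the comparison, s32 is
       converted to unsigned, so -1 becomes 0xFFFFFFFF and the test
       is false even though -1 < 1 mathematically. */
    uint32_t u32 = 1;
    if (s32 < u32)
        printf("smaller\n");
    else
        printf("not smaller: -1 compared as 0xFFFFFFFF\n");

    return 0;
}
```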

This will probably draw a bunch of flames, but don’t worry too much about assembly language. It’s useful to understand the concepts of registers, bit encoding, and some ideas of instruction encoding, but for the most part you don’t need to work in assembly. Note to flamers: this is from someone who started writing assembly language on embedded controllers in the 1970s, and other than writing compilers, I don’t deal with assembly much anymore.

Interesting. Thanks for the reply!

When you said “the compiler does it”, does that mean all operations have to be done as 32-bit integers? For example, would adding two 8-bit integers take more operations than if I used 32-bit integers?

I looked into assembly a little while back, but the learning curve is indeed rather steep. It seems like something good to know, since I’m very much into optimizing code.

The registers are all 32 bits and the ALU is 32 bits, but there are machine instructions that load or store byte values.

So byte values will be loaded, the addition will happen as 32 bits, and the result will be saved with a byte store. The compiler handles all of that for you. It takes no additional time to do 32-bit operations on a 32-bit CPU; it’s the 8- and 16-bit CPUs that need multiple instructions to do 32-bit operations.
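As a rough sketch of what that looks like in practice (the exact instructions depend on the compiler and optimisation level; the addressing in the comment is pseudo-notation):

```c
#include <stdint.h>

uint8_t a, b, c;

void add_bytes(void)
{
    /* On a Cortex-M3, a compiler will typically emit something like:
         LDRB r0, [address of a]   ; byte load, zero-extended to 32 bits
         LDRB r1, [address of b]   ; byte load, zero-extended to 32 bits
         ADD  r0, r0, r1           ; plain 32-bit addition
         STRB r0, [address of c]   ; store only the low byte
       The loads/stores are byte-sized, but the addition itself is the
       same 32-bit ADD that uint32_t operands would use. */
    c = a + b;
}
```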

Spend your time on better design of your code, and you won’t be spending time optimizing. Compilers do a pretty good job of optimizing these days, and with a 100 MHz clock there are very few cases where you’ll be hand-tuning code. Most professional programmers don’t even know the instruction sets of the machines they are programming; this is not a criticism, just a fact. They don’t need to know it, and it doesn’t really help their work. Yes, there are some exceptions; the best example is DSP programming, where people do spend time optimizing code and register use, but that is probably about 0.001% of the programmers out there.

I apologise if my question was ambiguous, but what I meant was whether 8-bit operations would be slower than 32-bit operations on a 32-bit platform. On 8- or 16-bit platforms, the multiple operations required to compose a 32-bit operation obviously make it slower, and it’s similarly clear that 32-bit integer operations on a 32-bit platform are like 8-bit integer operations on an 8-bit platform, requiring only one instruction, excluding register loads.

Anyway, I too believe planning before coding is definitely very important, but it’s still good to reduce bad coding habits that may otherwise contribute to unnecessarily high overheads.

8-bit and 32-bit operations on an ARM are the same speed.

Ok, thanks again for replying. I assume the 8- and 32-bit operation instructions are exactly the same then?

Hello,

Yes, they are the same. The Cortex-M3 also has special instructions to “extend” signed bytes/shorts to int (SXTB / SXTH).

You also have dedicated instructions to load and store bytes/shorts to/from registers (LDRB/LDRH and STRB/STRH).

But basically, arithmetic operations are done on 32-bit integers.
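A small sketch of where those instructions come into play (instruction choice is ultimately up to the compiler):

```c
#include <stdint.h>

int32_t widen_and_add(int8_t b, int16_t h)
{
    /* Signed narrow values already in registers are widened with the
       extend instructions mentioned above: roughly SXTB for the int8_t
       and SXTH for the int16_t. When loading straight from memory,
       LDRSB/LDRSH do the sign extension as part of the load; unsigned
       types use UXTB/UXTH or the zero-extending LDRB/LDRH instead. */
    return (int32_t)b + (int32_t)h;
}
```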

Thomas.

[EDIT] There is also a hardware division instruction, which takes as few as 2 clock cycles (the PIC24, IIRC, requires 18 cycles for a 32/16-bit division). I LOVE the Cortex-M3 and STM32F2xx :wink:
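For reference, a minimal sketch of where that instruction shows up, assuming a toolchain targeting the Cortex-M3:

```c
#include <stdint.h>

uint32_t quotient(uint32_t n, uint32_t d)
{
    /* On a Cortex-M3 this compiles to a single UDIV instruction
       (SDIV for signed operands) rather than a library call; per the
       ARM documentation it takes 2-12 cycles depending on the
       operand values. */
    return n / d;
}
```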

Ota:
…I LOVE the Cortex-M3 and STM32F2xx :wink:

Did you do a non-trivial comparison of STM and NXP for Cortex M3? I'm curious, given their relative market shares.

The nice thing about the STM32F parts (which account for over 50% of the Cortex-M market) is that the peripheral modules are pretty much identical throughout the entire product range. This is unlike NXP, where, for example, some parts have 16/32-bit timers while others are 32-bit only. Also, ST offers a 2-channel, 12-bit DAC, which none of the others (except Kinetis) do. The low-end STM32F100 starts at under $1 and is price/performance competitive with other vendors’ 8- and 16-bit parts.

On the high end (>$4-5 unit price) the STM32F10x parts are price/performance competitive with anything from NXP. However, the Kinetis (Cortex-M4) parts from Freescale currently set the bar with things like DSP instructions, DDR and NAND controllers, a 16-bit ADC, a PGA, up to 150 MHz, hardware encryption, etc.

Hello,

stevech:
Did you do a non-trivial comparison of STM and NXP for Cortex M3? I’m curious, given their relative market shares.

Yup, and for my project the STM32F2xx peripherals are FAAAAAAAR ahead of any NXP part (and any other Cortex-M3 based product).

I need a lot of PWM inputs/outputs, etc.

Just the I2C unit is crap :frowning: (I will switch to SPI).

Thomas.

Ota:
Yup, and for my project the STM32F2xx peripherals are FAAAAAAAR ahead of any NXP part (and any other Cortex-M3 based product).

In summary, the superiority is?

Hello,

Can’t say a lot, but: 120 MHz with 0 wait states (through the ART accelerator), plus 17 timers (IIRC), almost infinite IC/OC channels, 6 U(S)ARTs, the ST peripheral library, lots of examples…

The 17 timers and the XX input-capture / output-compare channels are the main point for my project.

But of course this was the decision process for THIS project; maybe on another project I’d choose an NXP or TI part, etc.

Regards,

Thomas.

PS: I’m using the STM32F205R MCU.

I’m also using the STM32F2xx. And by the way, I’m using the I2C on it - it’s working fine for me.

I2C on the STM32 doesn’t seem as trivial or straightforward as I thought it would be! I was able to work out the USART, but the I2C seems much more trouble! I’m curious, frankvh, how you worked it out. Which libraries did you use, and what toolchain?