Some months ago I posted a speed comparison between the LPC2000 and AT91SAM CPUs: viewtopic.php?p=43398
I think it is time to update the benchmarks, both because I can now provide scores for the new STM32 and because GCC is now at version 4.3.0.
Test setup
The CPUs were tested in ARM mode using speed optimization settings and in THUMB mode using code size reduction settings. The STM32 is tested in THUMB2 mode, of course (it has no ARM mode).
The benchmarks
AT91SAM7X256, ARM mode (-O2 -fomit-frame-pointer -mabi=apcs-gnu)
Kernel size: 6,028 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 113365 msgs/S, 226730 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 89245 msgs/S, 178490 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 89245 msgs/S, 178490 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 72051 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 240596 bytes/S
AT91SAM7X256, THUMB mode (-Os -fomit-frame-pointer -mabi=apcs-gnu)
Kernel size: 3,808 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 96647 msgs/S, 193294 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 83775 msgs/S, 167550 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 83775 msgs/S, 167550 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 72268 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 242252 bytes/S
LPC2148, ARM mode (-O2 -fomit-frame-pointer -mabi=apcs-gnu -falign-functions=16)
Kernel size: 6,512 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 142327 msgs/S, 284654 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 110956 msgs/S, 221912 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 110955 msgs/S, 221910 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 93770 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 343752 bytes/S
LPC2148, THUMB mode (-Os -fomit-frame-pointer -mabi=apcs-gnu -falign-functions=16)
Kernel size: 4,208 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 98118 msgs/S, 196236 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 82958 msgs/S, 165916 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 82956 msgs/S, 165912 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 73291 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 241820 bytes/S
STM32, THUMB2 mode (-O2 -fomit-frame-pointer -mabi=apcs-gnu -falign-functions=16)
Kernel size: 4,576 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 157965 msgs/S, 315930 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 132211 msgs/S, 264422 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 132211 msgs/S, 264422 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 113976 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 377696 bytes/S
STM32, THUMB2 mode (-Os -fomit-frame-pointer -mabi=apcs-gnu -falign-functions=16)
Kernel size: 4,400 bytes
*** Kernel Benchmark, context switch test #1 (optimal):
Messages throughput = 148646 msgs/S, 297292 ctxswc/S
*** Kernel Benchmark, context switch test #2 (no threads in ready list):
Messages throughput = 130769 msgs/S, 261538 ctxswc/S
*** Kernel Benchmark, context switch test #3 (04 threads in ready list):
Messages throughput = 130769 msgs/S, 261538 ctxswc/S
*** Kernel Benchmark, threads creation/termination:
Threads throughput = 108285 threads/S
*** Kernel Benchmark, I/O Queues throughput:
Queues throughput = 368876 bytes/S
The STM32 clearly outperforms the other two microcontrollers; the new Cortex-M3 core is a clear winner over the ARM7.
The LPC2148 clearly outperforms the AT91SAM7X256 in ARM mode, but in THUMB mode there is not much difference.
The STM32 code size efficiency is very high: its kernel size is close to that of classic THUMB mode while delivering much better performance.
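To put rough numbers on the size/speed trade-off, here is a small sketch that computes ratios from the -Os results above (kernel size and context switch test #1). The table layout and the names `SIZE_OPTIMIZED` and `ratios` are mine, for illustration only; the figures are copied from the benchmark output.

```python
# Kernel size (bytes) and test #1 throughput (msgs/S) for the -Os builds,
# all measured at 48MHz. Figures come from the results in this post;
# the names used here are hypothetical and only serve the example.
SIZE_OPTIMIZED = {
    "AT91SAM7X256 THUMB": (3808, 96647),
    "LPC2148 THUMB": (4208, 98118),
    "STM32 THUMB2": (4400, 148646),
}


def ratios(target, baseline, table=SIZE_OPTIMIZED):
    """Return (size ratio, throughput ratio) of target relative to baseline."""
    t_size, t_msgs = table[target]
    b_size, b_msgs = table[baseline]
    return t_size / b_size, t_msgs / b_msgs


if __name__ == "__main__":
    for baseline in ("AT91SAM7X256 THUMB", "LPC2148 THUMB"):
        size_r, speed_r = ratios("STM32 THUMB2", baseline)
        print(f"STM32 vs {baseline}: size x{size_r:.2f}, throughput x{speed_r:.2f}")
```

Against the LPC2148 in THUMB mode, for example, the STM32 kernel is roughly 5% larger while moving roughly 50% more messages per second.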
I tried to make the comparison as fair as possible. All the CPUs are clocked at the same speed (48MHz), even though their upper limits are very different: 60MHz for the LPC2148, 55MHz for the AT91SAM7X256 and 72MHz for the STM32. With each chip running at its top speed, the gap would be even greater than the reported scores suggest.
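As a back-of-the-envelope illustration of that last point, the sketch below scales the test #1 scores from 48MHz to each chip's maximum clock. It assumes throughput grows linearly with clock frequency, which is an assumption and not a measurement: flash wait states may change at higher clocks, so real figures could be lower. The names `CHIPS` and `projected` are hypothetical.

```python
BENCH_CLOCK_MHZ = 48  # clock used for all the measurements in this post

# (test #1 msgs/S at 48MHz, maximum clock in MHz) for the -O2 build of each chip.
CHIPS = {
    "AT91SAM7X256 (ARM, -O2)": (113365, 55),
    "LPC2148 (ARM, -O2)": (142327, 60),
    "STM32 (THUMB2, -O2)": (157965, 72),
}


def projected(msgs_at_bench, max_mhz, bench_mhz=BENCH_CLOCK_MHZ):
    """Scale a measured score linearly with clock frequency (assumed, not measured)."""
    return msgs_at_bench * max_mhz / bench_mhz


if __name__ == "__main__":
    for name, (msgs, max_mhz) in CHIPS.items():
        print(f"{name}: {msgs} msgs/S at {BENCH_CLOCK_MHZ}MHz, "
              f"about {projected(msgs, max_mhz):.0f} msgs/S at {max_mhz}MHz (projected)")
```

Under this linear assumption the STM32 lead over the LPC2148 in test #1 would grow from about 1.11x at 48MHz to about 1.33x at top speed.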
Giovanni