I have a very simple program that I load onto an Olimex LPC-P2148 with an ARM-USB-OCD. It reads the pushbuttons and turns the two LEDs on and off. It is split into several functions and runs in a loop.
When everything is compiled to run out of flash, it works perfectly. It even runs properly when some of the do-nothing functions are set up to run out of RAM. However, most combinations of functions running out of RAM fail in various ways that suggest some call or return has gone wrong.
Cross-compiled on Debian/Linux-testing: binutils-2.17, gcc-4.1.1 (also tried gcc-4.2.0).
The startup code is very close to that shown in the embedded systems series of articles for bare-metal system development, with appropriate translations for the LPC2148. No compiler, assembler, or linker errors.
Eyeballing the code and listings has been fruitless.
Any thoughts on what might cause this problem? Regrettably, I don’t think I can meaningfully run Insight, since this does not happen when everything starts out running in RAM.
It could be anything: a stack overflow or an array overflow, for example. Also make sure to set the MAM to mode 2 (fully enabled); there is an erratum on that chip that affects RAM accesses when the MAM is not properly set up.
I use that board for development too. Some time ago I tried to move some critical parts of my project into RAM, but there seemed to be no performance advantage.
I use GCC 4.2.1, binutils 2.17, Eclipse 3.3+Zylin plugins.
Thanks for the idea. On checking NXP’s errata sheet (which I had, but had forgotten about), I found that I had the right values but was mistakenly writing the MAMTIM value after the enable. Unfortunately, correcting that didn’t fix anything…
I’m still trying to find out what’s wrong. My latest suspicion is that the assembler/linker is not working properly. Could someone please check this? I have a linker script which includes the following output section:
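For anyone following along, the ordering issue mentioned above can be sketched roughly as follows. This is only an illustration: the `mam_regs_t` layout, the `MAM_BASE` address (MAMCR at 0xE01FC000, MAMTIM at 0xE01FC004) and the `mam_init` helper are assumptions taken from my reading of the LPC214x user manual, not code from the original project, and the right MAMTIM value depends on your clock.

```c
#include <stdint.h>

/* Assumed register layout: MAMCR followed by MAMTIM. */
typedef struct {
    volatile uint32_t cr;   /* MAMCR: 0 = disabled, 1 = partial, 2 = fully enabled */
    volatile uint32_t tim;  /* MAMTIM: flash fetch cycles */
} mam_regs_t;

/* Assumed base address of the MAM registers on the LPC2148. */
#define MAM_BASE ((mam_regs_t *)(uintptr_t)0xE01FC000u)

/* Hypothetical helper: change the timing only while the MAM is disabled. */
static void mam_init(mam_regs_t *mam, uint32_t fetch_cycles)
{
    mam->cr  = 0;             /* disable before touching the timing */
    mam->tim = fetch_cycles;  /* set the timing first */
    mam->cr  = 2;             /* then enable fully (mode 2) */
}
```

On the target this would be called early in startup as `mam_init(MAM_BASE, 3);`, where 3 fetch cycles is an assumed value for a 60 MHz core clock.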
.fastcode :
{
    __fastcode_load = LOADADDR(.fastcode);
    __fastcode_start = .;
    *(.glue_7t) *(.glue_7)
    *(.text.fastcode)
    . = ALIGN (4);
    __fastcode_end = .;
} >ram AT>flash
However, in checking the listing file:
__fastcode_load : 40000310 (!?)
__fastcode_start : 40000200
__fastcode_end : 40000220
That __fastcode_load value seems completely wrong; shouldn’t it point to flash? If my surmise is correct, any idea how binutils could have done this? I’ve recompiled but nothing’s changed.
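To show why the load address matters, here is a rough sketch of the kind of copy loop startup code would run with these three symbols, plus a sanity check on the load address. The helper names `copy_section` and `lma_in_flash` are made up for illustration, and the flash range (0x00000000 up to 512k) is an assumption about the LPC2148 memory map rather than something taken from the poster's script.

```c
#include <stdint.h>

/* Assumed end of on-chip flash on the LPC2148 (512k starting at 0). */
#define FLASH_END 0x00080000u

/* Hypothetical helper: copy a section word by word from its load
   address (in flash) to its run address (in RAM). */
static void copy_section(uint32_t *dst, const uint32_t *dst_end,
                         const uint32_t *src)
{
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Hypothetical check: a load address for RAM code should lie in flash. */
static int lma_in_flash(uint32_t addr)
{
    return addr < FLASH_END;
}

/* On the target, with the symbols from the linker script:
 *
 *   extern uint32_t __fastcode_load[], __fastcode_start[], __fastcode_end[];
 *   copy_section(__fastcode_start, __fastcode_end, __fastcode_load);
 */
```

With the listing values above, `lma_in_flash(0x40000310)` is false, so the loop would copy from RAM to RAM and the functions in .fastcode would never receive their code.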
/*
ChibiOS/RT - Copyright (C) 2006-2007 Giovanni Di Sirio.
This file is part of ChibiOS/RT.
ChibiOS/RT is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
ChibiOS/RT is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* LPC2148 memory setup.
*/
__und_stack_size__ = 0x0004;
__abt_stack_size__ = 0x0004;
__fiq_stack_size__ = 0x0010;
__irq_stack_size__ = 0x0080;
__svc_stack_size__ = 0x0004;
__sys_stack_size__ = 0x0100;
__stacks_total_size__ = __und_stack_size__ + __abt_stack_size__ + __fiq_stack_size__ + __irq_stack_size__ + __svc_stack_size__ + __sys_stack_size__;
MEMORY
{
    flash : org = 0x00000000, len = 512k - 12k
    ram : org = 0x40000200, len = 32k - 0x200 - 288
}
__ram_start__ = ORIGIN(ram);
__ram_size__ = LENGTH(ram);
__ram_end__ = __ram_start__ + __ram_size__;
__dma_start__ = 0x7FD00000;
__dma_size__ = 8k;
__dma_end__ = 0x7FD00000 + __dma_size__;
SECTIONS
{
    . = 0;
    .text :
    {
        _text = .;
        *(.text);
        *(.rodata);
        *(.rodata*);
        *(.glue_7t);
        *(.glue_7);
        . = ALIGN(4);
        _etext = .;
    } > flash
    _textdata = _etext;
    .data :
    {
        _data = .;
        *(.data)
        . = ALIGN(4);
        *(.ramtext)
        . = ALIGN(4);
        _edata = .;
    } > ram AT > flash
    .bss :
    {
        _bss_start = .;
        *(.bss)
        . = ALIGN(4);
        *(COMMON)
        . = ALIGN(4);
        _bss_end = .;
    } > ram
}
PROVIDE(end = .);
_end = .;
__heap_base__ = _end;
__heap_end__ = __ram_end__ - __stacks_total_size__;
You also have to specify -mlong-calls as a GCC option and mark the functions you want in RAM so that they are placed in the .ramtext section collected by the script above.
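A minimal sketch of such a marking, assuming GCC's `section` and ARM `long_call` function attributes and the `.ramtext` input section named in the script above. The `RAMFUNC` macro and the `checksum` function are hypothetical names used only for illustration.

```c
/* Assumption: ".ramtext" is the input section the script above puts in RAM. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".ramtext"), long_call, noinline))
#else
#define RAMFUNC /* non-ARM host build: ordinary function */
#endif

/* Hypothetical example of a function meant to run from RAM. */
RAMFUNC unsigned checksum(const unsigned char *p, unsigned n)
{
    unsigned sum = 0;
    while (n--)
        sum += *p++;  /* add up the bytes */
    return sum;
}
```

The attribute on the function covers calls into it from flash. Calls going the other way, from RAM code into flash, are what -mlong-calls is for, since flash and RAM are too far apart for a normal branch.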
I have confirmed that it was the value returned by LOADADDR that was the problem. Setting the value by hand to “what it should have been” fixed the original problem: I can now move functions in and out of the “fastcode” (RAM) section however I like, and it works in each case.
Thanks for the warning that long calls reduce the efficiency of RAM code. That will be easy for me to test in my application. You are using some other method (not LOADADDR) to get the addresses for your .data section copy, which I had not done; that’s certainly one possibility.
It still seems really wrong for LOADADDR to give a bogus value, and makes me a bit nervous about the assembler/compiler/… that I’ve built. At a minimum I will have to run the unit tests for the suite. Silly me, I thought that compiling without errors would be sufficient… apparently not!
I cobbled together a simple benchmark consisting of two different functions: a short simple loop that did little more than toggle one of the LEDs, and a slightly longer one that intentionally jumped around a bit in addition to toggling the same LED. Every so often the loop expired, and the pushbuttons were read. This way more than one function could be tried for each compile cycle. GCC optimization was kept at -O2. The GCC docs allude to possibly optimizing calls to be short when the jump distance is small enough; it clearly wasn’t doing this, as I could see from the listing files. That might make a significant improvement!
Execution speed was measured by probing the LED with an oscilloscope.
Functions running from RAM were the fastest (despite the long calls);
Functions running from flash, using long calls were the slowest.
And yes, flash-based functions with short calls were intermediate.
The difference was substantially affected by the kind of function (as described above), with the groups separated by 10-25%.
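For reference, the simpler of the two benchmark functions described above might look roughly like this. It is a sketch, not the code that was measured: the `gpio_regs_t` layout and `GPIO0` address (legacy port 0 registers at 0xE0028000), the `LED_BIT` pin and the `toggle_loop` name are all assumptions.

```c
#include <stdint.h>

/* Assumed legacy GPIO port 0 layout: IO0PIN, IO0SET, IO0DIR, IO0CLR. */
typedef struct {
    volatile uint32_t pin, set, dir, clr;
} gpio_regs_t;

#define GPIO0   ((gpio_regs_t *)(uintptr_t)0xE0028000u) /* assumed address */
#define LED_BIT (1u << 10)                              /* assumed LED pin */

/* Hypothetical benchmark body: pulse the LED pin `count` times and
   return the number of pulses, so the loop rate shows up on a scope. */
static uint32_t toggle_loop(gpio_regs_t *io, uint32_t mask, uint32_t count)
{
    uint32_t i;
    for (i = 0; i < count; i++) {
        io->clr = mask;  /* drive the pin low */
        io->set = mask;  /* drive the pin high */
    }
    return i;
}
```

Building the same function once for flash and once for RAM, with and without long calls, and comparing the pulse period on the scope gives the three cases listed above.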