copying strings

Hi,

is there any way I could write something like:

char buf[10];

buf = "test";

I would like to have this compiled as something like that:

mov buf,'t'

mov buf+1,'e'

mov buf+2,'s'

mov buf+3,'t'

The benefit of having this would be that the “test” string would not be defined in the data segment but only as immediate values in the flash memory. This would increase speed and lower sram usage while augmenting flash usage. I am using an atmega1284p which has 128k of flash memory and I want to make use of that.

I don’t know your C, but maybe standard stuff works. Try the examples shown in http://www.crasseux.com/books/ctutorial … rings.html

Note that you don’t necessarily need to count characters.

See example 5 in http://www.ehow.com/how_2056297_initial … les-c.html

If you don’t plan to change the string at run time, add ‘const’ before ‘char’.

AVR-LIBC has a set of functions for that. You probably will want to read: http://www.nongnu.org/avr-libc/user-man … space.html

The documentation explains it much better than I could. One caveat: remember that in C all strings are NUL-terminated, so make sure your buffers have room for that extra byte.

char buffer[5] = "TEST"; is ok because "TEST" is actually "TEST\0", so buffer[0] through buffer[4] can hold it

char buffer[4] = "TEST"; compiles in C, but there is no room for the terminating '\0', so the array is not a valid C string

You also may want to look at the string.h library.
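To make the terminator point concrete, here is a small sketch that can be built on a PC. The names `literal_bytes_test` and `fits_with_nul` are made up for illustration; the only facts relied on are that `sizeof` on a string literal counts the trailing NUL and that `strlen` does not.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on a string literal includes the terminating NUL,
   so "TEST" occupies 5 bytes, not 4. */
size_t literal_bytes_test(void)
{
    return sizeof("TEST");
}

/* Returns 1 if a buffer of buf_size bytes can hold text plus its NUL,
   0 otherwise. */
int fits_with_nul(size_t buf_size, const char *text)
{
    return strlen(text) + 1 <= buf_size;
}
```

So a 5-byte buffer is the smallest one that holds "TEST" as a proper C string.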

Yeah, I’ve seen the pgmspace functions, but they read each byte into a register (or even RAM) and then write it to another place in RAM. What I described in my first post is having the characters written one by one as immediate values, just like doing c=2, c=3, c=4 etc., or as if I wrote buf[0]='a', buf[1]='b' etc.

If it isn’t possible then I might just use the pgmspace functions.

Flash has to be programmed in blocks, so I don’t see a way of doing it easily and efficiently. In theory you could use the read-while-write bootloading functions, but you’d have to pull the entire block into SRAM, make your change and write it back. I don’t have the 1284’s datasheet, but you may be able to use the EEPROM features to do it.

Actually, I want to avoid reading flash memory.

Basically, I would like to have a macro such as:

WRITETOBUF(buf,"0123")

which would produce code like this:

buf[0]=48 // 48='0'

buf[1]=49

buf[2]=50

buf[3]=51

where “buf” is a buffer in SRAM. In this example, “0123” will not be stored in SRAM and won’t have to be read from flash (other than by the execution unit in the MCU).

dumais:
where “buf” is a buffer in SRAM. In this example, “0123” will not be stored in SRAM and won’t have to be read from flash (other than by the execution unit in the MCU).

Now I’m confused. Where do you intend to store “0123” if not in the Flash and not in SRAM? Are you trying to push your string onto the stack?

well like I said, “0123” is just a series of immediates,

My ASM is rusty, but I believe the instructions would be encoded like this:

mov buf+0,48

mov buf+1,49

mov buf+2,50

mov buf+3,51

48,49,50,51 are all stored in flash but they are read by the execution unit (or whatever that thing is called). so there would be no need to read the string from flash using pgmspace.h functions.

Why do you care how the registers are initialized? Are you interested in saving execution time, or are you worried about saving space in the program memory?

Can’t you try the C code and see what it generates? It may generate exactly what you are looking for.

I have not used your assembler, but a quick search seems to say that LDI would be your mnemonic.

What are you trying to accomplish?

hehe LDI, that’s right. I haven’t done avr assembly for a while now.

Basically I want to gain performance, reduce SRAM usage and use more flash.

So my description of how to load a string seems to be the best way. Just like using compiled sprites in old games on the 80386.

So I want to find a way for the compiler to break it down for me. So instead of doing:

If I do:

char buf[6]="01234"; it works like a charm. Looking at the disassembly, it does exactly that. But the problem is that this only works for initialization.

I guess I will have to write a script that will change my code for me.
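One thing that might be worth trying before writing a script is a fixed-size `memcpy` from the literal, which is legal anywhere, not only at initialization. This is only a sketch: `reload_digits` is a made-up name, and whether the compiler expands the copy into immediate loads depends on the compiler version and optimization level, which has not been checked on avr-gcc here. Without optimization it will most likely copy from the SRAM image of the literal.

```c
#include <string.h>

/* Hypothetical helper: reload a 6-byte SRAM buffer at run time.
   sizeof "01234" is 6 because it counts the terminating NUL. */
void reload_digits(char buf[6])
{
    memcpy(buf, "01234", sizeof "01234");
}
```

Looking at the disassembly, as was done for the initializer, is the way to find out what a given toolchain actually emits.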

to save RAM in an AVR, for constants, numbers or especially strings, use the avr lib for GCC, and the “xxx_P” functions and macros.

These put constants in flash and don’t copy them to RAM at initialization.

There’s a particular .h for those: <avr/pgmspace.h>

So you code

PSTR("hello world"); and that goes to flash.

There are routines like

printf_P()

that take the format string from flash rather than RAM.

And puts_P() and so on.

where the _P means program memory.

That .h also gives macros and typedefs for putting constants in flash.

The AVR and PIC have two address spaces - one for flash program and constants and one for RAM. They overlap numerically, i.e., there is an address 0 in both spaces. So there have to be special instructions to fetch data (not instructions) from flash. This is why there are the _P library things.
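As a rough sketch of what reading data from the flash address space looks like with avr-libc: `PROGMEM`, `pgm_read_byte` and `strcpy_P` are the real names from <avr/pgmspace.h>, while `greeting`, `copy_greeting_bytewise` and `copy_greeting` are made up for this example. The `#else` branch is an assumption added only so the same file also builds on a PC, where there is a single address space.

```c
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host fallbacks (assumption): one address space, so these are plain reads. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#define strcpy_P(dst, src) strcpy((dst), (src))
#endif

/* On AVR this array is placed in flash and is not copied to SRAM at startup. */
static const char greeting[] PROGMEM = "hello";

/* Copy the flash string into an SRAM buffer one byte at a time,
   including the terminating NUL. */
void copy_greeting_bytewise(char *dst)
{
    unsigned char i = 0;
    char c;
    do {
        c = (char)pgm_read_byte(&greeting[i]);
        dst[i++] = c;
    } while (c != '\0');
}

/* Same result using the library helper for flash strings. */
void copy_greeting(char *dst)
{
    strcpy_P(dst, greeting);
}
```

The destination buffer must have room for the string plus its terminator, 6 bytes in this case.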

The megaAVRs do not bank-switch RAM or instruction address spaces (up to 64KB) whereas most popular 8 bit PICs have an unfortunately tiny page size. This is a huge simplification and avoids lots of bank switching code wasting space.

Nice thing about the small and large ARMs is that they too have both flash and RAM, but the address spaces never overlap. Thus all instructions work for either.

The AVR/PIC dual address spaces date way back to an era when large static RAMs were very expensive.

The ARMs, esp. NXP, keep flash speeds from slowing the processor by doing very wide reads of flash - 8 or so bytes per read, then they take the needed chunks to execute the instruction in that area. Data in flash is retrieved using the same bit/byte/word/longword instructions as for RAM.

And instructions/code can be placed and executed from RAM, usually for debugging.

The LPC21xx and LPC176x are really convenient to use and for hobby projects, are not significantly more expensive than high end AVR/PIC 8 bitters.

PIC and AVR 32 bit CPUs are nonsensical to me, given ARM and how it’s licensed to so many vendors. I can’t see how AVR32 or PIC32 can compete, just based on the popularity of ARM and its lower longevity risk.

So it’s a mistake to say that a “Harvard” architecture micro is one with flash and RAM address spaces. The ARM has both memory types but in one linear address space, so I say it’s not Harvard. Since most compilers are von Neumann oriented, the single linear space is much easier on the software developer.

Thank you for the useful information.

But unfortunately this still doesn’t do it.

By writing:

int main()
{
char *c1=PSTR("te");
char c2[2];

c2[0]=c1[0];
c2[1]=c1[1];

}

I get

    ldi r24,lo8(__c.1386)
    ldi r25,hi8(__c.1386)
    std Y+2,r25
    std Y+1,r24
    ldd r30,Y+1
    ldd r31,Y+2
    ld r24,Z
    std Y+3,r24
    ldd r24,Y+1
    ldd r25,Y+2
    movw r30,r24
    adiw r30,1
    ld r24,Z
    std Y+4,r24

See all the memory transfers?

but if I do this:

int main()
{
char c2[2];

c2[0]='t';
c2[1]='e';

}

I get this:

    ldi r24,lo8(116)
    std Y+1,r24
    ldi r24,lo8(101)
    std Y+2,r24

We can see that this code will execute much faster because it uses fewer instructions and loads an immediate value instead of doing a memory transfer. So I want the compiler to do this automatically for me when the strings are defined at compile time. The optimization flags help sometimes, but I would prefer not to rely on optimization and to specify this behavior manually.

Putting constants in RAM at run time is always faster, at the expense of space in RAM. Most often, the speed penalty for reading from flash is insignificant compared to the benefit of not filling a large amount of RAM with constants like strings used for display (which is slow too). And of course, these constants must be in flash anyway, so they’re copied to RAM by the block init at startup. So your code can just access them directly from flash. At least that’s how I’ve always done AVR code that has a lot of string constants.

Hmm, from what I understood (and correct me if I’m wrong), the instruction register is loaded with the next instruction directly from flash and then fed to the ALU, so the instruction never goes into RAM. Since the constant I am passing is part of the instruction (ldi r24,101 is one full instruction), it never goes into RAM either; it is loaded into the r24 register directly, avoiding a memory transfer. The only “memory transfer” is the one needed to load the instruction register, but that is insignificant because we would need it anyway if we were to do it any other way. When I do “std Y+2,r24” there is also a memory transfer from register to RAM that I would have to do anyway.

SO instead of having:

FLASH->iReg (loading instruction register with “ldd r24,Y+1”)

FLASH → r24 (reading at location Y+1)

… other overhead for changing the value of ‘Y’ to point in RAM

FLASH->iReg (loading instruction register with “std Y+3,r24”)

r24 → RAM (copy r24 to ram)

I would have:

FLASH->iReg (loading instruction register with “ldi r24,101”)

r24 → RAM (copy r24 to ram)

But of course, if you say that the instruction would be prefetched from FLASH to RAM and then loaded from RAM to the instruction register, then it would take more ram. But according to this: https://ccrma.stanford.edu/workshops/20 … llers.html it does not seem to be the case.

Thank you for your time. At this point though, I do understand that the compiler cannot do what I want to do. I would have to do it myself by coding all my strings like this:

bufToSendToenc28j60[0]='t';

bufToSendToenc28j60[1]='e';

bufToSendToenc28j60[2]='s';

bufToSendToenc28j60[3]='t';

But it is going to be a pain to do it this way. I will use the method that you showed me in your other post.

yes, load immediate has its operand in flash along with the instruction op code.

That’s fine for small sized constants.

yes, use the AVR library xxx_P functions. They make it easier, but not as easy as a single address space processor like ARM or ARM Cortex. But AVRs were aimed at applications that are done in high volume so more development time/costs are not as critical.

AVRs do not/cannot execute code from RAM.

That URL points to a short dissertation written by a naive student; it is riddled with errors and misstatements.

This may be a little old, but I figure I’ll add to it anyways.

dumais:
Since the constant I am passing is part of the instruction (ldi r24,101 is one full instruction) it will never even go in ram, it will be loaded in the r24 register directly. Avoiding a memory transfer. The only “memory transfer” is the one needed to load the instruction register, but that is insignificant because we would need it anyway if we were to do it any other way. When I do “std Y+2,r24” there is also a memory transfer from register to RAM that I would have to do anyway.

AVR does not have a SRAM->SRAM instruction. Everything has to go through registers.

As stevech has pointed out, by using the LDI instructions you are trading off program space for execution time.

I don’t think I mentioned anything about SRAM to SRAM. :)

bufToSendToenc28j60[0]='t'; will load an immediate into a register, then store the register to SRAM.

I don’t think I am trading program space for execution time either, because using LDI instead of fetching from program space will save program space by using fewer instructions (see post #12) and save execution time by having fewer instructions to execute.

So basically, if there is a way of doing this faster/smaller:

ldi r24,lo8(116)
std Y+1,r24
ldi r24,lo8(101)
std Y+2,r24

Then, I sure would like to know. Otherwise, I would like to know how to generate the previous example using C and using the values in a “string” form instead of char by char like the example in post #12.

Should have been memory, not SRAM. AVR has no memory->memory instruction.

dumais:
I don’t think I am trading program space for execution time either.

Depending on the length of your string, at some point the LDI instructions will exceed the size of the program read byte macro + the program memory usage of the string itself.

Each LDI instruction requires 2 bytes of progspace per character (4 bytes if you count the STD that stores it). The program read byte function requires N bytes of progspace, and each character requires 1 byte of progspace. At some point those lines intersect. For small strings, the LDI approach is smaller, but for larger strings the program read byte approach is smaller.
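The crossover can be worked out with a few lines of arithmetic. The sketch below counts 4 bytes per character for the unrolled form (one LDI plus one STD, each a 16-bit instruction) and N plus 1 byte per character for the read-from-flash form. The function names are made up, and `routine_bytes` stands in for N, whose real value depends on the compiler and the library routine used; any number passed in is an assumption.

```c
/* Flash bytes for unrolled stores: LDI (2 bytes) + STD (2 bytes) per character. */
unsigned inline_store_bytes(unsigned len)
{
    return 4u * len;
}

/* Flash bytes for a copy routine plus the string data itself.
   routine_bytes is a placeholder for the size of the read loop (the N above). */
unsigned flash_copy_bytes(unsigned len, unsigned routine_bytes)
{
    return routine_bytes + len;
}

/* Smallest string length at which the unrolled form is strictly larger. */
unsigned break_even_len(unsigned routine_bytes)
{
    unsigned len = 0;
    while (inline_store_bytes(len) <= flash_copy_bytes(len, routine_bytes))
        len++;
    return len;
}
```

With a purely illustrative N of 30 bytes, the unrolled form stays smaller or equal up to 10 characters and becomes larger from 11 characters on. If the copy routine is shared by many strings, its cost is spread across all of them, which moves the crossover further in its favor.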

As for how to generate the LDI/STD instructions more cleanly in C, it seems like something a cleverly written macro can do. The following is very untested, but it might give you an idea how to start. It should iterate over the string and create var[n-1] = str[n-1]; var[n-2] = str[n-2]; … var[0] = str[0];

#define LOAD_STRING(var_name, str, len) { var_name##[##len-1##]=str[len-1]; if(len>0)LOAD_STRING(var_name, str, len-1); }

Hmm, the example does not work but I see what you are getting at. I will play around with that. Thanks!

As for program space and the string length: yes, you are right, sorry about that. This is exactly what I want though. I have 128K of flash and I am only using 16K right now, so I would like to use the rest of it in order to gain more speed.

Thanks again. That should do it.

Meh. For some reason I thought you could do recursive macros, but I must have been mistaken.

I was also looking at this: http://stackoverflow.com/questions/1286 … a-sequence, but I can’t quite get it to work with the string either.

If you like python, the following

import os

s = "test_string"

# Emit one assignment per character so the compiler can use LDI/STD pairs.
# Note: characters such as ' or \ would need escaping in the C output.
with open("tmp.txt", "w") as f:
    f.write(f"char str[{len(s)}];\n")
    for i, ch in enumerate(s):
        f.write(f"str[{i}] = '{ch}';\n")

# Windows-only: opens the generated file in its default application.
os.startfile("tmp.txt")

will generate

char str[11];
str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '_';
str[5] = 's';
str[6] = 't';
str[7] = 'r';
str[8] = 'i';
str[9] = 'n';
str[10] = 'g';

which generates

str[0] = 't';
  94:	84 e7       	ldi	r24, 0x74	; 116
  96:	89 83       	std	Y+1, r24	; 0x01

str[1] = 'e';
  98:	85 e6       	ldi	r24, 0x65	; 101
  9a:	8a 83       	std	Y+2, r24	; 0x02

[... et al ...]

The python should be clean enough for you to modify to fit your needs. I’m a little upset I can’t figure out how to do it with the macros though… My problem is that I can’t break up the passed string into its individual characters in the macro. Somehow you have to get from "string" to 's', 't', 'r', 'i', 'n', 'g'. I think it probably works if you pass each character as an individual parameter, but that’s not a whole lot better than doing it all by hand.
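One possible way around splitting the string inside the macro is to index the literal with constant indices instead. The sketch below does that for a fixed length of 4; `WRITETOBUF4` and `fill_digits` are hypothetical names, and a separate macro would be needed for each length. It relies on the compiler folding `"0123"[0]` into the constant 48, which typically needs optimization enabled and has not been verified on avr-gcc here, so check the disassembly before relying on it.

```c
/* Hypothetical helper: unrolled copy of a 4-character literal, no NUL.
   Each (s)[i] is a constant index into a string literal, which an
   optimizing compiler can fold into an immediate value. */
#define WRITETOBUF4(buf, s)   \
    do {                      \
        (buf)[0] = (s)[0];    \
        (buf)[1] = (s)[1];    \
        (buf)[2] = (s)[2];    \
        (buf)[3] = (s)[3];    \
    } while (0)

/* Example caller so the macro has something to expand into. */
void fill_digits(char *buf)
{
    WRITETOBUF4(buf, "0123");
}
```

If the folding does not happen, the code is still correct; it just reads the characters from wherever the literal is stored instead of from immediates.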