Best way to parse

Chupa · January 27, 2009, 5:43pm

I feel like there’s a better way to parse data in C than how I'm currently doing it. I receive a packet via UART, store it in a buffer, then run it through a huge chain of if statements to pull out the meat and potatoes. That hardly seems like the most efficient way to do it. The other thing that bugs me is how to decide when to actually hand the packet in the buffer to the parser. What I have been doing is parsing the packet length as soon as possible and then counting bytes until the number of bytes in the buffer matches the packet length. When it does, I send the whole packet to the parser to get the data.

TL;DR: I'm looking for any information or guidelines on the most efficient way to parse data on a µC!

leon_heller · January 27, 2009, 6:32pm

A state machine is the best way to do it.

Leon

signal7 · January 29, 2009, 12:12pm

You may be able to get a tiny bit of efficiency using switch statements instead of if statements, but I don’t think the generated assembly code will really be all that different. The compiler is really pretty good at optimization. A switch statement looks neater in the code, though.

If you have all of your parser logic in one big function, you may want to break it up into smaller chunks. It’ll be easier to code and understand than trying to fit everything into one function.

Why do you need the efficiency? Are you running out of flash? Is the processing of the packet taking too long?
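A minimal sketch of the switch-plus-small-functions idea, assuming the XBee API frames Chupa describes later in the thread: one switch on the frame-type byte, with each case handing off to a small handler. Which four frame types Chupa actually uses isn't stated, so the IDs below are an assumption taken from the ZigBee (ZB) firmware's API frame list; verify them against the manual for your firmware. The handler names and return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame-type IDs assumed from the XBee ZB API; check your manual. */
#define API_AT_RESPONSE   0x88
#define API_MODEM_STATUS  0x8A
#define API_TX_STATUS     0x8B
#define API_RX_PACKET     0x90

/* Hypothetical handlers. Each gets the bytes after the frame-type ID
   and only has to know one layout. Return 0 if handled, -1 if the
   frame is too short for that layout. */
static int handle_at_response(const uint8_t *d, size_t n)
{
    /* frame ID, 2-byte AT command, status, optional data */
    if (n < 4) return -1;
    (void)d;   /* pick out the fields you need here */
    return 0;
}

static int handle_modem_status(const uint8_t *d, size_t n)
{
    /* single status byte */
    if (n < 1) return -1;
    (void)d;
    return 0;
}

static int handle_tx_status(const uint8_t *d, size_t n)
{
    /* frame ID, 16-bit address, retry count, delivery and discovery status */
    if (n < 6) return -1;
    (void)d;
    return 0;
}

static int handle_rx_packet(const uint8_t *d, size_t n)
{
    /* 64-bit source, 16-bit source, receive options, then the payload */
    if (n < 11) return -1;
    (void)d;
    return 0;
}

/* One switch replaces the long if/else chain. 'frame' starts at the
   frame-type ID. Returns the handler's result, or -2 for an empty
   frame or an unhandled type. */
int dispatch_frame(const uint8_t *frame, size_t len)
{
    if (len == 0) return -2;
    const uint8_t *d = frame + 1;
    size_t n = len - 1;
    switch (frame[0]) {
    case API_AT_RESPONSE:  return handle_at_response(d, n);
    case API_MODEM_STATUS: return handle_modem_status(d, n);
    case API_TX_STATUS:    return handle_tx_status(d, n);
    case API_RX_PACKET:    return handle_rx_packet(d, n);
    default:               return -2;
    }
}
```

Each handler can then be written and tested on its own, and adding a fifth frame type is one new case plus one new function.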

newbie123 · January 29, 2009, 12:26pm

What about a checksum byte at the end of every packet?
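As a concrete example of such a checksum, the XBee API format discussed later in this thread uses an 8-bit one: the checksum byte is 0xFF minus the low 8 bits of the sum of the frame-data bytes (everything after the length field, up to the checksum). A receiver verifies it by adding the frame data and the checksum together and checking for 0xFF. The function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sender side: 0xFF minus the low byte of the sum of the frame data
   (frame-type ID through the last data byte). */
uint8_t xbee_checksum(const uint8_t *frame_data, size_t len)
{
    uint8_t sum = 0;   /* uint8_t wraps modulo 256, keeping only the low byte */
    for (size_t i = 0; i < len; i++)
        sum += frame_data[i];
    return (uint8_t)(0xFF - sum);
}

/* Receiver side: frame data plus checksum must sum to 0xFF. */
int xbee_checksum_ok(const uint8_t *frame_data, size_t len, uint8_t checksum)
{
    uint8_t sum = checksum;
    for (size_t i = 0; i < len; i++)
        sum += frame_data[i];
    return sum == 0xFF;
}
```

For instance, the frame 7E 00 04 08 52 4E 4A (an AT command frame for "NJ") has frame data 08 52 4E 4A, which sums to 0xF2, so its checksum is 0xFF - 0xF2 = 0x0D.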

Chupa · January 30, 2009, 12:01am

signal7:
You may be able to get a tiny bit of efficiency using switch statements instead of if statements, but I don’t think the generated assembly code will really be all that different. The compiler is really pretty good at optimization. A switch statement looks neater in the code, though.

If you have all of your parser logic in one big function call, you may want to break it up into smaller chunks. It’ll be easier to code/understand than trying to fit it all into one function.

Why do you need the efficiency? Are you running out of flash? Is the processing of the packet taking too long?

Yeah, I'm working with switch statements now.

I do need to get better at breaking it up into multiple functions. Right now it's a huge mess, all in one giant function.

As far as efficiency goes, I'm just always worried there’s a better way to do something than how I'm currently doing it. That's all I wanted to know.

Each packet ends with a checksum, which I am working with as well.

signal7 · January 30, 2009, 12:37pm

All I can offer is that you might want to look at how a compiler works. Given the many different syntactic forms a programming language allows, the compiler has to reduce everything down to machine code. Computer science has solved that parsing problem many times over, so that material would probably be the most helpful. I’m just not sure you need anything that complex on a microcontroller, but it might give you some ideas.

A Google search for lex and yacc will get you started. Here’s one result that seems pretty straightforward: http://dinosaur.compilertools.net/

riden · January 30, 2009, 2:25pm

I’ve been following this thread waiting for more specifics about the data being parsed. However, I will say that the “best” way to parse data (in your case I’ll assume that means the smallest code footprint first, then speed) is highly dependent on the nature of the data. Is the packet data deterministic? Are there different packet types that must be parsed? Do the packets carry variant data (the same basic type of packet containing different data structures depending on a sub-type)? I always look at the nature of the data first and then decide on the appropriate approach for that data set.

Regarding the checksum calculation, almost every parser loop has a section that gets the next byte to process (a getNextByte routine or similar), which is an ideal place to put the code: add each byte to a running checksum as it is consumed. Once a packet has been parsed, grab the checksum and clear the variable for the next packet.

As Leon said, state machines are generally the way to go. One downside, for me, is that programs that use them can be harder to debug. My mind works top-down, and it takes me more effort to wrap my head around what is happening at any given point of the program's execution. A breakpoint placed in a state machine can be tripped from many points in the program, since a given state may be reached from many places during execution. My explanation may not be the clearest, but when you encounter the situation, you’ll know what I mean. Still, a well-conceived state machine is a thing of beauty.

So, how about some details about your packets?

leon_heller · January 30, 2009, 2:41pm

State machine code generators are available - you just create the state diagram and the C code is produced for you.

Leon

Chupa · January 31, 2009, 6:32am

They're just simple XBee module API packets. The standard format is: start delimiter, length (2 bytes), API frame identifier (defines the frame type), frame-type-specific data, checksum.

My current strategy is to put all incoming data into a buffer array. After the 3rd byte is received, I call a function to read the length. After X more bytes are received, where X is the packet length read from the buffer, I run the checksum check. If it passes, I run the parser to get the desired data.
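The buffer-and-count strategy boils down to one test run after every received byte. A minimal sketch, assuming the receive routine only starts filling the buffer when it sees the 0x7E start delimiter, so buf[0] is the delimiter and buf[1..2] hold the big-endian length. One detail worth checking in the strategy above: the XBee length field counts only the frame data (identifier plus type-specific data), not the checksum, so after the three header bytes the frame needs length + 1 more bytes. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the total frame size once buf[0..count-1] holds a complete
   frame, or 0 while more bytes are still needed. */
size_t xbee_frame_complete(const uint8_t *buf, size_t count)
{
    if (count < 3)
        return 0;                                  /* length not received yet */
    size_t len = ((size_t)buf[1] << 8) | buf[2];   /* MSB first */
    size_t total = len + 4;   /* delimiter + 2 length bytes + frame data + checksum */
    return count >= total ? total : 0;
}
```

Once it returns a nonzero size, the frame data sits in buf[3] through buf[total - 2] and the checksum in buf[total - 1]; verify the checksum, hand the frame data to the parser, then reset the count to 0 for the next frame.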

There are 4 different types of API packets that I'm interested in right now. One is just a data packet; the other 3 are status packets indicating the state of the ZigBee network, and they will require more “pick and place” parsing.

I'm not TOO worried about efficiency from a code size/speed point of view; efficiency was probably the wrong word to use. I was more interested in knowing the proper way to do it, and it's clear now that a state machine is the proper way.
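A minimal sketch of such a state machine for the frame layout above, fed one byte at a time (from the UART receive interrupt or a polling loop). It also keeps the running checksum riden suggests, updated as each frame-data byte arrives. Assumptions: unescaped API mode (AP=1; escaped mode, AP=2, would first need to undo the 0x7D escapes), a hypothetical 128-byte XBEE_MAX_DATA limit, and hypothetical type and function names.

```c
#include <stddef.h>
#include <stdint.h>

#define XBEE_START    0x7E
#define XBEE_MAX_DATA 128   /* assumed limit; size it for your largest frame */

typedef enum { WAIT_START, LEN_MSB, LEN_LSB, FRAME_DATA, CHECKSUM } xbee_state_t;

typedef struct {
    xbee_state_t state;
    uint16_t len;      /* frame-data length from the header */
    uint16_t count;    /* frame-data bytes stored so far */
    uint8_t  sum;      /* running checksum over the frame data */
    uint8_t  data[XBEE_MAX_DATA];
} xbee_parser_t;

void xbee_parser_init(xbee_parser_t *p)
{
    p->state = WAIT_START;
}

/* Feed one received byte. Returns 1 when p->data holds a complete frame
   (p->len bytes) whose checksum verified, 0 otherwise. Frames with a bad
   length or checksum are dropped and the parser waits for the next 0x7E. */
int xbee_parser_feed(xbee_parser_t *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_START:
        if (byte == XBEE_START)
            p->state = LEN_MSB;
        break;
    case LEN_MSB:
        p->len = (uint16_t)(byte << 8);
        p->state = LEN_LSB;
        break;
    case LEN_LSB:
        p->len |= byte;
        p->count = 0;
        p->sum = 0;
        /* Reject empty or oversized frames instead of overrunning data[]. */
        p->state = (p->len == 0 || p->len > XBEE_MAX_DATA) ? WAIT_START : FRAME_DATA;
        break;
    case FRAME_DATA:
        p->data[p->count++] = byte;
        p->sum += byte;                    /* checksum accumulates per byte */
        if (p->count == p->len)
            p->state = CHECKSUM;
        break;
    case CHECKSUM:
        p->state = WAIT_START;             /* ready for the next frame either way */
        return (uint8_t)(p->sum + byte) == 0xFF;
    }
    return 0;
}
```

In the receive loop this becomes a single call per byte; when it returns 1, hand p.data and p.len to the frame-type parser. There is no separate "is the buffer complete yet?" bookkeeping, because the current state already encodes how far into the frame the parser is.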

I'm going to check out this yacc stuff. Thanks for the tip!

stevech · January 31, 2009, 8:31am

I’d think that Digi or someone on the 'net has shared some C code to parse those API packets.

And generate them.

You’re referring to the binary API, right, rather than the ASCII “Hayes modem” AT command API? I used the latter for a rushed but complicated 10-node project. In normal mode, transparent data went out as broadcasts and came back as unicasts. I used AT commands only rarely, to change addresses and read the contents of the “S” registers. That was because I didn’t have time to write code for the binary API, and this was long ago, before XBees were as popular.

leon_heller · January 31, 2009, 12:50pm

YACC is intended for generating compilers and produces vast amounts of code! It is sometimes used for parsing things like manually entered text, but that code is going to run on a PC, where efficiency doesn’t matter much.

Leon
