XBee Pro sending garbage after several hours of good data

We’re using two 2.4 GHz ZigBee XBee Pro units (u.FL antenna connectors) paired with a PIC18LF2680 to do telemetry between a solar car and a chase vehicle. Everything works well for a while, but after several hours of running the system continuously, we find that the “good” data rate falls to 20% of the overall data rate. It starts at about 98%.

The PICs run on 8 MHz crystals and communicate with the XBees at 115.2 kbps. The XBees are configured for transparent mode, but with an encryption key set to avoid accidental crosstalk with other modules that may be in the area.

Our data packet looks something like this:

1 start byte

4 bytes of timestamp

1 byte of origin ID

1 byte of message ID

8 bytes of payload

1 hash byte (FNV-1a, 8-bit)
The PIC has a 256-byte software buffer and can listen to the CTS line to stop sending data until the XBee buffer empties out. The receiver is a computer. It reads a packet and computes the hash. If it matches, the packet is OK. If it doesn’t, it slides one byte down in the buffer and tries again. When there’s a hash hit, you know the two streams are synchronized.
Does anybody have any idea what causes the successful packet rate to drop to 20% after a few hours of operation? It comes right back with a system reboot.
Thanks for the help!
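For reference, here’s a sketch of the hash. FNV-1a isn’t officially defined below 32 bits, so this assumes the usual construction of computing the 32-bit hash and XOR-folding it down to 8 bits; if the real firmware derives the 8-bit value some other way, the details below won’t match.

```c
#include <stdint.h>
#include <stddef.h>

/* 8-bit FNV-1a: compute the standard 32-bit FNV-1a, then
 * XOR-fold it down to 8 bits.  (Assumption: the fold is the
 * method the FNV authors suggest for sub-32-bit sizes.) */
#define FNV32_OFFSET 2166136261u
#define FNV32_PRIME  16777619u

uint8_t fnv1a_8(const uint8_t *data, size_t len)
{
    uint32_t h = FNV32_OFFSET;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= FNV32_PRIME;
    }
    h = (h >> 16) ^ (h & 0xFFFFu);            /* fold 32 -> 16 */
    return (uint8_t)((h >> 8) ^ (h & 0xFFu)); /* fold 16 -> 8  */
}
```

On transmit, the hash would be computed over the first 15 bytes of the packet and appended as byte 16.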

Need more info, starting with a system description. Is this it?

System description: sensors → PIC → XBee (wireless) → XBee → PC

XBee series 1 or 2?

API mode or transparent serial?

Packet rate

Your application’s behavior if a packet is lost or an XBee transmission error happens (CCA or ACK fault)

and so on

It’s a pair of Series 2.5 XBee Pro 2.4 GHz modules. I think I mentioned the mode and application behavior in the original post, but it merits emphasis.

Both modules are in transparent mode, running at 115.2 kbps. The link is bidirectional, but the vast majority of the data (99.9% or more) flows from the car to the chase van.

The software receives bytes from the XBee module as a stream. It looks at the first byte, figures out the expected packet length, and then computes the hash (checksum) over what it finds. If the hash checks out, it assumes it has a good packet. If not, it throws away the first byte in the buffer and redoes the calculation, repeating until it comes up with a hit; at that point it knows it is looking at good data. If a byte is lost, it de-synchronizes the two sides: the packet fails the hash check and the receiving software throws away bytes until it comes up with another match.
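To make the resync behavior concrete, here is a sketch of the receive-side scan. The start-byte value (0xA5) and the XOR-fold hash construction are my assumptions, since the thread doesn’t give them; the 16-byte layout is the one from the original post.

```c
#include <stdint.h>
#include <stddef.h>

#define PKT_LEN    16    /* start + 4 timestamp + origin + msg ID + 8 payload + hash */
#define START_BYTE 0xA5  /* assumption: the actual start byte isn't given */

/* 8-bit FNV-1a via XOR-fold of the 32-bit hash (assumed construction) */
static uint8_t fnv1a_8(const uint8_t *d, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) { h ^= d[i]; h *= 16777619u; }
    h = (h >> 16) ^ (h & 0xFFFFu);
    return (uint8_t)((h >> 8) ^ (h & 0xFFu));
}

/* Slide through the receive buffer one byte at a time until a
 * candidate packet's trailing hash byte verifies.  Returns the
 * offset of the first good packet, or -1 if none is found; the
 * bytes before that offset are what gets discarded during resync. */
int find_packet(const uint8_t *buf, size_t n)
{
    for (size_t off = 0; off + PKT_LEN <= n; off++) {
        if (buf[off] != START_BYTE)
            continue;
        if (fnv1a_8(buf + off, PKT_LEN - 1) == buf[off + PKT_LEN - 1])
            return (int)off;
    }
    return -1;
}
```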

The overall system looks something like this:

[45 CAN nodes] <-125 kbps CAN bus-> [Telemetry “node” with PIC18F2680] <-115.2 kbps USART-> [XBee Pro] <-2.4 GHz wireless-> [XBee Pro] <-FTDI USB/RS-232-> [C# client program on laptop]

We send about 60 data packets per second. At 16 bytes each, that’s about 960 bytes/sec, well below the nominal transmission rate.

Thanks for the help!

960 bytes/sec… 10 bits per byte on the UART (start/stop bits) = 9600 baud.

An 802.15.4 frame without ZigBee carries just under 100 bytes each.

Check the XBee 2.5 docs. I’ve no experience with them, only the Series 1, but I thought Digi was required to use Ember’s ZigBee stack on all 2.5 modules, with no bare 15.4 mode. If I’m wrong, great. If I’m right, you lose part of each frame to the ZigBee overhead.

The air-link bit rate is 250 kbps. After overhead and ACK delays, I believe a realistic net rate is more like 80 kbps, absent any interference or competition for air time on the chosen RF channel. Competition comes from other 15.4 nodes, WiFi frames, and other 2.4 GHz signals.

Errors and clear channel assessment (CCA) delays can reduce the frames/sec capacity in the absence of interference.

What I’d look for:

Intermittent CCA delays due to other transmissions.

In Series 1, there’s a tally of CCA failures kept in one of the AT registers.

Also monitor the retry counters.
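For the Series 1 (802.15.4) firmware, those tallies are the EC (CCA failures) and EA (ACK failures) registers; I don’t know whether the 2.5 firmware exposes equivalents, so check its command reference. Reading them in command mode looks roughly like this:

```
+++          (wait for "OK")
ATEC         read CCA-failure count (hex)
ATEA         read ACK-failure count (hex)
ATCN         exit command mode
```

I believe writing 0 (e.g. ATEC0) resets a tally, but verify that against the manual.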

Reduce the UART rate from 115.2k to 57.6k or less. If any UART in the chain loses frame sync (start-bit detection), there will be garbled data until there’s a pause in the byte stream.

Determine whether the 15.4 layer is retrying for error correction. Here’s the issue: if a 15.4 frame is received with uncorrectable bit errors, no ACK is sent. The sender times out waiting for the ACK and resends, up to n times. This delay can cause the input buffer to overflow unless you use CTS or an application-level flow-control protocol.
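On the PIC side, that flow control amounts to gating the drain of the software buffer on CTS. A sketch, with the hardware accessors stubbed so it’s self-contained (the real firmware would read the CTS pin, which is active-low on the wire, and write the USART registers directly):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hardware accessors, stubbed here for illustration.  On the real
 * PIC18 these would sample the CTS pin and write TXREG after
 * checking the transmit flag. */
static bool cts_clear = true;  /* true: the XBee can accept data */
static bool xbee_cts_asserted(void) { return cts_clear; }
static void uart_write_byte(uint8_t b) { (void)b; /* -> TXREG */ }

/* 256-byte software TX ring buffer; 8-bit indices wrap naturally. */
static uint8_t txbuf[256];
static uint8_t head, tail;

static bool tx_enqueue(uint8_t b)
{
    uint8_t next = (uint8_t)(head + 1u);
    if (next == tail)
        return false;          /* buffer full: drop or block */
    txbuf[head] = b;
    head = next;
    return true;
}

/* Call from the main loop: drain only while the XBee has room.
 * This is what keeps the XBee's internal buffer from overflowing
 * while it stalls on 802.15.4 retries. */
static void tx_service(void)
{
    while (tail != head && xbee_cts_asserted()) {
        uart_write_byte(txbuf[tail]);
        tail = (uint8_t)(tail + 1u);
    }
}
```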

Use a data source from a PC or some such instead of the CAN bus, to see whether the error is CAN-specific.

Ensure adequate received signal strength. If the 15.4 radios move, make sure there are no conditions for weak signals, fades, or RF path occlusion.

For 60 packets per second sustained, I’d feel more comfortable with Series 1 than Series 2, plus the higher power of the PRO modules.

I’ll check those registers and see if they’re saying something interesting. Given that the RF speed is around 250 kbps and we’re sending 9.6 kbps, I’d hope that having 96.16% of the bandwidth available for overhead would be sufficient :wink:

We also had this problem in the middle of the outback in Australia during the solar car race, and we would just ask the driver to reboot the car while it was rolling down the highway. Out there, it seems unlikely that there was much radio interference. We also get the problem when the computer and the CAN transmitter are in the same room; our building is a military Quonset hut left over from WWII, made entirely of sheet steel.

I know that there are no CAN errors: we also have a recording device in the car that dumps the CAN packets to an SD card in case of wireless telemetry failure, and all of the packets on the SD card are OK.

Right now my inclination is to believe that the problem is synchronization drift on the UART between the PIC and the XBee Pro. Our buffers on the PIC are plenty large enough to store many messages, and our overall bandwidth is relatively low. Decreasing the UART speed might delay the problem by giving us more margin for error, but I doubt it will make it go away. That said, it’s a good experiment.

Why would you prefer a Series 1 module over a Series 2? Also, it’s worth clarifying that we’re using the XBee in transparent mode, so where it decides to packetize the stream can differ from the packet boundaries we’re sending.

Cheers,

Sasha