LPC1759: ideas for creating a parallel port

I have connected an 18-bit LCD to the LPC1759. In NXP’s wisdom (yeah, right), they decided to create their fastest part without an address and data bus. Furthermore, the I/O pins skip bits. For example, if I wanted to use Port 1, it would have been great if all 8 low bits of Port 1 were there. Instead it's P1.0, P1.1, P1.4, P1.8, P1.9… you get the idea. So connecting an 8- or 16-bit device to these I/O pins takes extra bit twiddling/fiddling to get the data onto the right pins. OK, so I figured that since this part runs at 120 MHz, it should be no real issue to twiddle/fiddle all 18 bits to the LCD.

I just got it running now, and I am NOT happy with the speed.

I was wondering if there might be a better or faster way to manipulate the bits on these pins than the way I am doing it now. Here is the routine I currently use to push data out to the port pins.

FYI, my 18-bit LCD is tied to the port pins as follows.

PORT PIN   Data Bit
P1.0       0
P1.1       1
P1.4       2
P1.8       3
P1.9       4
P1.10      5
P1.14      6
P1.15      7
P1.18      8
P1.19      9
P1.20      10
P1.22      11
P1.23      12
P1.24      13
P1.25      14
P1.26      15
P1.28      16
P1.29      17

Here is the code to twiddle / fiddle the data to the correct pins.

void Update_LCD_PORT(unsigned int data)
{
        // Bring all Data Pins Low First
        FIO1PINL = 0x0000;
        FIO1PINU = 0x0000;

	if(data & 0x0001)
		FIO1SET0 |= (1<<0);
	if(data & 0x0002)
		FIO1SET0 |= (1<<1);
	if(data & 0x0004)
		FIO1SET0 |= (1<<4);

//*******************************
	if(data & 0x0008)
		FIO1SET1 |= (1<<0);
	if(data & 0x0010)
		FIO1SET1 |= (1<<1);
	if(data & 0x0020)
		FIO1SET1 |= (1<<2);
	if(data & 0x0040)
		FIO1SET1 |= (1<<6);
	if(data & 0x0080)
		FIO1SET1 |= (1<<7);

//*******************************
	if(data & 0x0100)
		FIO1SET2 |= (1<<2);
	if(data & 0x0200)
		FIO1SET2 |= (1<<3);
	if(data & 0x0400)
		FIO1SET2 |= (1<<4);
	if(data & 0x0800)
		FIO1SET2 |= (1<<6);
	if(data & 0x1000)
		FIO1SET2 |= (1<<7);
//*******************************
	if(data & 0x2000)
		FIO1SET3 |= (1<<0);
	if(data & 0x4000)
		FIO1SET3 |= (1<<1);
	if(data & 0x8000)
		FIO1SET3 |= (1<<2);
	if(data & 0x10000)
		FIO1PIN3 |= (1<<4);
	if(data & 0x20000)
		FIO1PIN3 |= (1<<5);


}

seulater:
I was wondering if there might be a better or faster way to manipulate the bits on these pins than the way I am doing it now.

All of those 'FIOSET |= ...' statements should just be 'FIOSET = ...'

Writing to FIOSET only changes the pins whose bits you write as 1; bits written as 0 are left alone. A plain assignment will at least save all of the unnecessary reads that you are doing. NOTE: that doesn’t apply to your FIOPIN statements. I’m not sure why you are doing those two differently from the rest.

Additionally, I suspect you would be better off building the set of bits in a temporary variable for each of your four groups and just having a single FIOSET assignment for each group.
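A minimal sketch of the two suggestions above, for the byte-0 group only. The names `sim_port1`, `sim_fio1set_write`, `group0_mask` and `update_group0` are made up for illustration, and the register is modelled with a plain variable so the code runs on a PC; on the real part the write would go to FIO1SET0 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the port 1 output latch (not a real register). */
static uint32_t sim_port1;

/* Models a write to a set register: 1 bits go high, 0 bits are left alone.
   On the hardware this is a single store with no read beforehand. */
static void sim_fio1set_write(uint32_t value)
{
    sim_port1 |= value;
}

/* Collects the byte-0 group (D0-D2 on P1.0, P1.1, P1.4) in a temporary. */
static uint32_t group0_mask(uint32_t data)
{
    uint32_t m = 0;
    if (data & 0x0001) m |= 1u << 0;
    if (data & 0x0002) m |= 1u << 1;
    if (data & 0x0004) m |= 1u << 4;
    return m;
}

/* One write for the whole group instead of one '|=' per bit. */
static void update_group0(uint32_t data)
{
    sim_fio1set_write(group0_mask(data));
}

/* Small self-check of the model: pins outside the group keep their state. */
static void self_check(void)
{
    sim_port1 = 0x100u;          /* pretend P1.8 is already high */
    update_group0(0x5u);         /* D0 and D2 set */
    assert(sim_port1 == 0x111u); /* P1.0 and P1.4 added, P1.8 untouched */
}
```

The same pattern would repeat for the other three byte groups, each with its own temporary and a single write.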

All of those ‘FIOSET |= …’ statements should just be ‘FIOSET = …’

i did correct that right after i posted it. Thanks.

NOTE: that doesn’t apply to your FIOPIN statements. I’m not sure why you are doing those two differently from the rest.

With FIOPINL and FIOPINU you can do all 4 groups at once.

I also tried the variable, there was no noticeable difference. I will look at it with the scope.

seulater:
I also tried the variable, there was no noticeable difference. I will look at it with the scope.

What sort of speed were you expecting? What sort of speed are you seeing?

I am using the same screen with the LPC2148, and although that is a slower part than the LPC1759, the display updates much faster on the 2148. With the 2148 I cannot even see the screen refresh to different colors, but with the 1759 I can. With the 1759 being so much faster, I expected more.

seulater:
With the 1759 being so much faster i expected more.

There are many more factors that affect the relative performance of an LPC2148 and an LPC1759 than just max clock speed. If I were you I'd write the simplest program that timed how long it took to toggle a pin on both systems. If the LPC1759 is significantly slower I'd then a) check the code generated by the compiler b) double-check the startup configuration. If the LPC1759 is faster that would imply the problem lies elsewhere.

There is no need. The LCD on the 2148 is wired up so that I don't need to manipulate the data to the LCD like I need to do with the 1759. It is this routine that is slowing things down considerably. I am just looking to see if there might be some other way to manipulate the data that I have not thought of already.

seulater:
It is this routine that is slowing things down considerably.

What makes you so sure that is the cause? You certainly haven't succeeded in convincing me that is true from the information you have provided us so far. That is why I proposed the tests.

What makes you so sure that is the cause?

If I comment out the twiddling section, the CS signal goes from 1.8us to 309ns.

You certainly haven’t succeeded in convincing me that is true from the information you have provided us so far. That is why I proposed the tests.

I don't want to be rude, but my post is not about trying to figure out why this part is slower, or anything like that. There are a lot of details I know that are not worth posting here, because I have already nailed it down: that routine is what is slowing it all down.

I am just looking to see if there is an alternative to that routine that would be faster than what I have already done. I have used the LPC2148 extensively. Using the FIO on either part is very fast. In fact it's even faster on the 1759. The only difference is the way the LCD is wired up on either part. There is no twiddling on the 2148.

I do thank you for trying to help, but I need the help with the code.

ok this new edition gets it from 1.8us to 750ns.

If anyone can think of an even better way, I am willing to try it.

void Update_LCD_PORT(unsigned int data)
{
	unsigned int a=0;

	if(data & 0x0001)
		a |= 0x0001;
	if(data & 0x0002)
		a |= 0x0002;
	if(data & 0x0004)
		a |= 0x0010;
//*******************************
	if(data & 0x0008)
		a |= 0x0100;
	if(data & 0x0010)
		a |= 0x0200;
	if(data & 0x0020)
		a |= 0x0400;
	if(data & 0x0040)
		a |= 0x04000;
	if(data & 0x0080)
		a |= 0x8000;

		FIO1PINL = a;

   		a=0;
//*******************************
	if(data & 0x0100)
		a |= 0x0004;
	if(data & 0x0200)
		a |= 0x0008;
	if(data & 0x0400)
   		a |= 0x0010;
	if(data & 0x0800)
		a |= 0x0040;
	if(data & 0x1000)
		a |= 0x0080;
//*******************************
	if(data & 0x2000)
   		a |= 0x0100;
	if(data & 0x4000)
   		a |= 0x0200;
	if(data & 0x8000)
   		a |= 0x0400;
	if(data & 0x10000)
   		a |= 0x1000;
	if(data & 0x20000)
   		a |= 0x2000;

		FIO1PINU = a;

}

seulater:
ok this new edition gets it from 1.8us to 750ns.

If anyone can think of an even better way, I am willing to try it.

You could combine contiguous groups together e.g.

   if(data & 0x2000)
         a |= 0x0100;
   if(data & 0x4000)
         a |= 0x0200;
   if(data & 0x8000)
         a |= 0x0400;

Will simplify to something like:

   a |= (data & 0x0E000) >> 5;
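To see why the two forms are interchangeable, here is a small PC-side sketch that compares them over every 18-bit input. The function names are made up for illustration; only the masks and the shift come from the snippets above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit version: data bits 13-15 go to bits 8-10 of the upper half-word. */
static uint32_t move_with_ifs(uint32_t data)
{
    uint32_t a = 0;
    if (data & 0x2000) a |= 0x0100;
    if (data & 0x4000) a |= 0x0200;
    if (data & 0x8000) a |= 0x0400;
    return a;
}

/* Same move as one mask and one shift (13 - 8 = 5 places to the right). */
static uint32_t move_with_shift(uint32_t data)
{
    return (data & 0xE000u) >> 5;
}

/* Compares both versions over every 18-bit value; returns the mismatch count. */
static uint32_t count_mismatches(void)
{
    uint32_t bad = 0;
    for (uint32_t d = 0; d < (1u << 18); d++) {
        if (move_with_ifs(d) != move_with_shift(d))
            bad++;
    }
    return bad;
}

/* Aborts if the two versions ever disagree. */
static void check_equivalence(void)
{
    assert(count_mismatches() == 0);
}
```

The trick works for any run of data bits whose target pins are offset by the same amount, which is why the shift count is simply the source bit number minus the destination bit number.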

Chris, now that’s what I am talking about!

I implemented your idea, and it also got me thinking that I could use it for non-contiguous groups as well. That would allow me to get rid of the “if” statements altogether.

So i changed the code to this:

void Update_LCD_PORT(unsigned int data)
{
	unsigned int a=0;
   	unsigned int b=0;

	// Lower 8 Bits
	a |= (data & 0x0003);
	a |= ((data & 0x0004) << 2);
	a |= ((data & 0x0038) << 5);
	a |= ((data & 0x00c0) << 8);

	// Upper 10 Bits (D8-D17)
	b |= ((data & 0x0700) >> 6);
	b |= ((data & 0x1800) >> 5);
	b |= ((data & 0xe000) >> 5);
	b |= ((data & 0x30000) >> 4);	// D16-D17 -> P1.28-P1.29 (bits 12-13 of FIO1PINU)

	FIO1PINL = a;
	FIO1PINU = b;

}

This new code took me down from 750ns to 350ns!

Thanks Chris!

seulater:
This new code took me down from 750ns to 350ns!

That's good! As you have discovered (two brains are better than one!) it is actually the similar shifts that can be grouped together. You can shave off a few more nanoseconds by combining:
	b |= ((data & 0x1800) >> 5);
	b |= ((data & 0xe000) >> 5);

into

	b |= ((data & 0xF800) >> 5);

already caught that :wink:

Thanks again.

seulater:
already caught that :wink:

Have you already caught this one as well? :wink:
   // Lower 8 Bits
   a = (data & 0x0003) |
      ((data & 0x0004) << 2) |
      ((data & 0x0038) << 5) |
      ((data & 0x00c0) << 8);

   // Upper 10 Bits (D8-D17)
   b = ((data & 0x0700) >> 6) |
       ((data & 0x1800) >> 5) |
       ((data & 0xe000) >> 5) |
       ((data & 0x30000) >> 4);   // D16-D17 -> P1.28-P1.29

   FIO1PINL = a;
   FIO1PINU = b;

You may be able to get rid of a and b altogether and assign directly to FIO1PINx.

Oooohh. i like it. I gave that a try but it increased it from 350ns to 380ns.

i also tried this too but with the same results of 380ns.

	// Lower 8 Bits
	FIO1PINL = ((data & 0x0003) |
		((data & 0x0004) << 2) |
		((data & 0x0038) << 5) |
		((data & 0x00c0) << 8));

	// Upper 10 Bits (D8-D17)
	FIO1PINU = ( ((data & 0x0700) >> 6) |
	    ((data & 0xf800) >> 5) |
	    ((data & 0x30000) >> 4));	// D16-D17 -> P1.28-P1.29

seulater:
Oooohh. i like it. I gave that a try but it increased it from 350ns to 380ns.

i also tried this too but with the same results of 380ns.

Looking at the code generated there is not much to choose between them - the difference in time might be due to other subtle factors. However, now that you are just working with Pin1.x connections you can just assign all 32 bits in one go can't you? I would expect that to be faster than two separate 16-bit assignments:

i.e. FIO1PIN = …

I never tried the FIOPIN idea, and this is my own damn fault. I was reading the User Manual, and on page 126 they don't list a FIOPIN register, so I never tried that. They only list the following registers.

FIOxPIN0
FIOxPIN1
FIOxPIN2
FIOxPIN3
FIOxPINL
FIOxPINU

Looking more into it, the compiler does have defines for FIOPIN, so i gave it a try.

using “FIO1PIN = (b << 16) | a;” took it from 350ns to 310ns.

I then wondered what would happen if I shifted the bits in b into place before writing to FIO1PIN, so I changed b to this:

	// Upper 10 bits on data bus, shifted into their final FIO1PIN positions
	b = ((data & 0x0700) << 10) |
		((data & 0xf800) << 11) |
		((data & 0x30000) << 12);	// D16-D17 -> P1.28-P1.29


	FIO1PIN = b | a;

in doing this it took it down to 300ns, shaved off another 10ns.

strange, i would have made a bet that doing “FIO1PIN = (b << 16) | a;” would be more efficient than doing them in b.

Again, great ideas and thanks. So with your ideas I went from 1.8us down to 300ns.
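For anyone reusing this, a PC-side check like the sketch below can confirm that a mask-and-shift mapping agrees with the pin table at the top of the thread. The function names and the `pin_for_bit` array are made up for illustration; each shift count is simply the port pin number minus the data bit number, so compare them against your own wiring before trusting them.

```c
#include <assert.h>
#include <stdint.h>

/* Port 1 pin number for each LCD data bit D0..D17, copied from the pin table. */
static const uint8_t pin_for_bit[18] = {
    0, 1, 4, 8, 9, 10, 14, 15, 18, 19, 20, 22, 23, 24, 25, 26, 28, 29
};

/* Reference mapping: places one bit at a time (slow, but easy to verify by eye). */
static uint32_t map_reference(uint32_t data)
{
    uint32_t out = 0;
    for (int i = 0; i < 18; i++) {
        if (data & (1u << i))
            out |= 1u << pin_for_bit[i];
    }
    return out;
}

/* Mask-and-shift mapping for a single 32-bit write to the port. */
static uint32_t map_fast(uint32_t data)
{
    return  (data & 0x00003u)          /* D0-D1   -> P1.0-P1.1   (shift 0)  */
         | ((data & 0x00004u) << 2)    /* D2      -> P1.4        (shift 2)  */
         | ((data & 0x00038u) << 5)    /* D3-D5   -> P1.8-P1.10  (shift 5)  */
         | ((data & 0x000C0u) << 8)    /* D6-D7   -> P1.14-P1.15 (shift 8)  */
         | ((data & 0x00700u) << 10)   /* D8-D10  -> P1.18-P1.20 (shift 10) */
         | ((data & 0x0F800u) << 11)   /* D11-D15 -> P1.22-P1.26 (shift 11) */
         | ((data & 0x30000u) << 12);  /* D16-D17 -> P1.28-P1.29 (shift 12) */
}

/* Compares both mappings over every 18-bit value; returns the mismatch count. */
static uint32_t count_mismatches(void)
{
    uint32_t bad = 0;
    for (uint32_t d = 0; d < (1u << 18); d++) {
        if (map_reference(d) != map_fast(d))
            bad++;
    }
    return bad;
}

/* Aborts if any group has a wrong mask or shift count. */
static void check_mapping(void)
{
    assert(count_mismatches() == 0);
}
```

On the target, the result of `map_fast(data)` would be what gets written to FIO1PIN. Running the check on a PC first is a cheap way to catch a wrong shift count in any one group before it shows up as wrong colors on the panel.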