For a research project I am designing a massively parallel computing system. Not 100, not 1,000, but on the order of 5,000 - 10,000 microcontrollers/microprocessors working in parallel. I do not need lots of interfaces (who wants 5,000 USB ports anyway?) - just computing power on each of the nodes.
We are currently evaluating which microcontroller/microprocessor to use. We have defined our top priorities (in ascending order):
-) price (as we need 5,000-10,000, it should be <= 2…3 US$)
-) 32-bit, maybe (just maybe) 16-bit
-) high clock rate, >= 70 MHz, the more the better
-) internal SRAM >= 8 KB, better 16 KB or 32 KB or …
-) >= 2 serial ports, better 4
-) >= 1 SPI port
I might need to trade some of these features against others, obviously!
I guess I need a “small but fast” microcontroller without anything fancy!
Any ideas / suggestions? Currently we are looking at the NXP LPC2103, which is available for about 2.8 US$ at 5,000 pcs. But it's short on memory and we'd prefer a higher clock rate. We are not committed to ARM, so any other suggestions are welcome as well!
Where would I search for such quantities? I guess Digikey is not the first choice
You may consider the new ST family with the Cortex-M3 core, for example the STM32F103: 3 UARTs, 2 SPI ports, USB, 72 MHz, 20 KB RAM. It is much more efficient at interrupt handling than the ARM7.
Another option is to use softcores in a large FPGA. It would save a lot of wiring and the inter-processor comms could be nice and fast. Expansion would be easy, just add more FPGAs. Designing your own core would mean that it could be optimised for such a highly-parallel system.
How are you going to program the thing? I designed systems based on the Inmos transputer many years ago; Inmos designed the Occam language specifically for parallel MIMD machines.
I have actually been thinking recently that it would be a fun and interesting project to try to implement a small Connection Machine-style SIMD machine with single-bit processors in an FPGA. It would also be a neat project for learning Verilog, since designing a single-bit processor seems relatively simple (it's essentially just a bit-serial ALU with some extra registers).
When I was designing the dsPIC cluster, it didn't take long to realize that, even looking at raw processing power alone (setting aside the problem of designing a fast communications network between the processors/microcontrollers), the performance-to-cost ratio isn't very good for general computational problems requiring a lot of math. For instance:
A dsPIC at 30 MIPS requires about 100 clock cycles to emulate one floating point operation, bringing its performance to about 0.3 MFLOPS. A Cell processor (in a PS3) using vectorized code has 6 cores that can each perform 4 single-precision floating point operations per clock cycle, for (in the ideal case) about 24 FLOPs per clock cycle, or (running at 2.4 GHz) 57,600 MFLOPS (this is not counting the onboard dual-core PowerPC processor). To achieve the same level of performance on (for example) a dsPIC cluster would take about 192,000 dsPICs, assuming you could solve the problems of communication speed. At $3/dsPIC that's a very large sum - over half a million dollars. For an ARM cluster (I don't know a great deal about ARM processors yet), even assuming the processors ran at 80 MIPS and could execute 1 FLOP per clock cycle, making them 80 MFLOPS each, you would still need 720 of them to achieve the same processing performance (ignoring the communications problems, again) as a single Cell processor. At say $3 each, that would be about $2,160 just for processors.
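If you want to play with the numbers yourself, the arithmetic above boils down to this (all the constants are my assumptions from the previous paragraph, not datasheet values):

```c
/* Back-of-the-envelope check of the figures above. */
#include <stdio.h>

int main(void) {
    double dspic_mflops = 30.0 / 100.0;   /* 30 MIPS, ~100 cycles per emulated FLOP -> 0.3 MFLOPS */
    double cell_mflops  = 6 * 4 * 2400.0; /* 6 cores x 4 SP FLOPs/cycle x 2400 MHz -> 57,600 MFLOPS */
    double arm_mflops   = 80.0;           /* assumed 80 MIPS ARM at 1 FLOP/cycle */

    printf("dsPICs needed: %.0f (~$%.0f at $3 each)\n",
           cell_mflops / dspic_mflops, 3.0 * cell_mflops / dspic_mflops);
    printf("ARMs needed:   %.0f (~$%.0f at $3 each)\n",
           cell_mflops / arm_mflops, 3.0 * cell_mflops / arm_mflops);
    return 0;
}
```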
I am very, very much interested in building clusters of microcontrollers or FPGAs for fun, for learning, and just to watch them work once they're built, but unless you have a specific type of computing application that is particularly well suited to such a cluster, it likely won't be quicker than a modern PC. On the other hand, you would get to build it, and that would be very fun!
Thanks for the initial replies so far! Indeed, we have a very special project in mind in which we simulate a large network that only requires nearest-neighbor connections, hence a regular grid is sufficient. Looking at the cost of at least 5,000 x 3 US$ (better 10,000 x 3 US$) plus some support electronics, I guess this will be more than a "toy" project.
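For concreteness, the nearest-neighbor bookkeeping on such a grid is trivial - something like this, where the dimensions and names are just for illustration (no wraparound, edge nodes simply have fewer links):

```c
/* Sketch: mapping a node's (x, y) position to its four grid neighbors. */
#define WIDTH   8
#define HEIGHT  8
#define NO_LINK (-1)

typedef struct { int north, south, east, west; } links_t;

static links_t neighbors(int x, int y) {
    links_t l;
    l.north = (y > 0)          ? (y - 1) * WIDTH + x : NO_LINK;
    l.south = (y < HEIGHT - 1) ? (y + 1) * WIDTH + x : NO_LINK;
    l.west  = (x > 0)          ? y * WIDTH + (x - 1) : NO_LINK;
    l.east  = (x < WIDTH - 1)  ? y * WIDTH + (x + 1) : NO_LINK;
    return l;
}
```

So for an 8x8 prototype, node (0,0) talks only to (1,0) and (0,1), and each node needs at most four links.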
I like the idea of using FPGAs; however, we need to have a small prototype (maybe 4x4 or 8x8 nodes) working quickly, ideally within this month. I have never used FPGAs, so I think that is not an option for me - maybe later.
The STM32F103 looks great! The version with 10 KB SRAM is roughly as expensive as the LPC2103; the one with 20 KB SRAM is about 4 US$ @ 5,000 (according to Digikey).
I am happily looking forward to further suggestions!
Just curious… what would be the point of doing this?
No, seriously… I am working in academia and we are interested in biologically plausible information processing (how brains make sense of sensory perceptions). We have several simple distributed algorithms (see e.g. "Boltzmann machine" on Wikipedia) that we run on single computers - and even if we run them on a network in our institute, we might have 20 or 50 computers, and it still takes a long time to compute.
With this project we want to show that the developed algorithms are fully parallelizable - no common memory, hardly any waiting - such that we can distribute them over a network of cheap microcontrollers and get a response (an equilibrium state in the network) significantly faster than on any current computer.
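To give an idea of what each node would compute, here is a rough sketch of one stochastic unit update (a Gibbs step, as in a Boltzmann machine). All names and the PRNG scheme are just illustrative, not our actual code - each node would hold only its own state, its weights, and the last states heard from its neighbors over the serial links:

```c
#include <math.h>
#include <stdint.h>

#define N_NEIGHBORS 4

/* Per-node xorshift32 PRNG - seed with the node's own (nonzero) ID,
   so no shared randomness source is needed. */
static uint32_t rng_state = 1;
static uint32_t xorshift32(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* One Gibbs step for a binary unit: fire with probability
   sigma(sum_j w[j]*s[j] / T). Neighbor states s[] are refreshed
   from the serial links between calls. */
static int update_unit(const float w[N_NEIGHBORS],
                       const int s[N_NEIGHBORS], float temperature) {
    float net = 0.0f;
    for (int j = 0; j < N_NEIGHBORS; j++)
        net += w[j] * (float)s[j];
    float p = 1.0f / (1.0f + expf(-net / temperature));
    return ((xorshift32() >> 8) / 16777216.0f) < p; /* uniform in [0,1) */
}
```

Nothing in that loop needs shared memory or synchronization - each node only exchanges single state bits with its four neighbors.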
If we succeed with such a test system, we might receive funding for a large network with significant computing power, e.g. 16,000 pcs of the PXA320 at 800 MHz - or at least the LPC3180 at 200 MHz.
Certainly for single quantities, em.avnet.com has better pricing for LPC chips. The saving doesn't seem to be much for the LPC2103, but it's a couple of bucks for the LPC2378.
Sorry, no big news so far. We have been busy looking into SIMD setups (e.g. by nVidia and AMD) - and as you (or someone else here) pointed out, they seem to outperform a grid of microcontrollers substantially, both in FLOPS and in cost. Well, we'll see. We'll discuss the advantages and disadvantages of both approaches shortly and take a go/no-go decision afterwards.
The microcontroller grid has a few advantages:
-) operates asynchronously
-) can generate "randomness" much more easily than SIMD (e.g. a per-node PRNG, as in the sketch above)
-) is “infinitely” extensible, given money, space, and power
-) serves as a multi-purpose (MIMD) machine, whereas SIMD cores typically all run the same code
But the price…
If we continue, we'll probably build a small system of LPC2103s (just because I already know these microcontrollers well) and see how well it works. By "small" I mean on the order of 8x8 or at most 16x16 initially.
It looks as though they do currently have a very limited number of USB demo modules that they will give to people/businesses with good applications for their technology.