Massively parallel (ARM?) system (5k-10k nodes)

Hi out there!

For a research project I am designing a massively parallel computing system. Not 100, not 1000, but rather on the order of 5,000 - 10,000 microcontrollers/microprocessors that work in parallel. I do not need various interfaces (who wants 5000 USB ports anyway?), but rather computing power on each of the nodes.

We are currently evaluating which microcontroller/-processor to use. We have defined our top priorities (in ascending order):

-) price (as we need 5000-10000, it should be <= 2…3 US$)

-) 32bit, maybe maybe maybe 16bit

-) high clock rate, >=70MHz, the more the better

-) internal SRAM >= 8 KB, better 16 KB or 32 KB or …

-) >= 2 serial ports, better 4

-) >= 1 SPI port

I might need to trade some of these features against others, obviously!

I guess I need a “small but fast” microcontroller without anything fancy!

Any ideas / suggestions? Currently we are looking at the NXP LPC2103, which is available for about 2.8 US$ at 5000 pcs. But it’s short on memory and we’d prefer a higher clock rate. We are not committed to ARM, so any other suggestions are welcome as well!

Where would I search for such quantities? I guess Digikey is not the first choice :wink:

Thanks for all ideas / comments!

Jorg

You may consider the new ST family with the Cortex-M3 core. For example, the STM32F103: 3 UARTs, 2 SPI, USB, 72MHz, 20KB RAM. It is much more efficient in interrupt handling than the ARM7.

Not sure about prices.

Another option is to use softcores in a large FPGA. It would save a lot of wiring and the inter-processor comms could be nice and fast. Expansion would be easy, just add more FPGAs. Designing your own core would mean that it could be optimised for such a highly-parallel system.

How are you going to program the thing? I designed systems based on the Inmos transputer many years ago; Inmos designed the Occam language specifically for parallel MIMD machines.

Leon

i have actually been thinking recently that it would be a fun and interesting project to try and implement a small Connection Machine-style SIMD machine with single-bit processors in an FPGA. it would also be a neat project to help learn Verilog, since designing a single-bit processor seems relatively simple (it’s essentially just a complex single-bit serial ALU with some extra registers) :slight_smile:
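To make the "single-bit serial ALU" idea concrete, here is a minimal Python sketch of a bit-serial adder: one full adder plus a single carry register, processing one bit per simulated clock tick. It is a software illustration only, not Verilog and not the Connection Machine's actual design; the helper names (`bit_serial_add`, `to_bits`, `from_bits`) are made up for this example.

```python
def to_bits(value, width):
    """Little-endian list of bits (least significant bit first)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length bit streams, one bit per simulated clock tick."""
    carry = 0  # the only state the adder keeps between ticks
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))  # full-adder carry-out
    return out, carry


if __name__ == "__main__":
    total, carry = bit_serial_add(to_bits(5, 4), to_bits(6, 4))
    print(from_bits(total), carry)
```

A word of any width is handled by the same one-bit datapath; only the number of clock ticks changes, which is why such processors are small enough to replicate many times on one chip.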

When I was designing the dsPIC cluster, it didn’t take long to realize that just in raw processing power (aside from the issues in designing a fast communications network between the processors/microcontrollers), the performance-to-cost ratio isn’t very large for general computational problems requiring a lot of math. For instance:

A dsPIC at 30 MIPS requires about 100 clock cycles to emulate one floating-point operation, bringing its performance to about 0.3 MFLOPS. A Cell processor (in a PS3) using vectorized code has 6 cores that can each perform 4 single-precision floating-point operations per clock cycle, for (in the ideal case) about 24 FLOP per clock cycle, or (running at 2.4GHz) 57600 MFLOPS (this is excluding the onboard dual-core Power-PC processor). To achieve the same level of performance on (for example) a dsPIC cluster would take about 192000 dsPICs, assuming you could solve the problems of communication speed. At $3/dsPIC that’s $576,000. For an ARM cluster (I don’t know a great deal about ARM processors yet), even assuming the processors ran at 80 MIPS and could execute 1 FLOP per clock cycle, making them 80 MFLOPS each, you would still need 720 of them to match the processing performance (ignoring the communications problems, again) of a single Cell processor. At say $3 each, that would be about $2160 just for processors.
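The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the post (6 cores, 4 FLOP per cycle, 2400 MHz, 30 MIPS at 100 cycles per FLOP, 80 MIPS at 1 cycle per FLOP, $3 per chip); the function name `nodes_needed` is made up for the example, and communication overhead is ignored, as in the post.

```python
def nodes_needed(target_mflops, node_mips, cycles_per_flop):
    """Smallest number of nodes whose combined MFLOPS reaches the target.

    Each node delivers node_mips / cycles_per_flop MFLOPS; integer
    ceiling division avoids floating-point rounding surprises.
    """
    return -(-target_mflops * cycles_per_flop // node_mips)


# Ideal-case Cell figure from the post: 6 cores x 4 FLOP/cycle x 2400 MHz
cell_mflops = 6 * 4 * 2400

dspic_count = nodes_needed(cell_mflops, node_mips=30, cycles_per_flop=100)
arm_count = nodes_needed(cell_mflops, node_mips=80, cycles_per_flop=1)

price_per_chip = 3  # US$, the round figure assumed in the post

if __name__ == "__main__":
    print("Cell (ideal):", cell_mflops, "MFLOPS")
    print("dsPICs needed:", dspic_count, "cost US$", dspic_count * price_per_chip)
    print("ARMs needed:", arm_count, "cost US$", arm_count * price_per_chip)
```

With these inputs the chip counts come out at 192000 dsPICs and 720 ARMs, so the processor bill is $576,000 and $2,160 respectively.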

I am very, very much interested in building clusters of microcontrollers or FPGAs for fun, learning, and just to watch them work once you’ve built them, but unless you have a specific type of computing application that would be particularly well suited to such a cluster, using it likely won’t be quicker than a modern PC. On the other hand, you would get to build it, and that would be very fun!

Hi all!

Thanks for the initial replies so far! Indeed, we have a very special project in mind in which we simulate a large network that only requires nearest-neighbor connections. Hence a regular grid is sufficient. Looking at the cost of at least 5000 x 3 US$ (better 10000 x 3 US$) + some support electronics, I guess this will be more than a “toy” project. :wink:
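As a rough illustration of what "nearest-neighbor connections on a regular grid" means for wiring, the Python sketch below lists each node's north/east/south/west neighbours and counts the point-to-point links in a grid. The assumptions are mine, not from the project: one link (for example, one serial port) per direction, and border nodes that simply have fewer neighbours unless `wrap=True` turns the grid into a torus. The function names are made up for this example.

```python
DIRECTIONS = {"north": (-1, 0), "east": (0, 1), "south": (1, 0), "west": (0, -1)}


def neighbors(row, col, rows, cols, wrap=False):
    """Map each direction to the (row, col) of the adjacent node, if any."""
    result = {}
    for name, (dr, dc) in DIRECTIONS.items():
        r, c = row + dr, col + dc
        if wrap:
            r, c = r % rows, c % cols  # torus: edges connect to the far side
        elif not (0 <= r < rows and 0 <= c < cols):
            continue                   # border node: no neighbour this way
        result[name] = (r, c)
    return result


def link_count(rows, cols):
    """Number of point-to-point links in a non-wrapping grid."""
    return rows * (cols - 1) + cols * (rows - 1)


if __name__ == "__main__":
    print(neighbors(0, 0, 4, 4))   # a corner node has only two neighbours
    print(link_count(4, 4))        # links in a 4x4 grid
```

An interior node needs four links, which is where a requirement like "2 serial ports, better 4" comes from: with only two ports per chip, some links would have to be shared or multiplexed.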

I like the idea of using FPGAs; however, we need to have a small prototype (maybe 4x4 or 8x8 nodes) working quickly, ideally within this month. I have never used FPGAs, so I think this is no option for me — maybe later.

The STM32F103 looks great! The version with 10KB SRAM is roughly as expensive as the LPC2103; the 20KB SRAM is about 4 US$ @ 5000 (according to digikey).

I am happily looking forward to further suggestions!

J

just curious… what would be the point for doing this ?

for this? :wink:

No, seriously… I am working in academia and we are interested in biologically plausible information processing (how brains make sense out of sensory perceptions). We have several simple distributed algorithms (check e.g. “Boltzmann-Machine” in Wikipedia) that we run on single computers - and even if we run them on a network in our institute we might have 20 or 50 computers, but it still takes a long time to compute.

With this project we want to show that the developed algorithms are fully parallelizable: no common memory, hardly any waiting, such that we can distribute them on a network of cheap microcontrollers and get a response (an equilibrium state in the network) significantly faster than on any current computer.
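To show why such an algorithm needs no common memory, here is a minimal Python sketch of a Boltzmann-machine-style update on a grid. It is not the poster's actual algorithm: it assumes binary units, one uniform coupling `weight` and `bias` for all nodes, and a logistic acceptance rule. The point is only that each node's update reads nothing but its (up to four) neighbours' states, so every node could run it independently on its own microcontroller.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def update_node(state, row, col, weight, bias, temperature, rng):
    """Resample one node using only its nearest neighbours' states."""
    rows, cols = len(state), len(state[0])
    field = bias
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols:
            field += weight * state[r][c]
    p_on = sigmoid(field / temperature)
    state[row][col] = 1 if rng.random() < p_on else 0


def run(state, steps, weight, bias, temperature, seed=0):
    """Update randomly chosen nodes one at a time (no global clock)."""
    rng = random.Random(seed)
    rows, cols = len(state), len(state[0])
    for _ in range(steps):
        update_node(state, rng.randrange(rows), rng.randrange(cols),
                    weight, bias, temperature, rng)
    return state


if __name__ == "__main__":
    grid = [[0] * 8 for _ in range(8)]
    for line in run(grid, steps=2000, weight=1.0, bias=-2.0, temperature=1.0):
        print("".join(str(v) for v in line))
```

On a single PC the loop in `run` visits nodes one after another; on the proposed hardware each node would execute `update_node` on its own schedule, exchanging only its current state with its neighbours.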

If we succeed with such a test system we might receive funding for a large network of significant computing power, e.g. 16000pcs PXA320 at 800MHz :slight_smile: — or at least LPC3180 at 200MHz.

Does this help your curiosity?

J

VERY interesting.

I would try to pack as many mini-cores as possible on a single FPGA then create a scalable array of interconnected FPGAs as leon_heller suggested.

Would a SIMD architecture work for this application ? this would make things even easier.

Certainly for single quantities, em.avnet.com has better pricing for LPC chips. Doesn’t seem to be that much for the lpc2103, but a couple of bucks for the 2378.

Andy

hi jconradt,

has there been any news on your project? I would be interested to hear how you are making out!

Hi silic0re,

sorry, no big news so far. We have been busy looking into SIMD setups (e.g. by nVidia and AMD), and as you (or someone else here) pointed out, they seem to outperform a grid of microcontrollers substantially, both in FLOPS and in cost. Well, we’ll see. We’ll be discussing the advantages and disadvantages of both approaches shortly and make a go/no-go decision afterwards.

The microcontroller grid has a few advantages:

-) operates asynchronously

-) can generate “randomness” much easier compared to SIMD

-) is “infinitely” extensible, given money, space, and power

-) serves as a multi-purpose machine, whereas SIMD units typically all run the same code

But the price… :frowning:

If we continue, we’ll probably build a small system of LPC2103s (just because I already know these microcontrollers well) and see how well it works. By “small” I mean on the order of 8x8 or at most 16x16 initially.

Jorg

A kind of left turn: you could look into the IntellaSys SEAforth chips ([a blogpost review here](http://www.falvotech.com/blog/index.php?/archives/200-Forth-Day-Report.html)). They sound pretty interesting, I’ve seen prices quoted at $5 for 24 cores running at 1GHz, but I don’t know if they’re actually, y’know, physically available.

Thanks for that link! The SeaForth chips look really neat!

It looks like they won’t be available to purchase for a little while ( http://www.intellasys.net/phpBB/viewtopic.php?f=10&t=67 ) while they migrate away from a BGA to a QFP. At least that’s good news!

It looks as though they do currently have a very limited number of USB demo modules that they will give to people/businesses with good applications for their technology.