NOTE: Hizook.com has transformed from Travis Deyle's homepage (now here) into a robotics-centric website. We're maintaining old posts (such as the one below) for the health of the internet. We encourage you to scope out the new Hizook!

 


 

Hopefully people aren't naive enough to fall for a "number of cores" competition, much like the Intel/AMD megahertz/gigahertz race a while back. Intel and AMD are now planning quad-core and 8-core chips for future desktop PCs, and I'm sure this will usher in a new era of performance computing. This is all well and good, but why a 1000-core processor?!

OK, clearly a 1000-core chip isn't going to be used for general-purpose computing (at least not at this time). But performance per watt is really the future of embedded and special-purpose computing. Consider the Asynchronous Array of Simple Processors (AsAP) project, which implements 36 cores (a 6x6 array) of 32-bit processors (9-stage pipeline, 54 32-bit instructions, and a 16-bit datapath).

 

[Figure: AsAP Overview]  [Figure: AsAP 6x6 Cores]

 

Each AsAP processor operates at 475 MHz and dissipates 32 mW while executing applications, 84 mW while 100% active, and 144 mW worst-case at 1.8 V. Most of each processor's area (66%) is devoted to the core, which is a high area utilization. Each processor occupies 0.66 mm², which is more than 20 times smaller than traditional processors such as ARM. AsAP also achieves more than 5 times higher performance density and energy efficiency than the others, as shown below.


[Figure: AsAP Justification]

One really unique thing about the AsAP (at least from my perspective) is its asynchronous FIFO buffers, which let each core talk to its neighbors despite differences in clock rates and latencies (see this paper). In this fashion, a core is only active (consuming power) when it has data in its buffer to process. Each processor is programmed with a small code snippet (in C or assembly), which can be assigned (or auto-mapped) to a core in the AsAP array. Now for the really cool part: with 36 cores, each with a small instruction set (54 possible instructions), 64 words of RAM, and 128 words of ROM, you can do some really amazing things!

  • FIR Filters
  • Signal Convolution
  • Sorting
  • CORDIC sin, cos, arcsin, arccos, arctan
  • Pseudo Random Number Generators (LFSR)
  • CRC Calculations
  • Huffman Encoding
  • FFTs
  • JPEG Encoders (9 cores, 224mW @ 300 MHz) (shown below)
  • A complete 802.11a/g wireless LAN base-band transmitter (22 cores, 407mW @ 300 MHz)


[Figure: AsAP JPEG Encoder]
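
To make the FIFO idea concrete, here is a minimal Python sketch (not AsAP code) of a FIR filter spread across a chain of tiny "cores", one filter tap per core. The `Core` class, the `fir_on_cores` function, and the one-tap-per-core mapping are my own assumptions for illustration; the real chip is programmed in C or assembly, and its actual FIR mapping may well differ. The point is only that each core does work on cycles when its input FIFO holds data, and idles otherwise.

```python
from collections import deque


class Core:
    """Hypothetical tiny core: one FIR tap fed by an input FIFO."""

    def __init__(self, coeff):
        self.coeff = coeff
        self.fifo = deque()     # input FIFO of (sample, partial_sum) tokens
        self.delayed = 0        # sample seen on the previous token
        self.active_cycles = 0  # cycles in which this core had data to process

    def step(self, downstream):
        """Run one cycle; do nothing if the input FIFO is empty."""
        if not self.fifo:
            return
        sample, partial = self.fifo.popleft()
        self.active_cycles += 1
        # Add this tap's contribution, then forward the delayed sample so the
        # next core sees the input shifted by one more time step.
        downstream.append((self.delayed, partial + self.coeff * sample))
        self.delayed = sample


def fir_on_cores(coeffs, samples):
    """Compute y[n] = sum_k coeffs[k] * x[n-k] on a chain of cores."""
    cores = [Core(c) for c in coeffs]
    output = deque()
    for x in samples:
        cores[0].fifo.append((x, 0))
    while any(core.fifo for core in cores):
        # Step the last core first so a token advances one core per cycle.
        for i in reversed(range(len(cores))):
            nxt = cores[i + 1].fifo if i + 1 < len(cores) else output
            cores[i].step(nxt)
    return [partial for _, partial in output], cores


if __name__ == "__main__":
    result, cores = fir_on_cores([1, 2, 3], [1, 0, 0, 4])
    print(result)                             # [1, 2, 3, 4]
    print([c.active_cycles for c in cores])   # [4, 4, 4]
```

In this toy model the `active_cycles` counter stands in for power: a core with nothing in its FIFO costs nothing, which is the behavior the asynchronous FIFOs are meant to enable.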

All of these applications offer huge power savings and performance gains compared to other, traditional solutions (as explained in this presentation). The fully functional, complete 802.11a/g wireless LAN base-band transmitter implemented in 22 cores at a mere 407 mW is perhaps the most impressive application to date! Check out the diagram below to see what each core is doing!



[Figure: AsAP 22 Core 802.11 Transmitter]
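
As a rough sanity check on those power numbers, the sketch below divides each application's total power by its core count. It assumes the power is spread evenly across the cores, which is a simplification of mine (the figures above are totals only), and the function name is hypothetical.

```python
# Totals quoted in the post: application -> (cores used, total power in mW).
# Both figures are reported at a 300 MHz clock.
APPLICATIONS = {
    "JPEG encoder": (9, 224),
    "802.11a/g transmitter": (22, 407),
}


def average_mw_per_core(cores, total_mw):
    """Average power per core, assuming an even split (a simplification)."""
    return total_mw / cores


for name, (cores, total_mw) in APPLICATIONS.items():
    print(f"{name}: {average_mw_per_core(cores, total_mw):.1f} mW per core")
```

Both averages (about 24.9 mW and 18.5 mW) come in under the 32 mW typical per-processor figure quoted earlier for 475 MHz operation, which seems consistent with the lower 300 MHz clock and with cores idling when they have no data.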

But wait, the title of this post mentions a 1000-core processor, not 36. Well, that is all just a matter of scaling. Additional cores can be (easily?) added into the array.

We have designed a 0.18 µm CMOS chip that was fabricated during the summer of 2005 (the 36-core chip). Early testing in the fall of 2005 has shown it is fully functional! We believe it is the highest clock rate fabricated processor designed in any university. A 13 mm x 13 mm chip utilizing the exact same design in 90 nm CMOS would contain more than 1000 processors and be capable of more than 1 Tera-op/sec peak performance.
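
The 1000-processor figure can be roughly reproduced with ideal scaling arithmetic. This is my own back-of-the-envelope sketch, not the designers' calculation: it assumes area shrinks with the square of the feature size, that the whole die is tiled with processors (no pads or other circuitry), and that each processor completes one operation per cycle. All function names are hypothetical.

```python
AREA_180NM_MM2 = 0.66   # per-processor area quoted for the 0.18 um chip
DIE_SIDE_MM = 13.0      # side of the proposed 90 nm die


def scaled_area(area_mm2, old_nm, new_nm):
    """Ideal scaling: area shrinks with the square of the feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


def processors_on_die(die_side_mm, proc_area_mm2):
    """Whole processors that fit if the entire die is tiled with them."""
    return int(die_side_mm ** 2 // proc_area_mm2)


def clock_mhz_for(ops_per_sec, processors):
    """Clock each processor needs, assuming one operation per cycle."""
    return ops_per_sec / processors / 1e6


area_90nm = scaled_area(AREA_180NM_MM2, 180, 90)
count = processors_on_die(DIE_SIDE_MM, area_90nm)
print(f"{area_90nm:.3f} mm^2 per processor at 90 nm")
print(f"{count} processors on the die")
print(f"{clock_mhz_for(1e12, count):.0f} MHz each for 1 Tera-op/sec")
```

Under those assumptions a 13 mm x 13 mm die holds 1024 processors, and each would need to run at roughly 977 MHz to reach 1 Tera-op/sec, about twice the 475 MHz of the 0.18 µm chip.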

Just imagine what would be possible with 1000 cores! (see the image below) One could argue, "This can be done with FPGAs." And you'd be correct (in fact, they already have the AsAP architecture loaded into FPGAs for testing). However, having the processors in silicon gives you additional benefits.

  • Performance increases (FPGAs are limited by their general-purpose nature)
  • Power decreases (FPGA logic is always active, while the AsAP only runs when it has data)
  • Easy functional decomposition and mapping
  • Programming each core (and then auto-mapping) in C or assembly instead of Verilog or VHDL

 

I hope they work with a chip fab to make some of these chips available to hobbyists. These chips would be really valuable in robotics, as you could activate and deactivate the different cores depending on your power budget (not to mention the power savings inherent in the AsAP design).



[Figure: AsAP Applications]

As a side note, I almost worked with Dr. Baas on this project for my PhD. Ultimately, I decided chip design wasn't my thing (at least not for a PhD, but it would have been great for an MS). However, my good friend Toney is doing some work on the project. Last I knew, he was working with the test setup, shown below. It should be obvious that this group at UC Davis does some amazing work. Good job, guys!

 


[Figure: AsAP Test Structure]

You can find out more about the AsAP project here.