12 October 2013

Probabilistic computing: Imprecise chips save power, improve performance




As the barriers to CPU scaling have risen with each successive node shrink, the number of scientists looking for alternative methods of driving higher performance and/or saving power has also steadily risen. We recently covered three of the most intriguing areas for boosting compute performance, including adopting new methods of CPU cooling, new semiconductor manufacturing technologies, and the use of entirely new types of CPU cores. One other idea, and something we’ve touched on before, is to use circuits that are intentionally manufactured to be inexact and imprecise; circuits that are deliberately allowed to get things wrong some of the time.

That’s exactly the opposite of how computers are typically built. Semiconductors today are manufactured to tolerances of a nanometer or less, fab air quality is controlled to the point where contaminants are measured in parts-per-trillion, and we’re working on building chips using extreme ultraviolet light with a wavelength of just 13.5nm. But it’s precisely because manufacturing to such tight tolerances is so difficult that scientists are working to find ways to build chips that can handle failure gracefully, and in some cases even embrace imprecision.


From an earlier project by Rice University. The chip on the right uses 1/15 the power of the chip on the left.

Christian Enz, the new Director of the Institute of Microengineering, believes that the time for “good enough” is now, and is pushing research into the new field. According to Enz, “the ‘good enough’ approach has been getting some traction in the corporate sector, because chipmakers can’t see any real alternative. Intel, for example, is interested in ‘good enough’ engineering. In addition, there are teams of research scientists working on it all over the world.”

Perfect imperfection

First, the good news. If an application can tolerate “3.14” as opposed to “3.14159265358,” you can save quite a bit of power in computation. In some cases you may even be able to improve performance by dropping significant figures, though this is highly application-dependent. Power savings is the major goal, along with simplified circuit design. Today, a great deal of work goes into ensuring that circuits return the proper result every single time. Since chips are extremely complex and defect densities are notoriously difficult to control, engineers compensate with additional circuitry that adds die size and reduces the performance and power-consumption benefits of moving to a smaller process node.
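To make the trade concrete, here is a minimal Python sketch (not from the article) that emulates reduced precision by round-tripping a 64-bit value through 32-bit storage, the software analogue of computing with a narrower datapath:

```python
import math
import struct

def truncate32(x: float) -> float:
    """Round-trip a double through 32-bit storage, discarding the
    low-order mantissa bits a narrower datapath would never compute."""
    return struct.unpack("f", struct.pack("f", x))[0]

full = math.pi                  # full 64-bit value
approx = truncate32(math.pi)    # "good enough" 32-bit value
error = abs(full - approx)      # on the order of 1e-7
```

In hardware, the savings come from narrower adders and multipliers; in software, the same idea shows up as choosing float32 over float64 whenever the application can tolerate it.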

Start jumping into processor manuals, and you quickly find evidence that this process doesn’t always work properly. Here are a few examples, drawn from the CPU manuals of analyst Agner Fog:

- Intel’s Ivy Bridge prefetches one instruction every 43 cycles. Sandy Bridge prefetches two instructions per clock cycle.
- AMD claims Bobcat and Jaguar can decode up to 32 bytes per clock cycle. In reality, the two chips top out at 16 bytes per clock cycle.
- Intel’s 45nm Atom, which debuted in 2008, has always had an FPU bug: two consecutive FPU instructions fail to pair and are instead executed with a delay of one clock cycle between them.
- AMD’s Bulldozer and Piledriver handle 256-bit AVX instructions very poorly; Piledriver has a latency of 17-20 cycles on 256-bit AVX stores.

And these are just a handful of the high-level problems. Both Intel’s and AMD’s errata documents contain pages of documentation on even lower-level bugs that occur in every version of a chip.

How to build an intentionally imprecise computer chip

One way of attacking this problem is to intentionally create imperfect designs, with the imperfections placed extremely carefully, in areas where humans can control the final output. The catch with allowing imperfection is that chips have to be capable of telling the difference between a “good enough” answer and a wrong answer. There are a huge number of areas (GPS navigation, autopilots, robot-assisted surgery, spreadsheets, and scientific computing, for example) where “good enough” simply isn’t an option. Crucially, however, there are just as many areas where “good enough” might be perfectly fine. Audio/video playback, web browsing, gaming, and other casual uses all involve trade-offs where people might choose good enough as a way to improve battery life.



Intel, as Enz notes, has explored the idea of a variable FPU that can drop to 6-bit computation when a full eight bits aren’t required. This kind of power-saving option relies on circuit gating, effectively shutting off parts of an individual functional unit rather than power-gating a larger block, but it’s emblematic of the approach. I suspect we’ll start to see this kind of approach soon, driven by coprocessor development from companies like Apple and Google, both of whom have been talking up the additional chips inside their latest SoCs.
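The variable-width idea can be sketched in software. This hypothetical Python helper (the bit widths and value range are illustrative assumptions, not Intel’s design) snaps a value onto an n-bit grid, making it easy to see how much error a 6-bit result carries versus an 8-bit one:

```python
def quantize(x: float, bits: int, lo: float = 0.0, hi: float = 1.0) -> float:
    """Snap x in [lo, hi] onto a bits-wide integer grid and map it back.
    Fewer bits means fewer levels: cheaper hardware, larger error."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step

# Worst-case error is half a grid step: ~0.8% at 6 bits, ~0.2% at 8 bits
err6 = abs(quantize(0.33, 6) - 0.33)
err8 = abs(quantize(0.33, 8) - 0.33)
```

Whether the 6-bit answer is acceptable depends entirely on the workload, which is exactly why this kind of unit would have to be switchable rather than fixed.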

How much fidelity does Apple’s M7 or Motorola’s X8 need when detecting a single voice command to activate? How much accuracy are consumers willing to trade if the end result is a 10-15% increase in battery life when operating in that mode? Finally, just how much accuracy can be traded for better power consumption?

What’s likely to happen is that we’ll see a GE (Good Enough) core implemented alongside a conventional ARM or x86 CPU, with the conventional core handling all fallback computation. There needs to be a flag for programmers to trip in order to tell the chip “perform this on the main CPU,” so that accurate computation is always available. But provided that such particulars can be worked out, there’s real promise in this kind of design for many of the basic functions that people perform with a smartphone or tablet.
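As a sketch of what that programmer-visible flag might look like (the API below is hypothetical, not a shipping interface), a runtime could dispatch to a cheap approximation by default and route to the full-precision path only when the caller demands it:

```python
import math

def approx_sin(x: float) -> float:
    # Two-term Taylor series: cheap and "good enough" near zero,
    # increasingly wrong as |x| grows.
    return x - x**3 / 6.0

def ge_sin(x: float, exact: bool = False) -> float:
    """Hypothetical dispatch: exact=True is the 'perform this on the
    main CPU' flag; otherwise the Good Enough path answers."""
    return math.sin(x) if exact else approx_sin(x)
```

A call such as ge_sin(0.1) lands within roughly 1e-7 of the true value, while ge_sin(x, exact=True) always matches math.sin(x); the flag is what guarantees the accurate path is never more than one argument away.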

The work done by Enz is a continuation of the Rice University research we’ve also reported on, and a further refinement of the techniques used there, with a long-term goal of bringing these capabilities to shipping silicon.


The flesh-and-blood, created-by-evolution human brain is a prime example of “good enough” computation.

Imitating nature

Although this is not my general field, it’s worth noting that biological systems are, almost by definition, designed to be “good enough.” The myth of humans as perfectly rational beings is blown apart by the enormous list of cognitive biases and misconceptions we fall prey to, or summed up simply in the old adage: “Common sense isn’t very common.” Nevertheless, there’s a larger point: human brains work extremely quickly and draw very little power while doing it. Part of the reason is that brains are excellent at three tasks: pattern matching, extrapolation (projecting known values into the future), and interpolation (finding values that fit between other known values). All of these are often performed in “good enough” terms.
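Both of those estimation tasks have trivially cheap numerical counterparts. A single straight-line fit through two known points, as in this small sketch, interpolates between them and extrapolates beyond them, and is often “good enough” even though it is rarely exact:

```python
def linear_estimate(x0: float, y0: float, x1: float, y1: float, x: float) -> float:
    """Fit a line through (x0, y0) and (x1, y1) and evaluate it at x.
    Inside [x0, x1] this interpolates; outside, it extrapolates."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

inside = linear_estimate(0.0, 0.0, 2.0, 4.0, 1.0)   # interpolation between the points
beyond = linear_estimate(0.0, 0.0, 2.0, 4.0, 3.0)   # extrapolation past them
```

The estimate is exact only if the underlying relationship really is linear; everywhere else it trades accuracy for a tiny amount of computation, which is precisely the brain-style bargain described above.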

The ability to be gracefully wrong is part of the brain’s underpinnings. We can tolerate a low signal-to-noise ratio and still pick out patterns or events in the visual field. Duplicating some of that ability in silicon could be critical to designing future computer systems that use less power while remaining capable of delivering conventional, full-fidelity results.

Now Read: Computer scientists develop new approach to sort cells up to 38 times faster


Copyright © 2015 Tracktec. All rights reserved.
