13 November 2013

AMD reveals Kaveri details, release date: Can the first heterogeneous APU save AMD?

At long last, at its APU13 developer conference, AMD has announced that Kaveri, its first truly heterogeneous chip (a CPU and GPU on the same die that can communicate directly and share the same RAM), will be available on January 14, 2014. The flagship part, dubbed the A10-7850K, will be clocked at 3.7GHz, feature four cores (two Steamroller modules) built on GlobalFoundries’ 28nm process, and will require the new FM2+ motherboard socket. There’s no word on cost, but AMD will probably price it just below a comparable Intel part.

HSA is awesome on paper, but in practice…

Despite anything else you might hear about Kaveri, the only feature that really matters is that it implements version two of AMD’s Heterogeneous System Architecture (HSA). If you’re looking for the gritty, low-level details of HSA 2.0, we’ve got you covered. If you don’t want to read thousands of words on the topic, here’s the basic gist. Kaveri has a CPU (two Steamroller modules) and a GPU (GCN 1.1, 512 cores/8 CUs) on the same piece of 28nm silicon. The CPU and GPU are connected via a hUMA (heterogeneous uniform memory access) controller, giving them shared access to the same RAM and, more importantly, the ability to pass memory pointers directly between each other.

AMD APU GPU size, relative to Intel’s latest efforts. I love this because AMD has clearly made the GPU look larger than 47%.

In the past, the CPU would have to perform a calculation, save it to memory, and then tell the GPU that the data was ready to be used. The GPU would then have to copy the data from main system RAM into its own memory, perform some calculation, and output it to the display. With HSA 2.0 and hUMA, this process is massively simplified: There’s only one pool of RAM, and the CPU can just pass a memory pointer directly to the GPU, allowing the GPU to very quickly pick up where the CPU left off. It also works in both directions: Software can decide to use the GPU instead of the CPU, without worrying about additional complexity, latency, and so on.
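To make the difference concrete, here is a toy sketch in Python. It is not HSA code and uses no AMD API: a `bytearray` stands in for system RAM, a second `bytearray` stands in for the GPU's private memory, and a `memoryview` stands in for a pointer into shared memory. The function names and the doubling "kernel" are hypothetical, chosen only for illustration.

```python
# Toy model only: plain Python buffers stand in for system RAM and GPU memory.
# None of these names belong to any AMD or HSA API; they are illustrative.

def offload_with_copy(system_ram):
    """Old model: the 'GPU' copies the data into its own memory first."""
    gpu_memory = bytearray(system_ram)        # explicit copy into a second buffer
    for i in range(len(gpu_memory)):
        gpu_memory[i] = (gpu_memory[i] * 2) % 256
    return gpu_memory, len(system_ram)        # result, number of bytes copied


def offload_shared(system_ram):
    """Shared-memory model: the 'GPU' is handed a reference to the same buffer."""
    shared = memoryview(system_ram)           # no copy, same underlying bytes
    for i in range(len(shared)):
        shared[i] = (shared[i] * 2) % 256
    return shared, 0                          # result, number of bytes copied


# Copy-based path: the original buffer is untouched, the result lives elsewhere.
data = bytearray([1, 2, 3, 4])
copied, n_copied = offload_with_copy(data)

# Shared path: the result appears in the original buffer, with nothing copied.
shared_data = bytearray([1, 2, 3, 4])
view, n_shared = offload_shared(shared_data)

print(list(data), list(copied), n_copied)
print(list(shared_data), n_shared)
```

In the copy-based path the result ends up in a second buffer and every byte has been duplicated; in the shared path the "CPU" sees the result in its own buffer without any transfer, which is the behavior the pointer-passing model is meant to provide.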

In theory, this is a graceful and potentially very powerful setup. It gives software developers the freedom to use the right tool for the job, rather than simply using the CPU for almost everything. Where software such as web browsers and Photoshop has finally started to use the GPU for graphics-related tasks, AMD’s HSA 2.0 should allow for the wide-scale, easy, regular offloading of everyday tasks to the GPU. The hope is that, by leveraging the GPU, AMD can make up for the huge performance deficit between its CPUs and Intel’s. Back in early 2012, AMD engineers boosted APU performance by 20% by compiling code to make better use of the GPU. That was with Llano or Trinity; with Kaveri, the boost could be much larger.

AMD’s HSA 2.0, explained

In practice, due to AMD’s tiny market share, the real-world performance boost will probably be minimal. AMD is promising a better developer toolchain to try and drive HSA adoption, including a Linux GCC/HSA compiler, but we really have no idea how good this compiler is, or whether developers will bother to optimize their programs for HSA 2.0. It’s worth noting that HSA is an open standard that other components (DSPs, ARM coprocessors, etc.) can also piggyback on, but again it will probably be years before we see significant adoption. HSA is essentially AMD’s big play to try and circumvent Intel’s insurmountable lead in CPU performance, but I fear it won’t be enough.

It’s worth noting that both the PS4 and Xbox One have APUs that are related to Kaveri, but it’s not known if they feature the full implementation of HSA 2.0 or not. It’s possible that there will be a trickle-down effect caused by games being developed for next-gen consoles, but it’s really too early to say.

Kaveri tech specs

Of course, there’s more to Kaveri than just HSA: it’s also the debut of AMD’s new Steamroller CPU. Again, we’ve spoken about Steamroller in detail, but the basic gist is that it’s meant to be a lot faster. Best-case, though, we’re still probably only talking about Thuban (K10/Phenom II) levels of performance. On the CPU front, a quad-core Kaveri (two Steamroller modules) might just be able to keep up with a dual-core Core i7 from Intel.

AMD’s Steamroller CPU Architecture with several design tweaks to improve efficiency.

On the graphics side, AMD says that the A10-7850K will have 512 shader cores, which equates to 8 CUs. The architecture is “revised” GCN, or GCN 1.1 as it’s sometimes called. This roughly equates to the Radeon HD 7750, but the upgrade to GCN 1.1 should mean the inclusion of at least two ACEs (asynchronous compute engines), and possibly two tessellation engines. The GPU has a stipulated clock speed of around 700-800MHz, and there are still some doubts about whether main system DDR3 RAM will have enough bandwidth to feed it, but we’ll have to wait for some actual benchmarks to see if that’s the case.
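A quick back-of-the-envelope calculation shows why the bandwidth question comes up. The shader count, CU count, and 700-800MHz clock range come from the figures above; everything else in this sketch is an assumption made for illustration: two floating-point operations per shader per clock (one fused multiply-add), dual-channel DDR3-2133 with 64-bit channels, and a figure of roughly 72GB/s for a GDDR5 Radeon HD 7750 as a point of comparison.

```python
# Back-of-the-envelope numbers for the A10-7850K GPU.
# From the article: 512 shaders, 8 CUs, roughly 700-800MHz.
# Assumptions (not from the article): 2 FLOPs per shader per clock (one FMA),
# dual-channel DDR3-2133 with 64-bit (8-byte) channels, and ~72 GB/s for a
# GDDR5 Radeon HD 7750 as a discrete-card comparison point.

SHADERS = 512
COMPUTE_UNITS = 8
SHADERS_PER_CU = SHADERS // COMPUTE_UNITS

FLOPS_PER_SHADER_PER_CLOCK = 2       # assumption: one fused multiply-add
BYTES_PER_TRANSFER = 8               # assumption: 64-bit memory channel
HD7750_GDDR5_GBS = 72.0              # assumption: discrete-card reference


def peak_gflops(clock_mhz):
    """Theoretical single-precision peak in GFLOPS at a given GPU clock."""
    return SHADERS * FLOPS_PER_SHADER_PER_CLOCK * clock_mhz / 1000


def ddr3_bandwidth_gbs(mega_transfers_per_s, channels):
    """Theoretical peak DDR3 bandwidth in GB/s (shared with the CPU)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


system_bw = ddr3_bandwidth_gbs(2133, 2)

print(f"Shaders per CU: {SHADERS_PER_CU}")
for clock in (700, 800):
    gflops = peak_gflops(clock)
    print(f"{clock}MHz: {gflops:.1f} GFLOPS peak, "
          f"{system_bw / gflops:.3f} bytes of DDR3 bandwidth per FLOP")
print(f"DDR3-2133 dual channel: {system_bw:.1f} GB/s "
      f"vs ~{HD7750_GDDR5_GBS:.0f} GB/s assumed for a GDDR5 HD 7750")
```

Under these assumptions the APU's GPU has to share roughly half the memory bandwidth that a comparable discrete card gets to itself, which is the source of the doubt; real-world results will depend on the memory actually installed and on how much of that bandwidth the CPU consumes.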

Kaveri also supports AMD’s new Mantle API, which could boost game performance significantly, and includes TrueAudio, which could improve game immersion.

There are a lot of coulds and shoulds in this story, but over the last few years, that has unfortunately been AMD’s modus operandi. It’s not that AMD doesn’t have good ideas; it’s just that they don’t come to market soon enough and, worse, they also tend to underperform. The fact of the matter is that, as with any human endeavor, the chipmaking industry is getting to the point where it’s very hard to keep up with the big boys. In just 12 years, we have gone from 19 different companies operating chip fabs at the bleeding edge (130nm), to just five that are capable of 22/20nm (Intel, TSMC, STM, Samsung, and GloFo). AMD is trying to stay in the game with HSA and other smaller efforts, but realistically it needs something magical to happen. Maybe Kaveri is that miracle, but I doubt it.

Now read: Setting HSAIL: AMD explains the future of CPU/GPU cooperation




Copyright © 2018 Tracktec. All rights reserved.
