
AMD: Kaveri is Still a Go



Integration of at least part of the south bridge doesn't seem far-fetched to me. I do agree that it seems rather unlikely that Intel will basically screw over much of the desktop industry by not making removable CPUs anymore.
 
This is why you don't believe everything you read on the internet. All these rumors were just that: rumors. Anyone can start a rumor, but actually linking that rumor to fact is another story altogether. So unless it's confirmed by the company with a source behind it, don't read into the fluff, because that's what rumors are: nothing more than fluff.
 
I'm still waiting for a good 1600x900 or better AMD-powered Win8 tablet to hit the market. The WinRT tablets are currently underpowered with their Tegra 3 SoCs, and I'm not interested in a heavy Core i5 tablet with noisy fans either. There are the Atom SoC-powered Win8 tabs, but I'd rather have the better graphics performance of an AMD solution, not to mention the lower price.
 

If you'd be so kind, I'd love to hear about this - I've seen you discuss the cache problem before, but not about ZRAM and TRAM. Also, is this not an issue that can be fixed by simply overclocking the CPU-NB (or would these alternative technologies have added benefits as well)?
 
[citation][nom]mousseng[/nom]If you'd be so kind, I'd love to hear about this - I've seen you discuss the cache problem before, but not about ZRAM and TRAM. Also, is this not an issue that can be fixed by simply overclocking the CPU-NB (or would these alternative technologies have added benefits as well)?[/citation]

Overclocking the CPU/NB frequency helps, but it does not fix the problem. It improves bandwidth and latency by a percentage roughly equal to the percentage of the frequency increase. Intel's L3 cache latency is already something like three to five times lower than AMD's, putting it slightly below even AMD's L2 cache latency.
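
To put rough numbers on that (a back-of-envelope sketch only; the clocks and the cycle count below are assumptions I made up, not measurements):

[code]
# Illustrative only: the clocks and cycle count below are assumptions.
base_nb_ghz = 2.0    # assumed stock CPU/NB clock
oc_nb_ghz = 2.6      # assumed overclocked CPU/NB clock
l3_cycles = 50       # assumed AMD L3 latency, in NB cycles

base_ns = l3_cycles / base_nb_ghz
oc_ns = l3_cycles / oc_nb_ghz
print(f"CPU/NB clock: +{(oc_nb_ghz / base_nb_ghz - 1) * 100:.0f}%")
print(f"L3 latency: {base_ns:.1f} ns -> {oc_ns:.1f} ns ({base_ns / oc_ns:.2f}x better)")
[/code]

The latency improves by the same ratio as the clock bump, which is exactly why overclocking helps but can never close a three-to-five-times gap.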

Z-RAM is technically a little slower per cell than SRAM, but it is incredibly dense - about twice as dense as even eDRAM. TRAM is pretty much equal in performance per cell to the SRAM used in the main CPU caches, but also much denser - about half as dense as Z-RAM.

You could, say, make AMD's L3 out of Z-RAM (cutting L3 cache die area by something like a factor of ten) and make the L2 out of TRAM (cutting area by something like a factor of five). Total CPU die area (including the cores and such) could easily be well under 50% of the current die area. That's a whole lot of power saved. The greatly decreased distance between any given cache cell and the CPU modules could easily decrease latency enough to counteract Z-RAM's lower per-cell performance, and TRAM could improve performance outright, all while cutting costs and power consumption by huge margins.
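
As a quick sanity check on those area claims, here's some hypothetical arithmetic; the die size and the cache shares of the die are assumptions on my part, not AMD's actual numbers:

[code]
# Hypothetical arithmetic; every figure here is an assumption, not an AMD number.
total_die = 315.0                  # assumed mm^2 for a current FX die
l3_share, l2_share = 0.40, 0.20    # assumed fractions of the die used by L3/L2

l3 = total_die * l3_share
l2 = total_die * l2_share
rest = total_die - l3 - l2

new_die = rest + l3 / 10 + l2 / 5  # Z-RAM L3 (~10x smaller), TRAM L2 (~5x smaller)
print(f"{total_die:.0f} mm^2 -> {new_die:.0f} mm^2 "
      f"({new_die / total_die * 100:.0f}% of the original)")
[/code]

With those (generous) cache-share assumptions, the die lands right around the "well under 50%" figure; with smaller cache shares the savings shrink accordingly.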

AMD could use some serious performance improvements beyond this - maybe their cache controllers suck or something and that's what their issue is (or at least one of them) - but this would be a great help at least in power efficiency. Combine that with Steamroller's improvements and the process shrink to 28nm (at least on Kaveri, IIRC) and you could end up with an incredibly small die, or enough room for some serious architectural improvements beyond even fixing the front end. It's not a bad design at all, but simply beefing it up like Intel does wouldn't hurt; Haswell is still the same basic architecture as Core 2 (as are all of Intel's other main architectures between Core 2 and Haswell), just with some beefed-up and improved components here and there and a vastly improved cache system.

For example, maybe throw in an extra ALU in each core and/or increase the L1 data cache capacity (16KiB is really not much and might be a bottleneck). Beefing up the FPU wouldn't hurt either. Instead of regular FPUs, AMD could culminate the Fusion initiative by implementing a modified GPGPU to handle the FPU functionality of the CPU.

There's so much that AMD can do; IDK why they are spreading themselves so thin lately instead of putting some focus into markets where they could do far better. Since AMD seems intent on not making hand-designed die masks for their CPUs anymore, it shouldn't even be all too difficult to make significant changes like these. They're letting automated tools do a lot of the designing, so they should be able to get something done.

Back on the cache - AMD might also be able to use some of the extra space to make wider cache interfaces for much higher bandwidth. I don't know what to do about the latency, but bandwidth should at least be more easily fixed, especially since it's something you can literally just throw more transistors at to improve (and with all of the above considered, AMD would have plenty of spare space for extra transistors).
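
A trivial sketch of the throw-transistors-at-it point: peak bandwidth is just interface width times clock, so doubling the width doubles the bandwidth. The clock and widths below are made up:

[code]
# Made-up clock and widths; the point is only that bandwidth scales with width.
cache_clock_ghz = 2.2    # assumed cache interface clock
for width_bytes in (16, 32, 64):
    print(f"{width_bytes}-byte interface: ~{cache_clock_ghz * width_bytes:.0f} GB/s")
[/code]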
 
Thanks for the awesome explanation, mate. It feels like there's so much more AMD can be doing to push their products past Intel's, but we're not really hearing anything about it. I'd really love to see some of this newer technology implemented by AMD, for them to be the ones making advancements; I can't imagine their management wouldn't want that either, so I can only assume they simply lack the manpower or funds to do more research on this.
 
AMD has brilliant ideas, they just lack the funds!

What they need is better product marketing and more impressive boardroom presentations.
 
As far as I remember, Steamroller is getting the larger, improved L1 cache plus the trace cache. Excavator is getting more IPC along with the high-density library treatment, which is really where AMD can make some money back.

I looked at Scali's comments on Steamroller's improvements and he seems to think that a faster, better L1 cache would make more sense than merely duplicating what you have, unless you can't improve its speed enough to make a difference. Also, by adding to the decoder hardware, you are technically removing some of the sharing which is very prevalent in xDozer and thus going slightly against CMT.
 


I don't think that management has a clue, hence the screw-ups. Examples include sending out Bulldozer as a finished product despite it being more a proof of concept than a true modern architecture; making far more Llano APUs than there were chipsets for motherboards (meaning that once they had a high supply of Llano APUs, they couldn't sell many of them for a while!); ditching top-notch engineers for far less skilled ones in order to stop doing hand-made, transistor-by-transistor designs, which have been proven to just about always perform better than automated designs (compare Phenom to Phenom II for a fairly recent example); and much more.

Management might want better products, but they don't seem to understand how to get there. AMD is going all over the place right now, and all of it takes money: getting tablet/smartphone APUs out with more in the works, getting into the memory market, the software optimization market, the SSD market, and so much more. The R&D money has to come from somewhere. Maybe AMD would do better if they completely revamped their entire business strategy.
 


I agree with improving the cache, including the L1 cache and all that.

Adding to the decoders and such may be technically reducing sharing, but it seems that some things are better off not being shared. I wouldn't say that it goes against AMD's CMT as much as it rethinks how to implement it for improvements.
 
They could do what Intel did back in 2005 when it released the Pentium D because it didn't have a competitive product against the pending Athlon 64 X2 launch: slap two Trinity A10-5700 dies together and have a product with 8 cores / 768 stream processors at a 125W TDP (or less). Brand it the A10-FX and sell it for $200-$250. CPU performance would be at least competitive (i.e. in the ballpark) with Intel, while GPU performance would be phenomenal for an integrated solution (most older games should be playable at 1080p with medium details). Also, it'd finally have enough power to run OpenCL applications effectively.
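
For what it's worth, the arithmetic checks out against the A10-5700's published specs (4 cores, 384 stream processors, 65W TDP); only the binning note at the end is my speculation:

[code]
# A10-5700 published specs: 4 cores, 384 stream processors, 65 W TDP.
cores, sps, tdp_w = 4, 384, 65
dies = 2

print(f"{dies * cores} cores, {dies * sps} stream processors")
print(f"Naive TDP: {dies * tdp_w} W; hitting 125 W or less would take "
      f"binning or slightly lower clocks")
[/code]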
 
Ah, but you'd end up with Hybrid Crossfire and we know how that doesn't always work out. :) Also, you'd need that fatter memory interface to avoid choking the thing to death... or throw stacked memory on board.
 
The two dies would be on the same package, connected by HT 3.1 or something faster, so not Hybrid Crossfire unless a discrete card is installed... though you would be right about needing more memory bandwidth.
 
[citation][nom]Amen2That[/nom]They could do what Intel did back in 2005 when it released the Pentium D because it didn't have a competitive product against the pending Athlon 64 X2 launch: slap two Trinity A10-5700 dies together and have a product with 8 cores / 768 stream processors at a 125W TDP (or less). Brand it the A10-FX and sell it for $200-$250. CPU performance would be at least competitive (i.e. in the ballpark) with Intel, while GPU performance would be phenomenal for an integrated solution (most older games should be playable at 1080p with medium details). Also, it'd finally have enough power to run OpenCL applications effectively.[/citation]

That would be interesting... However, gaming is still very limited by per-core performance, and that's something this probably wouldn't do well in. It'd probably be around an FX-8150 in CPU performance. The graphics would only be around a Radeon 7750 or 7770 overall, but that's good for what it is.

[citation][nom]silverblue[/nom]Ah, but you'd end up with Hybrid Crossfire and we know how that doesn't always work out. Also, you'd need that fatter memory interface to avoid choking the thing to death... or throw stacked memory on board.[/citation]

It wouldn't be Hybrid Crossfire at that point because they'd be identical GPUs, and VLIW4 Crossfire works pretty well, so it would probably scale very well in most modern games; stutter and such should be very low with both dies on the same package. Even better, on the same package you might be able to treat it as a single GPU instead of a Crossfire setup. At that point, giving it CF compatibility with the Radeon 77xx cards would probably make sense as a far superior Hybrid Crossfire option compared to current FM2 Trinity.
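
As a rough illustration of what good scaling could mean here (the baseline framerate and the scaling factor are assumptions, not benchmark results):

[code]
# Assumed numbers, purely to illustrate what good Crossfire scaling looks like.
single_fps = 30.0    # assumed single-die framerate in some game
scaling = 0.85       # assumed efficiency of the second GPU

print(f"One GPU: {single_fps:.0f} fps; two GPUs: ~{single_fps * (1 + scaling):.0f} fps")
[/code]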

You'd have four DDR3 channels, so it should be okay so long as they're all enabled. It shouldn't be any more memory-bandwidth-constrained than a single A10-5700.
 
The sooner the new Xbox and PS4 come, the better (both are rumored to use AMD parts).

I'm sure millions of those being sold every year would really help AMD's bottom line.
 
[citation][nom]Amen2That[/nom]The two dies would be on the same package, connected by HT 3.1 or something faster, so not Hybrid Crossfire unless a discrete card is installed... though you would be right about needing more memory bandwidth.[/citation]
That's true. Still, without the expanded memory interface or the stacked memory (double it if you're talking two chips), it'll be dead in the water.
 
[citation][nom]silverblue[/nom]That's true. Still, without the expanded memory interface or the stacked memory (double it if you're talking two chips), it'll be dead in the water.[/citation]

Each die already has a dual-channel DDR3 memory controller setup. Just use both of them for a quad channel DDR3 memory system like how Interlagos uses two Valencia dies that each have dual-channel memory to get quad-channel memory. You wouldn't need any exotic memory, although it'd probably help if you did have something else.
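
Peak numbers for that, assuming DDR3-1866 (the fastest officially supported speed on Trinity):

[code]
# Peak theoretical DDR3-1866 bandwidth per channel count.
mt_s = 1866e6          # transfers per second
bytes_per_xfer = 8     # 64-bit channel
per_channel = mt_s * bytes_per_xfer / 1e9   # GB/s

for channels in (2, 4):
    print(f"{channels} channels: ~{channels * per_channel:.1f} GB/s peak")
[/code]

So the doubled-up chip keeps the same bandwidth per die as a single A10-5700, which is the point.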
 
[citation][nom]silverblue[/nom]Stop me if I'm wrong here, but wouldn't the board need to be custom to handle the extra number of memory channels?[/citation]

Oh sure, you'd need a different board (and socket), I'm just saying that you wouldn't need any special type of memory. Given the nature of what Amen2That said, I thought that a new socket was implied all along.
 
[citation][nom]TeraMedia[/nom]Micro-servers: Is that a Drobo or something? Or a blade? How does AMD excel in this market? Gaming devices: XBOX 720 GPU - check. PS4 GPU - check (I think?). Wii U - who cares? AMD owns parts of this, for now. Industrial (embedded) solutions: if this is for imaging technologies such as MRI, then with GCN that makes a lot of sense. But this is not a huge market and cannot support a company as large as AMD without shared use of work products. They can only be successful with this if they develop the technology first for something else, and then re-apply it here. Communications: smartphones? They're behind, and falling further behind. They need to find a manuf that can provide competitive power consumption levels. Tablet APUs: Also falling behind, except for Windows 8 (non-RT) tablet applications. They need to find a manuf that can provide competitive power consumption levels.[/citation]
communications != smartphones
 



Oh... well, that idea sucks. It looks like Intel's desktop processors will become laptop processors. I hope AMD creates a processor capable of superior single-threaded performance, because the Piledriver architecture is already capable of strong multi-threaded performance and very low power consumption... and of course, make it fast!!
 

Yeah, now that you mention it, they actually could do some integration; the networking component, for example. Can't think of anything else.
 


SATA, USB, and more could be integrated. For example, on LGA 1155, unless I'm mistaken, all of the chipsets have four SATA 3Gb/s ports, so the SATA 3Gb/s interface could be integrated. The USB ports could also be integrated. If Intel finally goes to an all-SATA 6Gb/s configuration like AMD did a while ago, that could be thrown in in place of the SATA 3Gb/s controller. The south bridge's PCIe x4 link could also be integrated. I'm sure there's more, too.
 