AMD Hints at Possible ARM Partnership

[citation][nom]lradunovic77[/nom]Fusion turned out to be utter crap as many reviews showed, something i said it will be a crap before it was even released. If i was AMD or Intel i would slap Athlon XP into mobile device or let's say Intel Pentium IV. With the current technology they can make those cpus so small and power draw couple W.x86 is future for mobile market. Why re-invent the wheel when wheel is already there and plenty to choose from!!!Having x86 cpu in mobile device you get limitless options.ARM -> CRAP, Nvidia is wasting their time with it.[/citation]
1) That was not Fusion, that was just a CPU+GPU, and for what it was intended to do, it did very well. It was never meant to be the new flagship.
2) Intel's Tick-Tock: same architecture + new process = some thermal benefit but similar performance (except for the 31-stage NetBurst jump at 90 nm, but that was the P4 era, and it was bad for all of us); new architecture + same process = some thermal benefit too, but better performance, i.e. better IPC overall compared to a plain shrink.
3) Reinvent the wheel because the x86 wheel is designed to complete the most instructions per thread. It is bloated, as some other readers said, not needed for running a mobile OS, and the heavier instructions will go to the GPU anyway. The GPGPU trend also means SIMD instructions get executed more efficiently there with a proper recompile.
 
P.S. Nvidia did try to get x86, even if they won't admit it. They failed due to licensing problems. Their Via partnership would not work unless Via was completely under Nvidia, and in that case of a change in management and ownership, Via would lose the x86 license. Also, neither Via nor AMD can resell the license; only Intel can.

Intel already has an ARM license; AMD does not yet. But Intel owns x86 and keeps a tight grip on it, controlling competition in the good corporate spirit of maximizing profits. Both Intel and AMD could go as simple a route as adding an instruction decoder that recognizes ARM instructions to their CPUs. It can be done, but Intel is no fan of it, as it cuts the value of x86 down even further. That value is already declining as mobile OSes become good enough for casual use, both productivity and entertainment, given a proper big screen, keyboard, and mouse (an OS designed for mobile but working everywhere, like the worst-case Windows 8 predictions). That is why Intel is porting Android to x86.
 
My vision of fusion lies with ARM more than with x86.
x86 is strong in single-thread execution speed, but it hit its limit long ago, and multi-core CPUs are the proof. Programmers are implementing ways to take full advantage of multiple cores and are getting to fully scalable implementations that use every core available. In that spirit, GPGPU already completes more instructions per watt than x86. So if the hard work can be pushed to the GPGPU while the CPU just oversees it, ARM has its edge.
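How much that single-thread limit matters can be put in numbers with Amdahl's law. A minimal sketch (the 95% parallel fraction is an assumed example figure, not a measurement):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program scales with cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A workload that is 95% parallel: returns diminish fast past a few dozen cores,
# which is why fully scalable programming (parallel_fraction -> 1.0) is the goal.
for n in (4, 32, 128):
    print(n, "cores:", round(amdahl_speedup(0.95, n), 1), "x")
```

Only as the serial fraction approaches zero does a 128-core (or 128-cluster) part pay off, which is exactly the "fully scalable implementations" point above.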

Execution units can be shared between front ends for ARM (or any other lean instruction set) and front ends for graphics. For example, clusters of 8 to 16 execution units might have 1 or 2 front ends for CPU instructions and 4 to 16 front ends for graphics instructions. If the same cluster is active for both CPU and GPU, the instructions would share the execution units concurrently, like Intel's Hyper-Threading. On a desktop fusion processor, clusters would be shifted dynamically between workloads: a 128-cluster processor could have 4 clusters doing CPU work, 60 clusters rendering graphics, and 64 clusters sitting idle. Unused front ends and back ends in active clusters would be switched off to save power, while idle clusters would be switched off completely (C-states). Then, when another workload appears, the processor could switch in a fraction of a second to 30 clusters doing CPU work and 10 clusters rendering the Aero desktop and the running programs. Simple tasks that can't be multi-threaded to this extreme would use very little of the resources anyway. Complex tasks would be optimized to use all the available virtual CPUs in a fully scalable fashion.
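The cluster-shifting idea above can be sketched as a toy model. All the numbers and role names here are illustrative (taken from the example in the text), not a real design:

```python
class Cluster:
    def __init__(self):
        self.role = "idle"          # "cpu", "gpu", or "idle" (power-gated)

def reallocate(clusters, cpu_need, gpu_need):
    """Give the first cpu_need clusters CPU work, the next gpu_need clusters
    graphics work, and power-gate everything left over."""
    for i, c in enumerate(clusters):
        if i < cpu_need:
            c.role = "cpu"
        elif i < cpu_need + gpu_need:
            c.role = "gpu"
        else:
            c.role = "idle"

clusters = [Cluster() for _ in range(128)]
reallocate(clusters, cpu_need=4, gpu_need=60)    # gaming: mostly rendering
reallocate(clusters, cpu_need=30, gpu_need=10)   # workload shifts to CPU-heavy
```

A real chip would do this in hardware per scheduling interval, but the bookkeeping is the same: every cluster is either executing one kind of front end or powered down.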

That is how I see the future of fusion. In this view, Intel and AMD are on par, but Intel is trying to maximize the value of x86, because it can and because it is good for the bottom line. Maybe Intel is even a few steps ahead but does not wish to disclose what it has managed. Larrabee was a failure as a GPU, but what Intel learned from it is that x86 is not for GPUs or even GPGPU. There is little or no need for SIMD on each core (with properly scalable programming, not the traditional way) if you have 32+ cores available. Intel has all the experience and licenses needed, and will be ready to enter the market once AMD+ARM and Nvidia+ARM have struggled enough to create such a market.
 
ARM has some pretty big limitations when you start playing with large amounts of memory and multiple I/O buses. It was an architecture designed for small, tight, dedicated devices, where the OS is closely coupled with the hardware. It also has glaring issues once you start scaling the CPU count. Now, that doesn't mean that someone like AMD couldn't put an "ARM decoding engine" into the CPU that allows it to process ARM instructions. It would be trivial for them to extend their instruction set to include ARM-like instructions; it's already been done several times with MMX / SSE / SSE2 / 3DNow! / PadLock and so forth. The biggest example is when they designed their own 64-bit instruction set and made it an extension of x86. Intel tried for a year to duplicate this but ultimately failed and ended up cross-licensing EM64T from AMD; in return, AMD was given unrestricted access to x86. Via's license comes from its purchase of Cyrix. Intel has to walk a very fine line in controlling the x86-"compatible" ISA: if it tries too hard to keep people out, it will have another antitrust lawsuit thrown at it. Its strategy has been to make it unprofitable for anyone else to step in. Nvidia could have gotten a license from Intel if they really wanted to, but with the way the market is, it would not have been profitable enough to be worth their time, so instead they're going with ARM for their low-power platform.
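The "ARM decoding engine" idea boils down to two front ends feeding the same execution units, each translating its own encoding into common micro-ops. A minimal sketch, with opcode values that are entirely made up for illustration:

```python
# One shared "execution unit": takes a micro-op and operands.
def alu(micro_op, a, b):
    return {"add": a + b, "sub": a - b, "mul": a * b}[micro_op]

# Two hypothetical front ends: different encodings, same micro-ops.
# These opcode values are invented for the example, not real x86 or ARM.
X86_DECODE = {0x01: "add", 0x29: "sub", 0x0F: "mul"}
ARM_DECODE = {0b1000: "add", 0b0100: "sub", 0b0000: "mul"}

def execute(decoder, opcode, a, b):
    """Decode with the selected front end, then run on the shared back end."""
    return alu(decoder[opcode], a, b)
```

The point of the sketch: the back end neither knows nor cares which ISA the instruction came from, which is why bolting a second decoder onto an existing core is plausible, just as MMX and friends bolted new encodings onto the same execution resources.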

If you guys really want to see an architecture for large-scale multi-node systems, take a long look at SPARC. It was designed from inception to be highly scalable and modular, to the point that you can add / remove / replace CPU nodes while the system is operational, and the same goes for memory and auxiliary devices. Sun has done a very bad job of marketing SPARC for general use. With SunPCi you can run an x86-based OS simultaneously with the SPARC-based OS; I never understood why they didn't expand on this concept and add the ability to couple x86 CPUs alongside their SPARC CPUs. With Windows "8" being made ARM-compatible, there really is hope that if Oracle can get in there, they might get a SPARC-compatible version of Windows, and that would open the flood gates for sure.

For an example of what I mean, here is the new flagship Sun SPARC CPU, the T3:
SPARC V9, sun4v architecture (64-bit with CMT)

1.67 GHz
16 cores
2 integer execution units per core (32 per CPU)
1 FPU per core (16 per CPU)
1 MMU per core (16 per CPU)
8 hardware threads per core (128 per CPU), like Intel HT on crack
6 MB L2 cache shared across all cores (each core can read any part of it)
4 DDR3 SDRAM channels (basically 4 memory controllers)
2 embedded 10 Gb Ethernet controllers per CPU
6 coherency links
16 crypto acceleration engines (think PadLock) per CPU
2.4 Tb/s (yes, a big T) aggregate throughput per socket

And the big winner: you can use up to four of these CPUs in SMP without any special glue / I/O circuitry. Beyond that, you can keep stacking them up to 64 CPUs or more; Sun actually has a system built like that.
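The per-CPU totals in that list follow directly from the per-core figures, and the glueless four-socket limit gives the thread count of a maxed-out box. A quick sanity check of the arithmetic:

```python
# Figures from the T3 spec list above; 4 sockets is the glueless SMP limit.
cores = 16
threads_per_core = 8
int_units_per_core = 2

threads_per_cpu = cores * threads_per_core        # 128 hardware threads
int_units_per_cpu = cores * int_units_per_core    # 32 integer units
glueless_threads = 4 * threads_per_cpu            # 512 threads, no glue logic
```

512 hardware threads in a four-socket system with no extra circuitry is the headline number here.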
 
I've been trying to tell people: SPARC is an amazing architecture. Single-thread performance has never been high, but when we start talking about running 4 to 60+ demanding applications / threads simultaneously, it's SPARC that ends up winning. The only issue is that its systems are all specialized and expensive as hell to buy. Nothing is really expensive to build, but because each system is complex and unique, Sun / Oracle can't get the economy-of-scale action needed to lower the per-unit price. That, and Sun totally boned up Solaris 10 initially: their crazy concept of running everything on the JavaVM nearly sank the company, a memory- and CPU-demanding, largely single-threaded application (Java is hard to multi-thread) running on a platform optimized for insane SMT.

I'd really like to see better Linux support. If RHEL or one of the big names would get behind a serious SPARC V490/890 and T2/T3 port, you could do amazing things.
 