AMD CPU speculation... and expert conjecture


con635

Honorable
^That's from Chiphell, think it's a guess, scores differ on the same cards in previous CH leaks. edit: If it really has 4096 cores, a new GCN and memory architecture, I would expect a bit more than 40%; maybe the AIO means more than a 1GHz clock speed as well?
 
If the 390X really does outpace the Titan X I'll be pretty impressed. The Titan X is full GM200 with 12GB (!) of memory. Nvidia aren't going to be able to release a faster card this time.

I think maybe it's not a bad move on AMD's part to go second this time round. To be honest though, I'm more curious to see what happens with the rest of the line-up; by all accounts there are going to be a few new parts rather than mainly rebrands this time.
 
Most Of Apple's A9/A9X Chips To Be Manufactured On TSMC's 16nm FinFET Process
http://www.tomshardware.com/news/apple-a9-a9x-16nm-finfet,28748.html
this is less about apple and more about tsmc. if tsmc has good yields from its 16nm finfet process, apple might take up the majority of capacity. this'll squeeze others, and the only loser will be nvidia. both amd and qualcomm have glofail and samsung as backups. i wouldn't be surprised if glofo makes some of the low power skybridge socs.
 

jdwii

Splendid
If those benchmarks are true then AMD won this time around, in a way. The 390X seems to be 5% faster than the Titan X while using 5% more power. Now people, a simple 10% clock boost would put a 980Ti (edit: sorry, Titan X) above or around the 390X. It seems like, as in the last several generations, AMD and Nvidia will again be equal in performance, but this time around AMD might actually have similar performance per watt. Too bad it took HBM to pull this off while Nvidia is still stuck on GDDR5 memory.
http://wccftech.com/amd-r9-390x-nvidia-gtx-980ti-titanx-benchmarks/
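A quick back-of-the-envelope check of that clock-boost claim (my sketch, not from the leak; it assumes performance scales linearly with core clock, which real GPUs only approximate since memory speed stays fixed):

```c
#include <stdio.h>

/* Hypothetical scaling sketch: the 390X leads the Titan X by ~5% at
 * stock clocks (the leaked figure). If a 10% core overclock scaled
 * performance linearly -- an optimistic assumption -- where would
 * the Titan X land? */
int main(void) {
    double r390x  = 1.05;        /* leaked: 390X relative to stock Titan X */
    double titanx = 1.00 * 1.10; /* assumed +10% clock, linear scaling */
    printf("boosted Titan X vs 390X: %+.1f%%\n",
           100.0 * (titanx - r390x) / r390x);
    return 0;
}
```

That prints roughly +4.8%, i.e. the boosted card edges just ahead, which is where the "above or around" reading comes from.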
 

juanrga

Distinguished


At 1600p the 390X is 2% faster than Titan X
At 4K the 390X is 2% faster than Titan X

The difference is statistically insignificant and easily inverted with a mere driver update.

The 390X consumed 13% more power than Titan X.

The 390X providing more performance than the 290X at roughly the same power consumption implies AMD has increased the efficiency of the top card by about 40%. I think the cards are made on 28nm and the efficiency comes from some microarchitectural improvements plus the use of HBM.
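For what it's worth, the arithmetic behind that ~40% figure works out like this (a sketch; the 1.4x performance ratio is the assumption implied by the leak, not a confirmed spec):

```c
#include <stdio.h>

/* Perf-per-watt sketch: if the 390X delivers ~1.4x the 290X's
 * performance at ~1.0x its power draw (both leak-derived
 * assumptions), the efficiency gain is simply the ratio. */
int main(void) {
    double perf_ratio  = 1.40; /* assumed 390X / 290X performance */
    double power_ratio = 1.00; /* assumed 390X / 290X board power */
    printf("perf/W gain: %.0f%%\n",
           (perf_ratio / power_ratio - 1.0) * 100.0);
    return 0;
}
```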

AMD needing 4GB of HBM to match ~90% of Nvidia's efficiency (with 12GB of GDDR5) implies that when Nvidia jumps to HBM the efficiency gap will widen again.

The 1600p performance numbers look OK, but I expected bigger gaps at 4K. I will wait for the TechPowerUp review.
 

Ranth

Honorable


Yet Nvidia is very likely to price the Titan X at $1000 or above, while I assume that to be less likely with AMD.
 

Cazalan

Distinguished


It would probably be difficult to pitch a 4GB card vs a 12GB card, which makes the 8GB rumor interesting.

As for pricing, that's even more difficult to say. They're dealing with a large interposer, and the AIBs haven't worked with those before. AMD has to charge a lot more than previously for the parts because the package includes the RAM. On the other hand, the AIBs end up designing a much simpler PCB since there are significantly fewer signals to deal with.

Ultimately it depends on yields, as both of these dies are huge.

I wouldn't be surprised if they both launch around $999 or more. Both are much faster than their predecessors, and if they launch too low they'll cannibalize sales.
 

blackkstar

Honorable

Yeah, and the rest of the article is talking about making compiler adjustments in an OS that has significantly less market share than Linux. Three instructions per clock tick is the optimal goal they are trying to achieve via compiler changes. It's theoretical.

The whole thing is basically "our version of amd64 is theoretically capable of 3 instructions per clock tick, how do we get our compiler to actually achieve that"

I suggest everyone else actually read the content I posted, where Juan cites his thoughts on his graph, and come to their own conclusions.

IPC, CPI, etc. are all measurements that have a theoretical limit set by the architecture, while their real-world values depend on the compiler.

To make matters worse, the software they're using to measure isn't a benchmark at all; it's scientific simulation software being used as a benchmark.

It's like I said previously, yet you refuse to address it. That graph is measuring how close the compiler can get to the theoretically perfect IPC of an architecture. IPC is far more dependent on the compiler than on the hardware.

For example, if a compiler optimizes poorly for the caches of a certain CPU architecture and constantly has cache misses, the IPC will be bad. That's not the CPU's fault, it's the compiler's.
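A minimal sketch of that cache effect (my example, not from the post): the same summation traversed two ways. Run each loop under a profiler such as perf and the strided version shows far lower instructions per cycle on typical x86 hardware.

```c
#include <stdio.h>

#define N 4096

/* Same work, two traversal orders. The row-major loop walks memory
 * sequentially and hits cache; the column-major loop strides by
 * N * sizeof(double) and misses cache on nearly every access, so
 * measured IPC collapses even though the instruction mix is nearly
 * identical. Build with -O0/-O1 so the compiler keeps the loops as
 * written -- at -O3 it may interchange them, which is itself a
 * demonstration of the compiler's role. */
static double m[N][N];

int main(void) {
    double sum = 0.0;

    for (int i = 0; i < N; i++)      /* cache-friendly: row-major */
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    for (int j = 0; j < N; j++)      /* cache-hostile: column-major */
        for (int i = 0; i < N; i++)
            sum += m[i][j];

    printf("%f\n", sum);
    return 0;
}
```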

I'm trying to help you understand the data you presented and you seem to not want to have anything to do with it. I hope that the others appreciate it.

IPC and all related measurements are simply a benchmark of the compiler and how well the compiler uses an architecture.

We have a real-world example of this: compare Clang/LLVM to GCC benchmarks (Phoronix does this quite often):

http://www.phoronix.com/scan.php?page=article&item=gcc49_compiler_llvm35&num=2

Same software, same hardware, different performance. Thus, by your argument, different IPC. The only variable changing here is the compiler. Look at GraphicsMagick: GCC 4.8.2 is almost twice as fast as Clang on the same OS and the same hardware.

The software, hardware, and OS don't matter. All that matters is the compiler's quality and how well it can optimize for a certain architecture. Clang is clearly not coming close to the theoretical IPC limit of the 4770K in GraphicsMagick, and that benchmark has nothing to do with the 4770K's instructions per clock, CPI, etc.

This is exactly the thing your IPC benchmarks are showing, yet you insist it has nothing to do with the compiler at all and is all about the hardware.

IPC is strictly a compiler issue. If you want to talk about the IPC of a specific architecture, go create a perfect compiler that will extract the absolute best performance out of every relevant x86 CPU. I'll be waiting...

Also, you can safely ignore the compile-time benchmarks; they only measure how long the compiler takes to generate code and are not relevant to this discussion.
 

juanrga

Distinguished


All leaks suggest that the Titan X will be $999 or so, but I don't get why people assume that the 390X will be cheap. The die is huge, the arch is new, HBM/interposer is more expensive than mainstream GDDR5, and the hybrid cooler adds to the cost.
 

juanrga

Distinguished


The "performance" term in the basic CPU equation that I have given before applies to any piece of software. It doesn't have to be a "benchmark". The same applies to IPC. Moreover IPC is not a "measurement", as you claim, IPC is a computer science concept, and like any other concept it can either computed theoretically using a mathematical model of the CPU or measured on a piece of hardware. Engineers work with different analytical models and cycle simulators to predict the IPC of a given architecture before building it. Engineers don't work by trial and error approach.

It is untrue that "IPC is strictly a compiler issue". IPC also applies to non-compiled software measurements. Moreover IPC is a well-known aspect of hardware.

http://en.wikipedia.org/wiki/Instructions_per_cycle

As I have explained before >>here<<



I illustrated the difference between the maximum IPC allowed by a given AMD CPU and the average IPC obtained by a concrete piece of software running on the same AMD CPU >>here<<

Your example of the cache illustrates the close relationship between hardware and software. One thing is the optimization that the compiler does of the CPU's cache resources, but another thing is the specific hardware devoted to cache management. Compilers are not magic: they work pre-run and cannot predict the future; a compiler cannot predict with 100% accuracy what info will be in cache at a given time on a given branch of the code, because it has limited access to dynamic information. This is why modern CPUs devote hardware to cache management. An example is the hardware prefetchers, which try to identify data patterns in order to load data into cache before the data is used. Those prefetchers work with regular, predictable data but fail on irregular, unpredictable loads. For this reason modern SS/OoO cores use the ROB and other related structures to hide the latency associated with unpredictable cache misses.
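A small sketch of the prefetcher point (illustrative, not from the post): both loops below touch the same array once, but a stride-based hardware prefetcher can predict the sequential scan and not the random pointer chase, so the second loop runs many times slower despite doing less arithmetic per iteration.

```c
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)  /* 16M elements, far larger than any cache */

int main(void) {
    int *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: a random permutation forming one single
     * cycle, so the pointer chase below visits every element in an
     * order no stride-based prefetcher can predict. */
    for (int i = 0; i < N; i++) next[i] = i;
    for (int i = N - 1; i > 0; i--) {
        int j = rand() % i;
        int t = next[i]; next[i] = next[j]; next[j] = t;
    }

    long sum = 0;
    for (int i = 0; i < N; i++)   /* regular: prefetcher hides latency */
        sum += next[i];

    int p = 0;
    for (int i = 0; i < N; i++)   /* irregular: stalls on cache misses */
        p = next[p];

    printf("%ld %d\n", sum, p);   /* print so nothing is optimized away */
    free(next);
    return 0;
}
```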

I will add that most of Keller's effort since his return has been on cache management. He and his group have developed new hardware methods since 2012, including a novel stacked-cache method. We will see how much of that finally ends up in K12/Zen.

We don't need to create a perfect compiler to talk about the IPC of a given architecture. We can say that Excavator provides 5% more IPC than Steamroller, that Haswell provides about 60% more IPC (integer) than Piledriver, and that the new A72 core has higher IPC than the core used in AMD's Seattle.

You are asking me about stuff I already addressed before. Moreover, this is going off-topic, so I will stop here; if you have some question that I have not addressed before, send me a PM.
 

Ranth

Honorable


I'm not assuming it will be cheap, unless you count "cheaper than Titan X" as cheap. I personally think/hope it'll be priced between the 980 and the Titan X, probably closest to the Titan X if performance is where the benchmarks point. However, I don't consider pricing similar to the Titan X to be impossible.
 

con635

Honorable
[Image: AMD-Radeon-R9-390X-WCE-900x491.png]


http://videocardz.com/55124/amd-radeon-r9-390x-wce-could-this-be-real

I'm gonna say 50% at least over the 290X; it has to be.
 


It's a nice upgrade to the A8-7600, which is a very good value APU. The big benefit is that you can overclock the CPU_NB, which should have a very significant impact on memory and iGPU performance.
 

Very Nice!
my points of interest are the hardware h.265 decoding, 8GB HBM and enhanced zerocore. i am especially looking forward to seeing how the last one works in a real card. all of these are with NaCl, ofc. :)

depending on the actual configuration, hopefully amd can cut fiji into 1024 shaders + 2GB, 2048 + 4GB, 3072 + 4GB or 512 + 1GB versions. sure would be nice to buy an entry level 512 shader gfx card with single slot, low profile, (blower cooler optional :p) h.265 HW decoding that can run circles around other entry level gfx cards today. i have a dream(s)!
 
Fiji won't get cut like that; it would be way too expensive compared to just offering smaller chips. There will likely be 3 versions: an 8GB version, a 4GB version, and a 4GB cut-down version. Maybe a 2GB cut-down version if the interposer can be partly salvaged.
 

yeah. smaller ones will be too expensive with HBM. hopefully amd has non-HBM versions for lower price points.
 

con635

Honorable

If the CH guy really has one to bench, wouldn't it be an ES and not the final spec/drivers? So it could be more by release? What do you think of HBM in terms of performance, like if we put HBM on a 290X, would there be a performance increase along with better power consumption? Will this be GDDR3 vs GDDR5 again?


 

juanrga

Distinguished


The frequencies suggest these are models close to production. Some models are even overclocked. A driver update can change the situation in both directions: it could increase the 390X vs Titan X gap or invert it. That is why 5-10% differences in benchmarks are statistically insignificant.

HBM provides tons of bandwidth and reduces power consumption compared to GDDR5. HBM on a 290X would reduce the power consumption of the card by a double-digit percentage and increase performance a bit at the highest resolutions, but not a lot, because the 290X is not exactly bandwidth limited. The 390X has lots more cores and will use the extra bandwidth. In fact it matches the 295X2.
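To put rough numbers on that bandwidth claim (a sketch; the 290X figure is its published spec, while the Fiji/HBM configuration is the rumor, so treat it as an assumption):

```c
#include <stdio.h>

/* Peak bandwidth = bus width (bits) / 8 * per-pin data rate (GT/s).
 * 290X: 512-bit GDDR5 at 5 GT/s (published spec).
 * Fiji: 4096-bit first-gen HBM at 1 GT/s (rumored configuration). */
int main(void) {
    double gddr5 = 512.0 / 8.0 * 5.0;  /* 320 GB/s */
    double hbm   = 4096.0 / 8.0 * 1.0; /* 512 GB/s */
    printf("290X: %.0f GB/s, Fiji: %.0f GB/s (%.0f%% more)\n",
           gddr5, hbm, (hbm / gddr5 - 1.0) * 100.0);
    return 0;
}
```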

HBM is expensive, thus we will see it only on high-end cards this year. Then in the future it will become mainstream.
 

juanrga

Distinguished
About one year ago I predicted that AMD would abandon discrete GPUs and replace them with APUs. I showed, with numbers, that future APUs will be much more powerful than dGPUs and that dGPUs will be killed off by about 2020. I wrote an article on APUsilicon explaining why the APU is faster and giving details of the architecture of the future HPC APU that AMD engineers are designing. Well, AMD has just confirmed my thoughts:
[Images: two AMD presentation slides, 005l.jpg and 006l.jpg]
 


Uhm... I can't see what you're saying anywhere... All I see is AMD saying they'll bring APUs to the Pro market and that APUs will continue to get stronger alongside dGPUs.

Cheers!
 

cemerian

Honorable



somehow i must be going blind, since what you are saying and what the slides are showing are two different things
 