AMD Piledriver rumours ... and expert conjecture

Page 115
Status
Not open for further replies.
We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
Sorry for multiple posts but I want to break these up into distinct messages.

Why should you care about the impartiality of independent reviewers? Because they are the eyes, ears, judges and juries that represent you, the customer. The vast majority of customers won't have the time and money to purchase each vendor's products and test them to determine their relative value, their merits and weaknesses. We rely on independent third-party reviewers to do this for us, and we trust that they will be honest in their judgement. If a company can manipulate those independent reviewers, then it can manipulate the perspective the customers receive and thus the buying decisions of those customers.

We should never tolerate bias or manipulation from our independent review sites.
 
^ IT nerds at their finest

So... anyone know the release date for Piledriver? I heard it was...summer?
Piledriver = Marginal improvements on GPU and minor fixes on CPU??


Seems to be early summer but I'm expecting it to be held back, half empty and all that.

If they're using resonant clock mesh technology then I expect a major performance improvement from increased clock speeds alone. RCM reduces power consumption by 10~25%, so you can expect at least a 10~25% increase in clocks and thus relative performance.
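As a rough check on that arithmetic, here is a minimal sketch. It assumes the whole 10~25% saving applies to total dynamic power and that power scales linearly with clock at a fixed voltage; both are simplifications, and in practice higher clocks usually need more voltage, so real headroom would likely be smaller. The function name is made up for illustration.

```python
def clock_headroom(power_saving: float) -> float:
    """Fractional clock increase that brings power back to its original level.

    Assumption (not from the thread): power is proportional to frequency
    at fixed voltage, so a saving of r leaves room for 1 / (1 - r) times
    the original clock within the same power budget.
    """
    if not 0.0 <= power_saving < 1.0:
        raise ValueError("power_saving must be in [0, 1)")
    return 1.0 / (1.0 - power_saving) - 1.0


if __name__ == "__main__":
    for saving in (0.10, 0.25):
        headroom = clock_headroom(saving)
        print(f"{saving:.0%} power saving -> {headroom:.1%} clock headroom")
```

Under those assumptions the headroom comes out a little above the saving itself, which is consistent with the "at least 10~25%" estimate as an upper-end guess rather than a guarantee.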

All I care about is performance vs price vs power usage. That last is important as the country I live in has an escalating power utility cost, it gets prohibitively expensive to run AC during the summer. Talking about $400+ USD a month power bill to cool a 43pyung (about 1500sq feet) apartment.
 
^ IT nerds at their finest

So... anyone know the release date for Piledriver? I heard it was...summer?
Piledriver = Marginal improvements on GPU and minor fixes on CPU??

Actually, and I've been thinking about this, the way code is made is rather important to how a cpu runs programs. Maybe we, as enthusiasts essentially in a community leadership role of sorts, need to give this more attention.

Not to distract from the hardware oriented nature of this thread, but to ignore the code the cpu runs won't do either.
 
I actually wish Windows 8 would get launched with Trinity, but that's never going to happen.

I'm expecting a 10% average CPU increase over Llano for Trinity. As for the GPU, I would imagine 30% on average.

I just want a hopefully cheap Trinity ultrathin to replace this Atom netbook. I hate this thing so much...
 
Sorry for multiple posts but I want to break these up into distinct messages.

Why should you care about the impartiality of independent reviewers? Because they are the eyes, ears, judges and juries that represent you, the customer. The vast majority of customers won't have the time and money to purchase each vendor's products and test them to determine their relative value, their merits and weaknesses. We rely on independent third-party reviewers to do this for us, and we trust that they will be honest in their judgement. If a company can manipulate those independent reviewers, then it can manipulate the perspective the customers receive and thus the buying decisions of those customers.

We should never tolerate bias or manipulation from our independent review sites.
Couldn't agree more with this. This is one reason I don't just look at one review and make my decision based on one viewpoint. With AMD being the underdog, it's easy to get away with bashing it to death and not even give it a second thought.

I will say this: playing with BD has been a lot of fun and a whole lot more insightful than just reading the one-sided reviews (from both sides).

One thing I will say: if AMD can get their ducks in a row, the modular approach can be a beast, but at this early stage it's touch and go on some points.

One part whose effect I only just noticed is the shared front end. A module is pretty close to what AMD said: 80% of a dual core for performance. While that is an improvement over HT, it can actually hurt performance quite a bit.

While one half of the module is being used, it runs at 100%. Turn on the other half and both halves run at 80%, not 100% + 60%. I discovered this myself when playing with Prime95. While stress testing, everything runs evenly across all cores. Sometimes one core drops out after a while, and what happens next is interesting: the now un-shared core speeds up and starts finishing calculations faster.
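To put numbers on that, here is a small sketch of the scaling described above. The 0.80 figure is taken from the post; the function names and the idea of expressing throughput in "single unshared core" units are just for illustration, not a measured model.

```python
from typing import List

# Per-core throughput when both cores of a module are busy (figure from the post).
SHARED_PENALTY = 0.80


def module_throughput(active_cores: int) -> float:
    """Throughput of one module, in units of a single unshared core."""
    if active_cores == 0:
        return 0.0
    if active_cores == 1:
        return 1.0  # the lone core has the shared front end to itself
    if active_cores == 2:
        return 2 * SHARED_PENALTY  # both cores drop to 80%
    raise ValueError("a module has at most two cores")


def total_throughput(active_per_module: List[int]) -> float:
    """Sum of module throughputs for a given thread placement."""
    return sum(module_throughput(n) for n in active_per_module)


if __name__ == "__main__":
    # Two threads packed into one module vs. spread over two modules.
    print("same module:     ", total_throughput([2, 0]))
    print("separate modules:", total_throughput([1, 1]))
```

With these assumptions, two threads on one module give 1.6 cores' worth of work, while the same two threads on separate modules give 2.0, which is why thread placement matters so much on this design.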

Here is where BD gets part of its bad reputation.

CPU_Low.png


Note that core 2 is running at 100%. This is on an Intel with HT. Here is where optimized vs. non-optimized code comes into play. This is also just one of many possibilities; as palladin has been pointing out, there is a lot more than just one thing going on.

BD doesn't shut down half a module in Skyrim. Load up CPU 3 on BD, and even if it's only at 40%, it's still being used. What happened to CPU 2? It's still at 100% usage, but at 80% efficiency.

The end result is that it runs SLOWER than Phenom II CPUs, because they don't suffer that penalty. It will be interesting to see what AMD does to try and solve this issue. What I think could be ingenious is if the CPU itself could detect the load on a module, and if one core is at 100% and the other at only 30%, shut off the second half of that module to boost performance through the busy core and send that 30% to another module. Probably just wishful thinking, but who knows.

What reviews usually don't cover is the "strange" details, mainly because reviewers don't have enough time to play with the product; instead they just crunch numbers and post what they saw. Most people wouldn't even bother asking why.
 
MAN THIS THREAD HAS GOTTEN ENTERTAINING.... :lol:
been away and wow....!
everybody's IP Address in a MMA ring and we can packet loss for the champ...

anyways, where are you in your decision.?
I'm pretty set on BD at this point. I would question myself, again, but It just seems like the right decision to me. Ordering parts today or tomorrow.

I should make a good example for BD haters. If 8 cores can be used, then BD wins, because, like I said before, 2 BD cores are not worse than 1 SB core.
 
So, I have a quick question on compilers. I noticed Folding at Home mentions the core compiler and options.

Compiler: Intel (R) C++ MSVC 1500 mode 1200
Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT

So, is this the Intel Compiler? And second, do all of the options/flags force it to do all optimizations on all processors (ie AMD) that support them?
 
Looks like it.

Also, in case you were wondering:

/TP: Compiles all source or unrecognized file types as C++ source files

/nologo: Do not display compiler version information

/EHa: enable asynchronous C++ exception handling model

/Qdiag-disable: Suppresses messages by number list, where num-list is either a single message or a list of message numbers separated by commas and enclosed in parentheses

/Ox: The compiler enables maximum optimizations by combining the following options:
• /Ob2
• /Og
• /Oy
• /Ot
• /Oi

-arch:SSE: Optimizes for Intel® Streaming SIMD Extensions (Intel SSE) [I assume this was used instead of using -arch:SSE3 so the app could be used by any processor with even basic SSE support?]

/QaxSSE2: Can generate Intel® SSE2 and SSE instructions for processors, and it can optimize for Intel® Pentium® 4 processors, Intel® Pentium® M processors, and Intel® Xeon® processors with Intel® SSE2. [I thought that simply specifying /QaxSSE4.2 would have generated separate SSE code paths for down-level SSE versions, but it appears each one was specified explicitly. There should be separate code paths for each SSE level either way.]

/Qopenmp: Enables the parallelizer to generate multi-threaded code based on the OpenMP* directives.

/Qrestrict: Determines whether pointer disambiguation is enabled with the restrict qualifier.

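/MT: Tells the linker to search for unresolved references in a multithreaded, static run-time library. [On Windows; the "target rule for dependency generation" meaning belongs to the Linux -MT option.]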

The /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 would seem to indicate each level of SSE has its own code path, and should be running fine on AMD [and yes, VIA] processors.
 
hey noob2222
you can try this experiment
load up all cores using prime95 and then disable second core of every module and perform benchmark on remaining core by setting affinity to those cores and note down the result, then close p95 and run that bench again (without changing affinity) and compare this with previous one.
Capisci
you can share them with us too😉
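For anyone who wants to script the affinity part of that experiment, here is a minimal sketch. It assumes logical CPUs are numbered so that the two cores of a module are adjacent (0 and 1, 2 and 3, ...), which is an assumption about the OS enumeration and should be checked on the actual machine. It uses `os.sched_setaffinity`, which only exists on Linux; on Windows you would set affinity through Task Manager or another tool instead. The helper names are made up for illustration.

```python
import os
from typing import List


def first_core_of_each_module(logical_cpus: int, cores_per_module: int = 2) -> List[int]:
    """Return the index of the first core in every module.

    Assumes module siblings are numbered adjacently: (0, 1), (2, 3), ...
    """
    return list(range(0, logical_cpus, cores_per_module))


def pin_to_first_cores(pid: int = 0) -> List[int]:
    """Restrict a process (0 = the current one) to one core per module."""
    cpus = first_core_of_each_module(os.cpu_count() or 1)
    if hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(pid, cpus)
    return cpus


if __name__ == "__main__":
    # On an 8-core (4-module) chip this selects cores 0, 2, 4 and 6.
    print(first_core_of_each_module(8))
```

Run the benchmark once pinned like this with Prime95 loading the sibling cores, then again with Prime95 closed, and compare the two results.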
 
The /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 would seem to indicate each level of SSE has its own code path, and should be running fine on AMD [and yes, VIA] processors.

It will only execute up to SSE2 on non-Intel CPUs. SSE3/4 will both only be used if there is an Intel Vendor ID with a matching Family ID or corresponding feature flag. There is nothing that ~you~ the customer can do to force the dispatcher to run SSE3+ code paths on non-Intel hardware. The only way to do that without switching to GNU GCC / MSC is to patch your code to fool the dispatcher.

Agner has instructions on how to do this and some of his own libraries that allow you to get around Intel's limitation.

http://agner.org/optimize/

So for,

So, I have a quick question on compilers. I noticed Folding at Home mentions the core compiler and options.


Compiler: Intel (R) C++ MSVC 1500 mode 1200
Options: /TP /nologo /EHa /Qdiag-disable:4297,4103,1786,279 /Ox -arch:SSE /QaxSSE2,SSE3,SSSE3,SSE4.1,SSE4.2 /Qopenmp /Qrestrict /MT


So, is this the Intel Compiler? And second, do all of the options/flags force it to do all optimizations on all processors (ie AMD) that support them?

Yes, that's Intel's compiler. And no, those flags will not allow optimizations on AMD's CPUs; only SSE2 code will be used on AMD CPUs, while Intel's will have SSE3/4.

http://agner.org/optimize/blog/read.php?i=49

Near the bottom he tells you what code will be executed on which CPU.

The only way to force the Intel Compiler to allow SSE3+ on non-Intel CPUs is to override its generic code path.

There is an option for setting the generic level higher or lower. For example, the options /arch:SSE3 /QaxSSE4.1,AVX will set the generic level to SSE3 and generate three versions of the code for the SSE3, SSE4.1 and AVX instruction sets. Non-Intel processors can only get the generic version, which will be SSE3 in this example. Code compiled with the /Qx option, for example /QxSSE4.1 will fail to run on non-Intel processors and processors without the specified instruction set.

This has one seriously negative side effect: it sets the absolute lowest level of code generated, and if the code is run on a CPU that doesn't support those instructions then it will fail. Setting SSE3/4 through /arch: will enable optimizations on an AMD / VIA CPU, but only on the newer ones; attempt to run your code on an older CPU and it'll crap out. Basically you're raising your lowest common denominator.
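To make the dispatch behaviour described above concrete, here is a toy model of it. This is not Intel's actual dispatcher code, just a sketch of the rules as stated in this post: non-Intel CPUs always get the generic (/arch:) path, Intel CPUs get the best /Qax path they support, and anything lacking the generic level fails. The function name and the level ordering list are made up for illustration.

```python
from typing import Iterable, Set

# Instruction set levels in ascending order (illustrative ordering).
LEVELS = ["SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX"]


def select_code_path(vendor: str, supported: Set[str],
                     baseline: str, ax_levels: Iterable[str]) -> str:
    """Pick the code path a binary would run, per the rules described in the post.

    baseline  -- the generic level set with /arch:
    ax_levels -- the extra levels requested with /Qax
    """
    if baseline not in supported:
        # Raising the generic level locks out older CPUs entirely.
        raise RuntimeError(f"CPU lacks {baseline}: program fails to run")
    if vendor != "GenuineIntel":
        # Non-Intel CPUs only ever get the generic version.
        return baseline
    usable = [lvl for lvl in ax_levels if lvl in supported]
    return max(usable + [baseline], key=LEVELS.index)


if __name__ == "__main__":
    # Mirrors the /arch:SSE3 /QaxSSE4.1,AVX example from the post.
    all_levels = set(LEVELS)
    print(select_code_path("GenuineIntel", all_levels, "SSE3", ["SSE4.1", "AVX"]))
    print(select_code_path("AuthenticAMD", all_levels, "SSE3", ["SSE4.1", "AVX"]))
```

In this model the AMD chip gets SSE3 even though it supports AVX, and an older chip without SSE3 gets nothing at all, which is the lowest-common-denominator trade-off.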
 
According to the latest documentation (http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/main_cls_lin.pdf):

-arch:

Code generated with the values IA32, SSE, SSE2, or SSE3 should execute on any compatible non-Intel processor with support for the corresponding instruction set.

Which is what I figured; specifying '-arch:SSE' basically ensures that any Pentium III or better processor can run the program.

/Qax:

Tells the compiler to generate multiple, processor-specific auto-dispatch code paths for Intel processors if there is a performance benefit.

So basically, the application is running with SSE 4.2 for Intel processors, and baseline SSE for everyone else.
 
Can I just be honest...I do not know .5 of the stuff you guys say! 🙁

Sucks being ignorant and retarded.

A compiler takes the code written by programmers, in this case c++ code, and generates machine code which a cpu can execute. Compilers are really cool programs btw. A compiler has command line switches that help control how the machine code is produced. They are just discussing how the Intel compiler produces machine code with regards to certain instructions that have been added to the basic X86 instruction set.
 
I understood that part...

I don't understand ANYTHING the guys here talk about...regarding the immense knowledge of engineering and neuroprocessing that occurs on these forums. I feel stupid almost.
 
I'm pretty set on BD at this point. I would question myself, again, but It just seems like the right decision to me. Ordering parts today or tomorrow.

I should make a good example for BD haters. If 8 cores can be used, then BD wins, because, like I said before, 2 BD cores are not worse than 1 SB core.


I think it's about the same: 1 module pretty much equals a 2500K core, while being clocked 300 MHz faster and having turbo. Under HandBrake it's around 10-15% faster than the 2500K while having twice as many cores and a 10-15% higher clock rate. This isn't strictly based on these results, but here are some, and HandBrake does use all 8 cores.

http://www.overclockersclub.com/reviews/amd_fx8150/6.htm

Here the 8150 is equal to a 920, but I often see it at 2500K-2600K levels.
 
Yes, that's Intel's compiler. And no, those flags will not allow optimizations on AMD's CPUs; only SSE2 code will be used on AMD CPUs, while Intel's will have SSE3/4.


Sounds like AMD needs to pursue this further with the FTC then. That shouldn't be happening after the lawsuit settled, unless Intel has some additional time to comply.

 
According to the latest documentation (http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/main_cls_lin.pdf):

-arch:

Code generated with the values IA32, SSE, SSE2, or SSE3 should execute on any compatible non-Intel processor with support for the corresponding instruction set.

Which is what I figured; specifying '-arch:SSE' basically ensures that any Pentium III or better processor can run the program.

/Qax:

Tells the compiler to generate multiple, processor-specific auto-dispatch code paths for Intel processors if there is a performance benefit.

So basically, the application is running with SSE 4.2 for Intel processors, and baseline SSE for everyone else.

Yep, which is exactly what I said. Unless you force it with -arch:SSE3 it will only run SSE2 on any non-Intel CPUs. All those /Qax flags ~ONLY~ apply to Intel CPUs.
 
Sounds like AMD needs to pursue this further with the FTC then. That shouldn't be happening after the lawsuit settled, unless Intel has some additional time to comply.


SSE2 is being run on non-Intel CPUs, so Intel considers its job done. The case was started in 2005 and settled in 2009, and the FTC finalized its investigation afterwards. Intel didn't implement the changes until 2010. We can expect SSE3/SSE4 support for non-Intel CPUs sometime in 2014; by then Intel will have convinced people to switch to AVX. Funny thing about the Intel Compiler: AVX instructions produced by the compiler won't run on non-Intel CPUs (basically only AMD) even if you force it and bypass the dispatcher. Seeing as AMD has implemented the AVX instruction set as specified by Intel, people are calling shenanigans. Smells like the FMA3/4 debacle all over again.
 