Discussion AMD Ryzen MegaThread! FAQ and Resources

jdwii · Mar 9, 2017

Juan, I really do think we are having an issue with SMT or some other BIOS/driver issue. I'm looking forward to seeing a patch, plus Unix/Linux is getting a patch soon too.

I'll actually wait a bit longer, but if things don't change, then yeah: talking PURE performance, Ryzen seems to be around where we thought it would be a long time ago, Sandy/Ivy Bridge level, which, as I said a long time ago, isn't that bad for a new architecture.

For like the 10th time, my issue was never with the product but with the way AMD was dealing with it and their fanbase. The stuff AMD did would never fly if Intel tried to do the same, and once again, I would not agree with Intel doing such things.

juanrga · Mar 9, 2017

Nope 1151 :

Accessing main memory is costly in both latency and power terms. The idea of cache is just to avoid accessing main memory by storing a copy of the data in the cache. If you are accessing the actual ram sticks to get the data, then you are missing the advantages of caching.

As explained above, server designs use a unified last-level cache. AMD's weird choice of a split L3 is not motivated by server needs, because no other server chip does that. It is still too soon to know why AMD chose this weird and unusual design, but it is probably related to their problems with the IMC.

Zen will perform badly in latency-sensitive server applications and will perform OK in throughput-optimized server applications. It is not strange that AMD is giving marketing slides of Naples using seismic applications, and then comparing a 64-core Naples with faster memory (2400MHz) against 44-core Xeons with slow memory (1866MHz).

AMD is again crippling its competitor's performance. The Xeon E5-2699 v4 supports 2400MHz, but AMD is using 1866MHz for the slides. Wow!

[Slide: AMD Naples presentation (embargoed until 3/7/17), page 15]

[Slide: AMD Naples presentation (embargoed until 3/7/17), page 18]
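To put the memory-speed point in numbers: peak DDR4 bandwidth is channels × transfer rate × 8 bytes per transfer. A minimal sketch, assuming four channels per Xeon socket and treating DDR4-1866 as 1866 MT/s (it is really 1866.67); the helper name is just for illustration:

```python
def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Peak theoretical DDR4 bandwidth in GB/s.

    Each 64-bit channel moves 8 bytes per transfer, so peak = channels x MT/s x 8.
    Sustained bandwidth in practice is noticeably lower.
    """
    return channels * mts * 8 / 1000


# Xeon E5-2699 v4: four DDR4 channels per socket.
xeon_rated = peak_bandwidth_gbs(channels=4, mts=2400)  # speed the chip supports
xeon_slide = peak_bandwidth_gbs(channels=4, mts=1866)  # speed used on AMD's slide (1866.67 rounded)

print(f"DDR4-2400: {xeon_rated:.1f} GB/s per socket")
print(f"DDR4-1866: {xeon_slide:.1f} GB/s per socket")
print(f"Peak bandwidth given up: {1 - xeon_slide / xeon_rated:.0%}")
```

Running the Xeon at DDR4-1866 instead of DDR4-2400 drops its peak from 76.8 GB/s to about 59.7 GB/s per socket, roughly 22% less, before any difference in channel count is considered.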

salgado18 · Mar 9, 2017

-Fran- :

Also, they did show the same test using the same configuration. The second slide is to show that Naples can run a faster config than the Xeon.

dgothi · Mar 9, 2017

TechyInAZ :

Yes, I am aware of that. I researched it long enough. I think I might pick Ryzen over Intel. I really like the Ryzen 1700 because it is only 65 watts TDP, but the 1800X is tempting too. I don't know.

Nope 1151 · Mar 9, 2017

Hmmm. Thanks for the explanation Juan.
Theoretically speaking, would disabling copying across CCXs reduce latency, but shrink the usable cache enough that the gain would be negligible (or even detrimental)?

juanrga · Mar 9, 2017

salgado18 :

Wow, thanks. I missed that slide. It is not on the front page of AnandTech's coverage:

http://www.anandtech.com/show/11183/amd-prepares-32-core-naples-cpus-for-1p-and-2p-servers-coming-in-q2

I have now checked Tom's Hardware's coverage of the Naples event, which includes the slides of the three demos: demo1, demo2, and demo3

Tom's Hardware's analysis is fair:

AMD was light on the details of its custom seismic workload, and although we do know that it employs AVX instructions, it is impossible to compare the results to standardized workloads or provide a detailed analysis of the tests. AMD indicated that the workload is a computationally intensive analysis involving iterations of 3D wave equations that stresses the CPU, memory, and I/O subsystem. We also weren't provided with more detailed system specifications or settings, so take the results with a grain of salt.

The first workload consisted of 10 iterations of a 1 billion sample grid. AMD restricted its core count and memory speed to match the Intel system, yet still managed to complete the workload in roughly half the time.

For the second test, AMD conducted the same test but brought all 64 cores to bear and bumped its memory speed up to 2,400MHz while the Intel system remained at 1,866MHz. Once again, AMD's carefully selected workload completed faster on the Naples system, yielding a 2.5X advantage. It's impossible to derive any useful scalability comparisons between the workload completion time of the 44-core Naples configuration and the 64-core native configuration due to a lack of information on the workload.

Finally, AMD provided a demo specifically designed to highlight its memory capacity advantage. The company increased the dataset to 10 iterations of a 4 billion sample grid, which simply couldn't run on the Intel system due to its memory capacity disadvantage.

The Takeaway

AMD's Naples design is impressive, and although the benchmarks are obviously very limited and designed to cast Naples in a favorable light, our initial Ryzen tests indicate the Zen architecture is well-optimized for HPC workloads. It will be interesting to see the actual Naples silicon in action in a wide range of industry-standard workloads.

It seems evident that AMD chose a memory-bound workload. The 44-core Naples is ~2x faster than the 44-core Broadwell because Naples has twice as many DDR4 channels: eight for Naples vs. four for the Xeon.
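Peak bandwidth alone doesn't prove the workload is memory bound, but it shows why a ~2x result at equal core count and equal memory speed fits that reading. A rough sketch, assuming the first demo ran both dual-socket systems at 44 cores and DDR4-1866, with 8 channels per Naples socket and 4 per Xeon socket:

```python
def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Peak theoretical DDR4 bandwidth in GB/s: channels x MT/s x 8 bytes per transfer."""
    return channels * mts * 8 / 1000


# Demo 1: both dual-socket systems limited to 44 cores and DDR4-1866.
naples_2p = peak_bandwidth_gbs(channels=2 * 8, mts=1866)  # 8 channels per Naples socket
xeon_2p = peak_bandwidth_gbs(channels=2 * 4, mts=1866)    # 4 channels per Xeon socket
cores = 44

print(f"Naples 2P: {naples_2p:.1f} GB/s ({naples_2p / cores:.2f} GB/s per core)")
print(f"Xeon 2P:   {xeon_2p:.1f} GB/s ({xeon_2p / cores:.2f} GB/s per core)")
print(f"Peak bandwidth ratio: {naples_2p / xeon_2p:.1f}x")
```

With core count and memory speed held equal, the channel count is the only thing left to differ, and it gives exactly 2x the peak bandwidth per core.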

EDIT: Dan Bounds, senior director of enterprise products at AMD, says: "We wanted to create a scenario where cores matter, but what really matters is the memory, both the bandwidth and the capacity."

Second EDIT: PCPer reached similar conclusions:

AMD claims that its Naples platform offers up to 45% more cores, 122% more memory bandwidth, and 60% more I/O than its competition. For its internal comparison, AMD chose the Intel Xeon E5-2699A V4 which is the processor with highest core count that is intended for dual socket systems (there are E7s with more cores but those are in 4P systems). The Intel Xeon E5-2699A V4 system is a 14nm 22 core (44 thread) processor clocked at 2.4 GHz base to 3.6 GHz turbo with 55MB cache. It supports four channels of DDR4-2400 for a maximum bandwidth of 76.8 GB/s (19.2 GB/s per channel) as well as 40 PCI-E 3.0 lanes. A dual socket system with two of those Xeons features 44 cores, 88 threads, and a theoretical maximum of 1.54 TB of ECC RAM.

AMD's reference platform with two 32 core Naples SoCs and 512 GB DDR4 2400 MHz was purportedly 2.5x faster at the seismic analysis workload than the dual Xeon E5-2699A V4 OEM system with 1866 MHz DDR4. Curiously, when AMD compared a Naples reference platform with 44 cores enabled and running 1866 MHz memory to a similarly configured Intel system the Naples platform was twice as fast. It seems that the increased number of memory channels and memory bandwidth are really helping the Naples platform pull ahead in this workload.

https://www.pcper.com/news/Processors/AMD-Prepares-Zen-Based-Naples-Server-SoC-Q2-Launch
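AMD's headline percentages can be roughly reproduced from the PCPer figures. The sketch below assumes two things the quote doesn't state: that the 2P Naples platform exposes 128 usable PCIe lanes, and that Naples runs eight channels of DDR4-2666 per socket (the demos themselves used 2400 or 1866):

```python
def percent_more(ours: float, theirs: float) -> float:
    """How much larger `ours` is than `theirs`, in percent."""
    return (ours / theirs - 1) * 100


def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Peak theoretical DDR4 bandwidth in GB/s: channels x MT/s x 8 bytes per transfer."""
    return channels * mts * 8 / 1000


# Dual Xeon E5-2699A v4, per the PCPer quote: 22 cores, 40 PCIe 3.0 lanes and
# four DDR4-2400 channels per socket.
xeon = {"cores": 2 * 22, "pcie_lanes": 2 * 40, "bandwidth": peak_bandwidth_gbs(2 * 4, 2400)}

# Dual Naples: 2 x 32 cores per the quote. ASSUMED, not stated in the quote:
# 128 usable PCIe lanes on the 2P platform and eight DDR4-2666 channels per socket.
naples = {"cores": 2 * 32, "pcie_lanes": 128, "bandwidth": peak_bandwidth_gbs(2 * 8, 2666)}

for key in ("cores", "bandwidth", "pcie_lanes"):
    print(f"{key}: {percent_more(naples[key], xeon[key]):.0f}% more")
```

Under those assumptions the three results round to 45%, 122% and 60%. With DDR4-2400 on both sides, the bandwidth figure would instead be 100%, which is what the doubled channel count gives on its own.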

juanrga · Mar 9, 2017

Nope 1151 :

For workloads using fewer than four cores, disabling copying across CCXs would reduce the effective L3 to one-half, because cores within one CCX could only access the cache in their own CCX. If the workload fits inside 8MB, latency will be greatly reduced (from 90-100 ns to 40-50 ns), and performance would improve a lot if the workload is very sensitive to latency. If the workload needs more than 8MB, the CCX would have to access main memory, increasing latency a lot and reducing performance.

Workloads using more than four cores have to run on both CCXs and have to move data.
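The capacity side of that trade-off can be sketched as a simple threshold check. This follows the model described in the post (8 MB of L3 per CCX, 16 MB total on a Ryzen 7); `likely_home` is a hypothetical helper, and it deliberately ignores L1/L2, associativity, prefetching and data shared between threads:

```python
# Ryzen 7 (first-gen Zen): two CCXs, each with its own 8 MB slice of L3 (16 MB total).
L3_PER_CCX_MB = 8
CCX_COUNT = 2


def likely_home(working_set_mb: float, ccx_local_only: bool) -> str:
    """Farthest level a thread's working set would have to reach under the post's model.

    ccx_local_only=True models the hypothetical "no copying across CCXs" setting,
    where a core can only use its own CCX's 8 MB. Only a rough guide.
    """
    if working_set_mb <= L3_PER_CCX_MB:
        return "local L3"
    if not ccx_local_only and working_set_mb <= L3_PER_CCX_MB * CCX_COUNT:
        return "other CCX L3"
    return "main memory"


for size in (4, 12, 20):
    print(size, "MB:", likely_home(size, True), "|", likely_home(size, False))
```

A 4 MB working set stays local either way; a 12 MB one spills to main memory only when cross-CCX use is disabled; past 16 MB both settings end up in main memory.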

Rogue Leader · Mar 9, 2017

Gentlemen, this should go without saying but remember the rules outside here are the same as inside here. NO links to your personal websites or websites you work for are allowed. If you are hosting a photo for illustration that is fine, but any direct linking is forbidden. Thanks for your cooperation.

jaymc · Mar 9, 2017

juanrga :

Here's a really good article on AMD's attempt to re-enter the server market, including a brief history of the company and of IT in general, and an interview with Forrest Norrod, senior vice president at AMD. It gives a breakdown of the percentages and revenue Intel is getting from the server market, and how AMD can hurt their profits:
https://www.nextplatform.com/2017/03/07/naples-opterons-give-amd-second-chance-servers/

eathdemon1 · Mar 9, 2017

dgothi :

If you can, wait until April, when Windows gets Ryzen support.

juanrga · Mar 9, 2017

jaymc :

Nice reading. This second article is dedicated to Naples and the demos:

https://www.nextplatform.com/2017/03/08/amds-naples-x86-server-chip-stacks-intels-xeons/

NickatNight8320 · Mar 9, 2017

If I missed this elsewhere in the thread, I apologize.

Win 10 scheduler issues:
A) It doesn't differentiate between physical and logical processors, assigning the wrong workloads, and
B) "Adding up the amount of L2 and L3 cache Windows 10’s scheduler “thinks” is there totals to an insane 136MB of cache"

Win 7 does not have this cache problem.
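One plausible way to arrive at the 136MB figure is to count each cache once per logical processor that reports it, instead of once per physical cache. A sketch of that arithmetic, assuming a Ryzen 7 with 8 cores, 16 threads, 512 KB of L2 per core and two 8 MB L3 slices:

```python
LOGICAL_CPUS = 16       # Ryzen 7: 8 cores x 2 SMT threads
CORES = 8
CCXS = 2
L2_PER_CORE_MB = 0.5    # 512 KB L2 per core
L3_PER_CCX_MB = 8       # one 8 MB L3 slice per CCX

# Naive sum: every logical CPU reports "its" L2 and L3, and each report is added up.
naive_total = LOGICAL_CPUS * (L2_PER_CORE_MB + L3_PER_CCX_MB)

# Counting each physical cache exactly once.
actual_total = CORES * L2_PER_CORE_MB + CCXS * L3_PER_CCX_MB

print(naive_total, actual_total)  # 136.0 20.0
```

The naive sum lands exactly on 136MB, while the chip physically has 4MB of L2 plus 16MB of L3.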

8350rocks · Mar 9, 2017

Nope 1151 :

Basically... that is what the Windows scheduler update will do.

It will make Windows recognize Ryzen as two separate 4-core/8-thread processors. It should lead to substantial gains.
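What Microsoft's update will actually change isn't spelled out here. As an analogous sketch on Linux, which already exposes cache topology through sysfs, the logical CPUs can be grouped by the L3 they share (one group per CCX on first-gen Zen) and a process pinned to one group. The function names are hypothetical; the sysfs files and `os.sched_setaffinity` are standard Linux/Python:

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as '0-3,8-11' into [0, 1, 2, 3, 8, 9, 10, 11]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_domains(sysfs: Path = Path("/sys/devices/system/cpu")) -> list[list[int]]:
    """Group logical CPUs by the L3 cache they share (one group per CCX on first-gen Zen)."""
    groups: set[tuple[int, ...]] = set()
    for cache_dir in sysfs.glob("cpu[0-9]*/cache/index[0-9]*"):
        level_file = cache_dir / "level"
        if level_file.exists() and level_file.read_text().strip() == "3":
            shared = (cache_dir / "shared_cpu_list").read_text()
            groups.add(tuple(parse_cpu_list(shared)))
    return [list(g) for g in sorted(groups)]


def pin_to_l3_domain(index: int = 0) -> set[int]:
    """Restrict the calling process to one L3 domain (Linux only, via sched_setaffinity)."""
    allowed = os.sched_getaffinity(0)
    cpus = {c for c in l3_domains()[index] if c in allowed}
    if not cpus:
        raise RuntimeError("none of this L3 domain's CPUs are available to the process")
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)
```

On a first-gen Ryzen 7 under Linux, `l3_domains()` should return two groups of eight logical CPUs, and calling `pin_to_l3_domain(0)` before starting worker threads keeps them inside one CCX, so they never have to pull data from the other CCX's L3.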

salgado18 · Mar 9, 2017

8350rocks :

But then 8-thread workloads would need cache info to be duplicated, which would reduce effective L3 by half in many heavy benchmarks.

8350rocks · Mar 9, 2017

http://www.anandtech.com/show/11170/the-amd-zen-and-ryzen-7-review-a-deep-dive-on-1800x-1700x-and-1700

AnandTech revised their review, recommending the 1800X for workstation/productivity.

They noted that their review was pending software optimizations, and that they would update after those optimizations come out.

8350rocks · Mar 9, 2017

salgado18 :

Not really...because each thread will mostly have different resources. Some may require duplication, but the IMC will load the required data into the local cache.

8350rocks · Mar 9, 2017

https://i.redd.it/35lx8xfdxfky.png

Interesting game benchmark results...

With a 1080Ti, Ryzen is mostly ahead of everything across the board (1080p/1440p/4K), even the 7700K.

EDIT: Here is a clock-for-clock test (4.0GHz vs. 4.0GHz) between a Ryzen with four cores disabled and a 7700K.

http://www.zolkorn.com/en/amd-ryzen-7-1800x-vs-intel-core-i7-7700k-mhz-by-mhz-core-by-core-en/view-all/

At the same clocks, there is at most about a 7-8% gap between Kaby Lake and Ryzen. In some cases, Ryzen is better.

vvacenovski · Mar 9, 2017

8350rocks :

I'd be interested to read the Reddit discussion as well. Could you please provide a link?

PS: If I was getting a new PC last year, my CPU of choice would've been the 5820K, for both my gaming and workstation needs. How does the R7 1700 fare against the 5820K, both stock and OC'd? I will look it up for sure; however, I would love to hear a first-hand opinion if anyone has one. Thanks!

jaymc · Mar 9, 2017

salgado18 :

Not necessarily. Each CCX has two logical cores per physical core, so eight threads can be run on one CCX, no problem. Even 10/12 threads or more can be run on one CCX; the scheduler can just queue up threads on the least busy cores and prevent cache copies across CCXs. Apparently the code is already there in Win10, as stated in one of the videos... Windows just needs to see the CPUs as NUMA CPUs, I think it was.

Although I suspect that code will be written specifically for Ryzen. EDIT: and Naples!!
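The idea of filling one CCX before spilling into the other can be shown with a toy placement model. This is not how Windows or Linux actually schedule; `place_threads` is hypothetical, the CPU numbering below is an assumed example, and `max_per_cpu` stands in for how many runnable threads we allow to queue on one logical CPU:

```python
def place_threads(n_threads: int, ccx_cpus: list[list[int]],
                  max_per_cpu: int = 1) -> dict[int, tuple[int, int]]:
    """Toy placement: each thread goes to the least-loaded logical CPU of the
    lowest-numbered CCX that still has room, so a group of threads stays in one
    L3 domain until that CCX is full. Returns {thread: (ccx_id, cpu)}."""
    load = {cpu: 0 for cpus in ccx_cpus for cpu in cpus}
    placement: dict[int, tuple[int, int]] = {}
    for t in range(n_threads):
        for ccx_id, cpus in enumerate(ccx_cpus):
            free = [c for c in cpus if load[c] < max_per_cpu]
            if free:
                cpu = min(free, key=lambda c: (load[c], c))
                load[cpu] += 1
                placement[t] = (ccx_id, cpu)
                break
        else:
            raise RuntimeError("every logical CPU is already at max_per_cpu")
    return placement


# Hypothetical numbering: eight logical CPUs (4 cores x 2 SMT threads) per CCX.
ccxs = [[0, 1, 2, 3, 8, 9, 10, 11], [4, 5, 6, 7, 12, 13, 14, 15]]
print(place_threads(10, ccxs))
```

With one thread per logical CPU, eight threads fit in CCX 0 and a ninth and tenth spill into CCX 1; allowing two queued threads per logical CPU (`max_per_cpu=2`) keeps even 12 threads on one CCX, at the cost of those threads sharing cores.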

jaymc · Mar 9, 2017

juanrga :

Yes, looks like another good article. I saw that one too yesterday... skimmed over it. Gonna give it a good read now.

He's a good author, that guy. By the way, his name is:
Timothy Prickett Morgan
Co-founder and co-editor (of the next platform) Timothy Prickett Morgan brings 25 years of experience as a publisher, IT industry analyst, editor, and journalist for some of the world's most widely-read high-tech and business publications including The Register, BusinessWeek, Midrange Computing, IT Jungle, Unigram, The Four Hundred, ComputerWire, Computer Business Review, Computer System News and IBM Systems User. Most recently, he was the Editor in Chief of EnterpriseTech.

It's a good site, it has to be said: pleasant to read and very informative...

He says AMD calls this Zen approach “die-NUMA”...
So I take it that video was right, and the code for NUMA processors that's already in Win10 will sort out Ryzen and just needs to be applied.
This was specified in one of the videos posted here by someone, can't remember who right now... but anyhow, the info seems to be spot on.
So Microsoft needs to stop dragging their heels and sort it out... come on, April? What are yis like... and the code's already written.

Just release the patch already, will yis please!!

jdwii · Mar 9, 2017

8350rocks :

That's what I like to see. I knew something was wrong. Also, do you think games that can use 8 cores, like Watch Dogs 2, will start to look a lot better with the scheduler update coming to Windows? I ask because technically Ryzen would look like two quad cores instead of a true 8-core; does that matter?

I knew something seemed off from the beginning, as all benchmarks showed Ryzen doing really well even in single-core workloads, even the LAME benchmark.

Also, thanks for showing those benchmarks. I've been trying to find benchmarks like this myself. I'd be really happy if they did a wide range of past CPU architectures and compared them to Ryzen in IPC.

TMTOWTSAC · Mar 9, 2017

vvacenovski :

The benchmarks came from the review here:

http://www.eteknix.com/nvidia-gtx-1080-ti-cpu-showdown-i7-7700k-vs-ryzen-r7-1800x-vs-i7-5820k/

Having a little trouble comparing the results to other sites though, which is probably due to settings. More specifically they state:

"However, in the interest of fairness, any technology which favours either AMD or NVIDIA is disabled. More specifically, this refers to PhysX, Hairworks and more. Additionally, we also disable all forms of AA to gauge performance levels which aren’t impacted by sophisticated AA."

Not entirely sure what HairWorks and "more" encompass, or whether they affect CPU load, but overall their fps numbers for both Ryzen and Intel vary significantly, both up and down, compared to benchmarks from other sites. Here's another review, from ExtremeTech, comparing the 1800X to a 6900K with the 1080 Ti:

https://www.extremetech.com/gaming/245604-review-gtx-1080-ti-first-real-4k-gpu-drives-better-amd-intel

Even for the same game, such as Hitman, Eteknix had results that were dramatically lower (120 fps avg @ 1080p vs 175) but also higher (72 fps avg @ 2160p vs 64) with the 1800X.

And Tom's review of the 1080ti with a 7700k:

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-1080-ti,4972.html

Again, with very different results at the same resolution and quality settings with the 7700K. Overall, you'd expect higher results across the board with AA off. Since that isn't the case, there has to be more going on.

8350rocks · Mar 9, 2017

jdwii :

It should, because the performance hit comes from threads hopping between CCXs, and the cache misses hurt the FPS. It should really sort out pretty much all of the gaming issues. I am sure there will still be some poorly optimized games here and there, but most should see a big uptick in performance.

juanrga · Mar 10, 2017

8350rocks :

As mentioned in other forums, those results don't agree with the rest of the sites using the same or better cards. I copy and paste some posts:

I don't trust those. The 1800x was showing a nearly 20% deficit in Tomb Raider against the 7700k with a Titan or 1080 in nearly all reviews last week.

Except no other review with the 1080 Ti even comes close to showing that; it's the opposite.

I think you should watch Linus Tech Tips' 1080 Ti review. They have a Ryzen test system against an Intel 7700 system, both with a 1080 Ti, and that is not what comes out: the Intel systems scale better.

Not a single other website or review is showing the numbers like this one website I've never heard of.

EDIT: The review is clearly weird. The guy has a 1800X at stock clocks gaming better than the 1800X overclocked at 4.1GHz in some cases

[Chart: Ryzen, Deus Ex: Mankind Divided, 1080p Ultra preset, MSAA off]

juanrga · Mar 10, 2017

salgado18 :

Not only would reducing the cache size to one-half affect performance in workloads requiring more than 8MB, but this cache duplication would need extra coherence cycles to maintain exact copies of the data in each L3 slice. This would require extra communication between the CCXs and introduce an even higher performance penalty than the current situation, where information is only copied when one core accesses the cache in the other CCX.
