bit_user :
genz :
Fair enough. I have never even heard of seismic workloads before. Thanks for the knowledge.
That wasn't me. I didn't cite any examples. You can find them, if you care to look.
That was a thank-you, FYI. Take it how you want. I don't care to look because it's not my field and not worth my time. You seem very competitive.
bit_user :
genz :
I would call false on the AMD only having 64 to 96 lanes of PCI-E, not on it being a multi chip package.
Ah, but that's not what I said. I don't dispute their claims of what's on the outside of their package, but either Ryzen doesn't expose all of Zeppelin's PCIe lanes, or there's a bottleneck inside the package. I was just highlighting that fact. Do the math. It doesn't add up.
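The arithmetic being alluded to, using the publicly stated figures (Naples advertised as 4 Zeppelin dies with 128 PCIe 3.0 lanes total; desktop Ryzen exposing x16 for the GPU, x4 for NVMe, x4 to the chipset) works out like this:

```python
# Public figures: Naples = 4 Zeppelin dies, 128 PCIe 3.0 lanes total.
naples_lanes, dies = 128, 4
per_die = naples_lanes // dies     # 32 lanes per Zeppelin die
ryzen_exposed = 16 + 4 + 4         # x16 GPU + x4 NVMe + x4 chipset

# So each die should have 8 more lanes than Ryzen actually exposes.
print(per_die, ryzen_exposed, per_die - ryzen_exposed)  # 32 24 8
```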
genz :
What am I, an AMD developer in disguise? Cmon.
Obviously not. Some people on here do trade stocks in these companies. That's more what I had in mind.
I would then refer you to my original statement, where I said that increasing memory channels (an identical use case in this test) offers negligible increases in performance. You responded by saying that doesn't apply to massively multicore chips. That's the basis for your claim that Ryzen isn't exposing Zeppelin's lanes, right?
I would also refer you to the original Ryzen documentation, which describes cores within a CCX as having accelerated access to each other but slower access to cores in the other CCX via the L3 cache. I expect Naples to address this, with lower-latency, faster Infinity Fabric and cache as its main upgrade over Ryzen.
https://www.techpowerup.com/231268/amds-ryzen-cache-analyzed-improvements-improveable-ccx-compromises
Looking at the benches above, that's where the main binning sacrifice in Ryzen shows up. L3 access from a CCX is cleanly partitioned: access times jump to RAM levels exactly at the 8MB mark, despite there potentially being plenty more L3 available on the other CCX. The load-balancing logic is almost certainly choosing RAM over the other CCX's L3 to store its own data. Either way, the main lithography differences between Zeppelin/Naples and Ryzen are going to be in the L3 and interconnects, because of the provisioning for so many more cores; so it's only logical that that silicon wasn't ready at Ryzen's release, hence the engineering samples needing faster RAM to compensate.
I would finally refer you to a simple bit of logic on Infinity Fabric, best told from here:
https://www.reddit.com/r/Amd/comments/5zr8lv/i_asked_amd_a_followup_question_about_infinity/df0g8bq/
The Infinity Fabric is essentially a set of tubes connecting everything to everything else: CCX nodes to each other, to PCIe lanes, everything. The better the IMC, the faster the Infinity Fabric can run, if supported by proper RAM. That means a good IMC and good RAM are necessary if you want to use 32 PCIe 3.0 lanes, as each lane takes roughly 0.97 GB/s. Reversing the thinking: the fewer PCIe lanes (and other uncore devices) in use, the less likely the Infinity Fabric is to bottleneck.
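Quick sanity check on that ~0.97 GB/s figure (my own arithmetic from the PCIe 3.0 spec, not from the post): 8 GT/s per lane with 128b/130b encoding.

```python
# PCIe 3.0 per-lane bandwidth: 8 GT/s with 128b/130b line coding.
GT_PER_S = 8.0                 # gigatransfers per second per lane
ENCODING = 128 / 130           # 128b/130b encoding efficiency
BITS_PER_BYTE = 8

lane_gbs = GT_PER_S * ENCODING / BITS_PER_BYTE    # GB/s per lane
print(f"per lane: {lane_gbs:.3f} GB/s")           # ~0.985 GB/s
print(f"32 lanes: {32 * lane_gbs:.1f} GB/s")      # ~31.5 GB/s
```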
Zeppelin is an APU. Greenland supports HBM, so it's going to need a direct connection to the IMC and a conventional link to HT/IF, but it has 4 GMI links, which equate roughly to 7 PCIe 3.0 lanes of performance. It's old info that was improved upon before Ryzen even shipped, and in all likelihood it's the sacrifice we'll see in the future APU rollout, but that's definitely not Naples. In fact, that's probably not even server. That's where I started smelling FUD.
I expect we're at least two generations from bottlenecking the Infinity Fabric in real-world scenarios outside of the current CCX <-> CCX communications. Right now we hardly have GPUs that can fill 4x PCIe 3.0 lanes, and CCX <-> CCX bandwidth issues are mostly artificial, due to errant Windows scheduling.
It is a rather interesting concept.
.....
The issues with SMT reflect this shared bandwidth: the Windows scheduler decides it needs to move one thread from CCX1 to CCX2, and for a fraction of a second it saturates a substantial amount of the bandwidth available to everything else. That's normally not a problem for much besides gaming, but in gaming, where your GPU regularly needs at least 4 GB/s of bandwidth (4x PCIe 3.0 lanes), it'll cause some hiccups. Many tasks will need that much bandwidth, but not in the constant manner that gaming does; GPU mining, for example, will use that much bandwidth for less than a second, then sit and crunch numbers with the occasional update that takes <1 GB/s. The same is true for CPU usage: keep the threads within a CCX and the Infinity Fabric has plenty of room to talk to everything. Regularly churn cache dumps back and forth and you start to get some slowdown.
It's very interesting that it scales with ram clock. Good single rank DDR4 just got a whole lot more important.
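The "keep the threads within the CCX" workaround above is just processor affinity. A minimal Linux sketch, assuming (hypothetically) that logical CPUs 0-3 belong to one CCX; the real mapping varies per chip and OS, so check lscpu or hwloc first:

```python
import os

# Hypothetical mapping: logical CPUs 0-3 = one CCX. The actual
# CCX <-> CPU numbering differs per system; verify before pinning.
CCX0 = {0, 1, 2, 3}

# Only pin to CPUs that actually exist on this machine.
available = os.sched_getaffinity(0)
target = (CCX0 & available) or available

os.sched_setaffinity(0, target)   # pin this process to one CCX
print("pinned to CPUs:", sorted(os.sched_getaffinity(0)))
```

This keeps the process's threads from being bounced across the fabric by the scheduler, at the cost of halving the cores it may use.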
This is also why going from 1866MHz to 3200MHz RAM is a big deal here. Intel's ring bus always runs at max speed regardless of RAM performance; AMD's solution clocks up and down with RAM speed. IMO that will probably give us more overclocking headroom, by letting the IMC clock down on both sides of the chip when gaming demands it (remember the overclocking hit we took when Intel moved the IMC on-die with the first i7; this sidesteps that problem), and Ryzen was geared toward that, with the added benefit that easier binning gets the product out sooner.
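To put numbers on that coupling (my own sketch, based on the widely reported Zen behavior that the data fabric runs at MEMCLK, i.e. half the DDR transfer rate):

```python
# Zen's data fabric clock tracks MEMCLK, which is half the DDR
# transfer rate (DDR = two transfers per clock edge pair).
def fabric_mhz(ddr_rate_mts):
    return ddr_rate_mts / 2

slow, fast = fabric_mhz(1866), fabric_mhz(3200)
print(f"DDR4-1866 -> fabric {slow:.0f} MHz")   # 933 MHz
print(f"DDR4-3200 -> fabric {fast:.0f} MHz")   # 1600 MHz
print(f"fabric clock uplift: {fast / slow - 1:.0%}")  # ~71%
```

So the RAM upgrade isn't just about memory bandwidth; it directly raises the interconnect clock by roughly 70%.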
Literally speaking, that means the update enabling 3200MHz RAM on Ryzen is also a big boost in interconnect speed, and the improvements to make that possible HAVE to happen in microcode. If it were done, it would be on Ryzen now. It will be on Naples, but I guess that's going to have to come after these engineering samples.
I would finally say it's very unlikely that Zeppelin will perform anything like Naples, since Zeppelin is an APU, which, assuming it were pressed into server benchmarks, would be CrossFired in 2-socket servers. While that's a perhaps revolutionary idea for marketing (two extra GPUs close to each CPU in a 6x CrossFire config with load balancing, plus HBM extending the Infinity Fabric across the motherboard to schedule threads closer to or further from the CPU/4 GPUs depending on needs; it would be a monster), the logical leap simply isn't in AMD's reach right now. Zen is all about balance, after all, so I see it coming much later.
bit_user :
genz :
bit_user :
That was a silly thing to do. Nobody thought it was good at 4k, and I'm sure the cut down memory controller made almost no difference in that regard.
It was the difference between 60 FPS GTA5 and 35 FPS GTA5 and that goes for every other game that used that last 512MB.
I was disputing the idea that it was a good 4k card, or that the memory controller crippling would've really changed that. If you care about gaming and framerates, then why buy a 4k monitor and pair it with a 970? Either get a lower-res monitor, a faster card, or accept that your framerates will suffer. To have a hope of decent 4k gaming, you should've had at least a 980 Ti. Otherwise, it tells me you weren't really serious about 4k gaming.
BTW, I did not state that their crippled memory controller never makes a difference. Just that it's not the only thing standing in the way of the 970 being a good 4k card.
genz :
I suppose if I was you this is where I'd tell you you're thinking of server apps and not desktop.
Snarky comments earn downvotes.
Oh, do tell... it certainly isn't the shader performance. A small overclock puts it squarely in GTX 980 territory. The 980 Ti wasn't out when I bought it, and a card had broken. DX12 was on the horizon, with massive performance increases being shouted from the rooftops, and I buy a lot of cards to pass down through systems. Do you hear yourself? I deserve a downvote? You need to calm down, because I'm here to share and be shared with, just like you. Also, look up to see what a crippled memory controller does, then remember that GDDR latency runs in the hundreds of clocks.
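For context on what that crippled controller actually costs, back-of-envelope math from the specs Nvidia disclosed after launch (256-bit bus, 7 Gb/s GDDR5, with the last 512 MB hanging off a single one of the eight memory channels):

```python
# GTX 970 memory layout per Nvidia's post-launch disclosure:
# 256-bit bus, 7 Gb/s/pin GDDR5, split into a fast 3.5 GB segment
# (7 of 8 channels) and a slow 0.5 GB segment (1 of 8 channels).
bus_bits = 256
gbps_per_pin = 7

total = bus_bits * gbps_per_pin / 8   # GB/s across all 8 channels
fast = total * 7 / 8                  # the 3.5 GB segment
slow = total * 1 / 8                  # the last 512 MB

print(f"total {total:.0f} GB/s, fast {fast:.0f} GB/s, slow {slow:.0f} GB/s")
```

Touching that last 512 MB means reading at roughly 1/7th the speed of the main segment, which is consistent with the frame-rate cliff described above.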
bit_user :
genz :
You're trying to get your likes now, so you want to take a condescending tone. I understand that, but look up and you'll realize the FUD is FUD.
I don't even know what the heck you're on about. I'm not here for mind games, trolling, etc. I like tech. I like to read about it, think about it, and exchange insights. I'm not really interested in arguing with someone who has a strong partisan opinion or who cares more about winning arguments than learning stuff. If you want to know why I said that about their PCIe bandwidth, I can explain it. But not if you're going to be so snarky and defensive.
I have substantiated my facts now, using logic and/or citations. At first you expressed fear or uncertainty, then defended your doubt that the bottleneck was inside the chip or in Ryzen, and I've explained it. I also took a moment to explain why the RAM speeds are likely different, and that Zeppelin is an APU and thus needs to dedicate more onboard PCIe resources, and lose die area, to a GPU (I actually assumed you'd realize that), especially in a CrossFire + HBM capable arrangement (and what else is the point of putting APUs in a 2-socket config?). The thing we forget here is that the nature of a web over a ring is that you actually have access to MASSIVE bandwidth, but with linearly increasing latency (as data has to take longer paths around saturated buses), so the maximum bandwidth available scales up as more bandwidth is needed by any one core. Every CCX also has direct access to the IMC, so as clocks increase through the Infinity Fabric (as a byproduct of RAM speed increases), it's logical for latencies to shorten and thus for more routes between cores to become viable over RAM; especially remembering that in real terms (i.e., adjusting for CAS and other increases), RAM latency is on the rise, and that will only get worse with DDR5 unless NVRAM shocks us.
Oh, and your AnandTech link confirms the CCX lane details.