Review AMD Threadripper Pro 3995WX Review: Ripping With 8 Memory Channels

Status
Not open for further replies.

CerianK

Distinguished
Nov 7, 2008
Probably a pointless question, but I assume the 16GB modules are dual-rank... I would be curious how single-rank 16GB modules (which I understand exist, but are the minority in the market) would perform in the 128GB configuration. Probably no difference, but it might be worth exploring with a few select benchmarks, if possible.
 
Jan 17, 2021
Hate to be that guy, but it's not actually the first PCIe 4.0-capable workstation on the market; that honor goes to the Talos II Secure Workstation.
 

fellow

Distinguished
Feb 8, 2012
I love these, especially the 12- to 16-core models at 4GHz, which are much closer to the Ryzen 5900 and 5950 for lightly threaded workloads. Great solution for those wanting expandable server and workstation features.

I like the look of those Raptor systems too, especially the PCIe 4.0 support and memory bandwidth. I may get a Blackbird for testing and for (mostly) open-source fast hardware. See the Phoenix coverage, Part 2; the first part was not as promising.

For Threadripper Pro, has there been any information about the socket and CPU upgrade path?

My main concern is the upcoming release of Zen 3 Threadrippers. I imagine there will then be a Zen 3 Threadripper Pro a couple of quarters or a year from now. The memory and PCIe expansion makes this an excellent platform for future growth.

Since AMD has been forward-looking by keeping the same socket across Ryzen generations, is it safe to expect that a Zen 3 TR Pro will work in this new socket?

Thanks,

fellow
 

Endymio

Reputable
BANNED
Aug 3, 2020
... the most powerful workstation chip on the market - it's 64 cores easily outweigh Intel's
Emergency edit on aisle four, please.

Also, do I misunderstand the article, or has Tom's yet again pronounced a verdict on a product they haven't yet seen, or that hasn't even been released?
 

Deleted member 14196

Guest
Intel has nothing to touch the Threadripper, so there's nothing wrong with that statement.
 

hitchhiker0

Commendable
May 13, 2020
Fantastic! I like them very much.
Pick a Threadripper Pro 3975WX, 128 GB of RAM, an SSD or two and an Nvidia GPU, and build a virtual desktop infrastructure for computer-aided design.
You can host 4-6 virtual desktops quickly.
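As a rough illustration of how that box might be carved up, here is a minimal Python sketch that splits threads and RAM evenly across the desktops. It assumes the 3975WX's 32 cores / 64 threads, a purely even split, and a hypothetical 8 GB held back for the host; GPU sharing is not modeled at all.

```python
def per_desktop_share(threads: int, ram_gb: int, desktops: int,
                      host_reserve_gb: int = 8) -> tuple[float, float]:
    """Evenly split host threads and RAM across `desktops` virtual desktops.

    host_reserve_gb is a hypothetical amount of RAM kept back for the
    hypervisor / host OS; it is an assumption, not a recommendation.
    """
    usable_ram_gb = ram_gb - host_reserve_gb
    return threads / desktops, usable_ram_gb / desktops


# Threadripper Pro 3975WX: 32 cores / 64 threads, with 128 GB RAM as in the post
for n in (4, 5, 6):
    t, r = per_desktop_share(threads=64, ram_gb=128, desktops=n)
    print(f"{n} desktops: {t:.1f} threads and {r:.1f} GB RAM each")
```

In practice a hypervisor can overcommit vCPUs, so an even split is only a conservative starting point.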
 

Stefan Dyulgerov

Honorable
Nov 2, 2014
Hey, in your benchmarks, can you include a compile of the Unreal Engine editor?
The engine is quite taxing on the CPU, both the C++ code and the shaders.
Most people working alone struggle with it. If you work in a studio you can share cores, but at home alone... :)
 

mikewinddale

Distinguished
Dec 22, 2016
Nice review, thanks.

But I just discovered something interesting that you missed in the review:

If you install six (6) DIMMs, applications like AIDA64, CPU-Z, etc. will report it as "hexa" channel, but benchmarks reveal that the actual memory throughput is merely equivalent to dual-channel.

So you can populate four or eight DIMMs, but be careful with six.

For my application, I started with a 3955WX and 4x64 GB of RAM. I discovered that wasn't enough, so I upgraded to 6x64. My application now had enough RAM, but performance declined. So I had to upgrade to 8x64.
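To put numbers on that trade-off, here is a minimal sketch of capacity versus theoretical peak bandwidth for each DIMM count. It assumes DDR4-3200, one 64 GB DIMM per channel, and the standard 64-bit (8-byte) DDR4 channel width. These are theoretical ceilings only; the point of the post above is that the 6-DIMM case measured far below its ceiling.

```python
DIMM_GB = 64  # module size used in the post above


def capacity_gb(dimms: int, dimm_gb: int = DIMM_GB) -> int:
    """Total installed RAM in GB."""
    return dimms * dimm_gb


def peak_bandwidth_gbs(channels: int, mts: int = 3200, bus_bytes: int = 8) -> float:
    """Theoretical peak: transfers per second x 8-byte bus per channel, in GB/s.

    Assumes DDR4-3200 (3200 MT/s) and one DIMM per channel.
    """
    return channels * mts * bus_bytes / 1000


for dimms in (4, 6, 8):
    print(f"{dimms} DIMMs: {capacity_gb(dimms)} GB, "
          f"theoretical peak {peak_bandwidth_gbs(dimms):.1f} GB/s")
```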
 
Oct 21, 2021
@mikewinddale In my testing it's even worse than "stick to 4 or 8 populated channels". Anything less than all 8 channels has a significant impact on performance. The hardware setup for these tests was: 3995WX, ASUS Pro Sage, 256GB of 3200MHz RAM, writing to 4x Samsung 980 Pro in RAID-0. Interestingly, while throughput dropped (meaning the system was technically doing less work), CPU utilization increased when not all 8 memory channels were populated. I do wonder whether the different channel-to-chiplet affinity between your 16-core and my 64-core model is why you don't see as big a hit as I do with only 4 channels populated.


 

Hale_JP

Prominent
Oct 3, 2022
"Cons:Some workloads don't benefit from 128 threads"
What hypocrisy! That's why it's 10-15% faster in single-core than the fastest Xeon on the market. By the way, AVX-512 seems slower when working at half clocks. But (1) it is not 4 times slower, only around 25% at worst, because only the calculation itself takes 2 ticks, without increasing the overhead. And (2) Intel in fact reduces clocks by those same 20-25% FOR THE WHOLE CPU when executing AVX-512. AMD does not do that, leaving the ALUs at top speed.
 

Hale_JP

Prominent
Oct 3, 2022
Quote: "Anything less than all 8 channels has a significant impact on performance."
That is because of interleaved (concatenated) burst transfers to the central memory hub. It seems the concatenation may only work in 2, 4, or 8 words. Adding 2 more to a populated four obviously limits performance to 2-word transfers. It is different with Intel, which has 2 or 4 memory controllers with 2 channels per tile; these can of course work independently, slightly increasing distributed performance with 4 to 6 channels populated. But they will never reach the same performance with 8 channels because of inter-tile UPI (the new QPI) penalties. It simply cannot perform true synchronous 8-channel bursts.
That separates the Xeon w7 and Threadripper CPUs. Xeons are good for servers with a lot of independent and isolated threads. Threadripper PRO is an ideal design for a workstation with multiple workers and one memory domain for the task.
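The interleaving hypothesis above can be made concrete with a toy model. This is a sketch of the poster's speculation only, not AMD's documented memory-controller behavior: it assumes bursts can only combine 2, 4, or 8 channels, and it uses a hypothetical 256-byte round-robin interleave granularity. It reproduces the "6 DIMMs act like dual-channel" observation but does not model chiplet affinity.

```python
def effective_interleave_ways(populated: int, supported: tuple[int, ...] = (8, 4, 2)) -> int:
    """Toy model: the usable interleave width is the largest supported size
    (assumed 8, 4 or 2) that evenly divides the populated channel count,
    falling back to 1 if none does."""
    for ways in supported:
        if populated % ways == 0:
            return ways
    return 1


def channel_for_address(addr: int, ways: int, granularity: int = 256) -> int:
    """Round-robin interleave: consecutive `granularity`-byte blocks map to
    consecutive channels (granularity is a hypothetical value)."""
    return (addr // granularity) % ways


for populated in (4, 6, 8):
    print(f"{populated} channels populated -> {effective_interleave_ways(populated)}-way interleave")
```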
 