Intel Demos 3D XPoint Optane Memory DIMMs, Cascade Lake Xeons Come In 2018

Status
Not open for further replies.

derekullo

Distinguished
The end goal was to have 3D XPoint replace both RAM and hard drives/SSDs.

But as the article mentions, the endurance of 3D XPoint isn't on par with normal DDR4 RAM.

For example, you may read and write a combined 100 gigabytes a day to RAM loading Windows, playing games, and maybe doing some work.

But you may only read 30 gigabytes and write 10 gigabytes from your SSD/hard drive.


Writes are where endurance matters.
 

SockPuppet

Distinguished
Aug 14, 2006
257
2
18,785
0
"The end goal was to have 3d Xpoint as the replacement for both ram and hard drives/ssd."

No, it isn't. You've made that up yourself.

"But as the article mentions, the endurance of 3d Xpoint isn't up to par versus our normal ddr4 ram."

No, it didn't. You've made that up yourself.


"For example, you may read and write a combined 100 gigabytes a day to ram loading windows, playing games and maybe some work stuff."

So what?

"But you may only read 30 gigabytes and write 10 gigabytes from your ssd/hard drive."

Good thing Xpoint isn't an SSD or Hard drive, then.


"Writes are where the endurance factor is applicable."

Again, so what? We've established the "endurance factor" came directly from your anus.
 

urbanj

Honorable
Dec 27, 2012
121
0
10,710
10



Please don't misquote material.
The article made its own assumptions, which you seem to be taking as fact.
 

DerekA_C

Prominent
Mar 1, 2017
177
0
690
1
I don't see how it is going to be more affordable than DDR4. Is anything Intel sells cheap? I mean, really, is there anything Intel gives an actual deal on? Shoot, they still charge over $110 for a dual-core pile of crap with no Hyper-Threading. And as they sell more of any one unit, they run a sale for a day or two that knocks off no more than $15. Whoopy freaking doody. Price locking and strong-arming the market has always been Intel's strategy; they just find more loopholes and ways of doing it until they get caught and sued again.
 

derekullo

Distinguished


At first I was going to quote where you were wrong, but then I realized you have 212 trolling responses and no solutions.

My anus is still more helpful than you.
 

manleysteele

Reputable
Jun 21, 2015
286
0
4,810
14
This is the use case presented by Intel. It is not the only use case that can be imagined, nor is it the best that can be imagined.

I'm not saying it won't be useful for people who need to extend their memory space far beyond what is possible or affordable with RAM. It would obviously be very useful in that problem domain.

What I want is 1-2 TB of bootable storage sitting on the memory bus. I don't know if or when I'll be able to buy that, but unless someone smarter than me (not that steep a hill to climb when it comes to this stuff) comes up with a better use case that I don't even know I want yet, that's what I'm waiting for.
 

genz

Distinguished
Oct 8, 2012
630
1
19,165
69
One thing people reading this article would realize when they look at the DC P4800X, the only XPoint device on the market right now, is that it is a completely viable replacement for RAM.

RAM's maximum bandwidth is largely a function of clock speed, and clock speed is nearly irrelevant in most applications; that's why DDR4-1866 vs. 3200 vs. 4000 shows a negligible difference in most games and apps outside of server markets. Latency matters more.

This is also why a 90 MB/s SSD performs so much better as a system drive than a 110 MB/s HDD: the hard drive takes far longer to begin reading, because the disk has to spin into position.

XPoint has roughly 60x lower latency than a NAND SSD: below 10 µs, which corresponds to 100,000 Hz, or 0.1 MHz. Now factor in the controller. A minimum 375 GB device is an array of 16 GB chips in a RAID-like arrangement, so assuming a standard 1 GHz ARM CPU running the controller, it has to push data across roughly 25 to 32 chips, plus arrange and manage that data, without ever using more than about 100 CPU operations per transfer; and that's before the clock delay of the XPoint chips themselves, the controller, the PCIe interface, and the software.
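A sub-10 µs access time corresponds to 100,000 accesses per second; the conversion is a one-liner (the 10 µs figure is the post's round number, not a measured Intel spec):

```python
# Convert a per-access latency into the equivalent access rate.
def latency_to_rate_hz(latency_s: float) -> float:
    return 1.0 / latency_s

rate = latency_to_rate_hz(10e-6)   # 10 microseconds per access
print(rate / 1e6)                  # ~0.1 MHz
```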

Let's compare this to DDR4. The fastest DDR4-4000 comes in at CAS 10 or 11. At 4000 MHz that's a 0.25 ns cycle times 10, which is 2.5 ns, yet any time you actually measure it with SiSoftware Sandra you get 100 to 300 ns: about 40 to 120 times slower. Why? The memory controller, on the CPU or northbridge, isn't part of the equation; its latency isn't counted in the math for RAM, but it is counted in the math for XPoint, because the "memory" controller there sits on the PCIe card. There are no XPoint memory controllers in CPUs today, so the slowest element is simply ignored on the DDR side of the race.
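One caveat on the arithmetic above: dividing CAS cycles by the transfer rate treats DDR4-4000 as a 4000 MHz clock, but DDR transfers on both clock edges, so the command clock is 2000 MHz and the conventional first-word figure comes out to 5 ns rather than 2.5 ns. A sketch of the standard formula:

```python
# First-word CAS latency in ns from DDR transfer rate (MT/s) and CAS cycles.
# DDR moves data on both clock edges, so the command clock is rate / 2.
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    clock_mhz = transfer_rate_mts / 2       # DDR4-4000 -> 2000 MHz clock
    cycle_ns = 1000.0 / clock_mhz           # one command-clock period
    return cas_cycles * cycle_ns

print(cas_latency_ns(4000, 10))   # 5.0 ns
print(cas_latency_ns(1866, 13))   # ~13.9 ns: slower modules aren't far behind
```

Either way, measured full random-access latency is dominated by everything around the DRAM array, which is the point being made here.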

I would put money down that once the "controller" becomes an actual XPoint slot with an IMC on the board, that latency will drop to DDR4 levels. I would put more money down that at that point, having all that swap-like memory at a latency between your SSD's and your RAM's will make the actual user experience far better without any extra bandwidth. Why? Look at the first 3.5-inch SSDs and how they beat HDDs with almost the same read/write throughput.
 

derekullo

Distinguished


The phrase you are looking for is IOPS.

A typical hard drive does between 60 and 100 IOPS, depending on queue depth.
http://www.storagereview.com/wd_black_4tb_review_wd4001faex

A high-end SSD such as the Samsung 850 Pro manages around 10,000 IOPS at a queue depth of 1, with a read speed of about 500 megabytes per second.
http://www.storagereview.com/samsung_ssd_850_pro_review

So even though the drive's total bandwidth is only about 5 times higher (roughly 100 MB/s versus 500 MB/s), its responsiveness, which is directly related to IOPS, is over 100 times better: 60-100 IOPS versus 10,000 IOPS.

IOPS is essentially the inverse of latency, but for some people "60-100 IOPS versus 10,000 IOPS" is much easier to grasp than "one operation every 10 milliseconds versus one every 100 microseconds."
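At queue depth 1 these are two views of the same quantity: IOPS is just the reciprocal of per-operation latency, so the responsiveness gap versus the bandwidth gap can be computed directly (using the rough figures quoted above):

```python
# At queue depth 1, one operation completes before the next is issued,
# so IOPS and per-operation latency are reciprocals.
def qd1_latency_ms(iops: float) -> float:
    return 1000.0 / iops

hdd_iops, ssd_iops = 100, 10_000
print(qd1_latency_ms(hdd_iops))     # 10.0 ms per operation (hard drive)
print(qd1_latency_ms(ssd_iops))     # 0.1 ms per operation (SATA SSD)
print(ssd_iops / hdd_iops)          # 100x responsiveness gap, vs ~5x bandwidth
```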
 

alextheblue

Distinguished
Apr 3, 2001
3,078
106
20,970
2

Who says the goal of 3D Xpoint is to replace RAM? Supplement, integrate into addressable RAM pool, yep. But replace it? Who says that they're aiming for (snort) "RAM endurance"? They're aiming to bridge the gap between RAM and NAND. Maybe there's a future NV memory that can replace both, but Xpoint is not it, and never was. All the slides clearly show RAM is still faster (but less dense) and therefore not in danger of being replaced by this technology.

Also, random out-the-tailpipe figures aside, endurance only needs to reach "good enough" levels for the workloads targeted, not near-infinite levels. If they achieve even 1/10th of their long-term goal, that's enough for today. Especially since it does not replace RAM, and thus the heavy write workloads and performance sensitive programs should and can still be resident in the DDR of such a system whenever possible. I mean they already use NAND in much the same fashion.
 

SockPuppet

Distinguished
Aug 14, 2006
257
2
18,785
0


Considering you don't know your anus from a hole in the ground, that would be most surprising.

 

genz

Distinguished
Oct 8, 2012
630
1
19,165
69


Almost. IOPS assumes constant latency, so it mainly measures sustained reads and writes. A second is a MASSIVE amount of time in CPU land (3 billion clocks per second in a 3 GHz CPU, versus RAM with 9 clocks of latency). IOPS is useful for storage, but I was trying to address input latency for the purposes of RAM; SSD input latency is fairly irrelevant because it sits so far from the CPU cache level.

Let's say I'm talking about CAS latency: the time from the moment the CPU requests a single bit to the moment the storage device has retrieved that bit and put it on the output pins. XPoint's CAS-equivalent latency is probably lower than even DDR's, but its maximum bandwidth and IOPS are not. Queue depths be damned, because as I mentioned, the latency of XPoint today is measured after a controller is bolted onto DMI or the IMC, doubling or tripling the figure from what it would be with a DDR-style parallel connection to the CPU's IMC.

Your CPU mostly asks for very small pieces of data sporadically. If it asks for 10 KB, waits for it, calculates, then asks for 10 KB more based on the result, the time those two requests take to reach the RAM and come back matters as much as how fast the RAM can deliver data repeatedly, even with the fastest DDR4. IOPS is like who can run the furthest in one second; latency is who moves first when the gun goes off. For most purposes RAM is tested without the impact of the controller, because the controller is in your CPU or northbridge, depending on your model.
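That dependent-request pattern can be modeled crudely: if each request must finish before the next is issued, total time is n × (latency + transfer time), and a low-latency medium beats a higher-bandwidth one. All numbers below are illustrative assumptions, not measurements of any real part:

```python
# Total time for n dependent reads: each pays full latency plus transfer time.
def dependent_access_time_us(n: int, size_kb: float,
                             latency_us: float, bw_gbs: float) -> float:
    transfer_us = size_kb / bw_gbs   # 1 GB/s moves 1 KB per microsecond
    return n * (latency_us + transfer_us)

# 1000 dependent 10 KB reads: low latency wins despite 4x less bandwidth.
fast_latency = dependent_access_time_us(1000, 10, latency_us=0.1, bw_gbs=10)
fast_bandwidth = dependent_access_time_us(1000, 10, latency_us=10.0, bw_gbs=40)
print(fast_latency, fast_bandwidth)   # ~1100 µs vs 10250 µs
```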

This is why GDDR doesn't really work as a PC's main RAM. It's tremendously fast and bidirectional, so it can input and output quickly, but its latency is much higher (for the same money), so all the small requests a modern OS constantly routes through the CPU slow it down as it spends more time waiting. This is also the original reason SMT works well in computers: the CPU waits on RAM so much that a second thread can fill the gaps, but if both threads are waiting, the CPU just sits and twiddles its thumbs.

XPoint seems to have significantly lower latency than everything else. Even if it's several times slower than DDR4 in bandwidth, lower latency will feel much faster to the user, like an early SSD that was slower than the HDDs of its era but gave a faster, better user experience because there was no spin-up time.
Anyone still running Windows on a DDR2 computer knows that RAM bandwidth is mostly a marketing tool. Especially if they're one of the many who bought an SSD.
 