derekullo :
genz :
One thing people reading this article will realize when they look at the DC P4800X, which is the only 3D XPoint device on the market right now, is that it is a completely viable replacement for RAM.
RAM's maximum bandwidth is largely a function of clock speed, and extra clock speed is wasted in most applications, which is why DDR 1866 vs 3200 vs 4000 shows a negligible difference in most games and apps outside of server workloads. Latency matters more.
This is also why a 90MB/s SSD performs so much better than a 110MB/s HDD as a system drive: it takes far longer for data to start coming off the hard drive because the platter has to spin and the head has to seek into position.
XPoint has roughly 60x lower latency than a NAND SSD: below 10µs, which works out to about 100,000 accesses per second, or 0.1MHz. Then factor in the controller. We are talking about a minimum 375GB device that is an array of 16GB dies in a RAID-like arrangement, and if you make a standard assumption of a 1GHz ARM CPU running the controller, it has to pull data from roughly 25 to 32 dies, plus wear-level, arrange, and so on, without ever using more than about 100 CPU operations per transfer, not counting the inherent delays of the XPoint die itself, the controller, the PCI-E interface, and the software.
Let's compare this to DDR4. The fastest DDR4-4000 comes in at around CAS 10 or 11. 4000MT/s means a 2000MHz clock, so one cycle is 0.5ns and CAS 10 works out to about 5ns. But any time you actually measure it with SiSoft Sandra you get 10 to 30ns, roughly two to six times slower. Why? The memory controller (in the CPU or on the northbridge, depending on your platform) isn't part of the equation, because its latency isn't counted in the CAS math for RAM, but it is counted in the XPoint numbers, since the "memory" controller there sits out on the PCI-E card. There are no XPoint memory controllers in CPUs today, so the slowest element is being completely ignored on the DDR side of the race.
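Quick back-of-the-envelope version of that math, using the figures assumed above (not measurements):

```python
# Rough latency arithmetic with the assumed figures from this post.

def cas_latency_ns(data_rate_mt_s: float, cas_cycles: int) -> float:
    """CAS latency in ns: DDR clocks at half the transfer rate, so one cycle = 2000 / data_rate ns."""
    cycle_ns = 2000.0 / data_rate_mt_s      # DDR4-4000 -> 0.5 ns per clock
    return cycle_ns * cas_cycles

def accesses_per_second(latency_s: float) -> float:
    """How many back-to-back accesses fit in one second at a given latency."""
    return 1.0 / latency_s

print(cas_latency_ns(4000, 10))          # ~5.0 ns raw CAS for DDR4-4000 CL10
print(accesses_per_second(10e-6))        # ~100,000/s (0.1 MHz) for a ~10 us XPoint access
print(30 / cas_latency_ns(4000, 10))     # a measured 30 ns is ~6x the raw CAS figure
```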
I would put money down that once the "controller" becomes an actual XPoint slot with an IMC behind it, that latency will come down to DDR4 levels. I would put more money down that at that point, having swap memory whose latency is closer to your RAM's than your SSD's will make your actual user experience far better without any extra bandwidth. Why? Look at the first consumer SSDs and how they kicked HDDs' butts with almost the same read/write throughput.
The phrase you are looking for is IOPs.
A typical hard drive manages between 60 and 100 IOPs, depending on queue depth.
http://www.storagereview.com/wd_black_4tb_review_wd4001faex
A high-end SSD such as the Samsung 850 Pro gets around 10,000 IOPs at a queue depth of 1, with a read speed of about 500 megabytes per second.
http://www.storagereview.com/samsung_ssd_850_pro_review
So even though the drive's total bandwidth is only about 5 times higher, 100 megabytes per second versus around 500 megabytes per second, its responsiveness, which is directly related to IOPs, is over 100 times better: 60-100 IOPs versus 10,000 IOPs.
IOPs is closely related to latency, but it's much easier for some people to understand than saying one operation every 10 milliseconds versus one every 0.1 milliseconds.
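To put those figures side by side, here is a rough conversion from IOPs to average time per operation (ballpark numbers from above, not benchmarks):

```python
# Convert sustained IOPs into average milliseconds per operation (queue depth 1).

def ms_per_op(iops: float) -> float:
    return 1000.0 / iops

for name, iops in [("HDD (low)", 60), ("HDD (high)", 100), ("Samsung 850 Pro", 10_000)]:
    print(f"{name:16s} {iops:>6} IOPs -> {ms_per_op(iops):6.2f} ms per op")

# ~16.7 ms and ~10 ms per operation for the hard drive versus ~0.1 ms for the
# SSD: that's the ~100x responsiveness gap, even though sequential bandwidth
# only differs by ~5x.
```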
Almost. IOPs assumes a steady stream of operations, so it mostly reflects sustained reads and writes. A second is a MASSIVE amount of time in CPU land (3 billion clocks per second in a 3GHz CPU versus 9-clock-latency RAM). IOPs is useful for storage, but I was trying to address access latency for the purposes of RAM, and SSD access latency is fairly irrelevant because it sits so far from the CPU, cache-hierarchy-wise.
Let's say I'm talking about CAS latency: the amount of time from the moment the CPU requests a single bit to the moment the storage device has retrieved that bit and put it on its output pins. XPoint's equivalent of CAS is probably lower than even DDR's, but its maximum bandwidth and IOPs are not. Queue depths be damned, because as I mentioned, the XPoint latency we see today is measured after a controller has been bolted onto PCI-E/DMI, doubling or tripling the figure compared with what it would be with a DDR-style parallel connection to the CPU's IMC.
Your CPU mostly asks for very small pieces of data, sporadically. If it asks for 10KB, waits for it, calculates, then asks for 10KB more based on the result, the time it takes those two requests to reach the RAM and for the RAM to hand the data back matters as much as how fast the RAM can stream data repeatedly, even with the fastest DDR4. IOPs is like who can run the furthest in one second; latency is who moves first when the gun goes off. For most purposes RAM is benchmarked without the impact of the controller, because the controller is either in your CPU or on the northbridge, depending on your model.
This is why GDDR doesn't really work as a PC's main RAM. It's tremendously fast at moving data in and out, but it's far more latent (for the same money), so all those small requests a modern OS constantly makes leave the CPU waiting more of the time. That's also the original reason SMT works well in computers: the CPU spends so much time waiting on RAM that a second thread can fill the gaps, but if both threads are waiting, your CPU has to sit and twiddle its thumbs doing nothing.
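Here is a toy sketch of that "who moves first versus who runs furthest" distinction. The device figures below are made-up illustrative numbers for a low-latency DDR-like part and a high-bandwidth, higher-latency GDDR-like part, not real specs:

```python
# Dependent accesses: each one must finish before the next is issued,
# so total time = n * (latency + size / bandwidth).  Figures are illustrative.

def total_time_s(n: int, request_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    return n * (latency_s + request_bytes / bandwidth_bps)

GB = 1024 ** 3

ddr_like  = dict(latency_s=70e-9,  bandwidth_bps=40 * GB)    # modest bandwidth, low latency
gddr_like = dict(latency_s=250e-9, bandwidth_bps=400 * GB)   # huge bandwidth, higher latency

# One million dependent 64-byte reads (pointer chasing, OS bookkeeping, etc.):
print(total_time_s(1_000_000, 64, **ddr_like))    # ~0.07 s
print(total_time_s(1_000_000, 64, **gddr_like))   # ~0.25 s  -> latency dominates

# One big 1 GB streaming read:
print(total_time_s(1, GB, **ddr_like))            # ~0.025 s
print(total_time_s(1, GB, **gddr_like))           # ~0.0025 s -> bandwidth dominates
```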
XPoint, it seems, has significantly lower latency than everything else. Even if it's several times slower than DDR4 in raw bandwidth, if it has lower latency it will actually feel much faster to the user, like an early SSD that was slower than HDDs of the same era in sequential speed but gave a faster, better user experience because there were no seek or spin-up times.
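Same toy model, applied to that early-SSD analogy with the rough drive numbers from further up the thread (sequential speeds from above, access times assumed):

```python
# Loading a pile of small files back to back: per-file time is
# access latency + size / sequential speed.  Access times are assumed.

def load_time_s(n_files: int, file_bytes: int, access_s: float, mb_per_s: float) -> float:
    return n_files * (access_s + file_bytes / (mb_per_s * 1_000_000))

# 2,000 files of 100 KB each, roughly an OS boot or big application launch:
hdd_s = load_time_s(2000, 100_000, 12e-3, 110)    # ~110 MB/s, ~12 ms average seek (assumed)
ssd_s = load_time_s(2000, 100_000, 0.1e-3, 90)    # ~90 MB/s, ~0.1 ms access (assumed)

print(f"HDD: {hdd_s:.1f} s   SSD: {ssd_s:.1f} s")
# ~26 s versus ~2.4 s: the drive that is "slower" on paper feels an order of
# magnitude faster because access latency dominates small reads.
```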
Anyone with a DDR2 computer still running Windows knows that RAM bandwidth is mostly a marketing tool, especially if they're one of the many who bought an SSD.