Intel and Lenovo team up to advance laptop hardware and software in Shanghai.
Intel and Lenovo Develop Future of PCs in Shanghai : Read more
I wish Intel would allocate much more resources to the development of low-latency, low-power High Bandwidth Memory (HBM)-style Non-Volatile Memory (NVM), namely VG-SOT-MRAM (or VCMA MRAM), of at least 64GB/128GB.
Ideally, they should find a way of doing so that re-uses as many 3D NAND flash manufacturing tools as possible, to lower manufacturing costs.
This would be REALLY disruptive for all computing devices, especially IoT devices, and would finally usher in the era of low-power « Normally-Off Computing ».
Intel is out of the storage business! There's no way they would do a 180, this soon after completely divesting from it.
As for HBM, you don't need that for such power-constrained IoT devices.
Regarding HBM in a mobile device, if the power consumption were low enough (using VCMA MRAM), then I am sure new use cases could emerge (maybe more on-device machine learning training).
Training big models requires not just a lot of fast memory, but also a lot of compute, and that takes power.
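For a sense of scale on the compute side, here is a rough sketch using the common ~6 FLOPs per parameter per training token rule of thumb; the model size, token count, and energy-per-FLOP figures are illustrative placeholders, not claims about any real device:

```python
# Back-of-envelope: energy to fine-tune a small model on-device.
# FLOPs ~= 6 * parameters * training tokens is the usual rough rule for dense
# transformer training. Every concrete number below is an illustrative placeholder.
params = 100e6           # 100M-parameter model (placeholder)
tokens = 50e6            # 50M training tokens (placeholder)
flops = 6 * params * tokens

joules_per_flop = 1e-11  # ~10 pJ/FLOP, optimistic mobile-accelerator ballpark (placeholder)
energy_j = flops * joules_per_flop

battery_wh = 15          # a typical phone battery holds roughly 15 Wh
battery_j = battery_wh * 3600

print(f"~{flops:.1e} FLOPs -> ~{energy_j / 1e3:.0f} kJ, "
      f"about {100 * energy_j / battery_j:.0f}% of a {battery_wh} Wh battery")
```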
I don't know exactly what the new opportunities would be, but I am confident that some new (maybe as-yet unforeseen) ones would emerge with HBM available at scale on mobile devices…
Let's distinguish between the IoT cases where you want persistent memory for power-optimization purposes vs. mobile devices like phones. I can see why a phone would want HBM, since it offers the most bandwidth per Watt of any DRAM technology. Probably the main reason we don't already have it is cost. Maybe Apple will lead the way, here.
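A rough way to see the "bandwidth per Watt" point: memory interface power is roughly energy-per-bit times bit rate, and stacked HBM spends fewer picojoules per bit than off-package DRAM. The pJ/bit values in this sketch are illustrative ballparks, not datasheet numbers:

```python
# Interface power ~ (energy per transferred bit) x (bits per second).
# The pJ/bit figures below are illustrative ballparks only.
PJ = 1e-12
pj_per_bit = {
    "LPDDR5-class": 5.0,   # illustrative
    "HBM2e-class": 3.5,    # illustrative
}

target_bandwidth_Bps = 100e9              # 100 GB/s
target_bits_per_s = target_bandwidth_Bps * 8

for tech, pj in pj_per_bit.items():
    watts = pj * PJ * target_bits_per_s
    print(f"{tech:13s} ~{watts:.1f} W of memory I/O power at 100 GB/s")
```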
What I am wondering is: as of 2023, what is the cost difference between 16GB of LPDDR5X memory versus 16GB of HBM2E or HBM3 memory?
The only source I have on it is this:
"Compared to an eight-channel DDR5 design, the NVIDIA Grace CPU LPDDR5X memory subsystem provides up to 53% more bandwidth at one-eighth the power per gigabyte per second while being similar in cost. An HBM2e memory subsystem would have provided substantial memory bandwidth and good energy efficiency but at more than 3x the cost-per-gigabyte and only one-eighth the maximum capacity available with LPDDR5X."
If you want better than that, you could try doing your own "web research". Let us know if you find any good info.
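To connect that quote back to the 16GB question, here it is reduced to the ratios it actually states; nothing below goes beyond the quoted figures, and carrying the server-class ratio over to phone-class parts is only a guess:

```python
# The quoted NVIDIA Grace comparison, reduced to the ratios it actually states.
# Note that the baselines differ per line, exactly as in the quote; this is
# bookkeeping of the quoted text, not independent data.

# LPDDR5X relative to an 8-channel DDR5 design:
lpddr5x_vs_ddr5 = {
    "bandwidth": 1.53,        # "up to 53% more bandwidth"
    "power_per_GBps": 1 / 8,  # "one-eighth the power per gigabyte per second"
    "cost": 1.0,              # "similar in cost"
}

# HBM2e relative to LPDDR5X:
hbm2e_vs_lpddr5x = {
    "cost_per_GB": 3.0,       # "more than 3x the cost-per-gigabyte"
    "max_capacity": 1 / 8,    # "one-eighth the maximum capacity"
}

print("LPDDR5X vs 8-channel DDR5 (from the quote):", lpddr5x_vs_ddr5)
print(f"16GB HBM2e would be ~{hbm2e_vs_lpddr5x['cost_per_GB']:.0f}x+ the cost of "
      "16GB LPDDR5X, if the server-class ratio carried over")
```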
Yeah, as you said yourself this isn't going to fly in the consumer market, but they are doing what you are saying on datacenter CPUs.
You missed the part about "Non-Volatile Memory (NVM)". @Diogene7 was talking about an HBM-like, die-stacked version of Optane.
They own the Optane IP, they have done Optane, and they have done this.
Not with an HBM-like stack. Their Optane DIMMs aren't even as fast as regular DDR4 DIMMs.
I'm not saying that it will happen, but I'm saying that they have all the parts they would need to do it.
No Optane fabs, though. Most of the people who knew anything about the design and manufacturing of Optane are probably now gone, as well.
Doesn't matter, though. There's no use case for HBM Optane. @Diogene7 was starting with a Frankensteinian mashup of technologies, and then trying to find a problem it could solve.
It could drastically lower the energy consumption of any IT system when idle, especially ones that react when sensors are triggered.
Depends a lot on how often they're triggered and how much work they do, when they are. Mobile devices, like a phone or watch, also spend most of their time asleep and processing asynchronous events. You could study how they optimize battery life. It does involve things like shutting down cores and cache slices, as well as throttling back clocks all the way out to DRAM.
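A tiny duty-cycle sketch of that trade-off, with every number an illustrative placeholder:

```python
# Average power of an event-driven node: keep state alive during sleep vs.
# "normally-off" with state parked in NVM. All numbers are illustrative
# placeholders chosen only to show the shape of the trade-off.
event_interval_s = 60.0      # one sensor event per minute (placeholder)
active_time_s = 0.01         # 10 ms of work per event (placeholder)
p_active_w = 50e-3           # 50 mW while awake (placeholder)

p_sleep_retention_w = 50e-6  # 50 uW to keep DRAM/SRAM state alive (placeholder)
wake_restore_j = 20e-6       # 20 uJ to restore state from NVM on wake (placeholder)

duty = active_time_s / event_interval_s
avg_retention_w = p_active_w * duty + p_sleep_retention_w * (1 - duty)
avg_normally_off_w = p_active_w * duty + wake_restore_j / event_interval_s

print(f"sleep with retention: {avg_retention_w * 1e6:.1f} uW average")
print(f"normally-off (NVM):   {avg_normally_off_w * 1e6:.1f} uW average")
# Shrink event_interval_s toward the active time and the advantage flips,
# which is the "depends on how often they're triggered" point above.
```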
It also happens that VG-SOT-MRAM (from the European research center IMEC) takes a smaller die area than an equivalently sized SRAM cache, so I would think it would first be manufactured to replace part (or all?) of the SRAM cache in CPUs/GPUs/…
Cache is probably the worst thing to use it for, since cache is very latency-sensitive and has a very high turnover. You might find that the read/write power of your MRAM makes it less efficient to use for cache than SRAM that's simply flushed out and powered down when not needed.
For AI inferencing (e.g. a Google TPU), the model parameters need to be loaded into the (HBM) memory, and from there I would think that a Non-Volatile (HBM) Memory would make technical sense: the parameters could be kept in memory continuously (without the power DRAM needs just to retain them), instead of spending energy to shuffle them from storage into memory…
People talk about "computing in memory", meaning they indeed want to mix memory and computational elements to avoid wasteful & expensive data shuffling. This is very relevant to AI. The first problem is how to get the computational energy of AI down low enough that loading weights from PMEM to DRAM even matters. Then, if the inferences are sufficiently infrequent that holding them in DRAM uses too much power, perhaps you'd have a case.
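A sketch of that "sufficiently infrequent" condition, again with placeholder numbers: parking the weights in non-volatile memory only wins once the idle gaps are long enough that DRAM standby energy exceeds the cost of reloading them.

```python
# When does keeping model weights resident in DRAM cost more energy than
# reloading them from non-volatile memory before each burst of inferences?
# Placeholder numbers, for the shape of the argument only.
model_size_gb = 8.0
dram_standby_w_per_gb = 20e-3   # ~20 mW/GB of self-refresh (placeholder)
reload_j_per_gb = 0.5           # energy to stream 1 GB out of NVM (placeholder)

dram_standby_w = dram_standby_w_per_gb * model_size_gb
reload_j = reload_j_per_gb * model_size_gb

break_even_idle_s = reload_j / dram_standby_w
print(f"holding {model_size_gb:.0f} GB of weights in DRAM draws ~{dram_standby_w:.2f} W")
print(f"one reload costs ~{reload_j:.0f} J, so DRAM only loses if the model sits "
      f"idle for more than ~{break_even_idle_s:.0f} s between bursts")
```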
A negative point about Optane is that I think its active power (i.e. read/write) is pretty high, compared to NAND. According to this, an Optane DIMM uses 12 - 18 W:
Report: Optane DIMMs Provide Only Modest Performance Improvements
Intel's Optane DIMMs may not have as much upside as the company is hoping. Performance benefits and gains are modest for now. (www.extremetech.com)
From there, you would shut down the cores and the SRAM cache, but the great advantage is that all the information would stay in the cache: just provide power again and the core is ready to work, with no need to shuffle/retrieve data from memory to bring it back into the cache.
The point of cache is to be low-latency and high-bandwidth. If MRAM can't do that, then it's a nonstarter.
As I said, SRAM cache can be powered down, when utilization is low. When utilization is high, it's actually more energy-efficient than going straight to DRAM.
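The utilization argument in one line of arithmetic, using order-of-magnitude placeholders: each hit served from SRAM instead of DRAM saves some energy, and the cache earns its keep once those savings outrun its leakage.

```python
# Sketch of the "power the cache down vs. keep it hot" trade-off.
# Order-of-magnitude placeholders only.
e_llc_hit_j = 1e-9        # ~1 nJ per last-level-cache hit (placeholder)
e_dram_access_j = 20e-9   # ~20 nJ to fetch the same line from DRAM (placeholder)
p_llc_leakage_w = 0.5     # leakage of the powered-on cache slice (placeholder)

saving_per_hit_j = e_dram_access_j - e_llc_hit_j
break_even_hits_per_s = p_llc_leakage_w / saving_per_hit_j

print(f"the cache pays for its own leakage above ~{break_even_hits_per_s / 1e6:.0f}M hits/s;")
print("below that, flushing it and powering it down is the cheaper option")
```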
It seems that the European research center IMEC has demonstrated that VG-SOT-MRAM is fast enough to be used to replace (at least some part of) the cache (L3).
I wonder whether the limiting factor is latency or bandwidth.
However, I don't recall whether the article mentions the retention time (days? weeks? months? years?).
If it has enough endurance, that's not a problem. It can simply be refreshed like DRAM.
I agree, but the less often you have to refresh it, the less energy it consumes when idle.
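The refresh trade-off as a one-liner of arithmetic (the whole-array refresh energy is a placeholder):

```python
# Idle power of a "refresh it like DRAM" scheme is roughly
# (energy to rewrite the whole array once) / (retention time).
# The refresh energy is a placeholder; the retention times span the ones asked about above.
refresh_energy_j = 0.05   # rewrite the whole array once (placeholder)

for label, retention_s in [("64 ms (DRAM-like)", 0.064),
                           ("1 day", 86_400.0),
                           ("1 month", 30 * 86_400.0)]:
    idle_w = refresh_energy_j / retention_s
    print(f"{label:18s} -> ~{idle_w:.1e} W of idle refresh power")
```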
NXP Semiconductors has already announced plans to use MRAM with a 20-year retention time in an automotive MCU in 2025.
That's good, but a hard requirement is probably something more like 5 years. It's basically whatever maximum time you want the thing to be able to sit, completely powered down with no battery or anything. In the case of automotive, we have to consider individual components sitting in the supply chain and parts warehouses + distribution.
I think the IMEC article states that the VG-SOT-MRAM works fine with a latency as low as ~300 ps (so roughly a 3.3 GHz speed).
I would therefore think it could be fast enough for every cache level in most mobile phone application processors (and surely most IoT devices), which run at around ~3 GHz, but may be suitable for only some cache levels in desktop/server processors that can reach 5 GHz…
Beware of trivial analysis. The devil is usually in the details.
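For reference, the arithmetic behind reading "~300 ps" as "~3.3 GHz", kept deliberately simplistic in the spirit of the caution above:

```python
# What ~300 ps of cell access time means in clock cycles. This ignores the
# surrounding array, tag lookup, and routing delays, which is precisely the
# kind of simplification the caution above is about.
cell_latency_s = 300e-12

for clock_ghz in (1.0, 3.0, 3.3, 5.0):
    cycles = cell_latency_s * clock_ghz * 1e9
    print(f"{clock_ghz:.1f} GHz: {cycles:.2f} cycles just for the cell access")
```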
In industrial environments, some sensors may be triggered only once every 10 or 20 years, or even less often (e.g. the airbags in a car, …).
It's not the time between triggers that matters, but the time between having enough power to complete a refresh cycle. 20 years would probably be long enough for something not to require self-refresh capability, but this would have to apply to the entire non-operating temperature range.
The major issue is cost, and that's why I think it would be better to first target a market that is not too cost-sensitive but is high-revenue and high-growth, in order to generate a boatload of cash flow that could then be invested in improving MRAM/spintronics manufacturing tools and accelerating cost reduction.
As AI datacenter servers for at least 2025 - 2030 seem to fit that description, …
If your solution is a poor fit or uncompetitive with current approaches, it doesn't matter how big the market is. You need to solve the problems before deciding you have a solution that truly fits a problem. To supplant DRAM, for AI applications, you've got to be certain the density, cost, and active power are all roughly the same or better.
The article from the European research center IMEC (which is likely the world's most important research center for developing semiconductor manufacturing tools and processes) seems to indicate that VG-SOT-MRAM has all the requirements to replace at least some cache levels (L3, and maybe further down).
In this case, cache also needs to implement CAM, for tag lookups. That multiplies the amount of reads you have to do per-access, as well as stacking another set of read latencies. So, perhaps what you should be worried about is approximating the raw latency of SRAM.
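To make the "multiplies the amount of reads" point concrete, here is a generic set-associative lookup count; this is textbook cache structure, not a claim about any particular MRAM design:

```python
# An N-way set-associative cache reads N tags per lookup (the CAM-like
# compare) and then either reads the one matching data way (serial, lower
# power) or all N data ways in parallel (lower latency).
def array_reads_per_lookup(ways: int, parallel_data_read: bool) -> int:
    tag_reads = ways
    data_reads = ways if parallel_data_read else 1
    return tag_reads + data_reads

for ways in (8, 16):
    print(f"{ways:2d}-way, serial data read:   {array_reads_per_lookup(ways, False)} array reads")
    print(f"{ways:2d}-way, parallel data read: {array_reads_per_lookup(ways, True)} array reads")
```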
In the end, replacing SRAM is probably out of reach for MRAM. Maybe you'll have better luck with DRAM.