News: TeamGroup Aims to Push DDR5 Kits Above 9,000 MT/s With Signal-Boosting Tech

DavidLejdar

Prominent
Sep 11, 2022
240
140
760
Nice. What the actual latency of it is, though, is a bit of a different matter. E.g., in the case of DDR4, 3600 kits exist which have lower latency than DDR4-4800. And in the case of the DDR5-6000 I have here, that's a lot of transfer capability even with a Gen4 SSD and without DirectStorage v1.1. So at least for me, the final latency question would be more of a selling point for an eventual upgrade.
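For anyone who wants to check, first-word latency is just CAS cycles divided by the memory clock (half the MT/s rate). A quick sketch, with illustrative CL values rather than any specific kit:

```python
# Back-of-envelope CAS latency in nanoseconds: latency_ns = CL / (MT/s / 2) * 1000
# The CL values below are illustrative examples, not specific retail kits.
def cas_latency_ns(cl, mt_per_s):
    return cl / (mt_per_s / 2) * 1e3

for name, cl, rate in [("DDR4-3600 CL16", 16, 3600),
                       ("DDR4-4800 CL22", 22, 4800),
                       ("DDR5-6000 CL30", 30, 6000)]:
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
# DDR4-3600 CL16: 8.89 ns
# DDR4-4800 CL22: 9.17 ns
# DDR5-6000 CL30: 10.00 ns
```

Higher MT/s doesn't automatically mean lower first-word latency; it mostly buys bandwidth.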
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
Now if we can only get IBM's OMI to be accepted and move the Memory Controller to be near or directly next to the DIMM Sockets, then you would have fewer traces between the memory controller and the DIMM slot and a nice, fast Serial Connection between the Memory Controller and the CPU.
 

bit_user

Polypheme
Ambassador
Now if we can only get IBM's OMI to be accepted
I think CXL.mem has basically killed off any chance of that happening.

move the Memory Controller to be near or directly next to the DIMM Sockets, then you would have fewer traces between the memory controller and the DIMM slot and a nice, fast Serial Connection between the Memory Controller and the CPU.
This will probably come at the expense of latency, power, and bandwidth. The main selling point would be scalability, which you especially get with a switch fabric, but that would come at the expense of yet more latency and power.

I think the current approach of integrating memory controllers directly into the CPU is pretty much optimal for client-oriented CPUs.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
I think CXL.mem has basically killed off any chance of that happening.
CXL.mem is a separate thing from OMI; they're related, but not quite covering the same Domain.

This will probably come at the expense of latency, power, and bandwidth. The main selling point would be scalability, which you especially get with a switch fabric, but that would come at the expense of yet more latency and power.

I think the current approach of integrating memory controllers directly into the CPU is pretty much optimal for client-oriented CPUs.
Microchip already designed a controller for OMI that only adds on 4 ns of Latency to the Serial link.
So the Latency part isn't that big of a deal; 4 ns compared to existing DIMM latency of 50-100 ns is manageable.

As for power, it was within the same power envelope as DDR4/5, only a bit better, since you have fewer parallel traces running super long paths and more Serial Traces going from the Memory Controller to the CPU.

Also Bandwidth went UP, not down. You would get more Channels for fewer traces once you routed everything through the serial connection linking the Memory Controller to the CPU. Or you can save on traces from the CPU Package by having the same number of Channels and using the extra contacts for more PCIe lanes or other connections.

That gives your CPU designer more package flexibility.

Also, since the Memory Controller is detached, you can probably pull the same Infinity Cache trick by stacking SRAM on top of the Memory Controller.
It's something AMD has already done on the Graphics side, and they can do it again on the CPU side; it can actually help since it pre-caches certain Reads/Writes and has MUCH lower latency when referencing certain Cache Lines.

And given that TSMC & AMD love shoving SRAM on top, it offers the ability to truly lower the latency bridge from RAM to CPU by taking care of the most important problem.

Once the request is given, if the data is already in the SRAM on top of the Memory Controller, the data gets fed back MUCH faster.

SRAM latency is in the 9-25 ns range, depending on the size of the cache and how far away it is, compared to the 50-100 ns range for RAM.
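To put rough numbers on that, the usual back-of-envelope is average access time = hit rate x SRAM latency + miss rate x DRAM latency. The hit rates and latencies below are assumptions, just to show the shape of the math:

```python
# Average memory access time (AMAT) behind a controller-side SRAM cache.
# Hit rates and latencies are assumptions for illustration only.
def amat(hit_rate, sram_ns, dram_ns):
    return hit_rate * sram_ns + (1 - hit_rate) * dram_ns

sram_ns = 20     # middle of the 9-25 ns range quoted above
dram_ns = 80     # middle of the 50-100 ns range quoted above
for hit_rate in (0.3, 0.5, 0.7):
    print(f"hit rate {hit_rate:.0%}: {amat(hit_rate, sram_ns, dram_ns):.0f} ns average")
# hit rate 30%: 62 ns average
# hit rate 50%: 50 ns average
# hit rate 70%: 38 ns average
```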
 

bit_user

Polypheme
Ambassador
CXL.mem is a separate thing from OMI; they're related, but not quite covering the same Domain.
Did you not hear that development of OpenCAPI has been discontinued? Who is going to adopt a dead-end standard?

Microchip already designed a controller for OMI that only adds on 4 ns of Latency to the Serial link.
Even so, the serial link will also add latency.

As for power, it was within the same power envelope as DDR4/5, only a bit better, since you have fewer parallel traces running super long paths and more Serial Traces going from the Memory Controller to the CPU.
More traces at a lower frequency is actually more efficient. That's part of HBM's secret.

Also Bandwidth went UP, not down.
This is incorrect. Why do you think DDR5 uses a parallel bus?

That gives your CPU designer more package flexibility.
The only reason anyone would ever use OpenCAPI or CXL is for scalability. So, you won't find them in a client CPU, unless it's a high-end client with in-package DRAM and it just uses CXL as a way to add extra capacity.

Also, since the Memory Controller is detached, you can probably pull the same Infinity Cache trick by stacking SRAM on top of the Memory Controller.
You'd want your cache inside the CPU package, for it really to do much good. The overhead of CXL or OMI could be multiple times the latency you normally get with L3 cache.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
Did you not hear that development of OpenCAPI has been discontinued? Who is going to adopt a dead-end standard?
Yes, I have, that means they're being absorbed into CXL and becoming part of the standard and all technology gets moved under the CXL umbrella.
OMI is part of that.

A Statement from the OpenCAPI Consortium Leadership
As announced on August 1, 2022 at the Flash Memory Summit, the OpenCAPI™ Consortium (OCC) and Compute Express Link™ (CXL) Consortium entered an agreement, which if approved and agreed upon by all parties, would transfer the OpenCAPI and Open Memory Interface (OMI) specifications and other OCC assets to the CXL Consortium.


The members of both consortiums agreed that the transfer would take place September 15, 2022. Upon completion of the asset transfer, OCC will finalize operations and dissolve. OCC member companies in good standing will be contacted with details of their specific membership benefits in CXL.


The OCC leadership extends its gratitude to its members and supporters for six years of effort producing the specifications for a cache-coherent interconnect for processors, memory expansion, accelerators and for a serial attached near memory interface providing for low latency and high bandwidth connections to main memory (OMI).


We are excited to witness the industry coming together around one organization to drive open innovation. We expect this will yield the finest business results for the industry and for the members of the consortia.


Bob Szabo, OpenCAPI Consortium President
It's not a "Dead End" Standard when it's part of CXL.



Even so, the serial link will also add latency.
Yes, I know, that extra SerDes process inherently adds latency because it's another step.
But I've already factored that in and I know how long it will take and how that compares to what currently is out there.


More traces at a lower frequency is actually more efficient. That's part of HBM's secret.
It also makes HBM absurdly expensive; all those extra traces make HBM "SUPER EXPENSIVE" to implement.
Fewer Traces = Cheaper for the masses.
And DIMMs have only gone up in traces over time, not down. I foresee future DDR specs and DIMMs going up in the # of traces, not down.


This is incorrect. Why do you think DDR5 uses a parallel bus?
Because it's part of Memory Standards Legacy from the early days of RAM & DIMM.
Have you noticed that DDR1-5 have used the "Exact Same" physical width for the PCB and a similar type of interface?
But the # of Contact Pins has only gone up.

Everything else in computing has gone "Serial over Parallel" over time:
  • We went from Parallel Ports & various other proprietary Parallel connections -> USB & ThunderBolt.
  • We went from ISA & PCI -> PCIe.
  • We went from IDE & SCSI -> SATA & SAS
  • Nearly every aspect of modern PC / Computing has kicked the Parallel Connections to the curb and gone with a "Serial Connection" to replace it.


The only reason anyone would ever use OpenCAPI or CXL is for scalability. So, you won't find them in a client CPU, unless it's a high-end client with in-package DRAM and it just uses CXL as a way to add extra capacity.
Or DDR6 and its updated DIMM standard call for even more Contact PINs, which exacerbates the issue.



You'd want your cache inside the CPU package, for it really to do much good. The overhead of CXL or OMI could be multiple times the latency you normally get with L3 cache.
L3$ isn't going anywhere. L3$ sits before the Memory Controller Step.
The SRAM Cache layer I want to add is on top of the Memory Controller, similar to how AMD has done it with RDNA3. That will also help alleviate Latency & Bandwidth.

They already tested the OMI Latency.
https://www.electronicdesign.com/te...icle/21808373/attacking-the-memory-bottleneck
Using the OMI approach brings advantages with it, such as higher bandwidth and lower pin counts. Normally, load/store operations are queued by the memory controller within the processor. In this case, the memory controller is integrated within the SMC 1000 8x25G. Microchip’s product has innovated in the area of device latency, so that the difference in latency between the older parallel DDR interface and this newer OMI serial interface is under 4 ns when compared to LRDIMM latency
That "< 4 ns Latency Penalty" is well worth increasing the bandwidth by multiple folds when you plan on keeping the same # of Contacts on the platform.

IBM engineers wouldn't waste their time if they weren't trying to solve a real problem, and this is the next major step.
Serializing Main Memory!
 
Last edited:

bit_user

Polypheme
Ambassador
Yes, I have, that means they're being absorbed into CXL and becoming part of the standard and all technology gets moved under the CXL umbrella.
OMI is part of that.


It's not a "Dead End" Standard when it's part of CXL.
The Anandtech article explains that further development on OpenCAPI & OMI was ended. Administration of the existing standard is being managed by the CXL consortium and they inherit all of the IP.

So, yes it's finished. There will be no new versions of OMI. And it's not part of CXL in any sense other than CXL is free to use their IP. It's not as if there will be any CXL.mem devices that support OMI, or anything like that. OpenCAPI saw the writing on the wall and realized that CXL had all of the industry buy-in. So, as the saying goes: "if you can't beat 'em, join 'em."

It also makes HBM absurdly expensive; all those extra traces make HBM "SUPER EXPENSIVE" to implement.
I dunno. Somehow Radeon VII had 16 GB of it for $700, back in 2019. If it's in-package, then the chiplets get a lot cheaper and easier to connect.

Anyway, that wasn't my point. Rather, my point was that HBM2 runs at frequencies around 1 GHz. 1024-bits per stack. And yet, when you compare 4096-bit (4 stacks) against GDDR6 @ 384-bit, it's a net power savings for HBM! Granted, some of the power-savings is from being in-package, but a lot of it's due to simply running at a lower frequency.
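Rough numbers, using nominal per-pin rates rather than any specific card:

```python
# Wide-and-slow vs narrow-and-fast: similar ballpark bandwidth, very different per-pin rates.
# Figures are typical/nominal (HBM2 ~2 Gbit/s per pin, GDDR6 ~14 Gbit/s per pin).
def bandwidth_gbs(bus_width_bits, gbit_per_pin):
    return bus_width_bits * gbit_per_pin / 8

hbm2  = bandwidth_gbs(4096, 2)    # 4 stacks x 1024 bits, ~1 GHz double data rate
gddr6 = bandwidth_gbs(384, 14)    # 384-bit bus at 14 Gbit/s per pin
print(f"HBM2  (4096-bit @  2 Gb/s/pin): {hbm2:.0f} GB/s")
print(f"GDDR6 ( 384-bit @ 14 Gb/s/pin): {gddr6:.0f} GB/s")
# HBM2: 1024 GB/s vs GDDR6: 672 GB/s -- HBM gets there at 1/7th the per-pin signaling rate
```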

Fewer Traces = Cheaper for the masses.
You don't think JEDEC considered costs, when they decide all the DIMM standards? They know about serial link technology - it's been around for decades. They also know its downsides and decided the best approach was to continue with parallel.

Because it's part of Memory Standards Legacy from the early days of RAM & DIMM.
DDR5/LPDDR5 was a fresh chance to rethink these assumptions. They changed quite a bit, yet decided to stick with parallel.

Everything else in computing has gone "Serial over Parallel" over time:
  • We went from Parallel Ports & various other proprietary Parallel connections -> USB & ThunderBolt.
  • We went from ISA & PCI -> PCIe.
  • We went from IDE & SCSI -> SATA & SAS
  • Nearly every aspect of modern PC / Computing has kicked the Parallel Connections to the curb and gone with a "Serial Connection" to replace it.
Again, you're not hearing the answer. It's scalability. Serial is easier and cheaper to switch. So, when you don't need the bandwidth and don't mind taking a hit on latency or power, then yes - go serial. As for the cabled standards, they went serial for further reasons that don't apply to memory modules.

L3$ isn't going anywhere. L3$ sits before the Memory Controller Step.

The SRAM Cache layer I want to add is on top of the Memory Controller, similar to how AMD has done it with RDNA3. That will also help alleviate Latency & Bandwidth.
You're missing the point, which is that L3 needs to be as close to the CPU as possible. AMD's decision to put it on a separate chiplet was worse, and only done for cost reasons. What you're proposing is way worse, by putting a big serial link in between! The latency between the CPU core and L3 should be as low as possible. Do the math.

They already tested the OMI Latency.
https://www.electronicdesign.com/te...icle/21808373/attacking-the-memory-bottleneck

That "< 4 ns Latency Penalty" is well worth increasing the bandwidth by multiple folds when you plan on keeping the same # of Contacts on the platform.
Ha! They cheated by comparing it with LRDIMMs. Client machines don't use LRDIMMs for good reasons - it adds cost and latency, and is only really needed for the sake of scalability.

But, as it regards your L3 cache idea, the problem is they're comparing end-to-end latency between LRDIMMs (and probably not a good one) vs. their solution. They claim the round-trip got longer, but they're not saying it's only 4 ns from the CPU package to their controller. They're saying they added 4 ns to that part. So, probably it was already like 6 ns (if we're being generous) and now it's 10. That's doubling the L3 latency, which is a non-starter.
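Rough numbers to make that concrete (the 6 ns baseline hop and the 80 ns DRAM round trip are assumptions, not measurements):

```python
# "< 4 ns added" is small relative to a full DRAM round trip, but large relative to
# a short on-package hop. All figures here are assumptions for illustration.
package_to_controller_ns = 6      # assumed baseline hop, per the post above
omi_added_ns             = 4      # Microchip's stated SerDes penalty
dram_round_trip_ns       = 80     # assumed typical end-to-end DRAM access

print(f"vs. DRAM round trip: +{omi_added_ns / dram_round_trip_ns:.0%}")
print(f"vs. the hop itself : +{omi_added_ns / package_to_controller_ns:.0%}")
# vs. DRAM round trip: +5%
# vs. the hop itself : +67%
```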

And nowhere in that article did they talk about power (other than the ISA), which is the elephant in the room. That article is basically a marketing piece for Microchip's controller IC.

IBM engineers wouldn't waste their time if they weren't trying to solve a real problem, and this is the next major step.
The problem they're trying to solve is scalability. They only make servers, and scalability is a huge issue there. If this approach were generally applicable, then JEDEC would've adopted it long ago. It's not as if nobody thought about this till now.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
The Anandtech article explains that further development on OpenCAPI & OMI was ended. Administration of the existing standard is being managed by the CXL consortium and they inherit all of the IP.

So, yes it's finished. There will be no new versions of OMI. And it's not part of CXL in any sense other than CXL is free to use their IP. It's not as if there will be any CXL.mem devices that support OMI, or anything like that. OpenCAPI saw the writing on the wall and realized that CXL had all of the industry buy-in. So, as the saying goes: "if you can't beat 'em, join 'em."
CXL.mem covers a COMPLETELY different problem/solution.

OMI isn't dealing with that problem at all. OMI is dealing with the scaling of extra DIMMs in PC/Server architecture.


I dunno. Somehow Radeon VII had 16 GB of it for $700, back in 2019. If it's in-package, then the chiplets get a lot cheaper and easier to connect.
And there's a DAMN good reason why HBM has been relegated to Enterprise products: it's too damn expensive, not that it doesn't perform.

Anyway, that wasn't my point. Rather, my point was that HBM2 runs at frequencies around 1 GHz. 1024-bits per stack. And yet, when you compare 4096-bit (4 stacks) against GDDR6 @ 384-bit, it's a net power savings for HBM! Granted, some of the power-savings is from being in-package, but a lot of it's due to simply running at a lower frequency.
Yes, it's a net power savings, at the cost of $$$ & a fixed, non-modular implementation.
A lot of it is due to running traces that are SUPER short, compared to how far the normal circuitry for DIMMs has to traverse to reach the Memory Controller.


You don't think JEDEC considered costs, when they decide all the DIMM standards? They know about serial link technology - it's been around for decades. They also know its downsides and decided the best approach was to continue with parallel.
It was decided a LONG time ago that DIMMs were to be dumb boards that hosted the RAM Packages.
That the Memory Controller wouldn't be on the DIMM.
At that time, the parallel connection was the standard due to simplicity and what was around.


DDR5/LPDDR5 was a fresh chance to rethink these assumptions. They changed quite a bit, yet decided to stick with parallel.
Because of Sunk Cost reasons, it's easier to use what already exists and ramp it up instead of making major changes like going Serial.


Again, you're not hearing the answer. It's scalability. Serial is easier and cheaper to switch. So, when you don't need the bandwidth and don't mind taking a hit on latency or power, then yes - go serial. As for the cabled standards, they went serial for further reasons that don't apply to memory modules.
And IBM's researchers are running into walls for scaling DIMM slots in Server / PC land.
288 contacts per DDR4/5 DIMM channel. That's only going to grow in the future.

You're missing the point, which is that L3 needs to be as close to the CPU as possible. AMD's decision to put it on a separate chiplet was worse, and only done for cost reasons. What you're proposing is way worse, by putting a big serial link in between! The latency between the CPU core and L3 should be as low as possible. Do the math.

You're missing MY POINT. L3$ isn't moving anywhere on the CCX/CCD, it's staying where it is.
The Memory Controller in the cIOD is getting moved out and replaced with an OMI interface, and L4$ or SRAM is getting shoved on top of the memory controller.

Ha! They cheated by comparing it with LRDIMMs. Client machines don't use LRDIMMs for good reasons - it adds cost and latency, and is only really needed for the sake of scalability.
OMI was designed for Servers first, LRDIMMS are quite common in server infrastructure.

But, as it regards your L3 cache idea, the problem is they're comparing end-to-end latency between LRDIMMs (and probably not a good one) vs. their solution. They claim the round-trip got longer, but they're not saying it's only 4 ns from the CPU package to their controller. They're saying they added 4 ns to that part. So, probably it was already like 6 ns (if we're being generous) and now it's 10. That's doubling the L3 latency, which is a non-starter.
Again, L3$ isn't being moved or touched in any way/shape/form. You misunderstand what I'm trying to do.
The Memory Controller is getting moved out of cIOD and being MUCH closer to the DIMMs. A layer of SRAM is getting slapped on top of the Memory Controller to function as L4$.

And nowhere in that article did they talk about power (other than the ISA), which is the elephant in the room. That article is basically a marketing piece for Microchip's controller IC.
Do you think Microchip are liars?

The problem they're trying to solve is scalability. They only make servers, and scalability is a huge issue there. If this approach were generally applicable, then JEDEC would've adopted it long ago. It's not as if nobody thought about this till now.
That's kind of my point: scalability with DIMMs. Especially if my prediction about DDR6 DIMMs comes true, we're going to run into scaling issues.

And OMI has been submitted to JEDEC, it's only a matter of time before Server side sees the issue and decides on what to do.
Should they continue business as usual, or how are you going to reliably scale out Memory?

It would be easier to support more DIMMs & Higher speeds if the Memory Controller was PHYSICALLY closer to the DIMMs than farther away.

A serial link lowers the number of traces needed to feed data back to the CPU or adds more bandwidth for the same amount of traces.
Take your pick.
 

bit_user

Polypheme
Ambassador
CXL.mem covers a COMPLETELY different problem/solution.
Perhaps a superset of OMI. You neglected to state the non-overlapping set of use cases.

And there's a DAMN good reason why HBM has been relegated to Enterprise products: it's too damn expensive,
Sure, it's more expensive, but that's for a variety of reasons. Anyway, my point was about power - and that was the only reason I brought it up - because it's the most extreme example and really drives home that point!

It was decided a LONG time ago that DIMMs were to be dumb boards that hosted the RAM Packages.
Rambus came along and challenged that, but the industry said "no, thank you" and just took their DDR signalling ideas (which Rambus has been milking the patent royalties from, ever since).

Because of Sunk Cost reasons, it's easier to use what already exists and ramp it up instead of making major changes like going Serial.
The thing you keep refusing to see is that every new DDR standard represented another opportunity to revisit these decisions and they kept going with parallel for very good reasons. Serial has benefits and drawbacks. If you don't need its benefits, then it's foolish to take on its drawbacks.

I don't understand where you seem to get the idea that you're smarter than everyone else.

You're missing MY POINT. L3$ isn't moving anywhere on the CCX/CCD, it's staying where it is.
The Memory Controller in the cIOD is getting moved out and replaced with an OMI interface, and L4$ or SRAM is getting shoved on top of the memory controller.
Okay, so you're keeping L3 inside the CPU package and putting L4 in the DIMMs? That sounds expensive and would have a lower hit-rate than having unified/centralized L4.

OMI was designed for Servers first, LRDIMMS are quite common in server infrastructure.
That's my point. You're trying to take a server technology and blindly apply it to client machines. It didn't happen with LRDIMMs and even CXL.mem won't be a direct substitute for existing DDR DIMMs!

Do you think Microchip are liars?
That's not what I said. I said it was like a marketing piece, because it highlighted all of the selling points and none of the drawbacks. Power being chief among them. Also, they didn't specify the timing of the LRDIMM they were comparing against, which is a bit shady.

Furthermore, they're comparing against DDR4, which made sense given when it was written. However, we should re-evaluate their speed comparisons against DDR5.

it's only a matter of time before Server side sees the issue and decides on what to do.
Should they continue business as usual, or how are you going to reliably scale out Memory?
They've seen the issue and the industry has coalesced around CXL.mem.

It would be easier to support more DIMMs & Higher speeds if the Memory Controller was PHYSICALLY closer to the DIMMs than farther away.
PCIe has shown that you hit a wall in frequency scaling, hence the need for PAM4. That makes serial interface more complex and expensive. So, even if we look beyond the power issue, it's not as if serial doesn't introduce problems of its own. It's not a magic-bullet solution.

FWIW, PCIe is addressing these issues and CXL is piggybacking off their work. So, that's the natural solution for the industry to take.

A serial link lowers the number of traces needed to feed data back to the CPU or adds more bandwidth for the same amount of traces.
Take your pick.
You act like you're the first person to see that. You should understand that the industry has been dealing with interconnecting devices for a long time. Not just CPUs and DRAM, but PCIe, multi-CPU fabrics, and even GPU fabrics.

IMO, 12-channel (though actually 24), 768-bit in a 6096-pin socket is getting pretty insane! Do you really think nobody at AMD saw the trend lines of increasing channel counts and package sizes and considered whether it made sense to switch to a serial standard like OMI? Do you think they never heard of OMI? The issue they're facing is a power-efficient way to add not just capacity but also bandwidth. Once they move to having some in-package DRAM and use memory-tiering, their external DRAM bandwidth needs will lessen and then CXL.mem starts to look more appealing. Plus, the cache-coherency it offers with accelerators and other CPUs is attractive as well.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
Perhaps a superset of OMI. You neglected to state the non-overlapping set of use cases.
CXL.mem's problem & solution doesn't even cover the same thing as what OMI is trying to solve.
They're tackling COMPLETELY different problems and have no relation to each other at this point in time.
They both sit under the CXL consortium at this point and talk about memory.
But largely, they don't interact AT ALL with each other.

Sure, it's more expensive, but that's for a variety of reasons. Anyway, my point was about power - and that was the only reason I brought it up - because it's the most extreme example and really drives home that point!
And that's factored into my idea.

Rambus came along and challenged that, but the industry said "no, thank you" and just took their DDR signalling ideas (which Rambus has been milking the patent royalties from, ever since).
Rambus was also greedy and wanted a royalty with their tech, something JEDEC wasn't going to do.
That's why we haven't seen ODR (Octal Data Rates) or Micro Threading, because Rambus' patents on those technologies haven't expired yet.

The thing you keep refusing to see is that every new DDR standard represented another opportunity to revisit these decisions and they kept going with parallel for very good reasons. Serial has benefits and drawbacks. If you don't need its benefits, then it's foolish to take on its drawbacks.
Again, you think that I'm going to copy OMI 1-to-1?
No, I said move the Memory Controller closer to the DIMM, not ONTO the DIMM.
That's the No-Go / No-Sale point for most vendors: they don't want to add an ASIC or specialty chip on their DIMM or change the Parallel interface.
That has been decided by the industry for "Quite a While" now.
Even I saw that writing on the wall.
Doesn't mean I don't want to move the Memory Controller to be "Physically close" to the DIMM slots.
One of OMI's lesser-talked-about implementations is keeping the Memory Controller on the MoBo, but PHYSICALLY very close to the DIMM slots.
It could be right underneath it, on the opposite side of the MoBo, or right next to the slots; the location choices are limited only by the engineer's imagination on where to place it.
As long as it's not on the DIMM itself and remains on the MoBo.
This way, you're only paying for the Memory Controller once, when you buy the MoBo.

The DIMM form factor largely remains untouched.

I don't understand where you seem to get the idea that you're smarter than everyone else.
Or maybe I see what IBM sees: with every iteration of new DIMM form factor, the Pin-Count gets higher each generation when you do a major revision. That's not sustainable long term. We can't just keep upping the # of contacts on our CPU Socket just to account for new, larger DIMMs.

What if DDR6 has 544-Pins per DIMM Channel?
What then, what will you do when Dual Channel takes 1088 Pins just for RAM/Main System Memory?

The whole Point of OMI is to make things manageable, and to reduce pin count to the CPU via Serial connection.

The Memory Controller doesn't have to be located on the DIMM; IBM just wants it that way because it's their "Preferred Way" to do things.
It's easiest on IBM and MicroChip gets to sell more Memory Controllers.

Doesn't mean it's what is best for the industry. Their alternate implementation where the Memory Controller sits on the MoBo is the better solution IMO.
It's the cheaper solution, the one that's more manageable in the long term, and it makes it easier to power and cool the Memory Controller properly.

Okay, so you're keeping L3 inside the CPU package and putting L4 in the DIMMs? That sounds expensive and would have a lower hit-rate than having unified/centralized L4.
Again, that's not what I'm saying.
The Memory controller isn't on the DIMM like in the main OMI implementation.
I'm using their other implementation where the Memory Controller sits on the MoBo.
The L4$ will sit on top of the Memory Controller, similar to how RDNA3 has a Memory Controller with SRAM on top that they market as "Infinity Cache".
It's realistically going to be L4$ since it'll save you many cycles by caching the data straight onto SRAM locally.

That's my point. You're trying to take a server technology and blindly apply it to client machines. It didn't happen with LRDIMMs and even CXL.mem won't be a direct substitute for existing DDR DIMMs!
Again, you don't seem to understand what CXL.mem does, and you imply that it even covers the same domain.
They aren't covering the same domain at all.

IBM is using OMI in their POWER servers, and we should pay attention to what IBM does; they're at the "Forefront" of technological innovation.


That's not what I said. I said it was like a marketing piece, because it highlighted all of the selling points and none of the drawbacks. Power being chief among them. Also, they didn't specify the timing of the LRDIMM they were comparing against, which is a bit shady.
I know there's a power cost, but the amount you spend is worth it.


Furthermore, they're comparing against DDR4, which made sense given when it was written. However, we should re-evaluate their speed comparisons against DDR5.
Your Speeds are relative to how fast your Memory Controller can run the DIMMs, and it only gets easier if they are "Physically Closer".


They've seen the issue and the industry has coalesced around CXL.mem.
Again, CXL.mem is a separate solution to a different problem.
Go read up on what they're trying to do and understand what problem they're solving vs what OMI is solving.

PCIe has shown that you hit a wall in frequency scaling, hence the need for PAM4. That makes serial interface more complex and expensive. So, even if we look beyond the power issue, it's not as if serial doesn't introduce problems of its own. It's not a magic-bullet solution.
But it's a solution to the ever-increasing # of parallel connections, and that's a real problem.

Adding in more Memory Channels gets harder every generation, especially with the MASSIVE pin-counts per DIMM Channel.

FWIW, PCIe is addressing these issues and CXL is piggybacking off their work. So, that's the natural solution for the industry to take.
And OMI, being now part of CXL, is a solution that we can copy and do better within both Intel & AMD.
Both can move the Memory Controller off of the Die and use a Serial Connection between the CPU Die and the Memory Controller.

The Memory Controller doesn't have to be on the DIMM.
The DIMM can remain the cheapo dumb board that it is.
The way everybody loves it.

You act like you're the first person to see that. You should understand that the industry has been dealing with interconnecting devices for a long time. Not just CPUs and DRAM, but PCIe, multi-CPU fabrics, and even GPU fabrics.
And everybody within the industry has eventually gone Serial, after being Parallel for so long.
Now it's the venerable DIMM slot / Memory Channel's turn.
Move the frigging memory controller off the CPU die and into its own dedicated die, sitting directly next to the DIMM slot.

Not on the DIMM, but next to the DIMM slot.
That's how you maintain a reasonable cost.

IMO, 12-channel (though actually 24), 768-bit in a 6096-pin socket is getting pretty insane! Do you really think nobody at AMD saw the trend lines of increasing channel counts and package sizes and considered whether it made sense to switch to a serial standard like OMI? Do you think they never heard of OMI? The issue they're facing is a power-efficient way to add not just capacity but also bandwidth. Once they move to having some in-package DRAM and use memory-tiering, their external DRAM bandwidth needs will lessen and then CXL.mem starts to look more appealing. Plus, the cache-coherency it offers with accelerators and other CPUs is attractive as well.
And what's next after 12-channel, 16-channel, 20-channel?
Or we can reduce the # of contacts needed and re-use some of those extra contacts for more PCIe lanes.
Everybody LOVES having more PCIe lanes. Who doesn't love having them?

OMI wasn't finalized until very recently, platform development has very long lead times.

And OMI wasn't part of CXL until VERY Recently.
Also OMI was using tech implemented by another company.

AMD and Intel won't be using Microchip's Memory Controller, they'll be doing it themselves.
That takes time to set up their own internal standards to copy OMI and implement its core functionality.
But the BluePrint is there. The Memory Controller is the secret sauce. Every company has their own version.

in-package DRAM isn't nearly as good as in-package SRAM.
The latency savings you get from SRAM is HUGE and well worth it.

External Bandwidth needs will always go up.

& Cache Coherency is great and all, but CXL.mem is all about running the processing locally, where the data sits, if it's in the local accelerator's memory, so be it.

If the data sits in my server's RAM and somebody needs it, they just access it through CXL.mem and perform any adjustments locally on my side by remoting in via the CXL.mem protocol. This way there's less overall movement of data, ergo saving energy.

That's a separate issue from what OMI is solving.
 
Last edited:

bit_user

Polypheme
Ambassador
CXL.mem's problem & solution doesn't even cover the same thing as what OMI is trying to solve.
They're tackling COMPLETELY different problems and have no relation to each other at this point in time.
Explain.

They both sit under the CXL consortium at this point and talk about memory.
But largely, they don't interact AT ALL with each other.
Because OpenCAPI is dead. Nobody wants to run or pay dues to a consortium for a dead standard. So, they donated the IP from it to CXL, in exchange for CXL administering the existing standard (i.e. licensing, etc.).

That's why we haven't seen ODR (Octal Data Rates)
DDR3 has an interface speed 8x that of the DRAM chips on the DIMM. Depending on exactly what you mean by ODR, it was already achieved a decade ago.
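To spell that out: DDR3 uses an 8n prefetch, so the per-pin data rate is 8x the internal array clock. A quick sketch with DDR3-1600 as the example:

```python
# DDR3's 8n prefetch: the I/O interface moves data 8x faster than the DRAM core array clock.
# DDR3-1600 used as the example.
core_clock_mhz = 200                              # internal DRAM array clock
prefetch       = 8                                # bits fetched per array access per data pin
io_clock_mhz   = core_clock_mhz * prefetch / 2    # DDR: two transfers per I/O clock
data_rate_mts  = core_clock_mhz * prefetch        # effective transfer rate per pin

print(f"I/O clock: {io_clock_mhz:.0f} MHz, data rate: {data_rate_mts:.0f} MT/s")
# I/O clock: 800 MHz, data rate: 1600 MT/s
```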

No, I said move the Memory Controller closer to the DIMM, not ONTO the DIMM.
...
One of OMI's lesser-talked-about implementations is keeping the Memory Controller on the MoBo, but PHYSICALLY very close to the DIMM slots.
It could be right underneath it, on the opposite side of the MoBo, or right next to the slots; the location choices are limited only by the engineer's imagination on where to place it.
As long as it's not on the DIMM itself and remains on the MoBo.
This way, you're only paying for the Memory Controller once, when you buy the MoBo.
That either has the disadvantage of one controller per DIMM slot (at which point, why not just put them on the DIMMs?) or multiple DIMMs per channel, which wastes some of the potential advantage you could get.

Worse, by being on the motherboard, you have to replace the entire board if one controller fails. And there's usually not great cooling on the underside of the board.

Also, it increases board costs by having memory controllers even for memory slots or channels that are unoccupied.

with every iteration of new DIMM form factor, the Pin-Count gets higher each generation when you do a major revision.
That's not really true. DDR2 & DDR3 both had 240 contacts, while DDR4 & DDR5 both have 288 contacts.

Anyway, the problem becomes moot when the industry switches mostly to CXL.mem for expansion.

What if DDR6 has 544-Pins per DIMM Channel?
Why would it? It's not as if the designers don't understand the cost tradeoffs of adding more.

The L4$ will sit on top of the Memory Controller, similar to how RDNA3 has a Memory Controller with SRAM on top that they market as "Infinity Cache".
It's realistically going to be L4$ since it'll save you many cycles by caching the data straight onto SRAM locally.
But, that's in dies that are on the same substrate and physically adjacent. In your case, putting the cache out-of-package and on the other side of a pair of SERDES will seriously limit its usefulness. Worse yet, it'll be limited to caching just what's on that channel, rather than a unified L4 cache.

IBM is using OMI in their POWER servers, and we should pay attention to what IBM does; they're at the "Forefront" of technological innovation.
Which is why they're dominating the cloud & server markets? LOL, IBM isn't even an afterthought for Intel or AMD. They're irrelevant.

I know there's a power cost, but the amount you spend is worth it.
Without knowing the cost/benefit, how can you say it's worth it?

Your Speeds are relative to how fast your Memory Controller can run the DIMMs, and it only gets easier if they are "Physically Closer".
Again, you've got it backwards. There's a limit to how fast you can run a serial interface. PCIe 6.0 had to use expensive PAM4 encoding, because they physically couldn't clock it any higher.

Parallel interfaces run at much lower clock speeds and therefore have more headroom.
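For a sense of scale, the nominal per-pin signaling rates being compared:

```python
# Per-pin signaling rates: serial links are already pushed much harder than a DDR data pin,
# which is why the next serial step (PCIe 6.0) needed PAM4 encoding.
rates_gtps = {
    "DDR5-6400 data pin":    6.4,
    "PCIe 5.0 lane (NRZ)":  32.0,
    "PCIe 6.0 lane (PAM4)": 64.0,   # 32 GBaud, 2 bits per symbol
}
for name, rate in rates_gtps.items():
    print(f"{name}: {rate:g} GT/s")
```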

Again, CXL.mem is a separate solution to a different problem.
Go read up on what they're trying to do and understand what problem they're solving vs what OMI is solving.
You're the one making the claim that OMI is different from and superior to CXL.mem, so I think the onus is on you to justify your claim.

But it's a solution to the ever-increasing # of parallel connections, and that's a real problem.
So is CXL.mem. Especially if you use a switch fabric and pair it with fast in-package memory and tiering.

One cool thing about CXL is that you get flexibility between the number of memory channels vs. I/O devices. Another cool thing is cache-coherency between multiple CPUs or between CPUs and devices.

And OMI, being now part of CXL,
It's not a part of the CXL standard. It's just being administered by the same organization.

The Memory Controller doesn't have to be on the DIMM.
The DIMM can remain the cheapo dumb board that it is.
You could have a CXL.mem host card that has the controller + some DIMM slots. I don't expect these will be very common, however.

And everybody within the industry has eventually gone Serial, after being Parallel for so long.
Not quite. Phones, GPUs, cheap SoCs, etc. will still use soldered DDR memory. Then, we have a middle tier, where there's in-package DDR memory.

And what's next after 12-channel, 16-channel, 20-channel?
At the high-end, you'll get in-package DDR memory or HBM + CXL for expansion.

Intel's Xeon Max and AMD's MI300 are variations on this theme that use HBM. Nvidia's Grace CPU instead uses in-package LPDDR5. I'm not sure about their support for CXL.mem, which might be too new. Anyway, Intel at least supports external DDR5 for expansion.

Anyway, my point is: it's happening. The high end is finally embracing in-package DRAM for CPUs. That helps solve the scaling problem of needing ever more external DRAM bandwidth.

Or we can reduce the # of contacts needed and re-use some of those extra contacts for more PCIe lanes.
That's the main idea of CXL. The connection is the same for both memory and I/O.

OMI wasn't finalized until very recently, platform development has very long lead times.
And nobody cares, because everyone had already jumped on the CXL bandwagon by then.

in-package DRAM isn't nearly as good as in-package SRAM.
The latency savings you get from SRAM is HUGE and well worth it.
SRAM is very expensive and power-hungry. To get the most benefit, it needs to be in-package with the CPU. That's why nobody would put it on the motherboard, like you're saying.

Cache Coherency is great and all, but CXL.mem is all about running the processing locally, where the data sits,
It's literally not. CXL.mem is for devices that are just DRAM. Don't confuse it with the broader CXL standard. Anyway, this tells me you don't really understand what you're arguing against.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
https://en.wikipedia.org/wiki/Compute_Express_Link#Protocols
CXL.mem / CXL.IO / CXL.cache.
They're all protocols for a remote device to access another machine's or add-in device's Memory/Cache/IO in a consistent, low-latency manner without moving the data around.
The benefit is not moving data around unnecessarily: you do the processing as locally as possible and get the results you want.

It has NOTHING to do with what I'm solving, which is a PHY problem with the number of connections needed for PARALLEL DRAM to the memory controller using DIMMs.

Because OpenCAPI is dead. Nobody wants to run or pay dues to a consortium for a dead standard. So, they donated the IP from it to CXL, in exchange for CXL administering the existing standard (i.e. licensing, etc.).
What is it about "OMI (Open Memory Interface)" that you don't understand?
There is no licensing; it's an "Open Standard" for everybody to use.

DDR3 has an interface speed 8x that of the DRAM chips on the DIMM. Depending on exactly what you mean by ODR, it was already achieved a decade ago.
I'm not talking about how fast the DRAM is processing, I'm talking about the communications link between the DIMM & the Memory Controller.

That either has the disadvantage of one controller per DIMM slot (at which point, why not just put them on the DIMMs?) or multiple DIMMs per channel, which wastes some of the potential advantage you could get.
Go read up in detail on what OMI is doing.
Microchip has 1x Controller Chip taking care of 2x DIMMs as two independent Memory Channels.
You don't have to use Microchip's Memory Controller, I'm sure AMD / Intel has their own Memory Controller designs that perform WAY better.

You gain flexibility by having a generic SerDes Serial link to the Memory Controller that can accept ANY type of RAM through a standardized OMI PHY/Serial Protocol that feeds the main memory calls back to the cIOD.
In the end, it's not a disadvantage, it's an advantage in terms of modern "Chiplet Design" & Modularity, along with manufacturing.
Plus you can update the Memory Controller on its own timeline, separate from the CPU & cIOD's timeline.

By separating out the Memory Controller from the cIOD in Intel or AMD designs, you literally make the system "Memory Agnostic".
- That means you can easily attach DDR# / GDDR# / LPDDR# / HBM# / Optane / etc., just by changing out memory controllers.
No need to redesign the entire cIOD, just literally swap connections.
Enterprise customers get flexibility in how many Memory Channels they can implement, along with what type of memory.
You also get a significantly reduced number of Pins necessary for each link of each DRAM channel, down to what is minimally viable.
This allows the Memory Controller to be moved MUCH closer to the source of Memory, the DIMMs.
That would increase the possible frequency via significantly better signal integrity, and it takes less power to run those parallel connections because the signals aren't running as far from the DIMM to the Memory Controller.

Worse, by being on the motherboard, you have to replace the entire board if one controller fails. And there's usually not great cooling on the underside of the board.
If the Memory Controller fails on the CPU, you'd have to replace the CPU, how often do you see a CPU's memory controller fail?
If you don't want it on the underside of the MoBo, then move it to the top where everything else exists.
That's also a viable option. In the end, it's up to you on how your platform / case is designed.

Also, it increases board costs by having memory controllers even for memory slots or channels that are unoccupied.
And you decrease the cost of the cIOD/CPU by moving the Controller out of it.
As for unoccupied DIMM Slots / Memory Channels, that's true for any memory controller on the CPU, it still has to connect to all the DIMM Slots & Memory channels that are unoccupied.
There's no downside for a localized Memory Controller that sits on the CPU vs on the MoBo in its own chip.
It faces the same problems. I'm just moving the location to somewhere else.

That's not really true. DDR2 & DDR3 both had 240 contacts, while DDR4 & DDR5 both have 288 contacts.

Anyway, the problem becomes moot when the industry switches mostly to CXL.mem for expansion.
Again, you don't understand what CXL.mem is for.

Why would it? It's not as if the designers don't understand the cost tradeoffs of adding more.
Because if you want long-term increases in Memory performance, given what JEDEC is trying to do, adding more Memory Sub-Channels will eventually need more Contacts on the DIMM.
It's better to pull off the band-aid now rather than later, and be ready for it.
DDR5 finally split the DIMM into 2x Memory Sub-channels per DIMM.
DDR6 is currently planning to split the DIMM into 4x Memory Sub-channels per DIMM.
As for the amount of memory channels per module, that will also be doubled for DDR6, with four 16-bit channels joined by 64 memory banks.
I predicted that DDR6 would double the Memory Sub-Channels from 2x to 4x.
But I didn't think they were going to keep the 16-bit channels.
That's literally trying to use LPDDR-style 16-bit channel interfaces for throughput without changing the current # of contacts.

That's a mistake IMO; going with the current 32-bit/40-bit wide channels is better for long-term performance throughput.
That would require massively increasing the Contact/Pin count on the DIMM.
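Laying the widths side by side (the DDR6 rows are speculative, per the discussion above):

```python
# Data-bus width per DIMM under the layouts discussed above (data bits only, no ECC).
# The two DDR6 rows are the plan quoted above and the alternative being argued for;
# both are speculative, not finalized JEDEC specs.
layouts = [
    ("DDR4:  1 x 64-bit channel",      1, 64),
    ("DDR5:  2 x 32-bit sub-channels", 2, 32),
    ("DDR6?: 4 x 16-bit sub-channels", 4, 16),
    ("DDR6?: 4 x 32-bit sub-channels", 4, 32),
]
for name, count, width in layouts:
    print(f"{name}: {count * width} data bits per DIMM")
# Only the 4 x 32-bit option grows the data bus (to 128 bits), which is what would
# force the higher contact count discussed next.
```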

The current 288-pin DIMM could theoretically have 300 pins in its current form if all the area on the data connection edge were filled with contacts.
The practical way to get 4x 32-bit/40-bit wide Memory Sub-channels is to add in another row of Contacts, slightly shifted by half a contact space, below the existing row.
Similar to how old CPU slot connectors used to be implemented.
The concept is very old, but it can be done again on DIMMs.

I literally have a long-term Road Map for JEDEC, from DDR6 -> DDR10, on how to increase the # of Memory Sub-Channels on each DIMM so they get better total throughput from RAM and increase performance over time, while standardizing on the DIMM format and increasing sales of RAM Chips over time as well. Something they aim to do.

But, that's in dies that are on the same substrate and physically adjacent. In your case, putting the cache out-of-package and on the other side of a pair of SERDES will seriously limit its usefulness. Worse yet, it'll be limited to caching just what's on that channel, rather than a unified L4 cache.
If you're worried about having a "Unified L4 $", we can place SRAM on both the cIOD & on top of the Memory Controllers.
Then we can have an L5$ on top of the independent memory controllers, along with the L4$ for Main Memory Access.

Problem solved, more SRAM / Cache is being used to mitigate any latency issues.

Which is why they're dominating the cloud & server markets? LOL, IBM isn't even an afterthought for Intel or AMD. They're irrelevant.
Not when it comes to leading-edge technology implementations and cutting-edge designs.

Without knowing the cost/benefit, how can you say it's worth it?
Because of the flexibility it'll offer the platform / system designers.
The opportunities that it opens up.
The modularity in the chiplet world that everybody is racing towards is literally about treating modern IC designs as modular, "Lego Piece"-style implementations.

The brilliance of the CCX/CCD model was that it was one CPU Core Complex that can be reused for Server & Consumer.
Design Once, reuse everywhere in a modular fashion.

IBM just got there first for the Memory Controller / DIMM problem, and I agree with their idea. They're ahead of the curve.
Just like AMD was ahead of the curve in implementing chiplets and making it a reality.
IBM is ahead of the curve in solving the ever expanding Memory Channel / Bandwidth issue along with the Parallel Connections issue with having too many Memory Channels.

Just look at how many connections you need on the top end EPYC platforms.
12 DIMMs / Memory Channels × 288 pins per DIMM = 3456 Contacts on the CPU package that are being used just for Memory DIMMs.

A Dual Channel OMI Memory Controller needs 75 Contacts at minimum using the SerDes link
A single lone memory channel would need 38 contacts
A Dual Memory channel would need 75 contacts.
75*6 = 450 Contacts.

450 Contacts vs 3456 Contacts.
You know how many Pins are on EPYC SP5 for Genoa?
It has 6096 Pins.
3456/6096 = 56.69% of the contacts on the CPU package & Socket are for RAM/DIMMs.
The remainder is mostly for the 128x PCIe lanes & Power connections to the CPU, along with some extra stuff.

That's 3006 Contacts left over that you can use as High-Speed IO to other things on the MoBo for an EPYC platform.
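Running the same numbers as a sanity check (the 75-contact figure comes from the table cited above, not an official spec):

```python
# Recomputing the contact budget argued above; counts are taken from the post, not official specs.
dimm_channels = 12
pins_per_dimm = 288
socket_pins   = 6096                      # EPYC SP5 (LGA-6096)
omi_per_2ch   = 75                        # contacts per dual-channel OMI link, per the cited table

parallel_pins = dimm_channels * pins_per_dimm          # 12 x 288 = 3456
omi_pins      = (dimm_channels // 2) * omi_per_2ch     # 6 links x 75 = 450

print(f"Parallel DDR: {parallel_pins} contacts ({parallel_pins / socket_pins:.1%} of the socket)")
print(f"OMI links   : {omi_pins} contacts, freeing {parallel_pins - omi_pins} for other I/O")
# Parallel DDR: 3456 contacts (56.7% of the socket)
# OMI links   : 450 contacts, freeing 3006 for other I/O
```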

You want High-Speed dedicated CPU interfaces to run xGMI over? Plenty of room, without eating into the PCIe lane budget.

You want room for a Rear IO panel Chipset, a modern PCS (Peripheral Controller Switch), and the modern modular pair of Chipsets that AMD uses for X670?
Done: reuse the high-speed SerDes link, which feeds A LOT more bandwidth than the pathetic old 4x PCIe.

With that much more bandwidth, it'd be MUCH harder to bottleneck any device connected off your chipsets.

Again, you've got it backwards. There's a limit to how fast you can run a serial interface. PCIe 6.0 had to use expensive PAM4 encoding, because they physically couldn't clock it any higher.
Right now, it's about 50 GB/s.
nVIDIA proved that it can be done; they already implement it on their end over very long runs.

Parallel interfaces run at much lower clock speeds and therefore have more headroom.
Parallel interfaces also eat up a lot of Silicon Die Area and necessary Contacts; it's better to move that requirement off into its own Memory Controller, where the Memory Controller can be placed Physically closer to the DIMMs for better DIMM Connection performance.

You're the one making the claim that OMI is different from and superior to CXL.mem, so I think the onus is on you to justify your claim.
You're the one claiming that they're doing the same thing, when you clearly haven't been following what CXL is trying to do.
I've known since the start of this thread that you're barking down the wrong path, since you didn't clearly read what CXL.mem was trying to do.
You assumed it was trying to do what IBM was doing, but you clearly didn't read up on what CXL.mem was trying to accomplish.


So is CXL.mem. Especially if you use a switch fabric and pair it with fast in-package memory and tiering.
Again, you don't get what CXL.mem is doing. See above.

One cool thing about CXL is that you get flexibility between the number of memory channels vs. I/O devices. Another cool thing is cache-coherency between multiple CPUs or between CPUs and devices.
Again, CXL.mem isn't about what you're writing at all.

It's not a part of the CXL standard. It's just being administered by the same organization.
I literally showed you the link, they moved into the CXL family.
They're part of the greater CXL standard, and the two don't even conflict with each other; in fact, they will benefit each other in the long term.

You could have a CXL.mem host card that has the controller + some DIMM slots. I don't expect these will be very common, however.
CXL.mem is a protocol for remote processing & data accessing, not a hardware spec.
You keep conflating the two.

A host card w/ controller + DIMM slots? Sounds like you would be better off shoving in another MoBo & CPU that already has DIMM slots on board.
Then you can remote-access the memory. Far better to use what already exists instead of creating a one-trick-pony add-in card solution.
We don't need more Intel NUC Cards that are proprietary as eff.
The industry HATES Proprietary, they want open standardized hardware.
Intel NUC isn't really standard, it's proprietary as hell.

Not quite. Phones, GPUs, cheap SoCs, etc. will still use soldered DDR memory. Then, we have a middle tier, where there's in-package DDR memory.
What you're talking about is a solution for mobile devices and devices that need to be as thin / lightweight as possible.

OMI is a solution for modular computers and Enterprise where we use DIMMs for expandable memory.

At the high-end, you'll get in-package DDR memory or HBM + CXL for expansion.
CXL is just a protocol, see above for your misunderstanding, it's not a hardware spec, nor is it trying to do what OMI is doing.

Intel's Xeon Max and AMD's MI300 are variations on this theme that use HBM. Nvidia's Grace CPU instead uses in-package LPDDR5. I'm not sure about their support for CXL.mem, which might be too new. Anyway, Intel at least supports external DDR5 for expansion.
Intel's Xeon Max is a specialized SKU.
AMD's MI300 is a specialized APU for HPC in Data centers that can afford exotic solutions en masse.
nVIDIA's Grace is completely proprietary to nVIDIA, they do what they want.

Most of the server industry still uses DIMMs, for good reason.

Anyway, my point is: it's happening. The high end is finally embracing in-package DRAM for CPUs. That helps solve the scaling problem of needing ever more external DRAM bandwidth.
The high end is using on-package memory because it's an exotic, high-performance solution.
Just like you have High Performance Sports cars in the world.
There are High Performance memory solutions for computing.
On-package memory with HBM is one of them.

That's the main idea of CXL. The connection is the same for both memory and I/O.
Again, you fail to understand what CXL is trying to really do.
OMI is solving a COMPLETELY different problem to what CXL is doing.
Go take the time to read both in detail and understand what both are trying to accomplish.

And nobody cares, because everyone had already jumped on the CXL bandwagon by then.
So has OMI, it's part of the family now.

SRAM is very expensive and power-hungry. To get the most benefit, it needs to be in-package with the CPU. That's why nobody would put it on the motherboard, like you're saying.
Yet AMD puts SRAM, under the marketing name "Infinity Cache", right on top of the Memory Controller in the Radeon RX 7000 series.
It's a fully modular Memory Controller + 3D V-Cache-style SRAM on top.
AMD wouldn't be shoving SRAM on top of the Memory Controller on RDNA 3 if it weren't beneficial in terms of Power & Performance.

It makes R&D / manufacturing simpler & cheaper for the chiplet future.
Again, you misunderstand the energy costs of accessing SRAM vs DRAM.
It's literally 2 orders of magnitude cheaper to access the same 32 bits of data on SRAM vs DRAM.
It's literally 128x cheaper in terms of energy cost to pull the same data from SRAM than from DRAM.
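For reference, the commonly cited per-access energy figures behind that ratio look roughly like this (order-of-magnitude values from published energy surveys; exact numbers vary a lot by process node and cache size):

```python
# Commonly cited per-access energy figures (order-of-magnitude only; values depend heavily
# on process node, cache size, and DRAM interface).
sram_pj_per_32bit = 5       # small on-die SRAM read
dram_pj_per_32bit = 640     # off-chip DRAM read

print(f"DRAM / SRAM energy ratio: {dram_pj_per_32bit / sram_pj_per_32bit:.0f}x")
# DRAM / SRAM energy ratio: 128x
```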

My idea for using OMI, just applies the exact same principle that AMD has already done on their Radeon GPU side, but applies it for main system memory / DIMMS.

I'm just choosing the configuration of OMI that is most practical for the most benefits.

IBM prefers that the Memory Controller is on the DIMM.
I disagree, that's expensive and wasteful of precious silicon.
Having the Memory Controller placed on the MoBo is the best for:
  • Long-Term Performance growth at a reasonable Cost
  • System Flexibility
  • Environmentally responsible use of expensive Foundry/Wafer Silicon space
  • Keeping DIMMs the simple, cheap Add-In Boards that they should be
It's literally not. CXL.mem is for devices that are just DRAM. Don't confuse it with the broader CXL standard. Anyway, this tells me you don't really understand what you're arguing against.
You've got CXL.mem confused. You literally don't understand how they intend to use CXL.mem.
 
Last edited:

bit_user

Polypheme
Ambassador
I'm not going to continue this, because:
  1. I feel like I'd mostly be repeating what I've already said.
  2. It's ultimately pointless. It will have 0.000% impact on whatever the industry decides to do. Not to mention that a lot of the key decisions have already been made.
  3. I have better things I ought to be doing.
The only reason I took the time to reply, in the first place, was to try and help you understand why (from what I can tell) the industry is doing what it's doing. I don't really have a stake in this. I don't need you to believe what I'm saying - I'm just trying to explain it for your benefit.

Regarding point #3, you're smart and clearly passionate about this stuff. Have you ever considered dabbling in hardware design? There are open source CPU projects you can find and dabble with. You can prototype your designs or design changes either in simulators or on FPGA boards. I think there are better ways to contribute than writing these long posts almost nobody will ever read.

: )
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
I'm not going to continue this, because:
  1. I feel like I'd mostly be repeating what I've already said.
  2. It's ultimately pointless. It will have 0.000% impact on whatever the industry decides to do. Not to mention that a lot of the key decisions have already been made.
  3. I have better things I ought to be doing.
The only reason I took the time to reply, in the first place, was to try and help you understand why (from what I can tell) the industry is doing what it's doing. I don't really have a stake in this. I don't need you to believe what I'm saying - I'm just trying to explain it for your benefit.

Regarding point #3, you're smart and clearly passionate about this stuff. Have you ever considered dabbling in hardware design? There are open source CPU projects you can find and dabble with. You can prototype your designs or design changes either in simulators or on FPGA boards. I think there are better ways to contribute than writing these long posts almost nobody will ever read.

: )
I already understand what the industry is doing.
What they're doing doesn't really affect what OMI is doing in a direct way.
It's tangentially related, but they don't really overlap.
I'm trying to get you to understand that.

I care more about influencing the direction certain companies take with their product designs.

OMI being one of them.

I'm a software dev, not a hardware dev; but I can probably be PM for a hardware team if push comes to shove, given my history with PC tech / hardware.

I have a very broad knowledge base and historical understanding, and I'm good at researching whatever portions of the history I'm missing.
 

bit_user

Polypheme
Ambassador
I care more about influencing the direction certain companies take with their product designs.
You can't, though. Not like this. The only ways to influence them are either from the inside, or by publishing/presenting original research which rigorously demonstrates your points.

I can probably be PM for a hardware team if push comes to shove
It won't. You don't get into those sorts of positions except by having prior experience in those industries. Most likely hardware engineering, either at a chip company or among the systems companies they sell into.

Zero, and I mean absolutely no engineers or PMs for these companies are scouring internet comment boards for input into their critical product decisions. They have their own decision-making processes for what they do, and they don't care the slightest bit whether random forum posters like you or I understand or agree with them.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
You can't, though. Not like this. The only ways to influence them are either from the inside, or by publishing/presenting original research which rigorously demonstrates your points.
Yeah, it's hard to get on the inside.

It won't. You don't get into those sorts of positions except by having prior experience in those industries. Most likely hardware engineering, either at a chip company or among the systems companies they sell into.
Yeah, that sucks because I come from a software development background, not hardware.

Zero, and I mean absolutely no engineers or PMs for these companies are scouring internet comment boards for input into their critical product decisions. They have their own decision-making processes for what they do, and they don't care the slightest bit whether random forum posters like you or I understand or agree with them.
I know they don't.
 

bit_user

Polypheme
Ambassador
Yeah, it's hard to get on the inside.

Yeah, that sucks because I come from a software development background, not hardware.
You could probably make the switch to hardware, if you want it badly enough to do the work. It won't be easy and I've seen a lot more electrical engineers transitioning to doing software than I've seen people going the other direction.

Also, you need to be very data-driven, because hardware comes down to a numbers game (i.e. in terms of making design tradeoffs) a lot more than software tends to. I get the impression they're continually juggling cost, power, and performance implications, in almost every decision. Not to mention harder-to-quantify things like layout, scalability, signal integrity, complexity, and time-to-market.
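Just as a toy illustration of that numbers game (every design and figure below is made up purely to show the shape of the comparison, not to describe any real product):

```python
# Toy perf/W and perf/$ comparison between two hypothetical design options.
# All numbers are invented for illustration only.
designs = {
    # name: (performance in arbitrary units, power in watts, die cost in dollars)
    "wide-bus option":   (100, 15.0, 28.0),
    "narrow-bus option": ( 82,  9.5, 19.0),
}

for name, (perf, watts, cost) in designs.items():
    print(f"{name:18s}  perf/W = {perf / watts:5.1f}   perf/$ = {perf / cost:5.2f}")
```

A hardware team ends up doing that kind of arithmetic, at far greater depth, for nearly every decision.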
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
You could probably make the switch to hardware, if you want it badly enough to do the work. It won't be easy and I've seen a lot more electrical engineers transitioning to doing software than I've seen people going the other direction.
At my age, realistically, I would go the PM route. That's the most realistic path, since I want to affect the designs, not be an actual "Hardware Engineer".
I'll leave that to the real hardware engineering pros.

Also, you need to be very data-driven, because hardware comes down to a numbers game (i.e. in terms of making design tradeoffs) a lot more than software tends to. I get the impression they're continually juggling cost, power, and performance implications, in almost every decision. Not to mention harder-to-quantify things like layout, scalability, signal integrity, complexity, and time-to-market.
I'm fine with juggling numbers and making trade-offs; as a PM, I could be helpful to a team of engineers in deciding what to trade off and why.
I have a "Adam Savage" like wide breath of knowledge, it may not be very deep, but it's very wide and covers ALOT of weird bases that most people don't think of.

That's why I focus on things like OMI and the next major issues for all the major Tech Players.

Also, on what's next for DDR6 and beyond: I guessed their future roadmap feature set partially correctly before I did a little more digging into what JEDEC engineers are planning next.
I didn't expect them to go certain routes in their planning.
DDR6 wants 4x Memory Sub-Channels per DIMM, effectively doubling the Memory Sub-Channels from DDR5.
I saw that feature coming a mile away.

However, they want to go 16-bit per Memory Sub-Channel instead of keeping the current 32/40-bit per Memory Sub-Channel paradigm.

That I'm kind of against; I want them to keep the 32/40-bit per Memory Sub-Channel paradigm for now.
I have a bigger-picture "Long Term DDR DIMM Game Plan" that runs from DDR6 -> DDR12.
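To make the width arithmetic concrete, here's a minimal sketch assuming the sub-channel counts and widths described above (DDR6 is not finalized, so the DDR6 rows are assumptions based on those proposals, not a published spec):

```python
# Data bits per DIMM under the sub-channel layouts discussed above.
# The DDR6 layouts are assumptions from this discussion, not a JEDEC spec.

def dimm_data_bits(sub_channels: int, bits_per_sub_channel: int) -> int:
    """Total data bits per DIMM, excluding ECC bits."""
    return sub_channels * bits_per_sub_channel

print("DDR5 today (2 x 32-bit):        ", dimm_data_bits(2, 32), "bits")  # 64
print("DDR6 proposal (4 x 16-bit):     ", dimm_data_bits(4, 16), "bits")  # 64
print("Keep 32-bit widths (4 x 32-bit):", dimm_data_bits(4, 32), "bits")  # 128
```

Going 4 x 16-bit keeps a DIMM at 64 data bits total, while keeping 32-bit sub-channels would double it to 128.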
 

bit_user

Polypheme
Ambassador
I have a "Adam Savage" like wide breath of knowledge, it may not be very deep, but it's very wide and covers ALOT of weird bases that most people don't think of.
They don't care about weird bases, if you mess up the basics and it results in a nonviable or uncompetitive product. It's a cut-throat business. Unlike software, bad decisions in the design and specification phase are extremely costly and difficult or impossible to retroactively address.

I think you overvalue weird perspectives. You have to get all the basic stuff right, first. An old saying comes to mind: "learn the rules, before you break them". You need to have a very concrete understanding of the rationale underlying all of the current approaches, before you can effectively advocate for upturning any of it.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
They don't care about weird bases, if you mess up the basics and it results in a nonviable or uncompetitive product. It's a cut-throat business. Unlike software, bad decisions in the design and specification phase are extremely costly and difficult or impossible to retroactively address.

I think you overvalue weird perspectives. You have to get all the basic stuff right, first. An old saying comes to mind: "learn the rules, before you break them". You need to have a very concrete understanding of the rationale underlying all of the current approaches, before you can effectively advocate for upturning any of it.
That's why I need a team of engineers to work with.
 

bit_user

Polypheme
Ambassador
That's why I need a team of engineers to work with.
Any time a PM stands up in front of a team of engineers and starts pitching a lot of ideas that don't make sense, their reaction is going to be: "who is this clown?"

They have enough to do, without having to educate the PM about a lot of basics. That's why you need to already know that stuff.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
1,280
810
20,060
Any time a PM stands up in front of a team of engineers and starts pitching a lot of ideas that don't make sense, their reaction is going to be: "who is this clown?"

They have enough to do, without having to educate the PM about a lot of basics. That's why you need to already know that stuff.
That's why I do a lot of research on the various subjects before I make proposals for changes.