Not error correction, but rather error-checking. In the cases I've looked into, there aren't nearly enough check/parity bits for error-correction.
I don't think anyone is thinking about the idiocy that we have error correction in nearly ALL other PC subsystems,
Again, if you just mean error-detection, BTRFS also has checksums.
Even the filesystems are getting error correction with ZFS and ReFS.
It's there because NAND flash is vastly less reliable than DRAM. It's also there because it facilitates smaller cells & more bits per cell. So, having it is a net win. I guess DDR5's on-die ECC is there by the same rationale.
And I think all of the current SSD products have error correction built in = making it cost parity as the market expects it to just be there.
I think in-band ECC is the best chance for DDR to do this.
I have seen many market shifts that learned to adapt and absorb a change to keep the features up and cost down.
I guess I sort of wonder why we're so resistant to link parity/ECC being computed on-the-fly, now that the DDR5 dies have ECC on-die. If they would just expose error stats for that on-die ECC and maybe increase the ratio a little bit, then you could tell when a DIMM is becoming unreliable and do preemptive maintenance. IMO, that would probably be enough for most of us. If you needed further reliability, then doing in-band ECC on top of those features should be enough, even for servers. The really high-end folks can do memory mirroring, as I believe they already do.
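To put rough numbers on the check-bit ratio I keep mentioning, here's a quick sketch using the standard Hamming bound: single-error correction (SEC) over m data bits needs the smallest r with 2^r >= m + r + 1, and SECDED adds one more overall parity bit. The function names are just mine for illustration, and real controllers may use different codes, so treat this as a lower bound rather than a description of any particular product.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (Hamming bound for SEC)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """SEC plus one overall parity bit, which adds double-error detection."""
    return sec_check_bits(data_bits) + 1


# Print the check-bit count and overhead ratio for a few common word widths.
for width in (8, 32, 64, 128):
    sec = sec_check_bits(width)
    secded = secded_check_bits(width)
    print(f"{width:>3} data bits: SEC needs {sec} ({sec / width:.1%}), "
          f"SECDED needs {secded} ({secded / width:.1%})")
```

The 64-bit row is the familiar case: 8 check bits for SECDED, which is why a classic ECC DIMM is 72 bits wide. A single parity bit per word, by contrast, can only detect an odd number of flipped bits and can't correct anything, which is the gap between checking and correcting I was getting at.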