News Bug Forces Intel to Halt Some Xeon Sapphire Rapids Shipments

Just wow... this CPU seems destined to gain a reputation up there with the most cursed projects at Intel.

On the flip side, hopefully Emerald Rapids will benefit from all the debugging and iterating that has gone into Sapphire Rapids.

The real irony is that you'd expect the XCC to be the one with the late issue(s) - not the MCC. However, maybe because the XCC is more complex, it had more scrutiny and troubleshooting early on.
 
Credit where credit is due, at least Intel gives a flying chicken about its server chips unlike AMD which basically told all of its customers to go ..... themselves.
 
Credit where credit is due, at least Intel gives a flying chicken about its server chips unlike AMD which basically told all of its customers to go ..... themselves.
You're talking about the Rome 1044-day bug? That's apples-and-oranges.
  1. Rome is already 4 years old and 2 generations behind, whereas Sapphire Rapids just started shipping earlier this year.
  2. The mitigation for the 1044-day bug is just to reboot at least once every 2.86 years, which nearly all server operators will already be doing for software upgrades & maintenance.
  3. It's not like Intel CPUs don't have plenty of errata, including side-channel vulnerabilities on older Xeons they didn't even bother to release mitigations for.
  4. Intel has even removed features in shipping CPUs, like when they withdrew TSX via microcode updates!
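
For anyone checking the arithmetic behind the 2.86-year figure in point 2, here is a minimal sketch. The function name and the 365.25-day average year are my own assumptions for illustration, not anything taken from AMD's documentation:

```python
# Rough conversion of the 1044-day uptime limit into years.
# DAYS_PER_YEAR is an assumed average (leap years included), not a vendor figure.
DAYS_PER_YEAR = 365.25


def days_to_years(days: float) -> float:
    """Convert a number of days of uptime into years (hypothetical helper)."""
    return days / DAYS_PER_YEAR


uptime_limit_days = 1044  # figure quoted in the post above
print(f"{days_to_years(uptime_limit_days):.2f} years")
```

Using a plain 365-day year instead gives the same value to two decimal places.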

People in glass houses shouldn't throw stones.
 
It's getting "normal" to market new stuff without extensive testing. Quality is less important than making the deadlines these days, and that goes for every part of the tech industry, not only the chip and computer branches.
And we, the customers, are the ones who pay for this, in my opinion, through the losses we suffer if something goes wrong with new tech. We pay a lot of money for it, but there seem to be no guarantees. It's saddening.
 
You're talking about the Rome 1044-day bug? That's apples-and-oranges.

People in glass houses shouldn't throw stones.
More like the exploding CPUs where AMD did not stop sending out their CPUs but rather opted to blame everybody else.
AMD should have recalled every CPU and fixed their microcode or temperature sensor or whatever the problem is, just like Intel does here.
They opt not to send out CPUs they know have an issue and, if they can, they fix them before sending them out.

This is not even making the CPUs explode; it just could interrupt system operation under certain conditions. As always, you are the one throwing stones around.

"We became aware of an issue on a subset of 4th Generation Intel Xeon Medium Core Count Processors (SPR-MCC) that could interrupt system operation under certain conditions
...
...
Out of an abundance of caution, we did temporarily pause some SPR MCC shipments while we gained confidence in the expected firmware mitigation and expect to release remaining shipments shortly."
 
Just a correction: "Errata" is a plural, meaning "errors" or "list of errors". The singular is "erratum". Companies like Intel publish ONE Errata for each product, which gets updated with new entries over time.

To be fair most pundits and even tech company representatives get this wrong nowadays.
 
Years of Bulldozer and Piledriver made Intel complacent. For a company of their size, it will take some time before they resolve their issues. In the case of GPUs, I hope it's sooner rather than later. I'm looking forward to the next generation. Arc A750 is a decent GPU.
 
More like the exploding CPUs where AMD did not stop sending out their CPUs
Fixed in firmware, just as Intel is stating it will do in this case.

AMD should have recalled every CPU and fixed their microcode or temperature sensor or whatever the problem is, just like Intel does here.
No, the article very clearly does not say that Intel is actually recalling anything. It just says they stopped shipment, pending a fix.

They opt not to send out CPUs they know have an issue and, if they can, they fix them before sending them out.
I think the difference is that Intel probably knows the problem could manifest quite frequently, if some software tickles it in just the right way.

Another key difference is that AMD was a lot quicker in diagnosing the root cause and issuing a BIOS fix. If they stopped shipment of new CPUs, it'd have only been for a few days (and who knows? maybe they did!).

This is not even making the CPUs explode; it just could interrupt system operation under certain conditions.
"Interrupt system operations" = system hang, which likely means a hard reboot + potential for data corruption.

The real question is about the relative frequency. We only know of a handful of AMD CPUs that actually failed in the wild. If Intel believes its problem can manifest quite frequently, then the calculus is different.

as always you are the one throwing stones around.
No, I'm not the partisan operative, here.
 
It's getting "normal" to market new stuff without extensive testing. Quality is less important than making the deadlines these days.
What's ironic about that statement is that Sapphire Rapids was years late to market, depending on which roadmap you look at.

Even as of quite recently, a volume ramp was expected in '22, but it didn't happen until earlier this year. Here's a slide Intel published in mid-Feb '22, showing they still expected a 2022 launch:

Slide.png

 
More like the exploding CPUs where AMD did not stop sending out their CPUs but rather opted to blame everybody else.
AMD should have recalled every CPU and fixed their microcode or temperature sensor or whatever the problem is, just like Intel does here.
They opt not to send out CPUs they know have an issue and, if they can, they fix them before sending them out.

This is not even making the CPUs explode; it just could interrupt system operation under certain conditions. As always, you are the one throwing stones around.

"We became aware of an issue on a subset of 4th Generation Intel Xeon Medium Core Count Processors (SPR-MCC) that could interrupt system operation under certain conditions
...
...
Out of an abundance of caution, we did temporarily pause some SPR MCC shipments while we gained confidence in the expected firmware mitigation and expect to release remaining shipments shortly."
Is it Intel's official stance on the matter, or your personal take? As an Intel employee, you should always make that clear, because this load of horsecrap wouldn't even fly on a private Slack channel.
 
More like the exploding CPUs where AMD did not stop sending out their CPUs but rather opted to blame everybody else.
AMD should have recalled every CPU and fixed their microcode or temperature sensor or whatever the problem is, just like Intel does here.
They opt not to send out CPUs they know have an issue and, if they can, they fix them before sending them out.

This is not even making the CPUs explode; it just could interrupt system operation under certain conditions. As always, you are the one throwing stones around.

"We became aware of an issue on a subset of 4th Generation Intel Xeon Medium Core Count Processors (SPR-MCC) that could interrupt system operation under certain conditions
...
...
Out of an abundance of caution, we did temporarily pause some SPR MCC shipments while we gained confidence in the expected firmware mitigation and expect to release remaining shipments shortly."


Your take is dumb.

Because none of these chips should have been fed such large voltages by the motherboards.
Voltages that were out of spec (like Asus delivering almost 1.5 volts, well above the spec).

Which, to be honest, is no surprise.
Let's remember that some motherboards used to increase voltage and remove boost limits on Intel CPUs to appear "better" in benchmarks under "default" motherboard settings.
The only thing to blame AMD for was how this could cascade into the chips burning themselves instead of shutting down and blocking everything.

Many testers (like Gamers Nexus) showed that motherboards were throwing higher-than-normal voltage on many rails even when they were configured not to do so.
And many motherboards had the sensors and technology to detect such failures but opted to ignore them and add other crap like RGB.
 
"Intel also told us that it doesn't expect the firmware mitigation to have an impact on performance."

The words "doesn't expect" probably mean "There will most likely be a performance impact, but because we haven't finished the new firmware, we don't know what that impact will be."
 
Your take is dumb.

Because none of these chips should have been fed such large voltages by the motherboards.
Voltages that were out of spec (like Asus delivering almost 1.5 volts, well above the spec).

Which, to be honest, is no surprise.
Let's remember that some motherboards used to increase voltage and remove boost limits on Intel CPUs to appear "better" in benchmarks under "default" motherboard settings.
The only thing to blame AMD for was how this could cascade into the chips burning themselves instead of shutting down and blocking everything.

Many testers (like Gamers Nexus) showed that motherboards were throwing higher-than-normal voltage on many rails even when they were configured not to do so.
And many motherboards had the sensors and technology to detect such failures but opted to ignore them and add other crap like RGB.
You do realize that all motherboard vendors supplied almost 1.5V to the 3D chips? Yes, I also blame AMD for providing such a poor BIOS to manufacturers. Asus was bashed because of its attitude: releasing a beta BIOS (provided by AMD) and saying that "by installing this BIOS, warranty is voided", which is a very bad attitude. Also, the first beta BIOS published by every vendor had not fixed the issue (again, all vendors were doing 1.38 or 1.39 volts), which is an indication that all were supplied with the same bad BIOS from AMD. Stop defending AMD in this situation, because it is the main culprit. The only thing Asus deserves bashing for is the attitude, not the issue with exploding CPUs.