News 16-Pin Connectors Are Still Melting On RTX 4090 GPUs

PlaneInTheSky

Commendable
BANNED
Oct 3, 2022
556
759
1,760
I found the idea of a connector not fully seated causing a meltdown a weird explanation to begin with. I have never seen this happen with other connections.

Either there's a connection or there's not, there's no middle of the road where there's half a connection and the resistance somehow causes heat to build up, that seems like a fantasy explanation to blame the users.
 
*GTA:SA meme*

For real... Why can't nVidia just use 3 8pins or at least allow its partners to rework the power delivery to do so? Well, I guess they'd need 4 8pins, but you can go over the 150W limitation on the 8pins and still be well within the yellow line of 300W per, so...

Regards.
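For a rough sense of the arithmetic in the post above, here is a minimal Python sketch. It uses the 150W per 8-pin figure the post mentions. The 450W and 600W targets are illustrative assumptions (the RTX 4090's rated board power and the 16-pin connector's rated ceiling), the function name is hypothetical, and the 75W available from the PCIe slot is ignored.

```python
import math

PCIE_8PIN_LIMIT_W = 150  # per-connector limit cited in the post


def eight_pins_needed(board_power_w, per_connector_w=PCIE_8PIN_LIMIT_W):
    """Return how many 8-pin connectors cover board_power_w at the given per-connector limit."""
    return math.ceil(board_power_w / per_connector_w)


# Illustrative targets (assumptions, not figures from the thread)
for target_w in (450, 600):
    print(target_w, "W ->", eight_pins_needed(target_w), "x 8-pin")
```

Passing a higher `per_connector_w` shows how the count shrinks if the connectors are run above the 150W figure, which is the trade-off the post is pointing at.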
 
Reactions: PEnns

RichardtST

Notable
May 17, 2022
236
264
960
I found the idea of a connector not fully seated causing a meltdown a weird explanation to begin with. I have never seen this happen with other connections.

Either there's a connection or there's not, there's no middle of the road where there's half a connection and the resistance somehow causes heat to build up, that seems like a fantasy explanation to blame the users.
Well, no. There are most certainly good connections and bad connections and everything in between. Roughly 90% of electrical problems are simply loose connections. The "user error" argument, however valid it might have been, is simply wrong though. The connector itself is what is prone to users causing errors, which means that the connector is not user-proof, which makes it the connector's fault. We can thank Intel for this particular connector design snafu.

The biggest problem with this connector is that no one can actually admit fault because that would make them liable for all sorts of millions in damages... I don't expect the connector to go away unless people simply refuse to buy anything that uses it. I know I won't.

Want this problem to go away?
Just stop buying devices with this connector.
Simple as that.
 
The connector itself is what is prone to users causing errors, which means that the connector is not user-proof, which makes it the connector's fault.
As I have said before, if engineers had to design products for the unending number of buffoons on this planet, there would be no products. You cannot "idiot-proof" something. It is an impossible task. The connectors with bad connections are user error through and through. Are there a few faulty connectors made? Sure, just ask CableMod. Nvidia is having manufacturers make multitudes more for their products. I am sure you can make a connector that, when fully plugged in by the user, causes issues similar to when it is not plugged in properly, because manufacturing is a game of tolerances and QC passes. Something will get through the cracks here and there.
 
Reactions: daworstplaya

edzieba

Distinguished
Jul 13, 2016
438
431
19,060
I found the idea of a connector not fully seated causing a meltdown a weird explanation to begin with. I have never seen this happen with other connections.
I've seen it happen plenty, mainly with the connector inserted at an angle rather than straight on - the angled connector will 'jam' due to the angle (same as any item inserted into a close-fitting socket will jam when off-angle), and depending on pin vs. housing geometry this can occur with all, none, or a subset of the contacts actually making electrical contact. You can have a connector jammed in so hard it's near-impossible to remove and no amount of force can seat it in further, but because it is jammed at an angle it is not actually seated correctly.

It's also notable that nobody managed to recreate the melting behaviour with fully seated connectors regardless of what damage they applied to the connector first (e.g. trying to fracture solder joints), but testing with connectors not fully inserted resulted in the overheating behavior being replicated reliably.


Ideally a connector will be designed such that the housing will jam before any pins contact when inserted off-angle, but that is a very tricky tolerance dance. If anyone is to 'blame' in this situation, it is not Nvidia or PCI-SIG, but Molex - the 16-pin PCIe connector is from their Micro-Fit line.
 

RichardtST

Notable
May 17, 2022
236
264
960
As I have said before, if engineers had to design products for the unending number of buffoons on this planet, there would be no products. You cannot "idiot-proof" something. It is an impossible task. The connectors with bad connections are user error through and through. Are there a few faulty connectors made? Sure, just ask CableMod. Nvidia is having manufacturers make multitudes more for their products. I am sure you can make a connector that, when fully plugged in by the user, causes issues similar to when it is not plugged in properly, because manufacturing is a game of tolerances and QC passes. Something will get through the cracks here and there.
Except that this is not a "here or there" problem. Too high a percentage of users are having the same problem for the connector to be considered safe. Idiot-proofing devices is not difficult. It just requires the designer to be smarter than the idiot. That is not always the case, as is shown with this example here.
 
Reactions: PEnns

MoxNix

Distinguished
Jul 27, 2014
73
42
18,560
Well that picture makes it clear what the problem is. What idiot came up with that design and how did it make it through testing without anyone noticing and correcting the design?

The problem is those 4 smaller pins that are set back from the larger pins. It'd be very easy to snap it in with the large pins fully seated but not the little ones, especially since the latch is on the side away from the little ones.

All pins should be the same and flush with each other at the mating surface. The problem isn't idiot users, it's the idiots who designed and tested the thing. Any competent technician / technologist would have noticed long before it got to manufacturing!
 
Reactions: PEnns
Except that this is not a "here or there" problem. Too high a percentage of users are having the same problem for the connector to be considered safe. Idiot-proofing devices is not difficult. It just requires the designer to be smarter than the idiot. That is not always the case, as is shown with this example here.
Show me any product and I can tell you how an idiot will break, fry, destroy, or otherwise negligently cause the demise of their product. If you design products to be idiot-proof for the ever growing population of uniquely gifted idiots, there will be no products. This is a supertask. Every product is designed with a "good enough" approach, and anything that "slips through the cracks" is a calculated risk by the designers.

Let's say there are 100 cases where a 4090 fried its connector or port. If 70 of them were user error, 12 of them were bad ports on the 4090, and the remaining 18 were faulty 12-pin connectors, you would be looking at an approximate 12/10,000s rate for bad ports and 18/10,000s for the connector. Seems like a completely reasonable design to me.
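The rates in the post above depend entirely on how many cards are in circulation, which the post only gives as "10,000s". As a minimal sketch, the Python below turns the post's hypothetical split of 100 cases into per-cause rates. `ASSUMED_UNITS` is a placeholder denominator, not a sales figure, and the function name is hypothetical.

```python
def failure_rate(cases, units):
    """Return the fraction of units affected."""
    return cases / units


# Assumption: placeholder denominator; the post only says "10,000s" of cards
ASSUMED_UNITS = 10_000

# Hypothetical split of 100 cases, taken from the post
reported = {"user error": 70, "bad port": 12, "faulty connector": 18}

rates = {cause: failure_rate(n, ASSUMED_UNITS) for cause, n in reported.items()}
for cause, rate in rates.items():
    print(f"{cause}: {rate:.2%}")
```

Changing `ASSUMED_UNITS` scales every rate by the same factor, so the conclusion is only as good as the unit count fed in.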
 
Reactions: daworstplaya

USAFRet

Titan
Moderator
Show me any product and I can tell you how an idiot will break, fry, destroy, or otherwise negligently cause the demise of their product. If you design products to be idiot-proof for the ever growing population of uniquely gifted idiots, there will be no products. This is a supertask. Every product is designed with a "good enough" approach, and anything that "slips through the cracks" is a calculated risk by the designers.

Let's say there are 100 cases where a 4090 fried its connector or port. If 70 of them were user error, 12 of them were bad ports on the 4090, and the remaining 18 were faulty 12-pin connectors, you would be looking at an approximate 12/10,000s rate for bad ports and 18/10,000s for the connector. Seems like a completely reasonable design to me.
If it is simply user error, and not a design error that lets idiots do it wrong, then why have we not seen fails like this before?

The same idiots were connecting GPUs last year, apparently without this level of meltage.
 
If it is simply user error, and not a design error that lets idiots do it wrong, then why have we not seen fails like this before?

The same idiots were connecting GPUs last year, apparently without this level of meltage.
Do you mean with other connectors like the 8-pin? If so, this is because the 8-pin connector is not a long enough connector to physically have the connections on one side cause this potential issue. It is cheaper to replace a few tens of cards in the X0,000s of 4090s produced than to make all of them have tighter tolerances to the degree that the connector only connects flush on either side. As I alluded to and another has mentioned, most of the recent fried 4090s were caused by third-party CableMod 90-degree 12-pin connectors.
 

USAFRet

Titan
Moderator
Do you mean with other connectors like the 8-pin? If so, this is because the 8-pin connector is not a long enough connector to physically have the connections on one side cause this potential issue. It is cheaper to replace a few tens of cards in the X0,000s of 4090s produced than to make all of them have tighter tolerances to the degree that the connector only connects flush on either side. As I alluded to and another has mentioned, most of the recent fried 4090s were caused by third-party CableMod 90-degree 12-pin connectors.
I meant - Other previous connection styles from a PSU to a GPU.

This one on the 4090 is slightly different than previous?

Great. But if so....that new design needs to be tested with the aforementioned idiots.


But if it is a case of these CableMod connectors...that is a whole other issue.
 
I found the idea of a connector not fully seated causing a meltdown a weird explanation to begin with. I have never seen this happen with other connections.

Either there's a connection or there's not, there's no middle of the road where there's half a connection and the resistance somehow causes heat to build up, that seems like a fantasy explanation to blame the users.
What? Loose electrical connections are a huge cause of failures/fires!! Everything from loose/corroded connections in vehicles to worn/faulty outlets in homes have caused melted plugs and/or fires.

"Either there's a connection or there's not". 100% patently false! This isn't a digital signal we're dealing with here.
 
Apr 1, 2020
1,447
1,104
7,060
Even if the connector was plugged in completely, it has been shown that due to manufacturing cheapness the wires themselves may not stay in position. Considering the number of failures compared to units sold, that makes the most sense to me. Also, considering the amount of wattage being pulled through one connector instead of being balanced across multiple, the stress over time could be causing these additional failures.

This is why I think it should have been designed using a coaxial cable type or BNC type connector. For a connector designed to handle 500w or more in the future, it needs to be both as strong and secure as possible and have the fewest points of failure.
 
Reactions: helper800
With the implementation requirements in the spec being so loose I'm unsurprised this has happened. This really seems to be a case of everyone failing along the way with all of the companies sharing a bit of the blame. In theory the 16 pin connector is far superior, but it clearly needed more fine tuning before going live.
 
Even if the connector was plugged in completely, it has been shown that due to manufacturing cheapness the wires themselves may not stay in position.
That is not true. I don't know where you got that information from. Instead the reason that these problems still crop up is because people have not magically gotten any smarter in the last few months. Search "PCIe melting" .. there's nothing new about this and it'll never go away because people simply make foolish mistakes.
 
Apr 1, 2020
1,447
1,104
7,060
LOL so would properly seating the connector. The 12VHPWR connector can easily carry 3x or more power and still not fail.

Again search .. PCIe melting .. look at the images. The excess heat is caused by improper seating and the resulting arcing.

The 'RTX 50' series might add a second 12VHPWR connector. Otherwise imagine how insane 8 x 8-pin connectors would be, never mind the risk of failure. The connector is nothing that Nvidia designed.
 

Vanderlindemedia

Prominent
Jul 15, 2022
97
55
610
I found the idea of a connector not fully seated causing a meltdown a weird explanation to begin with. I have never seen this happen with other connections.

Either there's a connection or there's not, there's no middle of the road where there's half a connection and the resistance somehow causes heat to build up, that seems like a fantasy explanation to blame the users.

Of course it does. The less "contact", the higher the resistance and thus the more heat. The heat is what causes the connectors to start melting.
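The relationship described above is Joule heating, P = I²R: at a given current, the heat generated in a contact grows in proportion to its resistance. A minimal Python sketch follows. The 600W load, the even split across six 12V pins, and both contact resistance values are illustrative assumptions, not measurements from this thread.

```python
def contact_heat_w(current_a, contact_resistance_ohm):
    """Joule heating in a single contact: P = I^2 * R."""
    return current_a ** 2 * contact_resistance_ohm


TOTAL_POWER_W = 600.0  # assumption: connector running at its rated maximum
RAIL_V = 12.0
POWER_PINS = 6         # 12V pins, assumed to share the current evenly

per_pin_a = TOTAL_POWER_W / RAIL_V / POWER_PINS  # roughly 8.3 A per pin

# Illustrative contact resistances (assumed values)
for label, r_ohm in (("good contact", 0.005), ("poor contact", 0.050)):
    print(f"{label}: {contact_heat_w(per_pin_a, r_ohm):.2f} W per pin")
```

Because current enters squared, the picture gets worse if some pins stop carrying their share: the remaining pins take more current, and their heating rises with the square of that increase.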

I know that even removing / re-seating a PCI-E 8-pin or 6-pin cable will, over time, actually erode the contacts. I have a perfect example of this, where a cable I used for years started to show the same problem.

The VRM input voltage would drop significantly, to 11V at full load, which is out of spec. When I swapped the cable, the voltage was steady at 12.1V again at full load. That was an RX580 peaking at 350W in Furmark.
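The figures in the paragraph above allow a rough estimate of how much extra resistance the worn cable added. This is only a sketch under simplifying assumptions: the full 350W is taken to flow through the cable, and the whole 12.1V to 11V difference is attributed to the worn cable and its contacts. The function name is hypothetical.

```python
LOAD_W = 350.0  # peak load reported in the post
V_GOOD = 12.1   # input voltage with the new cable
V_WORN = 11.0   # input voltage with the worn cable


def extra_resistance_ohm(load_w, v_good, v_worn):
    """Estimate added series resistance from the extra voltage drop under load."""
    current_a = load_w / v_worn  # assumption: full load current flows through the cable
    return (v_good - v_worn) / current_a


r_extra = extra_resistance_ohm(LOAD_W, V_GOOD, V_WORN)
current_a = LOAD_W / V_WORN
heat_w = (V_GOOD - V_WORN) * current_a  # power lost in that extra resistance

print(f"extra resistance ~ {r_extra * 1000:.1f} mOhm, lost as heat ~ {heat_w:.1f} W")
```

Under these assumptions a few tens of milliohms is enough to account for the reported drop, and the power lost in that resistance ends up as heat in the cable and connector.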

The pins are extremely sensitive, and they are smaller compared to a regular 8-pin. So yeah, there's 100% a thing going on with resistance. It's not just the PSU.
 
Reactions: MoxNix

Giroro

Splendid
Show me any product and I can tell you how an idiot will break, fry, destroy, or otherwise negligently cause the demise of their product. If you design products to be idiot-proof for the ever growing population of uniquely gifted idiots, there will be no products. This is a supertask. Every product is designed with a "good enough" approach, and anything that "slips through the cracks" is a calculated risk by the designers.

Let's say there are 100 cases where a 4090 fried its connector or port. If 70 of them were user error, 12 of them were bad ports on the 4090, and the remaining 18 were faulty 12-pin connectors, you would be looking at an approximate 12/10,000s rate for bad ports and 18/10,000s for the connector. Seems like a completely reasonable design to me.
If your design for a consumer product can burn down a house if it's a fraction of a millimeter out of place, then your design isn't "good enough".
 
Reactions: MoxNix

aberkae

Distinguished
Oct 30, 2009
105
31
18,610
Wow. I have a similar setup: a 4090 Suprim Liquid at default settings since launch (10/22) and a 750 watt SFX Platinum PSU, although I am running an open case and give the wires minimal bend in an ITX H210 case. Also, while not relevant, I was previously rocking the 9900KS at 5.1 GHz all-core and upgraded to a 7700X with an all-core OC at 5.65 GHz since launch. Lastly, I have only 3 out of 4 of the PCIe 8-pin cable connectors going to the PSU. If something goes wrong I'll definitely post it here.