News: The GPU 16-pin melting fiasco is getting ridiculous: now an entire Nvidia RTX card turns into a red ring of death when incorrectly plugged in

Prominent warning only if you use a glass box. So the design failure continues.

me -> grabbing popcorn while waiting for Nvidia class action... or should that be glass action?

-> I know the joke is bad... leaving now!
 
I have commented on this so many times:

"If they are Hell bent on sticking with this configuration on the GPUs, they should be using interleaved PWM switching stages, with one power stage used per pin. In other words, do not tie the PSU pins together in parallel: Treat each pin as its own supply with its own Buck converter causing equal current draw. This allows the PWM controller to force each pin to draw a balanced load and not rely on uncontrolled parallel path resistances."

Since they are probably using interleaved buck converters anyway for a power rating this high, this would not incur much of a price increase for the GPU. It might require some added input filter capacitance on each buck converter due to conducted EMI back toward the main PSU. The nutty thing here is that these interleaved buck converters already create equal currents in their power stages inherently, yet their inputs get tied together and accept power from a paralleled wiring system.
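The parallel-path problem is easy to put in numbers. A minimal sketch (all resistance values hypothetical, chosen only for illustration) of how current divides when the pins are simply tied together:

```python
# Sketch: current division across paralleled 12V pins with unequal contact
# resistance. Resistance values are made-up illustrations, not measurements.

def parallel_pin_currents(total_current_a, pin_resistances_ohm):
    """Current through each pin divides in inverse proportion to its
    resistance (Ohm's law across a common voltage drop)."""
    conductances = [1.0 / r for r in pin_resistances_ohm]
    g_total = sum(conductances)
    return [total_current_a * g / g_total for g in conductances]

# 600 W at 12 V = 50 A total. Suppose five pins have oxidized contacts
# (10 mOhm) and one pin made a clean, low-resistance contact (1 mOhm).
resistances = [0.010] * 5 + [0.001]
currents = parallel_pin_currents(50.0, resistances)
print([round(i, 1) for i in currents])
# -> [3.3, 3.3, 3.3, 3.3, 3.3, 33.3]
# The one good pin hogs ~33 A, far beyond any single pin's rating, which is
# exactly the failure mode a per-pin buck stage would prevent.
```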

This is why companies should have a seasoned consultant come in and review their designs. In this case, a power electronics guy.

Mercy, this is such an obvious and simple fix. And I mean bonehead simple.
 
And some suggestions to use these in the meantime (from a previous post of mine):

1. Do not use cabling wrapped in a wiring loom, which constricts the airflow around the wires. The connector datasheets assume decent wire cooling to draw heat away from the connector housing at rated current, so no "pretty" nylon shroud covering the wiring. If a loom exists, remove it so the wire insulation can convection-cool better. If you really want the wiring looms, consider current/power derating when using them (like only below ~300 W).

2. The wire is expected to act as a heatsink for the connector pins and sockets. This is simple physics used in ALL connector systems: the pin and socket contact points are tiny, much smaller than the wire cross-sectional area bringing current to them. For this reason, suppliers should find a way to use larger wire than is presently used, or even tool up a larger pin that fits the housing.

3. Since the pin and socket contact points have oxidized and corroded before you even install them, plug and unplug the connector several times to knock the oxidation off. Don't get excessive with it, because you don't want to create fretting failure by scraping off too much tin plating; about 3-5 times is great. One post I read stated that these pins are only rated for 50 insertions over their entire lifetime. Make sure the last insertion is seated well. Consider how long the GPU took to reach you: it sat in a transport ship in high humidity, and was likely soldered together in a factory with high humidity as well.

4. You want to get the heat out of the cable ends, so mount a fast fan by the GPU connector as well as at the PSU connector. The imbalance can be caused at both ends of the cable! Mount the fans close to the connectors so that the airflow is "impingement" cooling.

I hope these suggestions help people while the industry addresses these shortcomings.
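As a rough illustration of why points 1 and 2 matter, here is a back-of-envelope sketch of the Joule heating at a single contact point (the contact resistance value is a hypothetical figure, not a datasheet number):

```python
# Sketch: heat generated at one pin contact, P = I^2 * R_contact.
# The tiny contact spot cannot shed this heat itself; it conducts into the
# wire, which is why looms that block wire cooling make things worse.
# The 5 mOhm contact resistance below is a made-up illustrative value.

def contact_watts(current_a, contact_resistance_ohm):
    """Joule heating dissipated at a single contact."""
    return current_a ** 2 * contact_resistance_ohm

balanced = contact_watts(8.33, 0.005)  # 600 W / 12 V spread over 6 pins
hogging = contact_watts(25.0, 0.005)   # one pin carrying ~3x its share
print(round(balanced, 2), round(hogging, 2))
# Heating scales with the square of current: a pin carrying 3x its share
# dissipates ~9x the heat at the same contact.
```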
 
"If they are Hell bent on sticking with this configuration on the GPUs, they should be using interleaved PWM switching stages, with one power stage used per pin.
Exactly this. It's understandable that they made this error when the connector was first introduced. Historically, this kind of fault prevention wasn't valuable enough to justify the cost/complexity (particularly on the design side), mostly because these are pretty low-margin products for the manufacturer.

Now, the fact that they're low-margin products dictates the opposite: they can easily eat through most of their ROI with warranty replacements and bad press. And since this has been a known issue for several years, they can't exactly plead ignorance.

Nvidia's guidance was initially rather vague, but it has since gotten quite specific; it's confusing to me why manufacturers are so slow to address the issue.

Not that Nvidia is blameless. The standard should probably work more like USB-C, such that the pins are independently and separately verified; however, anything sophisticated enough to truly verify (e.g. PSUs measuring resistance through test loads on each pin) would add a lot of cost/complexity to PSUs, a component already famous for corner-cutting on low-end products. But if USB-PD can solve this problem with low-cost commodity chips, I suspect this is a solvable problem that only Nvidia is in a position to solve.
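The per-pin verification idea could look something like the logic below. This is a hypothetical sketch; the function name, limits, and units are all invented for illustration, not part of any existing standard:

```python
# Hypothetical sketch of a pre-power-on per-pin check: drive a small test
# current through each pin, measure resistance, and refuse full power if any
# pin is bad or the pins are badly mismatched (mismatch is what lets one pin
# hog the current). All thresholds below are made-up illustrative values.

def pins_ok(measured_milliohms, absolute_limit=10.0, spread_limit=3.0):
    """Pass only if every pin is under an absolute resistance limit AND the
    pins are well matched, so current will share roughly evenly."""
    if max(measured_milliohms) > absolute_limit:
        return False
    return (max(measured_milliohms) - min(measured_milliohms)) <= spread_limit

print(pins_ok([1.2, 1.4, 1.1, 1.3, 1.2, 1.5]))  # healthy, well matched -> True
print(pins_ok([1.2, 1.4, 1.1, 9.8, 1.2, 1.5]))  # one bad contact -> False
```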
 
The design and shunt configuration on the VGA side of 12VHPWR and 12V2x6 is a disaster by all metrics and means. That said, I've installed close to a hundred RTX 4070/4080/Super/Ti/4090, 5080, and 5090 cards; some had bad values on particular rails (I use a clamp ammeter and measure each rail manually when it's not an Astral model), and I've had zero issues so far. Yes, some cables went up to 10 A/rail, so I replaced the whole cable, but so far so good. I'm well aware that ~100 units is a small sample and is anecdotal information at best, but still...🤷‍♂️
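The manual check described above can be sketched as a simple pass over per-rail readings. The 10 A replacement threshold comes from the post; the measured values are invented for illustration:

```python
# Sketch of the poster's clamp-meter routine: measure each 12V rail and
# replace the cable if any rail meets or exceeds the threshold.
# The 10 A limit is from the post; the sample readings are hypothetical.

def rails_to_replace(rail_amps, limit_a=10.0):
    """Return the indices of rails at or above the replacement threshold."""
    return [i for i, amps in enumerate(rail_amps) if amps >= limit_a]

print(rails_to_replace([8.1, 8.4, 7.9, 10.2, 8.0, 8.3]))
# -> [3]  (rail 3 is over the limit, so this cable gets swapped)
```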
 
I've got an even better solution, though AIBs aren't going to like it. The GPU should be an add-on chip in a dedicated motherboard socket surrounded by VRAM sockets. You pick your GPU; you pick your VRAM configuration. You want to start out with a 4060 and upgrade down the road? You got it. You want to start with 8 GB and go to 16 GB later? You got it. All on a minimum 256-bit bus.
 
What is this new news? From the article: "To address the ongoing issue, graphics card maker Galax has introduced a new solution aimed at warning users of potential failure. Its latest Hall of Fame (HOF) series GPUs, including the RTX 5080 and RTX 5070 Ti variants, feature ARGB lighting that also functions as a debug LED."

Galax introduced this feature with the 4000 series, or rather the 4090 HOF, over two years ago. So thanks for yesterday's news packaged as today's news :)
 
I have a dedicated 10,000 BTU air conditioner duct blowing directly on my 12V-2x6 power cables. I also have a live-broadcast thermal camera monitoring the temperature of said cables, which I can check from anywhere, even when out of town. This way I can be sure that my $3k GPU, which could easily have had built-in precautions and fail-safes, doesn't self-immolate and take my whole house with it.

/s

I'm still running a 4080 super and will probably keep it until it dies. 50 series is beyond a joke.
 
I've got an even better solution, though AIBs aren't going to like it. The GPU should be an add-on chip in a dedicated motherboard socket surrounded by VRAM sockets. You pick your GPU; you pick your VRAM configuration. You want to start out with a 4060 and upgrade down the road? You got it. You want to start with 8 GB and go to 16 GB later? You got it. All on a minimum 256-bit bus.
If you thought a motherboard was expensive just wait until you add enough layers and LGA sockets for all of that.

The cheaper solution is to just transition everyone to EPS connectors like they did in servers.
 
If you thought a motherboard was expensive just wait until you add enough layers and LGA sockets for all of that.

The cheaper solution is to just transition everyone to EPS connectors like they did in servers.

I think with the extremely high power levels we are using now, we are moving toward the motherboard receiving bulk power at a higher voltage from the main PSU and bucking it down to what the motherboard needs, with the peripherals deriving power off that bulk feed. The 12V rails are too low for systems using 1 kW+ of power. They should consider a 28 VDC bulk power system, which would drop the current levels to about 43% of what we have now and stay below the UL shock-hazard rules. UL requires the voltage to be below, I think, 33 V (AC and DC?). Which I always find interesting, because you can get a mild shock with sweaty skin even at only 12 V.
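The arithmetic behind that current reduction is just Ohm's law at constant power; a quick sketch:

```python
# Sketch: bus current for the same delivered power at 12 V vs a 28 V bulk
# rail. For constant power, current scales as the inverse ratio of voltages.

def bus_current(power_w, bus_v):
    """I = P / V for a DC bus delivering power_w at bus_v."""
    return power_w / bus_v

i12 = bus_current(1000, 12)  # ~83.3 A at 12 V for a 1 kW system
i28 = bus_current(1000, 28)  # ~35.7 A at 28 V for the same 1 kW
print(round(i28 / i12, 2))
# -> 0.43, i.e. the current drops to 12/28 of its 12 V value (~43%)
```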

The gotcha with the 5V/12V rails we are using now: the PC industry negotiated this with UL many moons ago, when PCs did not use more than ~150 W, and UL agreed that 12 V did not represent a shock hazard. This is why we can have non-fused, high-current 5V/12V rails in a metallic box (the old-school PC case); the box keeps the fire inside if the 5V or 12V wiring gets incandescent. Except the industry forgot the metallic-box rule, and cases are now sold independently of the PC manufacturers, even wooden and plastic ones! The earliest PCs had a UL sticker right on them (IBM XT-compatible days).

And it isn't just the PC industry messing up the recipe. For instance, solar panel installations with batteries and grid-tie inverters are causing tons of fires. The home insurance industry is starting to notice and is declining to insure houses with solar systems (a big trend in Florida). The panels catch fire, as do the batteries, inverters, and even the wiring (just like these GPU cables). In the case of solar, installers also need to understand the lightning implications of mounting a metal-framed solar panel on a roof: that frame ground needs to be sized for lightning, not just for solar power fault currents.

It is VERY common for people to make crucial design decisions without realizing that they don't understand what they need to know. If the design flaw is bad enough and widespread enough, it becomes obvious. And yes, they will find a way to blame the end user for not plugging the cable in correctly. The hair dryer industry tried that one, too.
 
I think with the extremely high power levels we are using now, we are moving toward the motherboard receiving bulk power at a higher voltage from the main PSU and bucking it down to what the motherboard needs, with the peripherals deriving power off that bulk feed. The 12V rails are too low for systems using 1 kW+ of power. They should consider a 28 VDC bulk power system, which would drop the current levels to about 43% of what we have now and stay below the UL shock-hazard rules. UL requires the voltage to be below, I think, 33 V (AC and DC?). Which I always find interesting, because you can get a mild shock with sweaty skin even at only 12 V.
I could see that working as a more long-term solution: something like a dual-output PSU that retains 12V/5V/3.3V for all the existing fan controllers and other add-in parts while providing the high-power outputs for new motherboards and GPUs.

EPS's only downside is the 8-pin form factor, which is easy to confuse with 8-pin PCIe. I think that is the only reason it was avoided on the consumer side.
 
Next up, maybe implement an AI function utilizing the powerful AI hardware of the RTX cards: when power draw isn't ideal, a Hollywood sci-fi sound announces, "Warning, imminent melting of connector in 10... 9..."

/s
 
Why could the manufacturers not have designed these new power-hungry GPUs with parts that handle the maximum load without burning up, with limiters in the GPU and power cables, and with a fail-safe (like a fuse with a shutdown feature) that trips before the GPU and cables are destroyed? I know money and profit are everything these days for the card manufacturers, BUT when you are paying between $2,500 and $5,000 for a new GPU, you expect it to perform within its operating parameters without danger of it or the cables burning up and possibly destroying the entire PC. High-end PCs are a major investment these days. Just food for thought. I hope they can find a solution that works for people who need these high-end GPUs. Also, the new high-end PSUs should be able to provide clean power with fail-safe, sensor-enhanced shutdown built in, to stop the system before a disaster takes out the GPU and the rest of the hardware. Cheers from an old-school PC builder and gamer.
 
I think everyone misses something obvious: most people don't expect cables to get even slightly hot, never mind melt.

The moment you feel a cable get hot, you start thinking about replacing it, because common sense tells you something is wrong somewhere. The notion that hot cables are normal is completely foreign.

People also never test the temperature of cables once a chassis is closed; at most they use motherboard sensors. They just never knew about any of this before this saga started.

On the other hand, how is this anything other than the manufacturer f-ing up its design and blaming the customer? Imagine if this happened with motor vehicles. How is it different from the last decade of phones and cars randomly catching fire because of developer overreach?
 
If there is functionality to detect that the plug is not fully connected, why in the wide world of sports would you go with changing the LED color as a warning? It seems obvious that the card should be blocked from drawing high current, with the LED operating as a diagnostic error code.

I'm struggling to imagine that design/development conversation. "If the cable isn't fully connected, should we disable the card and tell the user to reconnect it?" "No, just warn them it is going to catch on fire."
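What this amounts to is ordinary fault handling. A hypothetical sketch of the behavior being asked for, with all names, the 75 W fallback figure, and the error code invented for illustration (this is not any vendor's actual firmware):

```python
# Sketch: on a detected seating fault, clamp power draw to a safe level and
# surface a diagnostic code, rather than only recoloring an RGB LED.
# SAFE_LIMIT_W and the error string are made-up illustrative values.

SAFE_LIMIT_W = 75  # e.g. a slot-power-only fallback (assumption)

def on_connector_fault(requested_w):
    """Cap the card's power request and report a machine-readable fault."""
    power_cap = min(requested_w, SAFE_LIMIT_W)
    return {"power_cap_w": power_cap, "error_code": "PWR_CONN_NOT_SEATED"}

print(on_connector_fault(600))
# The card stays usable at reduced power and the user gets an actionable
# error, instead of full current flowing through a half-seated connector.
```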