[SOLVED] Having a very specific issue with 2x RTX 2080 Ti (NVLink/SLI)

xenemorph

Reputable
Feb 5, 2017
I've pretty much narrowed this issue down to my graphics cards, and specifically to when they're in SLI (NVLink).

Below are the symptoms:

1) I can only view my BIOS screen through graphics card #1, and I can only view Windows through graphics card #2 (i.e., only when the monitor is plugged into that card).

2) The display goes black and the graphics cards ramp to 100% fan speed, but audio keeps playing and I have reason to believe applications are still running. Once in a while (very rarely) I can force the game to crash, and the display comes back.

This happens when:

a) Playing graphics-intensive games on max settings
b) Playing a game and accidentally launching another game
c) Playing a game while graphics acceleration is enabled in other apps like Discord
d) Playing a game that may have issues with SLI (like World of Warcraft Classic)


Issues I have ruled out:

a) The display (I have multiple displays)
b) All the other hardware (everything functions properly with SLI disabled)
c) The cables (I've tried multiple, and also HDMI vs DisplayPort)

Other Notes

a) Graphics card #1 does get a lot hotter than #2 because of its location, but the fans kick in and it never goes above the recommended temperature.
b) I suspect the power supply may be part of the problem, since a 700W unit is powering 2x 2080 Ti and an i9-9900K, but that doesn't explain everything.
c) Nothing is overclocked.
d) Graphics drivers are updated to the latest version. Interesting note: WoW Classic worked fine in SLI before the recent Nvidia driver update.
e) I used the Quadro NVLink bridge because the official RTX one doesn't come in 2-slot spacing. It's supposed to be the same; Nvidia just doesn't/didn't offer official 2-slot RTX 2080 Ti bridges because of heat concerns.

System Specs:

ASRock Z390 Taichi Ultimate
i9-9900K
2x EVGA RTX 2080 Ti Black (11G-P4-2281-KR)
1TB NVMe 960 Evo SSD
2TB Micron 1100 SSD
700W Silverstone SX700-LPT
2x Lian Li PCIe Riser Cables (PW-PCI-E38-1)
1x Corsair Vengeance LPX 16GB PC4-17000 C13
Noctua NH-L12S CPU Cooler
2 Slot Quadro RTX NVLink



Could this be from Graphics card #1 being faulty? Is it just bad SLI drivers? Could it be the NVLink?

(edit) I just tried underclocking both cards and turning down the power limit, and so far no black screen. The main fix may simply be getting a more powerful PSU. However, I'm still concerned that one card works for the BIOS and the other for Windows.
 

Phaaze88

Titan
Ambassador
The power supply is one of the reasons. It's simply not enough to run everything.
9900K (250W) + 2x 2080 Ti Black (280W each) + 50W to account for everything else = 860W.
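
Here's the same arithmetic as a quick sketch, with a rough headroom allowance on top (Python, purely illustrative; the wattages are the rated figures above and the 20% headroom is just a common rule of thumb, not a measurement):

Code:
# Rough PSU budget check -- illustrative only; wattages are the rated
# figures quoted above, not measured values.
cpu_w = 250        # i9-9900K under a heavy all-core load (assumed worst case)
gpu_w = 280        # EVGA RTX 2080 Ti Black, rated board power
other_w = 50       # motherboard, drives, fans, etc. (rough allowance)

total_w = cpu_w + 2 * gpu_w + other_w    # 860 W estimated load
headroom = 0.20                          # ~20% headroom, common rule of thumb

print(f"Estimated load: {total_w} W")
print(f"Suggested PSU:  {total_w * (1 + headroom):.0f} W or more")   # ~1032 W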

The power supply's sockets don't even match up with your system specs...
http://media.bestofmicro.com/V/V/614155/original/psu_rear.jpg

So however you managed to do it... is wrong:
- EPS 8-pin = CPU
- 24-pin = Motherboard
- 2080 Ti #1 = 2x PCIe 8-pin
- 2080 Ti #2 = nothing compatible remaining
- 2TB Micron SSD = 6-pin SATA
*3x 6-pin SATA sockets left over


Your build is a mess, mate! What's the purpose of this build?
1) What is your case? I see you've got 2x riser cables, so you must have a fairly roomy model...

2) A 9900K can't be kept cool with an NH-L12S! And with those 2x 2080 Tis in the case, just skip air cooling altogether and get a 360mm AIO; if your case doesn't support that size, then a 280mm model. 240mm won't cut it.

3) Only 1 stick of RAM? WTH? Dual-channel mode is a must! That's performance missing from the start.

4)
I did use the Quadro NVLink because the official RTX one did not have a 2 slot spacing. It's supposed to be the same, Nvidia just doesn't/didn't have official RTX 2080 Ti ones because of heating concerns.
The heating concerns are justified. First, a 280W card is going to be anything but cool. Second, the open-air 2- and 3-fan cooler designs work against the cards in multi-GPU setups, where a blower design is ideal.
^That is why the GeForce RTX NVLink bridge only comes in 3- and 4-slot versions.
You won't be able to get away with sandwiching those cards on the 2-slot bridge intended for the blower-style Quadros; one of them is going to suffocate and overheat.

5) What are the specs of your monitors?


There is a clear lack of research/planning done for this build...
 
  • Like
Reactions: dotas1
Solution

xenemorph

Reputable
Feb 5, 2017
The power supply's sockets don't even match up with your system specs...

I did end up getting a different power supply. I've also measured the draw under heavy load; it peaks around 550W. So I guess once you add in the inefficiency, it's possible this was the problem.
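
For what it's worth, here's how I sanity-checked that 550W number (Python sketch; I'm assuming the reading came from a plug-in meter at the wall and that the PSU is roughly 90% efficient at that load, both of which are assumptions on my part):

Code:
# Convert between wall draw and DC load for an assumed PSU efficiency.
measured_wall_w = 550     # plug-in meter reading at the wall (assumed)
efficiency = 0.90         # rough efficiency at this load (assumed)

dc_load_w = measured_wall_w * efficiency
print(f"DC load on the PSU: ~{dc_load_w:.0f} W")                    # ~495 W

# If 550 W was instead a software estimate of the DC load,
# the wall draw would be higher:
print(f"Wall draw for a 550 W DC load: ~{550 / efficiency:.0f} W")  # ~611 W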

In any case, I no longer have the problem with this new power supply.

The old one did have enough sockets. I used one PCIe 8-pin socket per graphics card, with a cable that splits into 2x 8-pin connectors. I still do this on the new power supply.
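
For reference, here's the rough connector math on that daisy-chain arrangement (Python, purely illustrative; the 280W card figure is the one quoted earlier in the thread, and the slot/connector limits are the PCIe spec values):

Code:
# Per-card power delivery budget: PCIe spec limits vs. the card's rated power.
slot_w = 75          # PCIe x16 slot, spec limit
eight_pin_w = 150    # one PCIe 8-pin connector, spec limit
card_w = 280         # RTX 2080 Ti Black figure quoted earlier in the thread

# With two separate PSU-side cables per card:
print("Connector budget, two cables:", slot_w + 2 * eight_pin_w, "W")   # 375 W

# With one PSU-side cable daisy-chained into both 8-pin plugs, that single
# cable run can end up carrying everything the slot doesn't supply:
print("Load on one daisy-chained cable: up to", card_w - slot_w, "W")   # 205 W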

A 9900K can't be kept cool with an NH-L12S!

I actually have it overclocked to 5GHz, and under maximum load in Prime95 it hovers around 77-82C. The worst I've seen on a single core, for a split second, is 94C. I know that's not ideal, but it never really goes past 70C under a normal heavy load, and it's usually 40-50C unless I'm gaming.

3) Only 1 stick of RAM? WTH? Dual-channel mode is a must! That's performance missing from the start.

It's a long story, but now it's back to 4x16GB.

2) A 9900K can't be kept cool with an NH-L12S! And with those 2x 2080 Tis in the case, just skip air cooling altogether and get a 360mm AIO; if your case doesn't support that size, then a 280mm model. 240mm won't cut it.

I'm actually working on this right now.

One of the graphics cards doesn't get cooled properly since it sits right behind the other. I'm trying to find a slim 360mm radiator. My plan is to use 3x Noctua NF-F12 iPPC 3000 PWM fans instead of the fans that come with the case, and then fit a radiator up to 35mm thick (that's all that will fit on top of the fans; possibly less, maybe 32mm).

The problem is making sure the radiator can cool them while keeping the coolant under 60C, which is the maximum the pump can handle.

Maybe I can instead do one card and the CPU.


5) What are the specs of your monitors?

It's actually an Nvidia BFGD: 65" 4K 144Hz, but I run it at 120Hz because it has some problems at the higher refresh rate.

In the BIOS you can set which GPU to start with, or check if there is a multi-monitor setting; the default is usually PEG1/single monitor.
As for Windows, you can connect monitors to any GPU, but you'll want to connect them to a single card (latency reasons).
MSI Afterburner has an option to change the "master graphics processor selection".

I'll try this out for the BIOS, to make it use the same card Windows does. However, Windows doesn't seem to want to use the other card. Maybe that's because of SLI?
 

Phaaze88

Titan
Ambassador
I did end up getting a different power supply. I've also measured the draw under heavy load; it peaks around 550W. So I guess once you add in the inefficiency, it's possible this was the problem.

In any case, I no longer have the problem with this new power supply.

The old one did have enough sockets. I used one PCIe 8-pin socket per graphics card, with a cable that splits into 2x 8-pin connectors. I still do this on the new power supply.
Great! (y)
That is one power hungry setup you have.

I actually have it overclocked to 5GHz, and under maximum load in Prime95 it hovers around 77-82C. The worst I've seen on a single core, for a split second, is 94C. I know that's not ideal, but it never really goes past 70C under a normal heavy load, and it's usually 40-50C unless I'm gaming.
Prime95 only pushes the CPU, though. If you were to stress both the CPU and the SLI 2080 Tis, that single core would throttle. I'm not sure if Asus RealBench works with SLI/CrossFire, though.
The reason your gaming temps are as low as they are is the 4K resolution: the higher the resolution and graphics settings, the less of an impact the CPU makes. 4K is a pure GPU workload.

It's a long story, but now it's back to 4x16GB.
Higher memory frequency and using more than 2 sticks put extra stress on the CPU's integrated memory controller and raise CPU package temps.
Careful not to go too crazy on the memory OC; 3200MHz should be good enough.

I'm actually working on this right now.

One of the graphics cards doesn't get cooled properly since it sits right behind the other. I'm trying to find a slim 360mm radiator. My plan is to use 3x Noctua NF-F12 iPPC 3000 PWM fans instead of the fans that come with the case, and then fit a radiator up to 35mm thick (that's all that will fit on top of the fans; possibly less, maybe 32mm).

The problem is making sure the radiator can cool them while keeping the coolant under 60C, which is the maximum the pump can handle.

Maybe I can instead do one card and the CPU.
I feel that's a mistake, but I'll see if I can get a liquid cooling expert to chime in on this.
 

Eximo

Titan
Ambassador
A single 360mm will probably work to keep the system from melting, but it won't really be enough. With those fans, maybe, but it's going to be a very noisy system.

A single 120mm of radiator per component to be cooled is generally the minimum recommendation, and your typical EK 360mm slim brass/copper radiator can dissipate about 500W at 3000RPM (with their fans). So you could probably get away with the GPUs alone on it and leave the CPU air cooled, or see if you can cram an AIO just for the CPU somewhere else.
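
To put rough numbers on that (purely illustrative; the component wattages are the rated figures quoted earlier in the thread, and the ~500W radiator figure is the ballpark above):

Code:
# Heat load vs. slim 360mm radiator capacity -- rough comparison only.
gpu_heat_w = 280          # per 2080 Ti, rated figure from earlier in the thread
cpu_heat_w = 250          # 9900K heavy-load figure from earlier in the thread
rad_capacity_w = 500      # ballpark for a slim 360mm at ~3000 RPM

gpus_only = 2 * gpu_heat_w               # 560 W, slightly above the ~500 W ballpark
everything = gpus_only + cpu_heat_w      # 810 W, well beyond one slim 360mm

print(f"GPUs only:  {gpus_only} W vs ~{rad_capacity_w} W capacity")
print(f"GPUs + CPU: {everything} W vs ~{rad_capacity_w} W capacity")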


I couldn't find my old reference materials; the websites seem to no longer be live. They had a nice listing of various radiators and their capabilities.
 
  • Like
Reactions: Phaaze88

xenemorph

Reputable
Feb 5, 2017
A single 120mm of radiator per component to be cooled is generally the minimum recommendation, and your typical EK 360mm slim brass/copper radiator can dissipate about 500W at 3000RPM (with their fans). So you could probably get away with the GPUs alone on it and leave the CPU air cooled, or see if you can cram an AIO just for the CPU somewhere else.

I feel that's a mistake, but I'll see if I can get a liquid cooling expert to chime in on this.

Just wanted to give you guys an update.

I ended up going with the EKWB slim 360 radiator, a 140 Revo D5 pump, the Noctua 3000RPM fans, and PrimoChill coolant (if that matters). Gaming on ultra settings, where I would usually see 88C on one card and 65-70C on the other, I'm now getting 37C maximum. The CPU also seems to benefit from the Noctua fans: 45C at most when gaming, where it previously peaked at maybe 50C. In Prime95 it reaches 77-80C, instead of the previous 77-82C with occasional spikes beyond that.

Idle is 26C for the cards and 36C for the CPU. The bottleneck seems to be the radiator, since the temperatures are the same whether the fans run at full speed or not.

I'm thinking of possibly incorporating the CPU into the loop since this went far better than expected. I also noticed that my case wastes around 25mm of space on the fan filter, so if I take that out I could fit a thicker radiator and/or do push/pull. Otherwise I might get rid of these fans because they're overkill and loud (unless I can set up automatic fan control software; no luck so far).
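
Here's the rough clearance math I'm working from (all in mm; the 25mm fan depth is just the standard 120mm fan frame thickness, the other figures are the ones above):

Code:
# Rough top-mount clearance check -- all figures in mm.
current_rad_max = 35    # thickest radiator that fits today, filter in place
filter_space = 25       # space the stock fan filter takes up
fan_depth = 25          # NF-F12 iPPC, standard 25 mm frame

budget = current_rad_max + filter_space
print("Radiator budget without the filter:", budget, "mm")               # 60 mm
print("Or push/pull with a radiator up to:", budget - fan_depth, "mm")   # 35 mm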

Thanks for the help.

EDIT: Looks like I spoke a little too soon. If I overclock a little more, push some settings much higher, and remove the 120 FPS cap, one card briefly reaches 64C with the other around 56C. I'm probably just going to keep things the way they are (the CPU air cooled).

For now I'm going to try to figure out why there's a temperature discrepancy.
 

Eximo

Titan
Ambassador
A little high on the GPU temps for water cooling, but certainly better than it was, and more or less to be expected. Average GPU temps I've seen for water-cooled builds run between 45-50C, but that is usually with more than adequate radiator space. I'm not aggressive at all with my fans, and I get 55C if I let the system run for a few hours under a heavy load.

Might just be a matter of needing to tighten the block or re-apply thermal compound on the warmer GPU. Then again, the card getting the 'cool' water straight from the radiator could account for the difference you're seeing, since the second card receives water already warmed by the first GPU. That's assuming the blocks are plumbed in series; if you went parallel, it might be a small blockage?
 
  • Like
Reactions: Phaaze88

xenemorph

Reputable
Feb 5, 2017
Might just be a matter of needing to tighten the block or re-apply thermal compound on the warmer GPU. Then again, the card getting the 'cool' water straight from the radiator could account for the difference you're seeing, since the second card receives water already warmed by the first GPU. That's assuming the blocks are plumbed in series; if you went parallel, it might be a small blockage?

Right now the issue is that the front card has some air in its block. I've tried tilting the case; it temporarily gets better if I lay it flat, but the air is still there. However, I believe this is the cooler card. Yes, I think the warmer card, the second in the series, is the one sitting behind the first (I know this because SpeedFan shows it's the same card that used to run hotter when its fan was blocked by the other card).

I haven't removed the first card yet to check whether it also has an air pocket.

Here's an image of the air pocket, in case anyone has advice. It's basically a photo of the left side of the Nickel-Plexi water block.
 

Eximo

Titan
Ambassador
Rotate the case fully if possible. It can be a bit of a pain to get all the bubbles out, but it pretty much has to be done. The best way is to have the pump running while you do it, but that comes with its own risks. Basically, you need to get the bubbles back to the reservoir, and judging from the size of that one, you aren't technically done filling the loop.
 
  • Like
Reactions: Phaaze88
Feb 27, 2020
I see this issue has been marked solved, but I want to add one more thing that caused the exact same symptoms for me across two different systems with completely different hardware, also in SLI/NVLink. This might help someone who doesn't have the money or resources for troubleshooting like the rest of us.

After troubleshooting components (different PSU, motherboard, CPU, RAM, and cabling), and even changing the order of the drive connections and using different cases, I was just stumped.

There was only one thing the two systems had in common: the Windows account I was using.

I had a suspicion it was a Windows configuration issue, probably a power-related setting, because the crash log that sometimes followed the black screen pointed to it. So on about the fifth fresh reinstall, immediately before signing in with the Windows account, I disabled the account's cloud sync setting, and I never again had the issues described in points 1 and 2 of xenemorph's post. Yes, the problem even carried over into the BIOS after the black-screening, which makes it sound far too improbable to be Windows related, yet it was. It might sound crazy, but it really does seem that a Windows power setting, carried over when the account synced, was affecting how power gets managed on the system. The exact same issue plagued me for a year and a half, and I was 100% sure it was related to the GPU or PSU. I honestly thought the GPU was bad or the motherboard was faulty.

Both PCs now work flawlessly with the Windows sync setting off (I say both because I bought so many components that I ended up with two PCs). To add, the issue also cropped up with just one GPU installed, which ruled out a pure power delivery problem or SLI as the cause.