Question Tesla K80s not showing up on DL580 G7

Squiggles

Feb 14, 2021
I'm trying to install 4 x Tesla K80 GPUs in an HP DL580 G7 server. It has the P65 BIOS. I connected all the cables inside, and I see one green light near the top of each card from the outside when I turn on the server, but when I get into the OS, I do not see any of the cards. When I try to install the Tesla drivers, I get an error saying, "No compatible hardware detected".

I've tried Windows 10 Enterprise, Windows Server 2012 R2, VMware ESXi, Ubuntu 20.04 (didn't boot successfully but not surprised as it's not officially supported), and RHEL 6.10.

In all cases, no NVIDIA devices show up in the hardware list.

Does anyone here have any ideas about this?
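Since Ubuntu and RHEL were among the attempts, a useful first check from any Linux boot (even a live USB) is whether the cards enumerate on the PCIe bus at all, independent of any driver. A generic sketch, not DL580-specific:

```shell
# List every NVIDIA device on the PCI bus by vendor ID (10de is NVIDIA's
# registered PCI vendor ID); no driver needs to be installed for this.
lspci -nn -d 10de: 2>/dev/null || true

# A K80 is a dual-GPU board, so each physical card typically contributes
# two "3D controller" lines. Count them (0 means nothing enumerated):
lspci -nn -d 10de: 2>/dev/null | grep -c '3D controller' || true
```

If this prints nothing, the cards are not enumerating at the PCIe level, which would match the driver installer's "No compatible hardware detected" message.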
 
How many CPUs does this chassis have?
 
Also, on the daughter board where the PCIe slots are, there is an LED display with numbers 00 and 08 flashing in sequence in red, but I think that is normal because I have two other DL580 G7 servers, and they have the same code flashing.
 
OK, with 4 CPUs, Windows 10 is OUT. It only supports 2 sockets unless you have Windows 10 Pro for Workstations.
Start by simplifying your problem. Start with ONE K80. The iLO should tell you if the card shows up on the PCIe bus correctly.

That's why I installed Windows 10 Enterprise. It supports 4 sockets, but it isn't one of the officially supported OSes, and it was evident as the 4 onboard NICs didn't show up. So that was still a bad idea.
 
I think even Enterprise is limited to dual sockets. They didn't add quad-socket support until Pro for Workstations.
Not having all the CPUs correctly identified will disable PCIe slots.
You need to get the block diagram and verify which PCIe slots are tied to which CPUs. Again, simplifying, put the K80 in the slot associated with CPU 1.
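Under Linux, the bus tree view can help with that mapping, though the slot-to-CPU assignment itself still needs HP's block diagram. A generic sketch (the sysfs path is the standard kernel layout, not DL580-specific):

```shell
# Show the PCI bus as a tree; devices hanging off different root ports
# belong to different CPUs' root complexes on a multi-socket machine.
lspci -tv 2>/dev/null || true

# On NUMA systems, sysfs records which node (roughly, which CPU) each
# PCI device is attached to; -1 means the firmware didn't say.
for d in /sys/bus/pci/devices/*/numa_node; do
    [ -e "$d" ] || continue
    printf '%s -> %s\n' "${d%/numa_node}" "$(cat "$d")"
done
```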
 
Did you install an extra power supply and connect 2x 8 pin power connectors to each card with their adapter?
You need 1200 watts minimum just for the video cards. Each stock power supply is 800 W @ 100 V, 900 W @ 120 V, and 1200 W @ 240 V.

Yes. I connected all 4 x 1200W PSUs at the back of the server. I also connected the EPS 8-pin to dual PCIe 8-pin Y-cable at the back of each card, and then plugged it into the 10-pin-to-8-pin power Y-cable which is connected to the PCB right behind the PSUs inside the server.

I'm doing this at home so I have 110V which means 4 x 900W = 3600W max power. iLO shows I'm consuming 380W at idle.

I tried this in a data center as well, where I have 208V, on another DL580 G7 with the exact same specs except memory; the DC one has a lot more memory. And even there, it did not detect the Tesla cards.
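For reference, the power arithmetic from the figures quoted in this thread (four 900 W supplies at ~120 V, four K80s at their 300 W TDP) works out like this:

```shell
# Supply capacity vs. worst-case GPU draw, using the numbers posted
# above. Note that iLO may hold some supplies in redundancy, so the
# real usable headroom could be lower than this naive total.
psu_total=$((4 * 900))   # 3600 W of supply capacity at ~120 V
gpu_total=$((4 * 300))   # 1200 W worst-case draw for four K80s
echo "headroom: $((psu_total - gpu_total)) W"
```

So raw capacity should not be the limit here, as long as all four supplies actually contribute.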
 
I checked iLO. It shows a bunch of messages like "server reset", "PSU power lost" (when I unplugged PSUs to fiddle with the cables inside), and "System Clock Set" (when I manually corrected the time in BIOS). Apart from these three message types, I do not see any other messages in the log.

There is no mention of anything to do with PCI or PCIe or even hardware other than PSU and "server reset".
 
I thought (I don't have a DL handy) that you could see the system inventory, including PCIe cards, via the iLO web page.
 

I can post screenshots of all pages in iLO console if you'd like. I see a bunch of iLO Admin pages, then Events/Logs pages and one that says System Information.

In System Information it shows CPUs, Memory, Drives, PSUs, iLO, and they all have "OK" green icon.

I don't see anything that specifically lists PCI/PCIe devices.

I have an iLO Advanced License activated on the server, so it's not preventing me from accessing any pages that I know of, unless there's an iLO Super Deluxe package that unlocks even more pages.
 
Some additional info: in the BIOS I enabled "Force PCIe Gen 2". The DL580 G7 doesn't have Gen3 PCIe slots, but the Tesla K80 "should" still be detected and usable in older-generation PCIe slots, just at slower speeds. Or at least, that's my understanding of how PCIe devices work. I hope I'm right.
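That understanding is correct in general: PCIe devices negotiate down to the slot's generation. Once a card enumerates under Linux, the negotiated link can be confirmed like this (a sketch; full capability output usually requires root):

```shell
# For every NVIDIA device that enumerates, compare the slot's maximum
# link (LnkCap) to the actually negotiated link (LnkSta). A Gen3 card
# in a Gen2 slot showing "Speed 5GT/s" in LnkSta is normal and fine.
for dev in $(lspci -d 10de: 2>/dev/null | awk '{print $1}'); do
    echo "== $dev =="
    lspci -s "$dev" -vv 2>/dev/null | grep -E 'LnkCap:|LnkSta:' || true
done
```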
 
Without the driver installed, does Windows Server show unknown devices? Does the PCIe string match a K80?
Have these K80 GPUs ever worked?

In Device Manager there are no uninstalled devices. Everything shows up as installed. These K80s don't show up in that list.

I never saw these K80s work. I bought them from two different sellers on eBay. So, I'd be surprised if they're all dead.
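If the cards never show up even as unknown devices, one commonly reported cause with K80s in older servers is the firmware failing to map the card's large 64-bit BARs (the "above 4G decoding" option newer BIOSes expose). Whether that applies to the DL580 G7 specifically is an assumption, but it's cheap to check from a Linux boot:

```shell
# Search the kernel log for PCI resource-assignment failures; lines like
# "BAR 1: no space for [mem ...]" mean the firmware/kernel could not map
# a large BAR, which leaves the device unusable even though it has power.
dmesg 2>/dev/null | grep -iE 'pci.*(can.t assign|no space|failed to assign|BAR)' || true
```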
 
If they don't show up as unknown devices, there is something much more basic wrong.
Looking at the QuickSpecs for the DL580 G7, I see this note:
NOTE: The HP ProLiant DL580 G7 can support up to 4 (225W) or 3 (300W) Graphic Cards and/or GPGPU Accelerator Options (x16 PCIE add-in cards in slots 11 and 9 will down-train electrically to x8 PCIE which may result in reduced performance.)
At least HP says you can't run 4 of those cards.
NVIDIA spec sheet says the K80 is a 300W card.
 

Good find. So with just one card, it should work? Or am I reading this wrong?
 
I tried a known-good graphics card that didn't require extra power, and that one worked, although it still didn't show up in Device Manager. It was an older GeForce GTxxxx card; I don't remember the exact model.

I tried another graphics card which required 6-pin power, but I had never tried that card before; it was purchased as a working card. It did not show up anywhere in Device Manager, just like the Tesla K80s.

So based on this, I am going to take a multimeter and check the 8-pin cable in the chassis to confirm that it provides power where expected. If the cable provides power, then I will take the Tesla K80s to a computer repair shop and ask them to test them on their test bench. If the K80s work on their test bench, then the K80 isn't compatible with the DL580 G7; if they don't work there either, then the K80s are toast.
 
I couldn't find my multimeter so I took the K80s to a repair shop. They said all four show up in Device Manager on their test bench.

So now either the cable in the server is not providing adequate power, the K80s are not compatible with the DL580 G7, or the K80s aren't backwards compatible with PCIe 2.0 slots.

Either way, I think I'll probably need to find a different host for these cards.

Thanks everyone for the help in troubleshooting this.
 
