Question How do I get the Mellanox MCX516A-CDAT 2x100Gb NIC to run at 2x100Gb?

Aug 24, 2022
8 Data Devices: 25Gb output with each device producing 16.4Gb/sec of data
1 Mellanox MSN2010-Onyx 18-Port Switch: 8 25G devices in / 2 100Gb Ports out
1 Mellanox MCX516A-CDAT 2x100Gb NIC: 2 100Gb Ports connected to the above switch via 2 DACs / the board is PCIe x16 Gen 4 device plugged into a PCIe x16 Gen4 slot
Windows 10 Pro
AMD Threadripper CPU

The problem we are having is that when we enable all of the devices, we do not get 100Gb of throughput on each NIC port, even though all of the NIC settings and status information indicate that each port is running at 100Gb and the PCIe link at 16 GT/s. We can successfully stream data from 6 devices (98.4Gb/sec), but when we add the remaining 2 devices we lose data (essentially as soon as we surpass 100Gb of total throughput). The board appears to max out at 100Gb total instead of 200Gb across both ports, which the NIC specs indicate it can do. Throughout our investigation we have not been able to figure out how to get the NIC to support 2 x 100Gb of throughput, and there does not appear to be an obvious setting.
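For reference, the arithmetic behind those numbers can be sketched as below. This is only an illustration: the per-device rate and the device counts come from the description above, while the function name is made up for this example.

```python
# Per-device rate and port speed as stated in the post.
PER_DEVICE_GBPS = 16.4
PORT_GBPS = 100.0


def aggregate_gbps(devices: int) -> float:
    """Total offered load in Gb/s for a given number of streaming devices."""
    return round(devices * PER_DEVICE_GBPS, 1)


for n in (6, 8):
    total = aggregate_gbps(n)
    fits = total <= PORT_GBPS
    print(f"{n} devices: {total} Gb/s total, fits within 100 Gb/s: {fits}")
```

With 6 devices the total stays just under 100 Gb/s; with 8 devices it is 131.2 Gb/s, which only fits if both ports carry traffic at the same time.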

Any suggestions would be greatly appreciated. Thanks in advance.
 
I have no background with that brand of equipment so I can't be 100% sure what is supported.

First you need to read the specs on the switch and see what the maximum backplane speed is. On a lower-end switch every port can run at full speed in both directions. So an 8-port 1Gbit switch would need a 16Gbit backplane to be what is called non-blocking/wire-speed.
It has been years since I used high-end equipment like this, but some of it had internal bottlenecks. Maybe there is a 100Gbit limit.
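The non-blocking rule of thumb above can be written out as a quick calculation. This is a rough sketch: the function name is hypothetical, and the second call only counts the ports described as in use in the original post, not the switch's full port list.

```python
def nonblocking_capacity_gbps(port_speeds_gbps):
    """Backplane capacity needed for every port to send and receive at line rate."""
    # Factor of 2 because each full-duplex port carries traffic in both directions.
    return 2 * sum(port_speeds_gbps)


# The 8-port 1Gbit example from the reply.
print(nonblocking_capacity_gbps([1] * 8))

# Ports in use in the original post: 8 x 25Gb in, 2 x 100Gb out.
print(nonblocking_capacity_gbps([25] * 8 + [100] * 2))
```

The number to compare against is whatever switching capacity the switch data sheet lists.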

Next, what method of port bonding are you using? The standard method is 802.3ad. It selects a path mathematically. It has a few options, but to oversimplify, it adds up the source and destination ports and IP addresses; if the result is an even number the traffic runs on port 1, and if it is an odd number it runs on port 2. It does not care how much load is on each path, and if you are unlucky it can choose the same path for all of the sessions. It does this to prevent the out-of-order packet problem you would get if it load balanced per packet.
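To make the oversimplified description concrete, here is a toy version of that kind of path selection. It is a sketch of the idea only: real switches and NICs use their own hash functions and field choices, and the IP addresses and port numbers below are made up for the example.

```python
import ipaddress


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links=2):
    """Toy flow hash: sum the addresses and ports, then take the remainder."""
    key = (
        int(ipaddress.ip_address(src_ip))
        + int(ipaddress.ip_address(dst_ip))
        + src_port
        + dst_port
    )
    return key % n_links


# Hypothetical flows: four senders whose addresses all end in an even octet.
flows = [(f"10.0.0.{i}", "10.0.0.100", 5000, 5000) for i in (2, 4, 6, 8)]
links = [pick_link(*flow) for flow in flows]
print(links)  # every flow lands on link 0, the "unlucky" case
```

Because the choice depends only on the packet headers and not on load, every packet of a flow stays on one link (so ordering is preserved), but nothing stops several heavy flows from sharing that link while the other sits idle.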

Now there are some proprietary forms of port aggregation that work differently. The best ones actually do things like manipulate the packet size to avoid the out-of-order packet issue and then reassemble the packets. This is generally only done by server-type devices.
Most switches only support 802.3ad.

You need to read all the details and see how port bonding/aggregation is done on your equipment.
 
Thank you for your reply. The problem we are experiencing is on the NIC side, and despite going through all of the settings and documentation, nothing obvious solves it. Please note that we already have experience with this hardware combination, but typically we only use 1 of the 100Gb ports and slower devices, so we have never had to think about this. Also, note that we are not doing any kind of aggregation here. According to the NIC specifications, the MCX516A-CDAT can support 200Gb/s when both network ports run at 100Gb/s. This assumes that this PCIe x16 Gen4 device is plugged into a PCIe x16 Gen4 slot.
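The reason the slot matters can be shown with a rough calculation of raw PCIe link bandwidth. This is a sketch under stated assumptions: it uses the standard per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and 128b/130b encoding, ignores packet and protocol overhead (so usable throughput is lower), and the function name is made up for this example.

```python
# Per-lane signalling rate in GT/s for each PCIe generation.
GT_PER_LANE = {3: 8.0, 4: 16.0}
# Gen3 and Gen4 both use 128b/130b line encoding.
ENCODING = 128 / 130


def pcie_raw_gbps(gen: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in Gb/s, before protocol overhead."""
    return GT_PER_LANE[gen] * lanes * ENCODING


for gen, lanes in [(4, 16), (4, 8), (3, 16)]:
    print(f"Gen{gen} x{lanes}: {pcie_raw_gbps(gen, lanes):.1f} Gb/s raw")
```

Only the Gen4 x16 case leaves room for 2 x 100Gb; a link that is one generation slower or half as wide comes out at roughly 126 Gb/s raw.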
 
You can't really have 2 ports hooked to the same network without some form of port bonding. What happens by default depends on the OS, but it will not use both. In some cases it simply does not use one of them, or maybe blocks it with spanning tree. Worst case you get a network loop.

From what I can tell, both the switch and the card support LAG, which is another name for 802.3ad/802.1AX. Maybe they support something else as well, but the manuals for the NIC are very limited.