Hi
I have a problem with some network cards that puzzles me.
I run a Proxmox-based virtualisation system at my job.
It consists of 8 hosts, each with two network cards:
A normal on-board 1 Gbit Intel card + an Intel X540-T2 10 Gbit card
The 10 Gbit card has its 2 ports bundled together with LACP, and it connects through some HP 1950 10 Gbit switches that are bundled together with IRF.
The storage for the Proxmox cluster is two big Synology NAS units that also connect to these 10 Gbit HP switches.
Some days ago I noticed that my Proxmox backup speed had gone down on SOME of the 8 hosts, not on all of them.
And I did some testing with iperf3 and got this result:
On some hosts, receiving data comes with a lot of TCP retransmissions (the Retr column in iperf3).
ALL hosts can transmit flawlessly IF they transmit to one of the hosts that are able to receive flawlessly, for example:
root@pve11:~# iperf3 -c 192.168.50.13
Connecting to host 192.168.50.13, port 5201
[ 5] local 192.168.50.11 port 40840 connected to 192.168.50.13 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.10 GBytes 9.41 Gbits/sec 7 1.28 MBytes
[ 5] 1.00-2.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.37 MBytes
[ 5] 2.00-3.00 sec 1.09 GBytes 9.39 Gbits/sec 6 1.38 MBytes
[ 5] 3.00-4.00 sec 1.09 GBytes 9.40 Gbits/sec 0 1.39 MBytes
[ 5] 4.00-5.00 sec 1.09 GBytes 9.38 Gbits/sec 7 1.40 MBytes
[ 5] 5.00-6.00 sec 1.09 GBytes 9.40 Gbits/sec 0 1.41 MBytes
[ 5] 6.00-7.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.41 MBytes
[ 5] 7.00-8.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.43 MBytes
[ 5] 8.00-9.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.45 MBytes
[ 5] 9.00-10.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.9 GBytes 9.39 Gbits/sec 20 sender
[ 5] 0.00-10.00 sec 10.9 GBytes 9.39 Gbits/sec receiver
But SOME hosts are not able to receive without a lot of retransmissions, for example:
root@pve11:~# iperf3 -c 192.168.50.12
Connecting to host 192.168.50.12, port 5201
[ 5] local 192.168.50.11 port 42404 connected to 192.168.50.12 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 931 MBytes 7.81 Gbits/sec 3796 1.06 MBytes
[ 5] 1.00-2.00 sec 798 MBytes 6.69 Gbits/sec 2657 957 KBytes
[ 5] 2.00-3.00 sec 685 MBytes 5.75 Gbits/sec 1983 492 KBytes
[ 5] 3.00-4.00 sec 820 MBytes 6.88 Gbits/sec 2250 1.58 MBytes
[ 5] 4.00-5.00 sec 908 MBytes 7.61 Gbits/sec 4489 243 KBytes
[ 5] 5.00-6.00 sec 848 MBytes 7.11 Gbits/sec 1921 267 KBytes
[ 5] 6.00-7.00 sec 736 MBytes 6.18 Gbits/sec 4022 680 KBytes
[ 5] 7.00-8.00 sec 792 MBytes 6.65 Gbits/sec 2808 257 KBytes
[ 5] 8.00-9.00 sec 840 MBytes 7.05 Gbits/sec 7461 1.38 MBytes
[ 5] 9.00-10.00 sec 718 MBytes 6.02 Gbits/sec 1563 768 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 7.89 GBytes 6.77 Gbits/sec 32950 sender
[ 5] 0.00-10.00 sec 7.88 GBytes 6.77 Gbits/sec receiver
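The comparison above can be repeated against every host with something like the following minimal sketch. It only assumes that an iperf3 server (iperf3 -s) is already listening on each target. The function names retr_from_summary and probe_targets are made up for illustration, and the parsing relies on the plain-text summary format shown above (a single stream, no -P).

```shell
#!/bin/sh
# Read iperf3 text output on stdin and print the retransmit count,
# i.e. the field just before the word "sender" on the summary line.
retr_from_summary() {
    awk '$NF == "sender" { print $(NF-1) }'
}

# Run a 10-second test from this host to each address given as an
# argument and print "address retransmits", or "failed" if no sender
# summary line was produced.
probe_targets() {
    for target in "$@"; do
        retr=$(iperf3 -c "$target" -t 10 | retr_from_summary)
        echo "$target ${retr:-failed}"
    done
}
```

Calling probe_targets 192.168.50.12 192.168.50.13 from pve11 would print one line per target, which makes it easier to see which receivers stand out.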
So a few hosts have this problem. Before I ran these iperf3 tests I suspected the switch.
Now I lean more towards the NICs in the affected hosts having some issue.
What are your thoughts on this?