Question Switch Problem?

LittleCreekHosting

Distinguished
Nov 20, 2015
29
0
18,530
Keep in mind this is a commercial environment in a data center. My router is just a regular 1U computer running Linux with 2 interfaces, one for the WAN and one for the LAN. It routes public IPs to the rest of my servers. It's on a 1Gb network. Both router and switch have 1Gbps ports.

Everything has been great for the last year, but I have been gaining a lot of customers recently and now I am having an issue with throughput. My speed from the router itself is great, nearly the full 1Gb. But the speed from any of the individual servers on the inside is sometimes only 300 Mbps or less, depending on the timing of the test. Sometimes it's under 100 Mbps. But when I test between servers on the inside, the speed is good. It's only bad when going out to the internet or to the router.

So I feel like I have isolated it to the port on the switch or the LAN interface on the router. My question: is there some kind of limit on the number of connections on the switch port or the ethernet port? And which one is likely the problem? It's not strictly a bandwidth issue because earlier I had good bandwidth. It seems the number of connections has something to do with it.
 
Are you using consumer grade equipment for your network? The problem with enterprise type stuff is that it is much more complex and subject to misconfiguration.

In general most switches are considered non-blocking wire speed devices. This is not always true if you have lots of 10 Gbit ports though. What this means is every port can run at the full rated speed both up and down all at the same time. So if you have, say, a consumer grade 1 Gbit switch with 8 ports, it can pass a total of 16 Gbit of traffic, even though there is no real world case that would come even close to using that much data.
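To make the arithmetic concrete, here is a small sketch of where the 16 Gbit figure comes from. The function name is made up for illustration, and it assumes every port has the same speed.

```python
def switching_capacity_gbps(ports, port_speed_gbps):
    """Aggregate capacity a non-blocking switch needs so that every port
    can send and receive at line rate at the same time."""
    # Each port counts twice: once for transmit, once for receive.
    return ports * port_speed_gbps * 2


# The 8-port, 1 Gbit example from the post above.
print(switching_capacity_gbps(8, 1))  # 16
```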

This makes it more likely there is some issue with your "router". This is actually where a cheap consumer router has an advantage over a PC. Consumer routers have a feature that offloads all the overhead of rebuilding the headers for NAT to a hardware chip rather than having the CPU do it. Consumer routers can pass full 1 Gbit traffic WAN/LAN, but if you disable the feature many drop to 300-400 Mbps. But the CPU power in a PC should easily be able to do just the NAT function.

The problem is most people who use a PC as a router are doing it for some reason, like they want firewall rules or maybe traffic monitoring. All this takes CPU power. If you put in, say, a lot of rules, throughput can get limited.
The number of active sessions should not matter a lot unless the number is excessive, think thousands. Then again, this is just memory and CPU usage, and again is the reason people use a PC rather than a consumer router. The number of physical connections doesn't mean much since the router only sees 1. It might see all the MAC addresses, but again it is going to take massive numbers of MAC addresses to overload things like ARP tables.

I would check your resource utilization on the router/PC to see if you are hitting limits. Be aware many router functions can only use 1 core, so be careful when you look at the CPU usage.
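As a rough sketch of what "look per core, not at the average" means, the snippet below compares two snapshots of /proc/stat and reports busy and softirq time for each core. The helper names and the sample numbers are made up for illustration; on a real box you would read /proc/stat twice a few seconds apart instead of using the hard-coded text.

```python
def parse_proc_stat(text):
    """Return {core_name: [jiffy counters]} for the per-core 'cpuN' lines."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line; keep cpu0, cpu1, ...
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cores[fields[0]] = [int(v) for v in fields[1:]]
    return cores


def per_core_usage(before_text, after_text):
    """Percent busy and percent softirq per core between two snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    usage = {}
    for core, new in after.items():
        delta = [n - o for n, o in zip(new, before[core])]
        # Columns: user nice system idle iowait irq softirq steal ...
        total = sum(delta[:8])
        if total == 0:
            continue
        idle = delta[3] + delta[4]  # idle + iowait
        usage[core] = {
            "busy_pct": 100 * (total - idle) / total,
            "softirq_pct": 100 * delta[6] / total,
        }
    return usage


# Hypothetical snapshots: cpu0 is buried in softirq work, cpu1 is nearly idle.
BEFORE = """cpu  100 0 100 800 0 0 0 0 0 0
cpu0 50 0 50 400 0 0 0 0 0 0
cpu1 50 0 50 400 0 0 0 0 0 0
"""
AFTER = """cpu  115 0 115 910 0 0 60 0 0 0
cpu0 60 0 60 420 0 0 60 0 0 0
cpu1 55 0 55 490 0 0 0 0 0 0
"""

for core, stats in per_core_usage(BEFORE, AFTER).items():
    print(core, stats)
```

In the made-up numbers one core is mostly busy while the other is mostly idle, so an averaged CPU figure would look moderate even though one core is close to its limit.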
 

LittleCreekHosting

Thank you for your help.

This is the switch I am using https://www.amazon.com/gp/product/B0779R9LJ3/

The router consists of Gigabyte G31M-ES2L and 4 GB of memory and the processor is Intel E7500.

I do have Snort running but the load average is still around 2 - 3. Memory is 75% used.

The onboard ethernet is connected to the LAN and after 60 days is showing this:
RX errors 0 dropped 6892748 overruns 6892748 frame 6892748

But the network addon card which is connected to the WAN/Internet is showing 0 for all of those.

There is no NAT happening. These are all public IPs and Linux is just forwarding them.
 
The switch should be fine. I did not dig around the specs but the link you provided says it is nonblocking which means it should never delay the traffic.

If I remember correctly, overruns are generally caused by your router not taking the data out of the buffer as fast as it arrives.
What is strange is that you get the same number of framing errors, but again, if I remember correctly, these are caused by other things.
It's been years since I saw these; it was mostly back when a port was running half duplex.
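For a sense of scale, here is a quick sketch that parses the counter line posted above and averages the overruns over the 60 days mentioned. The function name is made up, and a plain average is only a rough guide since it cannot tell a steady trickle from one big burst.

```python
import re


def parse_rx_counters(line):
    """Turn an ifconfig-style 'RX errors ...' line into a dict of counters."""
    pairs = re.findall(r"(errors|dropped|overruns|frame)\s+(\d+)", line)
    return {name: int(value) for name, value in pairs}


# Counter line as posted for the onboard LAN interface.
line = "RX errors 0 dropped 6892748 overruns 6892748 frame 6892748"
counters = parse_rx_counters(line)

# 60 days of uptime, taken from the post; the real figure may differ.
uptime_seconds = 60 * 24 * 3600
rate = counters["overruns"] / uptime_seconds

print(counters)
print(round(rate, 2))  # average overruns per second, about 1.33
```

Reading the counters twice some minutes apart and comparing the two readings would show whether they are still climbing, which the long-run average cannot.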

It would be rare for that to happen, but I guess you could check if the port on your PC is full duplex. In most cases it would also drop to 100 Mbps, which would be much more noticeable. In practice there is no such thing as 1 Gbit half duplex.

I would try a different cable; it could be the cable is causing errors. But those tend to be RX errors more than CRC/frame errors.

So what I would try next is to transfer large files between 2 ports on the switch. You can use iperf if you want; it should get 900+ Mbps in both directions. If this works, I would then hook the router to that port on the switch with the same cable the PC tested with. Kind of a crude method, but unmanaged switches give you nothing to look at other than the lights.

Can you change the software in the router to swap the 2 ethernet ports, LAN/WAN? The problem should move to the WAN port if there is something wrong with the port itself.
 

LittleCreekHosting

The dropped and overruns numbers have not gotten any higher, so that may be an unrelated event.

I did notice that ksoftirqd/0 was using quite a bit of CPU, sometimes it was over 50%. I read that could be because a high speed network card receives a very large number of packets in a short time frame. It appears the packets may be getting queued instead of being sent immediately.
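One way to check whether the softirq load really is network receive work is to compare the NET_RX row of /proc/softirqs between two readings. The sketch below does that on made-up sample text; the helper names and numbers are hypothetical, and on a real system you would read the file twice a few seconds apart.

```python
def net_rx_counts(text):
    """Return {cpu_name: NET_RX count} from /proc/softirqs style text."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == "NET_RX":
            return dict(zip(cpus, (int(v) for v in rest.split())))
    return {}


def net_rx_delta(before_text, after_text):
    """NET_RX softirqs handled per CPU between two snapshots."""
    before = net_rx_counts(before_text)
    after = net_rx_counts(after_text)
    return {cpu: after[cpu] - before.get(cpu, 0) for cpu in after}


# Hypothetical snapshots from a dual-core router, taken a few seconds apart.
BEFORE = """                    CPU0       CPU1
          HI:          0          0
       TIMER:     500000     480000
      NET_TX:       1200        900
      NET_RX:    7000000      20000
"""
AFTER = """                    CPU0       CPU1
          HI:          0          0
       TIMER:     500900     480900
      NET_TX:       1300        950
      NET_RX:    7450000      20100
"""

print(net_rx_delta(BEFORE, AFTER))
```

If nearly all of the increase lands on CPU0, as in the sample, that would fit ksoftirqd/0 being the busy thread while the second core does little of the packet work.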

It appears to me the CPU is not enough to handle the traffic. Does that sound reasonable?
 
Your guess is as good as mine, I guess. I always used real routers, i.e. from Cisco, Juniper etc., in business installs. These devices have very clearly documented data rates, so you know if one will bottleneck your network. With a PC running Linux as a router, those numbers take a lot of digging to find since there are so many "router" images you can get.

Although it will likely make the problem worse if you run it long term, I would run tcpdump and Wireshark to see if you can see the rate at which packets arrive and maybe where they come from. I would capture just the headers, since you don't really care about the data and this reduces the load it puts on the machine.
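As a sketch of what to do with such a capture, the snippet below counts packets per source per second and keeps the worst second for each source. It assumes text lines roughly in the shape printed by something like `tcpdump -nn -q -tt` (epoch timestamp, then "IP src > dst"); the exact format varies between versions and protocols, so the regular expression is an assumption, and the addresses are documentation-range placeholders.

```python
import re
from collections import Counter

# Assumed line shape: "<epoch>.<usec> IP <src>[.<port>] > <dst>[.<port>]: ..."
LINE_RE = re.compile(r"^(\d+)\.\d+ IP (\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > ")


def packets_per_source_per_second(lines):
    """Count packets keyed by (source IP, whole second)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(2), int(match.group(1)))] += 1
    return counts


def peak_rates(counts):
    """Highest one-second packet count seen for each source."""
    peaks = {}
    for (src, _second), n in counts.items():
        peaks[src] = max(peaks.get(src, 0), n)
    return peaks


# Made-up sample lines, not real traffic.
SAMPLE = [
    "1448035200.000100 IP 203.0.113.10.51234 > 198.51.100.7.80: tcp 0",
    "1448035200.000200 IP 203.0.113.10.51235 > 198.51.100.7.80: tcp 0",
    "1448035200.000300 IP 203.0.113.10.51236 > 198.51.100.7.80: tcp 0",
    "1448035200.500000 IP 203.0.113.20.443 > 198.51.100.9.40000: tcp 1448",
    "1448035201.100000 IP 203.0.113.10.51237 > 198.51.100.7.80: tcp 0",
    "1448035201.200000 IP 203.0.113.20 > 198.51.100.9: ICMP echo request, id 1, seq 1, length 64",
]

peaks = peak_rates(packets_per_source_per_second(SAMPLE))
for src, rate in sorted(peaks.items(), key=lambda item: -item[1]):
    print(src, rate)
```

A source whose peak is far above everyone else's is the first machine to look at.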

What you would be looking for is bursts of traffic. Not sure if your switch has spanning tree since it is unmanaged, but it may have it. I guess you could plug 2 switch ports directly together and see if the switch locks up shortly after. Sometimes if you have dual NIC servers or VMs you can get a loop condition. If you can't loop it with a cable, then it means the switch is preventing it.

You still could be getting some garbage traffic from some machine with a problem on your network.

In some ways this is where a managed switch helps. You can run reports that show traffic utilization on each port.
 

LittleCreekHosting

I think I figured it out. Every day there are cancellations and suspensions for nonpayment. I also suspend customers who have abuse complaints against them. So at some point I noticed everything was working properly again. ksoftirqd/0 was under 1% again. So I think a VM was doing something nefarious and at some point got shut down for one of the above reasons. I think I couldn't detect it on the Linux box because everything was getting queued and nothing stood out. I think in this case a managed switch would have helped for sure.

Thank you for your help.