Question Regular long lag spikes - how to debug?

Nakroma

Honorable
Jul 29, 2019
16
0
10,510
Hello,

I'm currently experiencing fairly regular lag spikes: roughly every hour, sometimes a bit more often, sometimes a bit less. They last from a couple of seconds to about a minute. During a spike almost all internet connectivity is gone. It's a bit weird, though: an online stream or online video game will lose its connection, while voice chat or the stream's chat is somewhat degraded but keeps going.
I've only noticed them now because I've been playing more online games recently, so I don't know how long they have been occurring. A year or so ago I definitely didn't have this problem.

The only thing that has changed since then is that I've added a NAS to the network. However, I took it offline for testing and the lag spikes still occurred. I obviously tried restarting the router, but that doesn't help either. As far as I can tell, it also affects the entire network and not just my PC: I tested with another PC, and a lag spike was noticeable there as well.

Some more details:
  • I have 500 Mbit/s download, 30 Mbit/s upload from PYUR (Germany) cable (DOCSIS)
  • Router is a FRITZ!Box 6660 Cable
  • Connected to the router via Ethernet are 3 devices (2 PCs and the NAS); connected via WLAN are roughly 7 (2 phones, 2 laptops, 1 tablet, 1-2 smart-TV-related devices). This number obviously changes depending on what's on and what's not, but that's the gist of it.
I'm asking here because I know very little about networking and don't even know where to start debugging this. The FRITZ!Box doesn't show any obvious errors or unusual usage during these lag spikes. I've tried running Wireshark and nothing looked unusual (although I can barely read it to begin with).

Thanks for your help in advance!
 
Did you check while connected by Ethernet or WLAN?
Did you check the cable frequencies in the FRITZ!Box while the lag is happening? Are any lost, or are they all still there?
Check with your ISP, PYUR, whether they can have a look at the connection speed or timeouts. Cable is a shared medium, so it might be your neighbors downloading/uploading at the same time, with not enough bandwidth for all of you.
It might also be the FRITZ!Box itself having problems.
 

Ralston18

Titan
Moderator
Also check your system's logs: Reliability History/Monitor and Event Viewer.

Either one or both tools may be capturing some error code, warning, or even an informational event just before or at the spikes.

Also look in Task Manager > Startup for any unknown or unexpected apps being launched at startup.

Another place to look is Task Scheduler. There may be some trigger in place that launches an app to attempt a backup, an update, or simply to "phone home".

Lastly, you can use Process Explorer (Microsoft, free) to observe system performance. If the lags are regular, start watching beforehand.

The objective is simply to discover what the system is doing or trying to do at the time of the lags.

FYI:

https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer

Take your time, be methodical, watch carefully. No need to rush.
 
Using Wireshark is the hard way to look at problems like this. If you work at it, you can see the exact latency of any TCP session, packet by packet.

You can see this more simply in the Network tab of Resource Monitor. The problem is that most games use UDP rather than TCP, so you can't see the delays as easily.

In any case, the main way to test latency is with simple ping commands.

First run tracert 8.8.8.8. This will not show the problem unless you are extremely lucky; it is mostly there to get the IP addresses of the hops in the path.

Open at least 3 cmd windows and let a continuous ping (ping -t on Windows) run to the IP of hop 1 (your router), hop 2 (the first ISP router for most people), and the final IP, 8.8.8.8, or maybe the game server if you can get its address.

In general, issues with the ping to hop 1 indicate a problem with your PC and, rarely, the router. Issues at hop 2 indicate a problem with the connection coming to your house. Issues past there are tricky to get fixed even if you find them; they could be connectivity problems between ISPs that are not your own.
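
If juggling several cmd windows gets tedious, the same test can be scripted. The following is only a rough sketch, not a polished tool: the function names are made up for illustration, and it assumes either the Windows ping (-n, -w) or the Linux iputils ping (-c, -W) is on the PATH. It pings each target once per round and prints a timestamp whenever a reply is missing, which helps with spikes that only show up about once an hour.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_command(host: str, timeout_s: int = 1) -> list[str]:
    """Build a one-shot ping command for the current OS."""
    if platform.system() == "Windows":
        # Windows ping: -n <count>, -w <timeout in milliseconds>
        return ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    # Assumption: Linux iputils ping: -c <count>, -W <timeout in seconds>
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if ping exited with status 0.

    Caveat: Windows ping can exit with 0 on a "Destination host
    unreachable" answer, so treat this as a rough indicator only.
    """
    result = subprocess.run(
        ping_command(host, timeout_s),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(hosts: list[str], interval_s: float = 1.0, rounds: int = 60) -> dict[str, int]:
    """Ping every host once per round; print a timestamp for each loss."""
    lost = {host: 0 for host in hosts}
    for _ in range(rounds):
        for host in hosts:
            if not ping_once(host):
                lost[host] += 1
                print(f"{datetime.now():%H:%M:%S} no reply from {host}")
        time.sleep(interval_s)
    return lost
```

Calling monitor() with the hop addresses from your own tracert plus 8.8.8.8 and a large rounds value would cover a whole evening; the returned dictionary holds the number of lost pings per target.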
 

Nakroma

  • I double-checked and it's the entire network, so all devices, no matter if LAN or WLAN
  • Not sure if I read it correctly, but the frequency graph/table thingy doesn't change during the spikes
  • I wrote to my ISP; I'm just not sure when they will look into it. Hopefully that will help though. The thing is, my friend who has the same ISP lives literally in the building next to me and doesn't have these issues.
  • I'm wondering if it would make sense to reset the FRITZ!Box to factory settings, because that seems to me the only common point in all of this

Hey, thanks for the big write up, I'm not sure doing this will help though since the issue affects the entire network, not just my computer.

Thanks for sharing this method, I think that's a really good tool to have. I tried it and the weird thing is the pings just don't seem affected? During the lag spikes some pings seem to go missing (i.e. icmp_seq increases by more than 1) but other than that the response time is the same as during normal periods.
 
The ping actually IS showing the lag. Although many people think lag is all about latency, the very large lag spikes in games are caused by packet loss.

In many ways packet loss is a better symptom to have than an increase in latency. Packet loss is mostly caused by some kind of error or defective equipment. Also, because the ping latency before and after the loss is normal, this is not an overloaded link. An overloaded link shows rising latency first and only discards packets once it gets bad enough.

So now it is a matter of figuring out which hop the packet loss occurs at. The most common one is hop 2, which means there is likely some issue with the cabling coming to your house. The other would be hop 1, which indicates an issue inside your house.
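
To pick the lost packets out of a long ping log without scrolling through it, the gaps in icmp_seq can be listed automatically. This is a minimal sketch that assumes Linux-style ping output containing "icmp_seq=N"; the function name and the sample lines are invented for illustration and are not taken from the poster's log.

```python
import re

SEQ_PATTERN = re.compile(r"icmp_seq=(\d+)")


def missing_sequence_numbers(ping_output: str) -> list[int]:
    """Return the icmp_seq values between the first and last reply
    that never got an answer.

    Losses after the last received reply cannot be seen this way.
    """
    seen = [int(m.group(1)) for m in SEQ_PATTERN.finditer(ping_output)]
    if not seen:
        return []
    received = set(seen)
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in received]


# Made-up sample: replies 3 and 4 are missing, latency stays normal.
SAMPLE = """\
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=23.1 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=22.9 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=117 time=23.4 ms
"""

print(missing_sequence_numbers(SAMPLE))  # [3, 4]
```

Running the same function over the log of each hop shows at which hop the gaps first appear.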
 

Nakroma

Thanks for explaining that! I'll test it again with all 3 hops, although it will probably take a while to reproduce and double-check.

By the way, I'm reading the tracert documentation right now and only just realized that the second hop is timing out for me (I ignored the "* * *" before, not realizing it means something). If it's just about determining whether the issue is with the router or past it, using the third hop instead would also work, right?
 
Assuming you only have 1 router in your house, that means the ISP has configured its first router not to respond to ping or trace. It is not uncommon to have restrictions placed on routers for test traffic. Mostly this is to prevent abuse such as denial-of-service attacks using ping. Idiot children losing shooter games have been known to attack servers and routers.

It would be better if all routers responded; then you would know where the problem really is. The ISP can likely tell whether the problem is between the first router and your house or between the 2nd and 3rd router. You really want the problem to be in the cable coming to your house, since those are the simplest for the ISP to find and fix. They are generally something like water or dirt in a cable splice that reduces the signal levels, so data gets damaged more easily.
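
Picking the ping targets when some hops stay silent can also be sketched in code. The example below assumes the usual Windows tracert layout (hop number, three timings, then an address or "Request timed out."); the addresses in the sample are placeholders, not a real route, and the function name is invented.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def responding_hops(tracert_output: str) -> dict[int, str]:
    """Map hop number -> IPv4 address for every hop that answered."""
    hops = {}
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header and footer lines do not start with a hop number
        ip = IPV4.search(match.group(2))
        if ip:  # hops shown as "* * *" carry no address and are skipped
            hops[int(match.group(1))] = ip.group(1)
    return hops


# Made-up sample in the Windows tracert layout; hop 2 does not answer.
SAMPLE = """\
Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  192.168.178.1
  2     *        *        *     Request timed out.
  3    12 ms    11 ms    13 ms  198.51.100.1
  4    22 ms    23 ms    22 ms  dns.google [8.8.8.8]

Trace complete.
"""

hops = responding_hops(SAMPLE)
first_past_router = min(h for h in hops if h > 1)
print(hops[1], hops[first_past_router])  # 192.168.178.1 198.51.100.1
```

The first responding hop after the router is then the closest available stand-in for the silent hop 2.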
 

Nakroma

Okay, so I let it run for the entire day (roughly 13 hours), and here are the results:
  • Hop 1: 10 Packets lost (out of 46397 -> 0.022% packet loss)
  • Hop 3: 155 Packets lost (out of 48020 -> 0.323% packet loss)
  • 8.8.8.8: 139 Packets lost (out of 48023 -> 0.289% packet loss)
So this would indicate the problem is between my router and hop 3, is that right?
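
The percentages can be double-checked with a few lines of arithmetic. This is just a worked example using the counts from the list above; the helper name is made up.

```python
def loss_percent(lost: int, sent: int) -> float:
    """Packet loss as a percentage of packets sent."""
    return 100.0 * lost / sent


# (lost, sent) pairs taken from the 13-hour run above
results = {
    "hop 1": (10, 46397),
    "hop 3": (155, 48020),
    "8.8.8.8": (139, 48023),
}

for name, (lost, sent) in results.items():
    print(f"{name}: {loss_percent(lost, sent):.3f}% loss")

# Loss that appears between the router and hop 3, in percentage points
added = loss_percent(155, 48020) - loss_percent(10, 46397)
print(f"added past hop 1: {added:.3f} points")
```

Almost all of the loss appears after hop 1, and hop 3 and 8.8.8.8 lose about the same amount, which is consistent with a problem between the router and hop 3.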
 
Yes, that is what it means.

This is going to be a tough one. Your loss is so low that it is going to be hard to convince the ISP there is an issue. They will test for a couple of minutes, see nothing, and declare there is no problem.

It depends on how this loss occurs. Generally you will not see a single lost packet here and there. Most times when you see lag in a game, you have lost multiple packets in a row. Ping does not test fast enough to show, say, multiple packets lost within 1 second, but most people who are having issues will see 2 or 3 pings lost in a row and then it is fine again, or packets are lost often enough that you see multiple losses on the screen separated by good pings.

If you had only done this for a short time, I would have recommended testing while there is load on the connection. More errors happen when the connection is being used than when it is idle.

1/3 of 1% is not a lot of loss. You will feel it in games, but it will not be as obvious as if you had, say, 2% loss.

If this is a cable modem, check the signal levels and other values to see if they are in the acceptable range. All cable modems/routers I have seen show these levels, but where they hide them varies a bit between brands.
You will find tables of recommended levels if you search; the exact values depend on the DOCSIS version being used. This really is the ISP's job, but you might see something you can point out to them.
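
Whether the loss comes as single packets or in runs can be checked by grouping consecutive failures. A small sketch, assuming the ping results have already been reduced to a list of True (reply) and False (lost) values; the function name and the sample list are invented.

```python
from itertools import groupby


def loss_bursts(replies: list[bool]) -> list[int]:
    """Lengths of each run of consecutive lost pings (False = lost)."""
    return [sum(1 for _ in run) for ok, run in groupby(replies) if not ok]


# Made-up sequence: one isolated loss, then three in a row
sample = [True, False, True, True, False, False, False, True]
print(loss_bursts(sample))  # [1, 3]
```

A log dominated by runs of 2 or more matches the "several pings lost in a row" pattern described above and is more convincing evidence for the ISP than the overall percentage.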
 

Nakroma


Alright, thanks a ton for the help. I've captured 1-2 minutes of ping output during one of those lag spikes and will send that to the ISP. Let's hope they do something. I'll go double-check the levels as well. Here are the ping statistics from the capture:
--- 8.8.8.8 ping statistics ---
112 packets transmitted, 83 received, 25.8929% packet loss, time 112590ms
rtt min/avg/max/mdev = 22.642/26.712/38.147/3.868 ms
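
For anyone who wants to pull the numbers out of such a summary automatically, here is a minimal sketch. It assumes the Linux iputils summary format shown above; the function name is made up.

```python
import re

SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, "
    r"([\d.]+)% packet loss, time (\d+)ms"
)


def parse_summary(line: str) -> dict:
    """Extract counts, reported loss and duration from a ping summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    return {
        "sent": sent,
        "received": received,
        "lost": sent - received,
        "reported_loss_percent": float(match.group(3)),
        "duration_s": int(match.group(4)) / 1000,
    }


stats = parse_summary(
    "112 packets transmitted, 83 received, 25.8929% packet loss, time 112590ms"
)
print(stats["lost"], stats["duration_s"])  # 29 112.59
```

So 29 of 112 pings went missing in under two minutes, with round-trip times staying in the normal 22-38 ms range.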