Using Wireshark to Troubleshoot DNS over IPsec Tunnels

WildMonkey365

Aug 30, 2016
I have remote offices connecting to a Windows DNS server at their corporate location through IPsec tunnels. Two or three tickets come in a day from any of the sites reporting no DNS resolution. Either the PC has to be rebooted or the DNS Client service has to be restarted. The routers are brand new and the manufacturer confirmed my settings. The sites have also been set up correctly and confirmed. The problem is definitely narrowed down to the ISP or cheap equipment. How can I verify where the issue lies using Wireshark? I have the ability to take pcaps from the LAN or WAN side of the routers.

 
Your best bet is to have Wireshark running on the DNS server and at the remote site. This lets you compare the two captures and see how far the traffic is actually getting.
You will need to capture on the LAN side, ideally with a filter that only captures traffic to the DNS server. Although you technically can capture and decode IPsec traffic when you have the keys, you need full captures from the start of the session so you can get the session keys as they change, since any preshared key is only used to get things started before the unique session keys are generated.
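
A rough way to do that comparison once both captures are exported is sketched below in Python. It is only an illustration: it assumes each capture has already been reduced to (transaction ID, query name) pairs, for example from Wireshark's dns.id and dns.qry.name fields, and the function name, hostnames and sample data are made up. The capture filter uses the 192.168.100.254 DNS server address that comes up later in the thread.

```python
from collections import Counter

# Capture filter (BPF syntax) that limits a capture to DNS traffic for one server.
# 192.168.100.254 is the DNS server address quoted later in this thread.
CAPTURE_FILTER = "host 192.168.100.254 and port 53"


def missing_queries(remote_site, dns_server):
    """Return queries seen leaving the remote site but never seen at the server.

    Each argument is a list of (transaction_id, query_name) tuples taken
    from one capture. Retransmissions with the same ID are paired one to one.
    """
    arrived = Counter(dns_server)
    lost = []
    for query in remote_site:
        if arrived[query] > 0:
            arrived[query] -= 1  # pair this query with one arrival
        else:
            lost.append(query)
    return lost


# Hypothetical sample data, not taken from a real capture.
remote = [
    (0x1A2B, "dc01.corp.example"),
    (0x1A2C, "files.corp.example"),
    (0x1A2D, "dc01.corp.example"),
]
server = [
    (0x1A2B, "dc01.corp.example"),
    (0x1A2D, "dc01.corp.example"),
]

for txid, name in missing_queries(remote, server):
    print(f"query 0x{txid:04x} for {name} never reached the DNS server")
```

If queries leave the remote LAN but never show up in the server-side capture, the loss is somewhere between the two capture points. If they arrive and are answered but the answer never makes it back, the same comparison can be run on the responses in the other direction.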
 



That makes sense. Are you familiar with WAN vs LAN traffic priority on these types of MPLS setups? There is an option to give LAN priority on all routers to optimize traffic from local subnets connecting to a domain controller/DNS server at the main site.

 
It is going to depend on what the MPLS provider has set up. In most cases they will have a bunch of standard QoS markings. You have to ask them exactly what they defined them to mean. Most times if you want EF for something like VoIP you have to pay extra because they honor the EF marking even in their cloud. What most VPNs do is copy the QoS marking from the inside packet to the encrypted packet. Sometimes this is disabled because it does give away some information about what is inside the packets.
 
Question for you. There is a port speed mismatch on the WAN port of this customer's firewall. By that I mean the ISP Ethernet port handing off the public address/internet connection to the WAN port of the customer's firewall is only showing 100 Mb on the firewall's port info menu. The ISP says the Adtran router handing that internet connection off to the customer's firewall only has 10/100 Mb ports, not gigabit. The customer's firewall auto-senses at gigabit and cannot be hard set to 100 full. I have the ISP replacing the Adtran tomorrow with a gigabit handoff port, which should show on the port info menu of the customer's firewall. The problem right now is that there is significant packet loss on all the tunnels of all 6 remote sites. There have been RDP connection issues and DNS issues at all sites. The main site with the port speed mismatch has the DNS server, RDP server and programs that all devices at all 6 sites use. The main site is the only site with a WAN port auto-sensing at 100 Mb full. The other sites' ISP equipment has gigabit Ethernet handoffs. I have pinged locally at the main site to see if there may be a loop causing the packet loss at all 6 sites, but pinging locally hasn't dropped any pings. I'm counting on this ISP equipment upgrade to work tomorrow or the customer will probably want to rip out all my firewalls. Can you help me? Could the mismatch be the magic ticket once resolved?
 
You would only see issues if one side went half duplex. If both run 100 Mb full duplex it should run with no errors. I would rig the firewall to allow you to run tracert and ping from the firewall itself and not force the traffic into the tunnels. You could then run tracert and pings to find the source of the errors. It is likely in the connection to the ISP but it really could be anyplace.

Still, you want to leave all the equipment set to auto. Setting one side to a fixed value without doing the same on the other side will cause the side that is still on auto to fall back to half duplex because it gets no response to the auto negotiation.
 
I have been pinging things like 8.8.8.8 and cnn.com with little or no dropped pings. I can also ping the public address of the customer's firewall as well as the ISP gateway modem providing internet to the main site with no drops over hours of pings. I also pinged from a remote site LAN to the main site LAN all weekend; that's 140,000 (one hundred and forty thousand) pings with only 100 drops. This to me doesn't seem bad. The real problem is that the RDP computers at remote sites sometimes lose their connection to the main site server, either for a second before the session picks back up, or totally so that it has to be restarted. There is also a DNS client issue where the DNS Client service on a computer may need to be restarted in order to resolve again. This happens less at 4 of the remote sites and never gets reported to the IT team, but at 2 sites it happens at least once a day. I'm getting pretty desperate because they aren't used to this problem from when they had their ISP-provided MPLS. The tunnel integrity seems fine to me with the kind of packet loss I mentioned above, but I can't prove what is causing the RDP and DNS connection issues at the remote sites. The IT guy has set RDP to connect via IP instead of host name, which stopped the DNS issue, but he still gets a brief pause in RDP at one site in particular about once or twice a day. I just need to prove that my routers aren't to blame and my job is done and they keep my firewalls. Do you have any advice? Like I said, I'm desperate! Thanks for responding!!!
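
For reference, the weekend numbers can be worked out with the minimal Python sketch below. The second function rests on an assumption about what is worth checking: an average loss rate does not show whether the drops were spread out or clustered, and a run of consecutive drops is the kind of event that would look like a brief RDP pause. The function names and the sample reply list are hypothetical.

```python
def loss_percent(sent, lost):
    """Packet loss as a percentage of pings sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


def longest_burst(replies):
    """Length of the longest run of consecutive dropped pings.

    replies is a sequence of booleans, True for a reply and False for a drop.
    """
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


# Figures quoted in the post above: 140,000 pings with 100 drops.
print(f"{loss_percent(140_000, 100):.3f}% average loss")

# Hypothetical reply log: the same number of drops can hide very different bursts.
sample = [True, True, False, False, False, True, False, True]
print("longest burst of drops:", longest_burst(sample))
```

An average well under 0.1% looks healthy, so logging timestamps of the drops and comparing them with the times users report RDP pauses would say more than the total count.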
 
Leave constant pings running through the tunnels and see if you get loss. If you get no loss to the actual endpoints but you get loss in the tunnels then it has to be something strange with the VPN. Normally firewalls have counters that show whether the tunnel is seeing packet loss or errors.
 
Could you please give me an example? If I constantly ping from a remote LAN computer and get no loss to an "endpoint" on the main site LAN (e.g. the DNS server, 192.168.100.254), then to my thinking I cannot possibly have a bad tunnel. Don't the tunnel and the servers/PCs I'm pinging pretty much go hand in hand? In theory I could have a server with a cable problem and drop pings to it, so the tunnel could be good while the computer I'm pinging is bad. But I am getting a pattern when dropping pings to the devices on the main LAN. They all usually drop the exact same number of pings. I am even pinging the LAN gateway IP of the router supporting the tunnels (192.168.100.1) and I get the same number of dropped pings on that as I do on the other devices I'm pinging. Wouldn't this mean that the tunnel is most likely the issue, not the devices I am pinging? I don't really drop a lot of pings to external IPs (cnn.com, 8.8.8.8).
 
Let's say you form a VPN tunnel between real IPs 123.123.123.50 and 234.234.234.99. Let's also assume the internal tunnel IPs are 192.168.100.1 and 192.168.100.2.

If you have actual network issues you should have loss on both pings, from 123.123.123.50 --> 234.234.234.99 as well as from 192.168.100.1 --> 192.168.100.2. Now if you only get loss on the internal IPs then you need to blame the VPN device. If nothing else, put loopback IPs on the actual VPN devices to eliminate traffic past the VPN boxes.
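
The comparison above can be written as a small decision rule. This is only a sketch in Python: the 0.5% threshold and the function name are arbitrary assumptions, and the loss figures would come from whatever ping statistics you collect on the two paths.

```python
def locate_loss(outer_loss_pct, inner_loss_pct, threshold=0.5):
    """Guess where packet loss originates from two ping loss percentages.

    outer_loss_pct: loss between the public tunnel endpoints
                    (123.123.123.50 --> 234.234.234.99 in the example).
    inner_loss_pct: loss between the internal tunnel addresses
                    (192.168.100.1 --> 192.168.100.2 in the example).
    threshold:      hypothetical cutoff below which loss is ignored.
    """
    outer_bad = outer_loss_pct > threshold
    inner_bad = inner_loss_pct > threshold
    if outer_bad and inner_bad:
        return "network path (ISP or WAN link)"
    if inner_bad:
        return "VPN device or tunnel"
    if outer_bad:
        return "inconclusive: loss seen outside the tunnel only"
    return "no significant loss"


print(locate_loss(outer_loss_pct=2.0, inner_loss_pct=2.1))
print(locate_loss(outer_loss_pct=0.0, inner_loss_pct=1.5))
```

Both pings need to run over the same time window, otherwise an outage that only one of them happened to catch will point at the wrong culprit.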
 
Thanks. I think my VPN routers work differently than yours. I can't give my tunnels an IP address. There's a setting which puts the routers in an "MPLS" group. There are 2 group types. Type 1 is hub and spoke (which is what I'm dealing with now), which allows the remote sites to talk only to the main site, not to each other, while the main site can talk to all remote sites. Type 2 is a full mesh where all sites can talk to all sites. The only requirement is that they all have to have a different private subnet. I would hate to find out that these devices are crap because I've trusted them. The dropped pings have been slightly higher to the main site LAN devices than to cnn.com and 8.8.8.8. I am half tempted to tell the IT guy to disconnect the switch from the router at the main site and just ping the router's private address. That way I know nothing on his network is throwing the whole thing off. Tomorrow the ISP equipment will be replaced with a gigabit-rated Ethernet handoff port so that the WAN port on the VPN router at the main site can auto-sense at 1000 Mb full instead of 100 full, which it is right now. We'll see. I truly appreciate your help as always. Thanks.
 
I see. Maybe I got posts confused; you have a real MPLS network. The so-called VPN part is completely invisible to you, even on what is called the edge router at your premises. When the traffic gets to one of the core routers they put special tags on the data to keep it separate.

When they also hook your MPLS cloud to the internet it gets very confusing to troubleshoot. The path internally in the provider network can be very different from the path to your office locations. Unfortunately, because of the MPLS tags, a tracert goes to their cloud of routers, which all appear as a single hop. So even though the actual router in their network that goes to the internet may be in a different city than the router that goes to your office, it will all appear as the same device in a tracert.


I know we have an MPLS cloud and often buy an internet connection from the same vendor but have them deliver it on a different "VLAN". That way the traffic must pass our firewall and we can also, to a point, tell how traffic flows.

You may want to ask the ISP for a diagram of how your network is connected. You want to know the location of each of the provider data centers that your offices connect to and some basic idea of how these data centers are interconnected. Almost all will give you this, and some will actually show you things like the redundant paths in their network.

You are correct, you always want to ping the router itself, not the stuff behind it. If you see problems pinging between the router at your remote location and the router at the central location, it has to be in the MPLS cloud and is mostly invisible to you.

The other thing you want to try is to transfer data between the sites while you do this. What may be happening is that you are exceeding the contracted bandwidth you have from the provider cloud to the main site. Many times this is hard to see because the traffic is being dropped in the MPLS router that feeds your main site. The ISP should be able to tell you if this is the case. With the one we use, we can get to their utilization graphs, which show how much traffic is running on any link at any time.
 
That makes sense. The firewall/router we are using for this setup is called Simplewan. They offer a feature we are using which they classify as SD-WAN but market as Virtual MPLS. The customer actually had a real MPLS through the ISP, but this solution is supposed to save them a lot of money as the cloud is built over typical broadband circuits like Comcast and Verizon. I talk with one of Simplewan's engineers a couple of times a week, and what they are using to create these "MPLS" setups are just auto IPsec tunnels, but supposedly the phase 1 and 2 algorithms are dynamic. They have a better model with a better processor and Ethernet ports which may support the tunnels better. I'm getting little or no packet loss from any of these routers pinging 8.8.8.8 but slightly more when pinging the corporate LAN at the same time. Hopefully the gigabit/gigabit auto negotiated speed solves the problem, but from what I've read it should work even if the ISP Ethernet port and the router's WAN port are negotiating at 100 full. We shall see. Thanks!!!
 
So the gigabit port upgrade didn't stop the dropped pings. There is a setting in all the routers called "Allowed Fragmented IPSEC Packets" which is currently enabled. Could this be causing the packet loss? What are your thoughts on fragmented packets over an IPsec tunnel?
 
In general, ping will not be affected by that setting because the default ping packets are too small to need fragmentation. That is why you can get loss of actual data while ping says there is no problem, unless you use max size packets.

You basically have 2 choices. The first is to not allow fragmentation of packets, in which case the router with the VPN tunnel will send back a message saying that the packet is too big. The MTU discovery in the IP stack on the end machine should then send smaller packets. The problem is that these ICMP messages are sometimes filtered by firewalls, so the end machine does not adjust the packet size. The other option is to allow fragmentation, and there are 2 variations. Packets that have the do not fragment flag set are still fragmented, but the far end VPN device does the packet reassembly. This puts load on the VPN box. For other packets, fragmentation is the standard issue of increasing the load on the receiving device that needs to reassemble the data.

In general, if everything is working correctly you do not get packet loss. You might get a tiny bit more latency because of packet reassembly times, but it is generally so small it cannot be measured.
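
To make the "max size packets" point concrete, here is a small Python sketch of the arithmetic, plus a search for the largest ping payload that gets through with the do not fragment flag set. It assumes plain IPv4 with no options and a 1500 byte Ethernet MTU. The probe function is a stand-in you would replace with a real ping (on Windows, ping -f -l <size> <host>), and the 1422 byte tunnel MTU in the demo is a made-up figure, since the real IPsec overhead depends on the cipher and mode in use.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ping payload that fits in one unfragmented packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_path_payload(probe, low=0, high=1472):
    """Binary search for the largest payload size that probe() accepts.

    probe(size) must return True when a ping of that payload size, sent
    with the do not fragment flag set, gets a reply. It is assumed that
    probe(low) succeeds and that success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print("max payload at MTU 1500:", max_ping_payload(1500))

# Simulated tunnel with a hypothetical effective MTU of 1422 bytes.
def fake_probe(size):
    return size <= max_ping_payload(1422)

largest = find_path_payload(fake_probe)
print("largest payload through the simulated tunnel:", largest)
print("implied path MTU:", largest + IPV4_HEADER + ICMP_HEADER)
```

If full size pings with the flag set fail through the tunnel while small ones succeed, and no "packet too big" message ever comes back, that matches the filtered ICMP case described above.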

I think you are still back to finding a way to test router to router so you know for sure whether it is the WAN network or something internal to your LAN.
 
Solution
I have the IT guy running constant pings at all 6 remote sites and at the main site starting this morning. He will ping multiple private addresses on all 7 LANs, which will verify whether his networks are causing some looping and in turn creating the packet loss. He will be pinging multiple public addresses like cnn.com and 8.8.8.8. He will also be pinging the public address of the router at the main site as well as the ISP gateway of that router. I have since turned the Allow Fragmented IPSEC Packet setting off and ensured both the Ethernet handoff port from the ISP and the router's WAN port at all locations are auto negotiating at 1000 Mb full. I cannot adjust the MTU on any of the routers, which is set to 1500. If there are no packet loss issues on his LANs or to the public addresses that he is pinging, then the tunnels on the Simplewan devices are crap. I have faith though. I really appreciate your help as always, but especially on this one. I don't have many friends or any co-workers that can help with this type of stuff, so thank you!!!