Wifi Access Point roaming not working as expected

edmond419

Distinguished
Apr 2, 2010
435
0
18,810
Hi all,

Scenario:
I currently have two wifi access points: one built into my router (TP LINK WR1043ND running DD-WRT v24-sp2), the other an AP (TP LINK WA901ND). They both have the same SSID, key and encryption type.

Basically, the WA901 is connected via a switch, which goes to another switch, before going to the router. Both are unmanaged switches.

Problem:
If I connect my computer to the WA901, it works fine. When I roam over to the router's own AP, it also continues to work fine.

However, when I try to roam back over to the other AP or connect to the router first, I get no data flow at all until I power cycle the AP. Static addressing didn't solve the problem.

So I'm not sure what I am doing wrong here. Please help. :)

Edmond
 

adamwinn

Distinguished
Dec 31, 2007
245
0
18,860
I assume you have DHCP server enabled on only the WR1043ND, and not on the WA901ND?

I suggest using a different SSID for one of the APs and seeing whether the behavior changes. Your network card may be sending some type of auth/deauth packet that is confusing the AP. It seems unlikely though (because your network card knows the router by BSSID/MAC) - but it's definitely something that needs to be isolated.

So yeah, try changing the SSID on one of them - if behavior is unchanged, try a firmware upgrade on both, and a driver upgrade for your wireless card.

Let me know results of that and we can continue to isolate the issue.
 

edmond419

Hey Adam. Thanks for your response.

DHCP server - Yes that's right.
Firmware - They are both running the latest available firmware.
Wifi driver update - I'm using OS X Yosemite 10.10.1, and I couldn't find any available updates for it.

Changing the SSID didn't fix the problem.

Adding a second access point to the network with the same settings and disabling the router's inbuilt AP worked, though I'd rather use the router's inbuilt one so that one could be deployed elsewhere.
 

adamwinn

If disabling the router's inbuilt AP worked, it sounds like there's a critical bug in the router's AP software. This is confirmed by the fact that rebooting the AP is the fix.

I spoke with some wi-fi engineers today and they were experiencing a similar issue - certain auth/deauth patterns can cause parts of the firmware to crash. If you really want to isolate the issue, you could:
1) Enable the AP on the router again
2) SSH to the AP
3) Run the ps command and save a copy of the output
4) Reproduce the behavior that causes the suspected firmware crash
5) Run the ps command again and compare the results

Doing this might reveal a certain service or process crashing. It won't reveal a deadlock though. You could also check whether the router has the 'top' command, which may reveal a CPU or memory spike after the crash.
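If comparing the two ps dumps by eye is tedious, the comparison in steps 3 and 5 could be sketched roughly like this in Python, run on your computer rather than on the router. This is only a sketch: it assumes BusyBox-style ps output with the columns PID, USER, VSZ, STAT, COMMAND, and the function names parse_ps and diff_ps are made up for illustration.

```python
def parse_ps(text):
    """Map PID -> command line for each process row in BusyBox-style ps output."""
    procs = {}
    for line in text.splitlines():
        # Assumed column order: PID USER VSZ STAT COMMAND (command may contain spaces)
        fields = line.strip().split(None, 4)
        if len(fields) == 5 and fields[0].isdigit():
            procs[int(fields[0])] = fields[4]
    return procs


def diff_ps(before, after, ignore=("ps", "top")):
    """Return (gone, new): processes only in `before` and only in `after`.

    Commands listed in `ignore` are skipped, since the ps/top process that
    produced the dump gets a fresh PID on every run.
    """
    a, b = parse_ps(before), parse_ps(after)
    gone = {pid: cmd for pid, cmd in a.items() if pid not in b and cmd not in ignore}
    new = {pid: cmd for pid, cmd in b.items() if pid not in a and cmd not in ignore}
    return gone, new
```

Save the output from step 3 and step 5 to two text files, read them in, and pass the contents to diff_ps. Anything in "gone" disappeared between the two runs and anything in "new" appeared. Note that this only compares PIDs and command lines, so a process that is hung but still listed will not show up.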
 

edmond419

Hey Adamwinn,

Thanks for your response. Sorry for the late reply, I've been busy with other things.

The ps and top output of both instances doesn't seem to show any differences, nor were there any major resource usage spikes.

At step 3, ps.
Code:
  PID USER       VSZ STAT COMMAND
    1 root      1444 S    /sbin/init
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    4 root         0 SW   [kworker/0:0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u2:0]
    7 root         0 SW<  [khelper]
    8 root         0 SW   [kworker/u2:1]
   74 root         0 SW<  [writeback]
   76 root         0 SW<  [bioset]
   77 root         0 SW<  [crypto]
   79 root         0 SW<  [kblockd]
  106 root         0 SW   [kworker/0:1]
  112 root         0 SW   [kswapd0]
  160 root         0 SW   [fsnotify_mark]
  290 root         0 SW<  [deferwq]
  548 root       904 S    /sbin/hotplug2 --set-rules-file /etc/hotplug2.rules --persistent
  567 root      1680 S    watchdog
  628 root         0 SW<  [cfg80211]
  836 root         0 SW   [kworker/0:2]
  840 root      1728 S    hostapd -B -P /var/run/ath0_hostapd.pid /tmp/ath0_hostap.conf
  908 root         0 SW<  [kworker/0:1H]
  915 root       876 S    cron
  935 root      1032 S    dropbear -b /tmp/loginprompt -r /tmp/root/.ssh/ssh_host_rsa_key -d /tmp/root/.ssh/ssh_host_dss_key -p 22  -a
  943 root       956 S    dnsmasq --conf-file=/tmp/dnsmasq.conf
  963 root      1448 S    ttraff
  996 root      2228 S    startstop_f run_rc_startup
 1042 root      3476 S    httpd -p 80
 1137 root      1112 S    syslogd -L
 1139 root      1112 S    klogd
 1140 root      1636 S    resetbutton
 1158 root      3476 S    httpd -S
 1442 root      1440 S    process_monitor
 1447 root      1388 S    inadyn -u maskedusername -p maskedpassword --input_file /tmp/ddns/inadyn.conf
 1783 root      1660 S    wland
 2137 root      3048 S    /tmp/openvpnserver --config /tmp/openvpn/openvpn.conf --route-up /tmp/openvpn/route-up.sh --down-pre /tmp/openvpn/route-down.sh --daemon
 2471 root      1112 S    udhcpc -i eth0 -p /var/run/udhcpc.pid -s /tmp/udhcpc -O routes -O msstaticroutes -O staticroutes
 2476 root      1100 D    dropbear -b /tmp/loginprompt -r /tmp/root/.ssh/ssh_host_rsa_key -d /tmp/root/.ssh/ssh_host_dss_key -p 22  -a
 2477 root      1116 S    -sh
 2480 root      1112 R    ps

At step 3, top.
Code:
Mem: 19172K used, 42524K free, 0K shrd, 2520K buff, 6880K cached
CPU:  0.1% usr  0.1% sys  0.0% nic 97.6% idle  0.0% io  0.0% irq  1.9% sirq
Load average: 0.05 0.11 0.05 2/39 2501
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  840     1 root     S     1728  2.7   0  0.4 hostapd -B -P /var/run/ath0_hosta
 1158     1 root     S     3476  5.6   0  0.0 httpd -S
 1042     1 root     S     3476  5.6   0  0.0 httpd -p 80
 2137     1 root     S     3048  4.9   0  0.0 /tmp/openvpnserver --config /tmp/
  567     1 root     S     1680  2.7   0  0.0 watchdog
 1783     1 root     S     1660  2.6   0  0.0 wland
 1140     1 root     S     1636  2.6   0  0.0 resetbutton
  963     1 root     S     1448  2.3   0  0.0 ttraff
    1     0 root     S     1444  2.3   0  0.0 /sbin/init
 1442     1 root     S     1440  2.3   0  0.0 process_monitor
 1447     1 root     S     1388  2.2   0  0.0 inadyn -u maskedusername -p maskedpassword
 2501  2477 root     R     1120  1.8   0  0.0 top
 2477  2476 root     S     1116  1.8   0  0.0 -sh
 2471     1 root     S     1112  1.7   0  0.0 udhcpc -i eth0 -p /var/run/udhcpc
 1137     1 root     S     1112  1.7   0  0.0 syslogd -L
 1139     1 root     S     1112  1.7   0  0.0 klogd
 2476   935 root     S     1100  1.7   0  0.0 dropbear -b /tmp/loginprompt -r /
  935     1 root     S     1032  1.6   0  0.0 dropbear -b /tmp/loginprompt -r /
  943     1 root     S      956  1.5   0  0.0 dnsmasq --conf-file=/tmp/dnsmasq.
  548     1 root     S      904  1.4   0  0.0 /sbin/hotplug2 --set-rules-file /

Step 5, ps
Code:
  PID USER       VSZ STAT COMMAND
    1 root      1444 S    /sbin/init
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    4 root         0 SW   [kworker/0:0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u2:0]
    7 root         0 SW<  [khelper]
    8 root         0 SW   [kworker/u2:1]
   74 root         0 SW<  [writeback]
   76 root         0 SW<  [bioset]
   77 root         0 SW<  [crypto]
   79 root         0 SW<  [kblockd]
  106 root         0 SW   [kworker/0:1]
  112 root         0 SW   [kswapd0]
  160 root         0 SW   [fsnotify_mark]
  290 root         0 SW<  [deferwq]
  548 root       904 S    /sbin/hotplug2 --set-rules-file /etc/hotplug2.rules --persistent
  567 root      1680 S    watchdog
  628 root         0 SW<  [cfg80211]
  836 root         0 SW   [kworker/0:2]
  840 root      1728 S    hostapd -B -P /var/run/ath0_hostapd.pid /tmp/ath0_hostap.conf
  908 root         0 SW<  [kworker/0:1H]
  915 root       876 S    cron
  935 root      1032 S    dropbear -b /tmp/loginprompt -r /tmp/root/.ssh/ssh_host_rsa_key -d /tmp/root/.ssh/ssh_host_dss_key -p 22  -a
  943 root       964 S    dnsmasq --conf-file=/tmp/dnsmasq.conf
  963 root      1448 S    ttraff
 1042 root      3476 S    httpd -p 80
 1137 root      1112 S    syslogd -L
 1139 root      1112 S    klogd
 1140 root      1636 S    resetbutton
 1158 root      3476 S    httpd -S
 1442 root      1440 S    process_monitor
 1447 root      1388 S    inadyn -u maskedusername -p maskedpassword --input_file /tmp/ddns/inadyn.conf
 1783 root      1660 S    wland
 2137 root      3048 S    /tmp/openvpnserver --config /tmp/openvpn/openvpn.conf --route-up /tmp/openvpn/route-up.sh --down-pre /tmp/openvpn/route-down.sh --daemon
 2471 root      1112 S    udhcpc -i eth0 -p /var/run/udhcpc.pid -s /tmp/udhcpc -O routes -O msstaticroutes -O staticroutes
 2476 root      1100 R    dropbear -b /tmp/loginprompt -r /tmp/root/.ssh/ssh_host_rsa_key -d /tmp/root/.ssh/ssh_host_dss_key -p 22  -a
 2477 root      1116 R    -sh
 2523 root      1112 R    ps

Step 5, top
Code:
Mem: 19032K used, 42664K free, 0K shrd, 2520K buff, 6880K cached
CPU:  0.1% usr  0.5% sys  0.0% nic 95.8% idle  0.0% io  0.0% irq  3.3% sirq
Load average: 0.02 0.09 0.05 2/39 2522
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  943     1 root     S      964  1.5   0  0.6 dnsmasq --conf-file=/tmp/dnsmasq.
  840     1 root     S     1728  2.7   0  0.4 hostapd -B -P /var/run/ath0_hosta
 1158     1 root     S     3476  5.6   0  0.0 httpd -S
 1042     1 root     S     3476  5.6   0  0.0 httpd -p 80
 2137     1 root     S     3048  4.9   0  0.0 /tmp/openvpnserver --config /tmp/
  567     1 root     S     1680  2.7   0  0.0 watchdog
 1783     1 root     S     1660  2.6   0  0.0 wland
 1140     1 root     S     1636  2.6   0  0.0 resetbutton
  963     1 root     S     1448  2.3   0  0.0 ttraff
    1     0 root     S     1444  2.3   0  0.0 /sbin/init
 1442     1 root     S     1440  2.3   0  0.0 process_monitor
 1447     1 root     S     1388  2.2   0  0.0 inadyn -u maskedusername -p maskedpassword
 2502  2477 root     R     1120  1.8   0  0.0 top
 2477  2476 root     S     1116  1.8   0  0.0 -sh
 2471     1 root     S     1112  1.7   0  0.0 udhcpc -i eth0 -p /var/run/udhcpc
 1137     1 root     S     1112  1.7   0  0.0 syslogd -L
 1139     1 root     S     1112  1.7   0  0.0 klogd
 2476   935 root     S     1100  1.7   0  0.0 dropbear -b /tmp/loginprompt -r /
  935     1 root     S     1032  1.6   0  0.0 dropbear -b /tmp/loginprompt -r /
  548     1 root     S      904  1.4   0  0.0 /sbin/hotplug2 --set-rules-file /

Also, while the issue is occurring and I'm connected to the WA901 with static addressing, I can only reach devices connected directly to that AP, including the AP's own control panel, and nothing beyond it.

A full power cycle of the WA901 (disconnecting and reconnecting power) is needed to fix the problem. A soft reboot or replugging the network cable does not fix it.
 

adamwinn

Thanks for the details! There's one minor difference in the ps output and I believe it's our clue to the root cause: at step 3 an extra process is running that is gone by step 5. It is PID 996, "startstop_f run_rc_startup".

Some quick searching shows a little forum activity about this service on DD-WRT/OpenWrt coinciding with DNS/DHCP failures. http://svn.dd-wrt.com/ticket/3553 is an interesting post.

Have you tried any different versions of the firmware on the suspect device? You may want to take a look at the log files on the device - it's very possible it's going into 'kernel panic' after getting some unexpected packet.
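If the log is long, a small filter can help pick out crash-related lines. Here's a rough Python sketch to run on a copy of the log saved to your computer. The keyword list is my own guess at what a kernel panic or crash would typically leave behind, not something specific to DD-WRT, and suspicious_lines is just an illustrative name.

```python
import re

# Assumed crash markers; adjust to whatever the firmware actually logs.
CRASH_PATTERN = re.compile(
    r"kernel panic|oops|BUG:|segfault|out of memory",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the log lines that contain any of the assumed crash markers."""
    return [line for line in log_text.splitlines() if CRASH_PATTERN.search(line)]
```

An empty result doesn't rule out a crash. It only means none of these particular strings appeared in the log.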
 

edmond419

I'll have to try different versions of the firmware in a few days' time, as I'm not able to upload new ones right now due to my current environment.

I have looked at the syslogd output. It doesn't seem to show anything special around the time I reproduce the problem. It only shows messages from the hostapd daemon reporting my computer's disconnection from and reconnection to the WR1043ND's inbuilt AP. No DHCP messages whatsoever could be seen.

The funny thing is that when I reproduce the issue, other devices are not affected. Only once I perform the problematic steps on a given device does it stop being able to use the network.