I have multiple issues with my Wi-Fi access point:
Since my Wi-Fi is used by guests with doubtful security, I want my Wi-Fi access point to have my firewall on it. That implies that it will run on one of my machines, not a commercial appliance. I've been doing this for several years, using hostapd by Jouni Malinen, currently version 2.10.
In the original configuration, the AP (and most other hosts) are at one end of my house. At the other end are two security cameras, which associate unreliably with the AP due to distance and intervening exterior walls. I'm using a commercial appliance, a TP-Link AC750 Wi-Fi Range Extender model RE220, but it has issues, and I would like to use my own machine for this also. But various problems have gotten in the way.
At rare intervals, since I've been using NICs with MediaTek chipsets (Terow Mediatek AC-1200Mbps and Alfa AWUS036ACM), the CPU (x86_64) mysteriously freezes, with complete silence in the log files. This kind of thing has been reported for decades (the earliest report I saw was from 2007), and the association with MediaTek is likely just coincidence. Others blame an unfixed or unfixable bug in Linux memory management. I moved the AP off my main router (Jacinth) onto a Raspberry Pi 3B (Holly, aarch64/ARM). This got rid of the freezes for several years, but now they're back. One person solved a similar issue by upgrading to an RPi 4B, and I've been looking for an excuse to do that upgrade. The new RPi 4B is called Beaver and is the subject of this hardware review.
Here is a list of the NICs I'm working with:
Nbr | Brand | Firmware | MAC Address |
---|---|---|---|
0 | Terow AC-1200Mbps | Mediatek | 00:13:ef:5f:0c:3c |
1 | Alfa AWUS036ACM | Mediatek | 00:c0:ca:b0:60:4b |
2 | Cudy WU1400AC | Realtek | b4:4b:d6:27:de:88 |
3 | Alfa-rt AC1200 | Realtek | 00:c0:ca:a8:4c:7e |
4 | Intel 3168NGW | Intel | 60:f6:77:76:21:63 |
6 | TP-Link TL-WN722N | Atheros | 90:f6:52:08:c8:4b |
7 | Za-pai | Ralink | 00:e1:80:67:84:34 |
1-letter codes for AP hosts, NICs and places:
Compact Office, where Jacinth lives, west end of living room
In prior unsuccessful attempts to make a range extender, I got into packet storms that caused uninvolved Wi-Fi packets to be collided with or otherwise forced off the aether. The code word "packetvore" denotes an episode of very high loss of Wi-Fi packets.
Summary:
Discussion: All combinations of host and NIC are capable of running for long periods with high quality Wi-Fi. All but one combination suffered packetvore attacks or major glitches. It is definitely not true that one or the other NIC fails consistently (both seem similar), and the same for the two hosts. Patterns of packet loss vary from day to day, or even hour to hour. I would be very inclined to attribute the problems to malign outside influences, except that neighbors' beacons are all under -75dBm, way below local levels, and it's plausible that the neighbors' stations have similar power, far from enough to disrupt local communication.
Conclusion: The H/A/J combination has been trouble-free for a week, cross fingers. By abandoning the range extender configuration and using wired Ethernet for all APs, I think I'm now on the right track.
After a week of hiatus I'm going to violate the policy "if it isn't broken, don't fix it". The initial status is:
The planned change is to have all 3 APs active, with SSID CouchNet, and with Beaver and the Alfa atop the curio cabinet (B/A/Q). A future improvement will be to buy 2 more Alfa AWUS036ACM's, replacing the Terow AC-1200 (unreliable) and the Za-Pai (it's za-pai). Another improvement is a second Raspberry Pi 4B serving Jim's desk; see this review's front page for good words about the RPi 4B in the desktop replacement role.
I want to do a signal strength survey before and after moving the APs to their new locations. Former locations of the APs and NICs:
Planned future locations and NIC assignments:
These will be the sites to measure at. Ring cameras report their own signal strength; at other sites I'm using Xena (laptop) to measure. The procedure on Xena will be to disconnect Wi-Fi, wait 10 sec, reconnect (giving it the chance to switch to the strongest AP), then run iwconfig $NIC 3 times 15 sec apart, and report the average signal level, always negative, in dBm. It also reports which AP it is associated with.
Data to record at each site:
iwconfig $NIC reports the BSSID, i.e. the MAC address of the AP in use, and its signal strength. Ring cameras report their own signal strength. On the AP,
hostapd_cli -i $NIC all_sta lists all associated stations (reformat it neatly and look for the Ring cameras).
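The measurement step of the survey procedure can be sketched in shell. This is only a minimal sketch, not the script actually used: the function names are hypothetical, and it assumes iwconfig prints the signal in the usual "Signal level=-NN dBm" form. The disconnect and reconnect step is left out because it depends on how the laptop manages its Wi-Fi.

```sh
#!/bin/sh
# Extract every "Signal level=-NN dBm" value (one per line) from
# iwconfig output on stdin. Assumes the common dBm output format.
signal_levels() {
    sed -n 's/.*Signal level=\(-*[0-9][0-9]*\) dBm.*/\1/p'
}

# Average the numbers on stdin, rounded to the nearest whole dBm.
average() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.0f\n", sum / n }'
}

# Sample iwconfig three times, 15 sec apart, and print the average
# signal level for the interface named in $1.
survey_site() {
    for i in 1 2 3; do
        iwconfig "$1"
        if [ "$i" -lt 3 ]; then sleep 15; fi
    done | signal_levels | average
}
```

Usage would be something like survey_site $NIC at each site, after the reconnect has settled.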
Site | Initial Signal (dBm) | Initial AP Used | Initial Scan (dBm) | Final Signal (dBm) | Final AP Used | Final Scan (dBm) |
---|---|---|---|---|---|---|
Couch | -48 | H/A/J | A=-44 Z=-57 | -41 | P/T/C | A=-44 T=-33 Z=-55 |
Office | -41 | H/A/J | A=-34 Z=-67 | -55 | P/T/C | A=-47 T=-31 Z=-44 |
Dining Rm | -66 | H/A/J | A=-57 Z=-61 | -44 | B/A/Q | A=-39 T=-58 Z=-71 |
Breakfast Rm | -71 | H/A/J | A=-67 Z=-53 | -54 | B/A/Q | A=-56 T=-61 Z=-76 |
Laundry | -76 | H/A/J | A=-68 Z=-71 | -56 | B/A/Q | A=-64 T=-62 Z=-83 |
Garage | -63 | H/A/J | A=-60 Z=-80 | -59 | P/T/C | A=-63 T=-73 Z=-77 |
Jim's Bed | -56 | H/A/J | A=-45 Z=-66 | -61 | P/T/C | A=-60 T=-62 Z=-68 |
Alice's Bed | -60 | H/A/J | A=-55 Z=-64 | -53 | B/A/Q | A=-56 T=-56 Z=-73 |
Bathroom | -73 | H/A/J | A=-71 Z=-61 | -58 | B/A/Q | A=-54 T=-44 Z=-82 |
RingC1 | -59 | H/A/J | | -59 | P/T/C | |
RingC2 | -71 | P/Z/Q | | -59 | B/A/Q | |
RingC3 | -49 | H/A/J | | -52 | P/T/C | |
Wi-Fi is working.
I received and installed the new Alfa AWUS036ACM NICs, replacing the Terow AC-1200 and the Za-Pai. No problems during or after installation, except…
Before and after installing the new Alfa NICs, I had a small number of packetvore attacks, i.e. episodes of high packet loss lasting at least several minutes. In some cases the loss was believed to be less than 100%, but in others I'm pretty sure it was 100%. When the station was induced to connect to a different AP, it communicated normally, and the AP returned to normal function (but I don't know how long that took).
I'm trying to gather information about what's happening. The first job is to set up a test station connecting to each AP, and I'll use the AP hosts themselves: Beaver → Holly → Piki → Beaver, sending from their onboard NICs, all rad9. /etc/sysconfig/network/ifcfg-rad9 contains: (replacing value keywords with their numeric values)
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='my fixed IPv4 adr/nbr of bits'
IPADDR_0='my fixed IPv6 adr/nbr of bits'
WIRELESS_ESSID='CouchNet'
WIRELESS_WPA_PSK='WouldntYouLikeToKnow'
WIRELESS_AP='MAC adr of AP NIC on the target'
Since a route with a longer prefix (more bits) is preferred, I use a host (maximum length) route. /etc/sysconfig/network/routes-rad9 contains:
# Dest Via NMask Ifc Options
Target Wi-Fi IPv4 adr/32 - - -
Target Wi-Fi IPv6 adr/128 - - -
No special routes on the target. The station
A Wi-Fi range extender includes an access point, which other stations connect to, plus a client (station) that can forward traffic between the connecting stations and some other AP, usually the main Wi-Fi router. I wanted to junk the commercial range extender, TP-Link AC750 Wi-Fi Range Extender model RE220, and replace it with my own Raspberry Pi 3B. But packet loops (code name: packetvore) are easy to create and hard to diagnose and mitigate, and I eventually abandoned the range extender project, switching to standalone APs on the LAN via wired Ethernet.
This design produces a "working" range extender, except for the minor detail of packetvore attacks at random intervals.
Each Raspberry Pi range extender has one external USB NIC to be the access point that stations connect to, and the uplink is its internal Wi-Fi, connected to an AP that's directly on the LAN.
The AP is in a bridge. For the LAN-resident APs the wired Ethernet NIC is also in this bridge, and the effect is as if the connecting stations were directly on the LAN. No routing is needed.
The firmware in the TP-Link range extender assigned IPs randomly from a separate address range, and it routed between that range and the LAN. Stations could originate connections to and beyond the LAN, but LAN clients could not figure out the randomly assigned IP addresses of the stations, so could not originate connections to them. This arrangement was one of the reasons I junked the TP-Link range extender.
For the RPi range extender, it would be really nice if the Wi-Fi uplink NIC could go in the bridge just like wired Ethernet. But there are corner cases where 802.11 Wi-Fi and bridging conflict, such as multicast, and the 802.11 driver maintainers got tired of nested kludgy workarounds, and introduced a control bit in the bridge driver so drivers like theirs could declare the interface unbridgeable. Sabotaging this bit is simple, but I got tired of maintaining a hacked driver that tainted the kernel, and I gave that up.
Instead, there's a Geneve tunnel (it's bidirectional) with one end in the range extender's bridge and the other in the bridge of the LAN-resident AP (referred to as the "Geneve server"), which puts the stations effectively on the LAN, same as with wired Ethernet.
But…
Geneve bearer packets cannot successfully be borne by the Geneve tunnel (a chicken and egg issue). So I use policy routing to divert those bearer packets to the RPi's internal NIC, which passes them to the Geneve server. Being addressed to that host (rather than a generic LAN host), the bearer packets are swallowed by the Geneve bearer port, and the payload packets appear on its bridge. This actually works, usually.
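As an illustration of the policy routing idea, here is a hedged sketch that only prints the commands rather than running them. It is not what geneve.J actually does: the function name, the routing table number 100 and the address 192.0.2.10 are all assumptions made up for the example, and it assumes the bearer packets are UDP to port 6081 (the IANA-assigned Geneve port).

```sh
#!/bin/sh
# Print (do not run) commands that steer Geneve bearer packets to
# the uplink NIC. Assumptions: bearer traffic is UDP to port 6081,
# and routing table 100 is otherwise unused.
#   $1 = Geneve server address (IPv4), $2 = uplink interface
bearer_route_cmds() {
    table=100
    # A host route to the server, in a private table, via the uplink.
    echo "ip route add $1/32 dev $2 table $table"
    # A rule that sends only bearer packets to that table.
    echo "ip rule add to $1/32 ipproto udp dport 6081 lookup $table"
}

bearer_route_cmds 192.0.2.10 rad9
```

Printing the commands first makes it easy to review them before pasting them into a root shell; all other traffic keeps following the main routing table.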
Initially there is only one extender and one Geneve server, but as much as feasible, the setup program is designed to handle multiple extenders and servers.
Geneve tunnel endpoint IPs and channel ID number (24 bits) must be specified at creation. I'm using the IPv4 addresses of the bridges in the extender and the Geneve server. (IPv6 would also work.) The channel ID must be the same at both ends. I take the last octet of the IPs of the extender and server, and multiply the smaller by 0x100 and add the larger.
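The channel ID rule is simple arithmetic, sketched below. The function names and the example addresses (from the 192.0.2.0/24 documentation range) are hypothetical; only the rule itself, smaller last octet times 0x100 plus the larger, comes from the text.

```sh
#!/bin/sh
# Last octet of a dotted-quad IPv4 address.
last_octet() {
    echo "${1##*.}"
}

# Channel ID from the two endpoint addresses: the smaller last octet
# times 0x100 plus the larger, so both ends compute the same value
# regardless of which address is given first.
geneve_channel_id() {
    a=$(last_octet "$1")
    b=$(last_octet "$2")
    if [ "$a" -gt "$b" ]; then
        t=$a; a=$b; b=$t
    fi
    echo $(( a * 256 + b ))
}

geneve_channel_id 192.0.2.206 192.0.2.200   # prints 51406
```

The result never exceeds 0xFFFF, so it fits easily in the 24 bit field. It would then be used in something like ip link add dev gen0 type geneve id 51406 remote 192.0.2.200, again with illustrative values.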
The first problem I encountered was that on NICs #2, 3 and 6, hostapd starts up, puts the NIC in the bridge, and claims to have turned on AP mode, but WiFi Analyzer for Android doesn't report any beacons for that NIC (it has a debug SSID so is distinguishable), and you can't associate with it. Message: hostapd.J[13688]: rad0: AP-ENABLED . This is the normal message for successfully starting up hostapd. Google searches for my symptom reveal nothing.
Summary of iw list for various NICs as seen on Beaver:
Onboard NIC: This is phy0. Supported ciphers: WEP40/104, TKIP, CCMP-128, CMAC. Modes: managed, AP, etc. Bands: 2.4GHz, 5GHz. Valid interface combinations: 1 or 2 managed, or 1 managed + 1 AP, or just 1 AP, or various P2P (no IBSS). Must be on the same channel I think. Firmware is onboard (nothing was uploaded during driver init).
NIC #0 (Terow, Mediatek): Supported ciphers: WEP40/104, TKIP, CCMP-128/256, GCMP-128/256, CMAC, CMAC-256, GMAC-128/256. Modes: managed, AP, etc. Bands: 2.4GHz, 5GHz. Valid interface combinations: number of Managed + APs + etc. max of 2, must be on the same channel I think. Firmware is onboard (nothing was uploaded during driver init).
NIC #2 (Cudy, Realtek): Supported ciphers as for #0 (Terow). Modes: managed, AP, etc. Bands: 2.4GHz, 5GHz. Valid interface combinations: only one interface at a time. Firmware is onboard (nothing was uploaded during driver init).
I had trouble getting several NICs to act as APs on Piki, so I tried them on Beaver. Outcomes:
I added to generic.incl: beacon_int=100; start_disabled=0
Empirically, start_disabled=1 makes the Terow not send beacons; 0 lets
it send them. The Cudy still doesn't send them in either case.
Learning to use hostapd_cli: Only useful-looking commands are shown.
hostapd_cli on Cudy: It is not putting out any beacons.
OK. Did not make beacons appear.
OK to both. Even after about 2 min, no beacons were produced.
Tidbit on a forum:
Raspberry Pi 4 hostapd hotspot not visible, OP CybeX, 2019-11-21.
Options to produce debug output:
/usr/sbin/hostapd /etc/hostapd/hostapd.conf -dd | tee /tmp/hostapd.log
It looks like his issue wasn't missing beacons, it was a screwed up DHCP
server. For reference, my symptom is, my AP doesn't appear in the list of
selectable SSIDs in Android 12 or NetworkManager, and my AP doesn't show up
in WiFi Analyzer (Android). The OP actually reports symptoms matching mine but
described with less detail, and I don't see a DHCP issue in his initial report,
though he says he solved his problem by disabling dhcpcd.
From another forum post: maybe rfkill didn't un-kill it. rfkill list reports that all phy's plus Bluetooth are not soft/hard blocked. For the identifier it wants the number in the list (not the phy name nor the interface name). But rfkill unblock 9 didn't start the beacons.
Running hostapd with the -dd option (debug output), see command line above. What I found:
Repeating the above with Terow (with beacons) and comparing:
Digging through /usr/share/doc/packages/hostapd/hostapd.conf, annotated config file with default or recommended values for every parameter, and extracting everything mentioning beacons. Keywords beginning with # are defaults that normally would not be set explicitly. Many items described here as adding some element to the beacon frame, also add it to the Probe Response frame.
With start_disabled=1 the NIC will start with no beacons (our symptom), so there must be a command to start the beacons. The only commands for hostapd_cli that mention "beacon" in the help are update_beacon (update the content of the beacon frame), and req_beacon (send a Beacon report request to a station). I wonder if you're supposed to just change the value to start_disabled=0? The command line would be:
hostapd_cli [-i $interface] set start_disabled 0
Steps in testing the above:
FAIL.
FAIL. beacon_int is among the values shown by hostapd_cli status .
Searching in the source code of hostapd-2.10 .
But I got a hint somewhere: iw may be my friend, specifically iw dev $IFC ap start (or stop). No, it gives a usage message. Looks like missing options.
After considerable struggle I junked the Cudy and Alfa-1200 because the drivers are out of kernel, and I found the Za-Pai Ralink NIC, which has an in-kernel driver and which emits beacons as it should; tested on Beaver.
Now bringing the Za-Pai up on Piki.
better with https schema.
"Dead loop on virtual device br0, fix it urgently!" (19 reps, 2 for gen1, the rest for br0). This was just after Pikiwf hostapd.J+geneve.J started. There was noticeable packet interference while that was happening. Now, the 1.46Mby file's speed was 3.40e6 by/s and the 16.7Mby file's was 3.3336 by/s, with no packets lost during either download.
What could be causing Piki's onboard NIC to trash 97% of its own packets? What I've learned so far:
perfect, i.e. well under 1% packet loss. Wi-Fi on Beaver usually loses at least 3% packets, with 4 echo requests/replies per sec and nothing more, no downloads or terminal session action, and from time to time (regularly about once/min plus random) it has a few secs of near-total loss. This has been going on since I first got Beaver.
net-geom.J -v may give a clue. Nobody here but us chickens; all but Jacinth Surya Xena Petra have the generic configuration: IPv4+6 address on br0 or en0, default route (4+6) via Jacinth, no other special routes, Beaver and Holly have their radios as members of br0.
expires time (irrelevant).
linuxnet-qos which seems to be an object oriented programming framework involving net traffic control. It says,
"The mq qdisc is automatically instantiated (by who? kernel or their backend?) as the default root queuing discipline for interfaces with multiple hardware queues." It is conceivable that the Raspberry Pi 4B (Beaver) Ethernet controller has multiple hardware TX queues while the RPi 3B (Holly) doesn't.
This morning I switched Selen (cellphone) to Beaver and went through a lot of links. Most but not all delivered the pages (one timed out), but definitely slowly compared to Holly. Let's do tests where Selen (Wi-Fi to Beaver) plays the videos that yesterday were being downloaded to Xena by curl. What packets does Beaver see?
The 1.46Mby WMV file: On Holly it was downloaded in under 1 sec. On Beaver there were about 30 groups (payload in from Jacinth, out to Selen, Selen acks, out to Jacinth), 1448 bytes each, 43kb captured, 1837 pkts received, 200 captured (due to -c 200), 89% dropped, 400kb accounted for.
If I captured 750 pkts I might cover the whole download. Let's give it a try. With -c 2000 it covered 0.47 sec of the download, while 1.46Mby should have fit in 1009 pkts, and at 54Mbit/s (typical on this link) it would have taken 0.22 sec. 2000 pkts captured, 2046 pkts received, 0 dropped, 46 mismatch the filter. VLC tends to download several Mby to its buffer at the start of performance, if the source will deliver it (vs. pseudo-isochronous streaming), which this source will. Conclusion: at last I'm seeing more packets than expected, which I had a lot of trouble proving were present.
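The packet count and ideal transfer time quoted above can be checked with a little shell arithmetic. This is just a sketch with hypothetical function names; it assumes 1448 payload bytes per packet (as seen in the capture) and ignores protocol overhead and retransmissions.

```sh
#!/bin/sh
# Packets needed to carry $1 bytes at $2 payload bytes per packet,
# rounded up.
packets_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Seconds to move $1 bytes at $2 bits per second, ignoring protocol
# overhead and retransmissions.
transfer_seconds() {
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.2f\n", bytes * 8 / bps }'
}

packets_needed 1460000 1448        # prints 1009
transfer_seconds 1460000 54000000  # prints 0.22
```

A capture of 2000 packets spanning 0.47 sec is therefore roughly twice the packets and twice the time that the file alone would need.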
In all of the above thrashing around, Xena was pinging Beaver (and others) and every packet was timely answered; in past serious clog-ups many packets aren't answered within the tester's accept window.
Can I prove, from the ack window, that some packets are duplicated? It would take a lot of work. Instead I'm going to try to implement traffic control, limiting the bandwidth on the Wi-Fi NIC to slightly less than its actual capacity. I took the time to improve my traffic control script (described here).
I finished the improved script and it seems to be working, and specifically it limits the achieved data rate to close to the configured value, which is less than the maximum that the interface could do (for testing, and in some cases needed operationally). Now I'm going to re-do the download tests with traffic control active.
Conditions:
-i any, there will be similar payloads and ACKs on en0.)
other packets. The expected number of packets were received from Jacinth, and a believable number of ACKs were sent by Selen. Video performance was normal. No ping packets were dropped except, as usual, Selen responded to only about 25% of them.
Next test, similar to the above but with Piki connected to Beaver and Selen connected to Piki. On Piki, geneve.J and hostapd.J are active. tcpdump as above will run on Piki (and Beaver if needed).
Need to think about this: On Piki, ip -4 route show shows a route to 192.9.200.192/26 (local LAN) via rad9 (correct, working) and also via br0, but no default route. These are prefix routes. Members of br0 are en0 (normally not connected), rad7 (the access point), but missing gen0 which does not exist anywhere. I restarted geneve.J; gen0 is back. But the routes are unchanged (IPv4+6). I'm proceeding with the test without tampering with (fixing) the routes.
Connections from Selen to https://www.jfcarter.net:1447/etc time out. Connections to http://jacinth.cft.ca.us/etc are redirected to that URL and time out. When Selen connects to Beaver, these work. I showed the http page, changed to Piki, and refreshed, and it showed the same page (good). Proceeding with the test.
Can't follow links to other URLs on Jacinth like the video test file page. Could it be a DNS issue? I'm going to try opening an OpenVPN connection to Jacinth 1194. Didn't help; since the tunnel can't send bearer packets through the tunnel, and is too dumb to send bearer packets and other packets by different routes, no traffic to Jacinth goes via the tunnel. (Actually the issue is that policy routing is available on Linux but not other OS's that OpenVPN wants to be portable to.)
Next try: transplant the test file to Iris. When Selen is connected to Holly or Beaver, I can play the test file, but if connected to Piki, I can't follow the link. I also tried Iris' numeric IP (in case of DNS issues), didn't help. This is with and without the VPN.
Selen connected to Piki, web browser to Piki's IPv4 address.
The index page is shown, with the logo image. There is a ssh session
on pikiwf IPv6 (the uplink) (correct). Command line on Piki, tossing
this ssh traffic; if you print it, you also have to print the traffic
by which you print… Chicken and egg issue: you get an omelet.
tcpdump -l -i any not host pikiwf
Ditto except web browser to Beaver's IPv4 address. The index page was shown promptly, with the logo image. There were lots of echo request and reply between Selen on rad7 (the AP) and gen0 (Piki's Geneve tunnel). But no sign of HTTP traffic which should have been seen both on rad7 and Geneve payloads.
How to put both an AP and a managed interface on the same Wi-Fi NIC:
Do iw list . You get a sequence of sections each titled "Wiphy phy$N" where $N is an integer. Guess which is which; probably phy0 will be the onboard NIC, and phy1 will be the USB NIC "because it is initialized later". Look for "Supported interface modes". You should find Managed and AP (and others). Then look for "valid interface combinations". Sometimes the descriptions are kind of cryptic, but you need it to allow at least one Managed and AP simultaneously. Usually you see "channels <= 1" (both on the same channel).
Now create a normal managed Wi-Fi interface configuration and bring it up. And start hostapd with your normal setup script. They coexist happily if that's a valid interface combination.
192.9.200.206 pikien.cft.ca.us b8:27:eb:d4:e5:13
If you bring up rad7 in managed mode, it works and can connect to an AP (Beaver). But hostapd will not start; can't init the NIC. If you start hostapd first (AP mode), it works, and beacons are emitted. Stations can associate but communication is hosed (routing?) and the station can't get a DHCP address. If you bring up rad7 later in managed mode, it works and can connect to Beaver, beacons come out, but stations can't get a DHCP address. If you take down rad7 managed, no more beacons.
Problem may be that Geneve is hosed on Piki. Geneve can't come up unless pikiwf (rad7 or rad9 managed) is up. I.e. there's a route to the server for Geneve bearer packets.
Another issue: a managed Wi-Fi NIC is not supposed to be in a bridge. In this design, it is supposed to send direct to the server (in this case Beaver). If it were in the bridge, and if the Geneve tunnel to Beaver were running, it would cause an instant packet loop. The tests above may or may not have been affected by these issues.
Subsequent testing:
geneve.J (setup program) made assumptions about en0 which I was violating, with the result that en0's IP address was not found, and the program did not check for that, and got a syntax error. Fixed I hope.
These were the conditions on Piki: en0 is connected, but neither Wi-Fi nor Geneve are up. en0 is plugged into the same hub that Iris is. Piki can download video files of various size from Iris (curl $URL | sum) at about 88 Mbit/sec with no impairment of any other connections. (HTTP not HTTPS which is substantially slower.) The theoretical max for Ethernet on RPi-3B is 100 Mbit/sec. (Beaver, a RPi-4, theoretically can do 1 Gbit/sec.)
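For comparing download speeds against the nominal link rates, a small helper that reports Mbit/sec is handy. This is a sketch with hypothetical function names, not the curl $URL | sum procedure used above; it relies on curl's %{speed_download} write-out variable, which reports bytes per second.

```sh
#!/bin/sh
# Convert a rate in bytes per second to Mbit per second (10^6 bits).
mbit_per_sec() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps * 8 / 1000000 }'
}

# Download the URL in $1, discard the body, and print the average
# rate in Mbit/sec.
download_rate() {
    mbit_per_sec "$(curl -s -o /dev/null -w '%{speed_download}' "$1")"
}

mbit_per_sec 11000000   # prints 88.0
```

So the 88 Mbit/sec seen here corresponds to about 1.1e7 by/s, close to the 100 Mbit/sec ceiling of the RPi 3B's Ethernet.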
With these conditions: en0 connected, rad9 (internal Wi-Fi) up and connected to Beaver, hostapd.J (rad7) down, Geneve down. A special route sends the Iris connection via rad9, not en0, i.e. Pikiwf → Beaver → Iris, and replies should have taken the reverse path since rad9 and en0 have different IP addresses. The average payload rate was 760 kbit/sec, whereas 22 Mbit/sec is normal for this link. Packet loss rates for all connections were near 100%, including Xena → Holly, which did not participate at all in any of this traffic. So I believe that a packet loop was saturating the aether and kicking the Xena → Holly link off Wi-Fi. This is the symptom I am trying to get rid of.
So how could the packet loop occur? Let's trace a broadcast packet such as an IPv4 ARP request. (IPv6 neighbor discovery would be multicast, but all the participants are subscribed to the relevant local LAN scope group, so the effect is the same.) It could originate on any host, but let's start on Piki. The packet will go out on rad9 to Beaver (bridged to the LAN) and on en0 direct to the LAN. Piki's bridge currently has no members, thus no outgoing ARP packets. The en0 packet will be received on Beaver's bridge and will be sent out on Wi-Fi to all Beaver's stations, specifically Piki rad9. Conversely the rad9 packet will leave Beaver to the LAN, and Piki en0 will receive it. Will Piki send either of these packets any farther?
Weird: piki en0 has an IPv6 address of 2600:3c01:e000:306::cf which belongs to Piki. net-geom.J knows that it should be deleted (i.e. is probably not what put it there). Removing it did not help the packet loop. Pretty sure wickedd-dhcp6 thinks it has a lease with this adr and it periodically re-adds that address. kea on Jacinth has a record of the lease; Piki doesn't.
Found a culprit! Piki was a router, and shouldn't be. In /etc/sysctl.conf, net.ipv4.ip_forward = 1 and friends were turned on. With both en0 and rad9 active, they forwarded to each other through Beaver, producing the packet loop. Also /etc/sysctl.d/70-yast.conf has a cryptic copy of the relevant settings; rename it to 70-yast.conf-OFF. For the fix, you need to set ip_forward etc. explicitly to 0; they won't go off by themselves. See next paragraph for the filenames.
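A quick way to confirm that a host has really stopped being a router is to read the kernel's forwarding switches directly. The sketch below uses a hypothetical function name; the two /proc paths in the comment are the standard Linux locations for IPv4 and IPv6 forwarding, and which "friends" matter on a given host is an assumption.

```sh
#!/bin/sh
# Print each file whose content is not "0", i.e. a forwarding switch
# that is still on. Exit status 0 if any was found, 1 if all are off.
forwarding_on() (
    found=1
    for f in "$@"; do
        if [ -r "$f" ] && [ "$(cat "$f")" != "0" ]; then
            echo "$f"
            found=0
        fi
    done
    exit "$found"
)

# Typical check on a host that should not be a router:
#   forwarding_on /proc/sys/net/ipv4/ip_forward \
#                 /proc/sys/net/ipv6/conf/all/forwarding
```

Running the check after the reboot shows whether some other configuration file quietly turned forwarding back on.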
Better just reboot Piki after this change. Steps after rebooting:
Retrying speed test (curl $URL | sum):
Investigating on Beaver.
obviously it's not a range extender nor an AP. The right response is to override manually with -X.
Now I'm going to turn on Piki's AP (with Geneve).
Analysis of packet loop situation:
Tracing a payload packet (mentally) from Piki to Iris and back. Conditions: Beaver and Piki both have Geneve and hostapd, though Piki's hostapd is not supposed to be used in this analysis. Beaver has Ethernet; Piki doesn't, but does have Wi-Fi. Piki starts by sending a HTTP GET request to Iris.
Learning to use Kismet.
https://www.venea.net/man/kismet(1) -- UNIX style man page.
https://www.kismetwireless.net/docs/readme/intro/kismet/ Use the … menu to get a sidebar with the toplevel table of contents. It opens on Introduction; pick subsequent topics.
Kismet needs user kismet, group kismet, homedir /var/lib/kismet .
To start the client, just run kismet, with no command line arguments. It emits INFO messages, and tells you to connect a browser to http://localhost:2501 .
You need to specify an internal login, saved in the executing user's homedir, ~$USER/.kismet/kismet_httpd.conf . This is called the administrator login and PW.
In the pop-up, pick settings, and set something. I ended up not changing anything.
Now how do you add a data source? Example: kismet -c rad0 . Then point the web browser at http://localhost:2501/
You also need to put the interface in monitor mode. Possibly only on particular chipsets like Atheros. Possibly mac80211 driver can let monitor and managed modes coexist, if the NIC supports it. Install package aircrack-ng (SuSE name and many other distros) and run airmon-ng (there's a man page).
Trying another approach. On Xena I put rad0 in monitor mode: airmon-ng start rad0 . It complained that NM, wpa_supplicant and avahi-daemon could change rad0 back to managed mode and I should kill them, which I did. To check: ls /sys/class/net/ ; look for rad0mon. Now you can run tcpdump and see packets (but they're encrypted, so you can't see payloads). Most are beacons.
Test procedure: Start tcpdump on Xena. Have Piki download a video file. Stop tcpdump. Dig through a zillion garbage packets. Specific setup steps:
Next try: turn off pikien (en0) and try again. Speed 90 Kby/s, high ping losses to pikiwf (rad9), elapsed 16 sec. Packetvore is active. tcpdump captured 25 packets (probe requests) from CouchNet-Beaver, and 742 packets total, none dropped. I have a feeling that not all packets are being reported by rad0mon.
Duh, the channel wasn't set and it was receiving the default channel (probably 1), which had little activity. Command line on Xena: iwconfig rad0mon channel 11 . In 8 sec it received 802 packets of which 78 were identified as CouchNet-Beaver. Most or all were beacons.
Trying the download again with Xena rad0 in monitor mode on channel 11. Results as before, packets eaten. So what did I get this time? 6802 packets captured (vs. about 1000 that can hold the whole file), 0 dropped. 275 packets were identified as CouchNet-Beaver; all were beacons. curl elapsed 15 sec at 94 Kby/s.
Beaver rad0 MAC is 00:13:ef:5f:0c:3c . 1166 packets had this MAC; all were tagged as "Acknowledgment".
Piki rad9 MAC is b8:27:eb:81:b0:46 . 1166 packets had this MAC.
I can distinguish Beaver vs Pikiwf sendings from the signal strength.
They come in sets:
In Wi-Fi Evolution on CouchNet in 2021 I did ping tests and found that the round trip time was a lot longer, like about equal to the beacon interval of 100ms, when the AP transmitted first and the station replied, compared to the station transmitting first, where the round trip time was typically 5ms (extremes: 1.2ms to 19.4ms).
If Piki to Beaver communication were somehow limited to one packet per 100ms, the data rate would be about 1.5e4 by/s, compared to the actual 9.3e4 by/s (varies ± 1e4 by/s), so I think this asymmetry in being able to initiate a connection is a red herring. However I'm checking it out further.
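The 1.5e4 by/s ceiling follows from one full-size packet per beacon interval. A one-line sketch, with a hypothetical function name and assuming a packet of 1500 bytes (the size is my assumption; it is what makes the quoted figure come out):

```sh
#!/bin/sh
# Bytes per second if only one packet of $1 bytes gets through
# per interval of $2 milliseconds.
one_packet_rate() {
    echo $(( $1 * 1000 / $2 ))
}

one_packet_rate 1500 100   # prints 15000, i.e. 1.5e4 by/s
```

The observed 9.3e4 by/s is about six times that ceiling, which is why a one-packet-per-beacon limit does not explain the slowdown.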
Setup for the following tests:
Test results:
Downloading the 1.82e5 byte file and running tcpdump:
Back to tcpdump.
Testing next day: with Piki en0 down, and Piki geneve.J restarted by hand so gen0 actually exists, Piki downloads 1.82e5 by from Beaver at 2.90e6 by/s which is a very plausible speed. No effect on Xena. Beaver downloads 1.82e5 by from Piki, ditto except only 1.34e6 by/s.
Piki downloads 1.46e6 by from Beaver at 3.22e6 by/s, no Xena prob.
Piki downloads 4.53e7 by from Beaver at 3.30e6 by/s, no Xena prob.
I think this sucker is fixed! The problem is (suspected to be)
that Geneve spuriously failed to start immediately after hostapd.J
started. Got to fix that.
Selen connects to CouchNet-Piki.
These sites could be connected to: Wikipedia, Google front page,
NOAA Weather, SCEDC Recent Earthquakes, KUSC front page.
These sites timed out, DNS failure suspected: Jacinth front page,
Jimc's home page on Jacinth, Jimc's site on Claude, Home Assistant
on Dragon, MyUCLA patient portal, SuSE package search, Packman.
Got past DNS but didn't finish loading: Amazon.com, Fidelity.
Now I'm setting up Beaver and Piki for semi production.
Deployment checkout:
I'm transferring Xena's Wi-Fi to Piki and trying to diagnose the routing issues seen on Selen.
Selen associated with Piki, tcpdump on Piki looking for anything coming from Selen. What is it getting stuck on, particularly for local LAN clients?
"packet too big" packets.
Test plan: First turn on DF on the Geneve tunnel, br0, rad9, rad7, en0; this will prevent the payloads from being fragmented, and per RFC 1191 the flow containing them should adapt to Geneve's bearer MTU. If that doesn't help, set explicit MTUs on various NICs.
Fakeout: MTUs were calculated assuming Ethernet header length. Per Gast, Matthew S, 802.11 Wireless Networks: The Definitive Guide (2nd ed.), O'Reilly 2005-04-xx, ISBN 9780596100520 , an 802.11 data frame has these fields:
When Selen is associated with Beaver, but no traffic should be going through Geneve, Selen has extreme but less than 100% packet loss when trying to play music. With Geneve turned off on Beaver, it works fine.
My next test will use a radically reduced MTU on the Geneve tunnels of 1024 bytes, both Beaver and Piki. Both Beaver and Piki can't get IPv6 addresses… because the lower bound for MTU on IPv6 is 1280 bytes. Fixing this with -M 1380, so the Geneve tunnel can legally transmit IPv6 (with its MTU of 1322 by). Now br0 on Beaver and Piki have DHCP6 addresses.
With all that straightened out, Piki can download the 1.46e6 by file at a speed of 1.04e6 by/s which is pathetic but is not subject to the clog-ups seen before. Playing music on Piki from Jacinth: Failed, VLC could not connect.
More testing: Selen is associated with Beaver and tries to play music resident on Jacinth, URL = https://www.jfcarter.net:1447/~jimc/music/music.cgi/Hindemith_Music/2_Hindemith_Sym_Matamorphoses.m3u Wonder of wonders, it plays successfully. Changing to Piki: Android believes it's not connected to the Internet. VLC could not retrieve the above URL. Editing the URL of the music index page to http://jacinth.cft.ca.us/… -- could not load it. Could not load Jacinth's front page either.
I made Xena associate with Piki. The VPN connected to Jacinth via Piki. All downloads succeeded (same as when going via Holly). Turning off the VPN. That provoked problems. 1 download succeeded, to en.wikipedia.org, 9.75e4 bytes at 2.14e4 by/s (very slow). The rest all timed out. Soon thereafter, Wikipedia gave 0 length responses with a code of 200 (robot defense?)
Conclusion: there is something crazy with Piki stations, that does not affect downloads done by Piki itself.
Interesting observation: gen0 MTU=1322 (bearer packet MTU=1380), br0 MTU=1322 (minimum of all members?), rad7 MTU=1500. What would happen if I set rad7 MTU=1322?
ip link change dev rad7 mtu 1322 worked.
Another test: Selen associated with Beaver. Android did not complain about Internet connectivity. I was going to try to play music, but it loaded part of the music index page, froze, and timed out.
Today's test: going through Beaver step by step. Piki is associated with Beaver. Xena is associated with Holly (not being tested).
Let's look carefully at the MTU issue. When geneve.J creates a tunnel (on Piki), it sets MTU=1380 for the bearer packets, and the tunnel NIC comes out with MTU=1322 which is 58by less (IPv4 size). On Beaver the bearer has the default of MTU=1500, and the tunnel NIC is 1442, 58by less (same as on Piki). What MTU is really needed? I'm assuming that DF is on and that senders will adapt their packet size to the available MTU. The MTU of pikiwf and of Beaver:rad0 needs to be -ge the bearer size. (T) The MTU of both tunnel NICs is set automatically to 58 bytes less, and this automatically sets both br0 to the same (or smaller) size. The MTU of Piki rad7 needs to be equal to the tunnel MTU.
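The two constraints stated above (the NIC carrying the bearer must have an MTU at least the bearer size, and the AP NIC bridged with the tunnel must match the tunnel MTU) can be checked mechanically. This is a minimal sketch under the same assumption of a 58-byte difference between bearer and tunnel MTU; `check_mtu_plan` is a hypothetical helper, not something geneve.J provides.

```shell
GENEVE_OVERHEAD=58   # observed difference between bearer and tunnel MTU here

check_mtu_plan() {
    # $1 = bearer packet MTU
    # $2 = MTU of the NIC that carries the bearer packets
    # $3 = MTU of the AP NIC bridged with the tunnel
    tun=$(( $1 - $GENEVE_OVERHEAD ))
    status=0
    if [ "$2" -lt "$1" ]; then
        echo "bearer NIC MTU $2 is below bearer size $1"
        status=1
    fi
    if [ "$3" -ne "$tun" ]; then
        echo "AP NIC MTU $3 differs from tunnel MTU $tun"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "plan consistent, tunnel MTU $tun"
    return "$status"
}
```

`check_mtu_plan 1380 1500 1500` reports the mismatch on the AP NIC, and `check_mtu_plan 1380 1500 1322` reports a consistent plan.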
Rather than reducing the MTU, let's try increasing the tunnel MTU to accommodate 1500by payload packets. No, that's not going to fly, because Beaver rad0 has to accommodate the packets too. Trying to get this right:
"normal" stations, all with the default MTU=1500.
"Normal" stations connect to it.
I executed the above plan.
systemctl restart systemd-udev-trigger.service
ifup rad9 does set it UP but with NO_CARRIER, and no IP address can be assigned. Changing STARTMODE=manual.
rad9: ACS-COMPLETED freq=2437 channel=6, ACS-ENABLED, AP-ENABLED. (Is it sending beacons? I don't see them.)
ACS: Survey is missing noise floor (5 times)
ACS: Channel 1 has insufficient survey data (6 lines repeated 11 times for channel 1-11.)
ip link set dev gen1 mtu 1500 But if I do the same for rad9 (mtu to 1558) it doesn't change. Trying to explicitly set MTU=1558 in /etc/sysconfig/network/ifcfg-rad9.
After an update, I tried deploying the current hostapd.J on Holly. Not wise.
channel=11 but even so it started ACS (Automatic Channel Selection). Invariably hostapd logged, for each legal channel,
ACS: Survey is missing noise floor, 5 times, followed by
ACS: Channel 1 has insufficient survey data. It is likely but not assured that the NIC firmware is not capable of doing ACS (and in the past I have never succeeded on any of my NICs).
channel=11 was moved early in the conf file, after
ssid=CouchNet and before
hw_mode=g and
ieee80211n=1, and channel 11 was used with no ACS attempt. Before the fix, it was a lot later, after
ieee80211h=1 and a lot of other parameters. Before an unknown update in an unknown package this did not result in an ACS attempt. I'm using hostapd-2.10-2.10.aarch64 (failing). The order of parameters in the conf file is the same as found in /usr/share/doc/packages/hostapd/hostapd.conf (the example conf file with all known parameters annotated), except for a very few, most of which are mentioned above.
systemctl start hostapd.J; the conf file was identical, but this time bogus ACS was attempted. I don't see any difference between the command line I used and the one in the systemd unit file; I made the command line be logged (-v). Oooo, there's a big difference! It's not appending the generic conf file that has
channel=11, which would fully explain why hostapd was attempting ACS.
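A quick pre-flight check on the assembled configuration would catch this kind of omission. The sketch below assumes the conf fragments are concatenated into one stream before hostapd reads them, that the last channel= line wins, and that a missing channel, channel=0, or channel=acs_survey all lead to an ACS attempt (the latter two per the annotated example conf). The function names are hypothetical.

```shell
effective_channel() {
    # Reads the assembled conf on stdin; prints the value of the last
    # channel= line, or nothing if there is none.
    sed -n 's/^channel=\(.*\)$/\1/p' | tail -n 1
}

will_attempt_acs() {
    # Succeeds if the conf on stdin would (by the assumptions above)
    # make hostapd try Automatic Channel Selection.
    ch=$(effective_channel)
    case "$ch" in
        ''|0|acs_survey) return 0 ;;
        *) return 1 ;;
    esac
}
```

Something like `cat 80211n-CN.conf generic.conf | will_attempt_acs && echo "no fixed channel"` could run before starting the service; the file order shown is only illustrative.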
Now continuing with testing the plan for MTU. Not successful.
ip link change dev rad9 mtu 1558 it says
Error: mtu greater than device maximum. (And similarly with
ip link set rad9 mtu 1558.) But setting the MTU to a lower value is allowed. rad9 is not in br0.
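The device maximum can be read before trying: recent iproute2 prints minmtu and maxmtu in the detailed output of `ip -d link show`. The sketch below parses that text and tests whether a bearer sized for a given payload would fit. The 58-byte overhead is the value observed on these hosts, and the function names are hypothetical.

```shell
max_mtu_from_details() {
    # Reads the output of `ip -d link show dev NIC` on stdin and prints
    # the number following "maxmtu" (nothing if the field is absent).
    sed -n 's/.*maxmtu \([0-9][0-9]*\).*/\1/p' | head -n 1
}

fits_bearer() {
    # $1 = payload MTU wanted inside the tunnel, $2 = device maximum MTU.
    # Succeeds if payload + 58 bytes of encapsulation fits the device.
    [ $(( $1 + 58 )) -le "$2" ]
}
```

Usage might look like `max=$(ip -d link show dev rad9 | max_mtu_from_details); fits_bearer 1500 "$max" || echo "rad9 cannot carry a 1558 byte bearer"`.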
Testing non-DF on the bearer channel, in fact, non-DF everywhere, which is the default.
ip route get 192.9.200.203 puts it on br0 (Geneve) but it could equally go on rad9. Xena can ping piki (br0) and pikiwf (rad9) on IPv6 but not IPv4.
To finish before deploying Piki:
ip route add default via 192.9.200.193 dev br0. With a default route for IPv4, Xena can now ping piki and pikiwf and the answers will come back to Xena. Piki gets an IPv6 default route (on both rad9 and br0) by Router Advertisements on the LAN. Piki's IPv4 address is static; if it were DHCP the default route would come in that way. [Done]
iwconfig rad9 also reports a signal level of -76dBm, which is also pathetic. Xena (through Beaver) can't successfully ping Piki, and I couldn't shut down Piki by systemctl; I had to just turn off power.
Location | rad7 dBm | rad9 dBm | Ring dBm | Notes |
---|---|---|---|---|
Laundry | -79 | -76 | -- | No communication |
Piano | -70 | -71 | -- | Camera barely works |
Rice Cooker | -46 | -62 | -65 | Works, speed 14.4Mbit/s |
Dishwasher | -62 | -63 | -70 | Intermittent signal loss |
Curio Cabinet | -50 | -58 | -62 | Works, speed 6.4Mbit/s |
First the good news: Piki (near the dishwasher) seems to be fully operational. Sometimes. Ring camera 2 connects to it and does normal stuff with the mother ship. Xena can do SSH to piki and pikiwf. But testing with ping (IPv4+6) from Xena, much of the time its packet loss rate is low (1% loss or less), but it gets into episodes of high loss (over 50%) and also gets into a catatonic state from which it does not recover; it does send beacons at -57dBm to -66dBm, but no ping answers or any other communication. In one case that was timely observed, onset was sudden from prior low packet loss state, and both pikiwf (rad9) and piki (rad7) dropped out at the same time. This rules out bugs in rad7's firmware as the culprit, but availability of Piki depends on pikiwf, so issues on rad9 could cause the observed symptoms. In another incident it recovered from 100% packet loss to 90%, but after a few seconds it went back to 100%.
I put a signal monitor on Piki:rad9 and let it run overnight. It's reporting the signal seen by Piki when connecting to Beaver, every 1.4 secs, plus whether Jacinth (via Beaver) responded to ping4. Here's a summary of a histogram of signal strength.
"normal" range of signal strengths between -67dBm and -80dBm. About 1% of successes had a signal strength of -76dBm or worse, even one at -85dBm. About half of the failures were similar while the other half had higher signals which usually allowed a success.
Conclusion: for Piki:rad9 to Beaver:rad9 (both onboard NICs), for the dishwasher location the signal strength varies from -65dBm to -85dBm with -71.6dBm being the average and -75dBm being the minimum for consistent successful transmission. (But one success at -85dBm was seen, plus others on low signal.) So I should try another location. I wish I knew why the signal strength varies so much.
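Figures like the range and average above can be pulled from a monitor log with a few lines of awk. This is only a sketch: it assumes a log with one sample per line, the signal in dBm followed by the word ok or fail for the ping4 result. That format is invented for the example and is not necessarily what the monitor on Piki writes.

```shell
signal_summary() {
    # stdin: lines of "<dBm> ok" or "<dBm> fail" (hypothetical log format).
    # Prints counts, plus weakest (min), strongest (max) and mean signal
    # over the successful samples only.
    awk '
        $2 == "ok" {
            n++; sum += $1
            if (n == 1 || $1 < min) min = $1
            if (n == 1 || $1 > max) max = $1
        }
        $2 == "fail" { f++ }
        END {
            if (n) printf "ok=%d fail=%d min=%d max=%d avg=%.1f\n", n, f, min, max, sum / n
        }
    '
}
```

For instance, `printf '%s\n' '-65 ok' '-85 ok' '-70 fail' '-72 ok' | signal_summary` prints `ok=3 fail=1 min=-85 max=-65 avg=-74.0`.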
The new location atop the curio cabinet is a big improvement, and is going to be the final move. On a monitor run, Piki failed to ping4 Jacinth 11 times out of 37577, in 3 clusters (at 21:50, 09:52, 10:23) all having reduced signal strength.
Three days later:
Wi-Fi service was switched over to Beaver, including moving the Alfa AWUS036ACM to Beaver (and Terow AC1200 to Holly). Net interfaces:
Testing: when Selen or Xena connect to Beaver or Holly, they can play videos perfectly (Katamari Star, Wave Surfers, Big Buck Bunny). General use is normal for either. Except…
At irregular intervals, Beaver gets scrambled. I can't be sure if Holly behaves the same. Tested by: Xena connected (Wi-Fi) to Beaver (mostly) or Holly (sometimes). Xena pings beaver + beaverwf + holly. Packet loss rate to all three is similar. Most of the time the packet loss rate is 0.1% or less. It increases variably to 50%-90% lost, lasting 1 to 2 minutes. This didn't happen during testing with videos. (Not saying that videos immunized it; rather, the packetvore did not attack during testing.) Usually 15-30 min passes between attacks, but sometimes as low as 5 min.
Next step is to fix bogosities and retest: Beaver en0 IP, and Holly rad9 Geneve uplink being in br0.
bridge=br0 from generic.conf to 80211n-CN.conf and 80211n-Holly.conf, and omitting completely in 802.11-Geneve.conf, and similarly on Beaver (for which Geneve is currently turned off).
Starting up Piki. (About 21:31:00) It did not come up. Moving it so I can hook up en0. rad9 ESSID is CouchNet-Geneve, which is sending beacons (per WiFi Analyzer on Selen), but rad9 doesn't associate. Holly sees it associate and immediately dissociate, every 30 sec. Piki sees ringc2 do a complete association on its AP, then it dissociates after 15 sec, and retries after 30 sec. I'm assuming that ringc2 tried and failed to get a DHCP address because Piki's uplink was hosed. Similar behavior has been seen on Selen in the past but it actually says it couldn't get an address.
How I'm leaving it: Piki powered off. Beaver: hostapd.J running, geneve.J disabled. Holly: both hostapd.J and geneve.J disabled. This at 22:33. At 22:48 there had been 2 ping packets lost; no packetvore. The packetvore showed up at 22:51; this time it is a total outage for 3 min (a record). Lasted to 22:59, then came back without any interventions.
Now switching back to only Holly at 23:09. So far so good.
Collecting info and planning how to gather more info about what's wrong.
Overnight, Piki was powered off, Beaver was awake but hostapd.J and geneve.J were disabled+stopped. Holly hostapd.J was running, SSID was CouchNet, but no geneve.J. Bearer NIC (rad9) had no carrier because the Geneve AP on this NIC was disabled. Holly has the Terow NIC. No (known) error in Wi-Fi service.
Next plan: a 4-way contest: Beaver vs. Holly and Alfa AWUS036ACM vs Terow AC1200. With sketchy records, I think all four are going to avoid the packetvore. But I need to re-test with definite records. I'll have Piki's Wi-Fi uplink (normally bearing Geneve packets) connect to CouchNet and monitor when it goes down. Per past experience there should be plenty of dropouts in a 2-3 hour test. en0 will be connected, for access to Piki when Wi-Fi goes out.
Xena can now ping pikiwf (and pikien), IPv4+6. Packet loss is moderately good.
The tester is called ~jimc/bin/net-assess-wifi, currently on Piki. Design outline:
10 packets transmitted, 10 received, 0% packet loss… Extract the number received.
Signal level=-57 dBm. See Piki's signal.sh.
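The two extraction steps in the outline can be sketched with sed. This is not the actual net-assess-wifi code, just an illustration of the parsing; it assumes the iputils ping summary line and the iwconfig "Signal level=" field quoted above, and the function names are made up.

```shell
ping_received() {
    # stdin: ping output; prints the number of packets received,
    # taken from the "N packets transmitted, M received" summary line.
    sed -n 's/.* transmitted, \([0-9][0-9]*\) received.*/\1/p'
}

signal_dbm() {
    # stdin: iwconfig output; prints the signal level in dBm (e.g. -57).
    sed -n 's/.*Signal level=\(-\{0,1\}[0-9][0-9]*\) dBm.*/\1/p'
}
```

A test cycle could then be `ping -c 10 jacinth | ping_received` followed by `iwconfig rad9 | signal_dbm`, logging both numbers with a timestamp.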
Outcomes: In all cases the listed combination was the only access point active, and no Geneve. On both hosts, rad9 was up but not associated with anything (normally has the Geneve AP, now disabled).
Holly + Terow AC1200: Start 18:00, 137 min, no packetvore.
Start 12-05 21:45. About 3 min after starting, the system went into a massive packetvore attack, with loss rates almost always over 50% and 90% at least half the time. Did not revert until a reboot. Wi-Fi Analyzer on Selen reports CouchNet beacons at -45 dBm, and none of my other APs were transmitting. Neighbors were all worse than -75dBm. So it's (probably) not caused by neighbors. Still going on at 22:07; I rebooted Holly. After reboot it's back to normal. For about 1 min, then choppy packet loss, lasting 2 min, then returned to zero packet loss. Next at 22:14:20 I stopped hostapd.J, un+replugged the Terow, and restarted hostapd.J. Almost zero packet loss. Going to bed.
Start 12-07 23:30 after giving up on Beaver + Alfa AWUS036ACM. This time it ran for 39.5 hours with 4 incidents of 8/10 pings received and 1 of 9/10 pings, until the test was terminated. Is Holly+Terow perfect? Next test will be Holly+Alfa.
Beaver + Alfa AWUS036ACM: 62 min, no reports produced, but there was about a 5 min interval of choppy packet loss, likely not reaching 40%, and Alice complained of a Wi-Fi problem. For overnight, switched back to Holly + Terow.
Giving Beaver + Alfa another chance. With an improved tester. Starting at 12-06 17:00, summary at 20:45, 2 incidents with more than 2 of 10 ping packets lost. The number of packets received per 30sec was not significantly different from the surrounding error-free tests, indicating no packet storm. In the one that I saw on another monitor, the packet loss rate was elevated for 15-30sec and then self-healed. I'm leaving this to run overnight. Tomorrow I'll try swapping the NICs. Oops, at 20:55:30 it went into high loss rate mode including several stretches of 100% loss. Stopping and starting hostapd.J brought it back to normal. However, 2 min later it had a 90 sec stretch of irregular high packet loss. More of the same at 21:39. This isn't going to fly overnight, reverting to Holly + Terow.
Holly + Alfa AWUS036ACM: Started 12-09 14:00. At 16:00, perfect performance. At 17:00, one 8/10 ping fault. Continuous monitoring shows very few packets lost. Stopped test at 18:00, switching to Beaver+Terow.
Started again 12-09 22:20 to test Alfa overnight. As of 12-10 15:00 it had run for 16hr 20min and had one 8/10 ping fault.
Beaver + Terow AC1200: Started 12-09 18:06. Until 21:15 (3+hr) there was one ping fault, 7/10 received, and another caused by switching over to Beaver (doesn't count). Continuous monitoring shows very few packets lost. 22:11:30 and 22:12:00 had 2 separate major glitches, 1/10 and 0/10. Switching to Holly+Alfa for overnight.
Final conclusion: The packetvore lives in Geneva. If I can get wired Ethernet to Piki, and if I forget about a range extender using Geneve, I can get Wi-Fi working right and can terminate this time sink.