XP 4GB
The step from 2GB to 4GB is a valid one, even if XP can only access 3-3.2GB (or similar). Yes, the difference remains unused, so buying more than 3GB may not appear cost effective, but it is usual to arrive at 4GB through the common (and logical) combinations of memory modules.
Most people do not install 3 x 1GB or 1 x 1GB + 1 x 2GB modules to get 3GB when using XP; they tend to buy 1-2 x 1GB or 1-2 x 2GB to start with and add modules with the same SPDs at a later time.
Dual channel is quite likely a factor as well: rather than installing 1 x 2GB or 1 x 4GB into DIMM0 on a 4 DIMM board, it is more likely to have 4 x 1GB or 2 x 2GB.
To put it another way, even if 3GB is a more cost effective maximum for XP, most configurations will be 1GB, 2GB or 4GB.
If a system consistently needs more than 2GB (yes, I can hear the hard drive grinding to dust), then it needs it, so XP with 3GB or 4GB sounds OK either way.
800 vs 1066
Yep, if 1066 is run at a synchronous ratio on a 1333FSB board, it will be underclocked, but not many will purposely select a 1:1 ratio for this. More likely the ratio will be 3:2, 4:3, 5:4 (whatever) to eke a bit more performance from the module, and to tighten up timings as well, if possible.
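To put rough numbers on those ratios, here is a small Python sketch. It assumes the ratios above are written memory:bus, that the 1333 FSB is quad-pumped (so the bus clock is 1333 / 4, about 333MHz), and that DDR2 moves two transfers per memory clock. The function name and the extra 8:5 entry are mine, added just for illustration.

```python
from fractions import Fraction

def effective_ddr2_rate(bus_mhz, ratio):
    """Effective DDR2 data rate for a bus clock and a memory:bus ratio.

    Assumption: ratio is memory clock over bus clock, and DDR2 performs
    two transfers per memory clock.
    """
    return float(bus_mhz * ratio * 2)

# Assumption: 1333 FSB is quad-pumped, so the bus clock is 1333 / 4 = 333.33 MHz.
BUS_MHZ = Fraction(1000, 3)

for label in ("1:1", "5:4", "4:3", "3:2", "8:5"):
    mem, bus = (int(part) for part in label.split(":"))
    rate = effective_ddr2_rate(BUS_MHZ, Fraction(mem, bus))
    print(f"{label} -> roughly DDR2-{rate:.0f}")
```

Under those assumptions 1:1 lands around DDR2-667, well short of what a 1066 module is rated for, and it takes something like 8:5 to reach the full rated speed on a 333MHz bus.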
It is not unusual to read reviews of overclocking the bus speed to the 450-475MHz range (1800-1900MHz FSB; I didn't find a 533MHz). However, raising the bus speed is only one way to overclock the CPU; the other is the multiplier, much like the ratio for RAM.
At 450MHz bus speed, the DDR2 1066 will still be underclocked at a synchronous ratio (effectively DDR2 900MHz), but it is a faster 1:1 by virtue of the quicker bus speed. If it were DDR2 800MHz, running at bus speed would put it at 9:8 of its rated speed, which is overclocked for this module. Keeping it to 800MHz will need an 8:9 ratio.
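The 450MHz case can be checked with a bit of arithmetic. This is only a sketch under the same reading as above (ratio written memory:bus, two transfers per clock); the helper names are made up for the example, and whether a given board actually offers an 8:9 divider is a separate question.

```python
from fractions import Fraction

def sync_rate(bus_mhz):
    """DDR2 data rate at a 1:1 ratio: two transfers per bus clock."""
    return 2 * bus_mhz

def ratio_for_rated(bus_mhz, rated):
    """memory:bus ratio that would hold a module at its rated DDR2 speed."""
    return Fraction(rated, 2 * bus_mhz)

def overclock_pct_at_sync(bus_mhz, rated):
    """Percent above (positive) or below (negative) rated speed at 1:1."""
    return (sync_rate(bus_mhz) / rated - 1) * 100

BUS_MHZ = 450  # the overclocked bus speed discussed above

for rated in (800, 1066):
    pct = overclock_pct_at_sync(BUS_MHZ, rated)
    ratio = ratio_for_rated(BUS_MHZ, rated)
    print(f"DDR2-{rated}: 1:1 gives DDR2-{sync_rate(BUS_MHZ)} ({pct:+.1f}%), "
          f"rated speed needs {ratio.numerator}:{ratio.denominator}")
```

For DDR2-800 this gives +12.5% at 1:1 and an 8:9 ratio to stay at rated speed; for DDR2-1066 the 1:1 figure comes out negative, i.e. still underclocked.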
There's plenty of 800MHz RAM that can be overclocked, but if you want to run a bus speed greater than 400MHz and not overclock DDR2-800 modules, the DDR2-1066 will have plenty of headroom (as you've indicated) while you fiddle with the FSB and multiplier on your CPU.