Because it is much cheaper to put 256-bit traces on an optional daughter board than to force them onto much larger and much more costly motherboards. That's why.
Additionally, it would not be viable to use unbuffered (non-registered) memory modules in a configuration beyond 6 DIMMs, let alone 4.
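For a rough sense of what a 256-bit memory bus means on a motherboard, here is a small sketch. It assumes standard 64-bit DDR3 DIMMs (at least one per channel) and uses DDR3-1333 only as an example speed. The helper names are hypothetical. A 256-bit bus needs four channels' worth of traces and slots, twice a dual-channel desktop, in exchange for twice the theoretical peak bandwidth.

```python
DIMM_DATA_BITS = 64  # data width of one standard (non-ECC) DDR3 DIMM / channel


def channels_needed(bus_width_bits: int) -> int:
    """Number of 64-bit memory channels needed to build a bus of this width."""
    if bus_width_bits <= 0 or bus_width_bits % DIMM_DATA_BITS:
        raise ValueError("bus width must be a positive multiple of 64 bits")
    return bus_width_bits // DIMM_DATA_BITS


def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth: bytes per transfer x transfers per second."""
    return (bus_width_bits / 8) * mega_transfers_per_s * 1e6 / 1e9


if __name__ == "__main__":
    for width in (128, 256):  # dual-channel desktop vs. a hypothetical 256-bit bus
        print(width, "bits:", channels_needed(width), "channels,",
              f"{peak_bandwidth_gb_s(width, 1333.33):.1f} GB/s at DDR3-1333")
```

These are theoretical peaks. Real sustained bandwidth is lower, and the extra channels still have to be routed across the board.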
If people want such an architecture, they can pay for it by building a Xeon or Opteron server/workstation hybrid. The motherboards alone will set you back at least $500.
Also, doubling memory throughput on the x86 or x64 architecture does not double performance. A typical cache has a hit rate of roughly 80%, and there are several cache tiers in the hierarchy, so most memory accesses never reach main memory in the first place.
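To put rough numbers on this, here is a minimal sketch using Amdahl's law. It rests on hypothetical assumptions: every cache tier hits 80% of the accesses that reach it, and 30% of runtime is spent stalled on DRAM. It also treats doubled memory throughput as halving that stall time, which is optimistic because bandwidth and latency are not the same thing. The function names are made up for illustration.

```python
from collections.abc import Sequence


def dram_access_fraction(hit_rates: Sequence[float]) -> float:
    """Fraction of accesses that miss every cache tier and reach DRAM.

    Assumes (hypothetically) that each tier's hit rate applies to the
    accesses that missed the tier above it.
    """
    fraction = 1.0
    for hit_rate in hit_rates:
        fraction *= 1.0 - hit_rate
    return fraction


def amdahl_speedup(memory_fraction: float, memory_speedup: float) -> float:
    """Overall speedup when only the memory-stall share of runtime gets faster."""
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - memory_fraction) + memory_fraction / memory_speedup)


if __name__ == "__main__":
    # Three cache tiers, each assumed to hit 80% of the accesses reaching it.
    print(f"accesses reaching DRAM: {dram_access_fraction([0.8, 0.8, 0.8]):.1%}")
    # Assume 30% of runtime is DRAM stalls; doubling throughput optimistically halves it.
    print(f"overall speedup: {amdahl_speedup(0.3, 2.0):.2f}x")
```

Under these assumptions only about 0.8% of accesses reach DRAM, and the overall speedup is about 1.18x rather than 2x. The result is only as good as the assumed fractions, but no choice below 100% memory-bound gets you to a full 2x.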
Those who want greater 3D performance can use a PCIe 2.0 x16 slot (or several of them) and pay for a GPU with far more transistors. (The same goes for doubling the throughput of the PCIe x16 interface: it is only a very small part of the solution, because the link is typically used just to load compressed textures into video memory.)
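A back-of-the-envelope sketch of why link throughput matters so little here. It uses the PCIe 2.0 signalling rate of 5 GT/s per lane with 8b/10b encoding and ignores packet and protocol overhead, so real throughput is lower. The 256 MiB texture set and the function names are hypothetical.

```python
PCIE2_TRANSFERS_PER_S = 5e9  # PCIe 2.0 signalling rate per lane (5 GT/s)
PCIE2_ENCODING = 8 / 10      # 8b/10b line coding: 8 data bits per 10 bits sent


def pcie2_bandwidth_bytes_per_s(lanes: int) -> float:
    """Theoretical one-direction PCIe 2.0 bandwidth, ignoring packet overhead."""
    return lanes * PCIE2_TRANSFERS_PER_S * PCIE2_ENCODING / 8


def upload_time_s(size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to copy a buffer across the link at the given sustained rate."""
    return size_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    x16 = pcie2_bandwidth_bytes_per_s(16)
    textures = 256 * 1024 * 1024  # hypothetical 256 MiB of compressed textures
    print(f"PCIe 2.0 x16: {x16 / 1e9:.1f} GB/s per direction")
    print(f"upload at 1x: {upload_time_s(textures, x16) * 1e3:.1f} ms")
    print(f"upload at 2x: {upload_time_s(textures, 2 * x16) * 1e3:.1f} ms")
```

With these assumptions, doubling the link cuts a one-off texture upload from roughly 34 ms to 17 ms. That saving is paid once at load time, not on every frame.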
If you cost it out, such a design would easily be beaten by a modern yet far more affordable GPU built on the same fabrication process (i.e. 32 nm).
Put simply: going from very poor graphics performance to 'slightly less than very poor' graphics performance by tripling the cost of the motherboard and adding another 380 pins to the CPU (which would send its cost skyrocketing) is not going to appeal to consumers when they could instead add a daughter-board-style video card at a tenth of the cost.