The top-end Bulldozer parts and the Piledriver parts that followed them (including the FX-8320, FX-8350, and FX-9590) are all 4-module, 8-thread parts. There's a great diagram on Wikipedia: https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
It is clearly labeled "Block diagram of a 4 module design with 8 integer clusters" - and it shows how each pair of cores shares common resources, namely the 64K L1 instruction cache, the instruction fetch and decode logic, and the FPU. Each integer cluster has its own 16K L1 data cache, each module has a dedicated 2MB L2 cache, and the entire CPU shares 8MB of L3 cache. Compare this with the desktop Intel i7 chips, which also have 4 FPUs. How useful this design is depends heavily on the workload. If it's mostly floating point calculations, like Folding@Home or any scientific or modelling application, then it will run like a 4-core part. If it's mostly integer calculations, then it runs better, because all 8 integer clusters can work at the same time.
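To put numbers on that, here is a rough Python sketch of the layout described above. It is a toy model, not a benchmark: it assumes the simplification used in this post, that each module's shared FPU serves one floating point thread at a time, and all of the names in it are made up for illustration.

```python
# Toy model of a 4-module Bulldozer/Piledriver chip, using the figures above.
# All names here are hypothetical; this counts units, it does not measure speed.

MODULES = 4
INT_CLUSTERS_PER_MODULE = 2   # each with its own 16K L1 data cache
FPUS_PER_MODULE = 1           # shared by both integer clusters (simplified)

L1I_KB_PER_MODULE = 64        # shared L1 instruction cache
L1D_KB_PER_CLUSTER = 16       # private L1 data cache
L2_KB_PER_MODULE = 2048       # 2MB per module
L3_KB = 8192                  # 8MB shared by the whole chip


def parallel_units(workload: str) -> int:
    """How many threads can run without fighting over an execution unit."""
    if workload == "integer":
        return MODULES * INT_CLUSTERS_PER_MODULE
    if workload == "float":
        return MODULES * FPUS_PER_MODULE
    raise ValueError(f"unknown workload: {workload!r}")


def cache_totals_kb() -> dict:
    """Total cache per level across the chip, in KB."""
    return {
        "l1i": MODULES * L1I_KB_PER_MODULE,
        "l1d": MODULES * INT_CLUSTERS_PER_MODULE * L1D_KB_PER_CLUSTER,
        "l2": MODULES * L2_KB_PER_MODULE,
        "l3": L3_KB,
    }


if __name__ == "__main__":
    print("integer units:", parallel_units("integer"))
    print("float units:  ", parallel_units("float"))
    print("cache (KB):   ", cache_totals_kb())
```

Under those assumptions the chip has 8 units for integer work but only 4 for floating point, which is the whole argument in two numbers.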
It is fundamentally correct to say that Bulldozer/Piledriver is a 4M/8T part, just like Core i7s (for desktops) are 4C/8T parts. The trouble starts when people call AMD's part an 8-core CPU. It's simply not.
Also, given the power used by the FX-8350 (125W TDP) and the faster FX-9590 (4.7GHz, 220W TDP), the fact that Intel can completely outperform these CPUs within an 84W TDP envelope is impressive. Of course, AMD was forced to go to such high TDPs because their design is so outdated that it was the only way they could compete with any of Intel's 4C offerings, even the lower-end i5 CPUs.