Parallelism is not really "hard"-encoded at the instruction level, at least not architecturally. If pertinent, you can always add a few new logical instructions (very, very few suffice) for memory ordering/fences (HTM etc.), as well as for instrumentation (profiling), synchronization, prefetch and vector...
All ISAs can do that... IMO it could be called "trashing" if not done carefully. x86 has by far the most of that, especially in vector, a whole lot of bloat.
As for ISAs conceived as parallel from the start, we can say that was the Itanic from a control (branch) perspective, and PowerPC, ARM and Alpha from a memory ordering perspective.
One is dead (Alpha), another is as good as dead (Itanic)... and the other two (PowerPC and ARM) were always meant for client/embedded, not big parallel servers, no matter how strange that might sound.
Relaxed memory ordering might have been considered an easier model to program, not a parallel feature. Only Alpha viewed it as a way to get better SMT (Hyperthreading), but Intel lacked the vision back then (otherwise everything would be Intel by now) and killed it after acquiring it (I think on purpose). Funny thing, they ended up building SMT/Hyperthreading on top of the "mule" that carries everything, x86 lol
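To make the memory ordering point a bit more concrete, here is a minimal sketch in C++11 atomics, not tied to any particular ISA. The function name `run_message_passing` and the value 42 are just made up for illustration. The release/acquire pair is what guarantees the consumer sees the payload; as far as I know, on x86 both compile to plain loads and stores, while on weaker models like ARM or Power the compiler has to emit barrier or special load/store instructions.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (illustration only): one thread publishes a value,
// another thread waits for it and reads it.
int run_message_passing() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag used to publish the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    std::thread consumer([&] {
        // The acquire load pairs with the release store above, so once the
        // flag is observed as true, the write to payload is visible too.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Call `run_message_passing()` from `main` and build with thread support (e.g. `-pthread`). If both orderings were changed to `std::memory_order_relaxed`, the read of `payload` would become a data race and nothing would guarantee the result any more, on any ISA.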
Meanwhile many changes were made; SPARC has several models, Power (the parent) is different from PowerPC (yet compatible)... but x86 was always the "ugly duckling" in the picture, full of quirks and cumbersome bits, yet its memory model is one of the best (the best) for small memory spaces. Perhaps that is why it caught on: when you have 640K to 1MB of memory, and that memory costs a fortune, comparatively (it did back then)... you may have a winner.
Things are completely reversed now... x86 is not the best for low power, nor for high performance or parallel work... and memory now comes by the ton, with a transition to several gigabytes inside a socket.
In gigabyte terms, Steamroller Kaveri might be the first to go there... Intel has Crystalwell, but I think it's not in the gigabytes for now... AMD has already done it, sort of, with mobile GPUs
http://www.techpowerup.com/img/11-05-03/17a.jpg
Now they might be preparing to bring it even closer, "inside the package". Yes, I think it will be a "dual" memory interface of sorts, but only the DDR3 interface for DIMMs will go outside the socket; the other will be "internal" and immutable.