waynes :
They are getting their advantage from a stripped-down circuit; the question is what else it is useful for. If it is only useful for a narrow range of things, then what is the point of dwelling on it?
Because machine learning is interesting in its own right, as are GPU computing and the benefit they got by going with an ASIC.
waynes :
It mentions a limited instruction set, so it obviously is not completely hardwired, and they are tailoring it for more functionality in future editions. That extra functionality likely comes at the cost of increased circuit complexity, which could eat into some of the speculated speed improvements and reduce the processing power per unit of die area compared to what it could be.
Maybe, but you don't really know what the primitives are. If the instructions are at the level of "compute the tensor product of A and B", then the programmability is probably adding very little overhead.
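To make the point concrete, here's a toy sketch (my own illustration, not the TPU's actual ISA): when a single "instruction" is a whole-matrix primitive, the one-time decode/dispatch cost is amortized over the entire O(n³) computation, so the programmable layer adds almost nothing per arithmetic operation.

```python
def matmul(a, b):
    """One coarse-grained primitive: multiply two whole matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

# Hypothetical instruction table: each opcode names a whole-matrix primitive.
PRIMITIVES = {"matmul": matmul}

def run(program, env):
    """Tiny interpreter: one decode per *matrix* op, not per scalar op."""
    for op, dst, *srcs in program:
        env[dst] = PRIMITIVES[op](*(env[s] for s in srcs))
    return env

env = run([("matmul", "C", "A", "B")],
          {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]})
print(env["C"])  # [[19, 22], [43, 50]]
```

On realistic matrix sizes, that single dispatched instruction covers millions of multiply-accumulates, which is why coarse primitives make "programmable" and "near-hardwired speed" compatible.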
waynes :
But what if you want to make it more functional again, so it can do everything except 3D/graphics? Those metric advantages will shrink again, but this is the sort of circuit we really do need working alongside a GPU graphics card. I call it a Workstation Processing Unit. GPU manufacturers could build something like this out of their GP-GPU experience, alongside their GPU lines.
They already do. Nvidia has the Tesla cards, which are GPUs without any video output. In the case of the P100, the graphics circuitry seems to have been omitted from the chip entirely. That said, I don't know what proportion of a modern GPU is occupied by the raster engines, but it might not be very much.
GPUs can't really compete with integer-only machine learning ASICs. A GPU must have lots of fp32 hardware, and that's a lot of wasted die area if you're only doing integer arithmetic. I expect Nvidia is already working on dedicated machine learning chips. If they built an inference engine like Google's on the same scale as their current GPUs, it would stomp Google's effort into the ground. And bury it.
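For anyone wondering why inference can be integer-only at all, here's a rough sketch (my own illustration, not Google's actual scheme): fp32 weights get quantized to int8 once, up front, and the hot loop is then pure 8-bit multiplies with wide integer accumulation, exactly the arithmetic an fp32-heavy GPU datapath spends die area on without needing.

```python
def quantize(xs, bits=8):
    """Map floats to signed integers with a single per-vector scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8
    scale = max(abs(x) for x in xs) / qmax or 1.0
    return [round(x / scale) for x in xs], scale

def int_dot(qa, qb):
    """Integer-only multiply-accumulate (int8 x int8 -> wide int sum)."""
    return sum(a * b for a, b in zip(qa, qb))

w = [0.5, -1.0, 0.25]                       # made-up fp32 weights
x = [2.0, 1.0, -4.0]                        # made-up activations
qw, sw = quantize(w)
qx, sx = quantize(x)
approx = int_dot(qw, qx) * sw * sx          # dequantize once, at the end
exact = sum(a * b for a, b in zip(w, x))    # fp reference: -1.0
print(round(approx, 2), exact)
```

The scale factors are applied once per dot product, so virtually all the silicon doing the work can be small integer multipliers; the accuracy hit from 8 bits is usually tolerable for inference, though not for training.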