Google's Machine Learning Chip Is Up To 30x Faster, 80x More Efficient Than CPUs And GPUs

Status
Not open for further replies.

Vendicar Decarian

Prominent
Apr 6, 2017
"GPU u can do anything" - Amdlova

Well... no. If GPUs could do anything in a practical sense, then there wouldn't be any need for CPUs.

Just as GPUs are optimized for graphics rendering, AI chips are optimized for AI computing. This one happens to be optimized for Google's AI methodologies.

Good for Google.

This trend of moving software algorithms into hardware will continue as the limits of silicon computation are reached. It is a natural consequence of trying to squeeze more computational power out of a technology that, in the form of traditional CPU design, has already hit its limits.

The advantage of these new kinds of chips is massive parallelism optimized to suit a specific task.

Neural network processing, for example, can make use of literally hundreds of billions of parallel computing elements, each taking the place of a single neuron.

Put enough of those together with some inter-neuron communication and a little internal memory, and you have yourself a simulated brain.

You can certainly simulate such things on CPUs and even GPUs, but neither is well suited to the task, since both are far, far too coarse-grained to produce rapid simulation results.

The trick with massive parallelism is in who gets which messages, and when, and where the memory is located and how it is accessed.
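To make the "each element acts as a neuron" point concrete, here is a minimal sketch (toy sizes and made-up weight values) of a layer of four neurons computed as one matrix-vector product. Every row is an independent neuron, which is exactly the fine-grained parallelism that dedicated hardware can exploit all at once:

```python
import numpy as np

# A "layer" of 4 neurons, each with 3 inputs: one matrix-vector
# product computes all neuron activations at once. The values are
# purely illustrative.
weights = np.array([[0.2, -0.5, 0.1],
                    [0.4, 0.3, -0.2],
                    [-0.1, 0.6, 0.5],
                    [0.7, -0.3, 0.2]])
inputs = np.array([1.0, 0.5, -1.0])

# Every row (neuron) is independent, so all four dot products
# could run in parallel on suitable hardware; a CPU core works
# through them a few operations at a time.
activations = np.maximum(weights @ inputs, 0.0)  # ReLU nonlinearity
print(activations)
```

Scaled up, the same pattern is just a bigger matrix multiply, which is why massively parallel multiply-accumulate hardware maps so directly onto neural networks.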
 

Vendicar Decarian

Prominent
Apr 6, 2017
"dude above me" - Romeoreject

The dude above you is a moron.


"This sounds amazing." - Romeoreject

The Google algorithms seem to work pretty well for pattern matching.

What the production of the ASIC tells you is that Google is confident enough in its methods, and sees enough of a speed advantage in them, that it is willing to commit to producing an ASIC that implements much of those methods in hardware, which cannot be modified once produced.

Do you think the NSA uses similar ASICs to monitor you?
 

bit_user

Splendid
Ambassador
Don't know, but I also don't care. The level of sophistication of their monitoring technology isn't the issue.

What I do care about is China's monitoring of non-Chinese. I think that's far more likely to impact me, personally. Maybe not for 5 years or more, but it could be even worse than all these ad networks tracking us.
 

waynes

Honorable
Mar 4, 2013
They are getting an advantage from a stripped-down-functionality circuit; the question is what else it is useful for. If it is only useful for a narrow range of things, then what is the use of dwelling on it?

It mentions a limited instruction set, so it obviously is not completely hardwired, and they are tailoring future editions for more functionality. That extra functionality likely comes at increased circuit complexity, which might take away some of the speculated speed improvements and reduce the processing power per unit area compared to what could be achieved. Even so, it will still be a lot more powerful and efficient than what we have at present.

But what if you want to make it more functional again, able to do everything but 3D/graphics? Then these metric advantages shrink further; still, this is the sort of circuit we really do need working alongside a GPU graphics card. I call this a Workstation Processing Unit. GPU manufacturers could develop something like it from their GP-GPU experience, alongside their GPU lines.
 

bit_user

Splendid
Ambassador
Because machine learning is interesting, as are GPU computing and the benefit they got by going with an ASIC.

Maybe, but you don't really know what constitutes the primitives. If the instructions are things like "compute the tensor product of A and B," then the programmability is probably adding very little overhead.
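The granularity point can be sketched with a toy example (sizes and values are illustrative): one coarse matrix-multiply primitive does the work that would otherwise take many scalar instructions, so the per-instruction decode/dispatch overhead is amortized across the whole operation:

```python
import numpy as np

A = np.arange(6, dtype=np.int32).reshape(2, 3)
B = np.arange(6, dtype=np.int32).reshape(3, 2)

# Fine-grained version: 2*2*3 = 12 scalar multiply-adds, each one
# an "instruction" on a conventional processor.
C_scalar = np.zeros((2, 2), dtype=np.int32)
for i in range(2):
    for j in range(2):
        for k in range(3):
            C_scalar[i, j] += A[i, k] * B[k, j]

# Coarse-grained version: a single matrix-multiply primitive.
C_primitive = A @ B

assert (C_scalar == C_primitive).all()
print(C_primitive)
```

With a coarse primitive like this, the ratio of useful arithmetic to instruction-handling overhead grows with the operand size, which is why exposing large operations as single instructions costs so little programmability overhead.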

They already do that. Nvidia has the Tesla cards, which are GPUs without any video output. In the case of the P100, the graphics circuitry seems to have been omitted from the chip entirely. That said, I don't know what proportion of a modern GPU is occupied by the raster engines, but it might not be very much.

GPUs can't really compete with integer-only machine learning ASICs. A GPU must have lots of fp32, and that's going to waste a lot of die area if you're only using integer arithmetic. I expect that Nvidia is already working on dedicated machine learning chips. If they built an inference engine like Google's, on the same scale as their current GPUs, it would stomp Google's effort into the ground. And bury it.
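To show why integer-only inference is viable at all, here is a minimal sketch of 8-bit quantized arithmetic (the scale factors use a simple max-abs scheme, and all values are illustrative, not Google's actual scheme): fp32 weights are mapped to int8 once, the dot product runs entirely in integer arithmetic, and a single float rescale recovers the answer.

```python
import numpy as np

weights_fp32 = np.array([0.8, -0.4, 0.1, 0.5], dtype=np.float32)
inputs_fp32 = np.array([1.0, 2.0, -1.0, 0.5], dtype=np.float32)

# Map each tensor onto the int8 range [-127, 127] with a max-abs scale.
w_scale = np.abs(weights_fp32).max() / 127.0
x_scale = np.abs(inputs_fp32).max() / 127.0
w_int8 = np.round(weights_fp32 / w_scale).astype(np.int8)
x_int8 = np.round(inputs_fp32 / x_scale).astype(np.int8)

# Integer multiply-accumulate (accumulated in int32 to avoid
# overflow), then one float rescale at the end.
acc = np.dot(w_int8.astype(np.int32), x_int8.astype(np.int32))
result = acc * (w_scale * x_scale)

# Close to the fp32 answer, with far cheaper per-element arithmetic.
print(result, np.dot(weights_fp32, inputs_fp32))
```

An int8 multiply-accumulate unit is much smaller than an fp32 one, so an ASIC built only from integer units can pack far more of them into the same die area, which is the advantage being described here.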
 
