CEVA Unveils Next-Generation Deep Learning And Vision Platform

Not open for further replies.


Sep 26, 2013
As you stated many times in the article:
"However, at least for now, GPUs are still the leading mainstream products that can fulfill the taxing performance requirements for fast neural network training. GPUs may not be the most efficient, but they tend to offer much higher performance in a single chip, and they are more readily available, which is why companies playing with machine learning still prefer them to the emerging alternatives".
I have to inform you that there are & DSP cluster design that can scale much bigger (multi cluster designs) as for instance ones from Tensilica top offering, I don't know about CEVA ones.
To keep fair let's assume they did use old TX1 development kit for comparation & that one only had ability of doing it with 32 bit (single precision). So let's cut the results comparation in half geting 2x performance per 12.5 less power consumption this is still fascinating & it brings us to the conclusion how it's also much cheaper to manufacture as their is no magical stick to get the power consumption 12.5x down. More significant DSP's are available as licensable IP's wile most GPU's remain property non licensable IP's. As I see a current Tensilica offering as the better choice & I am certain they can scale up to 4x full blocks (clusters) me be even more they are faster than Ceva offering & also have 8 bit suport so that practically makes them on pair with biggest GPU chips available today (performance wise) wile they would probably cost 8 times less to make & commercialy available to who ever needs them with up to 40x less price (as Nv & Intel tend to change that much for their "special purpose accelerators"). When you take all this into consideration you see how the DSP's are really winning the race in the large margins.


I don't see anything here that's so different from a GPU. I think they simply tuned the engines and on-chip memories for data access patterns common in neural networks.

Therefore, I'm predicting future generations of mobile GPUs from Nvidia, AMD, and ARM will narrow or eliminate the performance & power efficiency gaps vs. dedicated deep learning processors.

It supports 16-bit fixed-point precision, thus ensuring less than 1% degradation when running a network trained in a 32-bit floating-point environment.
Also, I'm curious how they arrived at this figure. Does it really apply to layers of all types networks of all depths and architectures?
Not open for further replies.