Is Intel pulling a Fermi?

JAYDEEJOHN · Nov 20, 2009

We all saw nVidia earlier showing a card that wasn't there; we even have pictures of its CEO holding up a fake.
Earlier this week, Intel showed off what some believed to be LRB. Some even wrote about it as such:
http://www.theregister.co.uk/2009/11/17/sc09_rattner_keynote/
while others took the time to notice that those reports had made a "mistake":
http://www.computerworld.com/s/article/9140949/Intel_to_unveil_energy_efficient_many_core_research_chip?taxonomyId=1
It's not hard to see someone was pulling something or other.

jennyh · Nov 20, 2009

Fermi vs Larrabee...which one will fail the hardest?

JAYDEEJOHN · Nov 20, 2009

Better question: where is either of them?
Are both showing other stuff to promote their "real" solutions?
It appears the numbers I'd pulled earlier show that LRB, if it performs as this "Polaris" model does, will be 3+ times better than a 280.
Knowing that Fermi's DP is 5.8 times as good as the 285's, it looks like Fermi is ahead at this point, and that's even with Fermi's disappointing clocks.

JAYDEEJOHN · Nov 20, 2009

What'd really be funny is if they both keep waiting it out, showing other things besides what they really want to sell, while ATI continues forging ahead and creates a killer gpgpu model with their current arch. That could just be the surprise coming in January, as the rumors go.

jennyh · Nov 20, 2009

Wait, did I read this part correctly:

"On the SGEMM single precision, dense matrix multiply test, Rattner showed Larrabee running at a peak of 417 gigaflops with half of its cores activated (presumably the 80-core processor the company was showing off last year); and with all of the cores turned on, it was able to hit 805 gigaflops. As the keynote was winding down, Rattner told the techies to overclock it, and was able to push a single Larrabee chip up to just over 1 teraflops, which is the design goal for the initial Larrabee co-processors."

So an overclocked Larrabee with all cores on can do 1 teraflop single precision? That's only like 5x less than the 5970 at stock? The interesting part is that it seems Larrabee is having heat issues, and that is why they are running it with half its cores?

Lol this is going nowhere. Did I read something about $10bn spent on Larrabee so far? What a staggering waste of cash on a total lemon...this is worse than itanic by far.
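
For reference, the scaling implied by the SGEMM figures quoted above can be checked with a quick calculation. This is only a sketch in Python using the three numbers from the Register quote; treating "just over 1 teraflops" as 1000 gigaflops is an assumption, so the overclock gain is a lower bound, and the variable names are illustrative.

```python
# Figures taken from the quoted Register passage (gigaflops, SGEMM single precision).
half_cores_gflops = 417.0   # half of the cores activated
all_cores_gflops = 805.0    # all cores activated
overclocked_gflops = 1000.0 # ASSUMPTION: "just over 1 teraflops" rounded down to 1000

# Perfect scaling would double the half-core result when all cores are enabled.
scaling_efficiency = all_cores_gflops / (2 * half_cores_gflops)

# Relative gain from the overclock over the stock all-cores result.
overclock_gain = overclocked_gflops / all_cores_gflops - 1

print(f"scaling efficiency: {scaling_efficiency:.1%}")
print(f"overclock gain (lower bound): {overclock_gain:.1%}")
```

Going by those figures alone, doubling the active cores gave close to double the throughput in this one test, which says nothing about heat or about how other workloads would scale.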

jennyh · Nov 20, 2009

Where are the double precision numbers?

Also...who is going to bother with this lol? Intel actually expects people to retrain and change the way it's been done for many years...for a slower product?

Fermi did kill Larrabee, it's over for Intel in that regard.

JAYDEEJOHN · Nov 20, 2009

I'll find the figures, but LRB's SP isn't that far from its DP figures.
Either way, you're right, but part of their "showing", for whatever this really is, is that it'll shut down certain parts to cool them, while re-initiating other parts that have cooled from non-use.
Seems to me, a few years back, some thought something at rest was a waste, and certainly foolish to use as a promotional feature; to me, this looks the same.
Also, I've seen info where LRB's scaling starts to fall off at over 32 cores, so it's no wonder they "showed" only 40 working, as anything higher would've shown poor scaling, undercutting another LRB "feature": its great scaling abilities.
Looks like there's a lot to do, and did I mention drivers?

ElMoIsEviL · Nov 20, 2009

That wasn't LRB, that was a research prototype CPU. Intel has been working with several other vendors in creating prototype multi-core processors.

ElMoIsEviL · Nov 20, 2009

jennyh :

Maybe neither of the two will fail. Why do you presume failure? Give us some valid reasons please... because without valid reason your comment is unreasonable.

jennyh :

Assumptions, assumptions, assumptions. So what makes you think Larrabee has heat issues? You state that the processor was running with half its cores activated and then assume that the only plausible reason would be that it is having heat issues (when there could theoretically be a whole slew of plausible reasons). You don't have enough information to draw such conclusions. Furthermore, you end by claiming that an unreleased product is similar to a released product when we don't know how the unreleased product will perform. You're relying on blind faith (or blind hope) in a failure.

jennyh :

Larrabee is MIMD therefore the Double Precision numbers won't be far off the Single Precision numbers. You end your comments with a rather peculiar statement. You claim that unreleased product A has killed unreleased product B. Can you please elaborate as to how unreleased products are capable of such things?

Thank You.

JAYDEEJOHN · Nov 20, 2009

And it's to the point of this thread.
Both are unreleased, yet both companies keep "showing" their capabilities.
If they keep waiting, ATI could slip in with something here, and more later, which from what I'm reading wouldn't be a huge diversion from their current HW, which is actually superior to both, sans the ECC requirements; only a portion of this market requires ECC, tho it's a lucrative part.
The numbers I've seen "leaked" by both companies, going by this, the other info we talked about earlier, Elmo, and what nVidia is claiming, suggest that nVidia has a slight lead at this point.
Now, LRB I'm sure has some polishing to do, but not a lot; as we all agree, Intel's process is tried and true, and there's not a lot more to do there besides some drivers etc.
Meanwhile, at 5.8x that of their old HW, nVidia also has clock problems, with the shortfall rumored at 20%.
I know this talk is as phoney and speculative as these "showings", but using what we've been allowed to "see", or what's "leaked", it does appear Fermi is ahead at this point.

"For instance, if only a certain number of cores are needed for a job, they could run while the other cores are idle. When the cores being used start to heat up, they could be shut down and cool idle cores could take over running the job. "
http://www.computerworld.com/s/article/9140949/Intel_to_unveil_energy_efficient_many_core_research_chip?taxonomyId=1&pageNumber=2

To me this sounds much like SMT in some ways. LRB has been said to do anything it's required to do, as it's totally non fixed function, which has its drawbacks in certain scenarios. Same for SMT: when the app is asking too much, it just won't work, or in LRB's case, it will either downclock or just be slower.
That's the design, not the approach, excluding the non-FF part.
The approach is being touted as user/dev friendly, but again, they've written several new languages for it, so at this point, it's another CUDA to me.
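
The core-rotation idea in the Computerworld quote above (hot active cores shut down, cool idle cores take over) can be pictured with a small sketch. This is Python and purely illustrative: the function name `pick_cores`, the temperature list and the `hot_limit` threshold are all hypothetical, and nothing here reflects how Intel's research chip actually schedules work.

```python
def pick_cores(temps, active, needed, hot_limit):
    """Choose which cores run next, swapping hot active cores for cool idle ones.

    temps     -- temperature reading per core (index = core id), hypothetical units
    active    -- core ids currently running the job
    needed    -- number of cores the job requires
    hot_limit -- temperature at or above which an active core is rested
    """
    # Keep active cores that are still below the limit.
    keep = [c for c in active if temps[c] < hot_limit]
    # Idle cores, coolest first, are candidates to take over.
    idle = sorted((c for c in range(len(temps)) if c not in active),
                  key=lambda c: temps[c])
    # Fill the gap left by rested cores (never a negative count).
    missing = max(0, needed - len(keep))
    return sorted(keep + idle[:missing])


# Core 0 has heated past the limit, so the coolest idle core (2) replaces it.
next_active = pick_cores(temps=[80, 55, 40, 42], active=[0, 1],
                         needed=2, hot_limit=75)
print(next_active)
```

Note that this only works while some cores are idle; once the job needs every core, there is nothing cool left to rotate onto.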

ElMoIsEviL · Nov 20, 2009

JAYDEEJOHN :

But that research chip is not Larrabee. I see the articles in this thread pointing to an Intel research chip, and somehow, its failings are being attributed to Larrabee (which is not the research chip in question).

ElMoIsEviL · Nov 20, 2009

The 80 Core prototype is an old project: http://www.bit-tech.net/news/hardware/2007/01/18/intel_builds_80-core_prototype/1

Dating back to 2007.

Intel's research team has managed to successfully produce a prototype 80-core Tera-Scale processor that uses less energy than the company's current flagship Core 2 Extreme QX6700 quad-core processor.

The prototype was built so that the chip giant's researchers could investigate the best way to make such a large number of processing cores communicate with each other. This was in addition to researching new architectural techniques and core designs.

The chip, dubbed the Tera-Scale Teraflop Prototype, is just for research purposes and lacks a lot of necessary functionality at the moment. However, R&D Technology Strategist Manny Vara said that the company will be able to produce 80-core chips en masse in five to eight years.

Currently, the prototype chip consumes less than 100W of power, which is less than the 130W consumed by the quad-core QX6700. Of course, the prototype currently lacks some key functionality, which could potentially throw the power consumption characteristics out of proportion, but it's an impressive feat nonetheless.

Vara added that although there are many more cores on the Tera-Scale prototype, they're a different type of core than the ones used in today's microprocessors. "The new ones will be much simpler. You break the core's tasks into pieces and each task can be assigned to a core. Even if the cores are simpler and slower, you have a lot more of them so you have more performance."

Today's microprocessor cores are very flexible, while Intel believes that tomorrow's microprocessor cores will be much more specialised, but of course, there will be many more of these simpler cores. AMD's Fusion project appears to be going down the route of scaling what we've already got, while Intel is moving towards what would be a more flexible approach to energy efficiency.

Before you get too excited though, this is all on paper at the moment; the real war of the cores won't be decided until both companies have released their respective massively multi-core processing architectures in a few years time.

JAYDEEJOHN · Nov 20, 2009

On the SGEMM single precision, dense matrix multiply test, Rattner showed Larrabee running at a peak of 417 gigaflops with half of its cores activated (presumably the 80-core processor the company was showing off last year); and with all of the cores turned on, it was able to hit 805 gigaflops. As the keynote was winding down, Rattner told the techies to overclock it, and was able to push a single Larrabee chip up to just over 1 teraflops, which is the design goal for the initial Larrabee co-processors.
http://www.theregister.co.uk/2009/11/17/sc09_rattner_keynote/page2.html

JAYDEEJOHN · Nov 20, 2009

These tests are showing hard numbers, tho. The question is, will these be LRB's numbers?

ElMoIsEviL · Nov 20, 2009

JAYDEEJOHN :

Oh I see the mistake there...

(presumably the 80-core processor the company was showing off last year)

He is assuming that Larrabee is the Tera-Scale processor Intel was working on last year (which it isn't); the two are separate projects.

The performance seems to be right where it should be. Although RV870 can theoretically hit up to 2.7TFLOPs, in practice it can only hit up to 1.3TFLOPs (you can view that here: http://www.beyond3d.com/content/reviews/53/1).

And much like Larrabee different mathematical workloads show different performance figures as you can see in the picture above.

If Larrabee hits those numbers it will be more powerful than Fermi (based on the nVIDIA white paper here: http://www.hardocp.com/article/2009/09/30/nvidias_fermi_architecture_white_paper/).
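
As a rough check on the RV870 figures above, the fraction of theoretical peak reached in practice can be worked out directly. A minimal Python sketch, using only the 2.7 and 1.3 TFLOPs figures from this post; the variable names are illustrative.

```python
# RV870 single-precision figures as stated in the post (TFLOPs).
rv870_theoretical = 2.7  # theoretical peak
rv870_measured = 1.3     # what it reportedly reaches in practice

# Fraction of the theoretical peak actually achieved.
achieved_fraction = rv870_measured / rv870_theoretical

print(f"RV870 reaches about {achieved_fraction:.0%} of its theoretical peak")
```

That is a bit under half of peak, which is why comparing one chip's measured number against another chip's paper number can mislead.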

jennyh · Nov 20, 2009

Ok so it's Polaris that was being demo'd here, not larrabee.

Why is Intel demo'ing a test chip and not Larrabee then? If anything that is worse. Wasn't Larrabee supposed to be out Q4 2009?

ElMoIsEviL · Nov 20, 2009

jennyh :

In the Register article they name the chip Larrabee but Intel's 80 Core prototype chip for HPC wasn't and hasn't been known to be Larrabee (but rather Tera-Scale).

As for when Larrabee will be released.. sometime in 2010. I think they're working to perfect their manufacturing process to get the most out of Larrabee.

JAYDEEJOHN · Nov 20, 2009

Capable, even in these synthetics, and deliverable are 2 different things, which is why Nehalem is also being mentioned in some discussions regarding LRB.
Plus there's the loss as well, in BW or whatever else.
Nehalem can't deliver what it's said it can, but does well.
The numbers are closer than we think, and from what I've read, Fermi is ahead.
It's approach as well as capacity, which is why I used those other numbers in my previous thread: those are real world, where thruput and latencies are all measured on the same app.
Going from those numbers, and then reading the nVidia white paper, puts a whole new look upon Fermi IMO.

ElMoIsEviL · Nov 20, 2009

JAYDEEJOHN :

The RV870 test I showed above was a synthetic test (much like Larrabee is being run on synthetics right now).

In Single Precision, Fermi is theoretically capable of around 1.25 to 1.7 TFLOPs. The lower figure follows from their CEO's claimed Double Precision figure of around 625GFLOPs, and the range reflects both taking and not taking into account the current clock speed issues that have been described here: http://www.semiaccurate.com/2009/11/16/fermi-massively-misses-clock-targets/.

I have been mentioning this often (under the pseudonym z3r0c00l here: http://forum.x c p u s.com/nvidia/18565-gf100-real-world-technologies.html) but others are starting to take notice as well: http://www.brightsideofnews.com/news/2009/11/17/nvidia-nv100-fermi-is-less-powerful-than-geforce-gtx-285.aspx

According to specifications based on NV100 A2 silicon [subject to change], C2050 will deliver 520 GFLOPS of IEEE 754-2008 Dual Precision format and 1.040 TFLOPS of single precision. C2070 stands a bit better, 630 GLOPS of Dual-Precision and 1.26 TFLOPS in Single Precision.

These are all theoretical numbers (not real world). It has always been the case that real world numbers end up lower than theoretical numbers.
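
To make the relationship between those quoted figures explicit, here is a small Python sketch that computes the single-to-double precision ratio for each part. It uses only the theoretical numbers quoted in this post (including the CEO's roughly 625 GFLOPs claim and the 1.25 TFLOPs lower bound); the dictionary layout and names are illustrative.

```python
# Theoretical Fermi figures as quoted in this post, in GFLOPS: (double, single).
quoted_gflops = {
    "C2050": (520.0, 1040.0),
    "C2070": (630.0, 1260.0),
    "CEO claim": (625.0, 1250.0),  # 625 DP and the 1.25 TFLOPs lower SP bound
}

# Ratio of single-precision to double-precision throughput for each entry.
sp_dp_ratio = {name: sp / dp for name, (dp, sp) in quoted_gflops.items()}

for name, ratio in sp_dp_ratio.items():
    print(f"{name}: SP is {ratio:.1f}x DP")
```

Every quoted pair works out to the same 2:1 ratio, so the single-precision figures here appear to be derived from the double-precision ones rather than measured separately.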

Fermi doesn't look that great IMHO (well Double Precision wise it looks nice but as a consumer computational device.. it looks dreadful).

jennyh · Nov 20, 2009

I'm assuming this 80-core test chip is actually more powerful than Larrabee is tbh.

Otherwise, they'd have demo'd larrabee.

I mean come on - this is supposed to be an imminent release and we basically haven't seen anything about it at all?

Either it's underpowered or it's nowhere near release, and the same goes for Fermi too.

JAYDEEJOHN · Nov 20, 2009

I've read those numbers, and as I've said, it comes down to approach now.
We know the gpu numbers work, as they've been done before, but we don't know that LRB's will, as the chart I showed in the other thread shows the fall-off above 32 cores.
Also, we don't know how good the communication or thruput will be, but we do on the gpus.
We don't know how well the SW will work on LRB, but again, we do with the gpus.
We've already deducted any loss from Fermi/280 or whatever 5870 (tho the x2 would simply double those numbers and leave them all in the dust), but we haven't with LRB, where we're only taking capability into account.
That's what I'm saying here.

BadTrip · Nov 20, 2009

jennyh :

Wasn't Bulldozer supposed to be out Q2-3 2009?

http://blogs.zdnet.com/BTL/?p=5766

jennyh · Nov 20, 2009

BadTrip :

JAYDEEJOHN · Nov 20, 2009

For gaming, LRB with no FF parts won't play well, and tho it may seem an accomplishment, the wait-and-switch capability of LRB is limited at best when it's really needed.
Regarding gpgpu usage, it may or may not work better; again, I point towards approach.
The apps listed in the Register article won't be using only half of LRB, no ocing will be possible, and even underclocking may be needed, plus there are whatever other latencies are involved. That's up against a product that's been around for a decade and, tho using a newer approach, can only be better than what we've had before.
So, if Fermi is made to beat the 5870, it wouldn't surprise me to see higher numbers from it as well.

fazers_on_stun · Nov 20, 2009

BadTrip :

LOL - excellent comeback!! +1

It must be a full moon out tonight, as JDJ & Jenny are off on their monthly "woe is Larrabee" rant again. Makes you wonder - if it's as bad as they keep insisting it is, why are they so afraid of it??? Hmmm???????

And as for demoing unreleased products, nothing can be lower than Randy Allen's flogging Barfalona as "40% faster across a wide variety of workloads" some 9 months before it was released, and the truth (25% slower than Core2) was revealed

...
