Is Intel pulling a Fermi?

JAYDEEJOHN · Nov 20, 2009

We all saw nVidia earlier showing a card that wasn't there; there are even pictures of its CEO holding up a fake.
Earlier this week, Intel showed off what some believed to be LRB. Some even wrote about it as such:
http://www.theregister.co.uk/2009/11/17/sc09_rattner_keynote/
While others took the time to see that the others were making a "mistake":
http://www.computerworld.com/s/article/9140949/Intel_to_unveil_energy_efficient_many_core_research_chip?taxonomyId=1
It's not hard to see someone was pulling something or other.

C4PSL0CK · Nov 24, 2009

JAYDEEJOHN :

We went from 1 Frame per (Couple) Hours to 14 FPS with that Larrabee demo. Full Ray-tracing isn't its only forte. As I've mentioned too many times, Larrabee's flexibility will also allow a lot of things that weren't possible before.

"Larrabee will include very little specialized graphics hardware, instead performing tasks like z-buffering, clipping, and blending in software, using a tile-based rendering approach.[8] A renderer implemented in software can more easily be modified, allowing more differentiation in appearance between games or other 3D applications. Intel's SIGGRAPH 2008 paper[8] mentions order-independent transparency, irregular Z-buffering, and real-time raytracing as rendering features that can be implemented with Larrabee. "

So you need to have Crysis run at 30 FPS with 4xAA at 25x16 with RT to be impressed?

Running Crysis WARHEAD Enthusiast Mode with only 2xAA at 25x16 with rasterization on a HD5970 nets 23FPS. I think you might have set your expectations a little too low, Larrabee should definitely be able to do at least over 9000 FPS, no?

http://www.guru3d.com/article/radeon-hd-5970-review-test/16

Well, drivers are indeed important but it isn't the old GMA team behind it anymore. I'm not sure how this will turn out but we'll see I guess.

2 cores also puts it in a different price market. I'm not interested in +300EU cards, and I can assure you many agree. It'll be cards such as HD5850 and GTX360 that'll compete with Larrabee. Bang for the buck. Intel surely understands that price is a very important factor when you're a new player.

http://nexgadget.com/2009/11/03/intel-to-offer-larrabee-graphics-at-preferential-prices/

JAYDEEJOHN · Nov 24, 2009

Good example of changes from G200 to Fermi
"Bear in mind that Intel flew heads out of Oak Ridge with their otherwise superior Core 2 architecture after Woodcrest had a reject rate higher than 8% [the results of that Opteron vs. Xeon trial in 2006 are visible today, as Oak Ridge is the site of AMD-powered Jaguar, world's most powerful supercomputer]. In order to satisfy the required multi-year under 100% stress, nVidia Tesla C2050/2070 went through following changes when compared to the Quadro card [which again, will be downclocked from the consumer cards]:
Memory vendor is providing specific ECC version of GDDR5 memory: ECC GDDR5 SDRAM
ECC is enabled from both the GPU side and Memory side, there are significant performance penalties, hence the GFLOPS number is significantly lower than on Quadro / GeForce cards.
ECC will be disabled on GeForce cards and most likely on Quadro cards
The capacitors used are of highest quality
Power regulation is completely different and optimized for usage in Rack systems - you can use either a single 8-pin or dual 6-pin connectors
Multiple fault-protection
DVI was brought in on demand from the customers to reduce costs
Larger thermal exhaust than Quadro/GeForce to reduce the thermal load
Tesla cGPUs differ from GeForce with activated transistors that significantly increase the sustained performance, rather than burst mode. "
http://www.brightsideofnews.com/news/2009/11/17/nvidia-nv100-fermi-is-less-powerful-than-geforce-gtx-285.aspx

JAYDEEJOHN · Nov 24, 2009

And then what? If it competes with the third highest out, and then we see a doubling of output, it'll be woefully behind.
I say a 100% increase with the 28nm process because of the HKMG inclusion.
So LRB gets relegated to low/mid within a year of its release.

Mousemonkey · Nov 24, 2009

@jennyh, please refrain from the personal insults.

JAYDEEJOHN · Nov 24, 2009

Let's keep anything personal out of this, ok?
We can learn from each other with level heads.

jennyh · Nov 24, 2009

Mousemonkey :

My guess is you and Jennyh share the same IQ pool, double digits.

What about this w***ers insults, one against a respected forum mod no less? Or are you not seeing it that way?

What is the difference from suggesting two people share a double-digit IQ pool and calling somebody a retard? And deleting my post too? You're a biased ***** mousemonkey.

JAYDEEJOHN · Nov 24, 2009

That's LRB's problem here: it comes in with a superior process and HKMG, and loses that next gen.
There'll be no 22nm right away to save it either.

JAYDEEJOHN · Nov 24, 2009

jenny, he's kept it fairly respectable, let's not turn it into a pi$$ing match.
Let's all just try to give what we know, and see if all things are truly being considered here, on both sides.

jennyh · Nov 24, 2009

Fairly respectable? I don't appreciate insinuations from this capslock idiot.

Like putting some lipstick on a pig. It wouldn't change anything aswell. Surely, you understand?

So when I spend an hour writing a retort that proves he is full of ***, mousemonkey deletes it because he feels there is a difference between insinuating somebody (make that two people) is retarded and flat out saying it?

JAYDEEJOHN · Nov 24, 2009

I did some editing in his posts as well.
Let it go, and let's move on.

C4PSL0CK · Nov 24, 2009

JAYDEEJOHN :

http://www.semiaccurate.com/2009/11/09/tsmc-slips-their-32nm-process/
http://www.semiaccurate.com/2009/11/16/ati-58xx-parts-delayed-bit-more/
http://www.semiaccurate.com/2009/11/20/tsmc-rumored-have-killed-their-32nm-node/

And if I'm not mistaken 28nm was planned for 2011, but I can see it being delayed to 2012. On top of that, Intel has always had a process node advantage, so I'm not sure how TSMC is suddenly supposed to deliver chips on a smaller process before Intel can. Especially with all the problems TSMC has been having lately, I can't see how LRB is in danger from a process point of view.

100% increase? That's speculation, you have absolutely nothing to back that up with. I could say LRB2 will be 200% better as well. In fact it would be even more believable than your claim, since LRB is Intel's FIRST cGPU product. Most products see very large improvements in their 2nd iteration.

jennyh · Nov 24, 2009

Intel has always had a process node advantage, so I'm not sure how TSMC is suddenly supposed to deliver chips on a smaller process before Intel can.

TSMC are already delivering 40nm, and have been for 9 months (note it was late as well so TSMC are well toward their 28nm). Intel are delivering...45nm. I can see how TSMC is suddenly supposed to deliver chips on a smaller process before Intel can, because they are already doing it.

Like you and others have said, but seem to forget when it suits you - intel have *all* the advantages here. Soon on 32nm (I bet you that Larrabee demo was 32nm too - on the top part so 32 cores). They already have HKMG.

ATI and Nvidia have all that to come. ATI also have a real working part out there compared to questionable demos from intel.

If this is the best intel can do with all their advantages, they are in real trouble.

JAYDEEJOHN · Nov 24, 2009

HKMG has shown a 30% increase in overall core speeds; add in the power lowering and it's even higher.
That's similar to what we expect from a process shrink, and nothing is left on the table in gfx, so no, we won't see a lower power envelope here. That's for cpus, not gpus, so all of those savings will be transferred to perf.
Add in the 28nm as well, and we typically see a 40-60% increase in perf on a typical shrink, so 50+50=100.
So, again, this is gfx, not cpus, and yes, LRB will only be low/mid end by early 2011 if it only comes in at mid/high now.
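
A quick sanity check on the 50+50 arithmetic, as a minimal sketch only. It assumes the HKMG gain and the shrink gain are independent and stack multiplicatively, which is an assumption and not something measured anywhere in this thread. The `combined_gain` helper is made up for illustration.

```python
def combined_gain(*gains):
    """Compound independent fractional gains (0.5 means +50%).

    Assumption: each gain multiplies the result of the previous one.
    """
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


# Simple addition, as in the post: 50% + 50%
additive = 0.5 + 0.5

# Same two gains, compounded instead of added
compounded = combined_gain(0.5, 0.5)

print(f"additive: {additive:.0%}, compounded: {compounded:.0%}")
```

Under that assumption two 50% gains compound to 125% rather than 100%, so the additive figure is the more conservative of the two.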

JAYDEEJOHN · Nov 24, 2009

Take 2 gens in gfx, the highest rated card, say the 3870, and compare it to a 5870.
Normally we'd see only a 3870/4870 increase, but HKMG brings its own results as well.
People's minds need to be changed here, this isn't cpus any longer. That's what I've been trying to hammer home for quite a while now.

fazers_on_stun · Nov 24, 2009

Anandtech has an interesting take on HPC (since Jenny brought that up several times earlier):

Compared to the mobile and desktop market, AMD is doing relatively well in the server and HPC market. The early delivery of the six-core Opteron (codenamed Istanbul) enabled Cray to build the fastest supercomputer in the world (at least for Q4 2009). It's called the Cray XT5-HE “Jaguar” with 224162 cores, good for almost 1.76 million GFlops. The Opteron EE made heads turn in the low power cloud computing market, and the six-core Opteron is a good price/performance alternative in the rest of the server world. And last but not least, the 4-socket 84xx Opterons are the unchallenged champions in the quad socket world.

Nevertheless, AMD’s position in the server and HPC market is seriously threatened. An impressive 95 out of the top 500 supercomputers contain Intel's "Nehalem-EP" Xeon 5500 processors. Intel’s star has been rising fast in the HPC market since the introduction of the Intel Xeon 5500. Intel’s Nehalem EX is almost ready to attack the quad socket market. And there's more.
...
So far AMD has countered Intel’s higher “per core” performance with 50% more cores. Indeed, the six-core Opteron can keep up with the Xeon 5500 in quite a few applications. But Intel is readying a slightly improved six-core version of the Xeon 5500 series called Westmere-EP in the first half of 2010. Being a 32 nm high-K dielectric CPU, the six-core Westmere-EP will offer about the same power consumption with six cores under load as the quadcore Xeon 5500 (Nehalem EP). At idle, Westmere-EP will consume less (14 to 22% less leakage). Westmere-EP’s architecture is identical to that of the Nehalem EP, with the exception of a 50% larger L3 cache (12 instead of 8 MB) and support for special AES instructions.

AMD’s best core in 2010 is a slightly improved revision of the current six-core Opteron “Istanbul” with the following additions:

• Finally a “real” C1E state which reduces power for each core that is idling
• Support for DDR-3

In theory, DDR-3 1333 offers 66% higher bandwidth, but in practice the Stream benchmark does not measure more than a 25% boost in bandwidth. The latency of going off-die is about the same. That means that the performance increase in most server applications will not be tangible. Only the most bandwidth intensive HPC applications will get a boost of 10 to 20%.

Currently, AMD's six-core Opteron can match the performance of Intel’s quadcore Xeon 5500 at the same clockspeed in some important server applications: OLAP databases, virtualization and web applications. Intel’s best Xeon wins with a significant margin in OLTP, ERP and rendering. A large part of the HPC market is a lost cause: a quadcore Intel Xeon 5570 at 2.93 GHz is about twice as fast as an AMD Opteron 2389 at 2.9 GHz. The fact that we could not find any Opteron 2435 results in LS-Dyna is another indication of what to expect: the 10-20% higher performance in HPC applications will not be a large step forward.

Intel is going to increase performance by 20-30% per CPU (50% more cores), while AMD’s CPUs will see only marginal increases. So basically, Intel’s performance advantage is going to grow by 20 to 30%, except in HPC workloads where it is already running circles around the competition. Not an enviable position to be in for AMD.

Suppose that you are the strategic brain behind AMD. The competition offers better “per chip” and “per core” performance. The last thing you want to do is to offer the same kind of server platform. If a six-core Opteron (“Lisbon") goes head to head with a six-core Xeon (“westmere EP”), it will not be pretty: the Intel chip will beat the AMD chip in performance and performance/watt (remember, westmere EP is a 32 nm CPU). Despite this, AMD found some clever ways to make their server platforms interesting…

And the "interesting" part - you guessed it - AMD is taking the el cheapo route once again

...

jennyh · Nov 24, 2009

Also, it's not just TSMC but globalfoundries too.

Both of them will use full and half nodes, intel only use full nodes. That means that ATI and Nvidia will have a constant node advantage when on the half node (assuming globalfoundries and TSMC catch up).

More likely is it will be what it is now. Intel moving to 32nm within a couple of months, AMD going 32nm in a year, but also having 28nm for ATI at the same time. Then it's another year before intel go 22nm.

It's a small advantage overall for intel right now, but only so long as they stay ahead on process. A far bigger increase is due with HKMG for ATI and Nvidia - assuming TSMC get their act together at 28nm also that is.

Overall it strongly favours ATI, who have two processes to fall back on next year. Nvidia might do also but it is unlikely they will use glofo now.

JAYDEEJOHN · Nov 24, 2009

For the HPC market, I'm afraid cpus will see more and more Fermi/5xxx/LRB intrusions, and cpus will be left behind until we see the fusion type chips out.
So, if Anand was being truly honest here, he wouldn't be focusing only on cpus.
As for servers, it'll depend on usage/needs; Fermi/gpgpu types invade this area as well and take marketshare away, introducing a new player and shrinking the pie, as will any gpgpu derivative.

Or, to be more precise, the gpu isn't dead at all, and will take marketshare.

jennyh · Nov 24, 2009

@ Fazers - there is nothing wrong with competing on price when you are under pressure on power. AMD have made their own market out of doing that. The real question is, what will intel do when interlagos hits and they find themselves with catching up to do?

As JDJ says, there will be more gpu's entering that market. If the Chinese can go straight to #5 using 4870's, where do you think they'll get with 5870's instead? Anything new and better is generally going to hurt intel in supercomputer sales because like you're keen to point out - intel own most of the market.

JAYDEEJOHN · Nov 24, 2009

I'm going to say this one more time: a while ago, I wasn't received well with my "opinions" on why LRB is coming, why gpgpu usage is important, Intel's need for LRB, and what it all means.
Glancing at just this thread should tell you all this, and how important all this is.
It also opens up a few points not considered by some cpu enthusiasts, such as the growth/perf rate seen in gpus, which translates directly to gpgpu usage.
The approach Intel is taking, tick-tock, is too slow; it's not a fast enough pace.
When everyone's finally on the same process, gpus will have a 1 gen advantage here, with LRB using Intel's tick-tock strategy.
What this means is, either LRB has to be 12 months ahead of gpus at its inception, or it'll immediately and always be playing catchup IMHLO

yomamafor1 · Nov 24, 2009

Respectable? LOL!!

fazers_on_stun · Nov 24, 2009

jennyh :

I'd say AMD had better pray TSMC starts getting higher yields on their 40nm process.

Seriously, how many 4870's did the Chinese computer use, what was the power draw and the cost? And don't HPC computers use ECC or some other form of error-prevention? How is that done with off-the-shelf GPUs, assuming that's what the Chinese did?

Besides, as the Anandtech article pointed out, those 95 Xeon HPCs, out of the top 500 supercomputers, will practically be guaranteed to switch to the Westmere EP since it'll be a drop-in replacement.

And finally, about AMD taking the 'value' segment: Intel can do the same. According to the same AT article:

AMD created a very “cool” niche market with the 40W ACP (60W TDP) Opteron EE. Large power limited datacenters bought these CPUs in quantities of a few (and more!) thousands at once. Just a few months ago, Intel also introduced a 45 Watt Xeon L3426 at 1.86 GHz based on their Lynfield core (LGA1156 socket). Considering that AMD’s ACP numbers are rather optimistic and Intel’s TDPs are rather pessimistic, the 8-thread quadcore 1.86 GHz L3426 ($284) makes the six-core 1.8 GHz Opteron 2419EE look expensive ($989). The former can push its clock up to 3.2 GHz under single threaded loads, and is thus a really interesting option if your application has a significant part of non-parallel code.

When your most expensive and thus profitable product starts getting demolished by the competitor's part at < 1/3rd your selling price, expect to either slash prices & profits on your part or lose the entire segment in short order.

jennyh · Nov 25, 2009

I'd say AMD had better pray TSMC starts getting higher yields on their 40nm process.

Seriously, how many 4870's did the Chinese computer use, what was the power draw and the cost? And don't HPC computers use ECC or some other form of error-prevention? How is that done with off-the-shelf GPUs, assuming that's what the Chinese did?

Besides, as the Anandtech article pointed out, those 95 Xeon HPCs, out of the top 500 supercomputers, will practically be guaranteed to switch to the Westmere EP since it'll be a drop-in replacement.

Just like Kraken, Roadrunner and Jaguar will be guaranteed to stick with Opterons. Also, Intel will not take the top spot, or probably even a top 3 spot, back for the foreseeable future. Not even with 32nm and more cores will Intel be able to beat Jaguar.

All the biggest and fastest HPC's use AMD, because AMD is the best. Intel lost that Jaguar position with Cray I do believe....who'd have thought it.

Something like 5k 4870's in that Chinese server btw. I have no idea what it cost or how much power its drawing, all I know is that somebody somewhere thought it must be worth it and it seems they were right.

C4PSL0CK · Nov 25, 2009

Ok, I didn't want to break my NDA but I've seen LRB. Yeah, that's right. The 32-core is actually 800% faster than the HD5970. The reason they disclosed a 2009 release is because they went straight to 22nm. That's right, that's DOUBLE THE SPACE!!!! And then they added HKMG 2.0. The improved HKMG that's triple as bestest and has been patented so it can only be used by Intel because Intel is the best.

They decided to go with a 128-bit ohh... word just came in, they just increased that to 256-bit wide VPU lanes. It runs at +5Ghz but can be overclocked to 10Ghz on air alone. Slightly high TDP tho, marking 85w at 100% load, frankly I must say I'm disappointed but whatever. They also happened to increase the ring-bus to 4096-bit (2048 two-ways) buses to hide latencies when cores are ramping up, so linear scaling drops off only after 320 cores.

They're planning to release it together with a motherboard with 2 Larrabee-Chip compatible sockets and a couple of QPI links to connect them. That's right, Xeon-style. This all will just cost us 99EU. That's right, 8x the performance of HD5970 (16x when overclocked to 10Ghz) but cheaper than a HD5750.

Also the reason AMD fails is because Intel rocks, just so you know. ATI and nVidia will never be able to catch up because Intel's architecture is the FUTURE. That's right, so by the time ATI and nVidia switch to a similar architecture Intel will already have an unbridgeable advantage. So they're doomed to end up in the same gutter as AMD.

The source, who needs one? I just told you, trust me. After all, isn't that what you guys have been doing all along?

JAYDEEJOHN · Nov 25, 2009

Ummm, 40nm, no HKMG, reports of LRB=280, ummm no
The only hard link to find would be the LRB=280 one, but they're there.
My links on LRB's loss over 32 cores are here, my links on Fermi are here, and my guesstimates on gfx cards, process and increases can be verified, as well as the HKMG usage.

JAYDEEJOHN · Nov 25, 2009

Off topic (wonder why heheh) @ fazers
Advanced Micro Devices (AMD) emerged as a big winner in the recent 2009 server purchase of China Mobile, as AMD Opteron-based servers account for more than 70 percent of the signed deals.

Those 2-way and 4-way systems will be provided by major server manufacturers such as HP, IBM, Dell and Dawning.
http://www.chinadaily.com.cn/china/2009-11/24/content_9031876.htm

Also
http://forum.beyond3d.com/showpost.php?p=1362724&postcount=1677
Now, there are no hard numbers, and no, it's only opinion, though a good one at that. But when you're talking 9x perf per watt, no socket worries/changes etc, it's a no-brainer: gpus will be owning HPC in the future, where applicable, which is a huge part of this market, simply because of its nature.
