Welcome Back Hyper-Threading!

Moonskin

Distinguished
Jan 30, 2007
49
0
18,530
Here's the link

And here's another link

I wonder what's gonna happen now that it's clear that Intel will not have an answer to Barcelona in 2007 …

If Penryn will only see the light in Q1 2008, there’s not much sense in launching a new architecture in Q3 2008, is there? …

So maybe we will be stuck with the Penryn vs Barcelona duo longer than we initially expected … (I’m referring to some roadmaps placing the Nehalem release in the middle of 2008 … seems more like 2009 to me now).

So only if Barcelona is competitive enough ... maybe AMD may just save the day (and the year :D ... a year that didn’t start well for them).

And what’s all this: “we have Hyper-Threading, but will only utilize 4 threads” stuff … sounds more like a marketing trick than like a certain fact. I think Intel is playing a waiting game for now and keeps its options wide open; only AMD can screw up right now :twisted: . Surely the next step Intel makes will be more or less influenced by what AMD will be doing with Barcelona.
 

Guest

Guest
There still seem to be quite a few more transistors than SSE4 and 2 meg would account for. I would not be surprised to see HT, and it would probably be pretty useful in this architecture.

So yeah let's flip a coin!
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
There still seem to be quite a few more transistors than SSE4 and 2 meg would account for. I would not be surprised to see HT, and it would probably be pretty useful in this architecture.

So yeah let's flip a coin!
I strongly doubt it.
 

darkz

Distinguished
Jan 24, 2007
28
0
18,530
I strongly doubt it.

Apps have to be multithreaded anyway to utilize dual and quad core CPUs fully - and HT gives real, measurable performance gains in multithreaded apps, so why not?
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
I strongly doubt it.

Apps have to be multithreaded anyway to utilize dual and quad core CPUs fully - and HT gives real, measurable performance gains in multithreaded apps, so why not?
Because the real, measurable performance gains you are talking about were all measured with HT applied to the NetBurst architecture. Those hot chips, little more than frequency generators, really got a boost from HT because it relieved some of the inefficiency of a 31-stage pipeline. But bear in mind that a Core 2 core is almost 2 times more efficient than a P4 core and 26% more efficient than a K8, so I am really skeptical about what HT is going to improve in a 45nm Core2.
HT can only improve an inefficient architecture, but a CPU core is (at least nowadays) designed for maximum efficiency, and when such efficiency is mostly achieved, HT does nothing but split and re-merge data streams before and after processing. An architecture that fully utilizes one core leaves little or no bandwidth for a second, virtual one, and the synchronization penalty between the two tends to outweigh the actual boost.
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
I strongly doubt it.

Apps have to be multithreaded anyway to utilize dual and quad core CPUs fully - and HT gives real, measurable performance gains in multithreaded apps, so why not?

But whether the resources are well shared is a critical question.
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
There still seem to be quite a few more transistors than SSE4 and 2 meg would account for.

LaGrande. :(

It was only a matter of time before DRM infected our day-to-day lives with "Buy DRM or use pen and paper."

TPM = DRM Hell...

DRM, as a concept, isn't bad. It's the 300-lb gorillas using DRM to scavenge every penny to get another $2 million bonus at my expense that piss me off. /rant
 

darkz

Distinguished
Jan 24, 2007
28
0
18,530
HT can only improve an inefficient architecture

Wrong. HT is about executing code from a 2nd thread when the 1st thread is idle due to, for example, a cache miss. Are you saying C2D has no cache misses and predicts all branches 100% perfectly?
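
To make the stall-filling idea concrete, here is a toy Python model. It is only a sketch under loud assumptions: one shared issue slot per cycle, a thread that issues a fixed `burst` of instructions and then waits `miss_latency` cycles on a cache miss, and no contention for cache or other resources. The names `HwThread`, `burst`, `miss_latency` and `utilization` are made up for the illustration; no real Intel core schedules this way.

```python
from dataclasses import dataclass


@dataclass
class HwThread:
    burst: int             # instructions issued between two cache misses (assumed)
    miss_latency: int      # cycles spent waiting after each miss (assumed)
    issued_in_burst: int = 0
    stalled_until: int = 0


def utilization(threads, cycles):
    """Fraction of cycles in which the single shared issue slot did work."""
    busy = 0
    for cycle in range(cycles):
        for t in threads:                  # earlier threads get priority
            if cycle < t.stalled_until:
                continue                   # this thread is waiting on memory
            busy += 1
            t.issued_in_burst += 1
            if t.issued_in_burst == t.burst:   # burst done: take a cache miss
                t.issued_in_burst = 0
                t.stalled_until = cycle + 1 + t.miss_latency
            break                          # the slot is used up for this cycle
    return busy / cycles


if __name__ == "__main__":
    for burst in (4, 16):                  # stall-heavy vs. mostly-busy thread
        one = utilization([HwThread(burst, 4)], 800)
        two = utilization([HwThread(burst, 4), HwThread(burst, 4)], 800)
        print(f"burst={burst}: 1 thread {one:.0%}, 2 threads {two:.0%}")
```

With these made-up numbers, the stall-heavy thread goes from 50% to 100% slot usage once a second thread is available, while the mostly-busy thread only goes from 80% to 100%. So even in this idealized model the gain shrinks as the single thread gets better at keeping the core busy, but it does not drop to zero as long as there are any stalls left.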
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
In crude graphics:
P4HT.jpg

This is HT on a P4: the largely unused resources get filled in nicely, since 2 operations (or whatever else fits) can share what used to hold a single one.

C2HT.jpg

But we get a different story on a Core2: resources are perfectly utilized by the core, and there's little, if any, space for a second thread.
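
The same argument can be written as back-of-the-envelope arithmetic. This is a rough sketch, not a measurement: `utilization` is the fraction of issue slots one thread fills on its own, and `overhead` is a hypothetical knob for the slots lost to sharing and synchronization. Both the function name and the example values are assumptions for illustration.

```python
def smt_throughput_ratio(utilization, overhead=0.0):
    """Best-case throughput with a 2nd hardware thread, relative to 1 thread.

    utilization: fraction of issue slots one thread fills alone (0 < u <= 1)
    overhead:    fraction of all slots lost to sharing and synchronization
                 (a made-up knob, not a measured figure)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    # Best case: the second thread fills every idle slot, minus the overhead.
    return (1.0 - overhead) / utilization
```

For a core that is half idle, `smt_throughput_ratio(0.5, 0.05)` comes out around 1.9, while for a core that is already 95% busy, `smt_throughput_ratio(0.95, 0.10)` drops below 1.0, meaning the second thread costs more than it gives back. Which of those two cases a Core2 is closer to is exactly what is being argued about here.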
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
I'm not saying HT will bring no benefits at all, but in many cases, thread synchronization may hurt more than help in such an efficient architecture. If HT is coming back into the game, it means architecture squeezing; Intel may not have anything architecturally new to bring and is planning to add HT peanuts.
 

Guest

Guest
I thought LaGrande was not for DRM after all. It could still be used for that, but the main reasoning behind it was to isolate the hardware from the software with another layer to make it more secure.

That being said, if content providers can use LaGrande for DRM purposes, they will for sure. Which will indeed be a sad day!

@m25, you make some valid points, but HT as in IBM's chips seems to work fairly well, and they're not 31-stage chips. Also, someone mentioned the fact that Core2 can retire 4 to 5 instructions/clock compared to 2 for the P4. I could see HT coming in handy.

Anyway, HT doesn't take that much space on the die. Since Intel has enough cache and good manufacturing, it might be really cheap to implement and might give a few % boost. I guess we will see pretty soon!
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
They're trying to keep it low-profile, but LaGrande is all about DRM.

It's complicated, but in a nutshell the OS determines who has access to secure content, such as private keys used for decoding. Because the decoding happens on the chip-level, it's never actually available to the OS to 'snoop'.

The reason they delayed this from coming out with Presler and Conroe is because they hacked it before it came out. Although the processor decodes the data, the key is stored in the TPM module, outside of the processor. A soldering iron, some home-grown circuits, and a few days of free time, and they were able to intercept the key going between the TPM module and the processor. (If I remember correctly. I can't seem to find the article).

Anyway, it's s'posed to provide security to prevent things like viruses and the like from infecting your computer, but I think it'll do more harm than good. Once viruses get into your secure rings, you're f*cked because YOU don't have access to those rings (only the OS). At least now when I get a virus I can spend 30 minutes cleaning it off. With TPM, I won't have access.

In a nutshell, they're trying to downplay it. The DRM capability is there, and it will be utilized. Apple is currently refusing to support TPM, even though they 'went Intel'. I tend to think TPM will only succeed in enterprise environments. Even then... what a pain in the a$$.
 

jt001

Distinguished
Dec 31, 2006
449
0
18,780
I always thought of HT as a marketing gimmick; it never made any noticeable difference in anything I did with my computer, but that's just me.

I'd have to agree with m25: the architecture is already efficient on Core 2. As far as I can see, HT wasn't a "feature"; it was a fix for a bad (inefficient) design. While it may make a small difference, it won't be as beneficial as it was on the P4, and since I never noticed any difference anyway, I'm not too excited.
 

gallag

Distinguished
May 3, 2006
127
0
18,680
I strongly doubt it.

Apps have to be multithreaded anyway to utilize dual and quad core CPUs fully - and HT gives real, measurable performance gains in multithreaded apps, so why not?
Because the real, measurable performance gains you are talking about were all measured with HT applied to the NetBurst architecture. Those hot chips, little more than frequency generators, really got a boost from HT because it relieved some of the inefficiency of a 31-stage pipeline. But bear in mind that a Core 2 core is almost 2 times more efficient than a P4 core and 26% more efficient than a K8, so I am really skeptical about what HT is going to improve in a 45nm Core2.
HT can only improve an inefficient architecture, but a CPU core is (at least nowadays) designed for maximum efficiency, and when such efficiency is mostly achieved, HT does nothing but split and re-merge data streams before and after processing. An architecture that fully utilizes one core leaves little or no bandwidth for a second, virtual one, and the synchronization penalty between the two tends to outweigh the actual boost.

I thought it didn't have anything to do with pipe length, and that the 4-wide execution core would help HT.
 

zenmaster

Splendid
Feb 21, 2006
3,867
0
22,790
No, Pipe Length is the key factor here.

That is the primary reason why AMD never introduced HT.

The shorter pipe-line meant that HT would have helped much less in the cases where it does help. Add to that the inherent overhead it would create, and it becomes far less desirable in CPUs with a shorter pipe-line like the C2Duo or AMD's chips.
 
Which is why AMD chips are far superior to Intel at this time. It's not about clock speed, nor about synthetic benchmarks; it's about real world performance with low latency pipelines.


C2D is the biggest gimmick ever.



/filling in for Baron.
 

Whizzard9992

Distinguished
Jan 18, 2006
1,076
0
19,280
lol.

C2D is all smoke and mirrors :p

Although I believe the P4's long pipeline made it a good candidate for HT, I don't believe you can say that a shorter pipeline won't benefit from it. There are a LOT of processors out there that use HT (it's just not called HT), and they've been in the server sector for many years.

A good example is the triple-core processor with HT in the Xbox 360. The PowerPC chips generally have shorter pipelines (not counting the G5). There's definitely a benefit to virtual threads on the hardware level. I think it's more the added complexity HT poses at the hardware and software level that makes it a questionable feature.
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
@m25, you make some valid points, but HT as in IBM's chips seems to work fairly well, and they're not 31-stage chips. Also, someone mentioned the fact that Core2 can retire 4 to 5 instructions/clock compared to 2 for the P4. I could see HT coming in handy.
Anyway, HT doesn't take that much space on the die. Since Intel has enough cache and good manufacturing, it might be really cheap to implement and might give a few % boost. I guess we will see pretty soon!
I know there's no problem for Intel to reintroduce HT, not least because it's an almost 4-year-old (pretty ancient by today's standards) technology. The point is that it's neither about instructions/clock nor transistor count. The CPU has the ability to pair different instructions within the same cycle and does not need HT to execute them simultaneously; that's why a single-core Athlon 64 is as responsive as an HT-enabled P4.
Even if Core2 can work on 4-5 instructions in one cycle, it does a pretty good job of it and in most cases does not leave space for another virtual thread, and when the benefit of such a thread is so small, it comes very close to the penalty it incurs.
20-stage Northwood P4 had a maximum boost of 18-20% from HT
31-stage Prescott P4 gets as much as 30%
Core2 is clock/clock about 90% more powerful than a Prescott P4
The key concept here is that HT does not work the miracle of adding computing power; it just fills in some of the many slots that go to waste in these pipelines. Had AMD stood to benefit from it, they'd just have re-branded HT like Intel did with AMD64 and used it to push their own chips, but they just don't need it. Do you think they haven't made any experiments or calculations?!
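
Putting the three figures above side by side is simple arithmetic. The sketch below takes the numbers exactly as claimed in this post (they are not independently verified benchmarks) and normalizes a Prescott without HT to 1.0 per clock; the constant and function names are only for illustration.

```python
# Both inputs are the figures claimed in the post, not measurements.
HT_GAIN_PRESCOTT = 0.30    # "as much as 30%" from HT on a 31-stage Prescott
CORE2_VS_PRESCOTT = 1.90   # "about 90% more powerful", clock for clock


def core2_lead_over_prescott_ht():
    """Per-clock lead of a Core2 without HT over a Prescott with HT at its best."""
    prescott_with_ht = 1.0 * (1.0 + HT_GAIN_PRESCOTT)
    return CORE2_VS_PRESCOTT / prescott_with_ht - 1.0


print(f"Core2 (no HT) vs Prescott + HT: {core2_lead_over_prescott_ht():+.0%} per clock")
```

On those claimed figures, a Core2 without HT would still be roughly 46% ahead of a Prescott running HT at its best case, clock for clock. That says nothing by itself about how much HT would add on top of a Core2; it only shows how much of the gap HT closed on the P4.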