P4 150% FASTER THAN A T-BIRD!!!

Actually, if you were smart you wouldn't have said that. The P4 is crap and can't compare very well even to its predecessor, the Pentium III. It's not considered the best CPU.

On the other hand, the Slot A Athlon compared extremely well to the P3. Every reviewing magazine, etc. considered it to be superior. Just because a new technology came out about a year after its original release doesn't make it a "trick".

---------
I am the first and only one with a 16MB GeForce2 GTS graphics card! :smile:
 
I know you are an Intel spin doctor. For your info, I was an exclusive Intel user until the vapor release of the PIII (with a 5-year-old core, no less) plus the Rambus fiasco. Now I have no loyalty to Intel or anyone else. Just facts, no BS.

**Spin all you want, but we the paying consumers will have the final word**
 
ok, wusy sorry about the comment I've made, sorry hope I didn't make you cry or nothing, lol!

Btw, I see CPUs differently than you, i.e., you would smash it, but why? It did its job at the time, right? It's just that the software became demanding and programmers became lazy.

I don't know I just don't like cheap sh!t, but that's just me.

"AMD/VIA...you <i>still</i> are the weakest link, good bye!"
 
you guys all need to learn the joys of sex.
<b>GET OFF THE COMPUTER AND GET LAID!!
AMD DOESN'T GET YOU GIRLS! NEITHER DOES INTEL!
GET A LIFE!!</b>

----------------------
why, oh WHY, is the world run by morons?
 
I don't know where everyone got the idea that I don't overclock??

The "AMD Tbird 1.33" part in my signature refers to the original clockspeed of the CPU.


Unfortunately it seems as if I got a "bad" CPU. It will "only" do 1636 MHz (12x136); after that the Cache1 can't keep up.

The cpu temps:
Idle - 31C
Full load - 37C

Perhaps raising the VIO/VIO1 voltages could help, but I doubt it...

TBIRD 1.33@1.63
Asus A7M266
Vapochill
Elsa Geforce3
Apacer 256MB DDR266
Seagate X15
 
thanks for the info. it's nice to have all kinds of code in a single app.
anyway, this technique, however good, adds to the code, and anyway uses "non-optimised" code for other processors. then what's the use of it? just that the app is SSE2 compatible!
a handy feature, that, but it does not address the SSE2 compatibility issue.

what you really need is an SSE2 emulator, just like we had the em87 long back, the 8087 emulator for AutoCAD 10 and earlier. with a software emulator for SSE2, we won't get much of a performance improvement, but our older processors would be able to run apps for P4.

girish

die-hard fans don't have heat-sinks!
 
hi there!

dear, pls read the posts carefully before you reply. how many times do people have to tell you that?

who told you I work with 2 bit DOS code? I thought you knew even DOS works with 16 bits. and you are right there - DOS is the most compatible platform, however backward it is. but who told you to work on DOS???

btw the world <b><i>is</i></b> moving to SSE2 (just not yet) and to 64 bits, but in what direction? IA64 or x86-64? I won't be surprised by your reply to this. 64 bits seems too much for you, and I don't think you know what SSE2 stands for... it's Streaming SIMD Extensions 2. answer this next :wink: .

awaiting your reply...
girish


die-hard fans don't have heat-sinks!
 
did you try upping the voltages? If you put the core voltage to 1.85 you will probably get more.

---------
I am the first and only one with a 16MB GeForce2 GTS graphics card! :smile:
 
"however good, adds to the code"

It adds instructions to the executable.

"uses 'non-optimised' code for other processors"

Not so. It uses the best optimizations for whatever processor you have. If you have a Pentium it will use the newer instructions introduced in the Pentium. If you have a Pentium with MMX support, it will use the above plus MMX instructions. If you have a Pentium Pro, it will use the newer instructions introduced in this processor. If you have a Pentium 2, it will use all of the above, MMX and the new instructions available starting with the Pentium Pro. If you have a Pentium 3, it will use all of the above and additionally use the SSE instructions. If you have a Pentium 4, it will use all of the above plus SSE2. It would not fall back to older 386-compatible code unless it did not detect even a Pentium class CPU.
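A minimal sketch of the tiered selection described above, written as a toy model in Python rather than real machine code. The feature names, the `VARIANTS` table and `pick_variant` are all hypothetical, made up for illustration; a real compiler would emit this kind of dispatch itself.

```python
# Toy model: each code variant lists the CPU features it needs.
# Ordered from most to least demanding; the last entry is the plain
# 386-compatible fallback. All names here are illustrative assumptions.
VARIANTS = [
    ("pentium4",    {"pentium", "mmx", "ppro", "sse", "sse2"}),
    ("pentium3",    {"pentium", "mmx", "ppro", "sse"}),
    ("pentium2",    {"pentium", "mmx", "ppro"}),
    ("pentium_pro", {"pentium", "ppro"}),
    ("pentium_mmx", {"pentium", "mmx"}),
    ("pentium",     {"pentium"}),
    ("i386",        set()),
]

def pick_variant(cpu_features):
    """Return the name of the first variant whose requirements the CPU meets."""
    available = set(cpu_features)
    for name, required in VARIANTS:
        if required <= available:   # every required feature is present
            return name
    return "i386"
```

With this table, a CPU reporting `{"pentium", "mmx", "ppro", "sse"}` gets the `pentium3` variant, and a CPU reporting nothing at all falls back to `i386`.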

"with a software emulator for SSE2, we won't get much of a performance improvement, but our older processors would be able to run apps for P4."

This is not required. Software emulation of SSE2 instructions would be no better than the non-SSE2 code produced in the executable. It is the existence of the instructions in hardware that provides the performance benefit.

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
hi,

"It adds instructions to the executable"

that's what I said: it adds to the code, so the code size obviously increases.

"Not so. It uses the best optimizations for whatever processor you have.
1. If you have a Pentium it will use the newer instructions introduced in the Pentium.
2. If you have a Pentium with MMX support, it will use the above plus MMX instructions.
3. If you have a Pentium Pro, it will use the newer instructions introduced in this processor.
4. If you have a Pentium 2, it will use all of the above, MMX and the new instructions available starting with the Pentium Pro.
5. If you have a Pentium 3, it will use all of the above and additionally use the SSE instructions.
6. If you have a Pentium 4, it will use all of the above plus SSE2.
7. It would not fall back to older 386-compatible code unless it did not detect even a Pentium class CPU."

these are <b>seven</b> sets of code that an app must carry with it. i don't think it is practical. no compiler should generate more than 3 (<b>three</b>) branches of the same code. you might have different sets of DLLs for different processors (just like you have DLLs for different languages),
but then this adds to the application size, adds to the cost, and you cannot write a generalised DLL for all kinds of applications to use.

and this makes the software costly and incompatible with older processors, or slower on newer processors.

"Software emulation of SSE2 instructions would be no better than the non-SSE2 code produced in the executable. It is the existence of the instructions in hardware that provides the performance benefit"

that's what i said... there would not be much performance increase. by saying "not be much", I state that there will be *some* performance increase over non-SSE2 code. This is because the SSE2 emulator will be written only once, so it should be optimised for the platform. you may write a version each for 486, Pentium, P2, P3 etc, but you will write it only once and install it only once. This will make the system SSE2 capable, and this solution <b>must</b> produce better results than generating non-SSE2 code every time an app is compiled. it would also be compatible with P4 applications, and that will reduce the development cost (and the upgrade cost of your hardware) and ultimately the cost of the software.

I did not mean any performance benefit above, I just intended to state the compatibility and flexibility the emulator will provide.

girish

And the views stated herein are my own personal views, not made by prejudiced opinions, but just practical thinking.


die-hard fans don't have heat-sinks!
 
Yes I thought so, but I'm already at VCORE 1.85... Perhaps if I did the mod for the mb to be able to raise it further...

I probably won't. I'll wait for the AMD 4 and see what it can do instead...


Thanks for the input though.


/J

TBIRD 1.33@1.63
Asus A7M266
Vapochill
Elsa Geforce3
Apacer 256MB DDR266
Seagate X15
 
"these are seven sets of code that an app must carry with it. i don't think it is practical. no compiler should generate more than 3 (three) branches of the same code."

The whole point of MMX/SSE/SSE2 is that a single instruction operates on a large list (vector) of data. A single additional instruction is not going to increase the size of your executable significantly. We're looking at a couple bytes at most per instance. I've already built some applications targeting in one instance all of the optional instruction sets I last mentioned, and in another instance just one specific CPU. The difference in executable size was less than 256 bytes.

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
does that mean that you write an app to use all possible instruction sets, and keep all the branches in the single executable? well, that would necessitate checking for the processor every time you need to use an MMX/SSE/SSE2 instruction. that's time consuming, but considering it would bring much more performance improvement than that little jump costs, it seems to be nice.

i agree all MMX/SSE/SSE2 are SIMD class instructions, but does it suffice to put both types of code (reducing 7 to 2) in a single executable? actually, i guess you will need different DLLs so that you could write branchless code and take maximum benefit of both instruction sets. if you have a sample, pls mail it to me.

i guess it will depend on the application whether to use inline code branching or different DLLs, and again what types of processors to support: all 7 or just 2 of them, SSE2 or non-SSE2.

still I think the emulator approach will make a lot of work easier.

girish

die-hard fans don't have heat-sinks!
 
"that's time consuming"

It's at most one memory access to a variable that stores the CPU type. That's not really a big deal compared to the huge vector of data that's about to be operated on.
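A rough sketch of why that check is cheap, again as a Python toy model. `detect_cpu` is a hypothetical stand-in for a real CPU query and its `"sse2"` result is assumed; the point is only that detection runs once, and each call reads one stored variable before working through the whole vector.

```python
# Counts how often detection runs; it should stay at 1.
DETECT_CALLS = 0

def detect_cpu():
    """Hypothetical stand-in for a real CPUID query."""
    global DETECT_CALLS
    DETECT_CALLS += 1
    return "sse2"   # assumed result, for illustration only

CPU_TYPE = detect_cpu()   # detected once at startup and stored

def add_vectors(a, b):
    """One read of CPU_TYPE per call, then the whole vector is processed."""
    if CPU_TYPE == "sse2":
        # Model of the wide path: handle the data four elements at a time.
        out = []
        for i in range(0, len(a), 4):
            out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
        return out
    # Generic path: one element at a time.
    return [x + y for x, y in zip(a, b)]
```

However long the input vectors are, the branch on `CPU_TYPE` is taken once per call, so its cost is spread over every element processed.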

"does it suffice to put both types of code (reducing 7 to 2) in a single executable"

It seems to work perfectly fine. There's only a tiny increase to the size of the code (less than 256 bytes for a 120K executable.)
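As a quick check on the scale of that figure, here is the arithmetic, assuming "120K" means 120 KiB and treating 256 bytes as the upper bound:

```python
extra_bytes = 256           # upper bound on the size increase quoted above
exe_bytes = 120 * 1024      # assumption: "120K" read as 120 KiB
overhead_pct = 100 * extra_bytes / exe_bytes
print(f"{overhead_pct:.2f}%")   # prints 0.21%
```

So the extra code paths cost roughly a fifth of one percent of the executable in this case.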

"I think the emulator approach will make a lot of work easier"

If you'd like to write an emulator I think that would be great. It would have to trap the invalid instruction exception that would get thrown when SSE/SSE2 instructions were encountered. It would then handle the exception by executing the instruction in software. Application execution would then continue after that instruction. Assuming the catching of processor exceptions is allowed by Windows, it would require a device driver. For Linux it would require a kernel patch.

This could work rather well. In fact, it could be installed on all machines regardless of what instructions they supported. If the instructions are supported in hardware, the exception will not be thrown, and the emulator will not be used. Thus, the emulator will not have to bother detecting your CPU type. Just make sure the emulator implements all the new instructions fully. The hardware will decide which ones it cannot handle by throwing the exceptions to your emulator.
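The trap-and-emulate flow described above can be sketched as a toy model in Python. Nothing here is a real driver or kernel interface: `IllegalInstruction`, `hardware_execute`, the `"padd"` opcode and the `EMULATED` table are all hypothetical names chosen for illustration.

```python
class IllegalInstruction(Exception):
    """Models the invalid instruction exception raised by the hardware."""

def hardware_execute(op, a, b, supported):
    """Pretend CPU: runs only the opcodes in `supported`, otherwise traps."""
    if op not in supported:
        raise IllegalInstruction(op)
    if op == "add":
        return a + b
    if op == "padd":   # packed add: element-wise over two tuples
        return tuple(x + y for x, y in zip(a, b))
    raise IllegalInstruction(op)

# The emulator implements the new instructions and never checks the CPU type.
EMULATED = {
    "padd": lambda a, b: tuple(x + y for x, y in zip(a, b)),
}

def run(program, supported):
    """Execute (op, a, b) tuples; emulate only what the hardware rejects."""
    results, trapped = [], 0
    for op, a, b in program:
        try:
            results.append(hardware_execute(op, a, b, supported))
        except IllegalInstruction:
            trapped += 1
            results.append(EMULATED[op](a, b))   # handle it, then carry on
    return results, trapped
```

Running the same program with `supported={"add"}` traps once and emulates the packed add; with `supported={"add", "padd"}` nothing traps and the emulator is never used, which is the property that lets it be installed on every machine.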

-Raystonn


= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
yes, that's time consuming: one access to a variable in memory and a jump around the non-SSE code. but this penalty is nowhere near the performance benefit SSE2 will offer. non-SSE2 code should fall through since it will suffer the performance hit. so that comparison+jump is well worth it.

what is this difference of 256 bytes for? does the application use SSE2 all the time, or just a few blocks of it? if it does use it all the time, then it would be better to write a separate DLL or a separate code block for it.

as for the emulator, it would surely make life easier for software developers, and it would be a matter of writing a VxD for Windows and a kernel patch for Linux. in fact, if I remember correctly, you can even program the processor to branch off to specific code if it encounters certain instructions. if you get the MMX/SSE/SSE2 indicator byte, then the whole exception idea will not be required. in fact it is not a decent way to get such things to work.

that would make the emulator a good idea, worth a thought...

btw if you have a sample of that code, pls mail it to me. no need to show the whole app, just the code block how you used the SSE/non-SSE combo.

girish

die-hard fans don't have heat-sinks!
 
I'm not sure you can write VxD's for WinNT. I always considered them a dark remnant of the days of Win9x.

As for Linux, I believe the kernel already traps the Illegal Instruction exception via the numeric coprocessor emulation--though I'm not 100% certain that kind of numeric coprocessor fault throws the same exception. It would probably be a simple thing to tack the necessary code onto the numeric coprocessor emulation code.

Kelledin

bash-2.04$ kill -9 1
init: Just what do you think you're doing, Dave?
 
in fact VxD is the newer format of device drivers. win31 used .drv, win9x uses VxD as well as legacy .386 and .drv drivers (i guess they are loaded for 16 bit compatibility), and you do need VxDs for WinNT. must investigate the matter, I've not written any driver for WinNT yet.

all you need to write is an exception handler, which is pretty easy in Windows (both 9x and NT) since it has a standard interface provided by the processor itself. the processor passes a pointer to the offending instruction, and then you can get its opcode, emulate it in native code, write back the results to the appropriate registers and return.

well, its not as easy as it seems...

girish

die-hard fans don't have heat-sinks!
 
"does the application use SSE2 all the time, or just a few blocks of it"

A few dozen loops were vectorized. There usually is not a whole lot of data to be processed by any single application. The data that is there was vectorized.

"btw if you have a sample of that code"

Just grab any piece of code and recompile it in VC++ with the Intel C/C++ Compiler plugin enabled. Many loops will be automatically vectorized for you. There's no need to show any specific code fragments, because you don't need any special instructions or functions to use it. It's automatic.

"writing a VxD"

VxD's are deprecated. They only exist in Win9x, which is obsolete.

"you can even program the processor to branch off to specific code if it encounters certain instructions"

If the driver model supports this, it would only be for instructions not supported by the current hardware. It would be implemented by the OS catching the illegal instruction exception, checking its table for instructions, finding your entry for that instruction, and branching off to your code. This would in essence be the same thing I described, but the OS would be doing part of the work for you.

"the whole exception idea will not be required. in fact it is not a decent way to get such things to work."

Actually, trapping that illegal instruction exception is the only real way to do it. The only other possibility would be running your software under a software emulator (similar to how Java is run.) This would be extremely slow. We're talking Pentium 100 speeds here. Allowing the processor to run the application as normal and simply catching the illegal instruction exception would be the proper way to do it. This is how all floating point emulators work.

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
it doesn't really matter whether the processor crunches more numbers with SSE2 or with true FPU horsepower, as long as all of the numbers get crunched in the same amount of time.
in all honesty, SSE2 isn't perfect, and I have my doubts as to whether it will be 100% useful. I'm sure there are some applications it won't be used in, some where it can't be utilized, and even more that it would help tremendously but that won't use it.
my cousin considers adding RAM to a computer to make it faster cheating, he says you should leave it as you buy it... ?
methinks he's been dropped on his head.

----------------------
why, oh WHY, is the world run by morons?
 
"we think the CPU is powerful while it's actually not, it's just supported by a extension just like the Pentium IV, based on extension to try make it powerful while beneath it is just EMPTY POWER!!!"

I don't really understand what you're saying here. CPUs that support SSE/SSE2 would do the work in hardware in a single instruction, extremely fast. CPUs that do not support SSE/SSE2 would implement the functionality in software. It would be much slower than the in-hardware support provided by such CPUs as the Pentium 4, but shouldn't be any slower than just building the application not to use SSE/SSE2 in the first place.

At any rate, none of this is really needed. It all gets built into today's applications for compatibility with all processors anyway. The actual code-size saving from building an emulator is trivial.

-Raystonn

= The views stated herein are my personal views, and not necessarily the views of my employer. =
 
I sort of agree. It's kind of like having a new car being pulled by an F-16 and saying "WOW THAT CAR IS SO COOL IT CAN GO MACH 1!"

But we have to wait for the roads to be upgraded and optimized for F-16s as well as cars

---------
I am the first and only one with a 16MB GeForce2 GTS graphics card! :smile:
 
no, it's more like an automotive hobbyist who wants to go fast and get a nice adrenalin rush.
you can either spend lots of cash and run a badass engine fast, or you can go cheap and give it a shot of nitrous.
either way the car runs the same time, and the guy gets a thrill and goes fast.
although, unlike nitrous, you won't run out of SSE2 or MMX, you will find old software that isn't optimized for it.
If the P4 can't run the old software at least as well as my 700 MHz laptop, then I'll have a problem. there are a few old programs I use, and most of them are still overpowered by my 700 MHz PIII, so as long as the P4 is in that ballpark...
If I try to play any game or run any software on a P4 and it runs slow enough to notice or causes problems, then yeah, the P4 needs more power. but even with the crippled Willamette, I don't think that will happen; it just won't run it as fast as a slower A4.

----------------------
Independent thought is good.
It won't hurt for long.