ftp://download.intel.com/design/Pentium4/manuals/24896610.pdf
Page 47: cache latency
Page 430: instruction latency
The good news is that SSE1/2/3 instruction latency has not increased. At worst, 1 cycle has been added under SSE1/2/3, nothing that an extra 100 MHz cannot undo. The hit is on IA-32 ALU instruction latency: most instructions (mov, add, sub...) have doubled in latency. They were at 0.5 cycle (double-pumped ALU); now they take a full 1 cycle.

Cache latency has skyrocketed. It is now the same as an Athlon's, so where is the advantage of having only 16 KB?

What they also don't say is that Prescott has 31 stages after the trace cache, and they added about 8 more before it. I wonder how Intel made that work: with almost no logic transistors in each stage, it becomes really hard to make the stages equal. It is really important that the stages are equal in delay, because the slowest stage sets the clock speed. If there is one really slow stage and one fast one, the slow one sets the clock speed for the whole chip. Now I know why they went from 128 to 256 registers: it was not to have 2 sets for each thread, but to cover the extra stages.

Let's hope Intel has a few things in this chip that are disabled and will offer a real gain in performance. It's just wild speculation, but cache latency is so much higher that they could have an IA-64 execution block and a shared L1/L2 cache.
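The point about equal stage delays can be made concrete with a little arithmetic. Below is a minimal Python sketch under stated assumptions: the stage delays are made-up numbers in picoseconds (not Intel figures), and latch/flip-flop overhead is ignored, so the clock period is simply the delay of the slowest stage. A second hypothetical helper, `latency_ns`, converts a latency in cycles to nanoseconds, which is one way to compare a 0.5-cycle and a 1-cycle ALU op at the same clock.

```python
# Hypothetical illustration: all stage delays are invented example values,
# not Prescott measurements. Latch overhead is ignored for simplicity.

def max_clock_ghz(stage_delays_ps):
    """Highest clock the pipeline can run at: each cycle must be long enough
    for the slowest stage, so the period equals the maximum stage delay."""
    period_ps = max(stage_delays_ps)
    return 1000.0 / period_ps  # 1000 ps per ns, so 1/ns -> GHz


def latency_ns(cycles, clock_ghz):
    """Convert a latency expressed in clock cycles to nanoseconds."""
    return cycles / clock_ghz


# Same total logic delay (1000 ps) split two different ways.
balanced = [250, 250, 250, 250]    # four equal stages
unbalanced = [150, 350, 200, 300]  # one slow stage dominates

if __name__ == "__main__":
    print(f"{max_clock_ghz(balanced):.2f} GHz")    # prints "4.00 GHz"
    print(f"{max_clock_ghz(unbalanced):.2f} GHz")  # prints "2.86 GHz"
    # A 0.5-cycle vs a 1-cycle ALU op at an assumed 3.0 GHz clock.
    print(f"{latency_ns(0.5, 3.0):.3f} ns vs {latency_ns(1, 3.0):.3f} ns")
    # prints "0.167 ns vs 0.333 ns"
```

Both splits contain the same amount of work, but the unbalanced one runs about 30% slower because its 350 ps stage sets the period for every stage.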
Just to show dad
Edited by juin on 02/04/04 09:49 PM.