Cache Architecture or Overall CPU Architecture?


balzi

Distinguished
Oct 16, 2001
121
0
18,680
In reply to:


You might gain 5% (maybe), yet you have to make your core bigger, as you need 4 times as many piplines to the L1 memory. That would translate into more heat and other penalties.

If I made any mistakes by this post, please correct me.

no no no....
In your fixing frenzy you've re-spelled pipelines to piplines. *add Goons accent* You idiot....

bye

I spilled coffee all over my wife's nighty... ...serves me right for wearing it?!?
 

Flyboy

Distinguished
Dec 31, 2007
737
0
18,980
What about the algorithms used? Could it be that AMD has a much better cache algorithm than Intel?

BTW, single-cycle versus multi-cycle...is that equivalent to RISC vs. CISC? If so what is Intel and AMD's architecture? I heard that it used to be:

AMD = RISC
Intel = CISC

Is this still the case today?

Oh go ye brown-eyed toothless wonder.
 

charliec2uk

Distinguished
Jul 26, 2001
249
0
18,680
Have you tried the 18F*** series of PICs?

The great thing about the 4 MHz clock is that each line of assembler takes 1 µs to execute (branches take two), which is good for calculating delay loops, etc.
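Here is a quick sketch of that delay-loop arithmetic. It assumes a PIC16-style core (one instruction cycle = 4 oscillator clocks) and a hypothetical two-instruction `decfsz`/`goto` countdown loop that costs 3 cycles per pass and 2 cycles on the final pass, when `decfsz` skips the `goto`. The function names are made up for illustration; check the datasheet for your exact part.

```python
def instruction_time_us(fosc_hz: float) -> float:
    """One instruction cycle in microseconds (assumes 4 oscillator clocks per cycle)."""
    return 4 * 1_000_000 / fosc_hz


def loop_cycles(count: int) -> int:
    """Cycles for a decfsz/goto countdown loop starting at `count` (1..255).

    Each non-final pass: decfsz (1 cycle) + goto (2 cycles) = 3 cycles.
    Final pass: decfsz skips the goto and takes 2 cycles.
    """
    if not 1 <= count <= 255:
        raise ValueError("count must fit in one byte (1..255)")
    return 3 * (count - 1) + 2


def count_for_delay(target_us: float, fosc_hz: float) -> int:
    """Loop count whose duration is closest to target_us (loop body only, no setup or call overhead)."""
    cycles = target_us / instruction_time_us(fosc_hz)
    return round((cycles + 1) / 3)  # invert 3 * count - 1 = cycles


print(instruction_time_us(4_000_000))  # 1.0 (µs per instruction at 4 MHz)
n = count_for_delay(200, 4_000_000)
print(n, loop_cycles(n))               # 67 200
```

The loop-setup instructions (loading the counter) add a couple more cycles, which this sketch leaves out.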

By the way, Ray (poet and I don't know it), when you are engineering software for good old INTC, are you working more with assembler, C++, or what?

Never spit into the wind...
 

FatBurger

Illustrious
AMD = RISC
Intel = CISC

Nope, both have always been CISC. All x86 processors are CISC.
Apple's processors, on the other hand, are RISC. That's probably what was supposed to be in place of AMD; someone just got confused when they told you.

<font color=orange>Quarter</font color=orange> <font color=blue>Pounder</font color=blue> <font color=orange>Inside</font color=orange>
 

MeTaLrOcKeR

Distinguished
May 2, 2001
1,515
0
19,780
Like FatBurger said... all x86 CPUs are CISC... BUT if you look IN DEPTH at the AMD K7 architecture, it was designed with a RISC influence... it's been said that the K7 architecture would make a better RISC processor than an x86 processor... but hey, it's still kickin' some serious arse... =)

-MeTaL RoCkEr

My <font color=red>Z28</font color=red> can take your <font color=blue>P4</font color=blue> off the line!
 

AMD_Man

Splendid
Jul 3, 2001
7,376
2
25,780
As far as I know, current AMD and Intel processors handle instructions in a RISC fashion: they receive CISC instructions and convert them into simple RISC-like instructions, which are faster to execute.
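To make the "CISC in, RISC-like out" idea concrete, here is a toy Python sketch. It only understands a two-operand `add` where one operand may be a memory reference like `[x]`, and the micro-op names (`load`, `add`, `store`) and temporary registers (`tmp0`, `tmp1`) are invented for illustration. Real Athlon and Pentium decoders are hardware, and their internal micro-op formats are not what is shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str                # hypothetical micro-op name: "load", "add" or "store"
    dst: str
    srcs: tuple[str, ...]


def is_mem(operand: str) -> bool:
    return operand.startswith("[") and operand.endswith("]")


def decode(instr: str) -> list[MicroOp]:
    """Split a toy x86-style 'add dst, src' into simple RISC-like micro-ops."""
    mnemonic, operands = instr.split(maxsplit=1)
    if mnemonic.lower() != "add":
        raise ValueError(f"toy decoder only handles add, got {mnemonic!r}")
    dst, src = (s.strip() for s in operands.split(","))
    if is_mem(dst) and is_mem(src):
        raise ValueError("x86 does not allow two memory operands")
    uops = []
    if is_mem(src):
        # Memory source: load it into a temporary register first.
        uops.append(MicroOp("load", "tmp0", (src,)))
        src = "tmp0"
    if is_mem(dst):
        # Memory destination: load, add in a register, store back.
        uops.append(MicroOp("load", "tmp1", (dst,)))
        uops.append(MicroOp("add", "tmp1", ("tmp1", src)))
        uops.append(MicroOp("store", dst, ("tmp1",)))
    else:
        uops.append(MicroOp("add", dst, (dst, src)))
    return uops


for uop in decode("add [x], eax"):
    print(uop)
```

One CISC instruction that reads, modifies, and writes memory turns into three simple steps, while a register-to-register add stays a single step.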

AMD technology + Intel technology = Intel/AMD Pentathlon IV; the <b>ULTIMATE</b> PC processor
 
i'll be polite as much as possible here...

wtf are you talking about, different cache algorithms?

here are the cache implementations:
Direct mapping
Set associative (2 way set-associative for example)
and fully associative

That's it as far as cache "algorithms" go. There might be other cache designs in development, but that's it. Look in WCPUID under the CPU details; it tells you what the cache implementation is, either set-associative or fully associative. I haven't seen direct mapping used.

Intel and AMD both use the same three implementations; the last two are used more than direct mapping.
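Here is a minimal Python sketch of how the three organizations differ. It assumes LRU replacement inside each set, which is an assumption for the sketch and not something the post says about Intel or AMD. Direct mapped is just the 1-way case, and fully associative is the case where a single set holds every line. The trace ping-pongs between two addresses exactly one cache size apart, which collide in a direct-mapped cache.

```python
from collections import OrderedDict


class ToyCache:
    """num_lines lines of line_size bytes, grouped into sets of `ways` lines.

    ways == 1          -> direct mapped
    ways == num_lines  -> fully associative
    anything between   -> N-way set associative
    Replacement inside a set is LRU (an assumption for this sketch).
    """

    def __init__(self, num_lines: int, ways: int, line_size: int = 64):
        if num_lines % ways:
            raise ValueError("num_lines must be a multiple of ways")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = num_lines // ways
        # One OrderedDict per set; key order = recency (first key is least recently used).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line_size
        index = block % self.num_sets   # which set the block may live in
        tag = block // self.num_sets    # identifies the block within that set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # hit: mark most recently used
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.popitem(last=False)   # set full: evict least recently used
        lines[tag] = None
        return False


def misses_for(ways: int, trace) -> int:
    cache = ToyCache(num_lines=64, ways=ways)  # 64 lines x 64 B = 4 KB cache
    for addr in trace:
        cache.access(addr)
    return cache.misses


trace = [0, 4096] * 8  # two addresses exactly one cache size apart
for name, ways in [("direct mapped", 1), ("2-way", 2), ("fully associative", 64)]:
    print(name, misses_for(ways, trace))
```

With this trace, the direct-mapped cache misses on all 16 accesses because both addresses fight over the same line. The 2-way and fully associative versions miss only on the first two accesses.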

"BTW, single-cycle versus multi-cycle...is that equivalent to RISC vs. CISC? If so what is Intel and AMD's architecture? I heard that it used to be:"

Absolutely not!

A single-cycle CPU does exactly ONE instruction per clock cycle, no matter how long that instruction takes. You measure the longest instruction, and that's your "cycle". So every instruction takes the same amount of time. Even if an add instruction takes 2 ns to complete and a load-word instruction takes 5 ns (load word is usually the longest instruction), the add still takes 5 ns instead of 2 ns. That's a single-cycle processor. As you can see, it does a horrible job of using the resources available to it.

A multi-cycle processor is closer to what current CPUs are. It breaks each instruction into several short clock cycles, so every instruction uses only as many cycles as it needs. With the same add (2 ns) and load word (5 ns), the add takes about 2 ns and the load word about 5 ns, instead of both taking 5 ns. As you can see, that uses the resources more efficiently. It also improves performance depending on the code, e.g. if it uses lots of loads or what have you. Adding pipelining makes it use its resources even more efficiently, since several instructions can then be in progress at once. Notice the modular design of these implementations.
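Here is the arithmetic behind that comparison as a small Python sketch. The 2 ns add and 5 ns load word come from the post. The 1 ns multi-cycle clock and the three-adds-plus-one-load program are assumptions picked for illustration, and real multi-cycle designs add some per-cycle overhead that this ignores.

```python
from __future__ import annotations

import math

# Per-instruction latencies from the post, in nanoseconds.
LATENCY_NS = {"add": 2.0, "lw": 5.0}


def single_cycle_time(program: list[str]) -> float:
    """Every instruction takes one clock, and the clock must fit the slowest instruction."""
    cycle_ns = max(LATENCY_NS.values())
    return cycle_ns * len(program)


def multi_cycle_time(program: list[str], cycle_ns: float = 1.0) -> float:
    """Each instruction takes only as many short clocks as it needs."""
    cycles = sum(math.ceil(LATENCY_NS[op] / cycle_ns) for op in program)
    return cycles * cycle_ns


program = ["add", "add", "add", "lw"]
print(single_cycle_time(program))  # 20.0
print(multi_cycle_time(program))   # 11.0
```

The gap grows with the share of short instructions: the single-cycle design charges every add the full 5 ns.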

If you're going "huh", I can't explain it any simpler than that. In fact, I probably left out details, some of them important. Just go with it, and now you kind of know what it is. If you want to know more, you'll have to take a class: computer architecture, computer design, or computer organization. They change the names oh so frequently.

As far as CISC and RISC go: CISC was the old Pentiums and 486s. The CPUs (Intel's and AMD's) NOW are actually a combination of both! The CPU breaks each complex instruction down into simpler, reduced instructions and computes those, hence "reduced instruction set computer". So to say the Pentium 4 is a CISC or a RISC processor is wrong, since it is both.

<A HREF="http://www.anandtech.com/mysystemrig.html?id=9933" target="_new"> My Rig </A>
 
Hey, they have voice recognition (cell phones), they have LCD touch screens and software (PDAs); why not mix them in with medicine? Radars. I mean, think about it! It'll be better than Star Trek! Voice-recognition tricorders, lol. Tricorders that can record a person's condition in real time. See through walls with software and optical sensors, and do some nifty things with the software for it. LCD touchpads for notes, stored in memory, which they already have!

Do you have any idea how exciting this is? All because of the PDA's evolution into the Pocket PC.

<A HREF="http://www.anandtech.com/mysystemrig.html?id=9933" target="_new"> My Rig </A>
 
Guest
Sure, you got good amounts of work done per cycle, but can that baby run Minesweeper?

PICs are fun.
 
Guest
Oops, double post. How in the world did that happen?
<P ID="edit"><FONT SIZE=-1><EM>Edited by knewton on 01/14/02 01:19 PM.</EM></FONT></P>
 

MeTaLrOcKeR

Distinguished
May 2, 2001
1,515
0
19,780
Dude... that would be SWEET... LoL...
Tricorders that see through walls... hahahahahahahaha

What about clothes?????(GIRLS ONLY!!!!!!!!!!!!!) =)~

-MeTaL RoCkEr

My <font color=red>Z28</font color=red> can take your <font color=blue>P4</font color=blue> off the line!
 

Flyboy

Distinguished
Dec 31, 2007
737
0
18,980
i'll be polite as much as possible here...
O.k.
Ummm...could ya' try a little harder?

here are the cache implementations:
Direct mapping
Set associative (2 way set-associative for example)
and fully associative
Exactly. So if I were to compare the CPUID results for the Intel and AMD processors, would they be identical in terms of cache implementation? Hey wait, I have a Celeron and an AMD system. Here's what I found:


AMD 500 MHz:
[ WCPUID Ver.2.7c (c) 1996-2000 By H.Oda! ]
(Processor 1)
<< Cache Info. >>

[L1 Instruction TLB]
2-Mbyte/4-Mbyte Pages, fully associative, 8 entries
4-Kbyte Pages, fully associative, 16 entries

[L1 Data TLB]
2-Mbyte/4-Mbyte Pages, 4-way set associative, 8 entries
4-Kbyte Pages, fully associative, 24 entries

[L1 Instruction cache]
64K byte cache size, 2-way set associative, 64 byte line size, 1 line par tag

[L1 Data cache]
64K byte cache size, 2-way set associative, 64 byte line size, 1 line par tag

[L2 Unified cache]
512K byte cache size, 2-way set associative, 64 byte line size, 1 line par tag

[L2 Instruction/Unified TLB]
2-Mbyte/4-Mbyte Pages, Off, 0 entries
4-Kbyte Pages, 4-way set associative, 256 entries

[L2 Data TLB]
2-Mbyte/4-Mbyte Pages, Off, 0 entries
4-Kbyte Pages, 4-way set associative, 256 entries


Intel 750 MHz:
[ WCPUID Ver.2.7c (c) 1996-2000 By H.Oda! ]
(Processor 1)
<< Cache Info. >>

[L1 Instruction TLB]
4K byte pages, 4-way set associative, 32 entries
4M byte pages, fully associative, 2 entries

[L1 Data TLB]
4K byte pages, 4-way set associative, 64 entries
4M byte pages, 4-way set associative, 8 entries

[L1 Instruction cache]
16K byte cache size, 4-way set associative, 32 byte line size

[L1 Data cache]
16K byte cache size, 4-way set associative, 32 byte line size

[L2 Unified cache]
256K byte cache size, 8-way set associative, 32 byte cache line

You see what I mean? The caching schemes are indeed different, correct? For example, the Intel uses a 4-way set-associative L1 instruction cache, while the AMD uses a 2-way set-associative one. And notice that the Celeron doesn't appear to have an L2 translation lookaside buffer.
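Those WCPUID numbers can be turned into actual cache geometry. This Python sketch computes the number of sets and how an address splits into tag, index, and offset bits. It assumes power-of-two sizes and a 32-bit physical address; that width is an assumption (P6-class chips can use 36-bit physical addresses with PAE, which would only change the tag width).

```python
def geometry(size_bytes: int, ways: int, line_bytes: int, addr_bits: int = 32):
    """Return (sets, tag_bits, index_bits, offset_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    for value in (sets, line_bytes):
        if value & (value - 1):
            raise ValueError("sketch assumes power-of-two sizes")
    offset_bits = line_bytes.bit_length() - 1   # selects the byte inside a line
    index_bits = sets.bit_length() - 1          # selects the set
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, tag_bits, index_bits, offset_bits


# (size, ways, line size) taken from the WCPUID listings in this post
caches = {
    "AMD L1 data": (64 * 1024, 2, 64),
    "AMD L2": (512 * 1024, 2, 64),
    "Intel L1 data": (16 * 1024, 4, 32),
    "Intel L2": (256 * 1024, 8, 32),
}
for name, (size, ways, line) in caches.items():
    print(name, geometry(size, ways, line))
```

Higher associativity means fewer sets for the same size, so fewer index bits and more tag bits to compare on each lookup.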

Secondly, perhaps the algorithm that decides which data from RAM gets cached (and which gets replaced) is significantly different between the two. I don't know, just a thought. Is that OK with you?

if you're going "huh", I can't explain it any simpler than that.
How about being a little less condescending? If it frustrates you to answer my post, then don't respond. You make it look like you're trying your hardest to look smart. Let me know if you want me to respond with "duh" in order to boost your self-ego... I'll be happy to do so because I'm a nice guy and I want you to feel good about yourself.

Oh go ye brown-eyed toothless wonder.
 

tlaughrey

Distinguished
May 9, 2001
581
0
18,980
Let me know if you want me to respond with "duh" in order to boost your self-ego...I'll be happy to do so because I'm a nice guy and I want you to feel good about yourself.

LOL ... the technical stuff is a little above me, but I like the humor.

<i>Real knowledge is to know the extent of one's ignorance.</i>
 

LoveGuRu

Distinguished
Sep 21, 2001
612
0
18,980
i said:

---
What you're saying is compiling code for a specific CPU?
So when you run it on another processor, there would be a need for a hybrid component, a higher-level language, or just an emulation layer, which would translate into a big-ass loss of CPU time on architectures other than Intel's (maybe even with the P3 > P4 change?).
That's not a feasible solution, although Sun is doing it with its servers: since they run Unix (Solaris) on RISC processors (or not... someone correct me, please), they optimized the server-based applications for their platform for maximum performance.

Either way, no one would compile two versions of the same application. SSE2 (and similar) optimization is acceptable, since it doesn't require a second version for other platforms, just a few scripts.
---

Anyone gonna answer??????
?????????????????????????ARG!

<font color=green>
*******
*K.I.S.S*
*(k)eep (I)t (S)imple (S)tupid*
*******
</font color=green>
 
I did answer your post.

You people have way too much anger built up inside you to make believe I'm being mean. Reread it, please, and this time read it thinking "hey, this guy is giving me free information about computers". Remember, college isn't free!
In fact, this will be the last time I share information, as you people like to be ignorant and like to think you know it all. Clearly you do not if you think CISC and RISC are like single cycle and multi cycle. You do not appreciate knowledge, from what I can see! Continue living your life in ignorance.

If you had actually read it, you would see I did answer your question. You asked whether single cycle and multi cycle are like RISC and CISC. I answered no. If you read it, you would find that it is completely different from what you thought, and I had to set it straight so you'd understand.

If you had read what I said: I said there are three cache implementations that Intel and AMD both use. Does that mean they are exactly the same? NO! Learn to read. Also, do you understand that for set-associative you can have 2-way, 4-way, or 16-way set associative? Understand now? I can't believe you actually went out of your way to do that and check the cache. I hope you did it out of wanting to learn, and not to try to prove me wrong about something you have no clue about.

You should appreciate free information, not scorn someone because they gave it to you!

<A HREF="http://www.anandtech.com/mysystemrig.html?id=9933" target="_new"> My Rig </A>
 

MeTaLrOcKeR

Distinguished
May 2, 2001
1,515
0
19,780
I will try to respond to what you said, but I just personally don't TOTALLY understand what you mean. Do you think you can be a little bit more specific ??!??

Is this what you mean??

Why don't software designers compile a STANDARD for SSE2, meaning the compiled version is no different whether it runs on a Socket 462 AMD platform or a Socket 423/478 Intel Pentium 4 platform, so it uses the same code and executes it the same way on each?????

OR do you mean, why don't they take advantage of all the processor's x87 commands (FPU enhancements), which I personally don't understand... Example: the AMD Athlon XP supports MMX, Enhanced MMX, 3DNow!, Enhanced 3DNow!, and SSE...
Now, are you saying, why don't they compile something that utilizes ALL of these, so that they ALL work at the same time?? Theoretically, it would then speed up by whatever percentage EACH of those gives over raw integer and FPU, as opposed to using just SSE, or just 3DNow!, MMX, etc...

If that is the case I'd also like to know.........
ANOTHER Example.....

Anyone here with an AXP can experiment with this...

In SiSoft Sandra... you have the CPU Benchmark, Multimedia Benchmark, etc...
NOW, in the options for each benchmark you can enable/disable certain things, like SSE, Enhanced 3DNow!, etc. By default everything is enabled... but when you run the test, it reports the FPU/ALU as using SSE, plus the score it got... BUT if you uncheck SSE and run the test again, it will say it's using Enhanced 3DNow!... You'll also notice the SSE score is a LITTLE bit higher than the Enhanced 3DNow! score... NOW, if you disable everything but SSE, the score won't change from the original SSE score... and vice versa for the Enhanced 3DNow! score in relation to enabling/disabling MMX, etc... NOW, what I don't understand is why it can't use BOTH SSE and 3DNow!, etc., because theoretically, shouldn't it then get a much higher score??? Overall, anyway... like, I know SSE and 3DNow! work almost the same, but they both have their own advantages, so if it utilized both, it should be better, shouldn't it??

Anyway, if anyone still follows me or can answer LoveGuRu's original question, please do =)

-MeTaL RoCkEr

My <font color=red>Z28</font color=red> can take your <font color=blue>P4</font color=blue> off the line!
 

MeTaLrOcKeR

Distinguished
May 2, 2001
1,515
0
19,780
You know that's all anyone will buy it for! =)

lol....j/k

-MeTaL RoCkEr

My <font color=red>Z28</font color=red> can take your <font color=blue>P4</font color=blue> off the line!
 
lol, not me, pal, lol

Have any idea how fast my girlfriend would kick my ass? lol... I can see it for maybe, like, the groups who prefer to be a single set.

but still it would be cool!

<A HREF="http://www.anandtech.com/mysystemrig.html?id=9933" target="_new"> My Rig </A>
 
Well, kind of. An algorithm is a sequence of steps (often written as pseudocode) to accomplish a task. An implementation is simply the actual code that carries out the task. Cache design, I'm sure, has an algorithm they all follow. However, what I was referring to was the actual implementation itself. Same thing almost, but not quite. It's just one of those things where it looks to be the same but it isn't. Doesn't matter, really.

You didn't get anything else wrong; you just got the single-cycle and multi-cycle thing wrong. It truly doesn't matter, but I wanted to clear that up.

To be or not to be?



<A HREF="http://www.anandtech.com/mysystemrig.html?id=9933" target="_new"> My Rig </A>
 

MeTaLrOcKeR

Distinguished
May 2, 2001
1,515
0
19,780
Well Flyboy, I won't comment on what is going on between you and Sk8er.....but I will try and understand what you mean...

EARLIER you asked why the AMD chips have an L2 TLB and the Intels don't...
and you asked about the differences between the two: why your Celeron is like such and such and why your AMD is like such and such, correct?

OK, well, for one, you said you have an AMD 500 MHz... and from the data WCPUID returned, it appears you have an original K7 (Slot A Athlon). Am I right ?!?

Now... if this is true, then I think I might be able to explain a little bit... For one, the Athlon you have is Slot A, using the original K7 core, which has off-die L2 cache running at ½ the clock speed... The Celeron has 128K of on-die, full-speed L2 cache... AND they have different cache designs...

Your AMD Athlon 500 MHz chip is more comparable to the original P3, the Katmai core, which also had off-die ½-speed L2 cache... That's the only thing I can think of in respect to that...

Now, the method the Athlon uses, compared to how its Intel counterparts decode/send/receive instructions, is ESSENTIALLY the same, but it also does a few things differently... like the way instructions go through the decoders... Like I mentioned before, the K7 architecture was created with RISC in mind... you have to look in detail at the cache design of the K7... Tom's has a good article... in depth on everything related to the original K7... It was his first article ever on the AMD Athlon... quite a good one at that... check it out in the CPU Guide History section... the article should be dated sometime around August 1999...

As far as set-associativity goes... every processor is different. As far as I can recall, the AMD Duron's L2 cache is 16-way set associative, and the Thunderbird-core Athlon's L2 is 16-way set associative as well...

Anyway, again, read up on a few past articles on both AMD and Intel processors; I can guarantee it will help you learn more about this stuff than we could by just sitting here and telling you... check it out... =)


-MeTaL RoCkEr

My <font color=red>Z28</font color=red> can take your <font color=blue>P4</font color=blue> off the line!
 

Flyboy

Distinguished
Dec 31, 2007
737
0
18,980
Thanks. Do you feel that either the implementation (the differences between Intel and AMD) or the algorithms may be responsible for the performance differences?

That's where you and I got mixed up I think. My original thought was the actual memory management algorithms used to decide which data to cache and replace. But your post on cache implementation also intrigued me. Now I'm wondering whether either (or both) of these is key to the performance issues between AMD and Intel.
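For the "which data to cache and replace" question, here is a small Python sketch comparing two textbook replacement policies, LRU and FIFO, on a tiny fully associative cache. Neither the thread nor this sketch says which policy the Athlon or Celeron actually uses; real hardware often uses cheaper approximations of LRU. Treat it only as an illustration that the replacement policy alone can change the miss count for the same cache size.

```python
from collections import OrderedDict


def count_misses(trace, capacity: int, policy: str) -> int:
    """Count misses for a fully associative cache holding `capacity` blocks.

    The OrderedDict keeps blocks in eviction order: the first key is the next victim.
    LRU moves a block to the back on every hit; FIFO leaves the order alone.
    """
    if policy not in ("lru", "fifo"):
        raise ValueError("policy must be 'lru' or 'fifo'")
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            if policy == "lru":
                cache.move_to_end(block)  # refresh recency on a hit
            continue
        misses += 1
        if len(cache) == capacity:
            cache.popitem(last=False)     # evict the block at the front
        cache[block] = None
    return misses


trace = list("ABCADAEA")
print("LRU: ", count_misses(trace, 3, "lru"))   # 5
print("FIFO:", count_misses(trace, 3, "fifo"))  # 6
```

Block A is reused constantly. LRU keeps it because every hit refreshes it, while FIFO evicts it just because it was loaded first and then has to fetch it again.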

But thanks for clearing up single cycle vs. multi-cycle.