The bandwidths for each memory type, and the explanation of why one offers double the throughput of the other, were already in my posts. The only difference is that I did not repeat myths about latency and did not pad the post with redundant details that add nothing to the topic. Starting with GHz, converting to GT/s, then to Gbit/s, and then again to GB/s adds nothing new to a discussion that already goes from GHz to GB/s directly.
see, what you call unneeded and redundant, i call relevant, important and verifiable. you call the latency claims "myths", yet provide nothing verifiable to bust them. however, you did deem the following "needed"; maybe that's why you posted it:
juanrga :
DDR3 handles either an input or an output, but not both, on the same cycle. GDDR handles input and output on the same cycle.
i'd like you to clarify what this means. i hope others verify what you state.
juanrga :
Nope.
seems like you never read what other posters write. oh well...
juanrga :
de5_Roy :
if kaveri had used gddr5 for the igpu,
GDDR5 was not going to be used for the igpu alone but for the whole Kaveri APU.
in my speculation i am using gddr5 for the igpu while keeping ddr3 as main system memory. maybe yours is different.
and the rest:
juanrga :
GDDR5 and DDR3 are not pin compatible.
What vram? what 128 more wires?
Nope. GDDR5 would be used as system ram.
Nope. Only bus to system ram.
all stem from the initial misreading.
juanrga :
The problem is that quad channel is expensive, occupies a lot of space, requires quad-tested dimms...
Moreover, quad-channel DDR3-2133 offers 68GB/s, not 40GB/s. The 68GB/s number was given to you before. I am simply noting it.
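For reference, the peak-bandwidth arithmetic behind both figures can be checked with a few lines of Python (a sketch only; measured Sandra numbers will come in below these theoretical peaks):

```python
def ddr_peak_gbs(transfer_rate_mts, channels=1, bus_bytes=8):
    """Peak DDR bandwidth in GB/s: transfers/s * 8-byte (64-bit) bus * channels."""
    return transfer_rate_mts * bus_bytes * channels / 1000.0

print(ddr_peak_gbs(2133, channels=4))  # 68.256 -> the ~68GB/s quad-channel figure
print(ddr_peak_gbs(1600, channels=1))  # 12.8 peak; ~10GB/s measured per channel in Sandra
```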
i wasn't even using ddr3 2133 MT/s. i was using ddr3 1600 (pc3 12800) as the baseline since it's the most widely available, one of the cheapest, and delivers around 10GB/s per channel according to the sandra benches i ran.
i see you're still mistyping MT/s as MHz despite cazalan's correction. i admit that i frequently make the same mistake. from what i read yesterday, i can tell the difference now.
I don't know how many times I have to repeat myself about x86 dying. It's not happening. Dell, Gateway, Alienware, and all the other OEMs report their sales every quarter, and some analyst who doesn't know anything about computers looks at those numbers and goes, "wow, x86 is dying, Dell isn't shipping as many PCs!"
As AMD mentioned during its last conference, the TAM for x86 is decreasing while it is increasing for ARM. For this reason AMD is going "AMbiDextrous": by offering OEMs pin-compatible x86/ARM solutions it gets to play in both markets, and it benefits if one grows at the expense of the other. AMD also showed the following slide with market predictions:
blackkstar :
AMD's approach is to be flexible and fit into many markets. AMD staying purely x86 makes as much sense as AMD going ARM only. They want that flexibility after the Bulldozer failure.
Look at how inflexible Bulldozer was as an architecture. It was designed for servers that were meant to run tons of weak threads at once (not necessarily from the same program, I'm talking along the lines of spawning 500 apache threads a second for each request or something along those lines).
So AMD designed an architecture around that and then got completely screwed for the entire duration of Bulldozer's lifespan because they were locked into a single ideal and they had no backup plans.
Bulldozer's problem wasn't that it was designed for servers, but that the Bulldozer architecture was a complete failure and didn't work for servers either! As a consequence, AMD's market share in servers declined to its current 4% or so.
blackkstar :
If this was Hector Ruiz's AMD, they would be betting entirely on one or the other. They're not. They're playing it safe for once, taking advantage of the fact that they're the only company in the world that can provide a strong GPU, a strong x86 CPU, a low-power x86 CPU, low-power ARM cores, and strong ARM CPUs. And not only can they provide all of those solutions, but they can mix and match them for specific customers.
So say someone wants to build a giant HPC system that is HSA-enabled and will run workloads that are a great fit for GPGPU? Great, here's your custom solution with some weak ARM cores to run your OS, and most of the die is GCN cores.
Have some tasks that don't scale on GPUs? Have this giant traditional CPU!
That's AMD's end goal, and they've stated this over and over and over again. They are creating a set of building blocks for semi-custom products. And meanwhile, they end up going "well, we can give you 8 weak CPU cores with some GCN cores in this thermal envelope. We're the only company that can do this for you. But you'll need to spend some money on R&D for it first, we've never made an 8 core Jaguar before "
AMD has looked back at the last few years, seen how the GPU elevated their one very weak product, and realized that "if we diversify and split up our CPU section, we can still do fine if one of them underperforms".
That's what this whole ARM thing is about. They will use whatever products are best for the job in each segment. And the fact that K12 ARM and x86 are considered "sister cores" is making me consider that the two will be very similar in the end.
Note the space between those 2 sections. I was replying directly to your statement, then elaborating on your lunacy in a separate portion. They have grammar in Spain, yes?
So not only do you see the word "ARM" when I write about x86, but you don't see "ARM" when you write it explicitly in your own answer? Now I understand perfectly why AMD says one thing and you read the contrary:
AMD: K12 is a new high-performance ARM-based core. 8350rocks: No. K12 != ARM.
AMD: The TAM for x86 is decreasing, while it's increasing for ARM. 8350rocks: No. The TAM for x86 is increasing and it is gaining ground against ARM.
AMD: ARM and x86 cores will be treated as first-class citizens. ARM will win over x86 in the long run. 8350rocks: No, AMD considers ARM a niche market.
AMD: The Bulldozer family was a failure. 8350rocks: No. The Bulldozer family is not terrible. It does some things well.
AMD: We will abandon CMT and return to a classic SMT design. 8350rocks: No. AMD's new cores will be based on a redesigned CMT architecture (note: CMT is a version of SMT).
AMD: We promise a 10TFLOP APU by 2020. 8350rocks: No. AMD cannot produce a 10TFLOP APU for doing anything right now.
AMD: All our 28nm products will be 28nm bulk. 8350rocks: Kaveri is delayed because it is made on 28nm FD-SOI, trust me.
AMD: We are migrating to 20nm bulk and then FinFET on bulk. 8350rocks: AMD is returning to FD-SOI for 20nm and then FinFET on SOI, trust me.
Are you still hanging on to those JS benchmarks of Apple's A7 SoC vs Intel's Bay Trail CPU? I have a few things to say about that. For one, they are JS-based benchmarks, whose results can be heavily influenced by the browser itself. Considering they used different OSes with different browsers, there is no way to rule out a browser bottleneck. I think I explained this before.
Nope. The benchmarks I have include HPC workloads and, of course, use the same OS.
jimmysmitty :
Yet Jim Keller, lead tech on AMD's K8, either left or was let go. I wonder why.
If only a tenth of the problems reported here are true, I understand why Keller, Koduri, Papermaster... left AMD for Apple.
And if what the new AMD CEO Rory Read is achieving is only a tenth of what he plans to do, then I understand why Keller, Koduri, Papermaster... returned to AMD.
jimmysmitty :
While a lead designer is great, there are also upper management and marketing. Why did Phenom I fall so hard? The architecture was certainly flawed, but one of the biggest issues, I would say, was that marketing pushed the CPU where it could never reach. The leadership at the time was also just horrible.
Not only was the chief architect of the Bulldozer family fired, but the failure cost the CEO his job, it cost most of the management team their jobs, it cost the vice president of engineering his job...
AMD's Feldman claims that AMD has a new team and that "We are crystal clear that that sort of [Bulldozer] failure is unacceptable". Time will tell.
Still waiting on that Intel GPU killer; I want some details on how it handles games compared to a 295X or dual Titans.
Also still waiting on that i5-2500K performance from an AMD A10 Kaveri too.
Waiting for future Intel products seems reasonable. But still waiting for benchmarks that were given some time ago doesn't.
Yeah, since those benchmarks were invalid and not averages, I find it troubling that you seem convinced otherwise; clearly you are not a reasonable man.
see, what you call unneeded and redundant, i call relevant, important and verifiable. you call the latency claims "myths", yet provide nothing verifiable to bust them. however, you did deem the following "needed"; maybe that's why you posted it:
Sure, because performing three changes of units instead of starting with the correct unit, or dividing a number by two and then multiplying it by two to get the original number back, falls under "relevant, important and verifiable".
I provided you the latencies (in ns) for both GDDR5 and DDR3. I also provided two links supporting my point that it is a myth, and another two links to forums discussing the origin of the myth. Of course, you didn't read any of that, and if you follow the usual pattern you will ask me for the links or the numbers again.
de5_Roy :
in my speculation i am using gddr5 for the igpu while keeping ddr3 as main system memory. maybe yours is different.
It is not my speculation but what AMD planned to do, as reflected by many tech sites.
I see that I didn't understand your speculation. OK, I understand it now. Well... In the first place, it is odd and goes against the goal of integrating components. In the second place, it breaks fundamental aspects of Kaveri such as hUMA and HSA. In the third place, as reported in AMD's original docs, the DDR3 controller and the GDDR5 controller are incompatible; this is why AMD planned to support either GDDR5 or DDR3, but not both at once.
de5_Roy :
juanrga :
The problem is that quad channel is expensive, occupies a lot of space, requires quad-tested dimms...
Moreover, quad-channel DDR3-2133 offers 68GB/s, not 40GB/s. The 68GB/s number was given to you before. I am simply noting it.
i wasn't even using ddr3 2133 MT/s. i was using ddr3 1600 (pc3 12800) as the baseline since it's the most widely available, one of the cheapest, and delivers around 10GB/s per channel according to the sandra benches i ran.
i see you're still mistyping MT/s as MHz despite cazalan's correction. i admit that i frequently make the same mistake. from what i read yesterday, i can tell the difference now.
The idea of using quad channel but only with 1600MHz modules is even weirder!
The slower RAM costs the same as the faster RAM. :lol:
Cazalan's inability to understand the difference between the IO bus frequency and the data frequency (sometimes called the "effective frequency") that results from the double-data-rate architecture is not going to change the standard way of referring to DDR3 memory speed.
GSkill has a FAQ that explains these typical misunderstandings about memory. I'll copy some entries:
Q:
What is the difference between “DDR3-1600” and “PC3-12800”?
A:
There are two naming conventions for DDR memory, so there are two names for the same thing. When starting with “DDR3-“, it will list the memory frequency. When starting with “PC3-“, it will list the memory bandwidth.
To convert between the two, just divide or multiply by 8.
For example, 1600*8=12800 or 12800/8=1600.
Q:
Why does CPU-Z (memory tab) show only half the frequency speed of my memory kit?
A:
CPU-Z reports the DRAM’s operating frequency, but DDR (DOUBLE Data Rate) memory can carry two bits of information per cycle, so the effective frequency is double the operating frequency.
DDR memory is typically listed by their effective frequency. So if your memory kit is rated for 1600MHz, it will show as 800MHz in CPU-Z. (800*2=1600)
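The two FAQ entries reduce to two one-line conversions, sketched here in Python (the function names are mine, just for illustration):

```python
def pc3_rating(transfer_rate_mts):
    """DDR3-1600 <-> PC3-12800: bandwidth rating = transfer rate * 8 bytes/transfer."""
    return transfer_rate_mts * 8

def io_clock_mhz(transfer_rate_mts):
    """DDR moves two transfers per clock, so the clock CPU-Z shows is half the rate."""
    return transfer_rate_mts // 2

print(pc3_rating(1600))    # 12800 (i.e. PC3-12800)
print(io_clock_mhz(1600))  # 800, what CPU-Z shows for a "1600MHz" kit
```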
Waiting for future Intel products seems reasonable. But still waiting for benchmarks that were given some time ago doesn't.
Yeah, since those benchmarks were invalid and not averages, I find it troubling that you seem convinced otherwise; clearly you are not a reasonable man.
Considering that the claims were made about a concrete set of well-known benchmarks, that they were published, and that the final measurements matched the claims to within a few percent of error, the lack of reasonableness must lie elsewhere, especially when you keep bringing this old discussion back up.
Note the space between those 2 sections. I was replying directly to your statement, then elaborating on your lunacy in a separate portion. They have grammar in Spain, yes?
So not only do you see the word "ARM" when I write about x86, but you don't see "ARM" when you write it explicitly in your own answer? Now I understand perfectly why AMD says one thing and you read the contrary:
AMD: K12 is a new high-performance ARM-based core. 8350rocks: No. K12 != ARM.
AMD: The TAM for x86 is decreasing, while it's increasing for ARM. 8350rocks: No. The TAM for x86 is increasing and it is gaining ground against ARM.
AMD: ARM and x86 cores will be treated as first-class citizens. ARM will win over x86 in the long run. 8350rocks: No, AMD considers ARM a niche market.
AMD: The Bulldozer family was a failure. 8350rocks: No. The Bulldozer family is not terrible. It does some things well.
AMD: We will abandon CMT and return to a classic SMT design. 8350rocks: No. AMD's new cores will be based on a redesigned CMT architecture (note: CMT is a version of SMT).
AMD: We promise a 10TFLOP APU by 2020. 8350rocks: No. AMD cannot produce a 10TFLOP APU for doing anything right now.
AMD: All our 28nm products will be 28nm bulk. 8350rocks: Kaveri is delayed because it is made on 28nm FD-SOI, trust me.
AMD: We are migrating to 20nm bulk and then FinFET on bulk. 8350rocks: AMD is returning to FD-SOI for 20nm and then FinFET on SOI, trust me.
...
Juan, post one single shred of evidence that anything you are saying is true.
When you cannot... take your ball and go home, because you are not contributing useful information to the x86 AMD conversation topic. Take your ARM nonsense and go do whatever it is you do in your spare time besides trolling this forum.
I am going to have to go over to the S|A forums and see what they did to get rid of you.
Waiting for future Intel products seems reasonable. But still waiting for benchmarks that were given some time ago doesn't.
Yeah, since those benchmarks were invalid and not averages, I find it troubling that you seem convinced otherwise; clearly you are not a reasonable man.
Considering that the claims were made about a concrete set of well-known benchmarks, that they were published, and that the final measurements matched the claims to within a few percent of error, the lack of reasonableness must lie elsewhere, especially when you keep bringing this old discussion back up.
Well if we can't agree on the facts then what's the point?
It is not about confusing frequency with transfer rate, because what is being used above is 2.133 GHz, not 2.133 GT/s. You are the one who remains confused about the two concepts.
I already corrected you and you make the same mistake again.
DDR3-2133 does not run at 2.133 GHz. It runs at 1.066 GHz.
It is called DDR3-2133 exactly because it gets 2.133 GT/s.
Module assemblers like GSkill try to dumb that down for consumers but it is misleading.
I am sorry, but I will continue referring to 1600MHz memory as... 1600MHz memory and to 2133MHz memory as... 2133MHz memory.
I realize you're just a layman here, but some of us are engineers or studying to be engineers and have had to actually implement DDR2/DDR3 memory controllers. If I feed the part a 2133 MHz clock, things can go boom.
I already explained why "some basic facts" and physics show that GPUs will be killed.
This proves again that you don't have any idea about the hardware that was being discussed. You are shooting random numbers from your...
Are you admitting that your 8K comment was FUD and that you dont have any idea of the hardware that you are commenting on?
A genius doesn't make claims about hardware and later ask what the hardware configuration is.
This is all in your imagination. I never said anything about killing RAM or HDDs.
The 5x myth was debunked before. The upgradability issue was also replied to before...
Except that you confuse Bulldozer marketing with an academic claim based on the laws of physics. I am not repeating the claims of a particular company that wants to sell its products.
as per my expectations
you quote me and ask me questions instead of answering
you don't know, and you're just living in an imaginary world
and instead of giving an answer you just say that you already gave the answer to me, and to roy too
This proves again that you don't have any idea about the hardware that was being discussed. You are shooting random numbers from your...
what would you say to those who use features like quad sli or crossfire?
and in 2020, if someone tries to play a game on a 4-monitor setup, each at 8k resolution, then how will these apus be able to store the game data needed for processing by the igpu? would amd use the crystals of some broken crystal ball?
i don't even need to know the hardware, because i am asking you to give the hardware details that will handle these things at that time, and you don't have any answer and are too stubborn to admit it
let me restate this question for the current time; then it would be like this:
"what hardware do i need to play and record gta 4 at 1080p on a 4-monitor setup, with a 2k budget?"
the answer would be like this: http://www.tomshardware.com/answers/id-1966325/high-end-multi-monitor-set.html
did you see the list of configs there?
this is how you should reply to me:
"the future apu will have bada b bada bu bada bum, which eliminates the need for sli/crossfire"
oh wait, you can't give any answer because you don't know any answer and don't know how to give one (i just saw your count of best answers), so the only thing you can do is trap me in my own question by asking me for the config of that apu
Are you admitting that your 8K comment was FUD and that you dont have any idea of the hardware that you are commenting on?
2011, galaxy s2 released with a WVGA display
2012, galaxy s3 with an HD display
2013, s4 with an FHD display
2014, g3 with a WQHD display
do you see the increase in display resolution?
so tell me, do you think 8k on pc won't be possible by 2020, and that's why you're saying my question was fud?
This is all in your imagination. I never said anything about killing RAM or HDDs.
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console uses system memory far faster than the slow DDR memory used in PCs.
The memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on its next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s of bandwidth). Intel will start selling a 'CPU' with packaged MCDRAM (500GB/s of bandwidth) next year.
For the sake of comparison, a GTX 680 has 192GB/s of bandwidth available.
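The GTX 680 figure follows from the same peak-bandwidth arithmetic applied to its published memory spec (256-bit bus, 6008 MT/s GDDR5):

```python
# Peak memory bandwidth = transfer rate * bus width in bytes
bus_bits = 256
transfer_rate_mts = 6008  # GDDR5, the "6GHz effective" rating
print(transfer_rate_mts * (bus_bits // 8) / 1000.0)  # 192.256 -> the ~192GB/s figure
```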
so tell me, what does it mean?
different memory for the cpu and igpu, which is not hsa
or killing ram
for an apu aimed at hsa, the first option was never an option anyway, so what i'm left with is "killing ram"
or is it doing something like the xbox one, giving a gaming experience of 720p @ 30fps?
The 5x myth was debunked before. The upgradability issue was also replied to before...
i didn't see any post where palladin admitted that dgpus will not be able to provide 4-5x more performance than an apu
indeed, i always saw something like this:
palladin9479 :
Juan has a very poor understanding of what "locality" is, and I believe you do too. All it means is that the latency of a particular circuit is incredibly low due to it being on the same die. In the case of dGPUs vs iGPUs we get into a particular quandary. Anything you can do on the iGPU you can do 4~5x of on a dGPU; this applies to everything from vector processing power to memory access speeds/bandwidth. In order for an iGPU to be "better" than a dGPU, the workload would have to be small enough that it would be completed in less time than it takes the dGPU to receive it.
Ex
100 instruction sets, 3 cycles each (load, execute, store) = 300 total cycles
iGPU: 2 per cycle with 1 cycle of latency per batch
dGPU: 10 per cycle with 5 cycles of latency per batch (500% more latency than the iGPU)
iGPU = 50 batches (50 total cycles used in latency) = (50 * 3) + 50 = 200 cycle total execution time
dGPU = 10 batches (50 total cycles used in latency) = (10 * 3) + 50 = 80 cycle total execution time
2 instruction sets, 3 cycles each (load, execute, store) = 6 total cycles
iGPU: 1 batch (1 total cycle used in latency) = (1 * 3) + 1 = 4 cycle total execution time
dGPU: 1 batch (5 total cycles used in latency) = (1 * 3) + 5 = 8 cycle total execution time
In the first instance the instruction load was big enough to fill the entire dGPU and take advantage of its larger resources. The 500% increased latency is a big factor in its total performance (30 execution cycles vs 50 of latency), yet due to its 5x higher throughput it still easily beats the integrated option. In the second instance there was a much smaller set of instructions, because we need the results back before we can branch to the next segment of code. In that scenario we are unable to take advantage of the higher performance of the dGPU, and the iGPU becomes the better option.
This is what we are talking about when we mention that dGPUs won't be replaced. In larger workloads they are significantly superior to iGPUs because they have larger thermal and space budgets. iGPUs only pull ahead in small, quick workloads where you need the results of the calculation ASAP.
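For what it's worth, the quoted batch arithmetic is easy to reproduce. This small sketch uses the same toy numbers (the model and figures are palladin9479's; the code is just an illustration):

```python
import math

def total_cycles(n_sets, sets_per_batch, batch_latency, cycles_per_set=3):
    """Toy model: total = batches * cycles_per_set + batches * batch_latency."""
    batches = math.ceil(n_sets / sets_per_batch)
    return batches * cycles_per_set + batches * batch_latency

# 100 instruction sets: throughput wins, despite 5x the per-batch latency
print(total_cycles(100, sets_per_batch=2, batch_latency=1))   # iGPU: 200
print(total_cycles(100, sets_per_batch=10, batch_latency=5))  # dGPU: 80
# 2 instruction sets: latency wins
print(total_cycles(2, sets_per_batch=2, batch_latency=1))     # iGPU: 4
print(total_cycles(2, sets_per_batch=10, batch_latency=5))    # dGPU: 8
```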
so where did you debunk it, juan? in imaginary land!
juanrga :
truegenius :
what would you say to those who use features like quad sli or crossfire?
I would say the same thing that I said the first time, the second time, and the third time... that I was asked about that.
where are your first, second, third, or 4th, 5th, 6th, 7th, 8th times? (quote and link them so that i can check them)
i got it
they're all located in imaginary land
gotcha
i will say it again (instead of just saying that i already said it 20 times): reduce the dose
A genius doesn't make claims about hardware and later ask what the hardware configuration is.
Juan, post one single shred of evidence that anything you are saying is true.
When you cannot... take your ball and go home, because you are not contributing useful information to the x86 AMD conversation topic. Take your ARM nonsense and go do whatever it is you do in your spare time besides trolling this forum.
And now you decide that a thread about future AMD products is "the x86 AMD conversation topic". One half of AMD's CPU/APU products cannot be discussed because you don't like them? :lol:
8350rocks :
I am going to have to go over to the S|A forums and see what they did to get rid of you.
This is funny, because we are currently discussing this in the K12 thread, and several posters there agreed with me that they want an 8-core ARM APU. Why don't you go there and explain your "K12 != ARM" and "ARM cannot scale up" fantasies to us?
It is not about confusing frequency with transfer rate, because what is being used above is 2.133 GHz, not 2.133 GT/s. You are the one who remains confused about the two concepts.
I already corrected you and you make the same mistake again.
DDR3-2133 does not run at 2.133 GHz. It runs at 1.066 GHz.
It is called DDR3-2133 exactly because it gets 2.133 GT/s.
Module assemblers like GSkill try to dumb that down for consumers but it is misleading.
And you continue repeating the same confusion despite being corrected and given a link to a GSkill FAQ that explains this to you.
You continue to confuse the clock frequency of the IO bus with the frequency of data transmission.
That "2133 MT/s data transfer rate" corresponds to an effective frequency of 2133MHz.
Micron presented its new DDR4 modules recently, and it mentions that the (effective) frequency starts at 2400MHz.
Here you have a table from Hynix mentioning a frequency of 2133MHz for DDR3-2133 memory, a frequency of 1600MHz for DDR3-1600 memory, and so on.
What is really interesting about this useless discussion is that I am not forcing you to use frequencies, you can continue using data transfer rates, but you want to force me and the rest of the memory industry to use data transfers. WOW!
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console uses system memory far faster than the slow DDR memory used in PCs.
The memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on its next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s of bandwidth). Intel will start selling a 'CPU' with packaged MCDRAM (500GB/s of bandwidth) next year.
For the sake of comparison, a GTX 680 has 192GB/s of bandwidth available.
so tell me, what does it mean?
different memory for the cpu and igpu, which is not hsa
or killing ram
I suppose that you are asking about the bold part. Well, it means neither your "different memory for cpu and igpu that is not for hsa" nor your "killing ram". Those two exist only in your imagination. This shows again that you continue pretending to criticize hardware that you don't understand.
truegenius :
for an apu aimed at hsa, the first option was never an option anyway, so what i'm left with is "killing ram"
or is it doing something like the xbox one, giving a gaming experience of 720p @ 30fps?
truegenius :
i didn't see any post where palladin admitted that dgpus will not be able to provide 4-5x more performance than an apu
indeed, i always saw something like this:
palladin9479 :
Juan has a very poor understanding of what "locality" is, and I believe you do too. All it means is that the latency of a particular circuit is incredibly low due to it being on the same die. In the case of dGPUs vs iGPUs we get into a particular quandary. Anything you can do on the iGPU you can do 4~5x of on a dGPU; this applies to everything from vector processing power to memory access speeds/bandwidth. In order for an iGPU to be "better" than a dGPU, the workload would have to be small enough that it would be completed in less time than it takes the dGPU to receive it.
Ex
100 instruction sets, 3 cycles each (load, execute, store) = 300 total cycles
iGPU: 2 per cycle with 1 cycle of latency per batch
dGPU: 10 per cycle with 5 cycles of latency per batch (500% more latency than the iGPU)
iGPU = 50 batches (50 total cycles used in latency) = (50 * 3) + 50 = 200 cycle total execution time
dGPU = 10 batches (50 total cycles used in latency) = (10 * 3) + 50 = 80 cycle total execution time
2 instruction sets, 3 cycles each (load, execute, store) = 6 total cycles
iGPU: 1 batch (1 total cycle used in latency) = (1 * 3) + 1 = 4 cycle total execution time
dGPU: 1 batch (5 total cycles used in latency) = (1 * 3) + 5 = 8 cycle total execution time
In the first instance the instruction load was big enough to fill the entire dGPU and take advantage of its larger resources. The 500% increased latency is a big factor in its total performance (30 execution cycles vs 50 of latency), yet due to its 5x higher throughput it still easily beats the integrated option. In the second instance there was a much smaller set of instructions, because we need the results back before we can branch to the next segment of code. In that scenario we are unable to take advantage of the higher performance of the dGPU, and the iGPU becomes the better option.
This is what we are talking about when we mention that dGPUs won't be replaced. In larger workloads they are significantly superior to iGPUs because they have larger thermal and space budgets. iGPUs only pull ahead in small, quick workloads where you need the results of the calculation ASAP.
so where did you debunk it, juan? in imaginary land!
And that post of his was adequately replied to. Of course, you missed the answer; how unsurprising!
I already explained why "some basic facts" and physics show that GPUs will be killed.
This proves again that you don't have any idea about the hardware that was being discussed. You are shutting random numbers from your...
Are you admitting that your 8K comment was FUD and that you dont have any idea of the hardware that you are commenting on?
A genius doesn't make claims about hardware and latter ask which is the hardware configuration.
This is all in your imagination. I never said that kill ram or HDD.
The 5x myth was debunked before. The upgradability issue was also replied before...
Except that you confound bulldozer marketing with an academic claim based in the laws of physics. I am not reproducing the claim of particular company who want sell its products.
as per my expectations
quote me and ask me question instead of answering
you don't know and just living in imaginary world
and instead of giving answer you are just saying that you already gave the answer to me and to roy too
This proves again that you don't have any idea about the hardware that was being discussed. You are shutting random numbers from your...
what would you like to say to those who use features like quad sli or crossfire?
and in 2020 if someone tries to play a game on his 4 monitor setup each at 8k resolution then how these apus would be able to store the game data need for processing by igpu ? would amd use the crystals of some broken crystal ball ?
i don't even need to know the hardware, because i am asking you to give the hardware details which will handle these things of that time and you don't have any answer and you are too stubborn to admit it
let me remake this question for current time, then it will be like this
"what hardware i need to play and record gta 4 at 1080p 4 monitor setup, budget is 2k"
answer would be like this http://www.tomshardware.com/answers/id-1966325/high-end-multi-monitor-set.html
did you saw list of configs there
this is how you should give me the reply
"future apu will have bada b bada bu bada bum which eliminates the use of sli crossfire"
oh wait, you can't give any answer because you don't know any answer and don't know how to give answer (just saw your count of best answers) so only thing you can do is trap me in my own question by asking me the config of that apu
Are you admitting that your 8K comment was FUD and that you dont have any idea of the hardware that you are commenting on?
2011, galaxy s2 relesed with WVGA display
2012, galaxy s3 with HD display
2013, s4 with FHD display
2014, g3 with WQHD display
did you say increase in display resolution
so tell me do you think that 8k on pc won't be possible by 2020 and this is why you are saying that my question was fud ?
This is all in your imagination. I never said that kill ram or HDD.
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
Memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on the next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s BW). Intel will start selling 'CPU' with packaged MCDRAM (500GB/s BW) the next year.
For the sake of comparison, a GTX680 has 192 GB/s BW available.
so tell me what does it mean
different memory for cpu and igpu that is not for hsa
or killing ram
for apu which is aimed for hsa the first option was not an option anyway so what i left with is "killing ram"
or is it doing something like xboxone to give a gaming experience of 720p @30fps
The 5x myth was debunked before. The upgradability issue was also replied before...
i didn't see any post where palladin admitted that dgpus will not be able to provide 4-5x more performance than an apu
indeed, i always saw something like this
palladin9479 :
Juan has a very poor understanding of what "locality" is, and I believe you do too. All it means is that the latency of a particular circuit is incredibly low due to it being on the same die. In the case of dGPUs vs iGPUs we get into a particular quandary. Anything you can do on the iGPU you can do 4~5x of on a dGPU; this relates to everything from vector processing power to memory access speeds / bandwidth. In order for an iGPU to be "better" than a dGPU the workload would have to be small enough that it would be completed in less time than it takes the dGPU to receive it.
Ex
100 instruction sets, 3 cycles each (load, execute, store) = 300 total instructions
iGPU: 2 per cycle with 1 cycle latency per batch
dGPU: 10 per cycle with 5 cycle latency per batch (500% more latency than the iGPU)
iGPU = 50 batches (50 total cycles used in latency) = (50 * 3) + 50 = 200 cycle total execution time
dGPU = 10 batches (50 total cycles used in latency) = (10 * 3) + 50 = 80 cycle total execution time
2 instruction sets, 3 cycles each (load, execute, store) = 6 total instructions
iGPU: 1 batch (1 total cycle used in latency) = (1 * 3) + 1 = 4 cycle total execution time
dGPU: 1 batch (5 total cycles used in latency) = (1 * 3) + 5 = 8 cycle total execution time
In the first instance the instruction load was big enough to fill the entire dGPU and take advantage of its larger resources. The 500% increased latency is a big factor in its total performance (30 execution cycles vs 50 latency), yet due to its 5x higher throughput it still easily beats the integrated option. In the second instance there was a much smaller set of instructions, because we need the results back before we can branch to the next segment of code. In that scenario we are unable to take advantage of the higher performance of the dGPU, and the iGPU becomes the better option.
This is what we are talking about when we mention that dGPUs won't be replaced. In larger workloads they are significantly superior to iGPUs due to them having larger heat and space budgets. iGPUs only pull ahead in small, quick workloads where you need the results of the calculation ASAP.
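The quoted toy model can be replayed numerically. This sketch is just palladin's own arithmetic in code form (batch width, per-batch latency, and 3 cycles per batch are the figures from the post, not measured hardware values):

```python
import math

def total_cycles(instruction_sets, batch_width, latency_per_batch, cycles_per_batch=3):
    """Total cycles in the quoted model: each batch costs 3 execution cycles
    (load, execute, store) plus a fixed transfer latency per batch."""
    batches = math.ceil(instruction_sets / batch_width)
    return batches * cycles_per_batch + batches * latency_per_batch

print(total_cycles(100, 2, 1))    # iGPU, large workload: 200
print(total_cycles(100, 10, 5))   # dGPU, large workload: 80
print(total_cycles(2, 2, 1))      # iGPU, tiny workload: 4
print(total_cycles(2, 10, 5))     # dGPU, tiny workload: 8
```

It reproduces both of the quoted cases: the wide dGPU wins big workloads despite 5x the latency, while the iGPU wins the tiny latency-bound one.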
so where did you debunk it, juan? imaginary land!
juanrga :
truegenius :
what would you like to say to those who use features like quad sli or crossfire?
I would say the same that I said the first time, and the second time, and the third time,... that I was asked about that.
where are your first, second, third, or 4th, 5th, 6th, 7th, 8th times? (quote and link, so that i can check them)
i got it
all these are located in imaginary land
gotcha
i will say it again (instead of just saying that i already said it 20 times): reduce the dose
A genius doesn't make claims about hardware and later ask what the hardware configuration is.
AMD Core Day lays out their ARM vs x86 strategy
Not one vs the other, swappable components from top to bottom
http://semiaccurate.com/2014/05/14/amd-core-day-lays-arm-vs-x86-strategy/
Today we can extract heat from 220-300W CPUs using air cooling. The main problem for high-performance APUs has been the memory bottleneck of PCs. AMD can push a 2TFLOP APU for the PS4 because the console is using a fast system memory beyond the slow DDR memory used in PCs.
Memory bottleneck is solved by HBM/HMC memory. AMD is already integrating HBM on the next APUs. Nvidia is doing the same (APUs with up to 1.6TB/s BW). Intel will start selling 'CPU' with packaged MCDRAM (500GB/s BW) the next year.
For the sake of comparison, a GTX680 has 192 GB/s BW available.
so tell me, what does it mean?
different memory for the cpu and igpu, which is not for hsa,
or killing ram?
I suppose that you are asking about the bold part. Well, it doesn't mean your "different memory for cpu and igpu that is not for hsa" nor your "killing ram". Those two are only in your imagination. This shows again that you continue pretending to criticize hardware that you don't understand.
did you see that i had written something else there?
"so tell me what does it mean"
it is in english, so i don't think it would be that hard for you to understand; you just ignored it
juanrga :
truegenius :
for an apu which is aimed at hsa, the first option was not an option anyway, so what i'm left with is "killing ram"
or is it doing something like the xbox one, to give a gaming experience of 720p @30fps?
yeah, current gen consoles are making pc gamers laugh at them
juanrga :
truegenius :
i didn't see any post where palladin admitted that dgpus will not be able to provide 4-5x more performance than an apu
indeed, i always saw something like this
palladin9479 :
Juan has a very poor understanding of what "locality" is, and I believe you do too. All it means is that the latency of a particular circuit is incredibly low due to it being on the same die. In the case of dGPUs vs iGPUs we get into a particular quandary. Anything you can do on the iGPU you can do 4~5x of on a dGPU; this relates to everything from vector processing power to memory access speeds / bandwidth. In order for an iGPU to be "better" than a dGPU the workload would have to be small enough that it would be completed in less time than it takes the dGPU to receive it.
Ex
100 instruction sets, 3 cycles each (load, execute, store) = 300 total instructions
iGPU: 2 per cycle with 1 cycle latency per batch
dGPU: 10 per cycle with 5 cycle latency per batch (500% more latency than the iGPU)
iGPU = 50 batches (50 total cycles used in latency) = (50 * 3) + 50 = 200 cycle total execution time
dGPU = 10 batches (50 total cycles used in latency) = (10 * 3) + 50 = 80 cycle total execution time
2 instruction sets, 3 cycles each (load, execute, store) = 6 total instructions
iGPU: 1 batch (1 total cycle used in latency) = (1 * 3) + 1 = 4 cycle total execution time
dGPU: 1 batch (5 total cycles used in latency) = (1 * 3) + 5 = 8 cycle total execution time
In the first instance the instruction load was big enough to fill the entire dGPU and take advantage of its larger resources. The 500% increased latency is a big factor in its total performance (30 execution cycles vs 50 latency), yet due to its 5x higher throughput it still easily beats the integrated option. In the second instance there was a much smaller set of instructions, because we need the results back before we can branch to the next segment of code. In that scenario we are unable to take advantage of the higher performance of the dGPU, and the iGPU becomes the better option.
This is what we are talking about when we mention that dGPUs won't be replaced. In larger workloads they are significantly superior to iGPUs due to them having larger heat and space budgets. iGPUs only pull ahead in small, quick workloads where you need the results of the calculation ASAP.
so where did you debunk it, juan? imaginary land!
And that post from him was adequately replied to. Of course, you missed the answer. How unsurprising!
adequately replied to, or ignored?
juanrga :
palladin9479 :
Juan has a very poor understanding of what "locality" is, and I believe you do too. All it means is that the latency of a particular circuit is incredibly low due to it being on the same die. In the case of dGPUs vs iGPUs we get into a particular quandary. Anything you can do on the iGPU you can do 4~5x of on a dGPU; this relates to everything from vector processing power to memory access speeds / bandwidth. In order for an iGPU to be "better" than a dGPU the workload would have to be small enough that it would be completed in less time than it takes the dGPU to receive it.
Ex
100 instruction sets, 3 cycles each (load, execute, store) = 300 total instructions
iGPU: 2 per cycle with 1 cycle latency per batch
dGPU: 10 per cycle with 5 cycle latency per batch (500% more latency than the iGPU)
iGPU = 50 batches (50 total cycles used in latency) = (50 * 3) + 50 = 200 cycle total execution time
dGPU = 10 batches (50 total cycles used in latency) = (10 * 3) + 50 = 80 cycle total execution time
2 instruction sets, 3 cycles each (load, execute, store) = 6 total instructions
iGPU: 1 batch (1 total cycle used in latency) = (1 * 3) + 1 = 4 cycle total execution time
dGPU: 1 batch (5 total cycles used in latency) = (1 * 3) + 5 = 8 cycle total execution time
In the first instance the instruction load was big enough to fill the entire dGPU and take advantage of its larger resources. The 500% increased latency is a big factor in its total performance (30 execution cycles vs 50 latency), yet due to its 5x higher throughput it still easily beats the integrated option. In the second instance there was a much smaller set of instructions, because we need the results back before we can branch to the next segment of code. In that scenario we are unable to take advantage of the higher performance of the dGPU, and the iGPU becomes the better option.
This is what we are talking about when we mention that dGPUs won't be replaced. In larger workloads they are significantly superior to iGPUs due to them having larger heat and space budgets. iGPUs only pull ahead in small, quick workloads where you need the results of the calculation ASAP.
Nice answer, but there are at least three problems with it:
The first problem is that you continue misunderstanding the main point. The principle of locality identified in exascale research is derived from the power-wall problem associated with the nonlinear scaling of the required silicon. It is not derived from latencies associated with integration. Thus all your discussion about latency is useless, because the problem that I am mentioning is qualitatively and quantitatively different.
The second problem is that your discussion about latency is wrong because you continue making incorrect extrapolations from current APU/dGPU designs that don't apply to future designs. E.g. you are assuming a throughput ratio of 5x, when the Nvidia die diagram clearly shows that the best ratio will be 9/8 = 1.125x for the future designs. As I showed before, the 1.125x better throughput of a future discrete GPU is completely outweighed by the cost of the dCPU--dGPU interconnect, which results in the future APU being faster than the future discrete GPU. Since the APU will be faster, the Nvidia engineers are not wasting their time on the design of any discrete GPU for their future top product.
The third problem is that I am not alone in my 'ignorance', but in the good company of every expert in HPC/GPGPU... like the aforementioned research team in the quote that you ignored and deleted in previous posts. As I said before, you don't need to convince me of anything. You need to convince the engineers that design the GPUs. You need to meet them, explain to them why they are wrong, and then convince them to abandon their plans and designs.
In short: your answers continue to misunderstand what I am really saying, continue to repeat the same mistakes corrected before, and continue to ignore the quotes from experts who openly disagree with you.
let me remind you of my reply again: i didn't see any post where palladin admitted that dgpus will not be able to provide 4-5x more performance than an apu. so where did you debunk it, juan? imaginary land!
give me an answer, juan
show me where he agreed with you that future gaming dgpus will be well behind apus
or let me hear from palladin whether he agrees with you that future gaming dgpus will be behind apus' igpus
and let me oversimplify all the discussion going on
here i accept that i don't know the abcd of computers (and admit that playing the flute to a buffalo (you) is a waste)
so take my question as a fresh question (and assume that i missed all the previous 263 pages of discussion, so don't talk crap like "i said it in my earlier posts")
and now will you enlighten me by giving me the specs of your apu, its year and cost? (keep in mind that the 390x is supposed to have 4224 gcn cores (2x the core count in 3 years, so you can imagine the performance of a single-die dgpu by 2020, let alone dual-gpu cards and the multi-card configs which your apu is going to take on) and it is only 2015, so your apu should be well ahead of the 390x)
and how will it cope with future high-resolution gaming? what resolution are you expecting this apu to handle for lag-free gaming?
Being able to mix and match X86, GCN and ARM logic onto processors is fantastically flexible.
Their big cores do need some work- although I think most of the problems with the earlier cores have been a matter of *when* they were released rather than the products themselves being fundamentally that bad. Phenom I was late (and there was a bug, admittedly). Phenom II sorted out many of the problems, was actually a pretty good product, and was competitive against Core 2 Quad. Phenom II X6 was a server part re-purposed as an answer to the first-gen Core processors; that was when Bulldozer was intended to be released (and actually Bulldozer doesn't look so bad against the first-gen Core i7). Trouble was, Bulldozer got delayed until well into the life cycle of Sandy Bridge.
One thing I have noted- AMD's high-core-count CPUs have been aging pretty well. I'm running on a Phenom II X6 which I got years ago and it's still fine now. I think part of this 'the pc market is doomed' idea comes from the fact that pc sales have dropped- but a large part of that is that PCs last so long now.
My parents have only just replaced their main system, which was rocking an Athlon 64 X2 5000 from the year it was released (2005, I think). For web browsing and light-duty stuff that was more than enough power for them. Unless you're into the competitive gaming scene there is little reason to replace a pc other than due to hardware failure (in my parents' case the ancient IDE hdd was pretty much dead).
i've been reading more on memory; it took a lot of time to understand a few things. most of this is new to me.
juanrga :
Sure, because performing three changes of units instead of starting with the correct unit, or dividing a number by two and then multiplying it by two to obtain the original number again, falls under "relevant, important and verifiable".
the above is a straw man argument. ironically, your straw man weakens your own claims, i.e. the bw numbers you posted. if you mock the underpinnings of the end results that you posted yourself, you're only ridiculing yourself. it woulda been valid if the end bw numbers didn't match up or if the calculations were wrong. the fact that you failed to provide details of your own claims and went on trying to ridicule the mechanisms shows your lack of understanding and research. this is as far as you go on this matter. i don't want to be a part of your fallacy anymore.
juanrga :
I provided you latencies (in ns) for both GDDR5 and DDR3. I also provided two links proving my point that it is a myth, and another two links from forums discussing the origin of the myth. Of course, you didn't read any of that, and if you follow the same typical pattern you will ask me to give the links or the numbers again.
i read them. it's not that they all lack credibility, though some of them do, because random forum users arguing in a thread don't make for a credible source. still, the data sheets and latency calculations showed the absolute latency being similar, or close. if i am to take that as correct, then that pretty much invalidates your claim of ddr3 in pcs being "slow" and gddr5 being "beyond" ddr3. but if those are just gddr5 proponents trying to force their point home using mathematics instead of measured findings, that's another story. why? because then other factors like cost, memory access protocols, power use, complexity etc. come into play.
i do ask you to provide more information though. for example, an explanation of the following:
juanrga :
DDR3 only handles an input or output but not both on the same cycle. GDDR handles input and output on the same cycle.
i'd like you to clarify what this means. hopefully, others will verify what you state.
juanrga :
It is not my speculation but what AMD planned to do as was reflected by many tech sites.
i wasn't concerned with amd's plans or other tech sites. i was trying to imagine kaveri as a mainstream consumer product aimed at casual pc gaming.
juanrga :
I see that I didn't understand your speculation. Ok. I understand it now. Well... In the first place, it is odd and goes against the goal of the integration of components.
it complies with the integration; it actually takes advantage of it. using on-board gddr5 ram as vram eliminates the need for a cape verde-class discrete gfx card and enables a lot of gaming performance in an intel nuc-like enclosure. ecs has released such a motherboard very recently. a bit of tweaking of that concept could easily make it real. another advantage would be the ability to use an on-board discrete gpu with dedicated vram, reducing the vertical size of the whole pc.
juanrga :
In the second place it breaks fundamental aspects of Kaveri such as huma and HSA.
i don't know much about huma, but i explicitly ignored hsa because at present, no games support hsa. since the apus are in mainstream consumer parts, those pcs are highly likely to be replaced by something fully hsa-compliant and running widely available hsa-enabled software in the future. right now, the hsa aspects of kaveri only benefit software developers and enthusiasts who want to play around with hsa; there is no appeal to the casual crowd.
juanrga :
In the third place, as reported in AMD original docs, the DDR3 controller and the GDDR5 controller are incompatible; this is the reason why AMD did plan to support either GDDR5 or DDR3 but not both at once.
if true, this is an important factor, since amd was in a position to choose either but not both. i speculated that amd chose both (at the expense of die area). imo the gain in igpu bandwidth woulda been worth the extra die space (for an aggregate 128-bit bus).
juanrga :
The idea of using quad channel but only with 1600MHz modules is even more weird!
Cheapest?
G.Skill RipjawsZ DDR3 1600MHz 4x4GB CL7: 151 €
G.Skill RipjawsZ DDR3 2133MHz 4x4GB CL9 - 151 €
The slower RAM costs the same as the faster RAM. :lol:
pricing changes on a daily basis. ddr3 prices have been on the rise since memory manufacturers shifted to mobile - which explains why the 1600 kit is so expensive. i'd still address this one. i saw the price you posted and immediately suspected why you didn't use a u.s. shopping site. turns out that the ddr3 2133 cl9 kit is 1.6v and is a fair bit more expensive. that proves how pricing is not really a good excuse in this case of an imaginary apu. if i really had to pick a quad-channel kit, i'd pick one with 1.5v ddr3 2133 4x 8GB, or 16GB when they become available. if i had to go cheap, i'd pick a random 4x 2GB ddr3 1600 kit for around $90.
juanrga :
Caza's inability to understand the difference between the IO bus frequency and the data frequency (sometimes named "effective frequency") resulting from the double-data-rate arch is not going to change the standard way of referring to DDR3 memory speed.
aww. you're trying, in futility, to play a game of semantics with naming conventions and actual measurement units. poor try.
juanrga :
I am sorry, but I will continue referring to 1600MHz memory as... 1600MHz memory and to 2133MHz memory as... 2133MHz memory.
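For what it's worth, the two conventions being argued over here are mechanically related; a minimal sketch of the standard DDR naming arithmetic (generic JEDEC-style figures, not anything specific to the kits above):

```python
def ddr3_numbers(data_rate_mt_s):
    """DDR transfers data on both clock edges, so MT/s = 2 x I/O clock (MHz).
    A 64-bit channel moves 8 bytes per transfer."""
    io_clock_mhz = data_rate_mt_s / 2
    peak_gb_s_per_channel = data_rate_mt_s * 8 / 1000
    return io_clock_mhz, peak_gb_s_per_channel

print(ddr3_numbers(1600))  # (800.0, 12.8)    -> "DDR3-1600" aka PC3-12800
print(ddr3_numbers(2133))  # (1066.5, 17.064) -> "DDR3-2133"
```

So "1600MHz memory" colloquially refers to modules whose I/O bus actually runs at 800 MHz but which deliver 1600 MT/s, which is why MT/s is the less ambiguous unit.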
And don't get me started on mobile OGL drivers; they're a giant minefield.
It's kinda sounding like the devs are hearing about the focus on OGL, and they're trying to remind everyone OGL sucks.
Let me guess:
Vendor A (the "Graphics Mafia") == Nvidia
Vendor B (the "idiots with software") == AMD
Vendor C ("They don't really want to do graphics") == Intel
Pretty much. AMD/ATI is well known for HORRENDOUS OGL driver support over the years.
Vendor D would be Qualcomm, whose mobile OGL drivers don't even come close to meeting the OGL spec. You have no idea how many times I've seen open source projects have to work around OGL problems in their driver stack.