AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Everyone knew that 176 GB/s is the maximum bandwidth allowed by the memory. The effective bandwidth depends on many factors, including the programmer's ability to use the hardware. If your code is bad or unoptimized, the hardware will not optimize it for you.

The geniuses at wccftech took a slide as the basis for their typical 'professional'-level ruminations. :sarcastic:
 
PS4 GPU has an Effective Bandwidth of 120 to 140 GB/s, Not 176 GB/s As Previously Reported
http://wccftech.com/sony-ps4-effective-bandwidth-140-gbs-disproportionate-cpu-gpu-scaling/

Not shocked; you have the CPU and GPU both fighting for the same memory resources. So when one is getting data, the other is probably blocked from accessing the bus.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


Depends on the use. If it's just an HTPC, you could go with an embedded solution; Sapphire has a 4"x4" board with CPUs that go from 6 W to 25 W. At that size you could mount it to the back of the TV and hide all the wires. It just depends on the application.

http://www.sapphiretech.com/embedded/product.aspx?pid=embedded_boards

Pair it with one of these if you need a drive.

http://www.storagereview.com/samsung_intros_wireless_optical_drive_and_portable_bluray_writer

 

The bus is a lot smarter than you give it credit for.
 

8350rocks

Distinguished


Yes, HTX can have multiple lanes, and the PS4 hardware was already known to have HTX from CPU to memory, and GPU to memory, bypassing the northbridge to get data directly.

Now, the interesting question is: are developers using the direct route to memory efficiently? That is likely where this generation's performance gains will come from: programming for higher efficiency on the direct route to memory, among other optimizations for the consoles.
 


Ah, but you forget the case where the CPU/GPU want to access the same data. That will kill your bandwidth in a hurry, since the memory access will fail due to software locks. So one component has to sit and wait for the other to finish.

You're never going to get perfect theoretical numbers anyway, simply due to overhead.
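The serialization being described is easy to sketch. This is only a toy illustration (the lock, the hold time, and the "client" framing are all invented for the example, not how console firmware actually works), but it shows why two parties sharing one lock can never overlap their accesses:

```python
import threading
import time

lock = threading.Lock()
HOLD_S = 0.05  # stand-in for the duration of one component's turn on the shared data

def client():
    with lock:              # only one of "CPU"/"GPU" can be in here at a time
        time.sleep(HOLD_S)  # pretend to read/write the shared data

t0 = time.perf_counter()
threads = [threading.Thread(target=client) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t0

# The two accesses cannot overlap, so the pair takes at least 2 * HOLD_S:
# one component sat and waited for the other to finish.
print(elapsed >= 2 * HOLD_S)  # -> True
```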
 
I thought the throttling was due to the CPU (under load) eating into the iGPU's thermal budget and causing clock-rate throttling, thus lowering the BW. The CPU and the iGPU should have full "visibility" of each other's caches, so if one needed to read the other's data, it could.
 


Did you not read? Why the hell would I "get rid of" consoles that I actually use just to satisfy some internet guy's feelings? And no, you can't change the parameters of the problem to fit your preconceived solution; the real world doesn't work that way. The living room PC is a cross between an HTPC and a casual gaming box, so why in the hell would I over-engineer it by making it some large blockish device just for "internet coolz points!!!"? Small, stylish (or hidden), low power and low noise are my requirements. Those kinds of cases and coolers have a 65 W CPU cap, with 100~120 W being total system usage; otherwise you start to need more airflow, which in turn means larger cases and/or more fans. That is what makes the A8-7600 stand out: it provides about 90% of the actual graphics power of the 7800/7850, but with a low enough TDP to make use of ULP components.

I guess the point I'm making is that I entertain guests, and having my living room look nice is a requirement; it can't look like a gamer's shack or a geek's den. That is what my lab is for (that picture was taken when I was testing a bunch of stuff; in day-to-day use those game boxes and much of that wiring aren't visible).
 


Testing memory speeds with a single-threaded program is a very bad way to go about it; all it does is test how good the cache is at prediction. The best way to test is multiple independent memory copies happening simultaneously: create an imaginary 10 GB data source, split it into four 2.5 GB chunks, then do four bulk copies at 16 MB block sizes. That should completely bypass all cache and prefetch tricks on all systems and give you an absolute memory bandwidth.

This is important because while the CPU tends to access memory in a serial fashion, GPUs access it in very large bulk transfers in parallel. So just because a single thread on the CPU gets ~60% cache efficiency doesn't mean the GPU would get the same, especially since the GPU doesn't use the CPU's caching mechanism at all.

In the case of AMD uarchs, the limiter on memory access isn't the IMC but the prefetcher. The IMC will sit there idle, waiting for the prefetcher to tell it to fetch something.
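A scaled-down sketch of that test in Python (sizes shrunk from the 10 GB / 2.5 GB figures to keep the demo small; `ctypes.memmove` releases the GIL, so the four copies can actually run in parallel on plain threads):

```python
import ctypes
import threading
import time

CHUNK = 32 * 1024 * 1024   # stand-in for the 2.5 GB chunks (scaled down)
BLOCK = 16 * 1024 * 1024   # 16 MB block size, as suggested above
NCOPIES = 4                # four independent simultaneous copies

def bulk_copy(src, dst):
    # Copy CHUNK bytes in BLOCK-sized pieces, like a GPU-style bulk transfer.
    for off in range(0, CHUNK, BLOCK):
        ctypes.memmove(ctypes.addressof(dst) + off,
                       ctypes.addressof(src) + off, BLOCK)

srcs = [(ctypes.c_char * CHUNK)() for _ in range(NCOPIES)]
dsts = [(ctypes.c_char * CHUNK)() for _ in range(NCOPIES)]
for s in srcs:
    ctypes.memset(s, 0x5A, CHUNK)  # fill with a pattern so the copy is verifiable

t0 = time.perf_counter()
threads = [threading.Thread(target=bulk_copy, args=(s, d))
           for s, d in zip(srcs, dsts)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t0

total_gb = NCOPIES * CHUNK / 1e9
print(f"copied {total_gb:.3f} GB in {elapsed:.3f} s "
      f"-> {total_gb / elapsed:.1f} GB/s")
```

The blocks are far larger than any cache line or prefetch stream, which is the point of the exercise: the number you get reflects the memory subsystem, not the cache.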
 

jdwii

Splendid
Well, it's all up to personal preference I guess. Obviously there is a market for it; otherwise these Steam boxes, or those types of cases, wouldn't exist. I know some people are very picky about what is in their living room; it's usually called WAF. I personally don't care to hide that I'm a geek around others, I would feel like I was lying to them; if they didn't like it I would show them a map I can print out with a layout of my house and the door.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Indeed! And there are other improvements beyond the main bus, such as a second bus that allows the GPU to read/write directly to system memory, eliminating synchronization with the CPU. This second bus can pass almost 20 gigabytes a second.

It is worth mentioning that the unified memory pool of the PS4 was the largest piece of feedback the company got from game developers, because it solves "a common bottleneck where data has to be shuffled from main memory to graphics memory and back again in non-unified designs".
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
As far as game software and hardware are concerned, we've hit a standstill. We're not strong enough for ray tracing, yet we've got enough shader power to let mid-range cards perform at desired resolutions. CPUs have nowhere to go, because we've reached the point where single-core performance is not really improving drastically while games are stuck running only a few threads.

DirectX has stagnated beyond belief. I remember getting a DX9 card and playing UT2k4 and having my mind get blown at all the cool new effects. That sort of thing hasn't existed in a long time. It's to the point where you can use a 6950 or GTX 480 and not miss out on anything significant while playing games.

There are two things that can happen: either this market gets abandoned entirely, or the software problems get solved via things like Mantle. GPUs need to be stressed in ways beyond shader power, and CPUs need some task that scales to a lot of cores. PC gaming needs some sort of killer feature that makes people want to upgrade. It's grown way too stagnant, as I said before. People are content with mid-range cards and mid-range parts. The raw number of pixels a card has to push hasn't really gone anywhere. Adopting LCD was probably one of the worst things to happen to enthusiast GPUs.

I worry about shifting so hard to mobile. We've seen what can happen there: eventually the smaller, cheaper chips catch up (like MediaTek) and it starts a big race to the bottom on device prices, and it has left a lot of ARM markets with small margins. I realize some of you really believe mobile is 100% the future, but Nvidia has spent five product cycles trying to push a premium, high-performance ARM part and it hasn't gone anywhere. So they can either race MediaTek and company to the bottom or remain uncompetitive. Yes, the market is growing, but is it the kind of market you want to be in? Competing with a bunch of cheap Chinese companies and anyone else who can buy an ARM license? Compare that to a high-end product like a large GPU, where the barrier to entry is so massive that a new company can't just show up with a competitive part. The fact that ARM and the ARM chip makers are pushing ARM into other things, like servers, tells me they want out of the mobile market: they either see it going downhill as it gets eaten from the bottom up, or they think demand is going to die down soon.

I do think that if Mantle, OGLNG, and DX12 end up letting game developers push hardware in ways they couldn't before, with *some things* able to scale to more cores and existing hardware pushed much harder, it could turn sales around. If we got games that made the mid-range cards cry and run slow, instead of running fine at 1080p, sales would change. The market is there but the demand is not. And if ARM mobile devices keep going the way they are going, they will end up the same.
 
AMD APUs are getting price drops too, though not as much as I expected.
http://www.xbitlabs.com/news/cpu/display/20140821121037_AMD_to_Lower_Prices_of_A_Series_APUs_for_Back_to_School_Season.html

Avexir Readies 3.40GHz DDR4 Memory Modules.
DDR4 Could Hit 3.40GHz This Year
http://www.xbitlabs.com/news/memory/display/20140821223332_Avexir_Readies_3_40GHz_DDR4_Memory_Modules.html
This, I wanna see running on a Carrizo PC.

AMD Radeon R7 250XE GPU Is Targetted Directly at 1st Generation Maxwell – No Power Connector
http://wccftech.com/amd-radeon-r7-250xe-no-power-connector/
 

jdwii

Splendid


Yeah, the 7850K is still the same price, man, I have no idea WTF they are thinking. The 250XE seems like a simple downclock; I thought they would make an update to their GCN design and go right after the 750 Ti. When are we supposed to see an update to their GPU series? I'm pretty sure the 290X is nothing more than a 7970 GHz Edition with more cores.
 


7700K to $140 puts it inside the price range where it's actually worthwhile. And WTF at the 7600 going down to $100.

DDR4 is going to do very interesting things for APUs. 3400 MT/s nets you 27.2 GB/s per channel, so 54.4 GB/s in a typical dual-channel configuration. Of course we won't see that at value pricing for another year or two; that will be when iGPUs get another jump in capability.
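The arithmetic behind those numbers is just transfers per second times bus width (a 64-bit channel moves 8 bytes per transfer). A quick sketch; the helper name is mine, not any real API:

```python
def channel_bw_gb_s(mt_per_s: float, channels: int = 1, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

print(channel_bw_gb_s(3400))              # DDR4-3400, one channel -> 27.2
print(channel_bw_gb_s(3400, channels=2))  # dual channel           -> 54.4
```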

Got done reading and do people still not understand the relationship between CL and clock rate?

For those wondering why DDR4 has such higher "latency": it's because physics places a real limit on the refresh of a memory cell in a DRAM configuration. That limit is about 7 ns on really good silicon; you can get faster if you go to SRAM, but it gets really expensive really fast. Since timings are measured in clock ticks, the faster the clock, the higher the CL needs to be to stay above that 7 ns barrier.

DDR3-1600 CL8 is 10 ns, DDR3-2133 CL11 is ~10.3 ns, DDR3-2400 CL12 is 10 ns, and so on. So the currently advertised latencies are about what you'd expect from a new technology, and once it matures you'll see the same 10 ns for mainstream parts, with ~7 ns at the expensive end.
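The conversion being used above, spelled out (helper name is mine): CAS latency in nanoseconds is the CL cycle count divided by the memory clock, where the clock is half the MT/s rate because DDR transfers twice per clock.

```python
def cas_ns(mt_per_s: float, cl: int) -> float:
    clock_mhz = mt_per_s / 2         # DDR: two transfers per clock tick
    return cl / clock_mhz * 1000     # cycles / MHz -> nanoseconds

for rate, cl in [(1600, 8), (2133, 11), (2400, 12)]:
    print(f"DDR3-{rate} CL{cl}: {cas_ns(rate, cl):.2f} ns")
# prints 10.00 ns, 10.31 ns, 10.00 ns respectively
```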
 

sapperastro

Honorable
Jan 28, 2014
191
0
10,710


Be careful what you wish for. When prices were going through the roof and GPUs were outdated before the first 12 months had passed, I knew a hell of a lot of people who leaped from them to consoles and mobile devices. Those people have been streaming back because of the affordability these days. It used to take a LOT of money to keep up with the top end 8+ years ago, and even I used to get frustrated when my 12-month-old super PC became a cheap hooker in a year or so's time.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Kaveri probably has quite a nice memory controller... for GDDR5.

http://www.chip-architect.com/news/Kaveri_Trinity_2014-01-07.jpg

Unfortunately, they've never released a mobo with GDDR5.
 

Slobodan-888

Reputable
Jul 17, 2014
417
0
4,860
I don't quite understand you.

Someone is planning to release a motherboard with integrated GDDR5 memory that can be used by the APU's iGPU (instead of using system RAM)?

Edit: OK, I see what you mean. But why is it so limited with DDR3? And can you, perhaps, help me with my problem?
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


I heard that AMD expected lower prices for GDDR5 in 2014. Maybe they'll release something with GDDR5 in the future. For now DDR3 is the only option, and the A10-7850K can't show its whole potential.
 


I very much doubt we will ever see Kaveri with GDDR5 on the desktop. There is a slim possibility that it may show up in mobile though (I don't think mobile Kaveri parts are out yet?)...
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780

Mobile parts are out, just no devices on the market yet.

I have to correct myself:
http://www.anandtech.com/show/7702/amd-kaveri-docs-reference-quadchannel-memory-interface-gddr5-option
It's a DDR3 controller, but made wider so it could also be paired with GDDR5.
 


Ah, so is that to allow it to Crossfire with a GDDR5 dGPU then?
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
The group that designed AMD's Bulldozer family had problems with both the caches and the system memory. To get better caches and better memory controllers, you need both smart people and money. AMD lacked both, and Bulldozer is what it is.

It is worth mentioning that one of Keller's main tasks during his recent years at AMD has been the cache subsystem. AMD has developed dozens of new techniques to improve caches. Part of this cache work will be transferred to the Excavator modules, but the full set of improvements is aimed at the new architectures: K12/Zen.

Like other companies, AMD supports the official DDR specs. The latest DDR3-3300 modules are not part of the official JEDEC specs, and thus neither AMD nor Intel supports them officially. Overclocking is not officially supported, and it depends on the silicon lottery.

DDR3 is at its physical limits. AMD wouldn't waste time and money developing an improved memory controller when the modules cannot scale up enough and DDR3 is going to be replaced soon. This is why they turned their eyes towards GDDR5 memory. That memory is much faster than DDR3, and a six-core version of Kaveri with a more powerful iGPU and GDDR5 as system memory was planned, but it had to be abandoned at the last minute because one of the companies making the GDDR5 DIMMs went out of business.

The GDDR5 memory controller is still in the Kaveri die, but it was fused out. AMD docs for Kaveri still mention the quad memory controllers: DCT0, DCT1, DCT2, DCT3.

No future APU will use GDDR5 as system memory, because the company that would make the DIMMs remains out of business. AMD is now moving to HBM, a new JEDEC standard, which can provide more bandwidth and better efficiency than GDDR5.

Finally, I mention that Carrizo officially supports DDR4-2400. Thus DDR4 will not bring any bandwidth benefit to Carrizo APUs and, in fact, Carrizo mobile will probably only support DDR3. DDR4 support makes sense for the server version of Carrizo: Toronto.

We should see improvements, however, from the new cache subsystem (which, it seems, also reduces latencies).
 

CooLWoLF

Distinguished
The new 95 W 8370E is a very interesting addition to the FX line. Considering how well the 8320/8350s overclock, I am excited to see how far someone can push the 8370E with the extra headroom its reduced TDP provides.
 