AMD CPU speculation... and expert conjecture


juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. In the first place, both Sony and Microsoft chose AMD because no other company could provide the hardware they needed for the consoles. In fact, Intel was dropped from the competition very quickly. The claim that your Core i3 or Core i5 laptop CPU will remain viable for gaming in the future because the consoles are weak is plainly wrong, since console performance is comparable to that of a top gaming desktop PC.

Also, 30 fps on consoles usually means a 'constant' 30 fps, whereas 60 fps on PCs usually means an average 60 fps. As many gamers know, console games at 30 fps can run more smoothly than PC games at 60 fps, providing better gameplay.

Second, developers can target 60 fps on consoles, but since 30 fps is good enough for most gamers (see above), developers prefer to spend the remaining performance on other aspects of the game. Indeed, one of your links says:

Eidos Montreal producer Stephane Roy has suggested, telling VideoGamer.com that he would "prefer to have better physics" in PS4 Thief than a smoother "60 frames per second frame rate".

Third, Carmack told us some time ago how he and his team had problems with PCs:

It is extremely frustrating knowing that the hardware we've got on the PC is often ten times as powerful as the consoles but it has honestly been a struggle in many cases to get the game running at 60 frames per second on the PC like it does on a 360

Something similar will happen with this new generation.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


HT/HTX breaks hUMA, because HT/HTX gives memory access times that depend on the memory location relative to the processor. Memory on the main bus would be faster than memory accessed through the HTX slot.

In fact, HyperTransport is used in NUMA architectures. AMD itself uses a proprietary coherent extension of HyperTransport for ccNUMA.
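
To make the local-vs-remote point concrete, here is a minimal sketch (my own illustration; it assumes Linux with libnuma and a machine with at least two memory nodes, nothing taken from AMD documentation) of what "access time depends on where the memory sits relative to the processor" looks like in practice:

// Sketch: time accesses to a buffer on the local node vs. a remote node.
// Build with: g++ numa_demo.cpp -lnuma
#include <numa.h>       // libnuma (Linux)
#include <chrono>
#include <cstddef>
#include <cstdio>

// Touch one byte per cache line and report how long it took.
static double touch_ms(char* buf, std::size_t len)
{
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < len; i += 64)
        buf[i] += 1;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    if (numa_available() < 0 || numa_max_node() < 1) {
        std::puts("This demo needs a system with at least two NUMA nodes.");
        return 0;
    }
    numa_run_on_node(0);                              // keep this thread near node 0
    const std::size_t len = 256u * 1024 * 1024;
    char* local  = static_cast<char*>(numa_alloc_onnode(len, 0)); // node 0's controller
    char* remote = static_cast<char*>(numa_alloc_onnode(len, 1)); // behind the coherent link
    std::printf("local  node: %.1f ms\n", touch_ms(local,  len));
    std::printf("remote node: %.1f ms\n", touch_ms(remote, len));
    numa_free(local, len);
    numa_free(remote, len);
    return 0;
}

On a single-socket APU the two numbers are the same; it is across a HyperTransport/HTX hop that the remote figure grows, which is the non-uniformity being argued about here.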



I already provided you a link and a quote saying:

The HSA Foundation has released a new multicore architecture specification called hUMA or heterogeneous uniform memory access.

Therefore hUMA is an HSA Foundation thing. However, it is true that it doesn't seem to be part of the HSA specification. It is not incompatible with it either.



There is no misunderstanding. The rumour was based on leaked documents that clearly indicated that the dual memory controller could use either DDR3 or GDDR5, but not both at the same time. This is expected, because a mixture of DDR3 and GDDR5 would break hUMA.

I still hope that I will see a high-end APU using GDDR5 as system memory. Several companies will be releasing GDDR5 modules in SO-DIMM format. It is speculated that this is for a high-end Kaveri or Carrizo.

What do you mean by "latency of a DIMM"? GDDR5 latency is essentially the same as DDR3: about 10 ns.

A Kaveri APU with GDDR5 will be much faster than one with DDR3. The HD 7750 has 512 SPs, and the GDDR5 version is about 50% faster than the DDR3 version.
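
To put rough numbers on that (the transfer rates and bus width below are my own illustrative assumptions, not confirmed Kaveri specs), the difference is almost entirely peak bandwidth:

// Back-of-the-envelope peak-bandwidth comparison (illustrative numbers only).
#include <cstdio>

int main()
{
    // bandwidth (GB/s) ~= effective transfer rate (GT/s) * bus width (bytes)
    const double bus_bytes      = 128.0 / 8.0;  // assumed 128-bit interface
    const double ddr3_rate_gts  = 1.866;        // assumed DDR3-1866
    const double gddr5_rate_gts = 5.5;          // assumed 5.5 Gbps GDDR5

    std::printf("DDR3-1866, 128-bit : %.1f GB/s\n", ddr3_rate_gts  * bus_bytes);
    std::printf("GDDR5 5.5G, 128-bit: %.1f GB/s\n", gddr5_rate_gts * bus_bytes);
    return 0;
}

Same ~10 ns latency either way, but roughly three times the peak bandwidth, which is what a bandwidth-starved iGPU actually notices.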
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. The GPU has improvements you won't find in any other (not even in top $1000 dGPUs). And the CPU runs circles around your i5. As said above, the dev kit uses FX chips. There are no FPS issues; I explained this to you just minutes ago.

As is well known, AMD won both console contracts because no other company had the leading technology. AMD won the benchmarks and performance simulations. Intel was dropped early in the race.

Physics, AI, and other intensive computations will be done on the GPU because no CPU on the market (AMD or otherwise) can offer the same performance. Only 4 CUs (the PS4 has 18 CUs) provide much more performance than an i7-3770K.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Of course AMD also won on price. Intel charges over $200 for the i5-3210M. That's half the BOM cost, and you'd need a $200+ class GPU to go with it, which would rocket past the XBone price. How many people would buy a $600 PS4?

Would you also prefer they use Iris Pro x2 and drive the price further, to something like $1000 for a console?
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


Aren't consoles hooked up to TVs and not 120 Hz monitors? I wonder what the refresh rate just might be...
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. This was explained to you minutes ago. Do you read? As I said, the simulations and benchmarks run by Sony and Microsoft showed that AMD's solution was faster than anything Nvidia could provide. Nvidia didn't have the technology. You forgot to cite this link from the same site:

http://www.techradar.com/news/gaming/consoles/amd-on-the-ps4-we-gave-it-the-hardware-nvidia-couldn-t-1141607

As I said, that CPU runs circles around your i5. But this is unimportant, and that is why Cerny doesn't pay attention to the CPU. No CPU on the market can compete with the GPU regarding physics, AI, and other performance-intensive computations. Even a small part of the GPU has more performance than an i7-3770K. This was explained to you minutes ago. Do you read?
 


HDTVs run at 1920x1080 resolution. Might as well call it a monitor for all the difference it makes.

It was easy to see the APU concept winning for embedded devices (consoles) for the same reason it wins in the mobile/SFF market. The price vs. performance vs. size is just amazing, especially if you can solder everything to a board. The better the memory you couple with it, the better it gets.
 


Not even remotely close. Your "i5" has 12 ALUs without HT, while the AMD chip has 16 ALUs. The iGPU's vector performance alone crushes anything Intel has so far (for CPUs).
 

8350rocks

Distinguished


Except Jaguar didn't come from Piledriver; if it did, it would be called...wait for it...STEAMROLLER!

Jaguar is an improvement over Bobcat, so stop this relentless bit of insanity. You can't win against an informed crowd, because they know what you're full of, and it isn't facts.
 


Eh, so MIT is trying what it tried, and failed, to do with its 10,000-CPU cluster back in the '80s. And they'll find the same exact problems at the end of the day: after about 16 or so CPUs, the performance gains decline to almost nothing, and after 32 or so, you start losing performance.

100 threads nightmarish? I'm not so sure. Once you've got 4, 6, or 8 threads working correctly, scaling up to 100 threads is not such a big deal - all the technical issues have been solved; you're just doing more of the same.

"Working correctly"? Do you even know how to thread? In a Windows environment:

http://msdn.microsoft.com/en-us/library/kdzttdcb(v=vs.90).aspx

Not that complicated; just an invocation of _beginthread (which itself is just a wrapper around CreateThread(), minus the memory leak).

The real issue is one of thread control.

Anytime you have multiple threads touching the same object, you HAVE to wrap that object in some form of thread-control structure (mutex, CRITICAL_SECTION, etc.) that prevents more than one thread from touching it at the same time. Whenever two threads try to touch the structure, one has to wait. The issue then becomes: if you have multiple threads that touch the same objects often, threading will likely reduce performance, due to the threads having to constantly wait on each other to get their work done (a problem that grows orders of magnitude worse if the threads in question have child threads which are ALSO forced to wait). There have been various ways over the years to improve the best-case outcome (Transactional Memory is a good example of such a workaround), but all tend to increase the worst-case penalty as a result.

In short: The more threads you have touching the same data structures, the more performance degrades.
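
To put that in code, here is a minimal sketch (my own illustration in standard C++11, with made-up thread and iteration counts) of the serialization being described: every worker funnels through one lock, so adding threads mostly adds waiting:

// Minimal sketch of lock contention: N threads all funnel through one mutex.
// The more threads you add, the more time each spends blocked, not working.
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex g_lock;        // the "thread control structure"
static long long g_shared = 0;   // the object every thread touches

void worker(int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(g_lock); // only one thread at a time
        ++g_shared;                                // trivial work done under the lock
    }
}

int main()
{
    const int kThreads = 8;            // hypothetical thread count
    const int kIterations = 1000000;   // hypothetical workload
    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t)
        pool.emplace_back(worker, kIterations);
    for (auto& th : pool)
        th.join();
    std::printf("shared = %lld\n", g_shared);
    return 0;
}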

Let's take a look at media encoding: you have some file, in some format, that looks something like this at a really high level:

Header data
Media Data
Media Data
Media Data
...
Footer data

In theory, since every chunk of Media Data has the same exact format, you can easily make a multithreaded solution that could (theoretically) scale up to the number of chunks of Media Data that need to be processed. Heck, anything to do with file processing/formatting is trivial to make parallel, and guess what? Productivity software has already gone this route.
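
Here is a sketch of that embarrassingly parallel case, assuming the chunks really are independent (Chunk and decode_chunk are stand-ins of my own, not any real codec's API):

// Sketch: independent "Media Data" chunks processed in parallel.
#include <cstddef>
#include <thread>
#include <vector>

struct Chunk { std::vector<unsigned char> bytes; };  // hypothetical per-chunk payload

void decode_chunk(Chunk& c)
{
    // Placeholder for the real per-chunk transform; it touches only its own
    // chunk, so no locking is needed between workers.
    for (auto& b : c.bytes) b ^= 0xFF;
}

void decode_all(std::vector<Chunk>& chunks, unsigned workers)
{
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Each worker takes every workers-th chunk: no shared writes at all.
            for (std::size_t i = w; i < chunks.size(); i += workers)
                decode_chunk(chunks[i]);
        });
    }
    for (auto& t : pool) t.join();
}

Scaling is limited only by the chunk count and the memory bandwidth, which is exactly why this kind of workload threads so well.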

Games are harder, much much harder. You can scale within a specific part of a game engine without too much difficulty, but threading the different engines themselves leads to significant deadlock situations.

You run into priority problems too. For example: the UI is obviously very low priority (remember: 16 ms per frame, and 16 ms is an eternity), but what happens if you are ready to render a frame and you still haven't had a response from the UI thread? Has it run yet, or was there simply no UI input for that period? What do you do if you start to render the current frame, and the UI thread finally runs and, guess what, you missed some UI to process? Delay it? Start over from scratch? There are no good answers here: either an FPS drop or input lag.
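
A sketch of that exact dilemma, using a frame-budget timeout (standard C++11; process_ui and the 16 ms budget are my own illustrative stand-ins, not how any particular engine does it):

// Sketch of the UI-vs-render race: the renderer has a ~16 ms budget and has
// to decide what to do if the UI thread hasn't delivered its result yet.
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

struct UiState { int widget_count; };            // hypothetical UI result

UiState process_ui()                             // stand-in for the UI thread's work
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20)); // sometimes it's late
    return UiState{42};
}

int main()
{
    auto ui = std::async(std::launch::async, process_ui);

    // Renderer: wait only as long as the frame budget allows.
    if (ui.wait_for(std::chrono::milliseconds(16)) == std::future_status::ready) {
        UiState s = ui.get();
        std::printf("rendering with fresh UI (%d widgets)\n", s.widget_count);
    } else {
        // Neither option is good: reuse last frame's UI (input lag) or keep
        // waiting for it (FPS drop). This path takes the input-lag route.
        std::printf("frame budget hit: rendering with stale UI\n");
    }
    return 0;   // the async future's destructor still waits for the late UI task
}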

You have scheduling issues. Say you have some function that appears to thread well, and you create 8 worker threads to process it. Great, but you still have the Windows scheduler to consider. The key issue: you can't continue until all 8 worker threads complete, and if any one of them gets held up for any reason whatsoever, everything comes to a screeching halt. So you need to weigh the cost/benefit of using multiple threads over just one, and often that's determined by how much work is being done. If the work is trivial, it often isn't worth it to thread.
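
And a sketch of that fork/join problem: eight workers spawned for one frame section, with worker 0 standing in for the one the OS happened to deschedule (the timings are invented for illustration):

// The frame section can't continue until all eight workers finish, so one
// preempted or slow worker stalls everything behind the join.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

void do_slice(int id)
{
    // Hypothetical slice of per-frame work; worker 0 pretends to be descheduled.
    auto cost = (id == 0) ? std::chrono::milliseconds(30)
                          : std::chrono::milliseconds(2);
    std::this_thread::sleep_for(cost);
}

int main()
{
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int id = 0; id < 8; ++id)
        workers.emplace_back(do_slice, id);
    for (auto& w : workers)
        w.join();                        // barrier: gated by the slowest worker
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("frame section took %lld ms (budget was 16)\n", (long long)ms);
    return 0;
}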

Then of course there are the idiotic arguments by some people here. My favorite was in response to these two images:

[Image: crysis3_cpua_jungle_1024.png]

and

[Image: crysis3_cpua_human_1024.png]


Obviously, the first is threaded better, because the FX-8350 does better than it does in the second, right?

Wrong. You don't code different levels; you code the engine. The FX-8350 does better simply because whatever environment is being processed goes through a part of the code that happens to scale well [likely the code that was offloaded from the GPU, if I had to guess]. The other levels (Post-Human and The Root of All Evil) favor Intel, again simply because of what is being processed. The idea that different stages are coded any differently (or at all) is laughable, but it's what people put out to try and prove their points.

So please, as the only person here who has actually coded and developed multithreaded software: enough. Creating threads is TRIVIAL; it's getting those threads to run in some coherent way that's impossible.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's basically what a Xeon Phi is: a 61-core Pentium (modified P54C with 512-bit SIMD units) on a card with 8GB of GDDR5 and 30MB of cache. I don't know how good they are, but they cost an arm and a leg: about $4,000 each.
 
part of original post:


pd cpu cores typically have a 4.2 ghz max turbo clockrate and a 3-3.6 ghz base clockrate. the jaguar cores in the consoles top out at 2.3-2.5 ghz. jaguar is also different from pd cpus in terms of target market and usage segment (along with the underlying architecture). to 'intel' it down for you: it's like comparing atom cpus to a core i5 3570k.
besides, what you really said was a blanket statement. there was no mention of piledriver in your original post. i could easily accuse you of lying. :D

additionally, 8x jaguar cores can outperform a core i3 3220 in certain tasks such as encryption and 2-pass video encoding - workloads that prefer more cores over per-core performance. 8 cores outperforming 2 cores or 2+2 cores isn't really a fair comparison. fair to the core i3, i mean. :D
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Indeed. Benchmarks show an 8-core Jaguar will easily beat an i5/i3 at similar clock speeds in anything multi-threaded. And that's without the benefit of GDDR5, which would push the scores even higher.

http://www.anandtech.com/show/6974/amd-kabini-review/3

We don't know what AMD is charging for these things but with the console being $399 it can't be much over $200. That's an enormous amount of processing power in there.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Ok, so you have similar CPU performance with the i5, but with a $200 budget your money is gone and you're left with Intel's iGPU. Now how well will that compete with an HD 7870? Do you REALLY want a console with an HD 4600 for graphics?
 

8350rocks

Distinguished


Well, considering the PS4 APU actually runs at 2.0+ GHz and not 1.6 GHz...you'd need to increase the numbers by roughly 25% and then double them.

Looks like you're still not there...

Stop trying already...you're using bad/old information to attempt to create a false equivalency.
 

8350rocks

Distinguished


Except for AMD's PS4 patents, which mention a maximum clockspeed of 2.75 GHz...or did you not know that already? It's only been on the web for 60-90 days now. Though I suppose with your head so buried in Intel propaganda, you likely never paid attention.
 

8350rocks

Distinguished


No, the actual patents for the APU in PS4 list a maximum clockspeed of 2.75 GHz.

http://vr-zone.com/articles/sony-ps4-dev-kit-passes-fcc-filing-2-75-ghz-max-core-clock-listed/45606.html

There's the scoop from vr-zone from 16 July this year
 

8350rocks

Distinguished


Really, did you file for the FCC licenses? Do you have the paperwork in front of you? Do you work for AMD?

The answer to all of the above is no...

Stop quoting old articles and stuff from the minority that flies AGAINST the vast majority of articles, one of which directly quotes the FCC paperwork, which your article clearly has not seen.

STFU about RAM speed...that is not the topic of discussion. The APU was filed with a maximum CPU CLOCK SPEED OF 2.75 GHz. Reconcile that however you must, but THAT is what AMD told the FCC in THEIR OWN DOCUMENTS.

/End of discussion about i5 vs. PS4 APU

EDIT: Let me help you, since you haven't bothered to read it:

A Sony PlayStation 4 dev kit just passed through FCC with a previously unheard of high 2.75 GHz max core clock speed listed in the product description. What clock speed will the CPU finally be set to run at?

[Image: Sony-PS4-dev-kit-FCC-2.75-GHz-max-clock.jpg - Sony PS4 dev kit passes FCC filing, 2.75 GHz max core clock listed]



Thanks to FCC, we are getting to have a closer look at the internal components of a Sony PlayStation 4 developer kit, and it sure is spicy in there. Up until now, from the first time when the PS4 was revealed to the world, we were of the solid impression that the 8-core Jaguar based AMD CPUs ticking away at the heart of every PS4 are running at 1.6 GHz core frequency. The FCC filing reveals that the maximum core clock of the PS4 CPU is 2.75 GHz, a number that we completely hadn't even dreamed of. Why so? Because Jaguar architecture is built for maximum efficiency around the 2-2.4 GHz mark. A 2.75 GHz core clock speed would require much higher power with a disproportionately lesser increase in performance over, say, 2 GHz. The listing does specify "max clock frequency" as 2.75 GHz; are we looking at "Turbo Core" here? If so, it would mean that individual PS4 cores can clock as high as 2.75 GHz when a task or game is less multi-threaded and depends more on fewer cores with faster performance. Bear in mind, the listing also reveals that the core should always function between a temperature range of 5C to 35C. We know for a fact that the PS4 will be launched in India later this year; perhaps we should pass a memo containing max temperatures of our country's capital (48C), just in case that slipped out of their minds.


Source: FCC | via Engadget


 

hcl123

Honorable
Mar 18, 2013
425
0
10,780


I think you don't understand what hUMA and NUMA are.

Imagine an integrated chip with CPU + GPU + DSP... all accessing memory at the same time. What gets served first? A priority scheme can mix different priorities for any of the "processing" elements, and any of those elements can be "the" fastest or the slowest depending on workload priorities... that is, REAL access times can vary widely for sure, even for sections of the same workload... and it is not because of this that anything breaks. A remote link would only add more latency, but it could alleviate contention, since not everything goes to the same pool but to several pools (in the end it could even be faster).

ccNUMA is not at the link level; it's at the PHY level. It's not the HT links that are "cc"; it's the protocol supported by the link PHY controllers that provides the feature. For ccNUMA, both ends of a link must have ccNUMA-aware PHY controllers, yet the same exact link, the same physical connection, that same HTX slot, could host a card of some kind without being "cc" at all. For hUMA it's the same thing: both ends of a link must have hUMA-aware PHY controllers... HTX, the link physical layer, and the data transmission layer just connect and transmit, nothing else. Like PCIe, every transaction is "negotiated"; it's up to the PHYs at both ends of a link (PCIe as an example: PCIe v3-capable links can host PCIe v2 cards; similar with HTX) to negotiate the kind of transmission.

Similarly, PCIe could be made "cache coherent"; it just needs the required extensions to the protocol, and support for those in the cache controllers... which could come sooner than expected, given Intel's hurry.

HTX breaks hUMA about as much as it breaks NUMA. Traffic that would go Xbar -> IMC (integrated memory controller) instead goes Xbar + I/O link -> I/O link + Xbar -> IMC, with the I/O part being kind of orthogonal; the real magic is in the link PHYs and the cache controllers.

I think you are pulling issues out just to create controversy... without a clue of what you are talking about.



Just out of curiosity, what is the link from which you've taken that part you quoted? ... (Point is... just don't take a confused IT journalist as an authority; don't propagate mistakes.)



I think SemiAccurate has an image of an AMD chip, an APU, supposedly Steamroller-based, where the more than probable DRAM device chips were "on package", that is, inside the socket (exactly like Intel's Iris).

But let's forget about GDDR5 for now: http://www.brightsideofnews.com/news/2013/6/1/amd-updates-roadmaps2c-shows-28nm-kaveri-for-socket-fm22b.aspx

And **NO**, dropping two pools of memory onto the same chip, like Intel does, has nothing to do with breaking hUMA... I know, you are out to get me and don't trust anything I say or present; fine with me. It has to do with cost; it's no wonder the top Iris part is supposed to be ~$600.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. This was explained to you minutes ago. Do you read? As I said, Intel was dropped early in the race because it couldn't provide the needed technology.

As I said, that CPU runs circles around your i5. Benchmarks show that a 4-core Jaguar is faster than an i3-3220. But this is unimportant, because no CPU on the market can compete with the GPU regarding physics, AI, and other performance-intensive computations. Even a small part of the GPU has more performance than an i7-3770K. This was explained to you minutes ago. Do you read?

It is funny that you criticize the CPU in the PS4 as weak and then claim that "compared to the 100 gflops you get on decent processors", when the CPU in the PS4 has more than 100 GFLOPS.

It is also funny that you claim that "Jaguar didn't improve much", when it is currently ahead of anything that Intel can provide.

Once again. Do you read?
 

8350rocks

Distinguished



:rofl:

Just curious, where does it say that it isn't the CPU?
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860


No one, except back in February:

http://ps4daily.com/2013/02/playstation-4-cpu-runs-at-2-ghz-rumor/

Dev kits are faster than the consumer product, running at a 2.75 GHz CPU clock.

 