AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
I wouldn't trust any HW-level encoding right now. Reviews have shown that QS and GPGPU / CUDA based encoding engines deliver inferior quality compared to a straight CPU-based software engine. Maybe in another generation or two they'll finally come into their own, but for now they're only useful for very casual conversions of home-made material.
 
Going to GCN next and making a few more changes gives AMD some wriggle room down the road, as their power usage looks very good.
It's a bit better overall than what they'd originally said, which is also good to see. And jimmy, read what Cleve said in the comments as to why there's no IVB.

I understand why, but they could have still included it with a disclaimer instead of just leaving it out. I want to see how the numbers stack up, since IB will be Trinity's main competition, not SB.

http://images.anandtech.com/graphs/graph5831/46668.png

I would have expected Intel's 22nm idle power to be able to keep up ... that's the same size battery. Oddly enough, Ivy is near the bottom of the list the whole time. I thought Ivy was supposed to be great for mobile; guess not.

It is still a 45W CPU vs a 35W CPU. I wouldn't mind seeing the 35W IB, or even the 17W IB, vs SB or vs each other.

Post some links on that. I would like to see QS 2.0 image quality comparisons, as from what I have read, QS 2.0 is faster and better quality.
 
Jimmy, I find that hard to believe; I mean, Intel's Quick Sync is still not better quality than using the plain CPU. But then again, I don't like any program but Handbrake, and I've tried a lot of them.
 
I meant that QS 2.0 is faster even when using better quality settings.

The quality might still be less, but I would like to see whether it's something major enough to notice (it's stated that QS 1.0 is fine for mobile devices but not TVs) or whether it's so small it's near impossible to actually see.
 
I would like to see QS 2.0 image quality comparisons as from what I have read, QS 2.0 is faster and better quality.

You've heard wrong.

I'll dig for the article; it was comparing modern GPU encoding (QS was tossed in with the GPU crowd) vs CPU encoding. They used multiple encoding suites and compared their outputs. All the GPGPU / CUDA / QS based engines had noticeably less quality than the CPU software-based engines. This isn't a knock against Intel (although you seem to be defending QS while ignoring the rest); it's a knock against all the engines that seem to be sacrificing quality for speed.

This is also prevalent in the 3D rendering world. Compare a scene rendered in HW with DX11 / OpenGL vs the same scene rendered with a ray-tracing engine: rendering in HW is significantly faster, ray-tracing is better quality.
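To illustrate what "tracing the light" actually involves, here is a toy sketch (plain Python, made-up scene values, nothing like a production renderer): it fires one ray per character cell, intersects it with a single sphere and shades the hit by how directly the surface faces the light. Every extra effect a rasterizer has to fake (shadows, reflections) falls out of doing more of exactly this.

```python
# A toy illustration (not a production renderer) of what "following the path
# light would take" means: cast one ray per pixel and test it against a sphere.
# Everything here is standard ray/sphere math; the scene values are made up.
import math

WIDTH, HEIGHT = 40, 20
SPHERE_CENTER = (0.0, 0.0, 3.0)
SPHERE_RADIUS = 1.0
LIGHT_DIR = (-0.5, 0.7, -0.5)            # direction *towards* the light

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def hit_sphere(origin, direction):
    """Return distance along the ray to the sphere, or None if it misses."""
    oc = tuple(o - c for o, c in zip(origin, SPHERE_CENTER))
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - SPHERE_RADIUS ** 2
    disc = b * b - 4.0 * c               # a == 1 because direction is normalized
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None

light = normalize(LIGHT_DIR)
for y in range(HEIGHT):
    row = ""
    for x in range(WIDTH):
        # Map the pixel to a ray leaving a camera sitting at the origin.
        u = (x / WIDTH - 0.5) * 2.0
        v = (0.5 - y / HEIGHT) * 2.0
        ray = normalize((u, v, 1.0))
        t = hit_sphere((0.0, 0.0, 0.0), ray)
        if t is None:
            row += " "
        else:
            # Shade by how directly the surface faces the light (Lambert term).
            p = tuple(t * c for c in ray)
            n = normalize(tuple(pc - cc for pc, cc in zip(p, SPHERE_CENTER)))
            brightness = max(0.0, sum(nc * lc for nc, lc in zip(n, light)))
            row += ".:-=+*#%@"[min(8, int(brightness * 9))]
    print(row)
```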
 
I did just find an article on QS vs CPU, and QS was lower quality. But it was QS 1.0, not 2.0, which is better quality than QS 1.0 while still being faster.

As for ray-tracing, it is very beautiful. But there are still some things that rasterization (what games currently do) still does better. I think the future of gaming will be a mix of both.

That means the future of encoding could be a hybrid of something like QS/CPU/CUDA, whatever, as each has its ups and downs.

And the reason I ignore the rest is that nothing AMD or nVidia has put out has beaten QS in terms of speed while keeping the image quality about the same. Until then I don't see them as direct competition to QS, but more like AMD's version of Thunderbolt: an alternate product that's not quite as good.

http://www.anandtech.com/Show/Index/5771?cPage=4&all=False&sort=0&page=21&slug=the-intel-ivy-bridge-core-i7-3770k-review#

They had a good example. Out of all of them, the HD 7970 is the worst; it's blurry and looks washed out. IB is better than SB and also looks a bit more detailed than plain x86.
 
Found it for you

http://www.extremetech.com/computing/128681-the-wretched-state-of-gpu-transcoding



Some articles are fun to write. This wasn’t one of them. Nearly four years after Badaboom debuted and a year after Intel launched Quick Sync, it’s amazing to see how poorly these applications balance hardware compatibility, file size, and image quality.

While it’s true that Xilisoft, Arcsoft, and Cyberlink all offer some degree of support for custom profiles, programs like MediaCoder and Handbrake already handle the under-the-hood options for the DIY crowd, and they do it for free. The entire point to buying a software solution in this field is that you’re paying for the front end. You pay someone else to worry about whether or not CABAC is enabled, or how many reframes should be used, or what H.264 profile is optimal. The minute you have to click on “Custom Profile,” the value argument for these products collapses, and you might as well be using something free with better presets and reliable final output.

With Badaboom gone and AMD silent on when it might offer an updated GPU encoding tool, there’s little good happening on the GPU encoding front. Nvidia recently debuted a new GPU encoding engine, but after seeing the quality of the available software tools today, we’re not holding our breath on when you’d be able to make real use of it. Being able to hypothetically encode video is one thing, actually yielding results that are practically useful is something else.

For now, use Handbrake for simple, effective encodes. Arcsoft or Xilisoft might be worth a look if you know you’ll be using CUDA or Quick Sync and have no plans for any demanding work. Avoid MediaEspresso entirely.

What it boils down to is that HW-assisted encoding sacrifices quality / file size (they're two sides of the same thing) for speed. You can make a high-quality render with a HW-assisted method, but it will be larger (as THG's Andrew ran into) than with a pure CPU method, or you can make it small but with less quality. That makes it fine for home rendering projects, home-made videos and whatnot, but absolutely useless for professional media or anything of importance. The anime I watch all has fansubs added to it; when I asked the subbers about using QS / CUDA / GPGPU, their answer (unanimous across the board, from unrelated forums) was not no, but HELL NO. They would rather it take the extra hour or two overnight to convert from RAW than deal with unpredictable quality or size. Most of them moving to 10-bit AVC 5.2 didn't make things any easier, either.
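To put rough numbers on that trade-off: for a fixed codec, output size is essentially average bitrate × duration, so a HW encoder can only give you a smaller file at the same quality by spending its bits more intelligently, not by being faster. A quick back-of-the-envelope sketch (the clip length and bitrates are made-up illustration values, not taken from any review):

```python
# Back-of-the-envelope: file size is just average bitrate * duration.
# The numbers below are illustrative, not measured.
def file_size_mb(bitrate_mbps: float, duration_s: float) -> float:
    """Approximate output size in megabytes for a given average bitrate."""
    return bitrate_mbps * duration_s / 8.0   # Mb/s * s = Mb, / 8 = MB

clip_seconds = 3 * 60                        # a 3-minute 1080p clip
for label, mbps in [("HW encoder, speed-tuned", 4.0),
                    ("HW encoder, high-quality mode", 10.0),
                    ("x86 software encode", 8.0)]:
    print(f"{label:32s} ~{mbps:4.1f} Mb/s -> ~{file_size_mb(mbps, clip_seconds):5.1f} MB")
```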
 
You ignored them because they were not your priority; my swipe at the sad state of ALL HW-based consumer media encoding was taken as an attack on Intel, one you felt obligated to defend against.

The comparison to RT was simple: RT will always produce the most realistic scenes, because it's literally following the paths light would take instead of using quick-and-dirty shortcuts to simulate what light would have done. RT is computationally expensive and thus not an option for real-time rendering, aka video games. I'll bet you my salary that 3D production studios use RT when they make their movies. They spend millions on high-powered render farms for the specific purpose of making those movies with RT, and they're doing it for a very good reason: quality over everything else.

For the same reason, media studios won't be using QS / GPGPU / CUDA or whatever HW flavor-of-the-month based engine for their rendering. When real money and reputation are on the line, sacrificing quality for speed is not an option. One QS'd video to one client is all it takes to sink your reputation.

And like I said, in a few generations this should change. Eventually everyone will figure out what works and what doesn't, and you'll get hybrid engines that utilize HW enhancements (whatever they're called) to produce high-quality video. Until that happens I'd advise anyone working on a serious project to stay away from any and all HW-based encoding engines.
 
Actually, no. I was just interested in reading the articles you have claiming that QS 2.0 has worse image quality. If you want to put it that way, fine.

And I am sure they all use RT. Hell, Intel was pushing it big time when they were planning Larrabee, and I am sure they will push it with KC if it's used for that purpose. I never said RT wasn't better looking, but for gaming, a mix of RT and rasterization will probably be best.

And you said the same thing I did, which is that at some point it will be a hybrid of QS/CUDA etc. and x86, or whatever works best.

I looked around, and it looks like no one has done an in-depth review of QS 2.0 like has been done with QS 1.0.

It is still an interesting idea, since for non-studio people like ourselves it is a major time saver, and YouTube won't need super-perfect quality anyway. Hopefully AMD and nVidia get theirs to perform faster to push each other, much like I want AMD to actually push Intel again, so we get to that point faster.
 
For those who didn't want to read through the above link, it's a review of modern consumer media encoding packages.

QS / CUDA / GPGPU are not engines; they're just processors that have special functions built in. It's up to the engine programmer to build a good engine that can talk to and handle those processors. The article lamented how most of those packages were complete crap, how all the accelerators (QS included) didn't live up to their hype, and how CPU software was better in the end.

Handbrake uses x264 as its engine, which is why it doesn't support QS / CUDA / GPGPU yet. The x264 team has been wrapping their minds around how best to utilize QS / CUDA / GPGPU support; if you read their developer newsletters you can see them talking about it. Eventually they'll make a good engine that supports QS / CUDA / GPGPU / etc. for some functions but uses the CPU for others, and that is when we'll actually be able to utilize those technologies.
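For a sense of what those "under-the-hood options" from the ExtremeTech piece look like when you drive x264 directly, here's a hedged sketch. The flags (preset, CRF, reference frames, CABAC, profile) are standard x264 command-line options; the file names and the particular values are made up for illustration:

```python
# A sketch of the x264 knobs (CABAC, reference frames, profile, rate control)
# that GUI front-ends are supposed to hide from you. Input/output names and
# the specific values here are illustrative only.
import subprocess

cmd = [
    "x264",
    "--preset", "slow",        # speed/quality trade-off bundle
    "--crf", "20",             # constant-quality rate control (lower = better/larger)
    "--profile", "high",       # H.264 profile the target device must support
    "--ref", "4",              # number of reference frames
    "--bframes", "3",          # consecutive B-frames
    # "--no-cabac",            # uncomment to disable CABAC for weak decoders
    "--output", "episode.264",
    "episode_raw.y4m",         # raw/lossless source
]
subprocess.run(cmd, check=True)
```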
 
Interesting read.

But it doesn't state what version of QS is being used. I looked and didn't see if they were using say a 2600K or a 3770K. That was what I wanted to see. I can't assume QS 2.0 because of the date on the article.

Still thanks for the link.
 
Handbrake took the OpenCL road (see the link in my previous post). With consistent standards, I assume this will mean that only some of the functions are offloaded to the GPU. Anand said that with OpenCL encoding, bitrate took a back seat while file size stayed the same. Guess we'll have to wait for proper reviews once it is released to the public.

For Handbrake, the majority of the encoding is done on the CPU (see link) and only some of the instructions are offloaded to the GPU. (I think this is nice, but I want to see benchies, dammit.)

:)
 
I must say, using that crappy software I hate so much, Ivy is the best-looking one, with a very slight improvement.

Look at the ground in the movie clip.

http://www.anandtech.com/show/5771/the-intel-ivy-bridge-core-i7-3770k-review/21

 
It wouldn't have mattered which version of QS was used. You're assuming that I believe the problems to be in QS / CUDA / GPGPU; they're not. The problems are in the engines that are used to do the encoding / decoding and muxing / demuxing of the material. Software developers haven't figured out what does and doesn't work, and most are trying to market those HW accelerators as a "speed" improvement ("my product can encode X video Y times faster than product Z"), and thus their engines are producing crap-quality videos. You can get better quality by manually tweaking the settings and increasing the bitrate, but that just blows the file size up and defeats the entire purpose of using HW accelerators.

Right now there is no decent engine out there that supports QS; Media Encoder is close but still not good enough. The best-known encoder is going with OpenCL (thanks Amdgirl) as its preferred standard, so we can assume that once it goes live it will become the standard across the net.

Another item of note, and this is something I can't provide a link for unfortunately, is that the types of functions exposed through GPGPU / OpenCL / CUDA / QS are not sufficient to do all the encoding work. Many of the complex math models used to search, index and optimize video data are not easily portable to a non-CPU standard (yet). You end up having to simplify the code in order to make it compatible with a HW-based accelerator. This creates yet another problem: those HW-based accelerators have different memory models than the CPU itself (with the exception of QS), and typically you can't run both within the same set of code. This is where AMD's recent addition of HSA to Trinity comes in. Trinity can process machine code that references graphics memory directly, and the GPGPU engine can run code that references CPU memory directly. This means you can write a single function that executes a call to the GPU, returns a value and immediately plugs that value into the CPU, and vice versa. No need to change modes or copy things into and out of memory manually. I don't expect this to become common until Intel implements it, but we're now one step closer to utilizing GPUs and CPUs together.
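To make the memory-model point concrete, here's a minimal sketch of what offloading a single step looks like today with explicit host/device copies. It uses pyopencl purely as an example framework, the kernel is a trivial stand-in for a real encode stage, and it assumes an OpenCL-capable device is present; the two copies bracketing the kernel launch are exactly what an HSA-style shared address space is meant to eliminate:

```python
# Sketch: today you must explicitly shuttle data between host and device
# memory around every offloaded step. HSA's pitch is that these copies
# (and the mode switch) go away.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global float *x) {
    int i = get_global_id(0);
    x[i] = x[i] * 2.0f;          // stand-in for a real offloaded encode step
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL_SRC).build()

frame = np.random.rand(1 << 20).astype(np.float32)    # pretend this is pixel data

mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR,
                hostbuf=frame)                         # copy #1: host -> device

prog.scale(queue, frame.shape, None, buf)              # GPU does its part

cl.enqueue_copy(queue, frame, buf)                     # copy #2: device -> host
queue.finish()
# ...only now can the CPU side of the encoder touch `frame` again.
```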
 
Like I said, I'm quite proud of those improvements. The A10 should be priced at low-end i5 and i3 levels (600-680$). At that price this APU will be a killer, but anything higher than that and it will not be.

If AMD can make me a 4.2GHz 8-core Piledriver with L3 and price it at the i5, I'm in! I see a 7% improvement in IPC (on average, compared to BD) and a 43% increase in CPU frequency (compared to the A8-3500M), which made the APU 20% faster overall than the A8-3500M in CPU tasks.

I agree with noob2222: I would like to see a 4-core BD clocked down to 2.3GHz and compared to this chip. That should give us a better clue about Piledriver's improvements, because unlike Llano-to-Trinity, BD-to-PD can't be clocked 43% higher, which leaves a smaller improvement. Even if they can clock the CPU to 4.2GHz, that's only 17% higher than the 8150. Let me try to take a guess at PD's improvements if they can clock it to 4.2GHz.

Trinity to PD will probably be 7-10% faster per clock on average because of the L3 cache. Trinity looks to have 7% better IPC compared to Bulldozer. The CPU will also probably be clocked at least 10% higher.

So I expect the new 8-core PD to be around 15% faster than BD at launch, with the possibility of 20% (all based on average performance, not "up to" statements), but nothing greater. It's also probably going to be 10-20% more efficient per watt.

So to sum it up, that's my last guess based on today's reviews and yesterday's news:
BD to PD will be 15-20% faster overall and 10-20% more efficient.
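As a sanity check on how the per-clock and clock-speed guesses combine (they multiply rather than add), here's a quick calc; the percentages are just the estimates above, not measured numbers:

```python
# Rough speed-up estimate: overall gain = IPC gain * clock gain (compounded).
# The inputs are forum guesses, not benchmarks.
def overall_gain(ipc_gain_pct: float, clock_gain_pct: float) -> float:
    """Combine per-clock and frequency improvements multiplicatively."""
    return ((1 + ipc_gain_pct / 100) * (1 + clock_gain_pct / 100) - 1) * 100

print(f"7% IPC + 10% clock  -> ~{overall_gain(7, 10):.0f}% overall")   # ~18%
print(f"10% IPC + 17% clock -> ~{overall_gain(10, 17):.0f}% overall")  # ~29%
```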

Does anyone else have any estimates? I would love to hear them! Try to be realistic.

I will be more conservative on the DT predictions regarding Vishera: I am going for a 10-15% improvement across the board, and a 15% improvement on the heat and power numbers. Overall, on synthetics, I believe it will be very competitive with Intel's SB/IB derivatives. As for the Trinity APUs, I am going to be more aggressive and say around a 20% across-the-board improvement.

The good news is, I will probably be replacing my Thuban now. 😀
 
VCE ... I was a bit disappointed with it. Months after it was available, AMD only enabled it for Trinity's release, and it was slower than Quick Sync (without hybrid mode). However, that could change with more software support, drivers and desktop Trinity chips. I am comparing among fixed-function encoders, btw, not including CPU-based software encoding, which seems to offer the best quality right now. Too bad neither AT nor Toms discusses file size or image quality; if they do, I've missed it. I hope Toms does an in-depth analysis of all the fixed-function encoders available today, e.g. Quick Sync 1 and 2, VCE and NVENC, and pits them against CPU-based encoding and GPU-based CUDA and OpenCL.
 
The interesting thing, looking at the numbers, is that Sandra is very "internal", meaning each test exercises only that very specific section. The tests I did, and Tom's, don't show the cache latency that plagued BD. I have a suspicion that Cinebench is a bit sensitive to latency, which is why it showed such a massive improvement even while lacking L3 cache: 30% at first glance.

Kinda funny: I tried running single-thread at 2.3GHz (no turbo) ... it was sooo slow it didn't even register a rating 😱

I'm going to go out on a limb here and say 10% IPC on the cores and a 10-20% improvement on the cache latency (hopefully more, but 10-20 is pushing it, imo).

This means almost all programs will benefit 10%, some 20%, and some that utilize all available resources possibly 30%.

As for AM3+ boards getting a BIOS update, pretty much all 900-series boards should be getting support for PD, as it seems the 1000-series boards were put on hold (probably till Steamroller).
 
Do you have access to a Trinity APU? If so, could you run CB11.5, then run it in single-threaded mode (takes a while to finish)? Then set processor affinity, run it in single mode again, and record the results. Use HWiNFO64 to see what speed the CPU reports throughout.
 
Here's an extract from TechReport regarding VCE, QS 2.0 and QS 1.0:

Both VCE and QuickSync appear to halve transcoding times... except the latter looks to be considerably faster. We didn't see much of a difference in output image quality between the two, but the output files had drastically different sizes. QuickSync spat out a 69MB video, while VCE got the trailer down to 38MB. (Our source file was 189MB.) Using QuickSync in high-quality mode extended the Core i7-3760QM's encoding time to about 10 seconds, but the resulting file was even larger—around 100MB. The output of the software encoder, for reference, weighed in at 171MB.

http://techreport.com/articles.x/22932/8
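Quick bit of arithmetic on those figures. The trailer's length isn't given, so absolute bitrates can't be derived, but the relative sizes say a lot (this only restates the numbers quoted above):

```python
# Relative output sizes from the TechReport figures quoted above.
source_mb = 189
outputs_mb = {"QuickSync": 69, "QuickSync (high quality)": 100,
              "VCE": 38, "Software encoder": 171}

for name, size in outputs_mb.items():
    print(f"{name:26s} {size:3d} MB  ({size / source_mb:.0%} of the source)")
```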

Cheers!

EDIT: Take a look at the game benchies... The FPS's the HD4k gives don't tell the whole story, indeed.
 
Compare a scene rendered in HW with DX11 / OpenGL vs the same scene rendered with a ray-tracing engine: rendering in HW is significantly faster, ray-tracing is better quality.

To be fair, that's because the features not implemented in rasterization come naturally as part of any ray-tracing engine (refraction/reflection of light, for example). The rendering equation, when fully implemented, is the only method for true photo-realism, but ray tracing is MUCH easier to implement overall than the full rendering equation would be.

I'd also note that if you want to go down the whole "quality" argument, one could point to minute quality differences between NVIDIA and AMD simply due to differences in how they implement DX. Implementation matters.
 