News Raja Koduri Leaves Intel to Found Software Start-Up

The key difference is that Intel needs iGPUs for its CPUs. If it doesn't design them internally, then they'll have to license IP from someone else. That makes their graphics division somewhat safer than the other things you mentioned. Especially if tGPUs come to more closely resemble dGPUs in their design and operation.
They've had iGPUs without AXG just fine for over 20 years. And they've licensed the building blocks for their iGPUs for a good chunk of those years, and only recently went in-house with them. I think they sourced them from Imagination Technologies originally? In any case, the point is they don't really need a "GPU division" or a "proper" discrete GPU to keep on haulin' iGPUs and even HPC monsters. They can just shelve those or sell whatever IP they've produced for it, and keep what they need for iGPUs and HPC.

EDIT: But I guess it's now impossible to "sell" AXG since it no longer exists, lel.

Regards.
 

bit_user

Titan
Ambassador
They've had iGPUs without AXG just fine for over 20 years.
Those couldn't scale up - that takes work. Even if they don't have to scale as big, they still needed to scale better than before, and you're going to need a team to design & maintain them.

And they've licensed the building blocks for their iGPUs for a good chunk of those
They've licensed patents, not the actual designs. Very different.

I think they sourced them from Imagination Technologies originally?
The PowerVR GPUs they used appear to have been limited to Atom-based phone and tablet products:


In any case, point is they don't really need a "GPU division"
Call it what you want, but they need a team designing GPUs from 32 EU up to the 384 EU that's rumored to show up in some Arrow Lake models. That's well beyond anything they've historically done.
 
  • Like
Reactions: cyrusfox

DaveLTX

Commendable
Aug 14, 2022
104
66
1,660
Those couldn't scale up - that takes work. Even if they don't have to scale as big, they still needed to scale better than before, and you're going to need a team to design & maintain them.


They've licensed patents, not the actual designs. Very different.


The PowerVR GPUs they used appear to have been limited to Atom-based phone and tablet products:



Call it what you want, but they need a team designing GPUs from 32 EU up to the 384 EU that's rumored to show up in some Arrow Lake models. That's well beyond anything they've historically done.
Which really puts into perspective what AMD has been doing: despite not giving the Vega and RDNA2 iGPUs a shared L3 cache with the CPU, unlike Intel, they still managed to beat Intel significantly.
Even though RDNA2 in iGPU form is slightly different from the discrete GPUs, it still manages to scale from 2 CUs to 80 CUs.
 
  • Like
Reactions: bit_user
Those couldn't scale up - that takes work. Even if they don't have to scale as big, they still needed to scale better than before, and you're going to need a team to design & maintain them.


They've licensed patents, not the actual designs. Very different.


The PowerVR GPUs they used appear to have been limited to Atom-based phone and tablet products:



Call it what you want, but they need a team designing GPUs from 32 EU up to the 384 EU that's rumored to show up in some Arrow Lake models. That's well beyond anything they've historically done.
These semantic inaccuracies you wrote are a bit weird to me.

I used "building blocks" intentionally. I don't know what you understand as a "building block", but it is not a "complete design". Swing and miss on that one, sorry.

Ironically, "PowerVR" is a full-fledged GPU design (EDIT: to clarify, I meant they used a flavour of SGX-something or a prior model), so it means they actually sourced that whole GPU externally; kind of like what they did with that experimental AMD+Intel packaged thing with... Kaby Lake, was it? Anyway, that was not the thing I was mentioning. Intel sourced shader patents (as building blocks) for their overall iGPU uArchs for years. To a degree, ARC's design is a derivation of that.

As for "scaling up": well, again, you don't need a full "GPU department" if you just stick to the core building blocks and pack more of them into a single piece of silicon, I'd say? I'll concede this may be way too oversimplified, but I know I'm not terribly off the mark. The difference would be how they intend to attack HPC going forward using EMIB and tiles, where the GPU is just one element of many on the package; again, not needing a "GPU department".

Welp, time will tell.

Regards.
 
Last edited:
Don't think too hard about "marketing logic". If a name sounds good, they're likely to use it.
Yeah, it could leave you scratching your head until it bleeds.
FWIW, HD 5970 wasn't called Hemlock - the GPU's codename was Tahiti. That's a fine code name, but I don't think it works as well as Polaris, with respect to marketing and selling GPUs.
I'm afraid that you're incorrect on both counts. Tahiti was the name of the HD 7970 / R9 280X GPU. I know this for a fact because I owned two of them (traded one for a mint reference HD 5870). I replaced them with the Fiji-based R9 Fury.

I also know for a fact that the HD 5000-series was the Evergreen series because I know that my HD 5870 is codenamed Cypress. Then there's the article by Hilbert Hagedoorn that compared the HD 5970 with the GTX 295 and has perhaps the funniest description of just how badly one card out-performs another. I knew when I read this that I would never forget it.

The article's name is HIS Radeon HD 5970 Introduction - The hemlock is here... and the paragraph is:
"HAWX is very lenient towards ATI cards thanks to DX10.1, but with so much brute power, NVIDIA has been working nervously to optimize their drivers. And as such the more recent drivers make the GeForce cards much more competitive. As such the GTX 295 pushes 42 FPS at 2560x1600 with 4xAA (on average), but the Radeon HD 5970 brutally sodomizes the GTX 295 here with nearly doubled up performance."

That kind of verbiage is something that I have never seen in any other tech article before or since. I remember being thankful that I wasn't drinking anything when I read it or I might have needed a new keyboard! :LOL:
 

DaveLTX

Commendable
Aug 14, 2022
104
66
1,660
These semantic inaccuracies you wrote are a bit weird to me.

I used "building blocks" intentionally. I don't know what you understand as a "building block", but it is not a "complete design". Swing and miss on that one, sorry.

Ironically, "PowerVR" is a full-fledged GPU design, so it means they actually sourced that whole GPU externally; kind of like what they did with that experimental AMD+Intel packaged thing with... Kaby Lake, was it? Anyway, that was not the thing I was mentioning. Intel sourced shader patents (as building blocks) for their overall iGPU uArchs for years. To a degree, ARC's design is a derivation of that.

As for "scaling up": well, again, you don't need a full "GPU department" if you just stick to the core building blocks and pack more of them into a single piece of silicon, I'd say? I'll concede this may be way too oversimplified, but I know I'm not terribly off the mark. The difference would be how they intend to attack HPC going forward using EMIB and tiles, where the GPU is just one element of many on the package; again, not needing a "GPU department".

Welp, time will tell.

Regards.
What Arc had that is very similar to GCN, however, was a hallmark of GCN... poor memory access, and it requires high occupancy to fill up the shaders

https://chipsandcheese.com/2022/10/20/microbenchmarking-intels-arc-a770/ It's not entirely wrong to call it a GCN continuation as a joke
 
  • Like
Reactions: bit_user
What Arc had that is very similar to GCN, however, was a hallmark of GCN... poor memory access, and it requires high occupancy to fill up the shaders

https://chipsandcheese.com/2022/10/20/microbenchmarking-intels-arc-a770/ It's not entirely wrong to call it a GCN continuation as a joke
Hm... I can't say I've looked at ARC's uArch in much detail, but I thought they re-organized the "OG" shaders into different blocks and made them more aligned to how they wanted the scaling to be in the work groups, no? This is to say, it's actually closer to the original execution units (EUs) from before ARC than GCN's grouping, no?

As for how they manage memory... I find that one a tad ironic, as Intel supposedly has most of the good interfacing patents for memory controllers... Maybe not for GDDR?

Anyway, I'll give that link a read.

Thanks and regards.
 

bit_user

Titan
Ambassador
These semantic inaccuracies you wrote are a bit weird to me.

I used "building blocks" intentionally. I don't know what you understand as a "building block", but it is not a "complete design". Swing and miss on that one, sorry.
Then why don't you come out and say exactly what you claim they licensed and when? I'm not interested in playing games, here. The information I provided says they licensed whole PowerVR GPUs for their Atom-based products, and that's it. If you know differently, I'd like to know about it and what your sources are.

Intel sourced Shader patents (as building blocks) for their overall iGPUs uArchs for years.
Patents are not a design. Licensing patents only gives you the right to use certain techniques. You still have to implement them by creating your own hardware design, unless you also license the design. In that case, it's primarily the implementation you're licensing, and the patent rights come along with it.

As for "scaling up". Well, again, you don't need a full "GPU department" if you'll just stick it to the core building blocks and just pack more of them in a single silicon piece, I'd say?
That's like saying CPU cores are just some registers and ALUs stuck together. If it's so simple, why is Intel having so much difficulty at it? Why has no one else managed to successfully enter such a lucrative market before them, in the past 20 years?

Have you ever written a graphics program using an API like OpenGL or Direct 3D? I have, and those APIs are orders of magnitude more complex than anything else I've ever used. GPUs have literally their own programming languages (HLSL or GLSL) for the code that runs on the GPU, and a very complex API for managing the resources and data structures and chaining together the shader code. You need teams of people to write and maintain those APIs and tools, as well as to port them to new hardware incarnations and optimize them for new games. And that was before raytracing and AI acceleration came onto the scene.
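Just to give a flavour of what "managing the resources" and "chaining together the shader code" looks like in practice, here's a rough, untested sketch of the bare minimum to compile and link one trivial GLSL shader pair. It assumes an OpenGL 3.3+ context is already current and that GLEW (or any other loader) is initialized; a real engine wraps thousands of lines of state, resource, and error handling around this:

// Untested sketch: compile and link one trivial GLSL shader pair.
// Assumes an OpenGL 3.3+ context is current and a loader (here GLEW) is initialized.
#include <GL/glew.h>
#include <cstdio>

static const char* kVertSrc =
    "#version 330 core\n"
    "layout(location = 0) in vec3 pos;\n"
    "void main() { gl_Position = vec4(pos, 1.0); }\n";

static const char* kFragSrc =
    "#version 330 core\n"
    "out vec4 color;\n"
    "void main() { color = vec4(1.0, 0.5, 0.2, 1.0); }\n";

static GLuint Compile(GLenum type, const char* src) {
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, nullptr);
    glCompileShader(s);                            // the driver's shader compiler runs here
    GLint ok = GL_FALSE;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(s, sizeof(log), nullptr, log);
        std::fprintf(stderr, "shader compile error: %s\n", log);
    }
    return s;
}

GLuint BuildProgram() {
    GLuint vs = Compile(GL_VERTEX_SHADER, kVertSrc);
    GLuint fs = Compile(GL_FRAGMENT_SHADER, kFragSrc);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, vs);
    glAttachShader(prog, fs);
    glLinkProgram(prog);                           // more driver work: linking and optimization
    glDeleteShader(vs);
    glDeleteShader(fs);
    return prog;
}

And that's before you allocate a single buffer or texture, set up vertex attributes, or issue a draw call.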

Remember how long it's taken the mighty Intel, just to optimize that software for its Alchemist GPUs, in spite of already having implemented and optimized it for their iGPUs over many years. Did you not read about how poorly the MTT S80 runs, in spite of using a hardware design licensed from Imagination? Probably most of that is down to software.

Although it might seem straight-forward to build a GPU, did you notice how long it's taking the 10 different GPU efforts in China to mount a credible threat to AMD and Nvidia, in spite of the fact that both AMD and Nvidia had design centers in Shanghai for like 15 years, which means there must be hundreds or thousands of Chinese engineers with first-hand knowledge about at least some aspects of GPU design?

To give you an idea of just how much complexity you're referring to as "building blocks", here's the 234-page GLSL specification for programs that have to run efficiently on a GPU:



Here's the 281-page data format specification:



Here are the 829-page OpenGL core profile and 1027-page compatibility profile specifications:



Here's an index of the 558 different ARB OpenGL extensions:



Here's the 2332-page Vulkan specification + non-vendor extensions:



Oh, and here's the AV1 specification, because these things have to encode and decode video, too.



Moving on to GPU-compute, here's the 328-page OpenCL core specification, the 253-page OpenCL kernel language specification, and the 397-page extension specification:



Of course, there are more codecs, APIs (including maybe something from Microsoft... DirectSomethingOrOther), oneAPI, and Intel's own optimized libraries.

If you think this doesn't take a whole department to implement, maintain, optimize, test, and extend, not to mention building hardware that runs it efficiently, then I guess I'd have an easier time explaining the size of a car company to someone who's never even looked under the hood/bonnet.
 
Last edited:

bit_user

Titan
Ambassador
I'm afraid that you're incorrect on both counts. Tahiti was the name of the HD 7970 / R9 280X GPU. I know this for a fact because I owned two of them (traded one for a mint reference HD 5870). I replaced them with the Fiji-based R9 Fury.
Okay, my bad. Must've been tired and flipped a bit (5 -> 0b101; 7 -> 0b111).

I also know for a fact that the HD 5000-series was the evergreen series
Yes, you're right. Sorry for wasting your time, but I gather you enjoyed the opportunity to recount that fun little quote.
😅
 
  • Like
Reactions: Avro Arrow

DaveLTX

Commendable
Aug 14, 2022
104
66
1,660
Hm... I can't say I've looked at ARC's uArch in much detail, but I thought they re-organized the "OG" shaders into different blocks and made them more aligned to how they wanted the scaling to be in the work groups, no? This is to say, it's actually closer to the original execution units (EUs) from before ARC than GCN's grouping, no?

As for how they manage memory... I find that one a tad ironic, as Intel supposedly has most of the good interfacing patents for memory controllers... Maybe not for GDDR?

Anyway, I'll give that link a read.

Thanks and regards.

It certainly operates a lot closer to GCN than it does to Xe/UHD. Maybe UHD was always just as bad... Ironically, even with a subpar architecture, AMD's engineers still found a lot to tune in Cezanne/Renoir/Raven Ridge, while Xe was a case of throwing more silicon at the problem. AMD, knowing GCN wouldn't scale, waited for RDNA2 to integrate it (RDNA1 didn't achieve good occupancy and therefore had poor silicon/power efficiency, although it was already reasonably far from GCN at that point).

AMD's engineers worry about the efficiency of every added feature, not just overall efficiency... somehow it worked out in the end (Jim Keller mentioned this: one of the original goals for Zen was that for every percent of performance added, power draw had to go up by less than a percent).
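Put as back-of-envelope math (my paraphrase of that goal, not Keller's exact wording): if a feature adds a fractional performance gain p at a fractional power cost w, performance-per-watt only improves when w < p:

\frac{(\mathrm{perf/W})_{\mathrm{new}}}{(\mathrm{perf/W})_{\mathrm{old}}} = \frac{1 + p}{1 + w} > 1 \iff w < p

E.g. +5% performance for +3% power gives 1.05 / 1.03 ≈ 1.02, so efficiency still nudges up by about 2%.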
 

bit_user

Titan
Ambassador
Maybe UHD was always just as bad...
The thing about Intel's GPUs that never added up for me is that they kept such a narrow SIMD width. Just 4-way! They also went dual-issue, while (AFAIK) everyone else has stayed single-issue. Increasing the issue rate to add a second 4-way SIMD pipeline added a lot more overhead than simply widening their SIMD would've.
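Back-of-envelope illustration (my own toy numbers, not a model of Intel's actual front-end): pushing a 32-lane chunk of work through 4-wide pipes takes 8 instruction issues whether you single- or dual-issue them; dual-issue just spreads those 8 issues over twice as much scheduling and port hardware, whereas widening the SIMD cuts the issue count itself:

// Toy arithmetic, not a model of any real GPU front-end: how many instruction
// issues it takes to push 32 lanes of work through SIMD units of a given width.
#include <cstdio>

int issues_needed(int work_items, int simd_width) {
    return (work_items + simd_width - 1) / simd_width;  // ceiling division
}

int main() {
    std::printf(" 4-wide SIMD: %d issues for 32 lanes\n", issues_needed(32, 4));   // 8
    std::printf(" 8-wide SIMD: %d issues for 32 lanes\n", issues_needed(32, 8));   // 4
    std::printf("32-wide SIMD: %d issues for 32 lanes\n", issues_needed(32, 32));  // 1
    // Dual-issuing two 4-wide pipes still consumes 8 issue slots; it just splits
    // them across two schedulers/ports, each dragging its own control logic along.
    return 0;
}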
 
Okay, my bad. Must've been tired and flipped a bit (5 -> 0b101; 7 -> 0b111).


Yes, you're right. Sorry for wasting your time, but I gather you enjoyed the opportunity to recount that fun little quote.
😅
No worries, I didn't take offence. It's useless knowledge anyway.

It's true though, any time that I get to remember Hilbert talking about one video card "brutally sodomizing" another video card I do crack a grin at how delightfully odd that concept is.
 
Then why don't you come out and say exactly what you claim they licensed and when? I'm not interested in playing games, here. The information I provided says they licensed whole PowerVR GPUs for their Atom-based products, and that's it. If you know differently, I'd like to know about it and what your sources are.


Patents are not a design. Licensing patents only gives you the right to use certain techniques. You still have to implement them by creating your own hardware design, unless you also license the design. In that case, it's primarily the implementation you're licensing, and the patent rights come along with it.


That's like saying CPU cores are just some registers and ALUs stuck together. If it's so simple, why is Intel having so much difficulty at it? Why has no one else managed to successfully enter such a lucrative market before them, in the past 20 years?

Have you ever written a graphics program using an API like OpenGL or Direct 3D? I have, and those APIs are orders of magnitude more complex than anything else I've ever used. GPUs have literally their own programming languages (HLSL or GLSL) for the code that runs on the GPU, and a very complex API for managing the resources and data structures and chaining together the shader code. You need teams of people to write and maintain those APIs and tools, as well as to port them to new hardware incarnations and optimize them for new games. And that was before raytracing and AI acceleration came onto the scene.

Remember how long it's taken the mighty Intel, just to optimize that software for its Alchemist GPUs, in spite of already having implemented and optimized it for their iGPUs over many years. Did you not read about how poorly the MTT S80 runs, in spite of using a hardware design licensed from Imagination? Probably most of that is down to software.

Although it might seem straight-forward to build a GPU, did you notice how long it's taking the 10 different GPU efforts in China to mount a credible threat to AMD and Nvidia, in spite of the fact that both AMD and Nvidia had design centers in Shanghai for like 15 years, which means there must be hundreds or thousands of Chinese engineers with first-hand knowledge about at least some aspects of GPU design?

To give you an idea of just how much complexity you're referring to as "building blocks", here's the 234-page GLSL specification for programs that have to run efficiently on a GPU:



Here's the 281-page data format specification:



Here are the 829-page OpenGL core profile and 1027-page compatibility profile specifications:



Here's an index of the 558 different ARB OpenGL extensions:



Here's the 2332-page Vulkan specification + non-vendor extensions:



Oh, and here's the AV1 specification, because these things have to encode and decode video, too.



Moving on to GPU-compute, here's the 328-page OpenCL core specification, the 253-page OpenCL kernel language specification, and the 397-page extension specification:



Of course, there are more codecs, APIs (including maybe something from Microsoft... DirectSomethingOrOther), oneAPI, and Intel's own optimized libraries.

If you think this doesn't take a whole department to implement, maintain, optimize, test, and extend, not to mention building hardware that runs it efficiently, then I guess I'd have an easier time explaining the size of a car company to someone who's never even looked under the hood/bonnet.
Ugh... too much text...

I'll apologize right off the bat as I could not find any hard sources for where Intel got their licenses from to build their graphics solutions over the years, but instead I'll give you this:

https://www.computer.org/publications/tech-news/chasing-pixels/intels-gpu-history

That is still an interesting read and will probably give you some of the answers you wanted out of me, from the context provided in that history lesson.

As for everything else: yes, building any piece of silicon is hard; thanks for reminding me, I guess... And I can't tell you exactly what experience I have with technology given what I do and who I work for, so apologies for that.

I'll stop here but just reinforce my point: Intel does not need a dedicated division for GPUs. All the building blocks for what they need out of a GPU can be handled by other, already existing units within the company. As for whether they should have a dedicated department for GPUs: well, they already tried it, so I doubt they'll try again anytime soon. At least, not until Mr. Pat finds a way to make it work the way he wants. Raja was clearly not the right person, and hindsight is always 20/20 (or it should be?), so there's that.

Regards.
 
It certainly operates a lot closer to GCN than it does to Xe/UHD. Maybe UHD was always just as bad... Ironically, even with a subpar architecture, AMD's engineers still found a lot to tune in Cezanne/Renoir/Raven Ridge, while Xe was a case of throwing more silicon at the problem. AMD, knowing GCN wouldn't scale, waited for RDNA2 to integrate it (RDNA1 didn't achieve good occupancy and therefore had poor silicon/power efficiency, although it was already reasonably far from GCN at that point).

AMD's engineers worry about the efficiency of every added feature, not just overall efficiency... somehow it worked out in the end (Jim Keller mentioned this: one of the original goals for Zen was that for every percent of performance added, power draw had to go up by less than a percent).
An interesting read, so thanks for the link.

Although, I'll have to disagree that GCN is closely related to how ARC's Alchemist was conceived. They did add interesting things to it, but Raja's team didn't solve the original problems with the UHD EUs, going by the write-up, so that makes me believe I wasn't completely off the mark by saying it's just a continuation of the original EUs' way of working (shader-wise and how they implement the FPUs across them). This is explained in the "occupancy" segment of the read, which I believe is the crux of ARC's issues now.

At a high level, I do see why you'd say it's similar to GCN though. It was a highly optimistic way of grouping the shader units, for sure, but Raja missed the mark, much like with GCN, on how best to keep all units busy.

Regards.
 

bit_user

Titan
Ambassador
instead I'll give you this:

https://www.computer.org/publications/tech-news/chasing-pixels/intels-gpu-history

That is still an interesting read and will probably give you some of the answers you wanted out of me, from the context provided in that history lesson.
Nope. Doesn't support your claim. That's all I'm interested in.

I've read Intel's iGPU whitepapers, and their evolution is plain to see from the wikipedia comparison article I linked.

As for everything else: yes, building any piece of silicon is hard;
The funny thing is my post focused mostly on the vast amount of software a GPU needs to support, to be viable for both rendering and compute, not to mention AI and media codecs.

That much is clear to see. Of course, you can't readily appreciate all that goes into implementing, optimizing, testing, and porting that software to multiple generations. However, it's still more transparent to us than what goes into designing the hardware that has to efficiently implement that functionality.

I can't tell you exactly what experience I have with technology given what I do and who I work for, so apologies for that.
I don't care about credentials, as anyone can claim anything on the internet. I care only about information that's supported by high-quality sources.

If you're not sure whether you can back up your claims, then you should caveat them appropriately, so that you don't misrepresent hearsay or "foggy recollection" as provably-correct assertions.

I'll stop here but just reinforce my point: Intel does not need a dedicated division for GPUs.
Repetition is not reinforcement. It's clear that you're well out of your depth.
 

bit_user

Titan
Ambassador
I'll have to disagree that GCN is closely related to how ARC's Alchemist was conceived. They did add interesting things to it, but Raja's team didn't solve the original problems with the UHD EUs, going by the write-up, so that makes me believe I wasn't completely off the mark by saying it's just a continuation of the original EUs' way of working (shader-wise and how they implement the FPUs across them).
This was clear to me, way back when I read that their vector engines still implemented SIMD at just x8 granularity. Not only that, but if you followed their shader compiler patches, it was clear that the ISA was largely the same as before (aside from the register scoreboarding change). This indicated it was very much an evolutionary design, rather than a clean-sheet/ground-up redesign.

As I said above, I think their narrow SIMD width is a liability, particularly for power-efficiency. ChipsAndCheese found it to be a bottleneck in another respect:
"Each Xe Core’s L1 cache therefore has to arbitrate between requests from eight “Send” ports. Each of those “Send” ports is arbitrating between requests from two Vector Engines. Contrast that with Nvidia SM, where the L1 only has to handle requests from four SMSPs, or AMD’s CU, where the L1 only has to handle requests form two SIMDs. Intel’s Xe Core has a very complex load/store system because of the small subdivisions within it, and complex things are harder to do well."​

This is explained in the "occupancy" segment of the read, which I believe is the crux of ARC's issues now.
Requiring high occupancy is a symptom of a weak architecture, but one that can have multiple causes.
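To put a rough number on what "occupancy" means here (made-up figures, not any specific GPU): the hardware can only keep so many wavefronts resident per SIMD, and a kernel that hogs registers or local memory caps that even lower, which leaves fewer threads in flight to hide memory latency behind:

// Made-up figures, not any real GPU: occupancy = resident wavefronts / hardware max.
#include <algorithm>
#include <cstdio>

int main() {
    const int hw_max_waves  = 16;     // hypothetical max resident wavefronts per SIMD
    const int regs_per_simd = 65536;  // hypothetical register file size (registers)
    const int lds_per_simd  = 65536;  // hypothetical local/shared memory (bytes)

    const int regs_per_wave = 8192;   // what our hypothetical kernel needs
    const int lds_per_wave  = 16384;

    int by_regs  = regs_per_simd / regs_per_wave;              // 8 waves fit by registers
    int by_lds   = lds_per_simd  / lds_per_wave;               // 4 waves fit by local memory
    int resident = std::min({hw_max_waves, by_regs, by_lds});  // 4

    std::printf("occupancy: %d / %d waves (%.0f%%)\n",
                resident, hw_max_waves, 100.0 * resident / hw_max_waves);  // 25%
    return 0;
}

The lower that number, the fewer other wavefronts the scheduler can switch to while one is stalled on memory - which is exactly why architectures that need high occupancy fall over when a kernel can't provide it.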

Raja missed the mark, much like with GCN, on how best to keep all units busy.
On the basis of when he joined and how much legacy is still found in it, I actually wonder how much blame he deserves for this. It could be that he went along with most of the design decisions the team had already made, with the intention of launching DG2 (Alchemist) much sooner. I believe it was Intel's original intention to make these on their 10 nm process, and switching to TSMC would've introduced further delays. Had he known the ultimate launch date going into it, he probably would've made a few more changes to Alchemist.

It didn't help that the team was also juggling Ponte Vecchio, and that might actually have commanded more of their time & attention.
 
Nope. Doesn't support your claim. That's all I'm interested in.

I've read Intel's iGPU whitepapers, and their evolution is plain to see from the wikipedia comparison article I linked.


The funny thing is my post focused mostly on the vast amount of software a GPU needs to support, to be viable for both rendering and compute, not to mention AI and media codecs.

That much is clear to see. Of course, you can't readily appreciate all that goes into implementing, optimizing, testing, and porting that software to multiple generations. However, it's still more transparent to us than what goes into designing the hardware that has to efficiently implement that functionality.
Well, I know they had; I just can't find publicly available information for you. So I'll just leave it at that.

I don't care about credentials, as anyone can claim anything on the internet. I care only about information that's supported by high-quality sources.
Er... "Have you ever written a graphics program using an API like OpenGL or Direct 3D? I have, and those APIs are orders of magnitude more complex than anything else I've ever used"...

Don't forget what you type/say so soon...

If you're not sure whether you can back up your claims, then you should caveat them appropriately, so that you don't misrepresent hearsay or "foggy recollection" as provably-correct assertions.


Repetition is not reinforcement. It's clear that you're well out of your depth.
I am sure of those claims, I just (again) can't find any publicly available source for it.

This was clear to me, way back when I read that their vector engines still implemented SIMD at just x8 granularity. Not only that, but if you followed their shader compiler patches, it was clear that the ISA was largely the same as before (aside from the register scoreboarding change). This indicated it was very much an evolutionary design, rather than a clean-sheet/ground-up redesign.

As I said above, I think their narrow SIMD width is a liability, particularly for power-efficiency. ChipsAndCheese found it to be a bottleneck in another respect:
"Each Xe Core’s L1 cache therefore has to arbitrate between requests from eight “Send” ports. Each of those “Send” ports is arbitrating between requests from two Vector Engines. Contrast that with Nvidia SM, where the L1 only has to handle requests from four SMSPs, or AMD’s CU, where the L1 only has to handle requests form two SIMDs. Intel’s Xe Core has a very complex load/store system because of the small subdivisions within it, and complex things are harder to do well."​
What do you mean by "implemented SIMD at just x8 granularity"? Are you referring to FP width itself (the type of operations they support)? Simultaneous FP ops? They've been going back and forth on the width and how they "resolve" them, so I wouldn't say that's a dead giveaway?

I guess it is the latter as it is implied in the next sentences.

Requiring high occupancy is a symptom of a weak architecture, but one that can have multiple causes.
Oh really? Well, colour me impressed on your insight, LOL.

On the basis of when he joined and how much legacy is still found in it, I actually wonder how much blame he deserves for this. It could be that he went along with most of the design decisions the team had already made, with the intention of launching DG2 (Alchemist) much sooner. I believe it was Intel's original intention to make these on their 10 nm process, and switching to TSMC would've introduced further delays. Had he known the ultimate launch date going into it, he probably would've made a few more changes to Alchemist.

It didn't help that the team was also juggling Ponte Vecchio, and that might actually have commanded more of their time & attention.
If you have junior/senior Architects under you saying "we should do this", and you come from a background with, supposedly, abundant expertise on what's what, then you are responsible for ensuring past mistakes are not repeated. Raja was absolutely responsible for those and it was understood in his title. I am not giving him the benefit of the doubt on architectural decisions gone wrong, as Intel hired him because of that experience; or so they thought?

As for whether or not he actually knew about them, that's another topic, one which revolves around competency, and I have no idea about that as I've never worked with him.

Regards.
 

bit_user

Titan
Ambassador
Er... "Have you ever written a graphics program using an API like OpenGL or Direct 3D? I have, and those APIs are orders of magnitude more complex than anything else I've ever used"...

Don't forget what you type/say so soon...
I don't think trying to gauge someone's level of knowledge counts the same as credential-brandishing. If you'd said you had, then we could have a more involved discussion about these APIs. If not, then you can disregard my claim of experience, but you're still left with the API links to judge for yourself. If I had been trying to play a game of credentials, I wouldn't have backed my statements with any evidence, because I'd simply be expecting you to accept my authority on the matter.

What do you mean by "implemented SIMD at just x8 granularity"?
That each vector engine is 8-wide, like the EUs of previous generations. This is a lot narrower than the 32-wide that RDNA and recent Nvidia GPUs are both using.

Oh really? Well, colour me impressed on your insight, LOL.
I meant that in the vein of comparing Alchemist with GCN, on the mere basis that they both require high occupancy. They can have the same sensitivity to occupancy for different reasons - it doesn't make the architectures similar, on its own.

It was more of a general comment on the topic, than meant in direct answer to what you said.

I am not giving him the benefit of the doubt on architectural decisions gone wrong, as Intel hired him because of that experience; or so they thought?
Whoever sits at the top gets the blame and the credit. Also, the fat compensation. So, he's ultimately responsible, no matter what. However, we gain no insight if we simply pin all the failures on his head and move on. I think it's worth looking at how Alchemist ended up looking the way it does. Perhaps you disagree.

I think the misfortune of Alchemist cannot be seen entirely independently of Intel's 10 nm woes. The more interesting part is to consider how that might've shaped it and what effects it's had.
 

DaveLTX

Commendable
Aug 14, 2022
104
66
1,660
This was clear to me, way back when I read that their vector engines still implemented SIMD at just x8 granularity. Not only that, but if you followed their shader compiler patches, it was clear that the ISA was largely the same as before (aside from the register scoreboarding change). This indicated it was very much an evolutionary design, rather than a clean-sheet/ground-up redesign.

As I said above, I think their narrow SIMD width is a liability, particularly for power-efficiency. ChipsAndCheese found it to be a bottleneck in another respect:
"Each Xe Core’s L1 cache therefore has to arbitrate between requests from eight “Send” ports. Each of those “Send” ports is arbitrating between requests from two Vector Engines. Contrast that with Nvidia SM, where the L1 only has to handle requests from four SMSPs, or AMD’s CU, where the L1 only has to handle requests form two SIMDs. Intel’s Xe Core has a very complex load/store system because of the small subdivisions within it, and complex things are harder to do well."​


Requiring high occupancy is a symptom of a weak architecture, but one that can have multiple causes.


On the basis of when he joined and how much legacy is still found in it, I actually wonder how much blame he deserves for this. It could be that he went along with most of the design decisions the team had already made, with the intention of launching DG2 (Alchemist) much sooner. I believe it was Intel's original intention to make these on their 10 nm process, and switching to TSMC would've introduced further delays. Had he known the ultimate launch date going into it, he probably would've made a few more changes to Alchemist.

It didn't help that the team was also juggling Ponte Vecchio, and that might actually have commanded more of their time & attention.
The amazing thing is that someone I was debating with insisted that the occupancy issues could be fixed with drivers
Sure, everything bad can be fixed with drivers
 

bit_user

Titan
Ambassador
The amazing thing is that someone I was debating with insisted that the occupancy issues could be fixed with drivers
Sure, everything bad can be fixed with drivers
I still find it amusing to hear people talk about "drivers" as if they're just one thing. That one umbrella is used to encompass everything from the actual device driver, to userspace Direct3D modules, their shader compiler, game-specific optimizations, and even GPU firmware.

On Linux, the Mesa driver is separate from the kernel driver. Firmware blobs are yet another thing. This provides a bit more visibility into what's happening where. At least, if you use the open source components.

Regarding performance, the open source components typically perform very close to (and sometimes even better than) the proprietary driver package, for AMD. Intel has no proprietary graphics driver package, as far as I'm aware. Nvidia's open source situation is improving, but I think still far from usable for anything serious. That situation arose because Intel and AMD are largely responsible for doing the main open source development to support their hardware, whereas Nvidia went the proprietary route and basically starved the open source effort of any information or support. For about the past year, Nvidia has now been following the Intel/AMD playbook and developing their own open source driver, again pushing aside the community-led Nouveau effort.
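If you want to see which userspace stack an application is actually talking to, the strings the GL implementation reports come from that userspace driver, not from the kernel side. Here's a small sketch using GLFW just to get a context (nothing vendor-specific; Mesa-based stacks typically announce themselves in the renderer/version strings):

// Small sketch (GLFW + OpenGL): the vendor/renderer/version strings are reported
// by the userspace driver, so they reveal whether you're on Mesa or a proprietary stack.
#include <GLFW/glfw3.h>
#include <cstdio>

int main() {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);           // we only need a context, not a window
    GLFWwindow* win = glfwCreateWindow(64, 64, "probe", nullptr, nullptr);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    std::printf("GL_VENDOR  : %s\n", (const char*)glGetString(GL_VENDOR));
    std::printf("GL_RENDERER: %s\n", (const char*)glGetString(GL_RENDERER));
    std::printf("GL_VERSION : %s\n", (const char*)glGetString(GL_VERSION));   // Mesa stacks include "Mesa" here

    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}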
 
Last edited: