Discussion CPU instruction set explanation thread

NorbertPlays · Feb 8, 2024

Order 66 said:
It does work though? How much of a performance penalty is it?

First, I think you may be misunderstanding the "code written for a CPU does run on a GPU" to mean you can just pass CPU code to as GPU and have it execute. I guess it would have been more accurate to say "the algorithm from CPU code can be translated to a GPU equivalent", but the actual opcodes and stuff, no.

As to performance impact, how long is a piece of string? What did the original code do? How has it been translated?

Order 66 · Feb 8, 2024

NorbertPlays said:
First, I think you may be misunderstanding the "code written for a CPU does run on a GPU" to mean you can just pass CPU code to as GPU and have it execute. I guess it would have been more accurate to say "the algorithm from CPU code can be translated to a GPU equivalent", but the actual opcodes and stuff, no.

As to performance impact, how long is a piece of string? What did the original code do? How has it been translated?

I am misunderstanding, that was the main reason I created this thread, so that I can understand what people are talking about. Can you at least give a rough estimate for how much the performance impact would be for running something like cinebench r23 on the GPU?

NorbertPlays · Feb 8, 2024

The list of instructions (aka opcodes) that a CPU can understand are completely different to ones a GPU understands; there's some overlap, and in principle you can replicate the functionality between the two, but it's like, say, English and Japanese. They can both achieve the same thing - allowing people to communicate - but the vocabularies and grammar are completely different, and there's not much one-to-one equivalent between a lot of the words in one language to the other; a simple word in English may require a whole sentence in Japanese or vice versa.

Order 66 · Feb 8, 2024

NorbertPlays said:
The list of instructions (aka opcodes) that a CPU can understand are completely different to ones a GPU understands; there's some overlap, and in principle you can replicate the functionality between the two, but it's like, say, English and Japanese. They can both achieve the same thing - allowing people to communicate - but the vocabularies and grammar are completely different, and there's not much one-to-one equivalent between a lot of the words in one language to the other; a simple word in English may require a whole sentence in Japanese or vice versa.

Interesting, I didn't think about it like that.

Order 66 · Feb 8, 2024

I feel possibly stupid that I just realized that AVX is the main thing that allows CPUs to process the vector workloads that are responsible mainly (what I think of when I hear vector, anyway) for graphics.

bit_user said:
Keep in mind that people started doing computer graphics before specialized hardware existed for it. The first example of a ray-traced image was done on a VAX, like 45 years ago.

Um, how? I understand that with the right instruction sets, raytracing can be done on CPUs, but I don't understand how it was done in any reasonable amount of time. Also, how does that work, considering it would have had to have been written from the ground up, because it didn't exist at the time? I have so many questions with regards to how things were done in a reasonable amount of time (raytracing in movies before dedicated RT hardware comes to mind) before specialized hardware was a thing.

NorbertPlays · Feb 8, 2024

Nobody said it was done in a reasonable amount of time. Sometimes you just need patience...

Order 66 · Feb 8, 2024

NorbertPlays said:
Nobody said it was done in a reasonable amount of time. Sometimes you just need patience...

Is there a way to do that in a browser? I would love to attempt raytracing on the lowly intel celeron CPU I'm using right now.

NorbertPlays · Feb 8, 2024

There's this?

Order 66 · Feb 8, 2024

Order 66 said:
Is there a way to do that in a browser? I would love to attempt raytracing on the lowly intel celeron CPU I'm using right now.

I used the webGL raytracing demo

WebGL Ray Tracer Demo - Liam Gray

I get about 200ms at max settings

palladin9479 · Feb 8, 2024

TerryLaze said:
The fault in that is that 4k already is too much resolution for a comfortable desktop experience so going above that will not be met with a lot of acceptance, at least that's what I think, I have a hard enough time looking even at 1080p without reading glasses.
1440 is the sweet spot I believe for most people and GPUs are already capable of pushing enough pixels at that resolution without AI, sure it's a good thing for cheaper cards and people that want to save money but for the high end it's not going to add anything.

Now, couch gaming is a different thing but that's way more console territory than it is PC.

2160p desktop displayers are now common and manufacturers have started marketing 4320p as "the next thing". If your needing glasses then you need to adjust your scaling percentage. You can make a 2160p desktop look identical to 1440 or 1080 by just adjusting that scaling value.

As for couch gaming, I am in the process of upgrading my 5600g HTPC platform to the new 8600g AM5 one. Lots of fun stuff is done on that system.

palladin9479 · Feb 8, 2024

Order 66 said:
It does work though? How much of a performance penalty is it?

Several orders of magnitude. GPU's simply do not general purpose scalar logic well, the only reason those functions exist to for High Level Shader Language (HLSL) support.

Order 66 · Feb 8, 2024

palladin9479 said:
High Level Shader Language (HLSL) support.

What exactly is HLSL used for?

palladin9479 · Feb 8, 2024

Order 66 said:
What exactly is HLSL used for?

It's what it's name says, High Level Shader Language. Shaders are miniature programs that get loaded into the GPU and used to create special effects. Usually the game designers will build a bunch but you can also inject them into the rendering pipeline. The modding community frequently makes shaders to do cool stuff.

https://reshade.me

It's even used by the retro gaming community to dynamically change a programs graphics API.

General - Dege's stuffs

dege.freeweb.hu

Here is a code snippet of a HLSL that enhances ambient light in a game. This is code that gets compiled at runtime and is then run inside the GPU.

Code:

    if (AL_Adaptation)
    {
        //DetectLow
        float4 detectLow = tex2D(detectLowColor, 0.5) / 4.215;
        float low = sqrt(0.241 * detectLow.r * detectLow.r + 0.691 * detectLow.g * detectLow.g + 0.068 * detectLow.b * detectLow.b);
        //.DetectLow

        low = pow(low * 1.25f, 2);
        adapt = low * (low + 1.0f) * alAdapt * alInt * 5.0f;

        if (alDebug)
        {
            float mod = (texcoord.x * 1000.0f) % 1.001f;
            //mod = abs(mod - texcoord.x / 4.0f);

            if (texcoord.y < 0.01f && (texcoord.x < low * 10f && mod < 0.3f))
                return float4(1f, 0.5f, 0.3f, 0f);

            if (texcoord.y > 0.01f && texcoord.y < 0.02f && (texcoord.x < adapt / (alInt * 1.5) && mod < 0.3f))
                return float4(0.2f, 1f, 0.5f, 0f);
        }
    }

Order 66 · Feb 8, 2024

palladin9479 said:
It's what it's name says, High Level Shader Language. Shaders are miniature programs that get loaded into the GPU and used to create special effects. Usually the game designers will build a bunch but you can also inject them into the rendering pipeline. The modding community frequently makes shaders to do cool stuff.

https://reshade.me

It's even used by the retro gaming community to dynamically change a programs graphics API.

General - Dege's stuffs

dege.freeweb.hu

Here is a code snippet of a HLSL that enhances ambient light in a game. This is code that gets compiled at runtime and is then run inside the GPU.

Code:

if (AL_Adaptation) { //DetectLow float4 detectLow = tex2D(detectLowColor, 0.5) / 4.215; float low = sqrt(0.241 * detectLow.r * detectLow.r + 0.691 * detectLow.g * detectLow.g + 0.068 * detectLow.b * detectLow.b); //.DetectLow low = pow(low * 1.25f, 2); adapt = low * (low + 1.0f) * alAdapt * alInt * 5.0f; if (alDebug) { float mod = (texcoord.x * 1000.0f) % 1.001f; //mod = abs(mod - texcoord.x / 4.0f); if (texcoord.y < 0.01f && (texcoord.x < low * 10f && mod < 0.3f)) return float4(1f, 0.5f, 0.3f, 0f); if (texcoord.y > 0.01f && texcoord.y < 0.02f && (texcoord.x < adapt / (alInt * 1.5) && mod < 0.3f)) return float4(0.2f, 1f, 0.5f, 0f); } }

I know that it’s a shader, and I know what a shader does, but I’m always interested in what specific things technology is used for. Thanks for sharing, it’s funny you should mention reshade, I use it.

bit_user · Feb 9, 2024

NorbertPlays said:
View: https://youtu.be/Tm4i6D3XXBQ?si=RMx2oEUvEcu1XUmv

Thanks for sharing! That's so wild to hear how pioneering it was. Especially the part about not winning an Oscar for special effects, because the film industry didn't even understand what they did well enough to truly appreciate it!

I have a copy of TRON: Legacy on blu-ray that I never got around to watching. This inspired me to move it to the top of my queue!

Tron: Legacy (2010) ⭐ 6.8 | Action, Adventure, Sci-Fi

2h 5m | 12

www.imdb.com

bit_user · Feb 9, 2024

palladin9479 said:
Order 66 said:

It does work though? How much of a performance penalty is it?

Click to expand...

Several orders of magnitude.

IMO, this is sometimes fun to think about. If you consider a RTX 4090 is essentially 512 in-order cores, each with SIMD-32 and a scalar pipe, running at about 2.235 GHz, then they ought to be able to execute multi-threaded scalar code pretty well, so long as you can run enough threads (or what Nvidia calls "Warps"). The cost of letting the SIMD units idle is only like 1.5 orders of magnitude - not several.

palladin9479 said:
Shaders are miniature programs that get loaded into the GPU and used to create special effects.

It's not only special effects. If we consider just the rendering pipeline, nearly everything a modern GPU does is implemented via programmable logic. The only fixed-function parts are texture lookups, ROPs, Tessellation (not sure if that's still true, actually), and ray tracing. I don't include tensor cores in that list, since they're essentially just a fancy name for a matrix-product instruction.

palladin9479 said:
Here is a code snippet of a HLSL that enhances ambient light in a game. This is code that gets compiled at runtime and is then run inside the GPU.

Code:

if (AL_Adaptation) { //DetectLow float4 detectLow = tex2D(detectLowColor, 0.5) / 4.215; ...

What's funny is that modern GPUs don't have hardware for things like float4. They operate on them using scalar arithmetic, but what makes it fast is that you're processing like 32 of them at a time, via the magic of SIMD.

bit_user · Feb 9, 2024

Order 66 said:
How long did it take you to learn all of this?

I guess about 30 years. I started reading textbooks & trade magazines about 3D graphics in the mid-90's. The only job I had that ever involved 3D graphics was my first, where we developed and optimized 3D renderers to run on arrays of floating point DSPs, which you could think of as a sort of precursor to modern GPUs. Since then, most of my career involved graphics or video in some respect, but the only chances I had to dabble with 3D or GPU programming were on my personal time. I did get to do some MMX/SSE/AVX programming on the job, at least.

If you manage to find something you're passionate about, it's easy to learn a lot. Once you reach a certain level, you can hang out with other practitioners in the field and learn even more from them.

Order 66 said:
I've been into computers (mainly strictly hardware) for about 5 years, and while I've learned a lot, I realize just how much I still have to learn.

There's multiple lifetimes of stuff to learn, depending on how broad & deep you want to go.

Order 66 said:
my problem is that I get frustrated when I can't figure out how to do something, and when I try researching the problem, if I don't find anything, I kinda just give up. I would love to do it, but I have issues with staying with it. I realize that it's a personal problem, but I still get very frustrated with it.

Well, I think that potentially highlights the value of a good course, which incrementally builds you up.

Also, I think there's no substitute for good foundations. If you master the fundamentals, before moving up, then you're better equipped to devise your own solutions to problems you encounter. Or, at least, you have the capacity to describe the problem, making it easier to search for solutions (and hopefully recognize good or bad ones, along the way).

Order 66 · Feb 9, 2024

bit_user said:
DSPs

DSPs? What does that stand for?I know what AVX stands for, but I have no idea what the MMX or SSE is in this context. I know they are instruction sets, but that’s all I know.

bit_user said:
I guess about 30 years. I started reading textbooks & trade magazines about 3D graphics in the mid-90's. The only job I had that ever involved 3D graphics was my first, where we developed and optimized 3D renderers to run on arrays of floating point DSPs, which you could think of as a sort of precursor to modern GPUs. Since then, most of my career involved graphics or video in some respect, but the only chances I had to dabble with 3D or GPU programming were on my personal time. I did get to do some MMX/SSE/AVX programming on the job, at least.

If you manage to find something you're passionate about, it's easy to learn a lot. Once you reach a certain level, you can hang out with other practitioners in the field and learn even more from them.

There's multiple lifetimes of stuff to learn, depending on how broad & deep you want to go.

Well, I think that potentially highlights the value of a good course, which incrementally builds you up.

Also, I think there's no substitute for good foundations. If you master the fundamentals, before moving up, then you're better equipped to devise your own solutions to problems you encounter. Or, at least, you have the capacity to describe the problem, making it easier to search for solutions (and hopefully recognize good or bad ones, along the way).

I am incredibly passionate about computers and technology in general, which means that learning it is very easy. It’s funny, I know a decent amount about pc hardware despite never having built one myself. The main reason I haven’t is because currently, I don’t want to potentially mess up and get stuck. If only I had the opportunity to have a pc building class where I wouldn’t be risking my own money if I broke something.

bit_user · Feb 9, 2024

Order 66 said:
DSPs? What does that stand for?

In this case:

https://en.wikipedia.org/wiki/Digital_signal_processor

Back in the 1990's, they had a similar status as GPUs currently enjoy. They packed a lot of compute horsepower into a cheaper, embeddable package than your typical CPU of the era (i.e. it was pretty self-contained and didn't need a bunch of other chips around it). The ones we used were somewhat unusual, in that they had 6 link ports that you could use for intercommunication. We connected them in a 2D toroidal topology (think of it like if you drew a grid on the skin of a doughnut) and then used the other two ports for broadcast. They had a special mode where a "master" node could broadcast instructions to all the others, which is how SIMD actually worked. Each node had just 256 kB of SRAM.

Order 66 said:
I know what AVX stands for, but I have no idea what the MMX or SSE is in this context. I know they are instruction sets, but that’s all I know.

They're all vector instruction set extensions to x86. MMX (Multi-Media eXtensions) was the first on the block and was just 64 bits. It operated exclusively on integers and launched on a later iteration of Pentium CPUs, back in 1997. MMX registers were the same as the x87 floating point registers, which caused issues when switching back and forth between MMX instructions and FPU instructions (you had to do an expensive state reset).

SSE was a massive leap ahead and came on the Pentium 3, in 1999. It introduced a new set of registers, supported floating-point arithmetic, and operated at 128-bits. That's how wide the registers were, but we didn't have CPUs (from Intel, at least) that processed the full 128 bits at a time until Core 2.

An interesting fact about AVX is that it was rumored to have been spurred on by Apple. Steve Jobs was said to have flipped out, when he heard that XBox 360 had a 3-core PowerPC CPU, running at like 3.2 GHz. That was way faster than what Macs had, at the time. Meanwhile, Mac CPU performance was falling behind Intel CPUs like Core 2 Duo. However, PowerPC had an advantage in that its AltiVec instructions were superior to SSE. So, AVX supposedly became a negotiating point between Jobs and Intel. I don't know how much of that might've been mere speculation.

https://en.wikipedia.org/wiki/AltiVec#Comparison_to_x86-64_SSE

Order 66 said:
I am incredibly passionate about computers and technology in general, which means that learning it is very easy. It’s funny, I know a decent amount about pc hardware despite never having built one myself. The main reason I haven’t is because currently, I don’t want to potentially mess up and get stuck. If only I had the opportunity to have a pc building class where I wouldn’t be risking my own money if I broke something.

My first job was actually fixing PCs, after school. It didn't last terribly long (mostly due to a management change), but gave me some useful hands-on experience. There was a brief period, at my first professional job, where I felt a bit overwhelmed and I toyed with the idea of quitting software development and going back to just fixing PCs. I'm glad I didn't, though.

Anyway, maybe you could find some cheap, older hardware to play with on ebay or at a local flea market. The first PC I built myself was mostly from parts I bought at a computer fair. Good times!

Another idea, if you have lots of time on your hands, would be to see about getting involved in a charity that refurbishes PCs for education and low-income folks. I know charities like that exist, so you could search around for something like that in your area and see if they could use any help.

JeffreyP55 · Feb 9, 2024

Order 66 said:
I was looking at this thread, https://forums.tomshardware.com/threads/intels-latest-lower-powered-cpus-give-ryzen-rivals-a-run-for-their-money-—-core-i9-14900t-beats-ryzen-9-7900-in-geekbench-6-benchmark.3835713/page-2#post-23197213 and while I enjoy watching the debate progress, I don't really understand anything about instruction sets (in this case AVX), or really anything beyond the hardware aspects of PCs in general. I know people will say to google it, but I am really looking for it to be dumbed down a little bit, so that I can grasp the idea in the first place. I think I would enjoy the debate between @TerryLaze and @bit_user if I understood a bit more of what they were talking about.

Start with an 8088 instruction set. 16bit is easier to deal with. My electronics cool school CPU 45 years ago.. LoL

Order 66 · Feb 20, 2024

bit_user said:
The only fixed-function parts are texture lookups, ROPs, Tessellation (not sure if that's still true, actually), and ray tracing.

What exactly are ROPs? I hear the spec on GPUs all the time, but I don't know what it stands for or what it does.

bit_user · Feb 20, 2024

Order 66 said:
What exactly are ROPs? I hear the spec on GPUs all the time, but I don't know what it stands for or what it does.

I had made some assumptions, but I don't actually know.

The first search results I get are either Wikipedia or descriptions that seem ripped from this Wikipedia page:

https://en.wikipedia.org/wiki/Render_output_unit

Another source I found is this, which also defines other elements of the graphics pipeline. Just beware of its age.

A trip through the Graphics Pipeline 2011, part 9

This post is part of the series “A trip through the Graphics Pipeline 2011”. Welcome back! This post deals with the second half of pixel processing, the “join phase”. The pr…

fgiesen.wordpress.com

Order 66 · Feb 20, 2024

bit_user said:
I had made some assumptions, but I don't actually know.

The first search results I get are either Wikipedia or descriptions that seem ripped from this Wikipedia page:

https://en.wikipedia.org/wiki/Render_output_unit

Another source I found is this, which also defines other elements of the graphics pipeline. Just beware of its age.

A trip through the Graphics Pipeline 2011, part 9

This post is part of the series “A trip through the Graphics Pipeline 2011”. Welcome back! This post deals with the second half of pixel processing, the “join phase”. The pr…

fgiesen.wordpress.com

Interesting. Also, what exactly is tesselation? On the topic of CPUs, I've heard that ARM can't do division, (can't remember if it can't at all, or it's just not very good at it) is this true, or was I misinterpreting what I had heard?

Order 66 · Feb 20, 2024

@jnjnilson6, do you have any thoughts to add on this subject? I am curious to know your thoughts as you seem to know a lot about CPUs.

bit_user · Feb 20, 2024

Order 66 said:
Interesting. Also, what exactly is tesselation?

In a GPU rendering pipeline, tessellation is where you have a patch that's mathematically defined and the GPU subdivides it into polygons. It's a more memory-efficient representation and works well for dynamically adjusting the geometric Level of Detail (LoD) based on the resolution and how large an object appears on-screen.

A cool thing about Tessellation is that you can procedurally manipulate the generated vertices, thereby creating rough surfaces (i.e. instead of faking it through the use of bump mapping).

https://learnopengl.com/Guest-Articles/2021/Tessellation/Tessellation

Order 66 said:
On the topic of CPUs, I've heard that ARM can't do division, (can't remember if it can't at all, or it's just not very good at it) is this true, or was I misinterpreting what I had heard?

That's old. Not relevant for current-gen ARM CPUs, or probably any ARM Cortex-series cores from the past couple decades!

ARM has been around since the 1980's. Unlike x86, they've periodically updated their instruction set, while dropping backward compatibility for older versions. The main reason they could get away with doing that is that it hasn't been used in general-purpose computing.

Starting with ARMv8-A (first implemented roughly a decade ago) and AArch64, I think dropping backward compatibility won't be an option. So, just make sure anything you're reading about it is specific to that generation or newer.

Discussion CPU instruction set explanation thread

Proper

Grand Moff

Proper

Grand Moff

Grand Moff

Proper

Grand Moff

Proper

Grand Moff

Splendid

Splendid

Grand Moff

Splendid

Grand Moff

Titan

Titan

Titan

Grand Moff

Titan

Distinguished

Grand Moff

Titan

Grand Moff

Grand Moff

Titan

Share this page