# How do "Accelerators" work?

#### Mjl3434

##### Distinguished
Question:

My question is derived from a question I had about the limitations of an AGP 4x bus vs. an AGP 8x bus. Originally I wanted to know "What's the fastest video card that I can run given the limitation of a 4x AGP slot?" But that got me thinking about this long discussion here:

Basically I have a very rudimentary (and maybe incorrect) understanding of how a graphics accelerator works, but I think it's like this: The CPU has graphics instructions which it could execute on its own, but since graphics-intensive computations put a heavy load on the processor, the 'acceleration' model was developed, where the instructions are offloaded to the graphics card through a bus like PCI, AGP, or PCIe. The card then has several parallel high-speed floating-point pipelines inside the GPU, which connect to the on-board graphics card memory. So there's another bus there: the bus between the graphics card's GPU and the graphics card's memory. I'm confused about how this all works because it seems like there's a severe mismatch between the bandwidths of the two buses.

The AGP bus is a 32-bit, 66 MHz bus, and at AGP 4x it is quad-pumped, so that amounts to 32 bits * 66 MHz * 4 / (8 bits/byte) = 1056 MB/s of bandwidth.

A GeForce 6800 Ultra has a 256-bit memory interface with GDDR3 running at 550 MHz, double data rate (1100 million transfers per second effective), which amounts to 256 bits * 550 MHz * 2 / (8 bits/byte) = 35,200 MB/s or 35.2 GB/s of bandwidth.
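To double-check the two figures above, here is a small Python sketch of the same arithmetic. The helper name `peak_bandwidth_mb_s` is made up for illustration, the numbers are theoretical peaks rather than measured throughput, and the card's memory is entered as its effective rate of 1100 million transfers per second.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    Hypothetical helper: width in bits, clock in MHz, and how many
    transfers happen per clock cycle (1 = single, 4 = quad-pumped).
    """
    return bus_width_bits * clock_mhz * transfers_per_clock / 8


# AGP 4x: 32-bit bus, 66 MHz base clock, four transfers per clock.
agp_4x = peak_bandwidth_mb_s(32, 66, 4)

# GeForce 6800 Ultra memory: 256-bit interface at an effective
# 1100 million transfers per second (entered as 1100 MHz x 1).
card_memory = peak_bandwidth_mb_s(256, 1100, 1)

# How many times faster the on-card memory bus is than the AGP link.
ratio = card_memory / agp_4x

print(f"AGP 4x:      {agp_4x:.0f} MB/s")
print(f"Card memory: {card_memory:.0f} MB/s")
print(f"Ratio:       {ratio:.1f}x")
```

The ratio is what the rest of the question is about: the on-card memory path is more than thirty times wider than the link back to the CPU.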

So here already you've got 35.2 GB/s of memory bandwidth on a card with a GPU running at something like 400 MHz, all connected to this slow AGP 4x bus. I know these cards are old now, but at the time, around 2002 to 2004, this was a perfectly sensible combination. So what am I missing here? Why do graphics cards have so much power and bandwidth compared to the relatively slow buses they run on? How are instructions offloaded to the GPU accelerator and the results returned to the processor?

Even modern GPUs like the GeForce GTX 580, with 192 GB/s of memory bandwidth, still run in a PCIe 2.0 x16 slot, which has a bandwidth of 16 GB/s (8 GB/s in each direction).

Clearly I know just enough to be dangerous. Can someone please straighten me out on what is important, how accelerators work in general, and what the various bottlenecks would be in different situations? Am I looking at the wrong metrics here? It seems like I don't quite understand how frames are drawn and how they get from instructions and data in the CPU and main memory, through the PCIe bus, to the GPU and GPU memory and back, and then eventually to the monitor.

Back-story:

I have an Intel D850EMV2 motherboard which has an "AGP connector supporting 1.5 V 4X AGP cards only." A while back my GeForce 6800 Ultra burned out and I needed to replace it in a hurry, so the only AGP card I could find was a GeForce FX5500, which sadly was slower. I'm saving up to buy a new system, but I recently noticed that prices on some of these old parts have come down to the point where I don't mind spending \$20 for another 1/2 GB of memory. That got me thinking about re-upgrading my video card. So my original question is: what's the fastest video card that I can run given the limitation of a 4x AGP slot?

#### hunter315

##### Champion
The larger bandwidth is between the GPU and the graphics RAM, which sits on the graphics card itself. That traffic does not travel over the AGP or PCI-E bus, so the mismatch in bandwidths isn't real.

The phrase "graphics accelerator" hasn't been used in a while, since graphics are rarely done by the CPU at all anymore. The work is done entirely on the graphics card, which has dedicated hardware that is much faster at it. The card then sends the finished image directly to the screen without passing it through the CPU again.

#### Mjl3434

##### Distinguished
Well, that makes sense. There is one thing that still doesn't, however. It seems like the software of a first-person shooter, for example, would still need (at least) two components: one which is the 3D representation of the world itself (i.e. there's a dude here, a building there, etc.) and a second which is responsible for taking this 3D representation, deciding what is visible, and drawing the actual vertices, lines, textures, effects, etc. before pumping it out to the screen. I'm pretty sure that second component would be done on the graphics card itself, but I'm not sure where the first component would be done. Ultimately some form of software needs to run on the CPU before the data is sent to the GPU, hence the bus.

Also, if the mismatch between the bandwidths is not real, then what determines when we need to upgrade the bus to something faster? Why not just use modern graphics cards with the ISA bus from the 80s (16 bits at roughly 8 MHz)?

It seems somewhat logical that the CPU speed and indirectly the system memory performance would affect how fast you can send "whatever you send to the graphics card" before it goes off and renders images. Is this correct?

Also then what determines the relationship between the graphics card's bus and the graphics card you should choose? I.e. my original question of what is the fastest graphics card for an AGP 4X bus that's not too fast (pay more for extra performance which is wasted)?

#### hunter315

##### Champion
The graphics card memory stores the textures and some of the model information. The position information and most of the physics calculations of where things will move to are done on the CPU. The data about where everything is gets sent to the graphics card over the bus. The older buses with slower transfer speeds have higher latency, and they aren't able to push the data from the CPU and the texture information to the GPU fast enough, with low enough latency, to allow for smooth gameplay.
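To put rough numbers on that split, here is a back-of-the-envelope Python sketch. Everything in it apart from the 1056 MB/s AGP 4x figure from earlier in the thread is an assumption picked for illustration: the object count, one 4x4 float matrix per object per frame, a 60 fps target, and a 128 MB texture set. It only looks at bandwidth and ignores latency, draw-call overhead, and anything the card reads back from system memory.

```python
AGP_4X_MB_S = 1056          # peak AGP 4x bandwidth from the first post

# Assumed workload, purely illustrative:
FPS = 60                    # target frame rate
OBJECTS = 2000              # moving objects the CPU updates each frame
BYTES_PER_TRANSFORM = 64    # one 4x4 matrix of 32-bit floats per object
TEXTURE_SET_MB = 128        # textures uploaded once, at level load


def per_frame_budget_mb(bus_mb_s, fps):
    """MB the bus could move in one frame at its theoretical peak."""
    return bus_mb_s / fps


# Position data the CPU has to push across the bus every frame.
per_frame_update_mb = OBJECTS * BYTES_PER_TRANSFORM / 1e6

# What the bus could carry in that same frame time.
agp_budget_mb = per_frame_budget_mb(AGP_4X_MB_S, FPS)

# Fraction of the per-frame budget the position updates consume.
share = per_frame_update_mb / agp_budget_mb

# One-off cost of moving the whole texture set onto the card.
texture_upload_s = TEXTURE_SET_MB / AGP_4X_MB_S

print(f"Per-frame budget on AGP 4x: {agp_budget_mb:.1f} MB")
print(f"Per-frame position updates: {per_frame_update_mb:.3f} MB")
print(f"Share of budget used:       {share:.1%}")
print(f"One-time texture upload:    {texture_upload_s:.2f} s")
```

Under these assumptions the per-frame position data is a small fraction of what the bus can carry, while the bulky textures cross the bus once and then live in card memory. The bus starts to matter when the card has to keep fetching data from system memory during a frame.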

Graphics cards are built for a specific bus, as that bus enables additional features. AGP provided much higher bandwidth than PCI, which was good for older cards that had less memory and needed to call back to system memory more often. PCI-E provides 75 W through the slot itself, which allows for more powerful cards that don't need an extra power connector. It also allows for more powerful workstation-level cards which perform some of the harder processing on the card. Many fluid dynamics simulations are offloaded to workstation graphics cards, and this results in much more traffic across the PCI-E bus than simple gaming would.

While we tend to be deluded into believing that games stress out a system, if you really want to stress one, have it try to do a real-time, highly detailed simulation in SolidWorks or something similar. That will crush your average system, and it is why companies will pay for multi-thousand-dollar workstations.

As for your original question, the fastest card you can get for it is an HD 3870. You may be able to find a 4650 or 4670 for less; these are AGP 4x/8x. They weren't originally made that way, but some manufacturers made versions like that.