2.3.3.1 Global Data Share (GDS)
The SI-GPU contains a 64 kB global data share memory that can be used by wavefronts of a kernel running on all compute units. This memory enables sharing of data across multiple workgroups. The GDS is configured with 32 banks, each with 512 entries of 4 bytes. It provides full access to any location for use by any wavefront. The GDS also supports 32-bit integer atomic operations to enable fast, unordered atomics. Data can be preloaded from memory prior to kernel launch and written to memory after kernel completion. The GDS block contains support logic for unordered append/consume and domain-launch-ordered append/consume operations through the global wave sync (GWS). These dedicated circuits enable fast compaction of data or the creation of complex data structures in memory.
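As a quick sanity check on the figures quoted above: 32 banks times 512 entries times 4 bytes comes out to exactly 64 kB. The sketch below works that out and also shows one possible way a byte address could map onto a bank. The dword-interleaved mapping and the helper names `gds_bank` and `gds_entry` are my own assumptions for illustration; the quoted text does not say how addresses are distributed across the banks.

```python
# GDS geometry as quoted: 32 banks, 512 entries per bank, 4 bytes per entry.
NUM_BANKS = 32
ENTRIES_PER_BANK = 512
BYTES_PER_ENTRY = 4

# Total capacity: 32 * 512 * 4 = 65536 bytes = 64 kB.
GDS_BYTES = NUM_BANKS * ENTRIES_PER_BANK * BYTES_PER_ENTRY


def _dword_index(byte_addr: int) -> int:
    """Index of the 4-byte entry that contains byte_addr."""
    if not 0 <= byte_addr < GDS_BYTES:
        raise ValueError("address outside the 64 kB GDS")
    return byte_addr // BYTES_PER_ENTRY


def gds_bank(byte_addr: int) -> int:
    """Bank index, ASSUMING consecutive dwords go to consecutive banks."""
    return _dword_index(byte_addr) % NUM_BANKS


def gds_entry(byte_addr: int) -> int:
    """Entry index within the bank, under the same assumed mapping."""
    return _dword_index(byte_addr) // NUM_BANKS
```

Under that assumed mapping, 32 neighbouring dwords land in 32 different banks, which is the usual reason for interleaving banked memories this way.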
2.3.2 Global Data Share (GDS)
The AMD Sea Islands series of devices uses a 64 kB global data share (GDS) memory that can be used by wavefronts of a kernel on all compute units. This memory enables 128 bytes of low-latency bandwidth to all the processing elements. The GDS is configured with 32 banks, each with 512 entries of 4 bytes each. It provides full access to any location for any processor. The shared memory contains 32 integer atomic units to enable fast, unordered atomic operations. This memory can be used as a software cache to store important control data for compute kernels, reduction operations, or a small global shared surface. Data can be preloaded from memory prior to kernel launch and written to memory after kernel completion. The GDS block contains support logic for unordered append/consume and domain launch ordered append/consume operations to buffers in memory. These dedicated circuits enable fast compaction of data or the creation of complex data structures in memory.
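To make the unordered append idea a bit more concrete, here is a small CPU-side emulation of compaction with an atomic counter: every work item that wants to keep its value bumps a shared counter and writes into the slot it got back. This is only a sketch in plain Python. The lock stands in for the hardware atomic, and `AppendCounter` and `compact` are hypothetical names, not anything from the AMD documentation.

```python
import threading


class AppendCounter:
    """Stand-in for an append counter; a lock emulates the hardware atomic."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def append(self) -> int:
        """Atomically reserve the next free slot and return its index."""
        with self._lock:
            slot = self._value
            self._value += 1
            return slot

    @property
    def count(self) -> int:
        """Number of slots reserved so far."""
        return self._value


def compact(values, keep):
    """Unordered compaction: each kept element grabs a slot via append()."""
    counter = AppendCounter()
    out = [None] * len(values)

    def work_item(v):
        if keep(v):
            out[counter.append()] = v

    # One thread per element plays the role of one work item.
    threads = [threading.Thread(target=work_item, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:counter.count]
```

The result is densely packed, but its order depends on which work item reaches the counter first. That corresponds to the "unordered" variant in the quote; the domain-launch-ordered variant is not modelled here.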
The answer to the FCAT question: FCAT is not the end-all-be-all of frame pacing. There are many provable scenarios where the observed performance is stutter-free, whereas FCAT results suggest that it should be stuttering like crazy. If I recall correctly, Tomb Raider is an instance of this. People should use their own eyeballs vs. relying on an automated test, because the automated testing absolutely does not tell the whole story.
What's clear, however, is that we had an issue with the consistency in frame delivery, and we've largely resolved that problem. The remaining scenarios will be resolved this quarter with a driver update that intelligently and algorithmically normalizes frame times.
So there is no real fundamental solution to the problem either.
Too bad that nVidia prefers to blame AMD rather than work hand in hand with them to give developers some API calls, so that they can solve the problem of in-game time and display time diverging in their engines -.-
This is one of the reasons why I don't like nVidia's company policy. If it helps to sell hardware, they just score off somebody. And they do it as part of a long-term plan... It's the same with their proprietary "standards". They don't care whether it is bad for the whole industry or not. They only look at whether they sell more cards...
I hope AMD doesn't go down the same path in the future. Mantle could turn out to be something like this...
TressFX is the right way, and I really, really, really hope everyone out there soon understands that proprietary is not an option in the long run. Proprietary kills advancement.
I am really interested in OpenCL 2.0. And btw, it is no bad thing to spread the word that the Kepler GPUs don't officially support OpenCL. I think nVidia will regret their decision in the future.