1) Regarding decompression: what I first described was a memory wall. Decompression isn't itself the bottleneck. It's the memory latency and bandwidth. I know a fair bit about FPGAs and GPU programming. What I asked for was data. If you can't provide any recent data, then there's nothing more to discuss on the matter.
Not exactly what you're talking about, but a related example involving GPU processing of compressed data is how they support compressed textures!
Depends on the size of the tables, if using table-based compression. There are some decompression benchmarks where AMD's X3D processors smoke anything else, just because they happen to be able to fit the whole tables in their L3 cache. FPGAs don't usually have that much on-die memory.
You seem so sure of this, but you can't cite a single HPC paper or reference that indicates decompression is a bottleneck?
Being able to do computation on compressed data is a huge win in overcoming the memory wall.
I suspect the interesting middle comments on compression speeds and on textures are about transforms, not computations.
We can transform data (compress it), but like the output of a one-way cryptographic hash, the result is rarely something you can compute on directly.
In fact, we might say compressed data is a one-way hash. It can be reversed only if you know the algorithm. But any computation applied to it directly will irreversibly lose information, in the same way that any computation on a hash irreversibly destroys the data. For instance, you can't hash 2 and 2, add the two hashes, and end up with a hash equal to the hash of 4.
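To make the "hash 2 and 2" point concrete, here is a rough Python sketch using only the standard library. The choice of SHA-256 and zlib, the helper names (sha256_int, naive_add_compressed), and the idea of "adding" compressed streams byte by byte are all my own assumptions for illustration, not anything a real codec or GPU does. It only shows that arithmetic on the transformed bytes doesn't carry over to the original values, while a plain decompress still round-trips.

```python
import hashlib
import zlib

MOD = 2 ** 256  # SHA-256 digests are 256 bits wide


def sha256_int(n: int) -> int:
    """Hash the decimal text of n and read the digest as a big integer."""
    digest = hashlib.sha256(str(n).encode()).digest()
    return int.from_bytes(digest, "big")


def naive_add_compressed(a: bytes, b: bytes):
    """Hypothetical 'addition' on compressed streams: add bytes pairwise.

    Returns the decompressed result, or None if the mangled stream is no
    longer valid zlib data.
    """
    mangled = bytes((x + y) % 256 for x, y in zip(a, b))
    try:
        return zlib.decompress(mangled)
    except zlib.error:
        return None


# Adding two hashes of 2 does not give the hash of 4.
hashes_add_up = (sha256_int(2) + sha256_int(2)) % MOD == sha256_int(4)

# Compression itself is reversible if you run the matching algorithm...
c2 = zlib.compress(b"2")
round_trip_ok = zlib.decompress(c2) == b"2"

# ...but arithmetic on the compressed bytes does not compute 2 + 2.
naive_sum = naive_add_compressed(c2, c2)

if __name__ == "__main__":
    print("hash(2) + hash(2) == hash(4):", hashes_add_up)
    print("decompress(compress(2)) == 2:", round_trip_ok)
    print("naive sum of compressed 2 and 2:", naive_sum)
```

Formats that really do allow computing on the compressed form have to be designed for it up front; a general-purpose byte-stream compressor gives you nothing like that.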
Anyway, when you ask for data about FPGAs, I'm simply confused about what exactly you're asking for. I wouldn't say there's a benchmark comparison or anything like that.
So rather I'm logically articulating where and why GPUs are strong, and where and why FPGAs are strong. Or even other types of accelerators.
And then hypothesizing based on what I know about programming orbital computations why we would choose one accelerator over another.
I'd want a circuit that can best parallelize the data. But it won't be a bunch of vector transforms...
It won't be a bunch of independent variables all needing computation acted on them.
As stated before, orbits especially are very iterative. Iteration 1 must complete before iteration 2 begins.
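Here is a minimal sketch of what I mean by iterative, assuming a plain two-body problem in 2D with a semi-implicit Euler step. The function name propagate, the unit gravitational parameter mu, and the step size are all made up for illustration; real orbit codes use better integrators and far more physics. The point is only the loop structure: each pass reads the position and velocity the previous pass wrote, so the steps of one orbit cannot run side by side.

```python
import math


def propagate(pos, vel, dt, steps, mu=1.0):
    """Advance a 2D two-body orbit with semi-implicit Euler steps.

    pos, vel: (x, y) tuples; mu: gravitational parameter (assumed units).
    """
    x, y = pos
    vx, vy = vel
    for _ in range(steps):
        # Acceleration depends on the position produced by the previous step.
        r3 = (x * x + y * y) ** 1.5
        vx -= mu * x / r3 * dt
        vy -= mu * y / r3 * dt
        # The new position depends on the velocity updated just above.
        x += vx * dt
        y += vy * dt
    return (x, y), (vx, vy)


if __name__ == "__main__":
    # Circular orbit: radius 1, speed 1, mu 1, so the period is 2*pi.
    dt = 0.001
    steps = int(2 * math.pi / dt)
    (x, y), _ = propagate((1.0, 0.0), (0.0, 1.0), dt, steps)
    print("radius after about one period:", math.hypot(x, y))
```

You can't hand step 500 to one lane and step 501 to another, because step 501's inputs don't exist until step 500 finishes. The only way to get to step N is through steps 1 to N-1 in order.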