Hi Jarred, did you use FP32 or FP16 for your benchmarks, especially for AMD?
The results you got show only about half the iterations per second I've seen on AMD RX 6000 GPUs, which suggests you used FP32. Is this consistent across all the benchmarks, or are some using FP32 and some FP16?
As noted in the text, the RX 6000 results are very low, and the same goes for the RTX 40 results. I received an email from Nod.ai stating, "SHARK is currently running tuned models on RDNA3 and untuned models on RDNA2 and we plan to offer similar tuned models for RDNA2 in the future." Whether that means moving from FP32 to FP16 on RDNA 2, or just tuning the algorithms to extract more performance, the net result will be the same: much better performance.
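For context on why the FP32 vs. FP16 question matters so much for these benchmarks: FP16 stores each value in half the bytes of FP32, which roughly halves memory traffic and, on GPUs with dedicated half-precision hardware, can roughly double compute throughput. A minimal sketch of the storage difference (using NumPy purely for illustration; the actual Stable Diffusion projects run on PyTorch, SHARK/IREE, etc.):

```python
import numpy as np

# A 1024x1024 block of weights/activations in single precision (FP32)
fp32 = np.ones((1024, 1024), dtype=np.float32)

# The same data cast to half precision (FP16): 2 bytes per value vs. 4
fp16 = fp32.astype(np.float16)

print(fp32.nbytes)  # 4194304 bytes
print(fp16.nbytes)  # 2097152 bytes -- half the memory to move around
```

Halving the bytes per element is why switching a model from FP32 to FP16 (when the hardware and software support it) often lands near the "double the it/s" difference described above.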
Also note what is required to get AMD's GPUs working right now. You have to use a less popular project (or run Linux), and you have to use a specific beta driver intended for AI/ML work that has known bugs. If you want to try Automatic1111's project via Linux instead, you end up needing ROCm, which only supports Navi 21 (possibly Navi 31 now, though I haven't checked), and that eliminates a bunch of GPUs from the list as well. So AMD's support is lagging right now, as is Intel's, but hopefully things will improve, and I'll be revisiting this in the coming months.