Could you use a translation layer, or would it just be simpler to rewrite it from the beginning?
It's a radically different language, and while a translation layer could theoretically be used, it would be very inefficient.
So quick class on scalar vs vector computing.
Scalar instructions operate on a single value at a time. They consist of things like "load X from memory", "store Y to memory", "compare value X against value Y", "if the previous comparison was true, jump to this memory location", "add value X to memory address Y", "subtract value X from memory address Y", and so forth. These are the logical operations that make up almost all computing, very much a single continuous line of logic. If you want to add a value to 4000 elements, you do it one at a time in a loop.
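To make that concrete, here's a minimal sketch of the scalar version in plain C++ (the array size and the value being added are just placeholders to mirror the example above):

```
// Scalar approach: the CPU walks the array one element at a time.
#include <cstdio>

int main() {
    const int n = 4000;
    float data[4000];
    for (int i = 0; i < n; ++i) data[i] = static_cast<float>(i);

    // One scalar add per iteration: load, add, store, compare, branch.
    for (int i = 0; i < n; ++i) {
        data[i] += 5.0f;
    }

    printf("data[0] = %.1f, data[3999] = %.1f\n", data[0], data[3999]);
    return 0;
}
```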
Vector instructions operate on multiple values at a time. They consist of things like "add value X to registers A, B, C, and D", "multiply A and B, then add the result to memory locations W, X, Y, and Z", along with a whole host of floating point and other math operations. This is useful for doing math on large datasets.
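Here's a hedged sketch of the same add-to-4000-elements job done with vector instructions, using x86 AVX intrinsics as one concrete example (this assumes a CPU with AVX support and compiling with something like -mavx; the _mm256_* intrinsics are standard x86 names, everything around them is just illustrative):

```
// Vector (SIMD) approach: each instruction operates on 8 floats at once.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(32) float data[4000];
    for (int i = 0; i < 4000; ++i) data[i] = static_cast<float>(i);

    __m256 five = _mm256_set1_ps(5.0f);        // broadcast 5.0 into all 8 lanes

    // 500 iterations instead of 4000: each pass handles 8 elements.
    for (int i = 0; i < 4000; i += 8) {
        __m256 v = _mm256_load_ps(&data[i]);   // load 8 floats
        v = _mm256_add_ps(v, five);            // add 5.0 to all 8 at once
        _mm256_store_ps(&data[i], v);          // store 8 floats back
    }

    printf("data[0] = %.1f, data[3999] = %.1f\n", data[0], data[3999]);
    return 0;
}
```

Same result as the scalar loop, but the work is expressed as "do this to 8 values per instruction" instead of one value at a time.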
CPUs are designed to be extremely good at scalar instructions. They have secondary extensions to assist with vector instructions when necessary, but their bread and butter is scalar operations, which represent over 90% of all computing workloads. GPUs are designed to be extremely good at vector instructions, and while they do have scalar instructions, those are kinda slow and mostly there to help figure out which datasets to execute on. So view CPUs as having 6~14 cores that are good at churning through long streams of scalar instructions, and GPUs as having a thousand cores that are good at doing massive amounts of vector instructions.
With this in mind, code written for CPUs does not work very well on GPUs. Instead you need entirely separate code, written specifically to take advantage of the GPU's vector nature.
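To give a feel for what that separate code looks like, here's a rough sketch using CUDA as one example GPU programming model (the kernel name, the block size of 256, and so on are illustrative choices, not anything canonical):

```
// GPU approach: thousands of threads each handle one element in parallel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addFive(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // each thread picks one element
    if (i < n) {
        data[i] += 5.0f;                            // all threads add in parallel
    }
}

int main() {
    const int n = 4000;
    float host[4000];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    // The GPU can't touch CPU memory directly, so copy the data over first.
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all 4000 elements.
    addFive<<<(n + 255) / 256, 256>>>(dev, n);
    cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %.1f, host[3999] = %.1f\n", host[0], host[3999]);
    return 0;
}
```

Notice that nothing here resembles the scalar loop: the data has to be copied to the GPU, the work is written as "what each thread does", and the launch syntax is specific to the GPU toolchain, which is a big part of why a translation layer would be so inefficient.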