Humans actually know they're playing a game. An AI model does not. And I don't think the original researcher is saying, "wow, I'm shocked it didn't hesitate" but rather "hey, you know this model could easily be used to create kill-bots for the real world?"
You don't even need AI for that; OpenCV has been recognising people in images for a good decade.
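To illustrate how off-the-shelf this is, here's a minimal sketch using OpenCV's classic HOG + linear-SVM pedestrian detector, which has shipped with the library since the OpenCV 2.x era. The input filename is an assumption; everything else is stock API.

```python
# Off-the-shelf person detection with OpenCV's built-in
# HOG + SVM pedestrian detector -- no ML training pipeline needed.
# Assumes a local image file "street.jpg" exists.
import cv2

img = cv2.imread("street.jpg")

# Load the detector pre-trained on pedestrian data.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Slide the detection window across the image at multiple scales.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))

# Draw a rectangle around each detected person and save the result.
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.png", img)
print(f"Found {len(boxes)} people")
```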
And if you're concerned about entire autonomous weapon systems that can make no-human-in-the-loop attack/abort decisions, that concern is many years late to the party: cruise missiles have been doing exactly that for a good half-century (specifically, terminal-phase identification, discrimination, and selection of aircraft and ship targets). Bruce Simpson demonstrated a low-cost cruise missile built from consumer components two decades ago, pre-dating the more recent 'drone' craze. The technology is all old hat; it just took a while for Hollywood sci-fi to catch up to reality before people bothered caring about it.
Where the scary-scary-killer-drones story falls apart, however, is when you actually try to implement any of it. It turns out the reliability requirements for terminal guidance are well above what a datacentre-backed ML system can achieve even with nice clean high-resolution, high-framerate machine-vision feeds. Which is why the current state of the art in operational drone usage is reverting to the days of the TV-guided missile: a human looks at a grainy, unreliable, low-resolution video feed and manually pilots the drone, because AI is really bad at both of those tasks: making sense of degraded video and flying the terminal approach.
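To put rough numbers on the reliability gap (entirely illustrative assumptions, not measurements of any real system): even a detector that's right on 99% of frames racks up a near-certain error somewhere over a terminal approach, because the mistakes compound across frames.

```python
# Back-of-the-envelope sketch of why per-frame accuracy isn't enough
# for terminal guidance. All numbers are illustrative assumptions.
per_frame_accuracy = 0.99   # assumed: detector correct 99% of the time
fps = 30                    # assumed video framerate
approach_seconds = 10       # assumed length of the terminal approach

frames = fps * approach_seconds

# Probability that every single frame is classified correctly,
# treating frames as independent decisions (a simplification).
p_all_correct = per_frame_accuracy ** frames
print(f"P(no misclassification over {frames} frames) = {p_all_correct:.4f}")
# => roughly 0.049, i.e. a ~95% chance of at least one wrong call
```

Real systems smooth this out with tracking and filtering, of course, but the shape of the problem stands: an error rate that looks impressive per frame is nowhere near the reliability a weapon's terminal phase demands.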