I wonder if the distinction was around the type of simulation.
When we talk about flight simulations or combat simulations, we think of something running in real time, like MS Flight Simulator, where actual humans conduct and observe a mission.
However, when training an AI to accomplish a task, you often use a digital simulation of the environment, which lets you run much faster than real time and run many instances in parallel. In that case, the "human operator" would just be a set of preprogrammed commands issued under different circumstances.
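Roughly the setup I have in mind, as a toy Python sketch. Everything here is invented for illustration (the function names, the scripted "no-go every fifth step" operator, the hit probability); the point is just that nothing in the loop is tied to wall-clock time:

```python
import random

def scripted_operator(step: int) -> str:
    """Stand-in for the human operator: preprogrammed go/no-go calls."""
    return "no-go" if step % 5 == 0 else "go"

def run_episode(seed: int, max_steps: int = 100) -> float:
    """One simulated mission; no real-time constraint anywhere."""
    rng = random.Random(seed)
    reward = 0.0
    for step in range(max_steps):
        command = scripted_operator(step)
        # ... the agent would choose an action here ...
        if command == "go" and rng.random() < 0.1:
            reward += 1.0  # e.g. target destroyed
    return reward

# Thousands of independent episodes, each covering hours of "mission"
# in milliseconds; in practice you'd fan these out across processes.
rewards = [run_episode(seed) for seed in range(1000)]
```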
So maybe it turned out that the trained AI had learned to take out the operator or the comms tower, because that maximized the objective function. In that case, just change the objective function and you've fixed the problem.
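To make the objective-function point concrete, here's a toy before/after (the signature and reward values are made up; a real reward model would be far more involved):

```python
def reward_v1(destroyed_target: bool, operator_alive: bool, comms_up: bool) -> float:
    # Flawed objective: only the target counts, so disabling whoever
    # could call off the strike never costs the agent anything.
    return 10.0 if destroyed_target else 0.0

def reward_v2(destroyed_target: bool, operator_alive: bool, comms_up: bool) -> float:
    # Patched objective: harming the operator or the comms link is
    # strictly worse than anything the strike could ever earn.
    r = 10.0 if destroyed_target else 0.0
    if not operator_alive:
        r -= 100.0
    if not comms_up:
        r -= 100.0
    return r

# Under v1, taking out the operator is free: reward_v1(True, False, False) == 10.0
# Under v2 it's catastrophic:                reward_v2(True, False, False) == -190.0
```

The design point is just that the penalty has to dominate any reward the agent could collect by removing oversight, which is exactly the "change the objective function" fix.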