News AI makes Doom in its own game engine — Google's GameNGen project uses Stable Diffusion to simulate gameplay

"AI-generating a complete game engine with consistent logic is a unique achievement."

But that's not what they did, right?
Did they have an AI (presumably an LLM) write a game engine, or did they build an engine that uses Stable Diffusion to guess what the next frame will look like?
... or did they "generate original Doom gameplay on a neural network" - which would be by far the easiest way to pretend they've accomplished something.

These are three different claims that would require three very different methods to accomplish, and all of them are currently getting lumped together under the AI buzzword.

Based on the video, I'm not sure what they did, but the Doom guy's eyebrow twitches a lot, so I doubt it's a level generator. Also, they can't seem to get more than a couple of seconds of consistent-looking gameplay before they either have to cut to a new clip, or the game teleports the player to a completely different area/state in a way that looks like a cut to a new clip.

But whatever they did, I'm sure they wasted an absolutely unfathomable amount of money and compute time on something that cannot possibly be used to generate revenue for their business.
 
"AI-generating a complete game engine with consistent logic is a unique achievement."

But that's not what they did, right?
Did they have an AI (presumably an LLM) write a game engine, or did they build an engine that uses Stable Diffusion to guess what the next frame will look like?
... or did they "generate original Doom gameplay on a neural network" - which would be by far the easiest way to pretend they've accomplished something.
The game engine is entirely implemented as a neural network. The article says they just feed your mouse & keyboard input into it, and it renders the next image. The neural network model contains all of the game logic and level designs, and it's effectively the renderer, too.
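
Roughly, I'd picture the loop like this. This is just a sketch of the idea as the article describes it; the model interface and the context length are assumptions of mine, not anything from the paper:

```python
from collections import deque

CONTEXT_LEN = 32  # how many past frames/actions the model sees (assumed)

def play(model, get_player_input, first_frame, num_steps=1000):
    """Action-conditioned frame loop: player inputs in, pixels out."""
    frames = deque([first_frame], maxlen=CONTEXT_LEN)
    actions = deque(maxlen=CONTEXT_LEN)
    for _ in range(num_steps):
        actions.append(get_player_input())  # keyboard/mouse state this tick
        # The network predicts the next frame from recent frames + actions.
        # All "game logic" (doors, enemies, ammo) is implicit in its weights.
        frame = model.predict_frame(list(frames), list(actions))
        frames.append(frame)
        yield frame  # display to the player
```

Note there's no explicit game state anywhere in that loop - which is exactly why people argue about whether to call it an engine.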

But whatever they did, I'm sure they wasted an absolutely unfathomable amount of money and compute time on something that cannot possibly be used to generate revenue for their business.
They're an AI company; doing AI research is part of their business. There's no question that image generators and video generators have real commercial value. If you read the article, it sounds like they made some real advances in addressing the problems with using a Stable Diffusion network to generate video.
 
As an AI practitioner, I can say a few things about this. First and foremost, it is not a game engine. It is an image-generating AI that acts on a long window of frames (think video generation), trained on hours of actual gameplay video so it can generate the Doom world. In a sense, it is a rendering engine.

Now you might ask: what's the difference? A rendering engine doesn't know about the player, enemies, items, stats, line of sight, and so on. It shows in the demo video when you see corpses disappear after moving out of frame, the bullet count not increasing when the player picks up ammo, the tank not blowing up when shot... This is a fundamental limit of this approach. At least they solved the game-rendering part.
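
To make that concrete, the training setup they describe boils down to next-frame prediction over recorded play sessions. A rough sketch (all names here are mine, not theirs):

```python
import random

CONTEXT_LEN = 32  # assumed context window size

def sample_example(frames, actions):
    """Pick a random (context frames, context actions, target frame) triple
    from one recorded play session."""
    t = random.randrange(CONTEXT_LEN, len(frames))
    return frames[t - CONTEXT_LEN:t], actions[t - CONTEXT_LEN:t], frames[t]

def train(model, recordings, steps=100_000):
    """recordings: list of (frames, actions) pairs from hours of real play."""
    for _ in range(steps):
        frames, acts = random.choice(recordings)
        context, act_window, target = sample_example(frames, acts)
        # training_step is a hypothetical stand-in: denoise the target frame
        # conditioned on the context frames and actions (diffusion loss).
        model.training_step(context, act_window, target)
```

Nothing in that objective rewards remembering a corpse that left the frame, which is why that information evaporates.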
 
As an AI practitioner, I can say a few things about this. First and foremost, it is not a game engine. It is an image-generating AI that acts on a long window of frames (think video generation), trained on hours of actual gameplay video so it can generate the Doom world. In a sense, it is a rendering engine.

Now you might ask: what's the difference? A rendering engine doesn't know about the player, enemies, items, stats, line of sight, and so on. It shows in the demo video when you see corpses disappear after moving out of frame, the bullet count not increasing when the player picks up ammo, the tank not blowing up when shot... This is a fundamental limit of this approach. At least they solved the game-rendering part.
I disagree with your characterization. The input to a rendering engine is geometry and textures. This is not simply a rendering engine.

By contrast, a game engine knows about physics, game rules, how to interpret game levels, AI for the enemies, etc. For a game engine, the input is essentially the player's actions, and it does pretty much all of the rest (with customizations by the game, as necessary).

They said its input is the player's controller actions. The model is doing all the work of a game engine, including the rendering phase. Just because it has flaws in the way it implements some of the game engine logic doesn't mean it's not a game engine - it just means it's a flawed one.

I guess it would be more proper to say the AI model implemented the game, not just a game engine. The distinction is that a game engine abstracts certain things and provides hooks for game-specific customizations, whereas this is a complete game (again, accepting its various flaws and limitations).
 
"Google Research solved this problem by training new frames with a more extended sequence of user inputs and frames that preceded them—rather than just a single prompt image—and corrupting these context frames using Gaussian noise. Now, a separate but connected neural network fixes its context frames, ensuring a constantly self-correcting image and high levels of visual stability that remain for long periods."


This was the most interesting part to me. It sounds like they have a new approach that's yielding results. Once this is perfected, it may have a profound impact on all types of video generation.
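
If I'm reading that right, the trick is to deliberately corrupt the context frames during training, so at inference time the model's own slightly-off outputs look like just another kind of noise it already knows how to fix. Something like this (my own sketch - the tensor layout and names are assumptions, not Google's code):

```python
import torch

def corrupt_context(context_frames: torch.Tensor, max_sigma: float = 0.5):
    """context_frames: (T, C, H, W) tensor scaled to [-1, 1] (assumed).
    Returns the noisy context plus the noise level, which the model also sees."""
    sigma = torch.rand(()) * max_sigma  # random corruption strength
    noisy = context_frames + sigma * torch.randn_like(context_frames)
    return noisy.clamp(-1.0, 1.0), sigma

# Training conditions the model on (noisy context, sigma, actions) while it
# still has to predict a clean next frame, so drift gets corrected instead
# of compounding frame after frame.
```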
 
I could see the 'wobbliness' of this being leaned into rather than fixed. Train an NN on your game world, then enter a 'dream/nightmare' sequence where you are navigating a version where everything is slightly off, the map starts to diverge the longer you spend there, nothing quite looks or works right, etc.
 
"Google Research solved this problem by training new frames with a more extended sequence of user inputs and frames that preceded them—rather than just a single prompt image—and corrupting these context frames using Gaussian noise. Now, a separate but connected neural network fixes its context frames, ensuring a constantly self-correcting image and high levels of visual stability that remain for long periods."


This was the most interesting part to me. It sounds like they have a new approach that's yielding results. Once this is perfected, it may have a profound impact on all types of video generation.
Maybe games will be released complete? Without a 50 GB update.
 
I could see the 'wobbliness' of this being leaned into rather than fixed. Train an NN on your game world, then enter a 'dream/nightmare' sequence where you are navigating a version where everything is slightly off, the map starts to diverge the longer you spend there, nothing quite looks or works right, etc.
Feed it the style of A Scanner Darkly and voice dialogue from everything else by Keanu, and have ChatGPT generate a script = hilarious sequel?

https://www.youtube.com/watch?v=hkjDUERgCQw
 
I could see the 'wobbliness' of this being leaned into rather than fixed. Train an NN on your game world, then enter a 'dream/nightmare' sequence where you are navigating a version where everything is slightly off, the map starts to diverge the longer you spend there, nothing quite looks or works right, etc.
Exactly. I had a similar idea about an AI model trained to do ray tracing. I'd imagine it could get the geometry pretty close, but not exact. So, it could be used as an alternative for dream sequences or ethereal projections of objects.
 