Having taken graphics courses and done ray tracing, I saw the limitations of this a long way off. Even in ray tracing you really have to use an octree or some other spatial subdivision; otherwise a scene with 10,000 triangles takes 10 times longer to render than a scene with 1,000, because every ray has to be checked against every triangle.
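To put rough numbers on that, here's a back-of-envelope sketch (my own toy model, not from any engine) of why brute-force ray casting scales linearly with triangle count, and how an octree cuts the per-ray work down:

```python
def brute_force_tests(num_triangles, num_rays):
    # Without a spatial structure, every ray is tested against every
    # triangle: 10,000 triangles really is 10x the work of 1,000.
    return num_rays * num_triangles

def octree_tests(num_triangles, num_rays, leaves=512):
    # With an octree, a ray only tests triangles in the leaf cells it
    # actually passes through -- roughly one row of cells, about
    # leaves**(1/3) of them, assuming triangles are spread evenly
    # (a big assumption; real scenes are lumpier than this).
    per_leaf = num_triangles / leaves
    leaves_visited = round(leaves ** (1 / 3))
    return int(num_rays * leaves_visited * per_leaf)
```

With 512 leaf cells, 100 rays against 10,000 triangles drops from 1,000,000 brute-force tests to roughly 15,000, which is the whole point of the structure.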
However, that doesn't mean a scene has to be static. You can have objects moving around; it just means that the tree can become unbalanced and start to slow down. Alternatively, you can rebuild the tree every time you re-render, sorting millions of objects in 3D space. A GPU can do this dozens of times per second for fairly simple scenes (read: dozens of frames per second). However, I don't think that would play too nicely with the sort of MegaTexture voxel equivalent they're talking about: gigabytes of voxel octrees stored and streamed as needed. Certainly, for consoles reading all game data off optical media this would be a no-go (though with any luck, by the time voxel octrees are viable, even consoles will use hard drives to store all game data).
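The "rebuild the tree every frame" trick usually boils down to a sort: GPU builders commonly assign each object a Morton code (its grid coordinates with the bits interleaved) and sort by it, so spatially nearby objects land next to each other and the tree can be built over that order. A minimal CPU-side sketch of that idea, with illustrative names of my own:

```python
def morton3(x, y, z, bits=10):
    # Interleave the low `bits` bits of integer grid coords x, y, z,
    # producing a code that orders points along a space-filling curve.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def rebuild_order(positions, grid=1024):
    # positions: (x, y, z) floats in [0, 1]. Quantize to a grid and sort
    # by Morton code; a tree built over this order groups neighbors.
    def key(p):
        x, y, z = (int(c * (grid - 1)) for c in p)
        return morton3(x, y, z)
    return sorted(positions, key=key)
```

Doing this per frame is cheap for dynamic objects, but it's exactly the step that doesn't mix well with gigabytes of streamed, precomputed octree data.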
I think the idea is very cool, and certainly a fun mental puzzle, but at the risk of saying something that will make me look stubborn and short-sighted 10 years from now, I can't see how it's a real improvement over what we have today. Triangle rasterization is fast. If this sort of voxel ray casting doesn't give us improved interactivity and destructibility, and it doesn't do the secondary ray casting of ray tracing to yield better effects, I don't see why it's an appealing alternative in the slightest. Perhaps if they used voxels to simulate object and environment destruction, and generated triangle meshes on the fly to cover the newly created surfaces, they could be useful for something. Or maybe I'm just missing the whole point of this rendering approach.
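For what that "mesh the voxels on the fly" step might look like: the simplest surfacing scheme just emits one quad for each solid-voxel face that borders empty space (smoother results would need something like marching cubes). A toy sketch, purely illustrative:

```python
def surface_quads(solid):
    # solid: set of (x, y, z) occupied voxel coordinates.
    # Returns (voxel, face_normal) pairs for every exposed face --
    # the faces you'd turn into triangles after blowing a hole in something.
    normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    quads = []
    for v in solid:
        for n in normals:
            neighbor = (v[0] + n[0], v[1] + n[1], v[2] + n[2])
            if neighbor not in solid:  # face borders empty space
                quads.append((v, n))
    return quads
```

Removing a voxel from the set and re-running this over the affected neighborhood is all "covering the newly created surfaces" would take, at least in blocky form.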