The above can happen, but it doesn't always, because there are techniques to mitigate it. One of them is known as triple buffering.
As above, if you turn on V-sync with a graphics engine that uses only two buffers, and your card takes longer than 16.7 ms to render a frame, that frame must wait for the next time the monitor refreshes its image, which happens every 16.7 ms on a 60 Hz monitor. That means if your card consistently fails to render frames in under 16.7 ms (but still finishes them within 33.3 ms), there will be 33.3 ms between every frame, which is 30 FPS.
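The arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every frame takes the same time to render, and with two buffers the GPU sits idle until the finished frame is swapped at a refresh. The function name double_buffered_fps is made up for this example.

```python
import math

def double_buffered_fps(render_ms, refresh_hz=60.0):
    """Effective FPS with V-sync and two buffers (simplified model).

    Assumes a constant render time per frame, and that the GPU cannot
    start the next frame until the current one is swapped at a refresh.
    """
    interval_ms = 1000.0 / refresh_hz            # 16.7 ms at 60 Hz
    # Each frame occupies a whole number of refresh intervals.
    intervals = max(1, math.ceil(render_ms / interval_ms))
    return refresh_hz / intervals

print(double_buffered_fps(10.0))  # fast enough: one frame per refresh
print(double_buffered_fps(20.0))  # 50 FPS without V-sync, halved with it
```

With a 20 ms render time (50 FPS without V-sync), each frame misses one refresh and waits for the next, so the result is 30 FPS rather than 50.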
With triple buffering, things change. If you get 50 FPS without V-sync, you will continue to get about 50 FPS even with V-sync. This is because the GPU can start rendering a new frame even while it is waiting to send the finished one to the monitor. It is also what causes stutter. V-sync forces your card to wait until the monitor is not updating the image on the screen before it can send a new frame, so some frames get delayed by up to 16.7 ms (assuming a 60 Hz monitor). With the third buffer, the GPU keeps rendering during that wait, so the next frame will likely be ready in time for the next refresh. This means there will be 16.7 ms between some frames and 33.3 ms between others, which results in stutter.
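To see where the uneven frame times come from, here is a rough simulation. It is a sketch and not how any particular driver works. It assumes a constant render time of at least one refresh interval, a GPU that never stalls, and that each finished frame is shown at the first refresh at or after it completes. The name triple_buffered_gaps is hypothetical, and times are integer microseconds to avoid rounding problems.

```python
def triple_buffered_gaps(render_us, refresh_hz=60, frames=11):
    """Gaps between displayed frames, counted in refresh intervals.

    Simplified model: the GPU renders back to back, and every finished
    frame appears at the first refresh at or after its completion time.
    """
    if render_us * refresh_hz < 1_000_000:
        raise ValueError("model assumes render time >= one refresh interval")
    shown_at = []
    for k in range(1, frames + 1):
        done_us = k * render_us
        # Integer ceiling of done_us * refresh_hz / 1_000_000:
        # the index of the first refresh at or after done_us.
        shown_at.append(-(-done_us * refresh_hz // 1_000_000))
    return [b - a for a, b in zip(shown_at, shown_at[1:])]

# 20 ms per frame (50 FPS) on a 60 Hz monitor
print(triple_buffered_gaps(20_000))
```

A gap of 1 means 16.7 ms between frames and a gap of 2 means 33.3 ms. At 50 FPS the pattern is four short gaps followed by one long one, which averages out to 50 FPS but is not evenly paced.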
You may also see some stutter with two buffers in the case where the GPU is able to render some frames in under 16.7 ms but not others.
What it boils down to is this: if you are getting between 30 and 60 FPS while V-sync is on, the frame times will jump between 16.7 ms and 33.3 ms, resulting in stutter.