How does branching work in GPU architectures? For example, with a simple if/else statement, the execution gets serialized. If both parts (the if and the else) have to run anyway, why not simply do the comparison first? Why do threads have to run at different times during execution instead of all at the same time?
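Here is a rough toy model of what I understand happens when one warp reaches a divergent if/else. It is a sketch under my own assumptions, not a description of any real GPU. It uses 8 lanes instead of the 32 in an NVIDIA warp, treats each branch body as a single operation, and `run_if_else` is a hypothetical name:

```python
# Toy model of one warp executing an if/else under SIMT.
# Assumptions (illustrative only): 8 lanes instead of 32, and each branch
# body is a single operation. All names here are hypothetical.

WARP_SIZE = 8

def run_if_else(data):
    """Simulate a divergent if/else on one warp.

    Returns the per-lane results and the list of issue steps, where each
    step is (branch_name, active_lane_mask).
    """
    assert len(data) == WARP_SIZE
    out = [None] * WARP_SIZE
    steps = []

    # Step 1: every lane evaluates the condition at the same time.
    cond = [x % 2 == 0 for x in data]

    # Step 2: the warp shares one instruction stream, so the "if" body is
    # issued once, with only the lanes whose condition is true enabled.
    # If no lane takes this path, the branch is skipped entirely.
    if any(cond):
        steps.append(("if", cond[:]))
        for lane in range(WARP_SIZE):
            if cond[lane]:
                out[lane] = data[lane] * 2

    # Step 3: the "else" body is issued afterwards, with the remaining
    # lanes enabled; the lanes that took the "if" path sit idle here.
    else_mask = [not c for c in cond]
    if any(else_mask):
        steps.append(("else", else_mask))
        for lane in range(WARP_SIZE):
            if else_mask[lane]:
                out[lane] = data[lane] + 100

    return out, steps

out, steps = run_if_else([0, 1, 2, 3, 4, 5, 6, 7])
print(out)                              # [0, 101, 4, 103, 8, 105, 12, 107]
print([name for name, _ in steps])      # ['if', 'else']
```

With mixed even and odd inputs, both branches are issued one after the other; if every lane takes the same path (e.g. all even inputs), only one branch is issued.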