There are other major limitations to stacking more than 2 layers in CFET, chiefly routing. Each transistor needs 3 contacts (source, drain, gate), though transistors can sometimes share some of them. CFET works because most transistors in CMOS logic come in pairs that can share contacts (usually the gate and one source/drain terminal; in an inverter that is the drain). Once you stack 3 or 4 transistors, you quickly run out of surface area for the contacts needed to reach every transistor in the stack. When the transistors in a stack aren't paired, it also becomes challenging (but not impossible) to vertically isolate their gates from each other.
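To put rough numbers on the contact problem, here's a toy count in Python. It's only a sketch: the function name, the figure of 2 terminals merged per pair, and the assumption that nothing is shared between pairs are my own simplifications, not a real layout rule.

```python
def contacts_needed(n_transistors, paired=True, merged_per_pair=2):
    """Toy count of distinct contacts for one vertical transistor stack.

    Assumptions (illustrative only):
    - every transistor has 3 terminals: source, drain, gate
    - a paired NMOS/PMOS merges `merged_per_pair` terminals into shared
      contacts (e.g. a common gate and a common drain)
    - an odd transistor left over is unpaired and shares nothing
    """
    terminals = 3 * n_transistors
    if not paired:
        return terminals
    pairs = n_transistors // 2
    # Each merged terminal removes one contact from the total.
    return terminals - pairs * merged_per_pair


for n in (2, 3, 4):
    print(n, contacts_needed(n), contacts_needed(n, paired=False))
```

Under these assumptions a paired stack of 2 needs 4 contacts, while 4 unpaired transistors need 12, all over the same footprint. The real picture depends on the cell being built, but the trend is the point.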
3D NAND is a special case because little routing is needed. Each stack shares 1 source and 1 drain, and each gate is shared by thousands of transistors in a plane. Even so, the staircase structures needed to contact each NAND gate are complex and difficult to manufacture.
Monolithic CFET also makes it hard to form the nanosheets and then etch deep enough, which will probably limit the number of ribbons in a stack for a while. Sequential CFET with several layers of interconnect between the ribbons could solve both the routing problems and the stack patterning problems, but at that point you're looking at something that has more in common with Foveros or an active interposer than with a monolithic chip.
Bottom line: although it's possible, I don't think CFET is likely to move beyond a stack of 2 paired transistors anytime soon.
I agree with your assessment of Forksheet. It's basically GAAFET with a dielectric wall splitting each fin in half so that both PMOS and NMOS operate on the same fin.
I'm a former technical trainer in semiconductor manufacturing. I trained new hires on process flow and capabilities. I've been writing articles on Medium about FinFET, GAAFET, and CFET for a more general audience. I won't link them here, but you can look me up.