Ablation study results.
We validate the effectiveness of our core designs: the geo-motion augmented latents and the joint optimization strategy.
As demonstrated in the visual comparison:
- Base Finetuned: a finetuned version of the CogVideoX model. Results show unnatural motion and inconsistent appearance across frames.
- w/o Geo-Motion: relying solely on the RGB-only model (without geo-motion latents) leads to noticeable visual artifacts and ghosting, degrading scene stability.
- w/o Joint: skipping the joint training and regularization stage results in reduced subject consistency, causing incoherent motion particularly for dynamic objects and humans.
- WorldReel (Ours): by effectively aligning appearance and geometry, our full model produces the smoothest non-rigid motion and maintains natural, high-quality complex dynamics.