Auteur: Language-Driven Cinematographic Framing for Human-Centric Video Generation


M. Burak Kizil¹, Enes Sanli¹, Niloy J. Mitra²,⁴, Xuelin Chen⁴, Erkut Erdem³, Aykut Erdem¹, Duygu Ceylan⁴

¹Koç University ²University College London ³Hacettepe University ⁴Adobe Research
arXiv | Code (coming soon)

[Figure: Auteur teaser]

We replace world-space camera trajectories with actor-relative shot compositions: a cinematographic domain-specific language (DSL) and a fine-tuned LLM map natural language and human motion into camera keyframes that are geometrically consistent and narratively intentional.
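To make the actor-relative idea concrete, here is a minimal sketch of how an actor-relative shot description could be turned into a world-space camera position. It assumes a y-up world, an actor described by a ground-plane position and a heading angle, and hypothetical lookup tables for the Orientation and Depth values. The angles, distances, default camera height, sign convention, and the names `ActorPose` and `camera_position` are illustrative assumptions, not taken from Auteur. Because the camera is placed relative to the actor's pose, the same shot description remains valid as the actor moves or turns.

```python
import math
from dataclasses import dataclass

# Hypothetical azimuth offsets (degrees) relative to the actor's facing direction.
# Assumed sign convention: positive angles rotate from +z toward +x, and that
# side is treated as the actor's left.
ORIENTATION_DEG = {"Front": 0.0, "Left": 90.0, "Back": 180.0, "Right": -90.0}

# Hypothetical camera-to-actor distances (metres) for the Depth field.
DEPTH_M = {"Close": 1.0, "Medium": 2.5, "Far": 5.0}


@dataclass
class ActorPose:
    x: float        # actor root position on the ground plane
    z: float
    heading: float  # facing direction in radians; 0 means facing +z


def camera_position(actor: ActorPose, orientation: str, depth: str,
                    height: float = 1.6) -> tuple:
    """Convert an actor-relative (orientation, depth) pair into world coordinates."""
    # Rotate the actor's facing direction by the shot's azimuth offset.
    angle = actor.heading + math.radians(ORIENTATION_DEG[orientation])
    dist = DEPTH_M[depth]
    # Direction (sin, cos) points along +z when angle == 0.
    cx = actor.x + dist * math.sin(angle)
    cz = actor.z + dist * math.cos(angle)
    return (cx, height, cz)
```

For an actor at the origin facing +z, `camera_position(ActorPose(0.0, 0.0, 0.0), "Front", "Medium")` returns `(0.0, 1.6, 2.5)`; moving the actor to x = 3.0 moves the camera by the same amount, which is the property a world-space trajectory would not preserve.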

Motion and Framing Primitives

  • Arc
    The camera arcs from the subject's left side, around to the front, and then to the right, smoothly shifting the viewpoint across the scene.
    Initial DSL
    Orientation: Left
    Depth: Medium
    Camera Level: Eye
    Lookat Level: Eye
    Dutch Angle: Normal
    Framing: Center
    Jitter: None
    Ease: None
    Final DSL
    Orientation: Right
    Depth: Medium
    Camera Level: Eye
    Lookat Level: Eye
    Dutch Angle: Normal
    Framing: Center
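The Initial/Final DSL pair above can be read as the two endpoints of a shot. The sketch below, under stated assumptions, parses `Key: Value` lines into a dictionary and interpolates the Orientation field into per-keyframe azimuths. The angle table, the linear schedule used for `Ease: None`, and the names `parse_dsl` and `arc_azimuths` are hypothetical; this page does not specify how Auteur converts DSL states into camera keyframes.

```python
# Hypothetical azimuths (degrees) for the Orientation field, relative to the
# actor's facing direction; Front = 0 so that Left -> Right passes through Front.
ORIENTATION_DEG = {"Left": 90.0, "Front": 0.0, "Right": -90.0}


def parse_dsl(text: str) -> dict:
    """Parse 'Key: Value' lines (as in the Initial/Final DSL above) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def arc_azimuths(initial: dict, final: dict, num_keyframes: int) -> list:
    """Linearly interpolate the orientation angle between the two DSL states.

    Linear spacing is an assumed reading of 'Ease: None' (constant angular speed).
    """
    if num_keyframes < 2:
        raise ValueError("need at least two keyframes")
    a0 = ORIENTATION_DEG[initial["Orientation"]]
    a1 = ORIENTATION_DEG[final["Orientation"]]
    return [a0 + (a1 - a0) * i / (num_keyframes - 1) for i in range(num_keyframes)]


initial = parse_dsl("Orientation: Left\nDepth: Medium\nCamera Level: Eye\nEase: None")
final = parse_dsl("Orientation: Right\nDepth: Medium\nCamera Level: Eye")
azimuths = arc_azimuths(initial, final, 5)
```

With five keyframes the azimuths are 90, 45, 0, -45 and -90 degrees, so the middle keyframe sits at Front, matching the left-to-front-to-right arc described for this example.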

Model Comparisons

  • Side Track
    A woman runs forward through the scene. The camera tracks her from the right side and then switches to the left, maintaining a consistent side profile of the subject.
    Pulp Motion
    LAMP
    Auteur (Ours)

VerseCrafter Pipeline

  • Over-the-Shoulder
    A man and a woman sit facing each other. The camera moves from an over-the-shoulder perspective to a close-up of the person sitting in front.
    Generated Trajectory / Guidance
    Final Output

VACE

  • Arc
    Wild West scene set in a vast sunlit desert town, with a rugged cowboy standing still while adjusting his hat in the exact center of the frame, wearing .... The background .... Golden sand drifts lightly across the ground while the warm afternoon sun casts long dramatic shadows. The camera performs a smooth cinematic arc around the cowboy, beginning from his left side profile, slowly circling in front of him, then continuing toward his right side, revealing more of the town and desert landscape as it moves.
    Control Video
    Ours
    Without Guidance (T2V)