Sora 2 Prompting Guide

Prompting for video is like briefing a cinematographer: clear direction helps Sora 2 understand your creative intent. This guide, based on official OpenAI documentation, walks through how to control each aspect of video generation through prompts.

Official Source
This guide is based on the “Sora 2 Prompting Guide” from OpenAI Cookbook (published October 6, 2025).
Original authors: Robin Koenig (OpenAI), Joanne Shin (OpenAI)
English version organized by the SoratoAI community, with practical insights added.

Before You Prompt

Think of Your Prompt as a Creative Wish List, Not a Contract

Think of prompting like briefing a cinematographer who has never seen your storyboard. If you leave out details, they’ll improvise—and you may not get what you envisioned. By being specific about what the “shot” should achieve, you give the model more control and consistency to work with.

But leaving some details open can be just as powerful. Giving the model more creative freedom can lead to surprising variations and unexpected, beautiful interpretations. Both approaches are valid:

Detailed prompts → Give you control and consistency
Lighter prompts → Open space for creative outcomes

The right balance depends on your goals and the result you’re aiming for.

Embrace Variety and Be Prepared to Iterate

As with ChatGPT, using the same prompt multiple times will lead to different results. This is a feature, not a bug. Each generation is a fresh take, and sometimes the second or third option is better.

Small changes to camera, lighting, or action can shift the outcome dramatically. Collaborate with the model: you provide direction, and the model delivers creative variations.

Important Note
This isn’t an exact science—think of the guidance below as helpful suggestions learned from working with the model, not strict rules.

API Parameters

The prompt controls the content of the video, but certain attributes are governed only by API parameters and cannot be requested in prose:

Required Parameters

Parameter	Description	Options
model	Model version	`sora-2` or `sora-2-pro`
size	Video resolution	See Supported Resolutions below
seconds	Clip length	`4`, `8`, `12` (default: 4)

Supported Resolutions

sora-2 model

1280x720 (landscape 720p)
720x1280 (portrait 720p)

sora-2-pro model

1280x720 (landscape 720p)
720x1280 (portrait 720p)
1024x1792 (portrait HD)
1792x1024 (landscape HD)

These parameters are the video’s container—resolution, duration, and quality will not change based on prose like “make it longer.” Set them explicitly in the API call; your prompt controls everything else (subject, motion, lighting, style).
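
As a minimal sketch of setting these values explicitly, the helper below checks a model, size, and seconds combination against the lists above and returns a request body. The function name `build_video_request` is hypothetical, and passing `seconds` as a string is an assumption; the sketch makes no network call.

```python
# Supported sizes per model, copied from the lists above.
SUPPORTED_SIZES = {
    "sora-2": {"1280x720", "720x1280"},
    "sora-2-pro": {"1280x720", "720x1280", "1024x1792", "1792x1024"},
}
SUPPORTED_SECONDS = {"4", "8", "12"}


def build_video_request(prompt, model="sora-2", size="1280x720", seconds="4"):
    """Hypothetical helper: validate the container settings and return a request body."""
    if model not in SUPPORTED_SIZES:
        raise ValueError(f"unknown model: {model}")
    if size not in SUPPORTED_SIZES[model]:
        raise ValueError(f"{size} is not supported by {model}")
    seconds = str(seconds)  # assumption: the value is sent as a string
    if seconds not in SUPPORTED_SECONDS:
        allowed = sorted(SUPPORTED_SECONDS, key=int)
        raise ValueError(f"seconds must be one of {allowed}")
    return {"model": model, "prompt": prompt, "size": size, "seconds": seconds}


request = build_video_request(
    "Wide shot of a commuter platform at dawn.",
    model="sora-2-pro",
    size="1792x1024",
    seconds=8,
)
print(request)
```

Validating before the call catches mismatches such as an HD size on `sora-2`, which prose in the prompt cannot fix.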

Video Resolution Impact

Video resolution directly influences visual fidelity and motion consistency in Sora:

Higher resolutions generate detail, texture, and lighting transitions more accurately
Lower resolutions compress visual information, often introducing softness or artifacts

Video Length Best Practices

The model generally follows instructions more reliably in shorter clips. For best results:

Aim for concise shots
If your project allows, you may see better results by stitching together two 4-second clips in editing instead of generating a single 8-second clip
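
The stitching idea can be sketched as a small planning step: split the runtime you want into equal short clips, generate each one separately, and join them in your editor. `plan_clips` is a hypothetical helper, and requiring the total to be an exact multiple of the clip length is a simplifying assumption.

```python
def plan_clips(total_seconds, clip_seconds=4):
    """Split a target runtime into equal short clips to generate separately."""
    if clip_seconds not in (4, 8, 12):
        raise ValueError("clip_seconds must be 4, 8, or 12")
    if total_seconds <= 0 or total_seconds % clip_seconds != 0:
        raise ValueError("total_seconds must be a positive multiple of clip_seconds")
    count = total_seconds // clip_seconds
    return [
        {
            "clip": index + 1,
            "start": index * clip_seconds,        # position in the final edit
            "end": (index + 1) * clip_seconds,
            "seconds": clip_seconds,              # value to request per clip
        }
        for index in range(count)
    ]


for clip in plan_clips(8):
    print(clip)
```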

Effective Prompt Anatomy

A clear prompt describes a shot as if you were sketching it onto a storyboard:

State the camera framing - Specify camera angle and composition
Note depth of field - Define focus and background blur
Describe the action in beats - Use executable steps to describe movement
Set the lighting and palette - Define light sources, direction, and color tone

Anchor your subject with a few distinctive details to keep it recognizable, while a single, plausible action makes the shot easier to follow.

Single Shot vs. Multi-Shot

Single Shot Description

A clear shot unit contains:

One camera setup
One subject action
One lighting recipe

Multi-Shot Sequences

Describing multiple shots in a single prompt is valid if you need to cover a sequence. When you do this, keep each shot block distinct. This gives you flexibility to:

Generate short standalone clips for editing
Or let them play out as a sequence in one go

Treat each shot as a creative unit.

Prompt Length Trade-offs

Shorter Prompts ✨

Give the model more creative freedom
Expect surprising results

Longer Prompts ⚙️

Restrict the model’s creativity
The model will try to follow your guidance, but might not always do so reliably

Short Prompt Example

In a 90s documentary-style interview, an old Swedish man sits in a study and says, 
"I still remember when I was young."

Why This Prompt Works:

90s documentary - Sets the style of the video. The model will choose variables like camera lens, lighting and color grade accordingly
an old Swedish man sits in a study - Describes subject and setting in minor detail, letting the model take creative liberties
says, "I still remember when I was young." - Describes the dialogue. Sora will likely be able to follow this exactly

Important Note: This prompt will reliably produce videos that match these requirements. However, it might not match your vision exactly as many details are left open:

Time of day, weather
Outfits, tone
Look and age of the character
Camera angles, cuts
Set design
Many other factors

Unless you describe these details, Sora will make them up.

Going Ultra-Detailed (Cinematic Level)

For complex, cinematic shots, you can go beyond the standard prompt structure and specify the look, camera setup, grading, soundscape, and even shot rationale in professional production terms.

This is similar to how a director briefs a camera crew or VFX team. Detailed cues for lensing, filtration, lighting, grading, and motion help the model lock onto a very specific aesthetic.

You might describe:

What the viewer notices first
Camera platform and lens
Lighting direction
Color palette
Texture qualities
Diegetic sound
Shot timing

Cinematic Prompt Example

Format & Look
Duration 4s; 180° shutter; digital capture emulating 65mm photochemical contrast; 
fine grain; subtle halation on speculars; no gate weave.

Lenses & Filtration
32mm/50mm spherical primes; Black Pro-Mist 1/4; 
slight CPL rotation to manage glass reflections on train windows.

Grade / Palette
Highlights: clean morning sunlight with amber lift
Mids: balanced neutrals with slight teal cast in shadows
Blacks: soft, neutral with mild lift for haze retention

Lighting & Atmosphere
Natural sunlight from camera left, low angle (07:30AM)
Bounce: 4×4 ultrabounce silver from trackside
Negative fill from opposite wall
Practical: sodium platform lights on dim fade
Atmos: gentle mist; train exhaust drift through light beam

Location & Framing
Urban commuter platform, dawn
Foreground: yellow safety line, coffee cup on bench
Midground: waiting passengers silhouetted in haze
Background: arriving train braking to a stop
Avoid signage or corporate branding

Wardrobe / Props / Extras
Main subject: mid-30s traveler, navy coat, backpack slung on one shoulder, 
holding phone loosely at side
Extras: commuters in muted tones; one cyclist pushing bike
Props: paper coffee cup, rolling luggage, LED departure board (generic destinations)

Sound
Diegetic only: faint rail screech, train brakes hiss, 
distant announcement muffled (-20LUFS), low ambient hum
Footsteps and paper rustle; no score or added foley

Optimized Shot List (2 shots / 4s total)

0.00–2.40 — "Arrival Drift" (32mm, shoulder-mounted slow dolly left)
Camera slides past platform signage edge; shallow focus reveals traveler mid-frame 
looking down tracks. Morning light blooms across lens; train headlights flare softly 
through mist. Purpose: establish setting and tone, hint anticipation.

2.40–4.00 — "Turn and Pause" (50mm, slow arc in)
Cut to tighter over-shoulder arc as train halts; traveler turns slightly toward camera, 
catching sunlight rim across cheek and phone screen reflection. Eyes flick up toward 
something unseen. Purpose: create human focal moment with minimal motion.

Camera Notes (Why It Reads)
Keep eyeline low and close to lens axis for intimacy
Allow micro flares from train glass as aesthetic texture
Preserve subtle handheld imperfection for realism
Do not break silhouette clarity with overexposed flare; retain skin highlight roll-off

Finishing
Fine-grain overlay with mild chroma noise for realism; restrained halation on practicals; 
warm-cool LUT for morning split tone
Mix: prioritize train and ambient detail over footstep transients
Poster frame: traveler mid-turn, golden rim light, arriving train soft-focus in background haze

This approach works well when you want to match real cinematography styles (e.g., IMAX aerials, 35mm handheld, vintage 16mm documentary) or maintain strict continuity across shots.
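
When a prompt carries a timed shot list like the one in the example above, it can help to check that the shots tile the clip exactly before you generate. The sketch below is one way to do that; the `Shot` class and `check_shot_list` function are hypothetical names, not part of any Sora tooling.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: float  # seconds from the start of the clip
    end: float
    name: str


def check_shot_list(shots, clip_seconds):
    """Return a list of timing problems; an empty list means the shots tile the clip."""
    problems = []
    cursor = 0.0
    for shot in shots:
        if abs(shot.start - cursor) > 1e-9:
            problems.append(f"{shot.name}: starts at {shot.start}, expected {cursor}")
        if shot.end <= shot.start:
            problems.append(f"{shot.name}: end must be after start")
        cursor = shot.end
    if abs(cursor - clip_seconds) > 1e-9:
        problems.append(f"shots end at {cursor}, but the clip is {clip_seconds}s")
    return problems


# Timings taken from the two-shot list in the cinematic example.
shots = [Shot(0.00, 2.40, "Arrival Drift"), Shot(2.40, 4.00, "Turn and Pause")]
print(check_shot_list(shots, clip_seconds=4))
```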

Visual Cues That Steer the Look

When writing prompts, style is one of the most powerful levers for guiding the model toward your desired outcome. Start by describing the overall aesthetic, for example:

“1970s film”
“Epic, IMAX-scale scene”
“16mm black-and-white film”

A description like this sets a visual tone that frames all other choices. Establish the style early so the model can carry it through consistently.

How Style Affects Interpretation

The same details will read very differently depending on whether you call for:

A polished Hollywood drama
A handheld smartphone clip
A grainy vintage commercial

Once the tone is set, layer in specifics with shot, action, and light.

Clarity Wins: Specific Over Vague

Use verbs and nouns that point to visible results, avoiding vague cues:

Weak Example ❌	Strong Example ✅
“A beautiful street at night”	“Wet asphalt, zebra crosswalk, neon signs reflecting in puddles”
“Person moves quickly”	“Cyclist pedals three times, brakes, and stops at crosswalk”
“Cinematic look”	“Anamorphic 2.0x lens, shallow DOF, volumetric light”

Camera Direction and Framing

Camera direction and framing shape how a shot feels:

Wide shot from above - Emphasizes space and context
Close-up at eye level - Focuses attention on emotion

Depth of field adds another layer:

Shallow focus - Makes subject stand out against blurred background
Deep focus - Keeps both foreground and background sharp

Lighting sets tone just as strongly:

Soft, warm key - Creates something inviting
Single hard light with cool edges - Pushes toward drama

Weak vs. Strong Example

Weak:

Camera shot: cinematic look

Strong:

Camera shot: wide shot, low angle
Depth of field: shallow (sharp on subject, blurred background)
Lighting + palette: warm backlight with soft rim

Good Framing Instructions

wide establishing shot, eye level
wide shot, tracking left to right with the charge
aerial wide shot, slight downward angle
medium close-up shot, slight angle from behind

Good Camera Motion Instructions

slowly tilting camera
handheld ENG camera

Character Consistency Notes

When introducing characters, expect some unpredictability—small changes in phrasing can alter:

Identity
Pose
Focus of the scene itself

Maintaining consistency:

Keep descriptions consistent across shots
Reuse phrasing for continuity
Avoid mixing traits that may compete
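
One way to reuse phrasing reliably is to write the character description once and insert it verbatim into every shot prompt. The sketch below illustrates this; the character text, the shot templates, and the `render_shots` helper are all hypothetical.

```python
# Hypothetical character description, written once and reused verbatim.
TRAVELER = "a mid-30s traveler in a navy coat, backpack slung on one shoulder"

SHOT_TEMPLATES = [
    "Wide shot, eye level: {character} waits on a misty commuter platform at dawn.",
    "Medium close-up, slight angle from behind: {character} turns toward the arriving train.",
]


def render_shots(character, templates):
    """Insert the identical character description into every shot prompt."""
    return [template.format(character=character) for template in templates]


prompts = render_shots(TRAVELER, SHOT_TEMPLATES)
for prompt in prompts:
    print(prompt)
```

Keeping the description in a single constant means a wording change reaches every shot at once, so traits never drift between prompts.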

Control Motion and Timing

Movement is often the hardest part to get right, so keep it simple.

One Shot, One Thing

Each shot should have:

One clear camera move
One clear subject action

Describe Actions in Beats

Actions work best when described in beats or counts—small steps, gestures, or pauses—so they feel grounded in time.

Weak Example ❌:

Actor walks across the room.

Strong Example ✅:

Actor takes four steps to the window, pauses, and pulls the curtain in the final second.

The second example makes the timing precise and achievable.

Lighting and Color Consistency

Light determines mood as much as action or setting.

Light Quality Impact

Diffuse light across the frame - Feels calm and neutral
Single strong source - Creates sharp contrast and tension

Key to Seamless Editing

When you want to cut multiple clips together, keeping lighting logic consistent is what makes the edit seamless.

Best Practices for Describing Light

Describe both the quality of the light and the color anchors that reinforce it.

Weak Example ❌:

Lighting + palette: brightly lit room

Strong Example ✅:

Lighting + palette: soft window light with warm lamp fill, cool rim from hallway 
Palette anchors: amber, cream, walnut brown

Naming 3-5 colors helps keep the palette stable across shots.
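
As a small sketch, the lighting line and its palette anchors can be generated from one place so that every shot prompt carries the same wording. `lighting_block` is a hypothetical helper; it enforces the 3-5 color guideline above.

```python
def lighting_block(light, palette):
    """Format the lighting line plus 3-5 named palette anchors."""
    if not 3 <= len(palette) <= 5:
        raise ValueError("name between 3 and 5 palette colors")
    return (
        f"Lighting + palette: {light}\n"
        f"Palette anchors: {', '.join(palette)}"
    )


block = lighting_block(
    "soft window light with warm lamp fill, cool rim from hallway",
    ["amber", "cream", "walnut brown"],
)
print(block)
```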

Use Image Input for More Control

For even more fine-grained control over the composition and style of a shot, you can use an image input as a visual reference.

What Image Input Can Lock

Character design
Wardrobe
Set dressing
Overall aesthetic

The model uses the image as an anchor for the first frame, while your text prompt defines what happens next.

How to Use It

Include an image file as the input_reference parameter in your POST /videos request.

Requirements:

The image must match the target video’s resolution (size)
Supported file formats: image/jpeg, image/png, and image/webp
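
Both requirements can be checked locally before uploading. The sketch below assumes you already know the image's pixel dimensions and MIME type (for example, from your image tool); `check_input_reference` is a hypothetical helper.

```python
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}


def check_input_reference(image_width, image_height, mime_type, size):
    """Return a list of problems with a reference image for a given video size."""
    problems = []
    # The size parameter is written as "WIDTHxHEIGHT", e.g. "1280x720".
    target_width, target_height = (int(part) for part in size.split("x"))
    if (image_width, image_height) != (target_width, target_height):
        problems.append(f"image is {image_width}x{image_height}, but size is {size}")
    if mime_type not in SUPPORTED_MIME_TYPES:
        problems.append(f"unsupported format: {mime_type}")
    return problems


print(check_input_reference(1280, 720, "image/png", "1280x720"))
print(check_input_reference(1024, 1024, "image/gif", "1280x720"))
```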

Example Comparison

Each input image was generated with GPT Image, and each video was generated by Sora 2 from that image plus the prompt shown:

Prompt: “She turns around and smiles, then slowly walks out of the frame.”
Prompt: “The fridge door opens. A cute, chubby purple monster comes out of it.”

Experimentation Tip

If you don’t already have visual references, OpenAI’s image generation model is a powerful way to create them. You can:

Quickly produce environments and scene designs
Pass them into Sora as references
Test aesthetics and generate beautiful starting points for your videos

Dialogue and Audio

Dialogue Writing Guidelines

Dialogue must be described directly in your prompt. Place it in a separate block below your prose description so the model clearly distinguishes visual description from spoken lines.

Dialogue Writing Points

Keep lines concise and natural - Avoid long, complex speeches
Limit exchanges - Try to limit to a handful of sentences to match your clip length
Label speakers consistently - For multi-character scenes, use alternating turns
Consider duration matching:
- 4-second shot - Usually accommodates one or two short exchanges
- 8-second clip - Can support a few more

Long, complex speeches are unlikely to sync well and may break pacing.
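
The sketch below formats speaker-labeled lines as a separate dialogue block and gives a rough check of whether they fit the clip. The speaking rate of 2.5 words per second is an assumption chosen for illustration, not a documented figure, and the helper names are hypothetical. The sample lines are borrowed from the interrogation example in this guide.

```python
def dialogue_block(lines):
    """Format (speaker, text) pairs as a labeled Dialogue block."""
    rows = [f'- {speaker}: "{text}"' for speaker, text in lines]
    return "Dialogue:\n" + "\n".join(rows)


def word_count(lines):
    """Count spoken words, a rough proxy for how long the dialogue runs."""
    return sum(len(text.split()) for _, text in lines)


def fits_clip(lines, clip_seconds, words_per_second=2.5):
    """Assumed heuristic: compare the word count with a speaking-rate budget."""
    return word_count(lines) <= clip_seconds * words_per_second


lines = [
    ("Detective", "You're lying. I can hear it in your silence."),
    ("Suspect", "Or maybe I'm just tired of talking."),
]
print(dialogue_block(lines))
print(word_count(lines), fits_clip(lines, clip_seconds=8))
```

Treat the result as a prompt to trim, not a guarantee: delivery, pauses, and pacing all change how long a line takes.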

Prompt Example with Dialogue

A cramped, windowless room with walls the color of old ash. 
A single bare bulb dangles from the ceiling, its light pooling onto 
the scarred metal table at the center. Two chairs face each other across it. 
On one side sits the Detective, trench coat draped across the back of his chair, 
eyes sharp and unblinking. Across from him, the Suspect slouches, 
cigarette smoke curling lazily toward the ceiling. 
The silence presses in, broken only by the faint hum of the overhead light.

Dialogue:
- Detective: "You're lying. I can hear it in your silence."
- Suspect: "Or maybe I'm just tired of talking."
- Detective: "Either way, you'll talk before the night's over."

Silent Shot Audio Cues

If your shot is silent, you can still suggest pacing with one small sound, such as:

“distant traffic hiss”
“a crisp snap”

Think of it as a rhythm cue rather than a full soundtrack.

Background Sound Description Example

The hum of espresso machines and the murmur of voices form the background.

Iterate with Remix Functionality

Remix is for nudging, not gambling.

Remix Best Practices

Make controlled changes—one at a time
Say what you’re changing:
- “same shot, switch to 85mm”
- “same lighting, new palette: teal, sand, rust”

Strategy When Close to Target

When a result is close:

Pin it as a reference
Describe only the tweak

That way, everything that already works stays locked.
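
A remix instruction of this kind has two parts: what stays and the single tweak. As a minimal sketch, the hypothetical helper below builds that string so each remix names exactly one change.

```python
def remix_prompt(keep, change):
    """Build a one-change remix instruction: name what stays, then the single tweak."""
    if not keep.strip() or not change.strip():
        raise ValueError("name what stays the same and exactly one change")
    return f"same {keep.strip()}, {change.strip()}"


print(remix_prompt("shot", "switch to 85mm"))
print(remix_prompt("lighting", "new palette: teal, sand, rust"))
```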

Handling Problem Shots

If a shot keeps misfiring:

Strip it back - Freeze the camera, simplify the action, clear the background
Verify - Confirm that the simplified shot works
Iterate - Once it works, layer additional complexity step by step

Remix Examples

Each remix was generated from an original video plus the prompt shown:

Prompt: “Change the color of the monster to orange”
Prompt: “A second monster comes out right after”

Prompt Templates and Examples

Standard Prompt Structure

One effective way to write prompts is to separate different kinds of information. This is not a one-size-fits-all recipe, but it gives you a clear framework and makes it easier to be consistent.

Not every detail needs to be included - If something doesn’t matter for the shot, you can leave it out.

In fact, leaving certain elements open-ended will encourage the model to be more creative. The less tightly you specify every visual choice, the more room the model has to interpret and surprise you with unexpected but often beautiful variations.

Descriptive Level Trade-off

Highly descriptive prompts → More consistent, controlled results
Lighter prompts → Unlock diverse outcomes that feel fresh and imaginative

Universal Template

[Prose scene description in plain language. Describe characters, costumes, scenery, 
weather and other details. Be as descriptive as you need to be to generate a video that matches your vision.]

Cinematography:
Camera shot: [framing and angle, e.g. wide establishing shot, eye level]
Depth of field: [shallow/deep]
Lens/style cues: [e.g. anamorphic lens, handheld]
Mood: [overall tone, e.g. cinematic and tense, playful and suspenseful, luxurious anticipation]

Actions:
- [Action 1: a clear, specific beat or gesture]
- [Action 2: another distinct beat within the clip]
- [Action 3: another action or dialogue line]

Dialogue:
[If the shot has dialogue, add short natural lines here or as part of the actions list. 
Keep them brief so they match the clip length.]
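
If you generate many prompts, the template can be filled in programmatically. The sketch below assembles the same sections into one string. `build_prompt` and its argument names are hypothetical, and the sample values are only illustrative.

```python
def build_prompt(scene, camera_shot, depth_of_field, lens_style, mood,
                 actions, dialogue=None):
    """Assemble the template sections into one prompt string."""
    parts = [
        scene.strip(),
        "",
        "Cinematography:",
        f"Camera shot: {camera_shot}",
        f"Depth of field: {depth_of_field}",
        f"Lens/style cues: {lens_style}",
        f"Mood: {mood}",
        "",
        "Actions:",
    ]
    parts.extend(f"- {action}" for action in actions)
    if dialogue:  # the Dialogue section is optional, as in the template
        parts.extend(["", "Dialogue:"])
        parts.extend(f'- {speaker}: "{line}"' for speaker, line in dialogue)
    return "\n".join(parts)


prompt = build_prompt(
    scene="A small round robot sits on a wooden bench in a cluttered workshop.",
    camera_shot="medium close-up, slow push-in",
    depth_of_field="shallow",
    lens_style="35mm lens",
    mood="gentle, whimsical",
    actions=["The robot taps the bulb; sparks crackle", "It flinches, eyes widening"],
    dialogue=[("Robot", "Almost lost it… but I got it!")],
)
print(prompt)
```

Leaving an argument short or generic keeps that choice open for the model, which matches the trade-off described above.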

Complete Examples

Example 1: Robot Workshop Scene

Style: Hand-painted 2D/3D hybrid animation with soft brush textures, 
warm tungsten lighting, and a tactile, stop-motion feel. The aesthetic evokes 
mid-2000s storybook animation—cozy, imperfect, full of mechanical charm. 
Subtle watercolor wash and painterly textures; warm-cool balance in grade; 
filmic motion blur for animated realism.

Inside a cluttered workshop, shelves overflow with gears, bolts, and yellowing blueprints. 
At the center, a small round robot sits on a wooden bench, its dented body patched with 
mismatched plates and old paint layers. Its large glowing eyes flicker pale blue as it 
fiddles nervously with a humming light bulb. The air hums with quiet mechanical whirs, 
rain patters on the window, and the clock ticks steadily in the background.

Cinematography:
Camera: medium close-up, slow push-in with gentle parallax from hanging tools
Lens: 35mm virtual lens; shallow depth of field to soften background clutter
Lighting: warm key from overhead practical; cool spill from window for contrast
Mood: gentle, whimsical, a touch of suspense

Actions:
- The robot taps the bulb; sparks crackle
- It flinches, dropping the bulb, eyes widening
- The bulb tumbles in slow motion; it catches it just in time
- A puff of steam escapes its chest—relief and pride
- Robot says quietly: "Almost lost it… but I got it!"

Background Sound:
Rain, ticking clock, soft mechanical hum, faint bulb sizzle.

Example 2: Rooftop Romance Scene

Style: 1970s romantic drama, shot on 35mm film with natural flares, soft focus, 
and warm halation. Slight gate weave and handheld micro-shake evoke vintage intimacy. 
Warm Kodak-inspired grade; light halation on bulbs; film grain and soft vignette 
for period authenticity.

At golden hour, a brick tenement rooftop transforms into a small stage. 
Laundry lines strung with white sheets sway in the wind, catching the last rays of sunlight. 
Strings of mismatched fairy bulbs hum faintly overhead. A young woman in a flowing red silk 
dress dances barefoot, curls glowing in the fading light. Her partner—sleeves rolled, 
suspenders loose—claps along, his smile wide and unguarded. Below, the city hums with 
car horns, subway tremors, and distant laughter.

Cinematography:
Camera: medium-wide shot, slow dolly-in from eye level
Lens: 40mm spherical; shallow focus to isolate the couple from skyline
Lighting: golden natural key with tungsten bounce; edge from fairy bulbs
Mood: nostalgic, tender, cinematic

Actions:
- She spins; her dress flares, catching sunlight
- Woman (laughing): "See? Even the city dances with us tonight."
- He steps in, catches her hand, and dips her into shadow
- Man (smiling): "Only because you lead."
- Sheets drift across frame, briefly veiling the skyline before parting again

Background Sound:
Natural ambience only: faint wind, fabric flutter, street noise, muffled music. 
No added score.

Troubleshooting Common Issues

Results Too Random?

Solution: Add more framing, depth of field, and lighting anchor descriptions

Motion Unreadable?

Solution: Reduce the shot to “one camera move + one action”

Edits Don’t Flow?

Solution: Keep the lighting logic and color palette consistent across clips

Character Inconsistent?

Solution: Reuse the same identity descriptions and phrasing

Summary and Best Practices

Core Takeaways

API Parameters First - model, size, seconds must be explicitly set
Concise vs. Detailed - Balance control and creative space based on needs
One Shot, One Thing - One camera move + one subject action
Visual Anchors - Use specific, visible descriptions instead of vague words
Lighting Consistency - Maintain stable lighting logic across shots
Iterate to Optimize - Use Remix for fine-tuning, not regeneration

Recommended Workflow

Clarify Goals → What should this shot achieve?
Set Parameters → Choose appropriate model, size, seconds
Write Initial Prompt → Start concise, or use template
Generate and Evaluate → Review multiple variants, select closest
Refine with Remix → Make targeted adjustments to the selected version
Edit and Integrate → Combine the clips you are satisfied with into your project

Reference Resources

Official Guide: OpenAI Cookbook - Sora 2 Prompting Guide
API Documentation: Sora API Reference
Image Generation: GPT Image Generation
Prompt Library: Visit Sora2 Prompt Library for curated prompts and creative inspiration
Community Discussion: Visit SoratoAI Community to exchange experiences with other creators

Copyright Notice: This guide is based on official OpenAI documentation (by Robin Koenig & Joanne Shin), localized and adapted for practical use by the SoratoAI community.
