Sora 2 Prompting Guide
Clear direction helps Sora 2 understand your creative intent. This guide explains how to control each aspect of video generation through prompts.
This guide is based on the “Sora 2 Prompting Guide” from OpenAI Cookbook (published October 6, 2025).
Original authors: Robin Koenig (OpenAI), Joanne Shin (OpenAI)
English version organized by the SoratoAI community, with added practical insights.
Before You Prompt
Think of Your Prompt as a Creative Wish List, Not a Contract
Think of prompting like briefing a cinematographer who has never seen your storyboard. If you leave out details, they'll improvise, and you may not get what you envisioned. By being specific about what the “shot” should achieve, you gain more control and consistency over the result.
But leaving some details open can be just as powerful. Giving the model more creative freedom can lead to surprising variations and unexpected, beautiful interpretations. Both approaches are valid:
- Detailed prompts → Give you control and consistency
- Lighter prompts → Open space for creative outcomes
The right balance depends on your goals and the result you’re aiming for.
Embrace Variety and Be Prepared to Iterate
As with ChatGPT, using the same prompt multiple times will lead to different results; this is a feature, not a bug. Each generation is a fresh take, and sometimes the second or third option is better.
Small changes to camera, lighting, or action can shift the outcome dramatically. Collaborate with the model: you provide direction, and the model delivers creative variations.
This isn’t an exact science—think of the guidance below as helpful suggestions learned from working with the model, not strict rules.
API Parameters
The prompt controls the content of the video, but certain attributes are governed only by API parameters and cannot be requested in prose:
Core Parameters
Parameter | Description | Options |
---|---|---|
model | Model version | sora-2 or sora-2-pro |
size | Video resolution | See table below |
seconds | Clip length | 4, 8, 12 (default: 4) |
Supported Resolutions
sora-2 model
- 1280x720 (landscape 720p)
- 720x1280 (portrait 720p)
sora-2-pro model
- 1280x720 (landscape 720p)
- 720x1280 (portrait 720p)
- 1024x1792 (portrait HD)
- 1792x1024 (landscape HD)
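Because these values cannot be requested in prose, it can help to check them locally before sending a request. The sketch below is hypothetical (build_video_params is not part of any SDK): it encodes the options from the tables above and returns a plain dictionary. Passing seconds as a string is an assumption about the request format; confirm it against the current Sora API reference.

```python
# Options copied from the tables above; verify against the current API reference.
ALLOWED_SECONDS = (4, 8, 12)
SIZES_BY_MODEL = {
    "sora-2": {"1280x720", "720x1280"},
    "sora-2-pro": {"1280x720", "720x1280", "1024x1792", "1792x1024"},
}


def build_video_params(model: str, size: str, seconds: int = 4) -> dict:
    """Hypothetical helper: validate model/size/seconds before building a request."""
    if model not in SIZES_BY_MODEL:
        raise ValueError(f"unknown model: {model!r}")
    if size not in SIZES_BY_MODEL[model]:
        raise ValueError(f"{size} is not a supported size for {model}")
    if seconds not in ALLOWED_SECONDS:
        raise ValueError(f"seconds must be one of {ALLOWED_SECONDS}")
    # Assumption: the API expects seconds as a string such as "8".
    return {"model": model, "size": size, "seconds": str(seconds)}


print(build_video_params("sora-2-pro", "1792x1024", seconds=8))
# {'model': 'sora-2-pro', 'size': '1792x1024', 'seconds': '8'}
```

Asking sora-2 for 1792x1024 raises a ValueError, since that size is listed only for sora-2-pro.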
Video Resolution Impact
Video resolution directly influences visual fidelity and motion consistency in Sora:
- Higher resolutions generate detail, texture, and lighting transitions more accurately
- Lower resolutions compress visual information, often introducing softness or artifacts
Video Length Best Practices
The model generally follows instructions more reliably in shorter clips. For best results:
- Aim for concise shots
- If your project allows, you may see better results by stitching together two 4-second clips in editing instead of generating a single 8-second clip
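If you take the stitching route, it helps to plan the sequence as back-to-back clips of one supported length. A minimal sketch: plan_clips is a hypothetical helper that only computes start and end offsets for the edit; it does not call the API.

```python
def plan_clips(total_seconds: int, clip_seconds: int = 4) -> list:
    """Split a sequence into consecutive (start, end) clip windows, in seconds."""
    if clip_seconds not in (4, 8, 12):
        raise ValueError("clip_seconds must be 4, 8, or 12")
    if total_seconds <= 0 or total_seconds % clip_seconds != 0:
        raise ValueError("total_seconds must be a positive multiple of clip_seconds")
    return [(start, start + clip_seconds) for start in range(0, total_seconds, clip_seconds)]


# Two 4-second clips instead of one 8-second clip.
print(plan_clips(8))  # [(0, 4), (4, 8)]
```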
Effective Prompt Anatomy
A clear prompt describes a shot as if you were sketching it onto a storyboard:
- State the camera framing - Specify camera angle and composition
- Note depth of field - Define focus and background blur
- Describe the action in beats - Break movement into small, concrete steps
- Set the lighting and palette - Define light sources, direction, and color tone
Anchor your subject with a few distinctive details to keep it recognizable, while a single, plausible action makes the shot easier to follow.
Single Shot vs. Multi-Shot
Single Shot Description
A clear shot unit contains:
- One camera setup
- One subject action
- One lighting recipe
Multi-Shot Sequences
Describing multiple shots in a single prompt is valid if you need to cover a sequence. When you do this, keep each shot block distinct. This gives you flexibility to:
- Generate short standalone clips for editing
- Or let them play out as a sequence in one go
Treat each shot as a creative unit.
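One way to keep shot blocks distinct is to hold each shot as a small record with exactly one camera setup, one subject action, and one lighting recipe, then render each as its own labeled block. The Shot class and the "Shot N:" labels are a hypothetical convention for illustration, not a required syntax.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    camera: str    # one camera setup
    action: str    # one subject action
    lighting: str  # one lighting recipe


def render_sequence(shots: list) -> str:
    """Render each shot as a separate, numbered block divided by blank lines."""
    blocks = [
        f"Shot {n}:\nCamera: {s.camera}\nAction: {s.action}\nLighting: {s.lighting}"
        for n, s in enumerate(shots, start=1)
    ]
    return "\n\n".join(blocks)


print(render_sequence([
    Shot("wide establishing shot, eye level", "cyclist pedals toward the crosswalk", "wet asphalt under neon"),
    Shot("medium close-up, slight angle from behind", "cyclist brakes and stops", "same neon reflections"),
]))
```

Because each block stands alone, you can also paste a single block into its own request to get a standalone clip for editing.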
Prompt Length Trade-offs
Shorter Prompts ✨
- Give the model more creative freedom
- Expect surprising results
Longer Prompts ⚙️
- Restrict the model’s creativity
- It will try to follow your guidance, but might not always do so reliably
Short Prompt Example
In a 90s documentary-style interview, an old Swedish man sits in a study and says,
"I still remember when I was young."
Why This Prompt Works:
- “90s documentary” - Sets the style of the video. The model will choose variables like camera lens, lighting, and color grade accordingly.
- “an old Swedish man sits in a study” - Describes subject and setting in minor detail, letting the model take creative liberties.
- “says, "I still remember when I was young."” - Describes the dialogue. Sora will likely be able to follow this exactly.
Important Note: This prompt will reliably produce videos that match these requirements. However, the result may not match your vision exactly, because many details are left open:
- Time of day, weather
- Outfits, tone
- Look and age of the character
- Camera angles, cuts
- Set design
- Many other factors
Unless you describe these details, Sora will make them up.
Going Ultra-Detailed (Cinematic Level)
For complex, cinematic shots, you can go beyond the standard prompt structure and specify the look, camera setup, grading, soundscape, and even shot rationale in professional production terms.
This is similar to how a director briefs a camera crew or VFX team. Detailed cues for lensing, filtration, lighting, grading, and motion help the model lock onto a very specific aesthetic.
You might describe:
- What the viewer notices first
- Camera platform and lens
- Lighting direction
- Color palette
- Texture qualities
- Diegetic sound
- Shot timing
Cinematic Prompt Example
Format & Look
Duration 4s; 180° shutter; digital capture emulating 65mm photochemical contrast;
fine grain; subtle halation on speculars; no gate weave.
Lenses & Filtration
32mm/50mm spherical primes; Black Pro-Mist 1/4;
slight CPL rotation to manage glass reflections on train windows.
Grade / Palette
Highlights: clean morning sunlight with amber lift
Mids: balanced neutrals with slight teal cast in shadows
Blacks: soft, neutral with mild lift for haze retention
Lighting & Atmosphere
Natural sunlight from camera left, low angle (07:30AM)
Bounce: 4×4 ultrabounce silver from trackside
Negative fill from opposite wall
Practical: sodium platform lights on dim fade
Atmos: gentle mist; train exhaust drift through light beam
Location & Framing
Urban commuter platform, dawn
Foreground: yellow safety line, coffee cup on bench
Midground: waiting passengers silhouetted in haze
Background: arriving train braking to a stop
Avoid signage or corporate branding
Wardrobe / Props / Extras
Main subject: mid-30s traveler, navy coat, backpack slung on one shoulder,
holding phone loosely at side
Extras: commuters in muted tones; one cyclist pushing bike
Props: paper coffee cup, rolling luggage, LED departure board (generic destinations)
Sound
Diegetic only: faint rail screech, train brakes hiss,
distant announcement muffled (-20LUFS), low ambient hum
Footsteps and paper rustle; no score or added foley
Optimized Shot List (2 shots / 4s total)
0.00–2.40 — "Arrival Drift" (32mm, shoulder-mounted slow dolly left)
Camera slides past platform signage edge; shallow focus reveals traveler mid-frame
looking down tracks. Morning light blooms across lens; train headlights flare softly
through mist. Purpose: establish setting and tone, hint anticipation.
2.40–4.00 — "Turn and Pause" (50mm, slow arc in)
Cut to tighter over-shoulder arc as train halts; traveler turns slightly toward camera,
catching sunlight rim across cheek and phone screen reflection. Eyes flick up toward
something unseen. Purpose: create human focal moment with minimal motion.
Camera Notes (Why It Reads)
Keep eyeline low and close to lens axis for intimacy
Allow micro flares from train glass as aesthetic texture
Preserve subtle handheld imperfection for realism
Do not break silhouette clarity with overexposed flare; retain skin highlight roll-off
Finishing
Fine-grain overlay with mild chroma noise for realism; restrained halation on practicals;
warm-cool LUT for morning split tone
Mix: prioritize train and ambient detail over footstep transients
Poster frame: traveler mid-turn, golden rim light, arriving train soft-focus in background haze
This approach works well when you want to match real cinematography styles (e.g., IMAX aerials, 35mm handheld, vintage 16mm documentary) or maintain strict continuity across shots.
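When a shot list uses timecodes, as in the example above, a quick arithmetic check confirms that the shots run back to back and fill the clip (0.00-2.40 plus 2.40-4.00 covers 4s). The helper is a hypothetical sketch; it only checks the numbers you wrote, not what the model renders.

```python
def check_shot_timings(segments: list, duration: float, tol: float = 1e-6) -> None:
    """Raise ValueError unless (start, end) segments tile 0..duration with no gaps or overlaps."""
    expected_start = 0.0
    for start, end in segments:
        if abs(start - expected_start) > tol:
            raise ValueError(f"gap or overlap at {start:.2f}s (expected {expected_start:.2f}s)")
        if end <= start:
            raise ValueError(f"segment {start:.2f}-{end:.2f}s has no duration")
        expected_start = end
    if abs(expected_start - duration) > tol:
        raise ValueError(f"shots end at {expected_start:.2f}s but the clip is {duration}s")


# The two-shot list above: "Arrival Drift" then "Turn and Pause".
check_shot_timings([(0.00, 2.40), (2.40, 4.00)], duration=4)  # passes silently
```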
Visual Cues That Steer the Look
When writing prompts, style is one of the most powerful levers for guiding the model toward your desired outcome. Describe the overall aesthetic, for example:
- “1970s film”
- “Epic, IMAX-scale scene”
- “16mm black-and-white film”
This sets a visual tone that frames all other choices. Establish this style early so the model can carry it through consistently.
How Style Affects Interpretation
The same details will read very differently depending on whether you call for:
- A polished Hollywood drama
- A handheld smartphone clip
- A grainy vintage commercial
Once the tone is set, layer in specifics with shot, action, and light.
Clarity Wins: Specific Over Vague
Use verbs and nouns that point to visible results, avoiding vague cues:
Weak Example ❌ | Strong Example ✅ |
---|---|
“A beautiful street at night” | “Wet asphalt, zebra crosswalk, neon signs reflecting in puddles” |
“Person moves quickly” | “Cyclist pedals three times, brakes, and stops at crosswalk” |
“Cinematic look” | “Anamorphic 2.0x lens, shallow DOF, volumetric light” |
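A rough way to catch weak phrasing before generating is to scan a draft for vague cues like the ones in the left column. This is a hypothetical keyword check seeded only with phrases from the table; extend the list with your own habits. It cannot judge whether the replacement wording is visual enough.

```python
import re

# Seeded from the "weak" column above; extend as needed.
VAGUE_CUES = ("beautiful", "moves quickly", "cinematic look")


def find_vague_cues(prompt: str) -> list:
    """Return the vague cues that appear in the prompt as whole words or phrases."""
    text = prompt.lower()
    return [cue for cue in VAGUE_CUES if re.search(r"\b" + re.escape(cue) + r"\b", text)]


print(find_vague_cues("A beautiful street at night"))  # ['beautiful']
print(find_vague_cues("Wet asphalt, zebra crosswalk, neon signs reflecting in puddles"))  # []
```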
Camera Direction and Framing
Camera direction and framing shape how a shot feels:
- Wide shot from above - Emphasizes space and context
- Close-up at eye level - Focuses attention on emotion
Depth of field adds another layer:
- Shallow focus - Makes subject stand out against blurred background
- Deep focus - Keeps both foreground and background sharp
Lighting sets tone just as strongly:
- Soft, warm key - Creates something inviting
- Single hard light with cool edges - Pushes toward drama
Weak vs. Strong Example
Weak:
Camera shot: cinematic look
Strong:
Camera shot: wide shot, low angle
Depth of field: shallow (sharp on subject, blurred background)
Lighting + palette: warm backlight with soft rim
Good Framing Instructions
- wide establishing shot, eye level
- wide shot, tracking left to right with the charge
- aerial wide shot, slight downward angle
- medium close-up shot, slight angle from behind
Good Camera Motion Instructions
- slowly tilting camera
- handheld ENG camera
Character Consistency Notes
When introducing characters, expect some unpredictability—small changes in phrasing can alter:
- Identity
- Pose
- Focus of the scene itself
Maintaining consistency:
- Keep descriptions consistent across shots
- Reuse phrasing for continuity
- Avoid mixing traits that may compete
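Reusing phrasing is easiest when the character description lives in one place and is pasted into every shot unchanged. A minimal sketch: the TRAVELER text is borrowed from the cinematic example above, and shot_prompt is a hypothetical helper.

```python
# One canonical description, reused verbatim in every shot.
TRAVELER = "mid-30s traveler, navy coat, backpack slung on one shoulder"


def shot_prompt(framing: str, character: str, action: str) -> str:
    """Combine framing, an unchanged character description, and one action."""
    return f"{framing}. Subject: {character}. Action: {action}."


prompts = [
    shot_prompt("Wide shot, eye level", TRAVELER, "looks down the tracks"),
    shot_prompt("Medium close-up, over the shoulder", TRAVELER, "turns slightly toward camera"),
]
for p in prompts:
    print(p)
```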
Control Motion and Timing
Movement is often the hardest part to get right, so keep it simple.
One Shot, One Thing
Each shot should have:
- One clear camera move
- One clear subject action
Describe Actions in Beats
Actions work best when described in beats or counts—small steps, gestures, or pauses—so they feel grounded in time.
Weak Example ❌:
Actor walks across the room.
Strong Example ✅:
Actor takes four steps to the window, pauses, and pulls the curtain in the final second.
The second example makes the timing precise and achievable.
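To make beats concrete, you can give each beat a relative weight and scale the weights to the clip length, producing timecodes similar to the shot list in the cinematic example. This is a hypothetical sketch: the weights are your own estimate of how long each beat takes, and the model may not hit the timestamps exactly.

```python
def time_beats(beats: list, clip_seconds: float) -> list:
    """Scale (description, weight) beats to the clip length and label each with its time range."""
    total = sum(weight for _, weight in beats)
    lines, start = [], 0.0
    for text, weight in beats:
        end = start + clip_seconds * weight / total
        lines.append(f"{start:.1f}-{end:.1f}s: {text}")
        start = end
    return lines


for line in time_beats(
    [("actor takes four steps to the window", 2), ("pauses", 1), ("pulls the curtain", 1)],
    clip_seconds=4,
):
    print(line)
# 0.0-2.0s: actor takes four steps to the window
# 2.0-3.0s: pauses
# 3.0-4.0s: pulls the curtain
```

With these weights, the curtain pull lands in the final second, matching the strong example above.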
Lighting and Color Consistency
Light determines mood as much as action or setting.
Light Quality Impact
- Diffuse light across the frame - Feels calm and neutral
- Single strong source - Creates sharp contrast and tension
Key to Seamless Editing
When you want to cut multiple clips together, keeping lighting logic consistent is what makes the edit seamless.
Best Practices for Describing Light
Describe both the quality of the light and the color anchors that reinforce it.
Weak Example ❌:
Lighting + palette: brightly lit room
Strong Example ✅:
Lighting + palette: soft window light with warm lamp fill, cool rim from hallway
Palette anchors: amber, cream, walnut brown
Naming 3-5 colors helps keep the palette stable across shots.
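Since consistent lighting logic is what makes cuts seamless, one option is to write the lighting and palette lines once and paste the same block into every clip's prompt. The helper below is hypothetical; its 3-5 color check simply encodes the rule of thumb above.

```python
def lighting_block(lighting: str, palette: list) -> str:
    """Build the shared lighting + palette lines, enforcing 3-5 palette anchors."""
    if not 3 <= len(palette) <= 5:
        raise ValueError("name 3-5 palette anchors")
    return f"Lighting + palette: {lighting}\nPalette anchors: {', '.join(palette)}"


# Reuse this exact string in every clip that will be cut together.
SHARED_LIGHT = lighting_block(
    "soft window light with warm lamp fill, cool rim from hallway",
    ["amber", "cream", "walnut brown"],
)
print(SHARED_LIGHT)
```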
Use Image Input for More Control
For even more fine-grained control over the composition and style of a shot, you can use an image input as a visual reference.
What Image Input Can Lock
- Character design
- Wardrobe
- Set dressing
- Overall aesthetic
The model uses the image as an anchor for the first frame, while your text prompt defines what happens next.
How to Use It
Include an image file as the input_reference parameter in your POST /videos request.
Requirements:
- The image must match the target video’s resolution (size)
- Supported file formats: image/jpeg, image/png, and image/webp
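Before uploading a reference, you can check the format and, for PNG files, the pixel dimensions using only the standard library. This is a hypothetical pre-flight check: the extension map mirrors the formats listed above, and JPEG and WebP dimensions are not parsed here (an imaging library such as Pillow can read them).

```python
import os
import struct

SUPPORTED_TYPES = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".png": "image/png", ".webp": "image/webp",
}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple:
    """Read (width, height) from the IHDR chunk that follows the PNG signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def check_reference(filename: str, data: bytes, size: str) -> str:
    """Return the MIME type if the image looks usable as input_reference for this size."""
    mime = SUPPORTED_TYPES.get(os.path.splitext(filename)[1].lower())
    if mime is None:
        raise ValueError("use a JPEG, PNG, or WebP image")
    if mime == "image/png":
        expected = tuple(int(v) for v in size.split("x"))
        actual = png_dimensions(data)
        if actual != expected:
            raise ValueError(f"image is {actual[0]}x{actual[1]}, video size is {size}")
    return mime
```

Call it with the raw file bytes and the same size string you pass to the API, for example check_reference("reference.png", data, "1280x720").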
Example Comparison
Input images were generated with GPT Image; the resulting videos were generated with Sora 2. Prompts used:
- “She turns around and smiles, then slowly walks out of the frame.”
- “The fridge door opens. A cute, chubby purple monster comes out of it.”
Experimentation Tip
If you don’t already have visual references, OpenAI’s image generation model is a powerful way to create them. You can:
- Quickly produce environments and scene designs
- Pass them into Sora as references
- Test aesthetics and generate beautiful starting points for your videos
Dialogue and Audio
Dialogue Writing Guidelines
Dialogue must be described directly in your prompt. Place it in a separate block below your prose description so the model clearly distinguishes visual description from spoken lines.
Dialogue Writing Points
- Keep lines concise and natural - Avoid long, complex speeches
- Limit exchanges - Try to limit to a handful of sentences to match your clip length
- Label speakers consistently - For multi-character scenes, use alternating turns
- Consider duration matching:
- 4-second shot - Usually accommodates one or two short exchanges
- 8-second clip - Can support a few more
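A rough way to apply duration matching is to count words against the clip length and then emit the lines as a separate Dialogue block below the prose. The speaking rate of 2.5 words per second is an assumed conversational pace, not a documented Sora limit, and dialogue_block is a hypothetical helper.

```python
WORDS_PER_SECOND = 2.5  # assumption: typical conversational pace


def dialogue_block(lines: list, clip_seconds: int) -> str:
    """Format (speaker, line) pairs as a Dialogue block if they plausibly fit the clip."""
    words = sum(len(text.split()) for _, text in lines)
    budget = int(clip_seconds * WORDS_PER_SECOND)
    if words > budget:
        raise ValueError(f"{words} words may not fit a {clip_seconds}s clip (budget ~{budget})")
    body = "\n".join(f'- {speaker}: "{text}"' for speaker, text in lines)
    return "Dialogue:\n" + body


INTERROGATION = [
    ("Detective", "You're lying. I can hear it in your silence."),
    ("Suspect", "Or maybe I'm just tired of talking."),
    ("Detective", "Either way, you'll talk before the night's over."),
]
print(dialogue_block(INTERROGATION, clip_seconds=12))
```

With these 24 words, the sketch accepts a 12-second clip and rejects a 4-second one; tune the rate to your speakers.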
Prompt Example with Dialogue
A cramped, windowless room with walls the color of old ash.
A single bare bulb dangles from the ceiling, its light pooling onto
the scarred metal table at the center. Two chairs face each other across it.
On one side sits the Detective, trench coat draped across the back of his chair,
eyes sharp and unblinking. Across from him, the Suspect slouches,
cigarette smoke curling lazily toward the ceiling.
The silence presses in, broken only by the faint hum of the overhead light.
Dialogue:
- Detective: "You're lying. I can hear it in your silence."
- Suspect: "Or maybe I'm just tired of talking."
- Detective: "Either way, you'll talk before the night's over."
Silent Shot Audio Cues
If your shot is silent, you can still suggest pacing with one small sound, such as:
- “distant traffic hiss”
- “a crisp snap”
Think of it as a rhythm cue rather than a full soundtrack.
Background Sound Description Example
The hum of espresso machines and the murmur of voices form the background.
Iterate with Remix Functionality
Remix is for nudging, not gambling.
Remix Best Practices
- Make controlled changes—one at a time
- Say what you’re changing:
- “same shot, switch to 85mm”
- “same lighting, new palette: teal, sand, rust”
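Phrasing each remix as "same <what stays>, <the one change>" can be made mechanical. A hypothetical helper that refuses anything other than exactly one change; the returned string is simply the text you would submit as the remix prompt.

```python
def remix_prompt(keep: str, *changes: str) -> str:
    """Build a remix instruction that locks one aspect and changes exactly one thing."""
    if len(changes) != 1:
        raise ValueError("make exactly one change per remix")
    return f"same {keep}, {changes[0]}"


print(remix_prompt("shot", "switch to 85mm"))  # same shot, switch to 85mm
print(remix_prompt("lighting", "new palette: teal, sand, rust"))  # same lighting, new palette: teal, sand, rust
```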
Strategy When Close to Target
When a result is close:
- Pin it as a reference
- Describe only the tweak
That way, everything that already works stays locked.
Handling Problem Shots
If a shot keeps misfiring:
- Strip it back - Freeze the camera, simplify the action, clear the background
- Verify - Confirm the simplified shot works before adding anything back
- Iterate - Layer additional complexity step by step
Remix Examples
Remix prompts applied to an original video:
- “Change the color of the monster to orange”
- “A second monster comes out right after”
Prompt Templates and Examples
Standard Prompt Structure
One effective way to write prompts is to separate different kinds of information. This is not a one-size-fits-all recipe, but it gives you a clear framework and makes it easier to be consistent.
Not every detail needs to be included - If something doesn’t matter for the shot, you can leave it out.
In fact, leaving certain elements open-ended will encourage the model to be more creative. The less tightly you specify every visual choice, the more room the model has to interpret and surprise you with unexpected but often beautiful variations.
Descriptive Level Trade-off
- Highly descriptive prompts → More consistent, controlled results
- Lighter prompts → Unlock diverse outcomes that feel fresh and imaginative
Universal Template
[Prose scene description in plain language. Describe characters, costumes, scenery,
weather and other details. Be as descriptive as you need to generate a video that matches your vision.]
Cinematography:
Camera shot: [framing and angle, e.g. wide establishing shot, eye level]
Depth of field: [shallow/deep]
Lens/style cues: [e.g. anamorphic lens, handheld]
Mood: [overall tone, e.g. cinematic and tense, playful and suspenseful, luxurious anticipation]
Actions:
- [Action 1: a clear, specific beat or gesture]
- [Action 2: another distinct beat within the clip]
- [Action 3: another action or dialogue line]
Dialogue:
[If the shot has dialogue, add short natural lines here or as part of the actions list.
Keep them brief so they match the clip length.]
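The template can also be filled programmatically, which makes it easy to leave sections out when they do not matter for the shot. A minimal sketch: render_prompt is hypothetical, and the blank lines between sections are a formatting choice rather than a requirement.

```python
from typing import Optional


def render_prompt(scene: str,
                  cinematography: Optional[dict] = None,
                  actions: Optional[list] = None,
                  dialogue: Optional[list] = None) -> str:
    """Fill the universal template, omitting any section that is empty."""
    sections = [scene.strip()]
    if cinematography:
        sections.append("Cinematography:\n" + "\n".join(f"{k}: {v}" for k, v in cinematography.items()))
    if actions:
        sections.append("Actions:\n" + "\n".join(f"- {a}" for a in actions))
    if dialogue:
        sections.append("Dialogue:\n" + "\n".join(dialogue))
    return "\n\n".join(sections)


print(render_prompt(
    "A small round robot sits on a wooden bench in a cluttered workshop.",
    cinematography={"Camera shot": "medium close-up, slow push-in", "Depth of field": "shallow"},
    actions=["The robot taps the bulb; sparks crackle", "It flinches, dropping the bulb"],
))
```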
Complete Examples
Example 1: Robot Workshop Scene
Style: Hand-painted 2D/3D hybrid animation with soft brush textures,
warm tungsten lighting, and a tactile, stop-motion feel. The aesthetic evokes
mid-2000s storybook animation—cozy, imperfect, full of mechanical charm.
Subtle watercolor wash and painterly textures; warm-cool balance in grade;
filmic motion blur for animated realism.
Inside a cluttered workshop, shelves overflow with gears, bolts, and yellowing blueprints.
At the center, a small round robot sits on a wooden bench, its dented body patched with
mismatched plates and old paint layers. Its large glowing eyes flicker pale blue as it
fiddles nervously with a humming light bulb. The air hums with quiet mechanical whirs,
rain patters on the window, and the clock ticks steadily in the background.
Cinematography:
Camera: medium close-up, slow push-in with gentle parallax from hanging tools
Lens: 35mm virtual lens; shallow depth of field to soften background clutter
Lighting: warm key from overhead practical; cool spill from window for contrast
Mood: gentle, whimsical, a touch of suspense
Actions:
- The robot taps the bulb; sparks crackle
- It flinches, dropping the bulb, eyes widening
- The bulb tumbles in slow motion; it catches it just in time
- A puff of steam escapes its chest—relief and pride
- Robot says quietly: "Almost lost it… but I got it!"
Background Sound:
Rain, ticking clock, soft mechanical hum, faint bulb sizzle.
Example 2: Rooftop Romance Scene
Style: 1970s romantic drama, shot on 35mm film with natural flares, soft focus,
and warm halation. Slight gate weave and handheld micro-shake evoke vintage intimacy.
Warm Kodak-inspired grade; light halation on bulbs; film grain and soft vignette
for period authenticity.
At golden hour, a brick tenement rooftop transforms into a small stage.
Laundry lines strung with white sheets sway in the wind, catching the last rays of sunlight.
Strings of mismatched fairy bulbs hum faintly overhead. A young woman in a flowing red silk
dress dances barefoot, curls glowing in the fading light. Her partner—sleeves rolled,
suspenders loose—claps along, his smile wide and unguarded. Below, the city hums with
car horns, subway tremors, and distant laughter.
Cinematography:
Camera: medium-wide shot, slow dolly-in from eye level
Lens: 40mm spherical; shallow focus to isolate the couple from skyline
Lighting: golden natural key with tungsten bounce; edge from fairy bulbs
Mood: nostalgic, tender, cinematic
Actions:
- She spins; her dress flares, catching sunlight
- Woman (laughing): "See? Even the city dances with us tonight."
- He steps in, catches her hand, and dips her into shadow
- Man (smiling): "Only because you lead."
- Sheets drift across frame, briefly veiling the skyline before parting again
Background Sound:
Natural ambience only: faint wind, fabric flutter, street noise, muffled music.
No added score.
Troubleshooting Common Issues
Results Too Random?
Solution: Add more framing, depth of field, and lighting anchor descriptions
Motion Unreadable?
Solution: Reduce each shot to “one camera move + one action”
Edits Don’t Flow?
Solution: Keep lighting logic and color palette consistent across clips
Character Inconsistent?
Solution: Reuse the same identity descriptions and phrasing
Summary and Best Practices
Core Takeaways
- API Parameters First - model, size, and seconds are set in the API request, not in the prompt
- Concise vs. Detailed - Balance control and creative space based on needs
- One Shot, One Thing - One camera move + one subject action
- Visual Anchors - Use specific, visible descriptions instead of vague words
- Lighting Consistency - Maintain stable lighting logic across shots
- Iterate to Optimize - Use Remix for fine-tuning, not regeneration
Recommended Workflow
- Clarify Goals → What should this shot achieve?
- Set Parameters → Choose appropriate model, size, seconds
- Write Initial Prompt → Start concise, or use template
- Generate and Evaluate → Review multiple variants, select closest
- Refine with Remix → Make targeted adjustments to the selected version
- Edit and Integrate → Combine satisfying clips into project
Reference Resources
- Official Guide: OpenAI Cookbook - Sora 2 Prompting Guide
- API Documentation: Sora API Reference
- Image Generation: GPT Image Generation
- Prompt Library: Visit Sora2 Prompt Library for curated prompts and creative inspiration
- Community Discussion: Visit SoratoAI Community to exchange experiences with other creators
Copyright Notice: This guide is based on official OpenAI documentation (by Robin Koenig & Joanne Shin), localized and practice-optimized by the SoratoAI community.