baz media
Manage media files in your project
baz media [options] [command]

| Option | Description |
|---|---|
| -h, --help | display help for command |

baz media upload
Upload an image, video, or audio file to the media library
baz media upload [options] <file>

| Option | Description |
|---|---|
| --name <name> | Custom name for the uploaded file |
| -h, --help | display help for command |

baz media list
List uploaded media files for the active project
baz media list [options]

| Option | Description |
|---|---|
| --type <type> | Filter by type: image, video, audio, logo |
| --limit <n> | Max items to show. Default: 50. |
| -h, --help | display help for command |

baz media compress
Generate lightweight preview renditions for all project video media (run after batch-uploading clips)
baz media compress [options]

| Option | Description |
|---|---|
| --info | Report master vs preview sizes without compressing anything |
| --fix-masters | Losslessly remux non-faststart masters in place first (fixes slow Lambda frame extraction on first export) |
| -h, --help | display help for command |

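The description above suggests running compress once after a batch of clips has been uploaded. A minimal Python sketch of that ordering follows; it only builds the argument lists, and the helper name batch_upload_then_compress is hypothetical rather than part of the CLI.

```python
from typing import Iterable, List


def batch_upload_then_compress(clip_paths: Iterable[str]) -> List[List[str]]:
    """Build one `baz media upload` command per clip, then a single compress pass."""
    commands = [["baz", "media", "upload", path] for path in clip_paths]
    # Compress runs once, after every upload, so previews cover all new clips.
    commands.append(["baz", "media", "compress"])
    return commands
```

Each returned list can be handed to subprocess.run(argv, check=True) so that a failed upload stops the run before compress is attempted.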
baz media transcribe-voiceover
Transcribe an uploaded audio asset with Whisper word timestamps and persist voiceover alignment metadata
baz media transcribe-voiceover [options] <assetId>

| Option | Description |
|---|---|
| -h, --help | display help for command |

baz media tts
Generate a voiceover audio asset directly in Bazaar
Alias: voiceover
baz media tts [options] <text>

| Option | Description |
|---|---|
| --provider <provider> | TTS provider: gemini, minimax, or elevenlabs. Default: gemini. |
| --voice-id <voiceId> | Provider voice ID. Gemini accepts Gemini voice names; ElevenLabs requires a connected voice. |
| --voice-style <style> | Gemini voice direction, e.g. warm documentary narrator, calm British presenter |
| --emotion <emotion> | MiniMax emotion hint, e.g. happy, sad, calm, excited |
| --speed <speed> | MiniMax speed: 0.5-2.0 |
| --pitch <pitch> | MiniMax pitch: -12 to 12 |
| --raw | Use text exactly as the spoken script; disables MiniMax AI voice interpretation |
| -h, --help | display help for command |

What it does:
Generates one voiceover/audio asset directly in Bazaar and attaches it to the active project.
Use this when you want reusable narration audio without going through the chat agent.
Pricing estimate:
--provider gemini $0.08 per started minute (Gemini 3.1 Flash TTS Preview)
--provider minimax $0.06 per started 1k characters (Minimax Speech)
--provider elevenlabs $0.44 per estimated minute, or $0.00/min with connected BYOK
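The "per started" wording suggests usage is rounded up to the next whole unit. The Python sketch below works through that arithmetic for the gemini and minimax estimates only. It assumes round-up billing and that characters are counted as Python string length; neither is confirmed by the text, and the function names are hypothetical. Live rates from baz balance --pricing take precedence over these constants.

```python
import math

# Estimate rates copied from the pricing list, in US cents to avoid float drift.
GEMINI_CENTS_PER_STARTED_MINUTE = 8
MINIMAX_CENTS_PER_STARTED_1K_CHARS = 6


def gemini_tts_estimate_cents(duration_seconds: float) -> int:
    """Assumes "per started minute" rounds the duration up to whole minutes."""
    if duration_seconds <= 0:
        return 0
    started_minutes = math.ceil(duration_seconds / 60)
    return started_minutes * GEMINI_CENTS_PER_STARTED_MINUTE


def minimax_tts_estimate_cents(script: str) -> int:
    """Assumes "per started 1k characters" rounds the length up to whole thousands."""
    if not script:
        return 0
    started_blocks = math.ceil(len(script) / 1000)
    return started_blocks * MINIMAX_CENTS_PER_STARTED_1K_CHARS
```

Under these assumptions a 61-second Gemini voiceover is estimated as two started minutes, and a 1,001-character MiniMax script as two started blocks.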
Provider notes:
gemini is the default and uses Google Gemini TTS voices through Bazaar.
gemini supports --voice-id plus --voice-style for delivery direction.
If --voice-style is omitted, Bazaar reuses the project's stored voice style or derives one from project context.
Supported Gemini voice IDs include Kore (default), Zephyr, Puck, Charon, Aoede, Leda, Orus, Fenrir.
minimax supports emotion, speed, pitch, and optional AI voice interpretation.
elevenlabs requires a connected ElevenLabs key/default voice or a --voice-id.
Examples:
baz media tts --provider gemini "Welcome to the listing. This home opens with warm natural light."
baz media tts --provider gemini --voice-id Zephyr --voice-style "warm, calm, real-estate documentary narrator" "A calm editorial narration for a finance explainer."
baz media tts --provider minimax --emotion happy --speed 1.05 "Three reasons this product launch matters."
Live pricing:
baz balance --pricing

baz media imagegen2
Generate an Image Gen 2 image directly in Bazaar (verbatim prompt, no agent rewriting)
baz media imagegen2 [options] <prompt>

| Option | Description |
|---|---|
| --provider <provider> | Image provider: openai, nanobanana, or flux. Default: openai. |
| --aspect-ratio <ratio> | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 21:9. Default: 1:1. |
| --format <fmt> | Output format: jpg, png, webp. Default: jpg. |
| --input-image <path> | Repeatable reference image input. Limits: openai=1, nanobanana=3, flux=8 |
| -h, --help | display help for command |

What it does:
Generates or edits one image directly through Bazaar Image Gen 2.
The prompt is sent verbatim; the autonomous video agent does not rewrite it.
The output is uploaded to Bazaar media and returned as a reusable URL/asset id.
Inputs:
prompt Required text prompt.
--input-image <path> Repeatable reference image input. Limits: openai=1, nanobanana=3, flux=8.
--aspect-ratio Output shape. Common: 1:1, 16:9, 9:16, 4:3, 3:4, 21:9.
--format Output image format: jpg, png, or webp.
Pricing estimate:
--provider openai $0.10 per image (OpenAI GPT Image 2)
--provider nanobanana $0.08 per image (NanoBanana)
--provider flux $0.08 per image (FLUX 2 Pro)
If --provider is omitted, the CLI uses openai/Image Gen 2.
Provider notes:
openai accepts at most one --input-image.
flux is best for clean text-to-image generation.
nanobanana is available as a separate provider/fallback.
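The per-provider --input-image limits can be checked before the CLI is invoked. The Python sketch below builds an argument list using only the flags documented above and rejects too many reference images. The helper name build_imagegen2_command and its parameter names are hypothetical, and the CLI may enforce the limits differently.

```python
from typing import List, Optional, Sequence

# Reference-image limits per provider, as listed for --input-image.
INPUT_IMAGE_LIMITS = {"openai": 1, "nanobanana": 3, "flux": 8}


def build_imagegen2_command(
    prompt: str,
    provider: str = "openai",
    input_images: Sequence[str] = (),
    aspect_ratio: Optional[str] = None,
    image_format: Optional[str] = None,
) -> List[str]:
    """Return an argv list for `baz media imagegen2`, or raise ValueError."""
    if provider not in INPUT_IMAGE_LIMITS:
        raise ValueError(f"unknown provider: {provider}")
    limit = INPUT_IMAGE_LIMITS[provider]
    if len(input_images) > limit:
        raise ValueError(f"{provider} accepts at most {limit} --input-image value(s)")
    argv = ["baz", "media", "imagegen2", "--provider", provider]
    if aspect_ratio is not None:
        argv += ["--aspect-ratio", aspect_ratio]
    if image_format is not None:
        argv += ["--format", image_format]
    for path in input_images:
        # --input-image is repeatable: one flag per reference image.
        argv += ["--input-image", path]
    # The prompt is the final positional argument and is passed through unchanged.
    argv.append(prompt)
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True); because it is a list rather than a shell string, the prompt needs no extra quoting.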
Examples:
baz media imagegen2 --aspect-ratio 16:9 "Luxury real estate dusk exterior, cinematic"
baz media imagegen2 --input-image ./kitchen.jpg "Make this kitchen brighter and editorial"
baz media imagegen2 --provider flux --format webp "Minimal SaaS dashboard hero image"
Live pricing:
baz balance --pricing

baz media seedance2
Generate a Seedance 2.0 video directly in Bazaar
baz media seedance2 [options] <prompt>

| Option | Description |
|---|---|
| --model <model> | Seedance model ID |
| --aspect-ratio <ratio> | Aspect ratio: 16:9, 9:16, 1:1. Default: 16:9. |
| --resolution <resolution> | Resolution: 480p, 720p, 1080p. Default: 720p. |
| --duration <seconds> | Duration in seconds: 2-12. Default: 5. |
| --image <path> | First-frame image for image-to-video |
| --last-frame <path> | Last-frame image for frame interpolation |
| -h, --help | display help for command |

What it does:
Generates a short Seedance 2.0 AI video directly through Bazaar media generation.
The prompt is sent verbatim; the autonomous video agent does not rewrite it.
The output is uploaded to Bazaar media and returned as a reusable URL/asset id.
Models:
doubao-seedance-2-0-260128 Seedance 2.0 on Volcengine Ark
doubao-seedance-2-0-fast-260128 Seedance 2.0 Fast on Volcengine Ark
dreamina-seedance-2-0-260128 Seedance 2.0 on BytePlus ModelArk
dreamina-seedance-2-0-fast-260128 Seedance 2.0 Fast on BytePlus ModelArk
If --model is omitted, Bazaar uses the server-configured default model.
Inputs:
prompt Required text prompt.
--image <path> Optional first-frame image for image-to-video.
--last-frame <path> Optional ending frame for interpolation when supported.
--aspect-ratio Output video shape: 16:9, 9:16, or 1:1.
--resolution Output resolution: 480p, 720p, or 1080p.
--duration Output duration in seconds: 2-12.
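The documented ranges can be checked locally before a generation request is sent. The Python sketch below does that for aspect ratio, resolution, and duration, and also flags the Fast-model 1080p restriction described in the pricing notes. The function name is hypothetical. It assumes whole-second durations and that Fast variants can be recognised by "-fast-" in the model ID, as in the IDs listed above.

```python
from typing import List, Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
RESOLUTIONS = {"480p", "720p", "1080p"}
MIN_DURATION_S, MAX_DURATION_S = 2, 12


def check_seedance2_options(
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration: int = 5,
    model: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} seconds")
    # Assumption: Fast variants are identified by "-fast-" in the model ID.
    if model is not None and "-fast-" in model and resolution == "1080p":
        problems.append("Seedance 2.0 Fast does not support 1080p output")
    return problems
```

The defaults mirror the documented CLI defaults (16:9, 720p, 5 seconds), so calling the function with no arguments returns an empty list.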
Pricing estimate:
Seedance 2 is billed from actual Ark token usage returned by the completed task.
Bazaar price uses a 50% gross margin over the provider token cost.
Standard, no video input: 480p/720p $0.0140/1K tokens, 1080p $0.0154/1K tokens
Standard, video input: 480p/720p $0.0086/1K tokens, 1080p $0.0094/1K tokens
Fast, no video input: 480p/720p $0.0112/1K tokens
Fast, video input: 480p/720p $0.0066/1K tokens
Seedance 2.0 Fast does not support 1080p output; use standard Seedance 2.0 for 1080p.
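Because billing follows the token usage returned by the completed task, a cost can only be estimated once a token count is known. The Python sketch below looks up the per-1K-token rates listed above and multiplies them out. It assumes the listed rates are the final Bazaar rates (the text does not say whether the margin is already included). The function name and the tier labels "standard" and "fast" are hypothetical.

```python
from decimal import Decimal

# USD per 1K tokens, copied from the estimate list.
# Keys: (tier, has_video_input, resolution). Fast has no 1080p entries.
RATES_PER_1K_TOKENS = {
    ("standard", False, "480p"): Decimal("0.0140"),
    ("standard", False, "720p"): Decimal("0.0140"),
    ("standard", False, "1080p"): Decimal("0.0154"),
    ("standard", True, "480p"): Decimal("0.0086"),
    ("standard", True, "720p"): Decimal("0.0086"),
    ("standard", True, "1080p"): Decimal("0.0094"),
    ("fast", False, "480p"): Decimal("0.0112"),
    ("fast", False, "720p"): Decimal("0.0112"),
    ("fast", True, "480p"): Decimal("0.0066"),
    ("fast", True, "720p"): Decimal("0.0066"),
}


def seedance2_estimate_usd(
    tokens: int,
    resolution: str = "720p",
    tier: str = "standard",
    has_video_input: bool = False,
) -> Decimal:
    """Estimate the charge in USD for a completed task's reported token usage."""
    key = (tier, has_video_input, resolution)
    if key not in RATES_PER_1K_TOKENS:
        raise ValueError(f"no listed rate for tier={tier}, resolution={resolution}")
    return RATES_PER_1K_TOKENS[key] * Decimal(tokens) / Decimal(1000)
```

Decimal is used instead of float so that four-decimal rates multiply out exactly; a Fast request at 1080p raises ValueError because no such rate is listed.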
Reference frames:
--image uploads a first-frame image for image-to-video.
--last-frame uploads an ending frame for interpolation when supported by the model.
Image reference frames do not count as Ark video input for pricing; the CLI normally uses the no-video-input token rate.
Examples:
baz media seedance2 --resolution 720p --duration 5 "Slow dolly through a sunlit real estate living room"
baz media seedance2 --aspect-ratio 9:16 --duration 6 --image ./exterior.jpg "Subtle cinematic push-in at golden hour"
baz media seedance2 --model dreamina-seedance-2-0-260128 --resolution 720p "Polished product teaser"
baz media seedance2 --model doubao-seedance-2-0-fast-260128 --resolution 480p "Fast social teaser for a property listing"
Live pricing:
baz balance --pricing