baz.studio API

baz media

Manage media files in your project

baz media [options] [command]
Option       Description
-h, --help   display help for command

baz media upload

Upload an image, video, or audio file to the media library

baz media upload [options] <file>
Option          Description
--name <name>   Custom name for the uploaded file
-h, --help      display help for command

baz media list

List uploaded media files for the active project

baz media list [options]
Option          Description
--type <type>   Filter by type: image, video, audio, logo
--limit <n>     Max items to show. Default: 50.
-h, --help      display help for command

baz media compress

Generate lightweight preview renditions for all project video media (run after batch-uploading clips)

baz media compress [options]
Option          Description
--info          Report master vs preview sizes without compressing anything
--fix-masters   Losslessly remux non-faststart masters in place first (fixes slow Lambda frame extraction on first export)
-h, --help      display help for command
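
The "run after batch-uploading clips" workflow can be sketched as a small shell helper. This is a minimal sketch, not part of the CLI: the function names run and upload_and_compress and the BAZ_DRY_RUN variable are hypothetical, and it assumes the clips are local files and that a project is already active. With BAZ_DRY_RUN=1 the commands are printed instead of executed.

```shell
# Hypothetical helper: upload every given clip, then build previews once.
# BAZ_DRY_RUN is an illustrative variable, not a baz CLI feature.
run() {
  if [ "${BAZ_DRY_RUN:-0}" = "1" ]; then
    echo "$*"          # print the command instead of running it
  else
    "$@"
  fi
}

upload_and_compress() {
  for clip in "$@"; do
    run baz media upload "$clip" || return 1   # stop on the first failed upload
  done
  # A single compress pass covers all project video media.
  run baz media compress
}
```

Calling upload_and_compress with a list of clip paths uploads each one and compresses once at the end, so previews are not regenerated after every upload.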

baz media transcribe-voiceover

Transcribe an uploaded audio asset with Whisper word timestamps and persist voiceover alignment metadata

baz media transcribe-voiceover [options] <assetId>
Option       Description
-h, --help   display help for command

baz media tts

Generate a voiceover audio asset directly in Bazaar

Alias: voiceover

baz media tts [options] <text>
Option                  Description
--provider <provider>   TTS provider: gemini, minimax, or elevenlabs. Default: gemini.
--voice-id <voiceId>    Provider voice ID. Gemini accepts Gemini voice names; ElevenLabs requires a connected voice
--voice-style <style>   Gemini voice direction, e.g. warm documentary narrator, calm British presenter
--emotion <emotion>     MiniMax emotion hint, e.g. happy, sad, calm, excited
--speed <speed>         MiniMax speed: 0.5-2.0
--pitch <pitch>         MiniMax pitch: -12 to 12
--raw                   Use text exactly as the spoken script; disables MiniMax AI voice interpretation
-h, --help              display help for command

What it does:
  Generates one voiceover/audio asset directly in Bazaar and attaches it to the active project.
  Use this when you want reusable narration audio without going through the chat agent.

Pricing estimate:
  --provider gemini      $0.08 per started minute  (Gemini 3.1 Flash TTS Preview)
  --provider minimax     $0.06 per started 1k characters  (MiniMax Speech)
  --provider elevenlabs  $0.44 per estimated minute, or $0.00/min with connected BYOK
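
The "per started" wording implies rounding up to the next whole unit. The following is a rough sketch of that arithmetic for gemini and minimax, working in cents to stay within integer shell arithmetic. The function names are hypothetical, and it assumes "1k characters" means 1000 characters; actual billing may differ, so treat baz balance --pricing as the authority.

```shell
# Round up a / b using integer arithmetic.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Estimated Gemini cost in cents for an audio length in seconds
# (8 cents per started minute, per the estimate above).
gemini_tts_cents() {
  echo $(( $(ceil_div "$1" 60) * 8 ))
}

# Estimated MiniMax cost in cents for a script length in characters
# (6 cents per started 1k characters; 1k = 1000 is an assumption).
minimax_tts_cents() {
  echo $(( $(ceil_div "$1" 1000) * 6 ))
}

gemini_tts_cents 61      # 2 started minutes  -> prints 16
minimax_tts_cents 2500   # 3 started 1k units -> prints 18
```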

Provider notes:
  gemini is the default and uses Google Gemini TTS voices through Bazaar.
  gemini supports --voice-id plus --voice-style for delivery direction.
  If --voice-style is omitted, Bazaar reuses the project's stored voice style or derives one from project context.
  Supported Gemini voice IDs include Kore (default), Zephyr, Puck, Charon, Aoede, Leda, Orus, Fenrir.
  minimax supports emotion, speed, pitch, and optional AI voice interpretation.
  elevenlabs requires a connected ElevenLabs key/default voice or a --voice-id.

Examples:
  baz media tts --provider gemini "Welcome to the listing. This home opens with warm natural light."
  baz media tts --provider gemini --voice-id Zephyr --voice-style "warm, calm, real-estate documentary narrator" "A calm editorial narration for a finance explainer."
  baz media tts --provider minimax --emotion happy --speed 1.05 "Three reasons this product launch matters."

Live pricing:
  baz balance --pricing

baz media imagegen2

Generate an Image Gen 2 image directly in Bazaar (verbatim prompt, no agent rewriting)

baz media imagegen2 [options] <prompt>
Option                   Description
--provider <provider>    Image provider: openai, nanobanana, or flux. Default: openai.
--aspect-ratio <ratio>   Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4, 21:9. Default: 1:1.
--format <fmt>           Output format: jpg, png, webp. Default: jpg.
--input-image <path>     Repeatable reference image input. Limits: openai=1, nanobanana=3, flux=8
-h, --help               display help for command

What it does:
  Generates or edits one image directly through Bazaar Image Gen 2.
  The prompt is sent verbatim; the autonomous video agent does not rewrite it.
  The output is uploaded to Bazaar media and returned as a reusable URL/asset id.

Inputs:
  prompt                 Required text prompt.
  --input-image <path>   Repeatable reference image input. Limits: openai=1, nanobanana=3, flux=8.
  --aspect-ratio         Output shape. Common: 1:1, 16:9, 9:16, 4:3, 3:4, 21:9.
  --format               Output image format: jpg, png, or webp.

Pricing estimate:
  --provider openai      $0.10 per image  (OpenAI GPT Image 2)
  --provider nanobanana  $0.08 per image  (NanoBanana)
  --provider flux        $0.08 per image  (FLUX 2 Pro)
  If --provider is omitted, the CLI uses openai/Image Gen 2.

Provider notes:
  openai accepts at most one --input-image.
  flux is best for clean text-to-image generation.
  nanobanana is available as a separate provider/fallback.
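
The per-provider reference image limits can be checked locally before a request is sent. This is a minimal sketch: the function names max_input_images and check_input_images are hypothetical and not part of the CLI, and the numbers are copied from the limits listed above.

```shell
# Maximum --input-image count per provider, from the limits listed above.
max_input_images() {
  case "$1" in
    openai)     echo 1 ;;
    nanobanana) echo 3 ;;
    flux)       echo 8 ;;
    *)          return 1 ;;   # unknown provider
  esac
}

# Succeeds when <count> reference images fit within the provider's limit.
check_input_images() {
  provider=$1
  count=$2
  limit=$(max_input_images "$provider") || {
    echo "unknown provider: $provider" >&2
    return 2
  }
  if [ "$count" -gt "$limit" ]; then
    echo "$provider accepts at most $limit --input-image value(s), got $count" >&2
    return 1
  fi
}
```

For example, check_input_images openai 2 fails with a message on stderr, while check_input_images flux 8 succeeds.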

Examples:
  baz media imagegen2 --aspect-ratio 16:9 "Luxury real estate dusk exterior, cinematic"
  baz media imagegen2 --input-image ./kitchen.jpg "Make this kitchen brighter and editorial"
  baz media imagegen2 --provider flux --format webp "Minimal SaaS dashboard hero image"

Live pricing:
  baz balance --pricing

baz media seedance2

Generate a Seedance 2.0 video directly in Bazaar

baz media seedance2 [options] <prompt>
Option                      Description
--model <model>             Seedance model ID
--aspect-ratio <ratio>      Aspect ratio: 16:9, 9:16, 1:1. Default: 16:9.
--resolution <resolution>   Resolution: 480p, 720p, 1080p. Default: 720p.
--duration <seconds>        Duration in seconds: 2-12. Default: 5.
--image <path>              First-frame image for image-to-video
--last-frame <path>         Last-frame image for frame interpolation
-h, --help                  display help for command

What it does:
  Generates a short Seedance 2.0 AI video directly through Bazaar media generation.
  The prompt is sent verbatim; the autonomous video agent does not rewrite it.
  The output is uploaded to Bazaar media and returned as a reusable URL/asset id.

Models:
  doubao-seedance-2-0-260128          Seedance 2.0 on Volcengine Ark
  doubao-seedance-2-0-fast-260128     Seedance 2.0 Fast on Volcengine Ark
  dreamina-seedance-2-0-260128        Seedance 2.0 on BytePlus ModelArk
  dreamina-seedance-2-0-fast-260128   Seedance 2.0 Fast on BytePlus ModelArk
  If --model is omitted, Bazaar uses the server-configured default model.

Inputs:
  prompt                 Required text prompt.
  --image <path>         Optional first-frame image for image-to-video.
  --last-frame <path>    Optional ending frame for interpolation when supported.
  --aspect-ratio         Output video shape: 16:9, 9:16, or 1:1.
  --resolution           Output resolution: 480p, 720p, or 1080p.
  --duration             Output duration in seconds: 2-12.
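
The input ranges above can be validated in a wrapper script before calling the CLI. This is a sketch only: valid_seedance_inputs is a hypothetical name, and it assumes the duration is given as a whole number of seconds, which the help text does not state explicitly.

```shell
# Return 0 when the values fall inside the ranges listed above.
# Usage: valid_seedance_inputs <aspect-ratio> <resolution> <duration>
valid_seedance_inputs() {
  aspect=$1 resolution=$2 duration=$3
  case "$aspect" in 16:9|9:16|1:1) ;; *) return 1 ;; esac
  case "$resolution" in 480p|720p|1080p) ;; *) return 1 ;; esac
  # Assumption: duration is a whole number of seconds between 2 and 12.
  case "$duration" in ''|*[!0-9]*) return 1 ;; esac
  [ "$duration" -ge 2 ] && [ "$duration" -le 12 ]
}
```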

Pricing estimate:
  Seedance 2 is billed from actual Ark token usage returned by the completed task.
  Bazaar price uses a 50% gross margin over the provider token cost.
  Standard, no video input: 480p/720p $0.0140/1K tokens, 1080p $0.0154/1K tokens
  Standard, video input:    480p/720p $0.0086/1K tokens, 1080p $0.0094/1K tokens
  Fast, no video input:     480p/720p $0.0112/1K tokens
  Fast, video input:        480p/720p $0.0066/1K tokens
  Seedance 2.0 Fast does not support 1080p output; use standard Seedance 2.0 for 1080p.
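
Because billing is based on the token usage a completed task reports, a cost can only be estimated once a token count is known or guessed. The following sketch applies the listed rates to a given token count. The function names are hypothetical, the token count in the example is an illustrative value rather than a measured one, and the rates are copied from the table above, so they may lag behind baz balance --pricing.

```shell
# Print the listed USD rate per 1K tokens.
# tier: standard|fast   input: none|video   resolution: 480p|720p|1080p
seedance_rate() {
  case "$1/$2/$3" in
    standard/none/480p|standard/none/720p)   echo 0.0140 ;;
    standard/none/1080p)                     echo 0.0154 ;;
    standard/video/480p|standard/video/720p) echo 0.0086 ;;
    standard/video/1080p)                    echo 0.0094 ;;
    fast/none/480p|fast/none/720p)           echo 0.0112 ;;
    fast/video/480p|fast/video/720p)         echo 0.0066 ;;
    *) echo "no listed rate for $1/$2/$3" >&2; return 1 ;;  # e.g. fast at 1080p
  esac
}

# Estimate the cost in USD: tokens / 1000 * rate.
# Usage: seedance_cost <tokens> <tier> <input> <resolution>
seedance_cost() {
  rate=$(seedance_rate "$2" "$3" "$4") || return 1
  awk -v t="$1" -v r="$rate" 'BEGIN { printf "%.4f\n", t / 1000 * r }'
}

seedance_cost 100000 standard none 720p   # prints 1.4000
```

Requesting fast at 1080p returns an error here, mirroring the note that Seedance 2.0 Fast does not support 1080p output.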

Reference frames:
  --image uploads a first-frame image for image-to-video.
  --last-frame uploads an ending frame for interpolation when supported by the model.
  Image reference frames do not count as Ark video input for pricing; the CLI normally uses the no-video-input token rate.

Examples:
  baz media seedance2 --resolution 720p --duration 5 "Slow dolly through a sunlit real estate living room"
  baz media seedance2 --aspect-ratio 9:16 --duration 6 --image ./exterior.jpg "Subtle cinematic push-in at golden hour"
  baz media seedance2 --model dreamina-seedance-2-0-260128 --resolution 720p "Polished product teaser"
  baz media seedance2 --model doubao-seedance-2-0-fast-260128 --resolution 480p "Fast social teaser for a property listing"

Live pricing:
  baz balance --pricing