2026-05-08
The previous post on AI tools that actually moved the needle covered which tools earned a spot. This one covers how they're plugged together. Less hype, more "here's the boring stack, here's where it lives in the workflow, here's what's still hands-on."
The shape of the stack matters because the breathless AI-replaces-everything pitch and the dismissive AI-is-just-hype pitch both miss the actual story: AI is most useful when it's invisible inside an otherwise normal production workflow. Not the centerpiece. Just another tool on the truck.
The studio runs an agent layer called OpenClaw on top of the core production stack. In practical terms, this means: Slack, Discord, email, calendar, and file-system events all flow through an agent that can answer, route, draft, summarize, schedule, and act on them with explicit approval. It's the connective tissue between "human knows the work" and "AI handles the parts that don't need human judgment."
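The routing-plus-approval idea is the load-bearing part. A minimal sketch of that pattern in Python — every name here is hypothetical (OpenClaw's internals aren't shown in this post); the point is that read-only events get handled automatically while anything that acts on the world queues for a human yes/no:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str    # e.g. "slack", "email", "calendar", "fs"
    kind: str      # e.g. "message", "invite", "file_changed"
    payload: dict


@dataclass
class AgentRouter:
    """Route inbound events to handlers; acting handlers need approval."""
    handlers: dict = field(default_factory=dict)
    approvals: list = field(default_factory=list)  # queued (handler, event)

    def on(self, source, kind, handler, needs_approval=False):
        self.handlers[(source, kind)] = (handler, needs_approval)

    def dispatch(self, event):
        handler, needs_approval = self.handlers.get(
            (event.source, event.kind), (None, False)
        )
        if handler is None:
            return "ignored"
        if needs_approval:
            # Never act directly: queue for an explicit human approval.
            self.approvals.append((handler, event))
            return "pending approval"
        return handler(event)


router = AgentRouter()
# Summarizing/drafting is safe to run unattended...
router.on("slack", "message", lambda e: f"drafted reply to {e.payload['user']}")
# ...accepting a calendar invite changes the schedule, so it gates.
router.on("calendar", "invite", lambda e: "accepted", needs_approval=True)
```

The design choice worth copying is the single gate: one `needs_approval` flag per handler, not per-tool ad-hoc confirmation logic scattered everywhere.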
Where this shows up in client work:
The non-obvious part: the agent is more useful for what it prevents than for what it produces. Missed messages, scheduling overlaps, things falling through cracks — gone. The cost of context-switching between channels has dropped to near zero.
All image and video generation routes through Krea — a unified API that brokers across the major model providers (Kling, Hailuo, Seedance, gpt-image, nano-banana, ideogram, Flux Kontext, and dozens more) without the studio needing to maintain accounts or quotas with each upstream provider individually.
Why route through one platform instead of going direct:
What the studio uses Krea for in practice:
What the studio still does the hard way: the actual hero deliverable. Final commercials, final case-study video, final brand spots — those are shot, edited, and graded by humans on real cameras. Krea accelerates the path to those deliverables; it doesn't replace them.
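The brokering pattern is worth making concrete. This is not Krea's actual API — every name below is an assumption — but it shows why one call site beats N provider integrations: the studio-side code never changes when a model is added, swapped, or deprecated upstream.

```python
# Hypothetical broker sketch: one call site, many upstream providers.
# None of these function or endpoint names are Krea's real API; this is
# the pattern, not the product.
PROVIDERS = {
    "image": ["flux-kontext", "gpt-image", "ideogram"],
    "video": ["kling", "hailuo", "seedance"],
}


def generate(kind, prompt, model=None):
    """Build one job spec against whichever provider fits.

    A real broker would POST this to the platform and stream results
    back. Honors an explicit model choice when it's available, otherwise
    falls back to the pool default, so deprecated models degrade
    gracefully instead of breaking the call site.
    """
    pool = PROVIDERS[kind]
    chosen = model if model in pool else pool[0]
    return {"provider": chosen, "kind": kind, "prompt": prompt}


job = generate("video", "handheld b-roll of a kitchen pass, golden hour")
```

Accounts, quotas, and rate limits live behind `generate`; the edit room never sees them.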
The voiceover work on owned channels is still mine. AI voice cloning got tested for the brand voice work and got dropped — see the previous post for why.
Where AI voice does earn its keep: transcription. Whisper-class models running locally (or in the cloud through the same agent layer) handle interview transcription, edit-pass subtitle generation, and rough metadata for archive search. The voice generation lane is closed for client deliverables. The voice recognition lane is wide open and saves hours per project.
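Whisper-class models return timestamped segments (dicts with `start`, `end`, and `text` keys), and the edit-pass subtitle step is just formatting those into an `.srt` body. A small sketch of that formatter — the demo segments are invented, but the segment shape matches what Whisper-style transcription emits:

```python
def srt_timestamp(seconds):
    """Format seconds as SRT's HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments into a numbered .srt body."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


demo = [
    {"start": 0.0, "end": 2.4, "text": " So tell me how the brand started."},
    {"start": 2.4, "end": 6.1, "text": " It started in the back of a truck."},
]
print(segments_to_srt(demo))
```

The same segments feed archive search: index `text` keyed on `start` and you can jump straight to the soundbite in raw footage.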
The edit room uses AI selectively — auto-cut to a music track for a v0 assembly, auto-pull selects from interview footage, auto-arrange B-roll to a narrative arc. None of these are final-cut quality. All of them save real hours getting to the first watchable rough cut, which is when the actual creative editorial work begins.
The pattern is consistent: AI for the v0, human for the v1+. The first watchable assembly happens 50–70% faster. The taste, pacing, and story decisions are still made by an editor who knows what the brand actually sounds like.
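The auto-cut-to-music v0 can be sketched as a beat-snapping pass: given the track's beat timestamps and a pile of clip durations, land every cut on a beat and enforce a minimum shot length. This is a hypothetical toy, not any editor's actual algorithm, but it shows why the output is a starting point rather than a cut:

```python
def v0_assembly(beats, clips, min_shot=1.0):
    """Assign clips to beat-aligned slots in a rough v0 timeline.

    beats: sorted beat timestamps (seconds) from the music track.
    clips: available clip durations (seconds).
    Returns (start, end, clip_index) tuples: each cut lands on a beat,
    each shot runs at least `min_shot` seconds, and clips too short for
    any slot are skipped. A human re-cuts from here.
    """
    timeline, t = [], beats[0]
    for clip_index, dur in enumerate(clips):
        # First beat that gives this shot >= min_shot without
        # outrunning the clip's own length.
        candidates = [b for b in beats if min_shot <= b - t <= dur]
        if not candidates:
            continue  # no usable slot for this clip
        cut = candidates[0]
        timeline.append((t, cut, clip_index))
        t = cut
    return timeline


beats = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
clips = [3.2, 0.4, 2.0, 1.1]
print(v0_assembly(beats, clips))
```

Everything an editor actually cares about — which clip, which take, whether the cut breathes — is exactly what this greedy pass can't see, which is why it stops at v0.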
Take a typical mid-week shoot day for a multi-property brand client:
The human is doing the hard work. The agent is removing every form of friction that isn't the hard work. That's the entire pitch.
Studios that try to AI-replace the actual creative work are going to ship forgettable content and lose to studios that don't. Studios that refuse to integrate AI into pre-production and post-production are going to lose to studios that ship in half the time.
The middle path — AI as connective tissue, AI as v0-accelerator, AI as ideation surface, human as the last mile — is going to look like the obvious move in three years. Right now it's still a competitive advantage because most working studios haven't built the stack.
If you're a peer producer reading this, the question I'd start with isn't "what AI tool should I try?" It's "what part of my workflow eats my calendar that has nothing to do with the actual creative work?" That's where the agent layer earns its keep. The visual-gen and editorial-AI tools come second.
If you're a brand or operator reading this, the question is simpler: is the studio I'm working with faster, more consistent, and more responsive than the last one? The stack is one explanation for why the answer is yes here. The other explanation is the human keeps showing up.