Video content creation sounds simple, but what if you don’t have time to:
• Write the script
• Prepare the visuals
• Generate the voiceover
• Create the subtitles
• And finally render the video?
That is why we built Noustiny on top of @NousResearch Hermes Agent: we added 12 generic Hermes tools + 13 generic Hermes skills that bring the whole process into a single flow.
How does it work? Let’s take a closer look 👇
————
1- Story state: context, tree, motifs:
Hermes had no built-in narrative-state primitive for tracking canon, branching story structure, and recurring motifs.
So we added three generic Hermes tools for this:
→ story_tree_graph: Manages the story tree structure. It handles operations like canon path, descendants, and splice insertion points.
→ narrative_context_builder: Walks the canon chain and returns the live context every narrative skill should reason against. This includes recent chain, mood, and character state.
→ motif_tracker: Remembers recurring motifs across the story arc. For example, a sword introduced in beat 2 can reappear meaningfully in later scenes.
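To make the tree operations concrete, here is a minimal sketch of canon path, descendants, and splice insertion on an in-memory tree. The `Beat` and `StoryTree` names, their fields, and the `canon` flag are assumptions for illustration only; the actual story_tree_graph tool may store and expose this differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Beat:
    beat_id: str
    text: str
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)
    canon: bool = False  # hypothetical flag: True if the beat is on the canon path


class StoryTree:
    """Hypothetical in-memory story tree, not the real tool's storage."""

    def __init__(self) -> None:
        self.beats: Dict[str, Beat] = {}

    def add(self, beat_id: str, text: str, parent: Optional[str] = None,
            canon: bool = False) -> Beat:
        beat = Beat(beat_id, text, parent, canon=canon)
        self.beats[beat_id] = beat
        if parent is not None:
            self.beats[parent].children.append(beat_id)
        return beat

    def canon_path(self, root_id: str) -> List[str]:
        # Follow the canon-flagged child at each level until none is left.
        path = [root_id]
        current = self.beats[root_id]
        while True:
            nxt = next((c for c in current.children if self.beats[c].canon), None)
            if nxt is None:
                return path
            path.append(nxt)
            current = self.beats[nxt]

    def descendants(self, beat_id: str) -> List[str]:
        # Pre-order depth-first walk over everything below beat_id.
        out: List[str] = []
        stack = list(reversed(self.beats[beat_id].children))
        while stack:
            node = stack.pop()
            out.append(node)
            stack.extend(reversed(self.beats[node].children))
        return out

    def splice(self, parent_id: str, child_id: str, new_id: str, text: str) -> Beat:
        # Insert a new beat between an existing parent and child.
        parent = self.beats[parent_id]
        child = self.beats[child_id]
        new = Beat(new_id, text, parent=parent_id, children=[child_id],
                   canon=child.canon)
        parent.children[parent.children.index(child_id)] = new_id
        child.parent = new_id
        self.beats[new_id] = new
        return new


tree = StoryTree()
tree.add("b1", "A knight finds a sword.", canon=True)
tree.add("b2", "The knight enters the forest.", parent="b1", canon=True)
tree.add("b2x", "The knight turns back.", parent="b1")  # non-canon branch
tree.add("b3", "The knight meets a dragon.", parent="b2", canon=True)
tree.splice("b1", "b2", "b1a", "The knight loses the sword in a river.")
```

After the splice, `tree.canon_path("b1")` runs through the new beat, and `tree.descendants("b1a")` lists the downstream beats that a continuity pass would need to re-check.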
————
2- Character / cast pipeline:
Hermes had no built-in primitive for cast extraction or character continuity.
So we added a four-tool character pipeline:
→ story_copyright_detector: Handles IP scrubbing. For example, “Iron Man” is converted into an IP-free character description before the image API ever sees it.
→ character_sheet_builder: Produces 1 to 4 characters. For each character, it creates an IP-free visual description and a hero-portrait prompt. These portraits become the reference frames used across later storyboard scenes.
→ character_registry_lookup: Finds a character by name inside the cast sheet and attaches the correct portrait reference to each beat.
→ character_alias_resolver: Resolves aliases like “Mr. Stark” into the main character name. This way, the same character keeps one portrait reference even if they appear under different names.
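The lookup and alias steps can be sketched as a small cast sheet that maps every known alias to one canonical name, and every canonical name to one portrait reference. The `CastSheet` class, its methods, the character name, and the portrait path are all hypothetical; this only illustrates the idea, not Noustiny's actual implementation.

```python
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    # Case-insensitive, whitespace-insensitive key for matching mentions.
    return " ".join(name.lower().split())


class CastSheet:
    """Hypothetical cast sheet: aliases -> canonical name -> portrait reference."""

    def __init__(self) -> None:
        self._portraits: Dict[str, str] = {}
        self._aliases: Dict[str, str] = {}

    def add_character(self, name: str, portrait_ref: str,
                      aliases: Iterable[str] = ()) -> None:
        self._portraits[name] = portrait_ref
        self._aliases[normalize(name)] = name
        for alias in aliases:
            self._aliases[normalize(alias)] = name

    def resolve(self, mention: str) -> Optional[str]:
        # Alias resolution: any known alias maps back to the canonical name.
        return self._aliases.get(normalize(mention))

    def portrait_for(self, mention: str) -> Optional[str]:
        # Registry lookup: one portrait reference per canonical character.
        canonical = self.resolve(mention)
        return self._portraits.get(canonical) if canonical else None


sheet = CastSheet()
sheet.add_character("Captain Mara", "portraits/mara.png",
                    aliases=["Mara", "the Captain"])
```

With this setup, `sheet.portrait_for("the Captain")` and `sheet.portrait_for("Mara")` return the same reference, which is what keeps a character visually consistent across beats.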
————
3- Voice pipeline:
Hermes had no built-in primitive for audio acquisition or voice cloning.
So we added the full voice chain, and the agent dispatches it autonomously in order:
→ narration_voice_director: The director-agent reads the seed + story and returns persona_label, search_query, and fallback_query.
→ voice_sample_builder: Uses yt-dlp + ffmpeg. It accepts a URL, an 11-character ID, or a free-text query. It runs ytsearch5 with dead-video tolerance and normalizes the audio to 24 kHz mono PCM.
→ voice_clone_synthesize: Wraps ElevenLabs IVC + timestamps. The voice ID is cached by reference SHA. Per-character alignment comes through the same audio call at no extra cost.
→ voice_clone_cleanup: Frees the cached voice ID after render so orphan voices do not accumulate.
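Two parts of this chain are easy to illustrate without calling any external service: classifying the input for yt-dlp, and caching a voice ID by the hash of the reference audio. This is a sketch under assumptions: the function and class names are hypothetical, SHA-256 and 16-bit PCM are guesses (the thread only says "SHA" and "24 kHz mono PCM"), and the `create` callback stands in for the real ElevenLabs call.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional

# YouTube-style IDs are 11 characters from this alphabet. Note that an
# 11-character free-text query with no spaces would also match.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def to_ytdlp_target(source: str) -> str:
    """Map a URL, an 11-character ID, or free text to a yt-dlp target."""
    source = source.strip()
    if source.startswith(("http://", "https://")):
        return source
    if VIDEO_ID_RE.fullmatch(source):
        return f"https://www.youtube.com/watch?v={source}"
    return f"ytsearch5:{source}"  # yt-dlp search prefix: top 5 results


def ffmpeg_normalize_args(src: str, dst: str) -> List[str]:
    """Build ffmpeg arguments for 24 kHz mono PCM (16-bit is an assumption)."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", "24000", "-c:a", "pcm_s16le", dst]


def reference_sha(audio_bytes: bytes) -> str:
    # Assumption: SHA-256 over the normalized reference audio.
    return hashlib.sha256(audio_bytes).hexdigest()


class VoiceIdCache:
    """Hypothetical cache: reference hash -> cloned voice ID."""

    def __init__(self) -> None:
        self._by_sha: Dict[str, str] = {}

    def get_or_create(self, audio_bytes: bytes,
                      create: Callable[[bytes], str]) -> str:
        key = reference_sha(audio_bytes)
        if key not in self._by_sha:
            self._by_sha[key] = create(audio_bytes)  # clone only on a cache miss
        return self._by_sha[key]

    def release(self, audio_bytes: bytes) -> Optional[str]:
        # Cleanup step: drop the entry and hand back the ID to delete upstream.
        return self._by_sha.pop(reference_sha(audio_bytes), None)
```

Keying the cache on the audio hash means the same reference sample never triggers a second clone, and `release` gives the cleanup step the exact voice ID it needs to free.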
————
4- Render:
Hermes had no built-in video-render entry.
So we added the final render tool:
→ noustiny_storybook: The agent dispatches it as the final step of the chain. One tool call drives the FastAPI render service end to end and emits the mp4.
————
5- Skills: 13 generic Hermes skills added to skills/creative/:
The branching engine in Noustiny works like a council of narrative skills.
Each skill is loaded by the gateway as a system prompt and orchestrated in this order:
→ narrative-brainstorm: Proposes 2 to 3 next-checkpoint options from the canon chain.
→ narrative-writer-assist: Writes a spliced insert beat that fits the parent and child.
→ narrative-continuity-critic: Audits downstream beats against the new insert.
→ narrative-rewriter: Updates the stale beats flagged by the continuity critic.
→ narrative-judge: Approves or rejects the rewrite against the original flow.
→ narrative-scene-qa: Checks each beat for consistency, length, and register.
→ narrative-writer: Finalizes the chosen branch as polished prose.
After one splice, this cascade walks downstream by itself until the canon becomes coherent again.
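The critic → rewriter → judge part of that cascade can be sketched as a simple repair loop. Everything here is a stand-in: the three callables replace LLM-backed skills, the `run_cascade` name and `max_passes` guard are hypothetical, and keeping the original beat when the judge rejects a rewrite is an assumption the thread does not confirm.

```python
from typing import Callable, List


def run_cascade(
    beats: List[str],
    insert_index: int,
    critic: Callable[[List[str], int], List[int]],
    rewriter: Callable[[List[str], int], str],
    judge: Callable[[str, str], bool],
    max_passes: int = 5,
) -> List[str]:
    """Repair downstream beats until the critic flags nothing (or passes run out)."""
    result = list(beats)
    for _ in range(max_passes):
        flagged = critic(result, insert_index)  # indices of stale downstream beats
        if not flagged:
            break  # canon is coherent again
        changed = False
        for i in flagged:
            candidate = rewriter(result, i)
            if judge(result[i], candidate):  # approve or reject the rewrite
                result[i] = candidate
                changed = True
        if not changed:
            break  # nothing was approved; stop instead of looping forever
    return result


# Toy stand-ins: after the hero loses the sword, later mentions of it are stale.
def toy_critic(beats: List[str], idx: int) -> List[int]:
    return [i for i in range(idx + 1, len(beats)) if "sword" in beats[i]]


def toy_rewriter(beats: List[str], i: int) -> str:
    return beats[i].replace("sword", "branch")


def toy_judge(old: str, new: str) -> bool:
    return new != old


story = [
    "The hero finds a sword.",
    "The hero loses the sword in the river.",  # the spliced insert
    "The hero swings the sword.",
    "The hero rests.",
    "The hero polishes the sword.",
]
repaired = run_cascade(story, 1, toy_critic, toy_rewriter, toy_judge)
```

In this toy run, only the two beats after the insert that still mention the sword are rewritten; the insert itself and the unaffected beat stay as they were.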
————
6- Visual + IP pipeline:
On the visual side, the goal is not just generating scenes. It is also preserving character continuity and IP safety.
This pipeline runs through these skills:
→ visual-prompt-builder: Turns a beat into an IP-free image prompt and reads the character-sheet references.
→ scene-composition: Defines shot framing, scene composition, and layout rules.
→ story-copyright-detector: Skill counterpart of the same-named tool. It can be used for direct slash-command invocation.
→ character-sheet-builder: Skill counterpart of the same-named tool. Defines cast extraction rules and the IP-free portrait-prompt format used to seed character consistency across the storyboard.
→ storybook-intro: Generates the cinematic intro page for the render.
————
7- Voice skill:
→ narration-voice-director: Defines persona reasoning rules and supports the decision logic behind the same-named voice tool.
————
8- Pattern:
The Hermes baseline already had the gateway, agent loop, skill registry, and tool registry.
We extended that foundation with 12 generic Hermes tools + 13 generic Hermes skills and organized the system into four main pipelines:
• story-state
• character continuity
• voice
• render
The important part is this: Noustiny is not a hardcoded system locked inside a single app. A Telegram bot, Discord bot, CLI session, or third-party Next.js app can call the same gateway and use the same tool + skill chains.
- No app glue.
- No hardcoded prompts.
- A drop-in, registry-compatible, agent-native video creation flow.
✅ GitHub: https://github.com/UfukNode/Noustiny