Zhiyuan Song
← Home

Novel Studio

Novel Studio is an AI-driven serial fiction workbench and a canonical multi-agent system: you steer like a director while a Chat Agent, a deterministic Orchestrator, and LLM workers (plan / write / QA / summarize) execute behind the scenes. The world bible, character sheets, timelines, and thread hooks live in a Canon store as the single source of truth; chapters flow blueprint → draft → QA → canonization. Retrieval targets story-state RAG in the long term: the MVP ships L0 precise recall, then layers L1 semantics on after validation, forming a hybrid RAG roadmap (see "Two-layer RAG plan" below).

Repository: https://github.com/songzhiyuan98/Novel-Studio (public GitHub: A Multi-Agent System for Long-Form Fiction Generation)

Features & implementation progress

The core pipeline runs end-to-end; some capabilities are still being iterated on. The table below is aligned with the README.

Capability Status
Chat with AI (ideation, intent routing) Shipped
Chapter pipeline: direction → blueprint → write → QA → canonize Shipped
Canon memory: characters, rules, timelines, thread tracking Shipped
Layered character sheets (core / important / episodic), editable Shipped
World settings, chapter reader (scene tags + summaries) Shipped
Multi-model: DeepSeek (write/QA) + GPT-4o-mini (plan/chat) Shipped
Cost tracking, orchestration traces (per-agent visibility) Shipped
Project templates Shipped
Per-scene rewriting Backend done / frontend wiring pending
Streaming output in UI Backend done / frontend partial
Impact analysis for setting changes Planned

RAG strategy: two-layer plan

Authoritative spec: docs/superpowers/specs/2026-03-24-core-architecture-design.md. This page is a condensed mirror.

RAG is two layers: L0 ships with the MVP; L1 arrives after MVP validation. Experimental notes (e.g. rag-optimization-strategy.md) may discuss engineering tweaks, but product framing stays L0 / L1.

Layer Name Phase One-liner
L0 Structured canon recall MVP, day one No vectors; confirmed canon in structured storage + scene-card metadata driving exact ID/key queries
L1 Semantic retrieval Post-MVP Embedding search for conceptual links that key queries miss; coexists with L0 as additive, not a replacement

L0 — structured canon recall (MVP)

What it is

Not generic vector RAG. Canon lives as structured JSON in the Canon Store; scene cards carry structured tags (characters, locations, threads, callbacks, etc.) that act as database query keys. The Packet Compiler issues queries, pulls matching character state, world rules, thread state, etc., and packs them into each worker’s packet under token budget.
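As a minimal sketch of the L0 idea, scene-card tags act as exact lookup keys against structured canon storage. A Map stands in for the real PostgreSQL Canon Store, and all names here (SceneCard, recallL0, the key format) are hypothetical illustrations, not the repo's actual API:

```typescript
// L0 recall sketch: scene-card tags are exact query keys, no embeddings.
// A Map stands in for the PostgreSQL Canon Store; names are illustrative.

interface SceneCard {
  characters: string[];
  locations: string[];
  threads: string[];
}

interface CanonEntry {
  key: string;  // e.g. "character:li-wei" (hypothetical key format)
  body: string; // confirmed canon text
}

const canonStore = new Map<string, CanonEntry>([
  ["character:li-wei", { key: "character:li-wei", body: "Li Wei: injured left arm as of ch. 12." }],
  ["thread:stolen-ledger", { key: "thread:stolen-ledger", body: "Ledger still missing; courier suspected." }],
]);

// No similarity scores — exact key lookups only, confirmed canon only.
function recallL0(card: SceneCard): CanonEntry[] {
  const keys = [
    ...card.characters.map((c) => `character:${c}`),
    ...card.threads.map((t) => `thread:${t}`),
  ];
  return keys
    .map((k) => canonStore.get(k))
    .filter((e): e is CanonEntry => e !== undefined);
}

const entries = recallL0({ characters: ["li-wei"], locations: [], threads: ["stolen-ledger"] });
```

The point of the sketch: retrieval quality depends entirely on the tags the Planner writes onto the scene card, which is exactly the gap L1 later fills.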

What L0 pulls (confirmed canon only)

  • Story bible / world rule entries
  • Character sheets and state summaries
  • Chapter summaries (Layer 0: per chapter)
  • Volume summaries (Layer 1: ~every 100 chapters; may be placeholders for the five-chapter MVP acceptance; see design doc)
  • Timeline events, open threads, development chains

Token budget (P0–P4)

Each worker call has a fixed context budget; the Packet Compiler fills by priority until exhausted:

P0 — Hard constraints (must include: current scene card, chapter goals, style guardrails, output contract, etc.)
P1 — Current state (almost always: characters/relationships in this chapter, active threads, etc.)
P2 — Recent context (important: last 3 chapter summaries L0, current volume summary L1, recent timeline, etc.)
P3 — Distant reference (as needed: historical volume summaries, full world rules, historical relationship shifts)
P4 — Supplementary (if budget remains: development chains, recurring QA themes, etc.)
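The fill-by-priority rule above can be sketched as a greedy pack: group items by priority, then admit them in P0→P4 order until the budget runs out. The field names and the 4-chars-per-token heuristic are assumptions for illustration, not the spec's actual accounting:

```typescript
// Hedged sketch of the P0–P4 ladder: greedy fill under a fixed token budget.

interface PacketItem {
  priority: 0 | 1 | 2 | 3 | 4; // P0 hard constraints … P4 supplementary
  label: string;
  text: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, illustration only
}

function compilePacket(items: PacketItem[], budget: number): PacketItem[] {
  const packed: PacketItem[] = [];
  let used = 0;
  // Sort by priority so P0 always lands before P1, P1 before P2, and so on.
  for (const item of [...items].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(item.text);
    if (used + cost > budget) continue; // skip what no longer fits
    packed.push(item);
    used += cost;
  }
  return packed;
}

const packet = compilePacket(
  [
    { priority: 4, label: "development chains", text: "x".repeat(400) },
    { priority: 0, label: "scene card", text: "x".repeat(200) },
    { priority: 2, label: "recent summaries", text: "x".repeat(300) },
  ],
  150, // tiny budget: the P0 and P2 items fit, the P4 item is dropped
);
```

Because P0 is admitted first, hard constraints can never be crowded out by supplementary context, which is the property the ladder exists to guarantee.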

Per-worker field breakdown lives in the design spec and docs/api-spec.md.

Why L0 is enough for MVP long-context

The spec’s key insight: lean on layered summaries + state tables, not MVP-stage vectors.

Example (chapter ~350, writer packet scale):

Volumes 1–3 summaries (Layer 1)    ~1500 chars each (~4500 total)
Last 3 chapter summaries (Layer 0) ~1500 chars total
Relevant character state (live)    ~1000 chars
Active threads, etc.                ~500 chars
────────────────────────────────────
~8000 chars total ≈ ~12K tokens of canon context

That is roughly 12K tokens of canon context spanning very long serial history; scene-card tags signal what to retrieve, replacing semantic search duties in the MVP.

L1 — semantic retrieval (post-MVP)

Goal

Add embedding-based search to surface conceptual links exact key matching cannot cover.

Problems L1 solves (examples)

  • Distant-chapter foreshadow / callback motifs related to the current chapter
  • Conceptually similar conflicts or motifs
  • Thematic ties across the story
  • Development-chain nodes still relevant when scene cards omit explicit tags

Preconditions before starting L1 (per spec)

Do not start L1 implementation until these hold—avoid stacking complexity on an unstable pipeline:

  • Artifact schema is stable
  • Chapter summaries exist with reliable quality
  • Canon read paths are proven on production-like flows
  • QA report / contract schemas are stable
  • L0 packet assembly is tested across multiple consecutive chapters (spec example: 5+)

L0 + L1 coexistence: migration framing (spec)

L1 augments L0, never replaces it. The Packet Compiler pipeline is:

1. L0: structured queries from scene-card keys (always)
2. L1: semantic search fills remaining budget with supplemental context
3. Merge + dedupe
4. Truncate to total token budget and write the final packet

After L1 ships, keyword reranking, QA-triggered narrow second passes, etc. are implementation details inside L1—the product still reads “L0 first, L1 second, then merge.”
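The four merge steps can be sketched as one function. The Snippet shape and mergePacket name are hypothetical; the load-bearing details are that dedupe keys on identity and that truncation never drops an earlier L0 entry in favour of a later L1 one:

```typescript
// Sketch of the L0 + L1 merge: L0 first, L1 second, dedupe, truncate.

interface Snippet {
  id: string;
  text: string;
}

function mergePacket(l0: Snippet[], l1: Snippet[], maxChars: number): Snippet[] {
  // Steps 1–2: L0 structured results always lead; L1 is supplemental.
  // Step 3: dedupe by id so a snippet surfaced by both layers appears once.
  const seen = new Set<string>();
  const merged: Snippet[] = [];
  for (const s of [...l0, ...l1]) {
    if (seen.has(s.id)) continue;
    seen.add(s.id);
    merged.push(s);
  }
  // Step 4: truncate to the total budget; because L0 entries come first,
  // truncation can only ever shed L1 supplements, never structured canon.
  const out: Snippet[] = [];
  let used = 0;
  for (const s of merged) {
    if (used + s.text.length > maxChars) break;
    out.push(s);
    used += s.text.length;
  }
  return out;
}

const final = mergePacket(
  [{ id: "rule:magic", text: "a".repeat(60) }],
  [
    { id: "rule:magic", text: "a".repeat(60) },  // duplicate of an L0 hit
    { id: "motif:storm", text: "b".repeat(50) }, // over budget, dropped
  ],
  100,
);
```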

Retrieval by worker (spec)

Role Retrieval Notes
Chat Agent Does not use RAG directly Light views from Canon Store (e.g. project summary) for chat and intent routing
Planner L0 (+ future L1) Next-chapter planning: dependencies, callbacks, open threads, development nodes, etc.
Writer L0 (+ future L1) Canon + recent story state for the current scene; emphasizes in-scene character/relationship state and active threads
QA L0 (+ future L1) Pull evidence sources for potential conflicts; compare draft claims to confirmed canon entries
Summarizer No retrieval Input is the full chapter text (+ required role lists per spec); LLM compresses to structured output

MVP boundary (aligned with spec)

The core spec states the MVP excludes embeddings / vector retrieval (no product L1); it includes L0 scene-card recall, chapter summarization pipeline, artifact versioning, and state machines. Acceptance includes: chapter 5’s context packet correctly references settings confirmed since chapter 1, and drafts and rejections never enter later packets.

“In progress” here means the mainline implements L0; L1 and heavier retrieval advance after the gates above—track implementation in docs/mvp-todolist.md.

Four architecture planes

Plane Responsibilities (summary)
UX Chat + option chips, orchestration traces, approve/reject, artifact editing, status surfaces (business rules stay out of UI)
Control Chat Agent (LLM, sole user voice) + Orchestrator (pure code: state machine, packets, dispatch, canon gates)
Knowledge PostgreSQL: projects, versioned artifacts, Canon Store (confirmed: rules, character state, relationships, chapter/volume summaries, timelines, threads, development chains, etc.)
Generation Planner / Writer / QA / Summarizer (stateless single-shot calls); Vercel AI SDK multi-provider; prompts + Zod output contracts

Knowledge centers on Canon Store; MVP enables L0 structured recall only; L1 semantic retrieval stacks after spec gates—see “Two-layer RAG plan.”

Orchestration & agent roles

User ↔ Chat Agent (GPT-4o-mini) → Orchestrator (deterministic code)
                                        ├── Planner (GPT-4o-mini)
                                        ├── Writer (DeepSeek)
                                        ├── QA (DeepSeek)
                                        └── Summarizer (DeepSeek)
                                        ↕
                            Knowledge: PostgreSQL + Canon Store

Key design: the Orchestrator is testable deterministic code, not an LLM. Routing, transitions, and packet assembly are code-guaranteed; workers do not read full chat logs or dispatch to each other. The chain is always user → Chat → Orchestrator → worker → return.

  • Chat Agent: intent parsing, light ideation, hands pipeline work to the Orchestrator; cannot rewrite canon directly or bypass QA.
  • Orchestrator: state machine, token-budget packets, dispatch, safety caps, audit/cost, blueprint approval gate, triggers Summarizer after canonization; may run green/yellow/red impact analysis on setting changes (per spec).
  • Planner: expands user outlines, or brainstorms directions from canon/threads then expands into a blueprint (per-scene goals, emotional beats, dialogue reversals, etc.).
  • Writer: streams drafts under an approved blueprint; executes the blueprint strictly.
  • QA: continuity, motivation, pacing, style, foreshadow logic, blueprint coverage, character-sheet compliance; structured decision + cited evidence.
  • Summarizer: after canonization, produces chapter summaries and drives incremental Canon Store updates.
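Because the Orchestrator is plain code rather than an LLM, the chapter flow can be pinned down as an explicit transition table. This is a sketch under assumed state and event names (the repo's actual identifiers may differ); the key property it demonstrates is that Writer dispatch is unreachable without a prior blueprint approval:

```typescript
// Deterministic chapter state machine sketch; names are illustrative.

type ChapterState = "direction" | "blueprint" | "writing" | "qa" | "canonized";
type ChapterEvent = "plan_done" | "blueprint_approved" | "draft_complete" | "user_confirm";

const transitions: Record<ChapterState, Partial<Record<ChapterEvent, ChapterState>>> = {
  direction: { plan_done: "blueprint" },
  blueprint: { blueprint_approved: "writing" }, // the approval gate
  writing: { draft_complete: "qa" },
  qa: { user_confirm: "canonized" },
  canonized: {}, // terminal; Summarizer is triggered on entry
};

function step(state: ChapterState, event: ChapterEvent): ChapterState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`illegal transition: ${event} in state ${state}`);
  }
  return next;
}

let s: ChapterState = "direction";
s = step(s, "plan_done");
s = step(s, "blueprint_approved"); // only this event reaches "writing"
```

Illegal events throw instead of silently routing, which is what makes the control plane unit-testable in a way an LLM router is not.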

Workflow phases (summary)

Phase 0 — project bootstrap

Create project (title, genre, tone, logline, etc.) → provision DB, initial brief, ProjectTemplate formatting knobs, initial open issues.

Phase 1 — story foundation (once per project)

Feasibility pass → draft world rules, layered character sheets, relationship graph, high-level outline, development chains, etc. → user edits/rejects/confirms → write canon projections.

Phase 2–N — chapter loop

  1. Blueprint: Planner in “expand outline” or “brainstorm then expand” mode; no writing until the user approves the blueprint.
  2. Write: Orchestrator compiles Writer packet (blueprint + canon slices + style, etc.) under budget, streams draft.
  3. QA: auto-dispatch; pass / pass_with_notes / revise / block; optional per-scene rewrite loops.
  4. Canonize: on user confirm, update chapter state, call Summarizer, parse outputs to increment CharacterState, Timeline, Thread, etc., with audit logs.

Branching & revision

When intent or state is uncertain, the flow takes one of three paths: orchestrator-side parsing, Chat asking 1–3 precise user questions, or continuing under "provisional assumptions" (non-canon until confirmed). Mid-flight setting changes can run impact tiers (green/yellow/red) with user confirmation (see design doc).

Packets & token budget

Each worker receives a distinct, compact, task-specific packet compiled under budget; formatting knobs (chapter length, volume scale, style baselines, etc.) come from ProjectTemplate, not hard-coded defaults in compiler logic. This section matches the P0–P4 ladder and L0→L1 merge strategy in Two-layer RAG plan.

Fill order (shared across workers)

P0 — Hard constraints (blueprint/scene, goals, output contract)
P1 — Current state (character sheets & state, relationships, active threads)
P2 — Near context (last 3 chapter summaries, current volume summary)
P3 — Distant reference (historical volume summaries, full world rules, etc.)
P4 — Supplementary (development chains, historical QA, etc.)

Planner / Writer / QA / Summarizer each have agreed fields (e.g. Writer binds blueprint, per-scene character state, style_profile); output contracts live in docs/api-spec.md.

When asking follow-ups, keep the question set minimal: at most ~three high-signal questions, preferring option chips/cards over free text.

QA & release policy

QA is a narrative release gate, not cosmetic polish: stateless worker, inputs are compiled chapter + relevant canon slices, output is a structured report.

Check dimensions

  • Continuity (violations of confirmed canon, character state, chronology)
  • Motivation integrity (major actions vs known pressures and motives)
  • Pacing (scene balance)
  • Style compliance (POV, tense, density vs style profile)
  • Setup/payoff (missing important callbacks, messy new hooks)

Evidence rules

Medium/high severity issues must cite evidence: source_type, source_id, excerpts, etc.
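A sketch of that evidence rule as a type and a guard. The field names mirror this page's wording (source_type, source_id, excerpts), but the real contract lives in docs/api-spec.md; this is not the repo's actual Zod schema:

```typescript
// Evidence-rule sketch: medium/high-severity issues must cite sources.

type Severity = "low" | "medium" | "high";

interface Evidence {
  source_type: string; // e.g. "canon_rule", "character_sheet" (assumed values)
  source_id: string;
  excerpt: string;
}

interface QaIssue {
  severity: Severity;
  dimension: "continuity" | "motivation" | "pacing" | "style" | "setup_payoff";
  message: string;
  evidence: Evidence[];
}

// Low-severity notes may stand alone; anything medium/high without a
// citation is rejected before the report reaches the user.
function validateIssue(issue: QaIssue): boolean {
  if (issue.severity === "low") return true;
  return issue.evidence.length > 0;
}

const ok = validateIssue({
  severity: "high",
  dimension: "continuity",
  message: "Li Wei uses his injured arm.",
  evidence: [{ source_type: "character_sheet", source_id: "li-wei", excerpt: "left arm injured (ch. 12)" }],
});
const bad = validateIssue({ severity: "medium", dimension: "pacing", message: "Scene drags.", evidence: [] });
```

Forcing citations keeps QA anchored to confirmed canon rather than to the model's impression of the story.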

Release paths

  • pass / pass_with_notes → show user → can canonize
  • revise → show issues + suggested fixes; user taps “revise” as a new user action (resets task counters) before rewrite or QA rerun
  • block → halt until user resolves or replans

Safety & hard rules

  • Workers must not read full raw session history; work from packets.
  • Workers must not dispatch to each other; only the Orchestrator dispatches.
  • Only explicit user confirmation promotes content into canon.
  • Writer dispatch requires a prior blueprint approval.
  • Per user action: MAX_TASKS_PER_USER_ACTION = 5, MAX_TOKENS_PER_TASK = 50,000, MAX_TOTAL_TOKENS_PER_ACTION = 150,000; a typical chapter Plan+Write+QA is three tasks; revisions reset counters via a new action to prevent unbounded loops.
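The per-action caps above can be sketched as a small budget guard the Orchestrator consults before each dispatch. The constants come from this page; the class shape and method names are illustrative, not the repo's implementation:

```typescript
// Safety-cap sketch: the three limits stated above, enforced per user action.

const MAX_TASKS_PER_USER_ACTION = 5;
const MAX_TOKENS_PER_TASK = 50_000;
const MAX_TOTAL_TOKENS_PER_ACTION = 150_000;

class ActionBudget {
  private tasks = 0;
  private totalTokens = 0;

  // Consulted before dispatching a worker; all three caps must hold.
  canDispatch(estimatedTokens: number): boolean {
    return (
      this.tasks < MAX_TASKS_PER_USER_ACTION &&
      estimatedTokens <= MAX_TOKENS_PER_TASK &&
      this.totalTokens + estimatedTokens <= MAX_TOTAL_TOKENS_PER_ACTION
    );
  }

  record(tokensUsed: number): void {
    this.tasks += 1;
    this.totalTokens += tokensUsed;
  }
}

// A typical chapter: Plan + Write + QA = three tasks under one user action.
const budget = new ActionBudget();
for (const tokens of [8_000, 40_000, 12_000]) {
  if (!budget.canDispatch(tokens)) throw new Error("cap hit unexpectedly");
  budget.record(tokens);
}
// An oversized follow-up task is refused on the per-task cap.
const refused = !budget.canDispatch(60_000);
```

Because revisions arrive as a fresh user action, a new ActionBudget is constructed and counters reset, which is what prevents unbounded rewrite loops.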

Tech stack

Layer Choice
Runtime Node.js + TypeScript
LLM Vercel AI SDK (OpenAI + DeepSeek)
Database PostgreSQL + Drizzle ORM
API Hono
Frontend Next.js 15 + Tailwind CSS
Monorepo pnpm workspace
Containers Docker Compose

Repository layout (summary)

apps/
  api/     # Hono, port 3001
  web/     # Next.js, port 3002
packages/
  core/           # shared types
  orchestrator/   # workflow engine, packet compile, guardrails, canon projection
  prompts/        # prompt templates + Zod output schemas
  llm-adapter/    # Vercel AI SDK wrapper, multi-provider, token accounting
docs/             # design specs, plans, architecture

Tests & measured cost (excerpt)

  • ~98 unit tests across six packages.
  • Five-chapter end-to-end validation passed.
  • DeepSeek Writer: ~2842–4471 characters per chapter (target band ~2000–3000 Chinese characters).
  • Cost reference: ~$0.032 / 5 chapters (varies with model pricing).

Quick start

git clone https://github.com/songzhiyuan98/Novel-Studio.git
cd Novel-Studio
cp .env.example .env
# Fill OPENAI_API_KEY, DEEPSEEK_API_KEY, etc.

docker compose up -d
# PostgreSQL :5432 · API :3001 · Web :3002

# First run: install deps, migrate schema, seed data
pnpm install
cd apps/api
DATABASE_URL=postgresql://novel_studio:novel_studio_dev@localhost:5432/novel_studio npx drizzle-kit push
DATABASE_URL=postgresql://novel_studio:novel_studio_dev@localhost:5432/novel_studio npx tsx src/db/seed.ts

# Open in browser
open http://localhost:3002

Run API/Web locally (DB still via Docker)

docker compose up postgres -d
cd apps/api && pnpm dev    # :3001
cd apps/web && pnpm dev    # :3002

Documentation index (in repo)

Path Contents
docs/superpowers/specs/2026-03-25-design-v2.md Full design v2 (23 decisions)
docs/mvp-todolist.md Implementation progress
docs/data-models.md DB schema (~17 tables)
docs/agents.md Agent responsibilities
docs/workflow.md Chapter production workflow
docs/api-spec.md API & internal contracts

Disclaimer

This page excerpts a personal project description to showcase architecture and roadmap; the repository Markdown and code are authoritative. Generated content is not professional advice; third-party LLM use must follow their terms and data policies.