Weeks are a planning cadence; checkbox status follows the latest commits and ROADMAP.
Phase 1: foundation (~weeks 1–2)
Goal: core document pipeline + basic generation loop.
Week 1 (infrastructure)
- Monorepo setup (done)
- PostgreSQL + pgvector (done, 2025-09-08)
- NextAuth baseline (done, 2025-09-08)
- File upload + MinIO / S3 (done, 2025-09-08)
- Current focus: PDF / TXT parsing (Node.js)
- Chunking and anchor model
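
The chunking and anchor model is not yet designed, so the following is only a minimal sketch of one possible starting point: fixed-size character windows with overlap, where each chunk carries an anchor back to its page and character offsets. The `Anchor` and `Chunk` shapes, the `chunkPage` name, and the default sizes are all hypothetical and not taken from the codebase.

```typescript
// Hypothetical shapes; the roadmap does not define the anchor model yet.
interface Anchor {
  page: number;  // 1-based page number in the source document
  start: number; // inclusive offset into the page text (UTF-16 code units)
  end: number;   // exclusive offset into the page text
}

interface Chunk {
  text: string;
  anchor: Anchor;
}

// Split one page of extracted text into overlapping fixed-size windows.
// maxChars and overlap are illustrative defaults, not tuned values.
function chunkPage(
  pageText: string,
  page: number,
  maxChars = 800,
  overlap = 100,
): Chunk[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: Chunk[] = [];
  let start = 0;
  while (start < pageText.length) {
    const end = Math.min(start + maxChars, pageText.length);
    chunks.push({ text: pageText.slice(start, end), anchor: { page, start, end } });
    if (end === pageText.length) break; // last window reached the end of the page
    start = end - overlap; // step back so neighbouring chunks share context
  }
  return chunks;
}
```

Because every chunk keeps its `page`, `start`, and `end`, a generated summary or translation could in principle be traced back to the exact span it came from. A production version would likely split on sentence or paragraph boundaries rather than raw character counts.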
Week 2 (selection & generation)
- PDF.js reader, text selection, segment management
- BullMQ queue wiring
- Baseline AI translation and summaries
- Embeddings + similarity retrieval, review UI for outputs
- Folder organization (done, 2025-09-08)
Phase 2: feature depth (~weeks 3–4)
Goal: advanced document capabilities + exam MVP.
Week 3: PPTX, OCR (Python sidecar), two-column paper layout, tables / equations, vector sidebar, exports (Markdown, CSV, Anki, etc.).
Week 4: exam blueprints, evidence-based item generation, player with timers, auto-grading with rubrics, performance analysis and wrong-answer coaching.
Phase 3: production readiness (~weeks 5–6)
Goal: security, performance, deployment.
Week 5: academic integrity policy, end-to-end auditing, RBAC, cost/token budgets, encryption/privacy, compliance reporting.
Week 6: OpenTelemetry, caching and performance tuning, Docker/K8s manifests, CI/CD, error tracking and alerts, production deployment templates.
Next actions (excerpt)
- Implement PDF text extraction (e.g., PDF.js or pdf-parse)
- Decide on a PPTX parsing approach (dedicated library vs. server-side conversion)
- TXT encoding detection and decoding on read
- Chunking strategy and anchor model design
- Preprocessing orchestration aligned with queue job taxonomy
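
For the TXT encoding item above, one simple approach (a sketch, not a decided design) is to check for a byte-order mark first, then try strict UTF-8, and only then fall back to a legacy single-byte encoding. The function names `detectEncoding` and `decodeText` and the `windows-1252` fallback are assumptions made for illustration. The sketch uses the standard `TextDecoder`, which is a global in Node.js; decoding labels other than `utf-8` and `utf-16le` depends on the ICU data the Node.js build ships with.

```typescript
// Hypothetical helper: guess an encoding label from raw file bytes.
function detectEncoding(bytes: Uint8Array): string {
  // 1. Byte-order marks are the most reliable signal when present.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "utf-16le";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "utf-16be";
  }
  // 2. No BOM: accept UTF-8 only if the whole buffer decodes without errors.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch {
    // 3. Assumed fallback; real files may need a statistical detector instead.
    return "windows-1252";
  }
}

// Decode using the detected label. TextDecoder strips a leading BOM by default.
function decodeText(bytes: Uint8Array): string {
  return new TextDecoder(detectEncoding(bytes)).decode(bytes);
}
```

This heuristic cannot tell legacy encodings apart (for example Shift_JIS versus windows-1252), so a dedicated detection library may still be needed if uploads include such files.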
Update this curated page after each major delivery to reflect progress and schedule shifts.