Zhiyuan Song
← Home

Study Assistant

Turn course packs and study materials into a searchable, generative, testable learning flow: multi-format ingestion and chunking, pgvector semantic search with cross-document links, and OpenAI-backed translation, summaries, glossaries, and flashcard expansion. A practice exam mode (multi-type questions, scoring, progress tracking) is planned, alongside academic integrity features: source traceability, auditing, and permissions.

Implementation details, environment variables, and APIs follow the GitHub README and docs/. This page is a curated site summary covering architecture, data flow, the data model, and the phased roadmap, aligned with docs/ARCHITECTURE.md and docs/ROADMAP.md; the repository is authoritative where they conflict.

Core features

Smart document processing

  • PDF, PPT, TXT, and more
  • Automatic chunking and extraction
  • OCR for scanned pages
  • Tables and equations (improving with each iteration)

AI content generation

  • Contextual translation from selected spans
  • Auto study summaries
  • Glossary and flashcards
  • Context-aware knowledge expansion

Vector semantic search

  • PostgreSQL pgvector semantic retrieval
  • Cross-document linking and personalized recommendations

Practice exams (planned / in development)

  • Question generation from study materials (MCQ, fill-in, short answer, etc.)
  • Auto grading, feedback, and progress analytics

Academic integrity & compliance

  • Source traceability
  • Integrity checks and audit logs
  • Permissions and access control

Quick start

Requirements: Node.js 18+; PostgreSQL 15+ with pgvector; Redis 6+; object storage (e.g. MinIO, S3-compatible); optional PM2 for process management.

1. Install

git clone https://github.com/songzhiyuan98/study-assistant.git
cd study-assistant
pnpm install   # or npm install; follow the package manager pinned in the README

2. Configure environment

cp packages/db/.env.example packages/db/.env
# Edit packages/db/.env

Common variables: DATABASE_URL (PostgreSQL), OPENAI_API_KEY, NEXTAUTH_SECRET, etc.; full list in the repo examples.

3. Database

npm run docker:up      # PostgreSQL, Redis, MinIO, etc. (see package.json scripts)
npm run db:migrate
npm run db:seed

4. Dev servers

npm run dev

Open http://localhost:3000 (port per local config).

User workflow (product)

  1. Register (email or Google OAuth)
  2. Create folders to organize documents by course or topic
  3. Upload PDF / PPT / TXT
  4. Select passages to study inside a document
  5. Generate translations, summaries, flashcards, and other AI artifacts
  6. (Roadmap) Build practice exams from materials and track progress

Feature progress

Where this snapshot conflicts with ROADMAP.md, the repository is authoritative.

Snapshot (2025-09-08)

Done:

  • Database foundation: PostgreSQL + pgvector + Prisma
  • Auth: NextAuth.js (credentials / OAuth) with middleware protection
  • Folder organization: per-user document taxonomy
  • Uploads: MinIO storage + API integration (including bugfix loops)

In progress: PDF / TXT parsing pipeline (content extraction and chunking).

Completion (planning view): phase 1 ~67% (4/6 major tasks); overall ~22% (4/18 major tasks).

Capability matrix (short)

Feature             | Status         | Notes
Folder management   | Done           | Multi-level document organization
User authentication | Done           | Email / Google OAuth
PDF processing      | In development | Text extraction and chunking
AI translation      | Planned        | Multilingual
Exam system         | Planned        | Auto generation and grading

Tech architecture

Frontend

  • Next.js 14 + React 18
  • TypeScript
  • Tailwind CSS
  • Zustand + React Query
  • NextAuth.js

Backend & data

  • API: Fastify
  • Database: PostgreSQL + pgvector (IVFFLAT or other index strategies)
  • ORM: Prisma
  • Queues: BullMQ + Redis
  • Object storage: MinIO / S3-compatible

AI

  • Models: OpenAI GPT-3.5 / 4 family
  • Embeddings: text-embedding-ada-002 (or later per config)
  • Retrieval: pgvector

System architecture

Overview

Study Assistant uses a modern modular monorepo: web and API are decoupled, heavy jobs run on queues and workers, files land in object storage, structured and vector data live in PostgreSQL, and long-running inference and embedding work calls external AI services. The shape resembles a set of collaborating microservices; deployments can scale horizontally by process or container.

User → Web app → API (Fastify) → PostgreSQL (business + vectors)
                    │
                    ├→ Redis queue (BullMQ) → background workers (ingest / generate / export)
                    │
                    ├→ MinIO / S3 (raw files and exports)
                    │
                    └→ OpenAI, etc. (embeddings + generation)

Data flow

1. Upload & processing

  1. User uploads → web client
  2. Client calls API → object storage (MinIO / S3)
  3. API writes business rows → enqueue ingestion job
  4. Worker: parse / chunk → embeddings → persist to DB
  5. Update lecture / segment processing state for polling or push UI
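The parse-and-chunk step (3–4) can be sketched as a plain function. `chunkText`, the segment shape, and the size/overlap defaults are illustrative assumptions; the repository's actual pipeline lives in its workers.

```typescript
// Illustrative chunker: split extracted text into overlapping segments.
// Sizes and field names are hypothetical, not the repo's actual API.
interface SegmentDraft {
  index: number; // ordinal within the lecture
  body: string;  // chunk text to be embedded
  start: number; // character anchor into the source text
}

function chunkText(text: string, size = 800, overlap = 100): SegmentDraft[] {
  const segments: SegmentDraft[] = [];
  const step = size - overlap;
  for (let start = 0, index = 0; start < text.length; start += step, index++) {
    segments.push({ index, body: text.slice(start, start + size), start });
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return segments;
}
```

Keeping a character `start` anchor per segment is what later lets generated artifacts point back at exact source spans.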

2. Generation & learning

  1. User selects text in reader → create Selection record
  2. Generation request → enqueue AI job (context + source refs)
  3. Worker calls OpenAI → translation / summary / flashcards, etc.
  4. Persist results (payloadJson, sourceRefs) → notify or poll UI
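The job enqueued in step 2 might carry a payload like the following; the `payloadJson` / `sourceRefs` names follow the data-model summary on this page, and everything else is a hedged sketch.

```typescript
// Hypothetical shape of a generation job (step 2): context for the prompt
// plus source segment ids kept for traceability. Not the repo's actual type.
type ItemType = "translation" | "summary" | "flashcards";

interface GenerateJob {
  selectionId: string;
  type: ItemType;
  context: string;      // concatenated segment bodies for the prompt
  sourceRefs: string[]; // segment ids backing the generated artifact
}

function buildGenerateJob(
  selectionId: string,
  type: ItemType,
  segments: { id: string; body: string }[]
): GenerateJob {
  return {
    selectionId,
    type,
    context: segments.map((s) => s.body).join("\n\n"),
    sourceRefs: segments.map((s) => s.id),
  };
}
```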

Data model

Core relationships (conceptual)

USER
├── FOLDER
│   └── LECTURE (document)
│       └── SEGMENT (chunk + embedding vector)
├── SELECTION (user highlight of segments)
│   └── ITEM (generated artifact: translation, summary, etc.)
└── EXAM (roadmap)
    └── EXAM_ATTEMPT (responses)

Key fields (summary)

Entity    | Highlights
User      | id, email, name, optional password, role (RBAC)
Folder    | id, name, description, userId
Lecture   | id, folderId, title, fileUrl, type, processing status
Segment   | id, lectureId, body, embedding vector, page / anchor
Selection | id, userId, lectureId, segmentIds[]
Item      | id, selectionId, type, payloadJson, sourceRefs
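The key fields can be mirrored as TypeScript interfaces. This is a conceptual sketch of the summary above, not the actual Prisma schema in packages/db.

```typescript
// Conceptual mirror of the data-model summary; the real schema is defined
// with Prisma in packages/db and may differ in names and detail.
interface Lecture {
  id: string;
  folderId: string;
  title: string;
  fileUrl: string;
  type: "pdf" | "ppt" | "txt";
  status: "uploaded" | "processing" | "ready" | "failed"; // assumed states
}

interface Segment {
  id: string;
  lectureId: string;
  body: string;
  embedding: number[]; // pgvector column, e.g. 1536 dims for ada-002
  page?: number;       // optional page / anchor info
}

interface Item {
  id: string;
  selectionId: string;
  type: "translation" | "summary" | "flashcards";
  payloadJson: unknown;
  sourceRefs: string[]; // segment ids backing the artifact
}
```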

Component boundaries

Frontend (Next.js App Router)

app/web/src/
├── app/
│   ├── (auth)/           # sign-in / sign-up
│   ├── dashboard/
│   ├── upload/
│   ├── library/
│   └── api/              # Next.js route handlers (if any)
├── components/
│   ├── ui/
│   ├── forms/
│   └── layout/
├── lib/
└── providers/

Backend & workers

app/
├── api/                  # Fastify: routes, plugins, middleware
├── workers/              # BullMQ: ingest, generate, export, etc.
└── sidecar/              # optional: Python OCR or other standalone services

Vector retrieval pipeline

Semantic search flow

  1. Query: user text → embedding (e.g. OpenAI ada-002 family)
  2. Retrieve: pgvector + IVFFlat (or other index) for approximate nearest neighbors
  3. Rank: cosine similarity → top-K segments
  4. Post-process: dedupe, filter, permission checks → return spans

Parameters & policy (target)

  • Index: IVFFlat; distance: cosine
  • Dimension: 1536 (ada-002; change with model)
  • Similarity threshold: e.g. ≥ 0.8 (tunable)
  • Default top: ~10 related segments
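The target policy above (cosine distance, threshold ≥ 0.8, top ~10) can be sketched in plain code. In production the ranking runs inside PostgreSQL via pgvector; this in-memory version only illustrates the math.

```typescript
// Minimal ranking sketch matching the target policy: cosine similarity,
// threshold 0.8, top 10. Real retrieval happens in PostgreSQL via pgvector.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function topK(
  query: number[],
  segments: { id: string; embedding: number[] }[],
  threshold = 0.8,
  k = 10
): { id: string; score: number }[] {
  return segments
    .map((s) => ({ id: s.id, score: cosine(query, s.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```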

RBAC & audit

RBAC roles (target)

  • ADMIN: system and user administration, audit logs
  • INSTRUCTOR: course content management, student progress
  • STUDENT: access to personal documents and generated artifacts

Audit & compliance

  • Log key CRUD and AI calls (searchable, exportable)
  • Retention: e.g. 6 months active + 2 years archive (per final implementation)
  • Academic integrity reports, personal data access logs (privacy)

Async queues (BullMQ)

Task types (example names)

  • document:ingest — document ingestion & parsing
  • content:generate — AI artifact generation
  • export:data — exports
  • cleanup:files — cleanup
  • vector:index — vector index maintenance

Priority & concurrency

  • High: user-triggered generation
  • Medium: file ingestion and chunking
  • Low: cleanup and maintenance
  • Independent concurrency caps per job type to avoid starvation
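One way to express the priority and concurrency policy is a per-job-type table. The numbers here are hypothetical; in BullMQ they would map to job priority (lower number = higher priority) and per-queue worker concurrency, but the exact wiring is the repository's call.

```typescript
// Hypothetical policy table matching the tiers above; values are
// illustrative, not the repo's actual configuration.
const queuePolicy: Record<string, { priority: number; concurrency: number }> = {
  "content:generate": { priority: 1, concurrency: 4 }, // user-triggered, high
  "document:ingest":  { priority: 2, concurrency: 2 }, // ingestion, medium
  "vector:index":     { priority: 2, concurrency: 1 },
  "export:data":      { priority: 3, concurrency: 1 }, // maintenance, low
  "cleanup:files":    { priority: 3, concurrency: 1 },
};
```

Independent concurrency caps per type mean a burst of ingestion jobs cannot starve user-triggered generation.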

Observability, SLOs, and API

Target KPIs (design)

  • API latency: P95 < 200 ms for light endpoints; first paint < 3 s
  • Document ingestion: target < 30 s per document (size-dependent)
  • AI: GPT-class interactions under ~5 s; embeddings under ~2 s
  • Concurrency: 100+ simultaneous users after horizontal scale
  • Availability target: 99.9% SLA (production phase)

Monitoring stack (planned)

  • Tracing / metrics: OpenTelemetry
  • Time series: Prometheus; dashboards: Grafana
  • Logs: ELK or managed equivalent
  • Alerts: PagerDuty or existing on-call tooling

API principles

  • RESTful; versioned paths like /api/v1/
  • Uniform JSON responses and error codes; auth: JWT Bearer
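A uniform JSON envelope could look like the sketch below; the `ok`/`error` shapes and code names are illustrative assumptions, not the API's actual contract.

```typescript
// Sketch of a uniform JSON response envelope; field and code names are
// hypothetical examples of the principle, not the real API contract.
type ApiResponse<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

function ok<T>(data: T): ApiResponse<T> {
  return { ok: true, data };
}

function fail(code: string, message: string): ApiResponse<never> {
  return { ok: false, error: { code, message } };
}
```

A discriminated union like this lets clients narrow on `ok` once and handle success and error payloads type-safely.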

Key endpoints (summary)

POST /api/auth/register
POST /api/auth/login
GET  /api/auth/session

POST /api/lectures
GET  /api/lectures/:id
GET  /api/folders

POST /api/selections
POST /api/items/generate
GET  /api/items/:id

Repository layout (summary)

study-assistant/
├── app/
│   ├── web/          # Next.js frontend
│   ├── api/          # Fastify API
│   └── workers/      # background workers
├── packages/
│   ├── db/           # Prisma + migrations
│   ├── shared/
│   └── ui/
├── docs/             # ROADMAP, ARCHITECTURE, OPERATIONS, etc.
└── CLAUDE.md         # AI collaboration notes (if kept)

Roadmap

Weeks indicate planning cadence; checkboxes follow the latest commits and ROADMAP.md.

Phase 1: foundation (~weeks 1–2)

Goal: core document pipeline + basic generation loop.

Week 1 (infrastructure)

  • Monorepo setup (done)
  • PostgreSQL + pgvector (done, 2025-09-08)
  • NextAuth baseline (done, 2025-09-08)
  • File upload + MinIO / S3 (done, 2025-09-08)
  • Current focus: PDF / TXT parsing (Node.js)
  • Chunking and anchor model

Week 2 (selection & generation)

  • PDF.js reader, text selection, segment management
  • BullMQ queue wiring
  • Baseline AI translation and summaries
  • Embeddings + similarity retrieval, review UI for outputs
  • Folder organization (done, 2025-09-08)

Phase 2: feature depth (~weeks 3–4)

Goal: advanced document capabilities + exam MVP.

Week 3: PPTX, OCR (Python sidecar), two-column paper layout, tables / equations, vector sidebar, exports (Markdown, CSV, Anki, etc.).

Week 4: exam blueprints, evidence-based item generation, player with timers, auto grading + rubrics, performance analysis and wrong-answer coaching.

Phase 3: production readiness (~weeks 5–6)

Goal: security, performance, deployment.

Week 5: academic integrity policy, end-to-end auditing, RBAC, cost/token budgets, encryption/privacy, compliance reporting.

Week 6: OpenTelemetry, caching/perf, Docker/K8s manifests, CI/CD, error tracking/alerts, production deployment templates.

Next actions (excerpt)

  • Implement PDF text extraction (PDF.js or pdf-parse, etc.)
  • PPTX parsing approach (dedicated library / server conversion)
  • TXT encoding detection and reads
  • Chunking strategy and anchor model design
  • Preprocessing orchestration aligned with queue job taxonomy
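For the TXT encoding item above, a naive first step is stripping a UTF-8 BOM before reading; full detection (chardet-style heuristics) remains an open design point, and this helper is only a sketch.

```typescript
// Naive sketch for TXT reads: strip a UTF-8 BOM if present, then decode.
// Real encoding detection is an open design point, not implemented here.
function stripUtf8Bom(buf: Buffer): string {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return buf.subarray(3).toString("utf8");
  }
  return buf.toString("utf8");
}
```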

Update this curated page after each major delivery to reflect progress and schedule shifts.

Common commands

npm run dev          # local dev
npm run build        # production build
npm run test         # tests

npm run db:migrate   # migrations
npm run db:seed      # seed data
npm run db:studio    # Prisma Studio

npm run lint
npm run type-check
npm run format

Script names follow root package.json.

Contributing & license

Fork → feature branch → commits → push → pull request; conventions in CONTRIBUTING.md.

Docs: docs/ROADMAP.md, docs/ARCHITECTURE.md, docs/OPERATIONS.md, API notes, CHANGELOG, CLAUDE guide, etc.

License: MIT.

Thanks to Next.js, Prisma, pgvector, OpenAI, and other OSS/providers.