NoRag.
RAG without vectors.

Ask your docs anything. Get an answer you can trace — full sections, exact citations. No vector DB, no embedding API, no recurring cost.

Why not RAG.

Vectors are opaque, chunks are arbitrary, and you never stop paying for ingestion. NoRag swaps the whole stack for Markdown the LLM can read directly.

| | RAG | NoRag |
| --- | --- | --- |
| Infrastructure | Vector DB + embedding model | Plain Markdown files |
| Cost of adding a doc | Recurring (re-embed + storage) | One-shot archivist pass |
| Context given to LLM | Arbitrary chunks | Complete sections |
| Auditability | Opaque vectors | Git-diffable Markdown |
| Citations | Approximate | Precise [doc_id, section] |
| Who fixes the index? | Data scientist | Any dev who reads MD |

L1 — two calls, done.

Call 1: a small model reads the question, the document catalog, and the agent catalog. It picks an agent and the relevant sections. Call 2: the chosen agent reads those sections and answers with citations.

Question → Router (SLM) → agent + docs → Answer (LLM) with [doc_id, section] citations

Call 1 returns a routing decision like this:
{
  "agent_id": "juriste_conformite",
  "documents": [
    { "doc_id": "contrat_acme", "sections": ["art_7", "annexe_A"] }
  ],
  "reasoning": "Contract retention question → juriste + SLA clauses"
}
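
Tying the two calls together in code: the sketch below is a minimal illustration, assuming an OpenAI-compatible client. Model names, prompt wording, and the `load_sections` / `system_prompt_for` helpers are hypothetical, not NoRag's actual implementation.

```python
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def l1_answer(question: str) -> str:
    index = Path("data/index.md").read_text()
    agents = Path("data/index_system_prompt.md").read_text()

    # Call 1 (SLM router): pick one agent and the relevant sections.
    route = json.loads(client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any small model that emits JSON works
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": f"Pick one agent and the relevant document sections.\n\n{agents}\n\n{index}"},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content)

    # Load only the sections the router selected; load_sections() is hypothetical.
    context = load_sections(route["documents"])

    # Call 2 (LLM agent): answer from those sections, citing [doc_id, section].
    return client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[
            {"role": "system", "content": system_prompt_for(route["agent_id"])},  # hypothetical
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    ).choices[0].message.content
```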

Multi_L — parallel, then synthesized.

A Planner fans out N L1 layers — different agents, sub-questions, or corpora. The Aggregator names contradictions and writes the synthesis.

Planner (SLM) emits N layer plans:
- Layer 1: agent juriste_conformite → L1 → answer + citations
- Layer 2: agent analyste_technique → L1 → answer + citations
- Layer 3: agent analyste_finance → L1 → answer + citations
Aggregator (LLM): synthesis · all citations preserved · contradictions named
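
The fan-out itself is plain concurrency. A minimal sketch, reusing the hypothetical `l1_answer` from above; the plan shape and the `aggregate` call are assumptions, not the real Planner/Aggregator interface.

```python
import asyncio

async def run_layer(plan: dict) -> str:
    # Each layer is an independent L1 run with its own agent / sub-question / scope.
    return await asyncio.to_thread(l1_answer, plan["question"])

async def multi_l(question: str, plans: list[dict]) -> str:
    # Fan out N layers in parallel, then hand everything to the Aggregator.
    answers = await asyncio.gather(*(run_layer(p) for p in plans))
    return aggregate(question, answers)  # aggregate(): hypothetical Aggregator LLM call
```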

Four presets. Same engine.

Configure Multi_L for your use case by picking a preset — or let the Planner decide automatically.

A. Multi-Agent

Same question, different agents. Crossed perspectives in one response.

Layer 1: juriste_conformite
Layer 2: analyste_technique
Layer 3: analyste_finance
B. Decomposition

Split the question into sub-questions routed independently.

L1: "AWS cloud strategy 2020-2024"
L2: "Azure cloud strategy 2020-2024"
C. Multi-Corpus

Same question, different agents, different document scopes.

L1: juriste, scope=contrats
L2: analyste_technique, scope=doc_technique
D. Hybrid / Auto

Planner freely combines agents, sub-questions, and index scopes.

Let the Planner decide.
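
For concreteness, a Decomposition (preset B) plan emitted by the Planner might look like the JSON below; the field names mirror the router output above but are illustrative, not the actual schema.

```json
{
  "preset": "decomposition",
  "layers": [
    { "agent_id": "analyste_technique", "question": "AWS cloud strategy 2020-2024" },
    { "agent_id": "analyste_technique", "question": "Azure cloud strategy 2020-2024" }
  ]
}
```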

Under the hood.

Two Markdown files. That’s the entire “database”. Git-diffable, human-readable, zero infra.

data/index.md
## contrat_saas_acme
- **Title**: SaaS Contract — Acme Technologies
- **Summary**: B2B SaaS agreement covering SLA, data retention, and security.
- **Sections**:
  - `art_7` — Data retention — keywords: retention, GDPR, purge, 90 days
  - `annexe_A` — SLA and availability — keywords: SLA, uptime, 99.9%, credit
data/index_system_prompt.md
## juriste_conformite
**Description**: B2B legal expert (contracts, GDPR, SLA).
**When to use**: clauses, retention, DPA, SLA.
**System prompt**:
> You are a senior legal counsel. You always cite [doc_id, section].
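
The "one-shot archivist pass" from the comparison table is just one more LLM call that appends an entry to data/index.md. A sketch under the same assumptions as above (OpenAI-compatible client, illustrative prompt and model):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def archive(doc_id: str, doc_text: str) -> None:
    # One LLM call summarizes the new document as a catalog entry.
    entry = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Write a data/index.md entry for this document: a '## <doc_id>' "
                "header, a title, a summary, and one line per section with its "
                "section id and keywords."
            )},
            {"role": "user", "content": f"doc_id: {doc_id}\n\n{doc_text}"},
        ],
    ).choices[0].message.content
    # Appending to the Markdown index is the entire ingestion step;
    # nothing is re-embedded when other docs change.
    with Path("data/index.md").open("a", encoding="utf-8") as f:
        f.write("\n" + entry + "\n")
```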

Get started.

API

Full L1 + Multi_L via FastAPI. Any client, any language.

uvicorn api.main:app --reload
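
Once the server is up, any HTTP client works. The route and payload below are assumptions for illustration; check the interactive docs FastAPI serves at /docs for the real schema.

```python
import requests

# Hypothetical endpoint and payload; the actual routes live in api/main.py.
resp = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What is the data retention period in the Acme contract?"},
)
print(resp.json())  # expected: an answer plus [doc_id, section] citations
```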
Web chat

Copy a plugin prompt into ChatGPT, Claude, Gemini, or Grok. L1 only.

norag/plugins/<provider>.md
Claude Code skill

Use /norag directly in your terminal. L1 + Multi_L, reads local files.

/norag <question>