codilay·v0.2.0·open-source·python 3.11+·600+ tests ✓
// your codebase,

finally understood.

CodiLay is an AI agent that reads your entire codebase and produces a living, structured document — so any human or AI can understand what's happening, where, and why things connect.

INSTALL FROM PYPI
$ pip install codilay
$ pip install "codilay[all]"
$ pipx install codilay
##problem

you've inherited a codebase. now what?

// Every dev has been here. Thousands of files, no docs worth reading. You grep, open random files, build a mental map that evaporates by morning.

your-brain.exe
bash
// Day 1: Trying to understand the payment flow
$ grep -r "payment" src/
src/routes/orders.js:14      → paymentService.charge()
src/services/payment.js:1    → imports ???
src/services/payment.js:8    → calls stripeClient
src/middleware/billing.js:22 → validates payment
src/models/Transaction.js:3  → used somewhere?
src/events/order.js:45       → emits 'payment:completed'
src/listeners/notify.js:12   → listens to... what?
// 7 files. No idea how they connect.
// Which one calls which?
// What's the actual data flow?
// What did I look at 20 minutes ago?
// Opens another tab. And another. And another...
ERR

mental model evaporates

You build understanding file by file — by file 15, you've forgotten what file 3 did. Human working memory doesn't scale to codebases.

ERR

connections are invisible

File A imports B which calls C which emits an event that D listens to. These relationships exist in code but nowhere else.

WARN

AI can't help either

Paste a file into ChatGPT? It has no idea what the rest of the codebase looks like. Paste the whole repo? Context window can't hold it.

WARN

docs are always stale

Hand-written documentation decays the moment it's committed. Nobody updates it. Within weeks it's misleading.

STAT · dev time
a large share
spent understanding existing code, not writing new
STAT · onboarding
up to 6 mo
for a new dev to be productive on a large codebase
STAT · files deep
15+
before most devs lose their mental model completely
STAT · tools solve this
0
grep, IDE search, and ChatGPT are band-aids
##wire-model

the core abstraction

// CodiLay operates like a detective tracing wires through a circuit. Every unresolved reference is a wire — open until both ends are documented, then retired forever.

## step 01 — reading the first file

The agent reads routes/orders.js and encounters an import it hasn't documented yet. A reference pointing into the dark.

routes/orders.js
bash
import { PaymentService } from '../services/payment'
import { Order } from '../models/Order'

// agent has never seen payment.js or Order.js
// doesn't know what they do yet
// but knows they EXIST and are NEEDED HERE
agent-state.json
bash
processing: "routes/orders.js"
open_wires: []
closed_wires: []
queue_next: "services/payment.js"

→ reading file content...
→ found 2 unknown references
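In Python terms, that wire lifecycle might look like the sketch below. Class and method names here are hypothetical illustrations, not CodiLay's actual `wire_manager.py` internals:

```python
from dataclasses import dataclass, field

@dataclass
class Wire:
    # A wire is an unresolved reference: source file -> target it points at.
    source: str
    target: str

@dataclass
class WireManager:
    open_wires: list = field(default_factory=list)
    closed_count: int = 0

    def open(self, source: str, target: str) -> None:
        # An unknown reference was found: the wire stays open until
        # both ends are documented.
        self.open_wires.append(Wire(source, target))

    def close(self, source: str, target: str) -> None:
        # Both ends documented: retire the wire from context forever.
        for w in list(self.open_wires):
            if w.source == source and w.target == target:
                self.open_wires.remove(w)
                self.closed_count += 1

wm = WireManager()
wm.open("routes/orders.js", "services/payment.js")
wm.open("routes/orders.js", "models/Order.js")
wm.close("routes/orders.js", "services/payment.js")
print(len(wm.open_wires), wm.closed_count)  # 1 wire still open, 1 retired
```

The key property: closed wires leave memory entirely, so the working set only grows with unresolved references, not with codebase size.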
##context-crisis

why every other approach fails

// The fundamental problem isn't reading code — it's managing what you know while reading more. Every existing tool either overflows the context or loses information.

ERR

without-codilay

context-window: OVERFLOW

paste entire repo → model forgets beginning by the end

  • dump full repo into LLM → context overflow, early files forgotten
  • send files one-by-one → no cross-file understanding
  • RAG-based retrieval → retrieves similar text, not connected logic
  • manual docs → decays immediately, nobody maintains it
  • IDE search / grep → finds text, not relationships
  • mental model → evaporates after 15 files
OK

with-codilay

context-window: OPTIMAL

only open wires + relevant sections → always lean

  • reads file-by-file with directed purpose — wires guide focus
  • carries only open wires — closed connections retired from context
  • doc sections loaded by relevance, not similarity
  • living doc updates on git changes, never goes stale
  • surfaces actual connections: imports, calls, events, data flow
  • persistent across sessions — survives overnight, survives team changes

// key insight: CodiLay never carries more in memory than it needs. Closed wires are gone. Processed sections are indexed, not loaded. Context is always lean — whether 50 files or 5,000.
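A minimal sketch of that lean-context idea, assuming a naive relevance score (shared path segments — the real section selection is surely more sophisticated):

```python
def build_context(open_wires, sections, current_file, max_sections=3):
    """Assemble a lean LLM context: only open wires plus the few doc
    sections most relevant to the file being processed. Everything
    else stays indexed on disk, not loaded."""
    # Hypothetical relevance: how many path segments a section shares
    # with the current file.
    def relevance(section_key):
        return len(set(section_key.split("/")) & set(current_file.split("/")))

    relevant = sorted(sections.items(), key=lambda kv: -relevance(kv[0]))
    return {
        "file": current_file,
        "open_wires": [f"{src} -> {tgt}" for src, tgt in open_wires],
        "sections": dict(relevant[:max_sections]),
    }

ctx = build_context(
    open_wires=[("routes/orders.js", "services/payment.js")],
    sections={"services/payment": "…", "ui/button": "…"},
    current_file="services/payment.js",
    max_sections=1,
)
```

However many sections exist, the prompt only ever carries `max_sections` of them plus the open wires — the context stays the same size at 50 files or 5,000.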

##how-it-works

five phases. one command.

// Run codilay . and the agent handles everything — from scanning to triage to the final assembled document.

P1
./bootstrap

Parses .gitignore, loads config, walks the file tree, preloads existing markdown files. Establishes the full picture of what exists.

$ parse .gitignore → merge config ignores → tree walk → collect .md files → build raw file list
P2
./triage

A single cheap LLM call sees only filenames — no content. Categorizes every file: core (document fully), skim (extract metadata), skip (ignore).

$ Flutter → skips ios/, android/ | React → skips .next/, node_modules/ | Any → skips dist/, *.lock
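The triage call might be shaped roughly like this — the prompt wording and the skim-by-default fallback are illustrative assumptions, not CodiLay's actual prompt:

```python
import json

def triage_prompt(file_list):
    """Build the single cheap triage prompt: paths only, no content."""
    return (
        "Categorize each path as core, skim, or skip. Reply as JSON "
        "mapping path -> category.\n" + "\n".join(file_list)
    )

def apply_triage(reply_json, file_list):
    """Turn the model's JSON reply into per-file decisions. Any file
    the model missed defaults to skim rather than skip, so nothing is
    silently dropped."""
    decisions = json.loads(reply_json)
    return {f: decisions.get(f, "skim") for f in file_list}
```

Because only paths go over the wire, even a 10,000-file repo fits easily into one small call.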
P3
./planning

The planner sees the curated file tree and produces an ordered processing queue. Files prioritized by architectural importance — entry points first.

$ output: ordered queue + parked files (ambiguous deps) + doc skeleton + suggested sections
P4
./processing-loop

The core agent loop. Each file is read, relevant doc chunks loaded, LLM produces a structured diff, docstore is patched, wires opened or closed.

$ per file: read → load relevant chunks → LLM call → apply diff → update wires → reprioritize queue
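That per-file loop can be sketched as follows, with the function parameters standing in for CodiLay's real components:

```python
def process_queue(queue, read_file, llm_diff, apply_patch):
    """Core-loop sketch. `read_file`, `llm_diff`, and `apply_patch`
    are stand-ins; `llm_diff` returns a structured result with wires
    to open (target paths) and close ((source, target) pairs)."""
    open_wires = set()
    seen = set()
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        seen.add(path)
        diff = llm_diff(path, read_file(path), open_wires)
        apply_patch(diff)                       # patch the docstore
        for target in diff.get("opens", []):
            open_wires.add((path, target))
            if target not in seen and target not in queue:
                queue.insert(0, target)         # follow the wire next
        open_wires -= set(diff.get("closes", []))
    return open_wires
```

Note the reprioritization: opening a wire pulls its target to the front of the queue, so the agent chases connections while they are fresh instead of processing files in arbitrary order.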
P5
./finalize

Final sweep resolves pending markers, documents parked files, surfaces unresolved references, assembles CODEBASE.md with the full dependency graph.

$ output: CODEBASE.md + links.json + .codilay_state.json (commit hash stored for future re-runs)
##features

everything you need. nothing you don't.

// 30+ CLI commands. 10 integrated feature modules. All built to keep your codebase documentation alive and useful.

F01

watch-mode

Save a file, documentation updates automatically. Debounced, filtered, incremental — no full re-runs.

F02

git-aware-reruns

Detects modified, added, deleted, and renamed files via git diff. Only re-processes what changed.
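Under the hood this amounts to diffing the stored commit against HEAD and bucketing the result. A sketch of parsing `git diff --name-status` output (rename entries, `R<score>`, carry both old and new paths):

```python
def parse_name_status(diff_output):
    """Bucket `git diff --name-status` lines into change categories."""
    changes = {"modified": [], "added": [], "deleted": [], "renamed": []}
    for line in diff_output.strip().splitlines():
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("R"):
            # Rename: re-wire docs from the old path to the new one.
            changes["renamed"].append((parts[1], parts[2]))
        elif status == "M":
            changes["modified"].append(parts[1])
        elif status == "A":
            changes["added"].append(parts[1])
        elif status == "D":
            changes["deleted"].append(parts[1])
    return changes
```

Only the files in these buckets (plus wires touching them) get re-processed, which is why subsequent runs take seconds, not minutes.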

F03

chat-interface

Ask questions about your codebase in natural language. Three layers: doc-based, deep source reading, learning loop.

F04

3-layer-web-ui

Reader (instant, no LLM), Chatbot (answers from doc), Deep Agent (reads source when needed).

F05

doc-diff-view

See what shifted between documentation runs — section additions, removals, modifications, wire changes.

F06

parallel-processing

Files in the same dependency tier run concurrently. Central wire bus keeps context consistent. 3–8x faster.
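Tier-by-tier concurrency can be sketched with `concurrent.futures` — the worker count and tier representation here are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tiers(tiers, process_file, max_workers=8):
    """Process dependency tiers in order. Files inside one tier cannot
    depend on each other, so they run concurrently; tiers themselves
    run sequentially to keep the wire state consistent."""
    results = []
    for tier in tiers:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map preserves input order, so results stay deterministic.
            results.extend(pool.map(process_file, tier))
    return results
```

Sequencing between tiers is the safety valve: all wires opened in tier N are visible before tier N+1 starts.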

F07

scheduled-reruns

Cron-based or commit-triggered. Documentation stays fresh automatically — no human intervention needed.

F08

ai-context-export

Export compressed docs in Markdown, XML, or JSON — optimized for feeding into another LLM's context window.
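A stripped-down sketch of such an exporter — the format names follow the feature description, but the real output shape is CodiLay's own:

```python
import json

def export_context(sections, fmt="json"):
    """Serialize doc sections for pasting into another LLM's context.
    JSON is emitted compact (no whitespace) to save tokens."""
    if fmt == "markdown":
        return "\n\n".join(f"## {name}\n{body}" for name, body in sections.items())
    return json.dumps(sections, separators=(",", ":"))
```

The compact-separators choice is deliberate: in a context-window budget, whitespace is tokens you paid for and the model didn't need.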

F09

graph-filters

Slice the dependency graph by wire type, layer, module, direction. Surface architectural hubs.

F10

team-memory

Shared knowledge base: facts, architectural decisions, coding conventions, file-level annotations.

F11

conversation-search

TF-IDF search across all past conversations. Find that retry logic discussion from two weeks ago instantly.

F12

triage-tuning

Override wrong triage decisions. Your corrections are stored as direct overrides injected into future LLM triage prompts.

##transformation

from chaos to clarity

// One command turns an opaque codebase into a navigable, queryable, living document.

BEFORE · manual investigation
trying-to-understand-auth.sh
bash
$ grep -r "authenticate" src/
src/middleware/auth.js:4:  ...something
src/routes/users.js:18:    ...something
src/services/token.js:7:   ...something
src/utils/crypto.js:12:    ...something
// ok but which calls which?
// what's the actual flow?
// where does JWT get created?
// where is it validated?
$ # open auth.js in tab 1
$ # open token.js in tab 2
$ # open crypto.js in tab 3
$ # hold all four in your head
$ # fail
ERROR: brain stack overflow
AFTER · codilay run ./project
CODEBASE.md — auth-section
bash
## Authentication Flow

Request → middleware/auth.js
  │ extracts JWT from Authorization header
  │ calls services/token.js → verify()
  │   └── uses utils/crypto.js → publicKey
  ├── valid   → req.user populated, next()
  └── invalid → 401 + error event emitted

### Dependencies
  middleware/auth.js → imports services/token.js → imports models/User.js
  services/token.js  → imports utils/crypto.js

### Used By
  routes/users.js, routes/orders.js, routes/admin.js (all protected routes)
~45 min
manual investigation
grep → open files → read → cross-reference → forget → repeat
~3 min
codilay first run
codilay ./project → complete doc with all connections mapped
~15 sec
subsequent updates
git-aware re-run, processes only changed files + affected wires
##architecture

built in layers. each one independent.

// 30k+ lines across 30+ source files. Every layer can operate independently — CLI without web UI, agent without watcher, chat without scheduler.

CLI
CLI Layer

30+ commands via Click. Interactive TUI with Rich. Init, run, watch, export, search, schedule, and more. Entry point for everything.

files: cli.py · settings.py

AGENT
Agent Core

The 5-phase loop: scanner → triage → planner → processor → finalizer. Wire manager tracks all open/closed wires. Docstore manages sections independently.

files: scanner.py · triage.py · planner.py · processor.py · wire_manager.py · docstore.py

LLM
Intelligence Layer

Unified LLM client across Anthropic, OpenAI, and 8+ providers. All prompts return structured JSON. Large file handling with skeleton + detail passes.

files: llm_client.py · prompts.py · large_file.py

MODULES
Feature Modules

Watch mode, doc diffing, conversation search, graph filtering, AI export, team memory, triage feedback, scheduled re-runs. Each standalone with clean interfaces.

files: watcher.py · doc_differ.py · search.py · graph_filter.py · exporter.py · team_memory.py · triage_feedback.py · scheduler.py

WEB
Web & API Layer

FastAPI server with SSE streaming for chat. 3-layer UI: Reader (static render), Chatbot (doc context), Deep Agent (reads source when needed).

files: server.py · web/index.html

EXT
Integrations

Git integration for change detection and re-runs. VSCode extension as thin API client. Output portability with configurable gitignore modes.

files: git_tracker.py · vscode-extension/

30+
source files
30k+
lines of code
30+
CLI commands
600+
tests passing
##outcome

what you get

// CodiLay doesn't just generate a document. It gives your entire team — humans and AIs — a shared understanding of your codebase that stays current.

OUT-01

complete abstract view

Every module documented: what it does, where it lives, how it connects. Cross-references link everything. The dependency graph shows the full picture.

OUT-02

onboarding in minutes, not months

New dev joins? They read CODEBASE.md, ask the chatbot, and have a working mental model before writing a single line of code.

OUT-03

AI that actually understands your code

Export the compressed doc into any LLM context window. Now ChatGPT, Claude, or Copilot knows your architecture — not just the file you pasted.

OUT-04

docs that never go stale

Git-aware re-runs, watch mode, scheduled updates. The doc evolves with your code. No manual maintenance. No decay.

OUT-05

team knowledge that compounds

Shared memory, architectural decisions, conventions — all injected into every interaction. The AI learns what your team has agreed on and respects it.

OUT-06

self-improving through questions

Every question the chatbot can't answer triggers the deep agent, which patches the doc. Documentation gets smarter with every conversation.

## learning-loop
INPUT · user asks question
L1 · chatbot checks doc
L2 · can't answer? deep agent reads source
PATCH · doc patched + answer delivered
CACHE · next time: chatbot answers directly

// doc gets smarter with every question it couldn't answer. over time, chatbot handles more without escalation.
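The whole loop fits in a few lines of pseudologic — a sketch only, since the real doc retrieval and deep-agent escalation are far richer than a dict lookup:

```python
def answer(question, doc, deep_agent):
    """Learning-loop sketch: try the doc first; on a miss, escalate to
    the deep agent, then patch the doc so next time the doc suffices."""
    if question in doc:            # stand-in for doc-based retrieval
        return doc[question], "chatbot"
    result = deep_agent(question)  # expensive: reads actual source
    doc[question] = result         # patch: cache the new knowledge
    return result, "deep-agent"
```

The invariant: each question is expensive at most once. After the patch, the cheap path handles it forever.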

ready to deploy · MIT license · python 3.11+

understand any codebase.

set up in minutes.

// stop guessing. stop grepping. stop building mental models that vanish overnight. let CodiLay trace every wire for you.

INSTALL FROM PYPI (RECOMMENDED)
basic        $ pip install codilay
all features $ pip install "codilay[all]"
global cli   $ pipx install codilay
setup        $ codilay setup
run          $ codilay .
macOS·Linux·Windows·Python 3.11+
NEW_PROJECT

new to a project?

Run CodiLay, read the doc, ask questions. Productive in minutes.

LEGACY

maintaining legacy code?

Finally understand what connects to what before you touch anything.

AI_ASSIST

using AI assistants?

Export the doc as context. Now your AI actually knows your architecture.

ONBOARD

onboarding teammates?

Hand them CODEBASE.md + the chat interface. No more 2-week shadow sessions.