codilay·v0.9.0-beta·open-source·python 3.11+·359/359 tests ✓
// your codebase,

finally understood.

CodiLay is an AI agent that reads your entire codebase and produces a living, structured document — so any human or AI can understand what's happening, where, and why things connect.

$ git clone https://github.com/HarmanPreet-Singh-XYT/codilay.git && cd codilay
$ pip install -e ".[all]"
##problem

you've inherited a codebase. now what?

// Every dev has been here. Thousands of files, no docs worth reading. You grep, open random files, build a mental map that evaporates by morning.

your-brain.exe
bash
// Day 1: Trying to understand the payment flow
$ grep -r "payment" src/
src/routes/orders.js:14 → paymentService.charge()
src/services/payment.js:1 → imports ???
src/services/payment.js:8 → calls stripeClient
src/middleware/billing.js:22 → validates payment
src/models/Transaction.js:3 → used somewhere?
src/events/order.js:45 → emits 'payment:completed'
src/listeners/notify.js:12 → listens to... what?
// 7 files. No idea how they connect.
// Which one calls which?
// What's the actual data flow?
// What did I look at 20 minutes ago?
// Opens another tab. And another. And another...
ERR

mental model evaporates

You build understanding file by file — by file 15, you've forgotten what file 3 did. Human working memory doesn't scale to codebases.

ERR

connections are invisible

File A imports B which calls C which emits an event that D listens to. These relationships exist in code but nowhere else.

WARN

AI can't help either

Paste a file into ChatGPT? It has no idea what the rest of the codebase looks like. Paste the whole repo? Context window can't hold it.

WARN

docs are always stale

Hand-written documentation decays the moment it's committed. Nobody updates it. Within weeks it's misleading.

STAT · dev time
0%
spent understanding existing code, not writing new
STAT · onboarding
0–6 mo
for a new dev to be productive on a large codebase
STAT · files deep
15+
before most devs lose their mental model completely
STAT · tools that solve this
0
grep, IDE search, and ChatGPT are band-aids
##wire-model

the core abstraction

// CodiLay operates like a detective tracing wires through a circuit. Every unresolved reference is a wire — open until both ends are documented, then retired forever.

## step 01 — reading the first file

The agent reads routes/orders.js and encounters an import it hasn't documented yet. A reference pointing into the dark.

routes/orders.js
javascript
import { PaymentService } from '../services/payment'
import { Order } from '../models/Order'
// agent has never seen payment.js or Order.js
// doesn't know what they do yet
// but knows they EXIST and are NEEDED HERE
agent-state.json
bash
processing: "routes/orders.js"
open_wires: []
closed_wires: []
queue_next: "services/payment.js"
→ reading file content...
→ found 2 unknown references
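The wire bookkeeping the detective metaphor describes can be sketched in a few lines of Python. This is an illustrative model only; the class and method names are assumptions, not CodiLay's actual `wire_manager.py` API.

```python
class WireManager:
    """Tracks unresolved cross-file references ("wires")."""

    def __init__(self):
        self.open_wires = {}    # (source, target) -> reason the wire was opened
        self.closed_wires = []  # retired wires, kept only as a log

    def open_wire(self, source, target, reason):
        # A wire stays open until both endpoints are documented.
        self.open_wires[(source, target)] = reason

    def close_wire(self, source, target):
        # Once both ends are documented, the wire is retired forever.
        reason = self.open_wires.pop((source, target), None)
        if reason is not None:
            self.closed_wires.append((source, target, reason))


wires = WireManager()
# routes/orders.js imports two files the agent hasn't seen yet:
wires.open_wire("routes/orders.js", "services/payment.js", "import PaymentService")
wires.open_wire("routes/orders.js", "models/Order.js", "import Order")
# later, after services/payment.js itself is documented:
wires.close_wire("routes/orders.js", "services/payment.js")
```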
##context-crisis

why every other approach fails

// The fundamental problem isn't reading code — it's managing what you know while reading more. Every existing tool either overflows the context or loses information.

ERR

without-codilay

context-windowOVERFLOW

paste entire repo → model forgets beginning by the end

  • dump full repo into LLM → context overflow, early files forgotten
  • send files one-by-one → no cross-file understanding
  • RAG-based retrieval → retrieves similar text, not connected logic
  • manual docs → decays immediately, nobody maintains it
  • IDE search / grep → finds text, not relationships
  • mental model → evaporates after 15 files
OK

with-codilay

context-windowOPTIMAL

only open wires + relevant sections → always lean

  • reads file-by-file with directed purpose — wires guide focus
  • carries only open wires — closed connections retired from context
  • doc sections loaded by relevance, not similarity
  • living doc updates on git changes, never goes stale
  • surfaces actual connections: imports, calls, events, data flow
  • persistent across sessions — survives overnight, survives team changes

// key insight: CodiLay never carries more in memory than it needs. Closed wires are gone. Processed sections are indexed, not loaded. Context is always lean — whether 50 files or 5,000.
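As a rough sketch of that lean-context idea (the function and its inputs are hypothetical stand-ins, not CodiLay's internals), the prompt context might be assembled from only the wires touching the current file plus the doc sections judged relevant:

```python
def build_context(current_file, open_wires, doc_sections, relevant_ids):
    parts = [f"FILE UNDER ANALYSIS: {current_file}"]
    # Only wires touching this file are carried; closed wires are gone.
    parts += [f"OPEN WIRE: {src} -> {dst}"
              for (src, dst) in open_wires
              if src == current_file or dst == current_file]
    # Doc sections are loaded by relevance, not dumped wholesale.
    parts += [doc_sections[sid] for sid in relevant_ids]
    return "\n".join(parts)

ctx = build_context(
    "services/payment.js",
    open_wires=[("routes/orders.js", "services/payment.js"),
                ("routes/orders.js", "models/Order.js")],
    doc_sections={"payments": "## Payments\nCharges go through Stripe."},
    relevant_ids=["payments"],
)
```

Note that the `models/Order.js` wire never enters the context: neither of its endpoints is the file under analysis.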

##how-it-works

five phases. one command.

// Run codilay . and the agent handles everything — from scanning to triage to the final assembled document.

P1
./bootstrap

Parses .gitignore, loads config, walks the file tree, preloads existing markdown files. Establishes the full picture of what exists.

$ parse .gitignore → merge config ignores → tree walk → collect .md files → build raw file list
P2
./triage

A single cheap LLM call sees only filenames, never content. It categorizes every file: core (document fully), skim (extract metadata), or skip (ignore).

$ Flutter → skips ios/, android/ | React → skips .next/, node_modules/ | Any → skips dist/, *.lock
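In CodiLay the triage decision comes from an LLM; the pattern-based stand-in below (patterns and function name are illustrative only) shows what the three categories amount to in practice:

```python
import fnmatch

# Illustrative skip/skim lists, echoing the examples above.
SKIP_PATTERNS = ["node_modules/*", "dist/*", "*.lock", ".next/*", "ios/*", "android/*"]
SKIM_PATTERNS = ["*.json", "*.yml", "*.md"]

def triage(path):
    """Categorize a file from its path alone: skip, skim, or core."""
    if any(fnmatch.fnmatch(path, p) for p in SKIP_PATTERNS):
        return "skip"   # ignore entirely
    if any(fnmatch.fnmatch(path, p) for p in SKIM_PATTERNS):
        return "skim"   # extract metadata only
    return "core"       # document fully
```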
P3
./planning

The planner sees the curated file tree and produces an ordered processing queue. Files are prioritized by architectural importance, entry points first.

$ output: ordered queue + parked files (ambiguous deps) + doc skeleton + suggested sections
P4
./processing-loop

The core agent loop. Each file is read, relevant doc chunks loaded, LLM produces a structured diff, docstore is patched, wires opened or closed.

$ per file: read → load relevant chunks → LLM call → apply diff → update wires → reprioritize queue
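The per-file loop above can be sketched as follows. Every collaborator here (the canned LLM, the docstore) is a stand-in for CodiLay's internals, not its real API:

```python
class Docstore:
    def __init__(self):
        self.patches = []

    def apply(self, diff):
        self.patches.append(diff)   # in CodiLay this patches doc sections

def process_queue(queue, read_file, call_llm, docstore, wires):
    while queue:
        path = queue.pop(0)
        source = read_file(path)
        diff = call_llm(path, source)           # structured diff, not free text
        docstore.apply(diff)
        for wire in diff.get("close_wires", []):
            wires.discard(wire)                  # closed wires leave context forever
        for wire in diff.get("open_wires", []):
            wires.add(wire)

# tiny demo with canned stand-ins
store, wires = Docstore(), set()
fake_llm = lambda path, src: {"open_wires": [(path, "services/payment.js")]}
process_queue(["routes/orders.js"], lambda p: "<source>", fake_llm, store, wires)
```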
P5
./finalize

Final sweep resolves pending markers, documents parked files, surfaces unresolved references, assembles CODEBASE.md with the full dependency graph.

$ output: CODEBASE.md + links.json + .codilay_state.json (commit hash stored for future re-runs)
##features

everything you need. nothing you don't.

// 29 CLI commands. 12 integrated feature modules. All built to keep your codebase documentation alive and useful.

F01

watch-mode

Save a file, documentation updates automatically. Debounced, filtered, incremental — no full re-runs.
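A minimal sketch of the debounce idea, assuming a polling tick (watcher.py's actual mechanism may differ): rapid saves inside the quiet window collapse into one re-run.

```python
import time

class Debouncer:
    def __init__(self, window_s, action):
        self.window_s = window_s
        self.action = action
        self.pending = set()
        self.last_event = None

    def on_change(self, path):
        # Repeated saves of the same file merge into one pending entry.
        self.pending.add(path)
        self.last_event = time.monotonic()

    def tick(self):
        # Fire only once the window has been quiet.
        if self.pending and time.monotonic() - self.last_event >= self.window_s:
            batch, self.pending = self.pending, set()
            self.action(sorted(batch))

runs = []
d = Debouncer(0.01, runs.append)
d.on_change("a.py"); d.on_change("a.py"); d.on_change("b.py")
time.sleep(0.02)
d.tick()   # three events, one batched re-run
```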

F02

git-aware-reruns

Detects modified, added, deleted, and renamed files via git diff. Only re-processes what changed.
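One plausible way to consume git's change report, shown here as a sketch (git_tracker.py's real parsing may differ), is to bucket the output of `git diff --name-status <last-documented-commit>..HEAD`:

```python
def parse_name_status(diff_output):
    """Bucket `git diff --name-status` lines into change categories."""
    changes = {"modified": [], "added": [], "deleted": [], "renamed": []}
    for line in diff_output.strip().splitlines():
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("R"):                   # e.g. "R100\told\tnew"
            changes["renamed"].append((fields[1], fields[2]))
        elif status == "M":
            changes["modified"].append(fields[1])
        elif status == "A":
            changes["added"].append(fields[1])
        elif status == "D":
            changes["deleted"].append(fields[1])
    return changes

sample = "M\tsrc/routes/orders.js\nA\tsrc/services/refund.js\nR100\tsrc/old.js\tsrc/new.js"
changed = parse_name_status(sample)
```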

F03

chat-interface

Ask questions about your codebase in natural language. Three layers: doc-based, deep source reading, learning loop.

F04

3-layer-web-ui

Reader (instant, no LLM), Chatbot (answers from doc), Deep Agent (reads source when needed).

F05

doc-diff-view

See what shifted between documentation runs — section additions, removals, modifications, wire changes.

F06

parallel-processing

Files in the same dependency tier run concurrently. Central wire bus keeps context consistent. 3–8x faster.

F07

scheduled-reruns

Cron-based or commit-triggered. Documentation stays fresh automatically — no human intervention needed.

F08

ai-context-export

Export compressed docs in Markdown, XML, or JSON — optimized for feeding into another LLM's context window.
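A toy version of the three export shapes (the section structure and function here are assumptions, not exporter.py's schema):

```python
import json
from xml.sax.saxutils import escape, quoteattr

def export_doc(sections, fmt="markdown"):
    """Render (title, body) pairs as markdown, JSON, or XML."""
    if fmt == "markdown":
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
    if fmt == "json":
        return json.dumps([{"title": t, "body": b} for t, b in sections])
    if fmt == "xml":
        return "<doc>" + "".join(
            f"<section title={quoteattr(t)}>{escape(b)}</section>"
            for t, b in sections) + "</doc>"
    raise ValueError(f"unknown format: {fmt}")

sections = [("Authentication Flow", "JWT verified in services/token.js")]
md = export_doc(sections)
js = export_doc(sections, "json")
```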

F09

graph-filters

Slice the dependency graph by wire type, layer, module, direction. Surface architectural hubs.
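Conceptually, a graph filter is just a predicate over edges. The edge format below is an assumption for illustration, not the real links.json schema:

```python
def filter_graph(edges, wire_type=None, module=None):
    """Slice the dependency graph by wire type and/or module prefix."""
    out = []
    for edge in edges:
        if wire_type and edge["type"] != wire_type:
            continue
        if module and not (edge["src"].startswith(module)
                           or edge["dst"].startswith(module)):
            continue
        out.append(edge)
    return out

edges = [
    {"src": "routes/orders.js", "dst": "services/payment.js", "type": "import"},
    {"src": "events/order.js", "dst": "listeners/notify.js", "type": "event"},
]
imports_only = filter_graph(edges, wire_type="import")
services_slice = filter_graph(edges, module="services/")
```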

F10

team-memory

Shared knowledge base: facts, architectural decisions, coding conventions, file-level annotations.

F11

conversation-search

TF-IDF search across all past conversations. Find that retry logic discussion from two weeks ago instantly.
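For illustration, a hand-rolled TF-IDF ranker (not search.py's implementation) captures the idea: terms rare across conversations weigh more than common ones.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Return doc indices ranked by TF-IDF overlap with the query."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                         # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(
            (tf[w] / len(toks)) * math.log((1 + n) / (1 + df[w]))
            for w in query.lower().split() if w in tf
        )
        scores.append((score, i))
    return [i for s, i in sorted(scores, reverse=True) if s > 0]

convos = [
    "we discussed retry logic with exponential backoff",
    "auth middleware refactor notes",
    "retry queue for failed webhooks",
]
hits = tfidf_rank("retry logic", convos)
```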

F12

triage-tuning

Correct bad triage decisions. Feedback is stored as direct overrides and injected into future LLM triage prompts.

##transformation

from chaos to clarity

// One command turns an opaque codebase into a navigable, queryable, living document.

BEFOREmanual investigation
trying-to-understand-auth.sh
bash
$ grep -r "authenticate" src/
src/middleware/auth.js:4: ...something
src/routes/users.js:18: ...something
src/services/token.js:7: ...something
src/utils/crypto.js:12: ...something
// ok but which calls which?
// what's the actual flow?
// where does JWT get created?
// where is it validated?
$ # open auth.js in tab 1
$ # open token.js in tab 2
$ # open crypto.js in tab 3
$ # hold all four in your head
$ # fail
ERROR: brain stack overflow
AFTERcodilay run ./project
CODEBASE.md — auth-section
markdown
## Authentication Flow

Request → middleware/auth.js
 │ extracts JWT from Authorization header
 │ calls services/token.js → verify()
 │ └── uses utils/crypto.js → publicKey
 ├── valid → req.user populated, next()
 └── invalid → 401 + error event emitted

### Dependencies
middleware/auth.js → imports services/token.js → imports models/User.js
services/token.js → imports utils/crypto.js

### Used By
routes/users.js, routes/orders.js, routes/admin.js (all protected routes)
~45 min
manual investigation
grep → open files → read → cross-reference → forget → repeat
~3 min
codilay first run
codilay ./project → complete doc with all connections mapped
~15 sec
subsequent updates
git-aware re-run, processes only changed files + affected wires
##architecture

built in layers. each one independent.

// ~10,325 lines across ~24 source files. Every layer can operate independently — CLI without web UI, agent without watcher, chat without scheduler.

CLI
CLI Layer

29 commands via Click. Interactive TUI with Rich. Init, run, watch, export, search, schedule, and more. Entry point for everything.

files: cli.py · settings.py

AGENT
Agent Core

The 5-phase loop: scanner → triage → planner → processor → finalizer. Wire manager tracks all open/closed wires. Docstore manages sections independently.

files: scanner.py · triage.py · planner.py · processor.py · wire_manager.py · docstore.py

LLM
Intelligence Layer

Unified LLM client across Anthropic, OpenAI, and 8+ providers. All prompts return structured JSON. Large file handling with skeleton + detail passes.

files: llm_client.py · prompts.py · large_file.py

MODULES
Feature Modules

Watch mode, doc diffing, conversation search, graph filtering, AI export, team memory, triage feedback, scheduled re-runs. Each standalone with clean interfaces.

files: watcher.py · doc_differ.py · search.py · graph_filter.py · exporter.py · team_memory.py · triage_feedback.py · scheduler.py

WEB
Web & API Layer

FastAPI server with SSE streaming for chat. 3-layer UI: Reader (static render), Chatbot (doc context), Deep Agent (reads source when needed).

files: server.py · web/index.html

EXT
Integrations

Git integration for change detection and re-runs. VSCode extension as thin API client. Output portability with configurable gitignore modes.

files: git_tracker.py · vscode-extension/

~24
source files
~10,325
lines of code
29
CLI commands
359/359
tests passing
##outcome

what you get

// CodiLay doesn't just generate a document. It gives your entire team — humans and AIs — a shared understanding of your codebase that stays current.

OUT-01

complete abstract view

Every module documented: what it does, where it lives, how it connects. Cross-references link everything. The dependency graph shows the full picture.

OUT-02

onboarding in minutes, not months

New dev joins? They read CODEBASE.md, ask the chatbot, and have a working mental model before writing a single line of code.

OUT-03

AI that actually understands your code

Export the compressed doc into any LLM context window. Now ChatGPT, Claude, or Copilot knows your architecture — not just the file you pasted.

OUT-04

docs that never go stale

Git-aware re-runs, watch mode, scheduled updates. The doc evolves with your code. No manual maintenance. No decay.

OUT-05

team knowledge that compounds

Shared memory, architectural decisions, conventions — all injected into every interaction. The AI learns what your team has agreed on and respects it.

OUT-06

self-improving through questions

Every question the chatbot can't answer triggers the deep agent, which patches the doc. Documentation gets smarter with every conversation.

## learning-loop
INPUTuser asks question
L1chatbot checks doc
L2can't answer? deep agent reads source
PATCHdoc patched + answer delivered
CACHEnext time: chatbot answers directly

// doc gets smarter with every question it couldn't answer. over time, chatbot handles more without escalation.
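The loop reduces to a cache-miss pattern, sketched here with hypothetical names (the real escalation path lives in the web/agent layers):

```python
def answer(question, doc, deep_agent):
    hit = doc.get(question)
    if hit is not None:
        return hit, "chatbot"          # L1: answered straight from the doc
    patched = deep_agent(question)     # L2: read source, produce an answer
    doc[question] = patched            # PATCH: the doc gets smarter
    return patched, "deep-agent"

doc = {}
calls = []
def deep_agent(q):
    calls.append(q)                    # counts expensive source reads
    return "token.js verifies the JWT"

a1, layer1 = answer("where is JWT validated?", doc, deep_agent)
a2, layer2 = answer("where is JWT validated?", doc, deep_agent)
# second ask never reaches the deep agent
```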

ready to deploy · MIT license · python 3.11+

understand any codebase.

set up in minutes.

// stop guessing. stop grep-ing. stop building mental models that vanish overnight. let CodiLay trace every wire for you.

clone    $ git clone https://github.com/HarmanPreet-Singh-XYT/codilay.git && cd codilay
install  $ pip install -e ".[all]"
setup    $ codilay setup
run      $ codilay .
macOS·Linux·Windows·Python 3.11+
NEW_PROJECT

new to a project?

Run CodiLay, read the doc, ask questions. Productive in minutes.

LEGACY

maintaining legacy code?

Finally understand what connects to what before you touch anything.

AI_ASSIST

using AI assistants?

Export the doc as context. Now your AI actually knows your architecture.

ONBOARD

onboarding teammates?

Hand them CODEBASE.md + the chat interface. No more 2-week shadow sessions.