your codebase. finally understood.
CodiLay is an AI agent that reads your entire codebase and produces a living, structured document — so any human or AI can understand what's happening, where, and why things connect.
git clone https://github.com/HarmanPreet-Singh-XYT/codilay.git && cd codilay
pip install -e ".[all]"

you've inherited a codebase. now what?
// Every dev has been here. Thousands of files, no docs worth reading. You grep, open random files, build a mental map that evaporates by morning.
mental model evaporates
You build understanding file by file — by file 15, you've forgotten what file 3 did. Human working memory doesn't scale to codebases.
connections are invisible
File A imports B which calls C which emits an event that D listens to. These relationships exist in code but nowhere else.
AI can't help either
Paste a file into ChatGPT? It has no idea what the rest of the codebase looks like. Paste the whole repo? Context window can't hold it.
docs are always stale
Hand-written documentation decays the moment it's committed. Nobody updates it. Within weeks it's misleading.
the core abstraction
// CodiLay operates like a detective tracing wires through a circuit. Every unresolved reference is a wire — open until both ends are documented, then retired forever.
The agent reads routes/orders.js and encounters an import it hasn't documented yet. A reference pointing into the dark.
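In code terms, a wire might look something like this. A minimal Python sketch, not CodiLay's actual data model; the names Wire and WireManager are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Wire:
    """An unresolved cross-file reference: open until both ends are documented."""
    source: str   # file where the reference was seen
    target: str   # file (or symbol) the reference points into
    kind: str     # e.g. "import", "call", "event"

class WireManager:
    """Tracks open wires; closed wires are retired and never reloaded."""
    def __init__(self):
        self.open_wires: list[Wire] = []
        self.closed_count = 0

    def open(self, source: str, target: str, kind: str) -> Wire:
        wire = Wire(source, target, kind)
        self.open_wires.append(wire)
        return wire

    def close(self, wire: Wire) -> None:
        # Retired forever: the wire leaves working context entirely.
        self.open_wires.remove(wire)
        self.closed_count += 1
```

Once both endpoints are documented, the wire closes and stops occupying context.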
why every other approach fails
// The fundamental problem isn't reading code — it's managing what you know while reading more. Every existing tool either overflows the context or loses information.
without-codilay
paste entire repo → model forgets beginning by the end
- dump full repo into LLM → context overflow, early files forgotten
- send files one-by-one → no cross-file understanding
- RAG-based retrieval → retrieves similar text, not connected logic
- manual docs → decay immediately, nobody maintains them
- IDE search / grep → finds text, not relationships
- mental model → evaporates after 15 files
with-codilay
only open wires + relevant sections → always lean
- reads file-by-file with directed purpose — wires guide focus
- carries only open wires — closed connections retired from context
- doc sections loaded by relevance, not similarity
- living doc updates on git changes, never goes stale
- surfaces actual connections: imports, calls, events, data flow
- persistent across sessions — survives overnight, survives team changes
// key insight: CodiLay never carries more in memory than it needs. Closed wires are gone. Processed sections are indexed, not loaded. Context is always lean — whether 50 files or 5,000.
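One way to picture that lean-context rule, as a hypothetical Python sketch (the build_context function and its data shapes are illustrative, not CodiLay's API):

```python
def build_context(open_wires, doc_index, current_file, max_sections=5):
    """Assemble a lean prompt context: open wires plus only the doc
    sections relevant to the file being processed, never the full doc."""
    wire_lines = [f"{w['source']} -> {w['target']} ({w['kind']})"
                  for w in open_wires]
    # Relevance here is a crude stand-in: sections that mention the
    # current file or share an endpoint with an open wire.
    endpoints = ({w["source"] for w in open_wires}
                 | {w["target"] for w in open_wires})
    relevant = [s for s in doc_index
                if current_file in s["mentions"]
                or endpoints & set(s["mentions"])]
    sections = [s["text"] for s in relevant[:max_sections]]
    return "\n".join(["OPEN WIRES:"] + wire_lines
                     + ["RELEVANT SECTIONS:"] + sections)
```

Closed wires never appear in the input, and unrelated doc sections stay on disk, so the prompt stays the same size whether the repo has 50 files or 5,000.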
five phases. one command.
// Run codilay . and the agent handles everything — from scanning to triage to the final assembled document.
Parses .gitignore, loads config, walks the file tree, preloads existing markdown files. Establishes the full picture of what exists.
A single cheap LLM call sees only filenames — no content. Categorizes every file: core (document fully), skim (extract metadata), skip (ignore).
The planner sees the curated file tree and produces an ordered processing queue. Files prioritized by architectural importance — entry points first.
The core agent loop. Each file is read, relevant doc chunks loaded, LLM produces a structured diff, docstore is patched, wires opened or closed.
Final sweep resolves pending markers, documents parked files, surfaces unresolved references, assembles CODEBASE.md with the full dependency graph.
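The phase ordering can be sketched as a simple pipeline. All function names and the state dict here are illustrative stand-ins, not CodiLay's internals:

```python
def run_pipeline(repo_path):
    """Run the five phases in order; each phase takes and returns a state dict."""
    state = {"repo": repo_path, "log": []}
    for phase in (scan, triage, plan, process, finalize):
        state = phase(state)
    return state

# Stub phases that only record their order of execution.
def scan(state):     state["log"].append("scan");     return state
def triage(state):   state["log"].append("triage");   return state
def plan(state):     state["log"].append("plan");     return state
def process(state):  state["log"].append("process");  return state
def finalize(state): state["log"].append("finalize"); return state
```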
everything you need. nothing you don't.
// 29 CLI commands. 10 integrated feature modules. All built to keep your codebase documentation alive and useful.
watch-mode
Save a file, documentation updates automatically. Debounced, filtered, incremental — no full re-runs.
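The debouncing idea can be sketched in a few lines of Python (the Debouncer class is illustrative, not CodiLay's implementation):

```python
import threading

class Debouncer:
    """Coalesce bursts of file-save events: the action runs only after
    `delay` seconds of quiet, so five rapid saves trigger one re-run."""
    def __init__(self, delay, action):
        self.delay = delay
        self.action = action
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, path):
        with self._lock:
            if self._timer:
                self._timer.cancel()   # restart the quiet-period clock
            self._timer = threading.Timer(self.delay, self.action, args=(path,))
            self._timer.start()
```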
git-aware-reruns
Detects modified, added, deleted, and renamed files via git diff. Only re-processes what changed.
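This kind of detection can be built on `git diff --name-status`. A sketch, assuming that output format; parse_name_status and changed_files are hypothetical names:

```python
import subprocess

def parse_name_status(diff_output):
    """Parse `git diff --name-status` output into change buckets."""
    changes = {"modified": [], "added": [], "deleted": [], "renamed": []}
    for line in diff_output.splitlines():
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("R"):           # e.g. "R100\told.py\tnew.py"
            changes["renamed"].append((parts[1], parts[2]))
        elif status == "M":
            changes["modified"].append(parts[1])
        elif status == "A":
            changes["added"].append(parts[1])
        elif status == "D":
            changes["deleted"].append(parts[1])
    return changes

def changed_files(since="HEAD~1"):
    """Only files git reports as changed get re-processed."""
    out = subprocess.run(["git", "diff", "--name-status", since],
                         capture_output=True, text=True, check=True).stdout
    return parse_name_status(out)
```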
chat-interface
Ask questions about your codebase in natural language. Three layers: doc-based, deep source reading, learning loop.
3-layer-web-ui
Reader (instant, no LLM), Chatbot (answers from doc), Deep Agent (reads source when needed).
doc-diff-view
See what shifted between documentation runs — section additions, removals, modifications, wire changes.
parallel-processing
Files in the same dependency tier run concurrently. Central wire bus keeps context consistent. 3–8x faster.
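Tier-by-tier concurrency can be sketched with asyncio (illustrative only; CodiLay's actual scheduler and wire bus are more involved):

```python
import asyncio

async def process_tiers(tiers, process_file):
    """Files within a tier run concurrently; tiers run in order, so a
    file's dependencies are documented before the file itself."""
    results = []
    for tier in tiers:
        results += await asyncio.gather(*(process_file(f) for f in tier))
    return results
```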
scheduled-reruns
Cron-based or commit-triggered. Documentation stays fresh automatically — no human intervention needed.
ai-context-export
Export compressed docs in Markdown, XML, or JSON — optimized for feeding into another LLM's context window.
graph-filters
Slice the dependency graph by wire type, layer, module, direction. Surface architectural hubs.
team-memory
Shared knowledge base: facts, architectural decisions, coding conventions, file-level annotations.
conversation-search
TF-IDF search across all past conversations. Find that retry logic discussion from two weeks ago instantly.
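TF-IDF ranking needs nothing beyond the standard library. A minimal sketch; the real tool's tokenization and scoring may well differ:

```python
import math
from collections import Counter

def tfidf_search(query, conversations):
    """Rank past conversations by TF-IDF cosine similarity to the query."""
    docs = [c.lower().split() for c in conversations]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))        # document frequency
    idf = {w: math.log(n / df[w]) + 1 for w in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(d)), i) for i, d in enumerate(docs)]
    return [conversations[i] for s, i in sorted(scores, reverse=True) if s > 0]
```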
triage-tuning
Override wrong triage decisions. Feedback is stored as direct overrides and injected into future LLM triage prompts.
from chaos to clarity
// One command turns an opaque codebase into a navigable, queryable, living document.
built in layers. each one independent.
// ~10,325 lines across ~24 source files. Every layer can operate independently — CLI without web UI, agent without watcher, chat without scheduler.
29 commands via Click. Interactive TUI with Rich. Init, run, watch, export, search, schedule, and more. Entry point for everything.
files: cli.py · settings.py
The 5-phase loop: scanner → triage → planner → processor → finalizer. Wire manager tracks all open/closed wires. Docstore manages sections independently.
files: scanner.py · triage.py · planner.py · processor.py · wire_manager.py · docstore.py
Unified LLM client across Anthropic, OpenAI, and 8+ providers. All prompts return structured JSON. Large file handling with skeleton + detail passes.
files: llm_client.py · prompts.py · large_file.py
Watch mode, doc diffing, conversation search, graph filtering, AI export, team memory, triage feedback, scheduled re-runs. Each standalone with clean interfaces.
files: watcher.py · doc_differ.py · search.py · graph_filter.py · exporter.py · team_memory.py · triage_feedback.py · scheduler.py
FastAPI server with SSE streaming for chat. 3-layer UI: Reader (static render), Chatbot (doc context), Deep Agent (reads source when needed).
files: server.py · web/index.html
Git integration for change detection and re-runs. VSCode extension as thin API client. Output portability with configurable gitignore modes.
files: git_tracker.py · vscode-extension/
what you get
// CodiLay doesn't just generate a document. It gives your entire team — humans and AIs — a shared understanding of your codebase that stays current.
complete abstract view
Every module documented: what it does, where it lives, how it connects. Cross-references link everything. The dependency graph shows the full picture.
onboarding in minutes, not months
New dev joins? They read CODEBASE.md, ask the chatbot, and have a working mental model before writing a single line of code.
AI that actually understands your code
Export the compressed doc into any LLM context window. Now ChatGPT, Claude, or Copilot knows your architecture — not just the file you pasted.
docs that never go stale
Git-aware re-runs, watch mode, scheduled updates. The doc evolves with your code. No manual maintenance. No decay.
team knowledge that compounds
Shared memory, architectural decisions, conventions — all injected into every interaction. The AI learns what your team has agreed on and respects it.
self-improving through questions
Every question the chatbot can't answer triggers the deep agent, which patches the doc. Documentation gets smarter with every conversation.
// the doc gets smarter with every question it couldn't answer. over time, the chatbot handles more without escalation.
understand any codebase.
set up in minutes.
// stop guessing. stop grep-ing. stop building mental models that vanish overnight. let CodiLay trace every wire for you.
$ git clone https://github.com/HarmanPreet-Singh-XYT/codilay.git && cd codilay
$ pip install -e ".[all]"
$ codilay setup
$ codilay .

new to a project?
Run CodiLay, read the doc, ask questions. Productive in minutes.
maintaining legacy code?
Finally understand what connects to what before you touch anything.
using AI assistants?
Export the doc as context. Now your AI actually knows your architecture.
onboarding teammates?
Hand them CODEBASE.md + the chat interface. No more 2-week shadow sessions.