Problem
LLMs can't read entire codebases. A 100K-line codebase produces ~400K tokens, which exceeds most context windows. Even when the code fits, LLMs drown in irrelevant detail and waste tokens on code that doesn't matter for the current task.
Why this work matters
Distil extracts structure instead of dumping text. It provides 5 layers of code analysis (AST, Call Graph, CFG, DFG, PDG) that give LLMs exactly what they need to understand and edit code correctly, at 95% fewer tokens than raw code.
Non-goals (explicit)
- Distil does not replace language servers or IDEs
- Distil does not provide real-time incremental parsing (Salsa-style)
- Distil does not run as a daemon (uses Kindling for persistence instead)
- Distil does not include local embedding models (uses external APIs)
Those concerns are either out of scope or deferred to future versions.
Success Criteria
- A developer can get LLM-ready context for any function with a single command
- Token usage is reduced by 80%+ compared to raw file reads
- Call graph analysis enables safe refactoring with impact visibility
- Program slicing isolates only the code affecting a specific line
- The project integrates cleanly with Kindling for caching and persistence
- The project is safe to open-source under Apache-2.0
- `@distil/core` → depends on → `tree-sitter` (+ optional language parsers)
- `@distil/cli` → depends on → `@distil/core`
- `@distil/core` → depends on → `@kindling/core` (M2)
- `@distil/core` → depends on → `@kindling/store-sqlite` (M2)
- `@distil/mcp` → depends on → `@distil/core` (M6)
- `@distil/mcp` → depends on → `@modelcontextprotocol/sdk` (M6)
- `@distil/cli` → depends on → `@distil/mcp` (M6, optional)
- Public repository created
- Package boundaries enforced (core / cli)
- Tree-sitter TypeScript/JavaScript parser integrated
- L1 AST extraction (functions, classes, imports, signatures)
- Core types defined and validated
- CLI `distil tree` and `distil extract` available
Target: distil extract <file> works for TS/JS files ✅
- Cross-file call graph construction
- Forward edges (what does this function call?)
- Backward edges (what calls this function?)
- Kindling integration for caching analysis results. Note: the Kindling caching layer is scaffolded (store, observations, types, config) but not yet wired into the analysis pipeline, so analysis results are not cached yet. See CORE-005.
- Impact analysis command
Target: distil impact <function> shows all callers ✅
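The forward and backward edges above can be illustrated with a small in-memory index. This is a minimal sketch, not Distil's actual data model: `CallEdge`, `CallGraph`, `buildCallGraph`, and `impactedCallers` are hypothetical names, and functions are identified by plain strings for brevity.

```typescript
// Hypothetical edge shape for illustration; not Distil's real type.
interface CallEdge {
  caller: string;
  callee: string;
}

interface CallGraph {
  forward: Map<string, Set<string>>; // function -> functions it calls
  backward: Map<string, Set<string>>; // function -> functions that call it
}

function addEdge(index: Map<string, Set<string>>, from: string, to: string): void {
  let targets = index.get(from);
  if (!targets) {
    targets = new Set<string>();
    index.set(from, targets);
  }
  targets.add(to);
}

// Index every call edge in both directions.
function buildCallGraph(edges: CallEdge[]): CallGraph {
  const graph: CallGraph = { forward: new Map(), backward: new Map() };
  for (const edge of edges) {
    addEdge(graph.forward, edge.caller, edge.callee);
    addEdge(graph.backward, edge.callee, edge.caller);
  }
  return graph;
}

// Transitive callers of `target`: a breadth-first walk over backward edges.
function impactedCallers(graph: CallGraph, target: string): string[] {
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift()!;
    const callers = graph.backward.get(current);
    if (!callers) continue;
    callers.forEach((caller) => {
      if (!seen.has(caller)) {
        seen.add(caller);
        queue.push(caller);
      }
    });
  }
  return Array.from(seen).sort();
}
```

Walking the backward index transitively is one plausible way to answer "what breaks if this function changes?"; whether `distil impact` reports direct or transitive callers is not specified here.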
- L3: Control Flow Graph extraction with cyclomatic complexity
- L4: Data Flow Graph with def-use chains
- L5: Program Dependence Graph with backward/forward slicing
- `distil cfg`, `distil dfg`, `distil slice` commands
Target: distil slice <file> <func> <line> returns relevant lines only ✅
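To make backward slicing concrete, the sketch below computes a slice as reachability over dependence edges. It assumes the PDG has already been reduced to per-line data and control dependences; the `Dependence` shape and `backwardSlice` function are hypothetical and only illustrate the general technique, not Distil's implementation.

```typescript
// Hypothetical PDG edge: `line` depends on `dependsOn` via data or control flow.
interface Dependence {
  line: number;
  dependsOn: number;
  kind: "data" | "control";
}

// Backward slice: every line the criterion line transitively depends on.
function backwardSlice(deps: Dependence[], criterion: number): number[] {
  const byLine = new Map<number, number[]>();
  for (const dep of deps) {
    const list = byLine.get(dep.line) ?? [];
    list.push(dep.dependsOn);
    byLine.set(dep.line, list);
  }

  const inSlice = new Set<number>([criterion]);
  const stack: number[] = [criterion];
  while (stack.length > 0) {
    const line = stack.pop()!;
    for (const dep of byLine.get(line) ?? []) {
      if (!inSlice.has(dep)) {
        inSlice.add(dep);
        stack.push(dep);
      }
    }
  }
  return Array.from(inSlice).sort((a, b) => a - b);
}

// Dependences for a toy function:
//   2: let sum = 0;        3: let count = 0;
//   4: for (item of items) 5: sum += item.price;   6: count++;
//   8: console.log(count); 9: return sum;
const deps: Dependence[] = [
  { line: 9, dependsOn: 5, kind: "data" },
  { line: 9, dependsOn: 2, kind: "data" },
  { line: 5, dependsOn: 2, kind: "data" },
  { line: 5, dependsOn: 4, kind: "control" },
  { line: 6, dependsOn: 3, kind: "data" },
  { line: 6, dependsOn: 4, kind: "control" },
  { line: 8, dependsOn: 6, kind: "data" },
];

console.log(backwardSlice(deps, 9)); // [ 2, 4, 5, 9 ]
```

The `count` bookkeeping on lines 3, 6, and 8 drops out of the slice because nothing on line 9 depends on it, which is the "relevant lines only" behavior the target describes.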
- `.distilignore` file support (`.gitignore` syntax)
- Monorepo workspace detection (pnpm, npm, lerna)
- Cross-package call graph resolution
- `--package` scoping flag
- `--no-ignore` override flag
Target: Distil works correctly in monorepos and respects project ignore patterns
- External embeddings API integration (OpenAI/Anthropic)
- Semantic search over function behaviors
- Index warming command with progress display
- `distil context` LLM-ready output command
- Output formatting system (`--json`, `--compact`)
- Configuration file support (`.distil/config.json`)
Target: distil semantic "validate JWT tokens" finds relevant functions
- MCP server package (`@distil/mcp`)
- Tool definitions for all analysis layers (6 tools)
- Resource and prompt definitions (3 prompts)
- `distil mcp` CLI subcommand
- Editor integration testing (Claude Code, Cursor)
Target: Editors and agents query Distil analysis via MCP protocol
- Python language support (all layers)
- Rust language support (all layers)
- C# language support (all layers)
Target: Full parity for Python, Rust, C# alongside TypeScript/JavaScript
- Path: modules/distil-core.aps.md
- Scope: CORE
- Owner: @aneki
- Status: In Progress
- Priority: high
- Tags: analysis, ast, callgraph, cfg, dfg, pdg
- Dependencies: tree-sitter (current), @kindling/core/@kindling/store-sqlite (planned)
- Path: modules/distil-cli.aps.md
- Scope: CLI
- Owner: @aneki
- Status: In Progress
- Priority: high
- Tags: cli, tooling
- Dependencies: @distil/core
- Path: modules/distil-mcp.aps.md
- Scope: MCP
- Owner: @aneki
- Status: In Progress
- Priority: medium
- Tags: mcp, integration, editor
- Dependencies: @distil/core, @modelcontextprotocol/sdk
Prioritized queue of ready work across all packages:
| # | Work Item | Module | Packages | Owner | Status | Priority |
|---|---|---|---|---|---|---|
| 1 | CLI-013 | cli | cli | @aneki | Ready | P0 |
| 2 | CORE-014 | core | core | @aneki | Ready | P1 |
| 3 | CORE-015 | core | core | @aneki | Ready | P1 |
| 4 | CORE-016 | core | core | @aneki | Ready | P1 |
| 5 | CORE-021 | core | core | @aneki | Ready | P1 |
| 6 | MCP-005 | mcp | mcp | @aneki | Ready | P1 |
| 7 | CLI-014 | cli | cli | @aneki | Ready | P1 |
| 8 | CORE-019 | core | core | @aneki | Ready | P2 |
| 9 | CLI-001 | cli | cli | @aneki | In Progress | — |
| 10 | CLI-002 | cli | cli | @aneki | In Progress | — |
| 11 | CORE-009 | core | core | @aneki | Ready | — |
| 12 | CORE-011 | core | core | @aneki | Ready | — |
| 13 | CLI-010 | cli | cli | @aneki | Ready | — |
- D-001: Distil uses Kindling for caching, not a custom daemon or file-based cache
- D-002: Tree-sitter is the parsing foundation for all language support
- D-003: Analysis layers are composable; higher layers build on lower ones
- D-004: Observation kinds in Kindling are generic (`code.symbol`, `code.callgraph`, etc.); Distil-specific detail lives in metadata
- D-005: Database location defaults to `.kindling/distil.db`, configurable via CLI/env/config
- D-006: Future language parsers (Python, Rust, C#) are optional peer dependencies
- D-007: Semantic search uses external embedding APIs (OpenAI/Anthropic), not local models
- D-008: TypeScript/JavaScript are the initial supported languages; others are future milestones
- D-009: MCP server uses stdio transport; started via `distil mcp` or configured in editor settings
- D-010: `.distilignore` uses `.gitignore` syntax and is checked into version control
- D-011: Monorepo detection is automatic; `--package` flag for explicit scoping
- D-012: OmO review (2026-03-06) findings tracked as CORE-014..020 and CLI-013..015; prioritized P0-P3
- D-013: Review gaps (benchmarking, MCP execution tests) tracked as CORE-021 and MCP-005; both P1
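D-005 names three configuration sources for the database location but does not state their precedence. The sketch below assumes the common convention that a CLI flag beats an environment variable, which beats the config file, with `.kindling/distil.db` as the fallback. `DbPathSources` and `resolveDbPath` are hypothetical names, and the precedence order itself is an assumption rather than documented behavior.

```typescript
// Hypothetical inputs: each source is undefined when the user did not set it.
interface DbPathSources {
  cliFlag?: string; // value passed on the command line
  envValue?: string; // value read from an environment variable
  configValue?: string; // value read from a config file
}

// Default taken from D-005.
const DEFAULT_DB_PATH = ".kindling/distil.db";

// Assumed precedence: CLI > env > config > default.
function resolveDbPath(sources: DbPathSources): string {
  return sources.cliFlag ?? sources.envValue ?? sources.configValue ?? DEFAULT_DB_PATH;
}
```

Keeping the resolution in one pure function makes the chosen order easy to test and to change if the project settles on a different precedence.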
No open questions at this time.
Decision: Treat JSX/TSX as variants of JS/TS, not distinct languages.
Rationale:
- Tree-sitter already handles this via grammar switching (TSX grammar for .tsx/.jsx)
- Simpler API with unified `typescript`/`javascript` language identifiers
- Matches industry standard tooling behavior
- JSX-specific analysis (components, hooks) can be added as optional metadata later
Implementation: Current approach is correct. Optionally add hasJSX: boolean to ModuleInfo if JSX-specific analysis becomes valuable.
Decision: Use nearest tsconfig.json heuristic with explicit override support.
Rationale:
- Works naturally for most monorepo structures
- Explicit `--project` flag covers edge cases
- Avoids over-engineering for M1-M2
Implementation (M1-M2):
- Find nearest `tsconfig.json` walking up from file
- Add `--project <path>` CLI flag for explicit override
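The nearest-tsconfig heuristic with an explicit override could look roughly like the sketch below. `findNearestTsconfig` is a hypothetical helper, not Distil's API. It assumes absolute POSIX-style paths and takes the existence check as a parameter so that the example stays free of filesystem access.

```typescript
// Hypothetical helper. `exists` is injected; `projectOverride` stands in for
// the value of the --project flag when the user supplies one.
function findNearestTsconfig(
  filePath: string,
  exists: (candidate: string) => boolean,
  projectOverride?: string,
): string | undefined {
  // An explicit override always wins over the heuristic.
  if (projectOverride !== undefined) return projectOverride;

  // Assumes an absolute POSIX path such as /repo/packages/app/src/index.ts.
  const segments = filePath.split("/").filter((s) => s.length > 0);
  segments.pop(); // drop the file name, keep its directory

  // Check the file's directory first, then each parent up to the root.
  for (let depth = segments.length; depth >= 0; depth--) {
    const dir = "/" + segments.slice(0, depth).join("/");
    const candidate = dir === "/" ? "/tsconfig.json" : `${dir}/tsconfig.json`;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

In a Node-based CLI the predicate would typically be `fs.existsSync`, and a real implementation would likely use `path.dirname` to handle Windows paths as well.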
Implementation (M3+):
- Add workspace detection (pnpm-workspace.yaml, package.json workspaces)
- Add `.distil/config.json` for complex configurations
- Cache tsconfig parsing per-project