Distil

Token-efficient code analysis for LLMs.

Modern codebases are massive. Even when a model's context window is large enough, dumping raw source buries signal under noise. Distil extracts structure instead of text, reducing context by ~95% while preserving what matters for accurate reasoning.

How It Works

Instead of feeding raw source files into an LLM context, Distil produces structured analysis at five layers of depth. Each layer adds more detail, so you request only what the task needs:

Raw source (10,000 tokens)
    |
    v
L1: AST      (500 tokens)   "What functions exist?"
L2: Calls    (800 tokens)   "Who calls what?"
L3: CFG      (200 tokens)   "How complex is this function?"
L4: DFG      (300 tokens)   "Where does this value flow?"
L5: Slice    (150 tokens)   "What affects line 42?"
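
As a quick check on the figures in the diagram, the sketch below computes how much smaller each layer is than the raw source. The token counts are copied from the diagram and are illustrative, not measurements. The `reductionPercent` helper and the `layers` array are names invented for this example, and the combined figure assumes that requesting two layers simply adds their token costs.

```typescript
// Token counts copied from the diagram above (illustrative, not measured).
const RAW_TOKENS = 10_000;

const layers = [
  { name: "L1: AST", tokens: 500 },
  { name: "L2: Calls", tokens: 800 },
  { name: "L3: CFG", tokens: 200 },
  { name: "L4: DFG", tokens: 300 },
  { name: "L5: Slice", tokens: 150 },
];

// Percentage of context saved by sending `tokens` instead of the raw source.
function reductionPercent(tokens: number, raw: number = RAW_TOKENS): number {
  return ((raw - tokens) * 100) / raw;
}

for (const layer of layers) {
  console.log(`${layer.name}: ${reductionPercent(layer.tokens).toFixed(1)}% smaller than raw source`);
}

// Assumption: requesting L1 and L2 together costs the sum of both outputs.
const combined = layers.slice(0, 2).reduce((sum, layer) => sum + layer.tokens, 0);
console.log(`L1 + L2: ${reductionPercent(combined).toFixed(1)}% smaller than raw source`);
```

With these numbers, L1 alone matches the ~95% figure, and even L1 plus L2 stays well under the raw source size.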

Installation

# From source (pnpm monorepo)
git clone https://github.com/joshuaboys/distil.git
cd distil
pnpm install && pnpm build

# Link globally
pnpm -F @distil/cli link --global

Workflows

Before reading unfamiliar code

# Get the lay of the land
distil tree .

# Understand a specific file's structure
distil extract src/auth.ts

Before editing a function

# Who calls this function? What might break?
distil impact validateToken .

# What data flows through it?
distil dfg src/auth.ts validateToken
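
To make the idea behind an impact query concrete, here is a minimal sketch of finding callers by walking a call graph backwards. It is not Distil's implementation: the graph shape, the function names in it, and the choice to include transitive callers are all assumptions made for illustration.

```typescript
// Hypothetical call graph: caller -> functions it calls. Names are illustrative only.
const callGraph = new Map<string, string[]>([
  ["handleLogin", ["validateToken", "loadUser"]],
  ["refreshSession", ["validateToken"]],
  ["apiMiddleware", ["refreshSession"]],
  ["loadUser", []],
  ["validateToken", []],
]);

// Collect direct and transitive callers of `target` by following edges in reverse.
function findCallers(graph: Map<string, string[]>, target: string): string[] {
  // Invert the edges: callee -> list of callers.
  const callersOf = new Map<string, string[]>();
  for (const [caller, callees] of graph) {
    for (const callee of callees) {
      const list = callersOf.get(callee) ?? [];
      list.push(caller);
      callersOf.set(callee, list);
    }
  }

  // Breadth-first walk from the target through its callers.
  const seen = new Set<string>();
  const queue = [target];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    for (const caller of callersOf.get(current) ?? []) {
      if (!seen.has(caller)) {
        seen.add(caller);
        queue.push(caller);
      }
    }
  }
  return Array.from(seen).sort();
}

console.log(findCallers(callGraph, "validateToken"));
```

In this toy graph, changing `validateToken` could affect `handleLogin` and `refreshSession` directly, and `apiMiddleware` indirectly through `refreshSession`.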

Before refactoring

# Build the full call graph to see dependencies
distil calls .

# Check complexity before deciding what to simplify
distil cfg src/auth.ts validateToken
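
This README does not say which complexity metric `distil cfg` reports. Assuming it is something like McCabe's cyclomatic complexity, the sketch below shows how that number falls out of a control flow graph. The `Cfg` type and the example graph are hypothetical and do not reflect Distil's output format.

```typescript
// A control flow graph as basic-block ids plus directed edges between them.
interface Cfg {
  nodes: string[];
  edges: Array<[string, string]>;
}

// McCabe's cyclomatic complexity for a single function: M = E - N + 2.
function cyclomaticComplexity(cfg: Cfg): number {
  return cfg.edges.length - cfg.nodes.length + 2;
}

// Hypothetical CFG: one if/else that rejoins at a single exit.
const ifElseCfg: Cfg = {
  nodes: ["entry", "check", "then", "else", "exit"],
  edges: [
    ["entry", "check"],
    ["check", "then"],
    ["check", "else"],
    ["then", "exit"],
    ["else", "exit"],
  ],
};

console.log(cyclomaticComplexity(ifElseCfg));
```

A single branch gives a complexity of 2, and each further decision point adds one, so higher values flag functions worth simplifying first.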

Debugging a specific line

# What code affects line 42? (backward slice)
distil slice src/auth.ts validateToken 42

# What does line 42 affect? (forward slice)
distil slice src/auth.ts validateToken 42 --forward

Commands

| Command | Layer | Description |
| --- | --- | --- |
| `distil tree [path]` | - | File tree structure |
| `distil extract <file>` | L1 | Functions, classes, imports, signatures |
| `distil calls [path]` | L2 | Build project call graph |
| `distil impact <func> [path]` | L2 | Find all callers of a function |
| `distil cfg <file> <func>` | L3 | Control flow graph with complexity |
| `distil dfg <file> <func>` | L4 | Data flow graph with def-use chains |
| `distil slice <file> <func> <line>` | L5 | Program slice (backward/forward) |

All commands support --json for programmatic use. Function names use fuzzy matching.
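
For programmatic use, one option is to call the CLI from a script and parse its standard output. The sketch below assumes `distil` is on your `PATH` and that `--json` is accepted after the positional arguments. `distilArgs` and `runDistil` are hypothetical helper names, and the result is typed as `unknown` because the JSON schema is not documented here.

```typescript
import { execFileSync } from "node:child_process";

// Build the argument list for a distil subcommand, always requesting JSON output.
// Assumption: the --json flag may follow the positional arguments.
function distilArgs(command: string, ...operands: string[]): string[] {
  return [command, ...operands, "--json"];
}

// Run distil and parse its stdout. The result is `unknown` on purpose:
// inspect the actual output before relying on any particular field.
function runDistil(command: string, ...operands: string[]): unknown {
  const stdout = execFileSync("distil", distilArgs(command, ...operands), {
    encoding: "utf8",
  });
  return JSON.parse(stdout);
}

// Example usage (requires distil to be installed and linked):
// const structure = runDistil("extract", "src/auth.ts");
// console.log(structure);
```

Using `execFileSync` with an argument array, rather than a shell string, avoids quoting problems when file paths contain spaces.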

Analysis Precision

L1-L3 (AST, Call Graph, CFG) are structurally exact — they reflect the parse tree and control flow as written.

L4 (DFG) approximates reaching definitions with a line-order heuristic. For each variable use, Distil connects it to the most recent definition by source line number rather than performing a full control-flow-aware reaching-definitions analysis. This means:

  • Definitions in mutually exclusive branches (e.g. if/else) may not be distinguished
  • Loop-carried dependencies use the nearest prior definition heuristic
  • Multiple reaching definitions are marked isMayReach: true

This approximation can introduce both false positives (spurious def-use edges) and false negatives (missing valid edges), especially in the presence of complex control flow such as branching and loops.
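
To see how a line-order heuristic can miss an edge, consider the simplified sketch below. It is not Distil's implementation and it leaves out the `isMayReach` marking. The `Def`, `Use`, and `DefUseEdge` types and the `nearestPriorDefEdges` function are hypothetical names, and treating "most recent" as "strictly earlier line" is an assumption.

```typescript
interface Def { variable: string; line: number; }
interface Use { variable: string; line: number; }
interface DefUseEdge { variable: string; defLine: number; useLine: number; }

// Link each use to the closest definition of the same variable on an earlier line.
// Control flow is ignored on purpose, to mirror the approximation described above.
function nearestPriorDefEdges(defs: Def[], uses: Use[]): DefUseEdge[] {
  const edges: DefUseEdge[] = [];
  for (const use of uses) {
    let best: Def | undefined;
    for (const def of defs) {
      if (def.variable !== use.variable || def.line >= use.line) continue;
      if (best === undefined || def.line > best.line) best = def;
    }
    if (best !== undefined) {
      edges.push({ variable: use.variable, defLine: best.line, useLine: use.line });
    }
  }
  return edges;
}

// Hypothetical function body being analysed:
//   1: let role;
//   2: if (isAdmin) {
//   3:   role = "admin";
//   4: } else {
//   5:   role = "guest";
//   6: }
//   7: return role;
const roleDefs: Def[] = [
  { variable: "role", line: 3 },
  { variable: "role", line: 5 },
];
const roleUses: Use[] = [{ variable: "role", line: 7 }];

console.log(nearestPriorDefEdges(roleDefs, roleUses));
```

The only edge produced links line 5 to line 7. A control-flow-aware analysis would also link line 3 to line 7, since either branch can run, so this simplified heuristic yields a false negative here.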

L5 (PDG/Slicing) inherits L4's approximation: a slice may include statements that are not strictly relevant and may miss some that are. Treat slices as a practical aid rather than a fully sound program analysis.
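
The dependence of slices on def-use edges can be sketched as a backward walk from the line of interest. This is a simplified illustration that follows data dependences only and ignores control dependences. The `DependenceEdge` type, the `backwardSlice` function, and the edge data are hypothetical.

```typescript
interface DependenceEdge { defLine: number; useLine: number; }

// Backward slice: the criterion line plus every line that reaches it
// through a chain of def-use edges. Control dependences are left out for brevity.
function backwardSlice(edges: DependenceEdge[], criterionLine: number): number[] {
  const inSlice = new Set<number>([criterionLine]);
  const worklist = [criterionLine];
  while (worklist.length > 0) {
    const line = worklist.pop();
    if (line === undefined) break;
    for (const edge of edges) {
      if (edge.useLine === line && !inSlice.has(edge.defLine)) {
        inSlice.add(edge.defLine);
        worklist.push(edge.defLine);
      }
    }
  }
  return Array.from(inSlice).sort((a, b) => a - b);
}

// Hypothetical edges for a small function; line 42 is the slicing criterion.
const sliceEdges: DependenceEdge[] = [
  { defLine: 10, useLine: 30 },
  { defLine: 30, useLine: 42 },
  { defLine: 35, useLine: 42 },
  { defLine: 20, useLine: 25 }, // unrelated to line 42
];

console.log(backwardSlice(sliceEdges, 42));
```

Because the slice is just the closure over these edges, a spurious edge from L4 pulls an extra line into the result and a missing edge silently drops one.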

Supported Languages

| Language | L1 | L2 | L3-L5 |
| --- | --- | --- | --- |
| TypeScript/JavaScript | yes | yes | yes |
| Python | planned | - | - |
| Rust | planned | - | - |

Architecture

packages/
  distil-core   # Analysis engine (tree-sitter parsers, L1-L5 extractors)
  distil-cli    # Command-line interface (Commander.js)
  distil-mcp    # MCP server for editor/agent integration

              Distil CLI / MCP Server
                        |
                        v
              Distil Analysis Engine
         L1 -> L2 -> L3 -> L4 -> L5
                        |
                        v
                  tree-sitter
            (language-specific grammars)

MCP Server

Distil includes an MCP server for editor and agent integration. Start it with:

distil mcp

Or add it to your editor's MCP settings:

{
  "mcpServers": {
    "distil": {
      "command": "distil",
      "args": ["mcp"]
    }
  }
}

Available MCP tools:

| Tool | Description |
| --- | --- |
| distil_extract | L1: Extract file structure (functions, classes) |
| distil_calls | L2: Build project call graph |
| distil_impact | L2: Find all callers of a function |
| distil_cfg | L3: Control flow graph with complexity metrics |
| distil_dfg | L4: Data flow graph with def-use chains |
| distil_slice | L5: Program slice (backward/forward) |

Workflow prompts: distil_before_editing, distil_debug_line, distil_refactor_impact

Roadmap

Planned features:

  • Semantic search -- natural language code search via embeddings
  • Index warming -- pre-build all analysis layers for fast queries
  • Monorepo support -- per-package analysis with cross-package call graphs

Roadmap details and module specs are in plans/ using APS format. Start at plans/index.aps.md.

Development

pnpm install        # Install dependencies
pnpm build          # Build all packages
pnpm test           # Run all tests
pnpm typecheck      # Type check

License

Apache 2.0
