repo-ctx Developer Guide¶

This guide covers architecture, internals, and development practices for contributors and developers extending repo-ctx.

Table of Contents¶

Architecture Overview
Project Structure
Core Components
Unified Client
Service Layer
Code Analysis System
Content Processing
Provider Abstraction
Database Schema
Development Setup
Testing
Adding Features

Architecture Overview¶

repo-ctx uses a layered architecture with a unified client for all interfaces:

flowchart TB
    subgraph Interfaces
        MCP[MCP Server<br/>mcp_server.py]
        CLI[CLI<br/>cli/]
        API[Python API]
    end

    subgraph "Unified Client Layer"
        Client[RepoCtxClient<br/>client/client.py]
        Models[Client Models<br/>client/models.py]
    end

    subgraph "Service Layer"
        RS[RepositoryService]
        IS[IndexingService]
        SS[SearchService]
        AS[AnalysisService]
        CS[ChunkingService]
        ES[EnrichmentService]
    end

    subgraph "Core Layer"
        RC[RepositoryContext<br/>Orchestration]
        Storage[(Storage<br/>SQLite)]
        Parser[Parser<br/>Markdown]
    end

    subgraph Providers
        Factory[Provider Factory]
        Local[LocalProvider]
        GitHub[GitHubProvider]
        GitLab[GitLabProvider]
    end

    subgraph Analysis
        CA[CodeAnalyzer<br/>Orchestrator]
        Joern[JoernAdapter<br/>CPG Analysis]
        TS[Tree-sitter<br/>Extractors]
        DG[DependencyGraph]
    end

    MCP --> Client
    CLI --> Client
    API --> Client

    Client --> RS
    Client --> IS
    Client --> SS
    Client --> AS
    Client --> RC

    RS --> Storage
    IS --> RC
    SS --> Storage
    AS --> CA
    CS --> Parser
    ES --> CS

    RC --> Storage
    RC --> Parser
    RC --> Factory

    Factory --> Local
    Factory --> GitHub
    Factory --> GitLab

    CA --> Joern
    CA --> TS
    CA --> DG

Stakeholder View¶

Stakeholder	Primary Entry Point	Documentation
LLM/AI Users	MCP Server	USER_GUIDE.md
CLI Users	CLI Commands	USER_GUIDE.md
Library Developers	`RepoCtxClient`	API Reference
Contributors	This guide	ADRs

Data Flow¶

Indexing:

Client → IndexingService → Provider → Parser → Storage

Search:

Client → SearchService → Storage → FuzzyMatch → Results

Analysis:

Client → AnalysisService → CodeAnalyzer → Joern/Tree-sitter → Symbols

Content Processing:

Content → ChunkingService → Chunks → EnrichmentService → EnrichedContent

Project Structure¶

repo_ctx/
├── core.py              # Legacy API: RepositoryContext
├── config.py            # Configuration management
├── storage.py           # SQLite persistence (aiosqlite)
├── mcp_server.py        # MCP protocol implementation
├── mcp_tools_ctx.py     # MCP ctx-* tools
├── parser.py            # Markdown processing
├── models.py            # Dataclasses: Library, Document
│
├── client/              # Unified client (NEW)
│   ├── __init__.py      # Package exports
│   ├── client.py        # RepoCtxClient - main interface
│   └── models.py        # Client data models
│
├── services/            # Service layer (NEW)
│   ├── __init__.py      # Factory functions
│   ├── base.py          # BaseService, ServiceContext
│   ├── repository.py    # RepositoryService
│   ├── indexing.py      # IndexingService
│   ├── search.py        # SearchService
│   ├── analysis.py      # AnalysisService
│   ├── chunking.py      # ChunkingService (content splitting)
│   ├── enrichment.py    # EnrichmentService (LLM metadata)
│   ├── embedding.py     # EmbeddingService (optional)
│   ├── llm.py           # LLMService (optional)
│   ├── combined_search.py  # CombinedSearchService
│   └── progress.py      # Progress tracking
│
├── providers/           # Git provider abstraction
│   ├── base.py          # Abstract GitProvider interface
│   ├── factory.py       # Provider factory
│   ├── local.py         # Local filesystem provider
│   ├── github.py        # GitHub API provider
│   └── gitlab.py        # GitLab API provider
│
├── analysis/            # Code analysis
│   ├── __init__.py      # Exports: CodeAnalyzer, DependencyGraph
│   ├── code_analyzer.py # Main analyzer
│   ├── models.py        # Symbol, SymbolType
│   ├── file_enhancer.py # LLM file documentation (parallel, retry)
│   ├── codebase_summarizer.py  # Business/technical summaries
│   ├── prompts.py       # LLM prompt templates
│   ├── architecture.py  # Architecture analysis
│   ├── architecture_rules.py  # Rule checking
│   ├── structural_metrics.py  # XS score
│   ├── python_extractor.py
│   ├── javascript_extractor.py
│   ├── java_extractor.py
│   ├── kotlin_extractor.py
│   ├── smalltalk_extractor.py
│   ├── dependency_graph.py
│   └── interactive_graph.py  # vis.js HTML graph generator
│
├── joern/               # Joern CPG integration
│   ├── adapter.py       # JoernAdapter
│   ├── models.py        # CPG models
│   └── queries.py       # CPGQL helpers
│
├── cli/                 # CLI implementation
│   ├── __init__.py      # CLI entry point
│   ├── context.py       # CLIContext (wraps RepoCtxClient)
│   ├── flat_commands.py # Flat command handlers
│   ├── target.py        # Target detection
│   └── interactive.py   # Interactive mode
│
└── mcp/                 # MCP server components
    ├── __init__.py
    └── server_context.py  # MCPServerContext (wraps RepoCtxClient)

Core Components¶

Configuration (`config.py`)¶

Unified configuration with priority-based loading:

config = Config.load()      # Auto-discovery
config = Config.from_env()  # Environment variables
config = Config.from_yaml("config.yaml")

Priority: CLI args > Environment > Config file > Defaults

Storage (`storage.py`)¶

Async SQLite operations via aiosqlite:

class Storage:
    async def init()
    async def save_library(library: Library)
    async def save_document(document: Document)
    async def search_libraries(query: str, limit: int)
    async def get_documentation(library_id: str, version: str)

Unified Client¶

The RepoCtxClient provides a single interface for all repo-ctx operations, used by CLI, MCP, and library consumers.

Design Principles¶

Dual-mode: Supports direct (service layer) and HTTP (REST API) modes
Async-first: All operations are async
Context manager: Proper resource lifecycle
Type-safe: Typed models with dict conversion

Client Architecture¶

class RepoCtxClient:
    """Unified client for repo-ctx operations."""

    def __init__(
        self,
        config: Config = None,
        api_url: str = None,      # HTTP mode
        api_key: str = None,
        mode: ClientMode = None,  # DIRECT or HTTP
    ):
        # Auto-detect mode based on api_url
        self.mode = mode or (ClientMode.HTTP if api_url else ClientMode.DIRECT)

    async def connect(self) -> None:
        """Initialize client and establish connections."""

    async def close(self) -> None:
        """Clean up resources."""

    # Library operations
    async def list_libraries(self, provider: str = None) -> list[Library]
    async def get_library(self, library_id: str) -> Library | None
    async def search_libraries(self, query: str, fuzzy: bool = True) -> list[SearchResult]

    # Indexing operations
    async def index_repository(self, repository: str, provider: str = None) -> IndexResult
    async def index_group(self, group: str, provider: str = None) -> list[IndexResult]

    # Documentation operations
    async def get_documentation(self, library_id: str, **kwargs) -> dict

    # Analysis operations
    async def analyze_code(self, code: str, file_path: str) -> AnalysisResult

Context Adapters¶

CLI and MCP use context adapters that wrap RepoCtxClient:

# CLI Context (cli/context.py)
class CLIContext:
    def __init__(self, config: Config):
        self._client = RepoCtxClient(config=config)

    async def init(self):
        await self._client.connect()

# MCP Server Context (mcp/server_context.py)
class MCPServerContext:
    def __init__(self, config: Config):
        self._client = RepoCtxClient(config=config)

    @property
    def legacy(self):
        return self._client._legacy_context

Service Layer¶

The service layer provides business logic between interfaces and storage.

ServiceContext¶

Dependency injection container for services:

from repo_ctx.services import create_service_context, ServiceContext

context = create_service_context(config)
# context.content_storage - SQLite storage
# context.vector_storage - Qdrant (optional)
# context.graph_storage - Neo4j (optional)

Available Services¶

Service	Factory Function	Purpose
`RepositoryService`	`create_repository_service()`	Repository management
`IndexingService`	`create_indexing_service()`	Repository indexing
`SearchService`	`create_search_service()`	Search operations
`AnalysisService`	`create_analysis_service()`	Code analysis
`ChunkingService`	`create_chunking_service()`	Content chunking
`EnrichmentService`	`create_enrichment_service()`	LLM metadata
`EmbeddingService`	`create_embedding_service()`	Vector embeddings
`LLMService`	`create_llm_service()`	LLM operations

Creating Services¶

from repo_ctx.services import (
    create_service_context,
    create_repository_service,
    create_analysis_service,
    create_chunking_service,
)

context = create_service_context()
repo_service = create_repository_service(context)
analysis_service = create_analysis_service(context)
chunking_service = create_chunking_service(default_strategy="semantic")

Code Analysis System¶

Backend Selection¶

flowchart TD
    A[analyze_file] --> B{use_treesitter flag?}
    B -->|True| C[Tree-sitter]
    B -->|False| D{Language in JOERN_LANGUAGES?}
    D -->|Yes| E{Joern available?}
    D -->|No| C
    E -->|Yes| F[Joern CPG]
    E -->|No| G{Language in TREESITTER_LANGUAGES?}
    G -->|Yes| C
    G -->|No| H[Return empty]
    F -->|Error| G

CodeAnalyzer¶

class CodeAnalyzer:
    JOERN_LANGUAGES = {
        ".c", ".cpp", ".go", ".php", ".rb", ".swift", ".cs",
        ".py", ".js", ".ts", ".java", ".kt"
    }

    TREESITTER_LANGUAGES = {
        ".py", ".js", ".ts", ".java", ".kt", ".st"
    }

    def analyze_file(code: str, file_path: str) -> tuple[list[Symbol], list[str]]
    def analyze_files(files: dict[str, str]) -> dict[str, list[Symbol]]

Architecture Analysis¶

from repo_ctx.analysis import (
    ArchitectureAnalyzer,
    StructuralMetrics,
)

# DSM, cycles, layers
analyzer = ArchitectureAnalyzer()
dsm = analyzer.compute_dsm(symbols, dependencies)
cycles = analyzer.detect_cycles(symbols, dependencies)
layers = analyzer.detect_layers(symbols, dependencies)

# XS metrics
metrics = StructuralMetrics()
score = metrics.compute_xs_score(symbols, dependencies)

Content Processing¶

ChunkingService¶

Split content optimally for LLM processing:

from repo_ctx.services import ChunkingService, Chunk, ChunkType

chunking = ChunkingService(default_strategy="semantic")

# Semantic chunking (respects code boundaries)
chunks = chunking.chunk(code, "service.py", strategy="semantic")

# Token-aware chunking
chunks = chunking.chunk_for_embedding(content, "doc.md", max_tokens=500)

# LLM context chunking
chunks = chunking.chunk_for_context(content, "doc.md", max_tokens=2000)

Strategies:

Strategy	Class	Best For
`semantic`	`SemanticChunking`	Code files
`markdown`	`MarkdownChunking`	Documentation
`token_based`	`TokenBasedChunking`	Any text
`fixed_size`	`FixedSizeChunking`	Raw text

EnrichmentService¶

Add LLM-powered metadata (with heuristic fallbacks):

from repo_ctx.services import EnrichmentService, create_llm_service

# With LLM
llm = create_llm_service(context, model="gpt-5-mini")
enrichment = EnrichmentService(context, llm_service=llm)

# Without LLM (uses heuristics)
enrichment = EnrichmentService(context, use_llm=False)

# Enrich code
metadata = await enrichment.enrich_code(code, "python", "service.py")
# metadata.summary, metadata.tags, metadata.quality_score

# Enrich symbols
enriched = await enrichment.enrich_symbol(name, qualified_name, "class", code)
# enriched.summary, enriched.tags, enriched.related_concepts

# Enrich documents
enriched = await enrichment.enrich_document(content, "doc.md", "markdown")
# enriched.summary, enriched.tags, enriched.chunks

FileEnhancer¶

Enhances symbol documentation using LLM with full source code context:

from repo_ctx.analysis.file_enhancer import FileEnhancer

enhancer = FileEnhancer(
    llm_service=llm,
    source_root=Path("./src"),
)

# Enhance a single file
result = await enhancer.enhance_file(
    file_path="mymodule/service.py",
    symbols=[{"name": "MyClass", "type": "class"}],
    language="python",
)
# result = {"file": ..., "documentation": ..., "symbols": [...]}

# Enhance multiple files in parallel (recommended)
results = await enhancer.enhance_files_parallel(
    files_data=[...],
    max_concurrency=5,  # Control parallel LLM requests
    progress_callback=lambda f, i, n: print(f"Processing {f} ({i}/{n})"),
)

Configuration Constants:

Constant	Default	Description
`MAX_CODE_CHARS`	12000	Max source code chars per prompt (~3000 tokens)
`MAX_RETRIES`	3	Retry attempts for empty LLM responses
`RETRY_DELAY`	1.0	Seconds between retry attempts

CodebaseSummarizer¶

Generates consolidated codebase summaries from symbol documentation:

from repo_ctx.analysis.codebase_summarizer import CodebaseSummarizer

summarizer = CodebaseSummarizer(llm_service=llm)

# Generate business summary (default)
summary = await summarizer.generate_summary(
    repo_ctx_path=Path(".repo-ctx"),
    project_name="MyProject",
    summary_type="business",  # or "technical"
)

# Generate and save to file
output_path = await summarizer.generate_and_save(
    repo_ctx_path=Path(".repo-ctx"),
    project_name="MyProject",
    output_filename="CODEBASE_SUMMARY.md",
)

Summary Types:

Type	Description
`business`	Non-technical summary for stakeholders (executive summary, capabilities, value)
`technical`	Architecture overview, core modules, design patterns, technical debt

Configuration Constants:

Constant	Default	Description
`MAX_CONTEXT_CHARS`	100000	Max chars for LLM context (~25000 tokens)
`MAX_README_CHARS`	8000	Max README chars to include (~2000 tokens)
`MAX_RETRIES`	3	Retry attempts for empty LLM responses
`RETRY_DELAY`	1.0	Seconds between retry attempts

InteractiveGraphGenerator¶

Generates interactive HTML dependency graphs using vis.js:

from repo_ctx.analysis.interactive_graph import InteractiveGraphGenerator
from pathlib import Path

generator = InteractiveGraphGenerator()

# Generate from nodes and edges
html_path = generator.generate(
    nodes=[
        {"id": "main.py", "type": "file"},
        {"id": "utils.py", "type": "file"},
    ],
    edges=[
        ("main.py", "utils.py", "import"),
    ],
    output_path=Path(".repo-ctx/architecture/dependencies.html"),
    title="My Project Dependencies",
    cycles=[["main.py", "utils.py"]],  # Optional: highlight cycle nodes
    layers=[{"level": 0, "nodes": ["main.py"]}],  # Optional: layer info
)

# Generate from architecture data dict (from DumpService)
html_path = generator.generate_from_arch_data(
    arch_data={
        "nodes": ["main.py", "utils.py"],
        "edges": [("main.py", "utils.py", "import")],
        "cycles": {"cycles": []},
        "layers": {"layers": []},
    },
    output_path=Path("dependencies.html"),
    title="Architecture Overview",
)

Features:

Feature	Description
Search	Find nodes by name
Filter	Show all, high-connectivity, or cycle nodes
Layout	Force-directed or hierarchical layouts
Node info	Click nodes to see connections
Color coding	Nodes colored by type (file, class, package, cycle)
Sidebar	Statistics, legend, and selected node details

Usage in DumpService:

The interactive graph is automatically generated during dump with MEDIUM or FULL level: - Output: .repo-ctx/architecture/dependencies.html - Linked from ARCHITECTURE_SUMMARY.md

Provider Abstraction¶

GitProvider Interface¶

class GitProvider(ABC):
    @abstractmethod
    async def get_project(path: str) -> Project

    @abstractmethod
    async def get_file_tree(project, ref: str) -> list[str]

    @abstractmethod
    async def read_file(project, path: str, ref: str) -> str

    @abstractmethod
    async def get_branches(project) -> list[str]

    @abstractmethod
    async def get_tags(project) -> list[str]

Adding a New Provider¶

Create providers/newprovider.py implementing GitProvider
Register in providers/factory.py
Add config support in config.py
Write tests in tests/test_providers_newprovider.py

Database Schema¶

erDiagram
    libraries ||--o{ versions : has
    versions ||--o{ documents : contains
    libraries ||--o{ symbols : has
    symbols ||--o{ dependencies : has

    libraries {
        int id PK
        string group_name
        string project_name
        string description
        string provider
        timestamp last_indexed
    }

    versions {
        int id PK
        int library_id FK
        string version_tag
        string commit_sha
    }

    documents {
        int id PK
        int version_id FK
        string file_path
        text content
        string content_type
        int tokens
    }

    symbols {
        int id PK
        int library_id FK
        string name
        string qualified_name
        string symbol_type
        string file_path
        int line_start
        int line_end
        text signature
        string visibility
        string language
        text documentation
    }

    dependencies {
        int id PK
        int library_id FK
        string source_name
        string target_name
        string dependency_type
        string file_path
        int line
    }

Development Setup¶

Prerequisites¶

Python 3.10+
uv (recommended) or pip
Git

Installation¶

git clone https://github.com/anthropics/repo-ctx.git
cd repo-ctx

# Install with development dependencies
uv pip install -e ".[dev]"

# Or using pip
pip install -e ".[dev]"

Optional: Joern¶

# Requires Java 19+
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" | bash
export PATH="$HOME/bin/joern:$PATH"
joern --version

Testing¶

Running Tests¶

# All tests
pytest tests/ -v

# With coverage
pytest tests/ -v --cov=repo_ctx

# Specific categories
pytest tests/test_client.py -v           # Unified client
pytest tests/test_chunking.py -v         # Chunking service
pytest tests/test_enrichment.py -v       # Enrichment service
pytest tests/test_mcp_server_context.py  # MCP context
pytest tests/test_analysis_*.py -v       # Analysis tests
pytest tests/test_providers_*.py -v      # Provider tests

Test Coverage Requirements¶

Overall: >80%
New features: >90%
Critical paths: 100%

Mocking Strategy¶

Git APIs: Mock python-gitlab, PyGithub responses
File System: Use tmp_path fixture
Database: Use :memory: SQLite
External Services: Mock HTTP responses

Adding Features¶

New Service¶

Create services/new_service.py:

from repo_ctx.services.base import BaseService, ServiceContext

class NewService(BaseService):
    def __init__(self, context: ServiceContext, **kwargs):
        super().__init__(context)
        # Initialize

    async def operation(self, param: str) -> Result:
        # Implementation
        pass

Add factory in services/__init__.py:

def create_new_service(context: ServiceContext, **kwargs) -> NewService:
    return NewService(context, **kwargs)

Export in __all__
Write tests in tests/test_new_service.py

New MCP Tool¶

Add tool schema in mcp_tools_ctx.py or mcp_server.py
Implement handler
Add tests

New CLI Command¶

Add command in cli/flat_commands.py
Write tests

New Language Extractor¶

Create analysis/language_extractor.py
Register in analysis/code_analyzer.py
Add tree-sitter dependency
Write tests

Key Classes Reference¶

Class	Module	Purpose
`RepoCtxClient`	`client/client.py`	Unified client interface
`CLIContext`	`cli/context.py`	CLI context wrapper
`MCPServerContext`	`mcp/server_context.py`	MCP context wrapper
`ServiceContext`	`services/base.py`	Service dependency injection
`ChunkingService`	`services/chunking.py`	Content chunking
`EnrichmentService`	`services/enrichment.py`	LLM metadata
`FileEnhancer`	`analysis/file_enhancer.py`	LLM file documentation with parallel processing
`CodebaseSummarizer`	`analysis/codebase_summarizer.py`	Business/technical codebase summaries
`InteractiveGraphGenerator`	`analysis/interactive_graph.py`	vis.js interactive HTML graph
`CodeAnalyzer`	`analysis/code_analyzer.py`	Code analysis
`ArchitectureAnalyzer`	`analysis/architecture.py`	Architecture analysis
`GitProvider`	`providers/base.py`	Provider interface