Architecture Analysis Guide¶

This guide provides in-depth documentation of repo-ctx's architecture analysis capabilities, including algorithms, data structures, and practical examples.

Table of Contents¶

Overview
Use Cases
Dependency Structure Matrix (DSM)
Cycle Detection
Layer Detection
Architecture Rules
Structural Metrics (XS)
Hotspot Detection
Dependency Graphs
Implementation Reference
CLI and MCP Tools
LLM Integration for Software Modernization
Best Practices

Overview¶

repo-ctx provides comprehensive architecture analysis capabilities for understanding, visualizing, and enforcing code structure:

                     Dependency Graph
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
   ┌─────────┐      ┌──────────┐      ┌─────────┐
   │   DSM   │      │  Cycles  │      │ Layers  │
   │ Matrix  │      │ Tangles  │      │ Detect  │
   └─────────┘      └──────────┘      └─────────┘
         │                 │                 │
         └─────────────────┼─────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │   Architecture  │
                  │      Rules      │
                  └─────────────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │    XS Metrics   │
                  │   & Hotspots    │
                  └─────────────────┘

Analysis Types¶

Analysis	Purpose	Use Case
DSM	Visualize dependencies as matrix	Identify coupling patterns
Cycles	Detect cyclic dependencies	Find architectural tangles
Layers	Discover natural layering	Understand code structure
Rules	Enforce architecture	Prevent violations
XS Metrics	Quantify complexity	Track technical debt
Hotspots	Find problematic nodes	Prioritize refactoring

Use Cases¶

This section explains when to use each analysis tool and what problems they help solve.

When to Use DSM¶

Best for: Understanding coupling patterns, visualizing dependencies, identifying clusters of tightly-coupled code.

Scenario	DSM Helps By
"How tangled is this codebase?"	Showing dependency density as a matrix - dense matrices = high coupling
"Which modules are too interconnected?"	Highlighting off-diagonal marks that indicate cross-module dependencies
"Is this a layered architecture?"	A triangular matrix indicates clean layers; scattered marks reveal violations
"What will break if I change this?"	Column shows what depends on a component; row shows what it depends on
Code review / onboarding	Quick visual overview of codebase structure

Example Use Case: Before a major refactoring, generate a DSM to understand which components are tightly coupled. Target the densest areas first to reduce blast radius of changes.

repo-ctx dsm ./src --type module -f text

When to Use Cycle Detection¶

Best for: Finding circular dependencies that prevent modularization, block independent testing, or cause maintenance nightmares.

Scenario	Cycle Detection Helps By
"Why can't I test this module in isolation?"	Cycles mean you can't test A without B and vice versa
"Why does changing X break seemingly unrelated Y?"	Cycles create hidden coupling paths
"How do I split this monolith?"	Cycles must be broken before modules can be extracted
"Why is compilation so slow?"	Cycles prevent incremental builds
CI/CD pipeline failures	Enforce no-new-cycles policy in CI

Example Use Case: Your microservice extraction project is stuck because services and repositories have cycles. Use cycle detection to find the specific edges to break.

repo-ctx cycles ./src --type class -f json

When to Use Layer Detection¶

Best for: Understanding implicit architecture, discovering natural boundaries, planning reorganization.

Scenario	Layer Detection Helps By
"What's the structure of this inherited codebase?"	Auto-discovers layers without needing documentation
"Are we following our stated architecture?"	Compares detected layers against documented design
"How should I organize new code?"	Shows where similar code naturally belongs
"Which components are truly foundational?"	Level-0 layers are depended on by everything
"What's the dependency direction?"	Higher layers depend on lower, never reverse

Example Use Case: You joined a project with no architecture docs. Use layer detection to reverse-engineer the actual structure and understand the dependency hierarchy.

repo-ctx layers ./src --type module

When to Use Architecture Rules¶

Best for: Enforcing architectural constraints, preventing drift, defining team standards.

Scenario	Architecture Rules Help By
"Enforce Clean Architecture"	Define layers and block upward dependencies
"Prevent feature coupling"	Forbid dependencies between feature modules
"Legacy code migration"	Gradually enforce new structure while allowing old
"Team alignment"	Document and automatically enforce architecture decisions
"PR reviews"	Catch violations before they merge

Example Use Case: Your team agreed on Clean Architecture but violations keep appearing. Define rules in YAML and run in CI to catch violations early.

# architecture.yaml defines the rules
repo-ctx architecture ./src --rules architecture.yaml

When to Use XS Metrics¶

Best for: Quantifying technical debt, tracking improvement over time, prioritizing refactoring efforts.

Scenario	XS Metrics Help By
"How bad is our technical debt?"	Single score + grade for quick assessment
"Is code quality improving or degrading?"	Track XS score over commits/sprints
"Where should we focus refactoring?"	Breakdown shows cycles vs coupling vs violations
"Justify refactoring to management"	Quantifiable metrics instead of "feels bad"
"Compare modules"	Run on different directories to compare health

Example Use Case: You're planning a refactoring sprint. Use XS metrics to identify which module has the worst score, then use the breakdown to understand why.

repo-ctx metrics ./src -r architecture.yaml -f json

When to Use Hotspot Detection¶

Best for: Finding the most problematic components, prioritizing tactical fixes.

Scenario	Hotspot Detection Helps By
"What's the riskiest part of the codebase?"	High-severity hotspots = highest risk
"Which class should I refactor first?"	Hotspots sorted by severity
"What's causing cascading failures?"	Cycle participants often trigger ripple effects
"God class detection"	High coupling hotspots are often too-large classes

Example Use Case: Your bug count is concentrated in certain areas. Use hotspot detection to see if those areas correlate with architectural problems.

repo-ctx metrics ./src | grep -A 10 "Hotspots"

Decision Matrix¶

I want to...	Use
Get a visual overview of dependencies	DSM
Find and break circular dependencies	Cycles
Understand the layered structure	Layers
Enforce architectural rules in CI	Architecture
Quantify and track technical debt	Metrics
Find worst offenders to fix first	Hotspots

Combined Workflow¶

For comprehensive architecture analysis, use tools in this order:

DSM - Get the big picture
Layers - Understand the natural structure
Cycles - Identify critical problems
Architecture - Define and enforce rules
Metrics - Quantify and track progress

Dependency Structure Matrix (DSM)¶

Concept¶

A Dependency Structure Matrix (DSM) is a square matrix representation of dependencies where: - Rows and columns represent code elements (classes, modules, files) - Cell (i, j) indicates element i depends on element j - A triangular (lower or upper) matrix indicates no cycles - clean layered architecture - Non-triangular patterns reveal cyclic dependencies

Example¶

Consider this code structure:

Controller → Service → Repository → Database
     ↓
   Utility

DSM Representation:

              Controller  Service  Repository  Database  Utility
Controller         .         1          0          0        1
Service            0         .          1          0        0
Repository         0         0          .          1        0
Database           0         0          0          .        0
Utility            0         0          0          0        .

Reading: "Controller depends on Service (1) and Utility (1)"

With Cycles¶

If Repository also imports Controller (cycle):

              Controller  Service  Repository  Database  Utility
Controller         .         1          0          0        1
Service            0         .          1          0        0
Repository         1         0          .          1        0    ← Cycle!
Database           0         0          0          .        0
Utility            0         0          0          0        .

The 1 at (Repository, Controller) breaks the triangular pattern, indicating a cycle.

Algorithm¶

def build_dsm(graph):
    """Build DSM from dependency graph.

    1. Collect all nodes
    2. Sort nodes (optionally by layer/partitioning)
    3. Build NxN matrix
    4. For each edge (a → b): matrix[index(a)][index(b)] = 1
    5. Detect cycles via non-triangular cells
    """
    nodes = sorted(graph.nodes.keys())
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    node_index = {node: i for i, node in enumerate(nodes)}

    for edge in graph.edges:
        i = node_index[edge.source]
        j = node_index[edge.target]
        matrix[i][j] += 1  # Count dependencies

    return DSMResult(matrix, nodes)

Complexity: O(N + E) where N = nodes, E = edges

CLI Usage¶

# Generate DSM for local code
repo-ctx dsm ./src --type class

# Output as JSON
repo-ctx dsm ./src -f json

# Different graph types
repo-ctx dsm ./src --type file
repo-ctx dsm ./src --type module

Output Example¶

DSM: ./src (class graph)
Size: 5x5 | Cycles: 1

      Ctrl Svc  Repo DB   Util
Ctrl  .    1    0    0    1
Svc   0    .    1    0    0
Repo  1    0    .    1    0    ← Cycle indicator
DB    0    0    0    .    0
Util  0    0    0    0    .

Cycles detected in: Controller ↔ Repository

Cycle Detection¶

Concept¶

Cyclic dependencies (tangles) are a major source of architectural problems: - Make code harder to understand - Prevent independent testing - Create ripple effects during changes - Block modularization efforts

Tarjan's Algorithm¶

We use Tarjan's Strongly Connected Components (SCC) algorithm to detect cycles:

def tarjan_scc(graph):
    """Find all strongly connected components.

    A SCC with more than one node indicates a cycle.
    Time complexity: O(V + E)
    """
    index_counter = [0]
    stack = []
    lowlinks = {}
    index = {}
    on_stack = {}
    sccs = []

    def strongconnect(node):
        index[node] = index_counter[0]
        lowlinks[node] = index_counter[0]
        index_counter[0] += 1
        stack.append(node)
        on_stack[node] = True

        for neighbor in graph.neighbors(node):
            if neighbor not in index:
                strongconnect(neighbor)
                lowlinks[node] = min(lowlinks[node], lowlinks[neighbor])
            elif on_stack.get(neighbor, False):
                lowlinks[node] = min(lowlinks[node], index[neighbor])

        # If node is root of SCC
        if lowlinks[node] == index[node]:
            scc = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                scc.append(w)
                if w == node:
                    break
            if len(scc) > 1:  # Cycle exists
                sccs.append(scc)

    for node in graph.nodes:
        if node not in index:
            strongconnect(node)

    return sccs

Breakup Suggestions¶

For each cycle, we calculate which edge removal would have minimal impact:

def suggest_breakup(cycle, graph):
    """Suggest edges to remove to break the cycle.

    Strategy: Find edge with lowest "impact score":
    - Impact = importance_of_source × importance_of_target
    - Importance = in_degree + out_degree (connectivity)

    Removing edges between less central nodes has less impact.
    """
    suggestions = []
    for edge in cycle.edges:
        source_importance = graph.degree(edge.source)
        target_importance = graph.degree(edge.target)
        impact = source_importance + target_importance

        suggestions.append(BreakupSuggestion(
            edge_to_remove=edge,
            impact_score=impact,
            reason=f"Remove {edge.source} → {edge.target}"
        ))

    return sorted(suggestions, key=lambda s: s.impact_score)

CLI Usage¶

# Detect cycles
repo-ctx cycles ./src --type class

# JSON output with breakup suggestions
repo-ctx cycles ./src -f json

Output Example¶

Cycle Detection: ./src

Found 2 cycles:

Cycle 1: (3 nodes, impact: 8.5)
  Nodes: Controller → Service → Repository → Controller
  Edges: 3
  Breakup suggestions:
    1. Remove Repository → Controller (lowest impact)
    2. Remove Service → Repository

Cycle 2: (2 nodes, impact: 4.0)
  Nodes: ModelA ↔ ModelB
  Edges: 2
  Breakup suggestions:
    1. Remove ModelB → ModelA

Layer Detection¶

Concept¶

Automatically discover the natural layering of code based on dependency patterns: - Bottom layers (level 0): Nodes with no outgoing dependencies (providers) - Top layers (higher levels): Nodes that depend on lower layers (consumers) - Cycles are collapsed into single "super-nodes" before analysis

Algorithm¶

def detect_layers(graph):
    """Detect layers using topological analysis.

    1. Detect cycles and collapse into super-nodes
    2. Calculate level for each super-node:
       level(node) = max(level(dependencies)) + 1
       level(leaf) = 0
    3. Group nodes by level
    """
    # Step 1: Collapse cycles
    cycles = tarjan_scc(graph)
    super_nodes = collapse_cycles(graph, cycles)

    # Step 2: Calculate levels (reverse BFS)
    levels = {}

    def get_level(node, visited):
        if node in levels:
            return levels[node]
        if node in visited:
            return 0  # Cycle in super-graph (shouldn't happen)

        visited.add(node)
        max_dep_level = -1

        for dep in super_nodes.dependencies(node):
            dep_level = get_level(dep, visited)
            max_dep_level = max(max_dep_level, dep_level)

        levels[node] = max_dep_level + 1
        return levels[node]

    for node in super_nodes:
        get_level(node, set())

    # Step 3: Group by level
    layers = defaultdict(list)
    for node, level in levels.items():
        layers[level].extend(super_nodes.original_nodes(node))

    return [LayerInfo(f"Layer {l}", l, nodes)
            for l, nodes in sorted(layers.items())]

Example¶

A → B → C → D
    ↓
    E → F

Detected Layers:
- Layer 0 (bottom): D, F  (no outgoing deps)
- Layer 1: C, E
- Layer 2: B
- Layer 3 (top): A

With a cycle B ↔ C, they collapse to same layer:

- Layer 0: D, F
- Layer 1: B, C, E  (B and C collapsed due to cycle)
- Layer 2: A

CLI Usage¶

# Detect layers
repo-ctx layers ./src --type class

# JSON output
repo-ctx layers ./src -f json

Output Example¶

Detected 4 layer(s) in ./src

Graph type: class
Total nodes: 42

Level 3: Layer 3
  Nodes (5): AppController, MainController, ApiController ...

Level 2: Layer 2
  Nodes (12): UserService, AuthService, DataService ...

Level 1: Layer 1
  Nodes (15): UserRepository, ConfigRepository ...

Level 0: Layer 0
  Nodes (10): DatabaseConnection, Logger, Constants ...

Architecture Rules¶

Concept¶

Define and enforce architectural constraints using a YAML-based DSL: - Layer rules: Define which layers can depend on which - Forbidden rules: Block specific dependency patterns - Allowed rules: Exceptions to forbidden rules

Rule Types¶

1. Layer Ordering Rules¶

layers:
  - name: presentation
    patterns: ["*.controller.*", "*.view.*"]
    above: business
  - name: business
    patterns: ["*.service.*", "*.usecase.*"]
    above: data
  - name: data
    patterns: ["*.repository.*", "*.dao.*"]

Meaning: presentation can depend on business, but business cannot depend on presentation.

2. Forbidden Dependency Rules¶

forbidden:
  - from: "*.controller.*"
    to: "*.repository.*"
    reason: "Controllers must not access repositories directly"
  - from: "*.data.*"
    to: "*.ui.*"
    reason: "Data layer must not depend on UI"

3. Allowed Rules (Exceptions)¶

allowed:
  - from: "*.service.*"
    to: "*.repository.*"
    reason: "Services can access repositories"

Pattern Matching¶

Patterns support: - Exact match: "UserService" matches node UserService - Wildcards: "*.service.*" matches com.app.service.UserService - Prefix match: "ui" matches ui.View, ui.Controller

Violation Detection Algorithm¶

def check_rules(graph, rules):
    """Check all rules against dependency graph.

    For each edge (source → target):
    1. Check forbidden rules: if matches both patterns → violation
    2. Check layer rules: if source in lower layer, target in upper → violation
    3. Check allowed rules: if explicitly allowed → skip violation
    """
    violations = []

    for edge in graph.edges:
        # Check forbidden rules
        for rule in rules.forbidden_rules:
            if matches(edge.source, rule.from_pattern) and \
               matches(edge.target, rule.to_pattern):
                if not is_explicitly_allowed(edge, rules):
                    violations.append(Violation(
                        rule_name="forbidden",
                        source=edge.source,
                        target=edge.target,
                        message=rule.reason
                    ))

        # Check layer rules
        for rule in rules.layer_rules:
            source_in_lower = matches(edge.source, rule.lower_layer)
            target_in_upper = matches(edge.target, rule.upper_layer)
            if source_in_lower and target_in_upper:
                violations.append(Violation(
                    rule_name="layer_order",
                    source=edge.source,
                    target=edge.target,
                    message=f"{rule.lower_layer} cannot depend on {rule.upper_layer}"
                ))

    return violations

Complete Example¶

Architecture Rules File (architecture.yaml):

name: "Clean Architecture"
description: "Layered architecture with strict boundaries"

layers:
  - name: ui
    patterns: ["*.ui.*", "*.view.*", "*.controller.*"]
    above: domain
  - name: domain
    patterns: ["*.domain.*", "*.service.*", "*.usecase.*"]
    above: data
  - name: data
    patterns: ["*.data.*", "*.repository.*", "*.dao.*"]

forbidden:
  - from: "*.data.*"
    to: "*.ui.*"
    reason: "Data layer must not depend on UI"
  - from: "*.controller.*"
    to: "*.dao.*"
    reason: "Controllers should use services, not DAOs"

allowed:
  - from: "*.ui.*"
    to: "*.domain.*"
    reason: "UI can access domain services"

CLI Usage¶

# Check architecture rules
repo-ctx architecture ./src --rules architecture.yaml

# JSON output
repo-ctx architecture ./src -r rules.yaml -f json

Output Example¶

Architecture Analysis: ./src

Graph type: class
Total nodes: 42
Rules: architecture.yaml
Architecture: Clean Architecture

Layers (3):
  Level 2: ui (8 nodes)
  Level 1: domain (18 nodes)
  Level 0: data (16 nodes)

Violations (2):
  [ERROR] layer_order: data cannot depend on ui
    data.UserRepository -> ui.UserView
    at src/data/user_repository.py:45

  [ERROR] forbidden: Controllers should use services, not DAOs
    controller.UserController -> dao.UserDao
    at src/controller/user_controller.py:23

Structural Metrics (XS)¶

Concept¶

XS (eXcess Structural complexity) quantifies architectural health as a single score: - Higher score = more complexity/problems - Score is broken down into contributing factors - Grade (A-F) provides quick assessment

XS Score Formula¶

XS = cycle_contribution + coupling_contribution + size_contribution + violation_contribution

Where:
- cycle_contribution    = cycle_count × 15.0
- coupling_contribution = max(0, avg_coupling - 3.0) × node_count × 2.0
- size_contribution     = max(0, node_count - 50) × 0.1
- violation_contribution = violation_count × 5.0

Component Explanations¶

Component	Weight	Meaning
Cycles	15/cycle	Each cycle adds significant complexity
Coupling	2.0/excess	High interconnection makes changes risky
Size	0.1/node	Very large modules harder to maintain
Violations	5/violation	Architecture violations indicate problems

Grade Thresholds¶

Grade	XS Score	Description
A	0-20	Excellent - Clean architecture
B	20-40	Good - Well-structured
C	40-60	Moderate - Notable issues
D	60-80	Poor - Significant problems
F	80+	Critical - Major refactoring needed

Algorithm¶

class XSCalculator:
    CYCLE_WEIGHT = 15.0
    COUPLING_WEIGHT = 2.0
    SIZE_WEIGHT = 0.1
    VIOLATION_WEIGHT = 5.0
    COUPLING_THRESHOLD = 3.0
    SIZE_THRESHOLD = 50

    def calculate(self, graph, violations=None):
        violations = violations or []

        # Detect cycles
        cycles = CycleDetector().detect(graph)
        cycle_contribution = len(cycles) * self.CYCLE_WEIGHT

        # Calculate coupling
        avg_coupling = len(graph.edges) / len(graph.nodes) if graph.nodes else 0
        excess_coupling = max(0, avg_coupling - self.COUPLING_THRESHOLD)
        coupling_contribution = excess_coupling * len(graph.nodes) * self.COUPLING_WEIGHT

        # Calculate size penalty
        excess_size = max(0, len(graph.nodes) - self.SIZE_THRESHOLD)
        size_contribution = excess_size * self.SIZE_WEIGHT

        # Calculate violation contribution
        violation_contribution = len(violations) * self.VIOLATION_WEIGHT

        # Total score
        xs_score = (cycle_contribution + coupling_contribution +
                    size_contribution + violation_contribution)

        # Assign grade
        grade = self.grade(xs_score)

        return XSMetrics(
            xs_score=xs_score,
            grade=grade,
            cycle_count=len(cycles),
            # ... other fields
        )

CLI Usage¶

# Calculate XS metrics
repo-ctx metrics ./src --type class

# With architecture rules (violations add to score)
repo-ctx metrics ./src --rules architecture.yaml

# JSON output
repo-ctx metrics ./src -f json

Output Example¶

Structural Metrics: ./src

Grade: C - Moderate - Notable structural issues that should be addressed
XS Score: 47.5

Nodes: 42 | Edges: 68
Cycles: 2 | Violations: 3

Score Breakdown:
  Cycles:       30.0
  Coupling:      7.5
  Size:          0.0
  Violations:   10.0

Hotspots (3):
  ServiceManager (cycle_participant) - severity: 6.0
  DataAccess (high_coupling) - severity: 5.5
  Controller (cycle_participant) - severity: 5.0

Hotspot Detection¶

Concept¶

Hotspots are nodes that contribute disproportionately to complexity: - High coupling: Nodes with many incoming/outgoing dependencies - Cycle participants: Nodes involved in cyclic dependencies

Detection Algorithm¶

class HotspotDetector:
    HIGH_COUPLING_THRESHOLD = 5  # Total connections

    def detect(self, graph):
        hotspots = []

        # Calculate node degrees
        in_degree = defaultdict(int)
        out_degree = defaultdict(int)
        for edge in graph.edges:
            out_degree[edge.source] += 1
            in_degree[edge.target] += 1

        # Detect high coupling hotspots
        for node_id, node in graph.nodes.items():
            total = in_degree[node_id] + out_degree[node_id]
            if total >= self.HIGH_COUPLING_THRESHOLD:
                hotspots.append(Hotspot(
                    node_id=node_id,
                    reason="high_coupling",
                    severity=min(10.0, total / 2.0),
                    details={"connections": total}
                ))

        # Detect cycle participants
        cycles = CycleDetector().detect(graph)
        cycle_counts = defaultdict(int)
        for cycle in cycles:
            for node_id in cycle.nodes:
                cycle_counts[node_id] += 1

        for node_id, count in cycle_counts.items():
            hotspots.append(Hotspot(
                node_id=node_id,
                reason="cycle_participant",
                severity=min(10.0, count * 2.0 + 3.0),
                details={"cycle_count": count}
            ))

        return sorted(hotspots, key=lambda h: h.severity, reverse=True)

Severity Scale¶

Severity	Meaning	Action
8-10	Critical	Immediate refactoring
5-7	High	Plan refactoring
3-4	Moderate	Monitor
1-2	Low	Note for future

Dependency Graphs¶

Concept¶

Dependency graphs visualize relationships between code elements (classes, functions, files, modules) as directed graphs. repo-ctx supports multiple graph types and automatically extracts various edge types from code analysis.

Graph Types¶

Type	Description	Best For
`class`	Class-level dependencies including inheritance, calls, and usage	Understanding class relationships
`function`	Function/method call graph	Tracing execution flow
`file`	File-level import dependencies	Build order, modularization
`module`	Package/module dependencies	High-level architecture

Edge Types (Relationship Types)¶

The class dependency graph extracts these relationship types:

Edge Type	Description	Example
`INHERITS`	Class inheritance	`class Dog extends Animal`
`IMPLEMENTS`	Interface implementation	`class Service implements IService`
`CALLS`	Method/function calls between classes	`userService.findUser()`
`USES`	Type usage (field, parameter, return type)	`def process(user: User)`
`INSTANTIATES`	Object creation	`user = User()`
`IMPORTS`	Import/require statements	`import UserService from './user'`

CLI Usage¶

# Generate class dependency graph (default)
repo-ctx graph ./src --type class

# Generate function call graph
repo-ctx graph ./src --type function

# File-level dependencies
repo-ctx graph ./src --type file

# Module/package dependencies
repo-ctx graph ./src --type module

# Output formats
repo-ctx graph ./src --format json      # JSON graph data
repo-ctx graph ./src --format dot       # GraphViz DOT format
repo-ctx graph ./src --format graphml   # GraphML for yEd/Gephi
repo-ctx graph ./src --format mermaid   # Mermaid diagram syntax

Example: Class Dependency Graph¶

Consider this Python code:

# models.py
class Entity:
    pass

class User(Entity):
    def __init__(self, name: str):
        self.name = name

# services.py
from models import User

class UserRepository:
    def find(self, id: int) -> User:
        return User("John")

class UserService:
    def __init__(self, repo: UserRepository):
        self.repo = repo

    def get_user(self, id: int) -> User:
        return self.repo.find(id)

# controller.py
from services import UserService

class UserController:
    def __init__(self):
        self.service = UserService(UserRepository())

    def handle_request(self, user_id: int):
        user = self.service.get_user(user_id)
        return user.name

Generated Class Dependency Graph:

flowchart LR
    Entity[Entity]
    User[User]
    UserRepository[UserRepository]
    UserService[UserService]
    UserController[UserController]

    User -->|inherits| Entity
    UserRepository -->|calls| User
    UserService -->|calls| UserRepository
    UserService -->|calls| User
    UserController -->|calls| UserService
    UserController -->|calls| UserRepository

Running the command:

repo-ctx graph ./src --type class --format mermaid

Output:

flowchart LR
    N0[Entity]
    N1[User]
    N2[UserRepository]
    N3[UserService]
    N4[UserController]
    N1 -->|inherits| N0
    N2 -->|calls| N1
    N3 -->|calls| N2
    N3 -->|calls| N1
    N4 -->|calls| N3
    N4 -->|calls| N2

Visualizing with GraphViz¶

# Generate DOT file
repo-ctx graph ./src --type class --format dot > class_deps.dot

# Render to PNG
dot -Tpng class_deps.dot -o class_deps.png

# Render to SVG (better for large graphs)
dot -Tsvg class_deps.dot -o class_deps.svg

JSON Output Structure¶

{
  "graph_type": "class",
  "nodes": [
    {
      "id": "src/models.py:User",
      "name": "User",
      "type": "class",
      "file_path": "src/models.py",
      "labels": ["Symbol", "Class"]
    }
  ],
  "edges": [
    {
      "source": "src/models.py:User",
      "target": "src/models.py:Entity",
      "relation": "inherits",
      "metadata": {}
    },
    {
      "source": "src/services.py:UserService",
      "target": "src/services.py:UserRepository",
      "relation": "calls",
      "metadata": {"from_method": "UserService.get_user"}
    }
  ],
  "stats": {
    "node_count": 5,
    "edge_count": 6
  }
}

MCP Tool Usage¶

// Class dependency graph
await mcp.call("ctx-graph", {
  path: "./src",
  graphType: "class",
  outputFormat: "json"
});

// Function call graph for specific file
await mcp.call("ctx-graph", {
  path: "./src/services.py",
  graphType: "function",
  depth: 3  // Limit traversal depth
});

// For indexed repositories
await mcp.call("ctx-graph", {
  repoId: "/owner/project",
  graphType: "module"
});

Integration with Architecture Analysis¶

Dependency graphs are the foundation for other architecture tools:

Dependency Graph
       │
       ├──► DSM (matrix visualization)
       ├──► Cycle Detection (find tangles)
       ├──► Layer Detection (discover structure)
       ├──► XS Metrics (quantify complexity)
       └──► Architecture Rules (enforce constraints)

Export to .repo-ctx Directory¶

The dump command exports dependency graphs as part of the architecture analysis:

# Full dump includes dependency graphs
repo-ctx dump ./my-project --level full

# Created files:
# .repo-ctx/architecture/
#   ├── class_dependencies.mmd      # Mermaid class graph
#   ├── function_dependencies.mmd   # Mermaid function graph
#   ├── file_dependencies.mmd       # Mermaid file graph
#   └── architecture.md             # Summary with embedded diagrams

Persist to Neo4j Graph Database¶

For advanced querying and visualization, persist the graph to Neo4j:

# Dump with graph persistence
repo-ctx dump ./my-project --persist-graph

# Configure Neo4j connection
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=your-password

Cypher Queries After Persistence:

-- Find all classes that call UserService
MATCH (caller:Class)-[:CALLS]->(target:Class {name: 'UserService'})
RETURN caller.name, caller.file_path

-- Find inheritance hierarchy
MATCH path = (child:Class)-[:INHERITS*]->(parent:Class)
WHERE child.name = 'AdminUser'
RETURN path

-- Impact analysis: what depends on User class
MATCH (dependent)-[:CALLS|USES|INSTANTIATES]->(target:Class {name: 'User'})
RETURN dependent.name, type(r) as relationship

-- Find cycles in class dependencies
MATCH path = (a:Class)-[:CALLS*2..]->(a)
RETURN path
LIMIT 10

Implementation Reference¶

Core Classes¶

Class	File	Purpose
`DSMBuilder`	`architecture.py`	Build DSM matrix
`DSMResult`	`architecture.py`	DSM data + visualization
`CycleDetector`	`architecture.py`	Tarjan's SCC algorithm
`CycleInfo`	`architecture.py`	Cycle data + breakup suggestions
`LayerDetector`	`architecture_rules.py`	Topological layer detection
`LayerInfo`	`architecture_rules.py`	Layer data
`ArchitectureRules`	`architecture_rules.py`	Rule definition + checking
`RuleParser`	`architecture_rules.py`	YAML rule parsing
`XSCalculator`	`structural_metrics.py`	XS score calculation
`XSMetrics`	`structural_metrics.py`	Metrics data
`HotspotDetector`	`structural_metrics.py`	Complexity hotspot detection

Data Flow¶

# Typical analysis flow
from repo_ctx.analysis import (
    CodeAnalyzer, DependencyGraph, GraphType,
    DSMBuilder, CycleDetector, LayerDetector,
    RuleParser, XSCalculator, HotspotDetector
)

# 1. Analyze code
analyzer = CodeAnalyzer()
results = analyzer.analyze_files(files)
symbols = analyzer.aggregate_symbols(results)
dependencies = analyzer.aggregate_dependencies(results)

# 2. Build dependency graph
graph_builder = DependencyGraph()
graph = graph_builder.build(
    symbols=symbols,
    dependencies=dependencies,
    graph_type=GraphType.CLASS
)

# 3. Generate DSM
dsm = DSMBuilder().build(graph)

# 4. Detect cycles
cycles = CycleDetector().detect(graph)

# 5. Detect layers
layers = LayerDetector().detect(graph)

# 6. Check rules (optional)
rules = RuleParser().parse_file("architecture.yaml")
violations = rules.check(graph)

# 7. Calculate metrics
metrics = XSCalculator().calculate_from_input(
    XSInput(graph=graph, violations=violations)
)

# 8. Find hotspots
hotspots = HotspotDetector().detect(graph)

CLI and MCP Tools¶

CLI Commands¶

Command	Description
`repo-ctx graph <target>`	Generate dependency graph
`repo-ctx dsm <target>`	Generate DSM matrix
`repo-ctx cycles <target>`	Detect cycles
`repo-ctx layers <target>`	Detect layers
`repo-ctx architecture <target>`	Check architecture rules
`repo-ctx metrics <target>`	Calculate XS metrics
`repo-ctx dump <target>`	Export analysis to .repo-ctx directory

Common Options¶

--type, -t {file,module,class,function}  # Graph type (default: class)
--format, -f {text,json}                  # Output format (default: text)
--rules, -r <file>                        # Architecture rules YAML file

MCP Tools¶

Tool	Description
`ctx-graph`	Generate dependency graph (class, function, file, module)
`ctx-dsm`	Generate DSM matrix
`ctx-cycles`	Detect cycles with breakup suggestions
`ctx-layers`	Detect architectural layers
`ctx-architecture`	Check architecture rules
`ctx-metrics`	Calculate XS metrics

MCP Tool Examples¶

// DSM Analysis
await mcp.call("ctx-dsm", {
  path: "./src",
  graphType: "class",
  outputFormat: "json"
});

// Cycle Detection
await mcp.call("ctx-cycles", {
  path: "./src",
  graphType: "class"
});

// Layer Detection
await mcp.call("ctx-layers", {
  repoId: "/owner/repo",
  graphType: "module"
});

// Architecture Rules
await mcp.call("ctx-architecture", {
  path: "./src",
  rulesYaml: `
layers:
  - name: ui
    above: domain
  - name: domain
    above: data
forbidden:
  - from: "*.data.*"
    to: "*.ui.*"
`
});

// XS Metrics
await mcp.call("ctx-metrics", {
  path: "./src",
  rulesFile: "architecture.yaml",
  outputFormat: "json"
});

LLM Integration for Software Modernization¶

This section explains how to use repo-ctx's MCP tools with Large Language Models for software modernization, refactoring, and code analysis tasks.

Why MCP + LLM?¶

Challenge	How MCP Tools Help
LLMs lack codebase context	MCP provides structured code analysis on-demand
Token limits for large codebases	Filtered, focused results stay within context
Hallucinated dependencies	Real dependency graphs from actual code
Architecture decisions without data	Metrics and cycles provide quantified evidence

Core Integration Patterns¶

Pattern 1: Architecture Assessment¶

Use this pattern when evaluating a codebase for modernization.

User: "Assess this codebase's architecture health"

LLM Workflow:
1. ctx-metrics → Get XS score and hotspots
2. ctx-cycles  → Identify circular dependencies
3. ctx-layers  → Understand current structure
4. ctx-dsm     → Visualize coupling patterns
5. Synthesize findings into assessment report

MCP Tool Sequence:

// Step 1: Get overall health metrics
const metrics = await mcp.call("ctx-metrics", {
  path: "./src",
  outputFormat: "json"
});

// Step 2: Find cycles that need breaking
const cycles = await mcp.call("ctx-cycles", {
  path: "./src",
  graphType: "module"
});

// Step 3: Understand layer structure
const layers = await mcp.call("ctx-layers", {
  path: "./src",
  graphType: "class"
});

// LLM can now provide data-driven assessment

Example LLM Response:

## Architecture Assessment

**Overall Health: Grade C (XS Score: 47.5)**

### Key Findings:

1. **Cyclic Dependencies (2 cycles)**
   - `services ↔ repositories` - Blocks independent testing
   - `models ↔ validators` - Causes build order issues

2. **Detected Layers (4)**
   - API → Services → Repositories → Database
   - ⚠️ 3 violations of layer ordering detected

3. **Hotspots**
   - `ServiceManager` (cycle participant, severity: 6.0)
   - `DataAccess` (high coupling: 12 connections)

### Recommended Actions:
1. Break `services → repositories` cycle by introducing interfaces
2. Refactor `DataAccess` to reduce coupling
3. Move validation logic to dedicated layer

Pattern 2: Refactoring Planning¶

Use this pattern when planning a specific refactoring.

User: "I want to extract the user module into a microservice"

LLM Workflow:
1. ctx-cycles     → Check if user module is in a cycle
2. ctx-dsm        → See what depends on user module
3. ctx-analyze    → Get user module's public interface
4. ctx-graph      → Visualize dependencies to cut
5. Create extraction plan with specific steps

MCP Tool Sequence:

// Step 1: Check for cycles involving user module
const cycles = await mcp.call("ctx-cycles", {
  path: "./src",
  graphType: "module"
});
// Look for cycles containing "user"

// Step 2: Get DSM to see all dependencies
const dsm = await mcp.call("ctx-dsm", {
  path: "./src",
  graphType: "module",
  outputFormat: "json"
});

// Step 3: Analyze user module's public interface
const analysis = await mcp.call("ctx-analyze", {
  path: "./src/user",
  language: "python",
  outputFormat: "json"
});

// Step 4: Get detailed dependency graph
const graph = await mcp.call("ctx-graph", {
  path: "./src",
  graphType: "class",
  outputFormat: "json"
});

Example LLM Response:

## Microservice Extraction Plan: User Module

### Current State Analysis

**Dependencies TO user module (must be updated):**
- `auth.login()` → `user.authenticate()`
- `billing.get_customer()` → `user.find_by_id()`
- `notifications.send()` → `user.get_email()`

**Dependencies FROM user module (must break or replicate):**
- `user.save()` → `database.execute()`
- `user.hash_password()` → `crypto.hash()`

⚠️ **Cycle Detected:** `user ↔ auth`
Must break before extraction!

### Extraction Steps

1. **Break user ↔ auth cycle**
   - Move `auth.validate_token()` to user module
   - Or: Create shared `TokenService` interface

2. **Define API boundary**
   - Public endpoints: `find_by_id`, `authenticate`, `create`, `update`
   - Internal: `hash_password`, `validate_email`

3. **Create anti-corruption layer**
   - Replace direct calls with REST/gRPC clients
   - Update 3 calling modules

4. **Database separation**
   - Extract users table to separate schema
   - Create data migration scripts

Pattern 3: Code Quality Enforcement¶

Use this pattern in CI/CD or code review workflows.

User: "Check if this PR violates our architecture rules"

LLM Workflow:
1. ctx-architecture → Check rules against current code
2. Report violations with specific file locations
3. Suggest fixes for each violation

MCP Tool Sequence:

// Check architecture rules
const violations = await mcp.call("ctx-architecture", {
  path: "./src",
  rulesYaml: `
    name: "Clean Architecture"
    layers:
      - name: api
        patterns: ["*.api.*", "*.controller.*"]
        above: service
      - name: service
        patterns: ["*.service.*"]
        above: repository
      - name: repository
        patterns: ["*.repository.*", "*.dao.*"]
    forbidden:
      - from: "*.controller.*"
        to: "*.repository.*"
        reason: "Controllers must use services"
  `,
  outputFormat: "json"
});

Example LLM Response:

## Architecture Violation Report

❌ **2 violations detected**

### Violation 1: Layer Order
- **Rule:** Controllers must use services
- **Location:** `src/api/user_controller.py:45`
- **Issue:** `UserController` directly calls `UserRepository.find_by_id()`
- **Fix:** Inject `UserService` and call `UserService.get_user()`

### Violation 2: Forbidden Dependency
- **Rule:** Repository cannot depend on API
- **Location:** `src/data/order_repository.py:23`
- **Issue:** Imports `from api.serializers import OrderDTO`
- **Fix:** Move `OrderDTO` to `models/` or create `repository/dto.py`

Pattern 4: Legacy Code Understanding¶

Use this pattern when onboarding to an unfamiliar codebase.

User: "Help me understand this legacy codebase structure"

LLM Workflow:
1. ctx-layers    → Discover implicit architecture
2. ctx-docs      → Get any existing documentation
3. ctx-analyze   → Extract key classes and functions
4. ctx-dsm       → Visualize relationships
5. Create architectural overview document

MCP Tool Sequence:

// Step 1: Discover layers
const layers = await mcp.call("ctx-layers", {
  path: "./src",
  graphType: "module"
});

// Step 2: Get existing docs
const docs = await mcp.call("ctx-docs", {
  repository: "/owner/legacy-app",
  max_tokens: 8000,
  include: ["code", "diagrams"]
});

// Step 3: Analyze main components
const analysis = await mcp.call("ctx-analyze", {
  path: "./src",
  outputFormat: "json"
});

// Step 4: Get dependency overview
const dsm = await mcp.call("ctx-dsm", {
  path: "./src",
  graphType: "module"
});

Modernization Workflows¶

Workflow 1: Monolith to Microservices¶

graph TD
    A[Analyze Current State] --> B[Identify Bounded Contexts]
    B --> C[Detect Cycles to Break]
    C --> D[Plan Extraction Order]
    D --> E[Execute & Validate]

    A --> |ctx-metrics| A1[XS Score]
    A --> |ctx-dsm| A2[Dependency Matrix]

    B --> |ctx-layers| B1[Layer Analysis]
    B --> |ctx-analyze| B2[Module Boundaries]

    C --> |ctx-cycles| C1[Cycle Detection]

    D --> |LLM Synthesis| D1[Extraction Plan]

    E --> |ctx-architecture| E1[Validate Rules]
    E --> |ctx-metrics| E2[Track Improvement]

Prompt Template:

I'm modernizing a monolithic application. Use these MCP tools to analyze:

1. Run ctx-metrics on ./src to get the overall health
2. Run ctx-dsm on ./src to see module coupling
3. Run ctx-cycles to find circular dependencies
4. Run ctx-layers to understand the current structure

Then create a microservice extraction plan that:
- Lists modules in order of extraction (least coupled first)
- Identifies cycles that must be broken before extraction
- Estimates complexity based on coupling scores
- Suggests API boundaries based on current interfaces

Workflow 2: Framework Migration¶

graph TD
    A[Inventory Current Usage] --> B[Identify Patterns]
    B --> C[Plan Replacement Strategy]
    C --> D[Track Migration Progress]

    A --> |ctx-analyze| A1[Symbol Inventory]
    A --> |ctx-find-symbol| A2[Usage Search]

    B --> |ctx-graph| B1[Dependency Patterns]

    C --> |LLM Plan| C1[Migration Steps]

    D --> |ctx-metrics| D1[Complexity Reduction]

Prompt Template:

I need to migrate from Framework X to Framework Y. Use these MCP tools:

1. Run ctx-find-symbol to find all uses of "FrameworkX"
2. Run ctx-analyze to inventory current framework patterns
3. Run ctx-graph to see how framework usage is distributed

Then create a migration plan that:
- Lists all files/classes using Framework X
- Groups them by migration complexity (simple, moderate, complex)
- Identifies shared utilities that can be migrated once
- Suggests migration order (least dependent first)

Workflow 3: Technical Debt Reduction¶

graph TD
    A[Measure Current Debt] --> B[Prioritize Hotspots]
    B --> C[Plan Refactoring Sprint]
    C --> D[Execute & Remeasure]

    A --> |ctx-metrics| A1[XS Score Baseline]

    B --> |Hotspot Detection| B1[Priority List]
    B --> |ctx-cycles| B2[Cycle Impact]

    C --> |LLM Plan| C1[Sprint Tasks]

    D --> |ctx-metrics| D1[Score Improvement]

Prompt Template:

I want to reduce technical debt in this codebase. Use these MCP tools:

1. Run ctx-metrics to get current XS score and hotspots
2. Run ctx-cycles to find all cyclic dependencies
3. Run ctx-architecture with our rules to find violations

Then create a debt reduction plan that:
- Ranks hotspots by severity and fix effort
- Calculates expected XS score improvement per fix
- Creates sprint-sized work packages
- Defines acceptance criteria (target XS scores)

Best Practices for LLM Integration¶

1. Use JSON Output for Structured Analysis¶

// Always request JSON for programmatic processing
await mcp.call("ctx-metrics", {
  path: "./src",
  outputFormat: "json"  // Not "text"
});

2. Limit Token Usage for Large Codebases¶

// For docs, always set max_tokens
await mcp.call("ctx-docs", {
  repository: "/owner/repo",
  max_tokens: 8000,
  include: ["code"]  // Only what you need
});

3. Chain Tools Logically¶

Good: metrics → cycles → architecture (progressive detail)
Bad:  dsm → docs → analyze (unrelated sequence)

4. Cache Results for Multi-Turn Conversations¶

LLMs should store analysis results in context rather than re-running tools:

Turn 1: User asks for assessment
  → Run all analysis tools, store in context

Turn 2: User asks follow-up about cycles
  → Use cached cycle data, don't re-run

Turn 3: User asks about specific module
  → Run targeted analyze on that module only

5. Combine with CPGQL for Deep Analysis¶

// High-level architecture
const cycles = await mcp.call("ctx-cycles", {...});

// Deep data flow analysis for specific cycle
const dataflow = await mcp.call("ctx-cpg-query", {
  path: "./src/problematic_module",
  query: "cpg.method.name('save').reachableBy(cpg.method.name('validate')).l"
});

Example Prompts for Common Scenarios¶

Architecture Health Check¶

Analyze this codebase's architecture:
1. Use ctx-metrics to get the XS score
2. Use ctx-cycles to find circular dependencies
3. Use ctx-layers to understand the structure
4. Use ctx-architecture with standard Clean Architecture rules

Provide a report with:
- Overall health grade and score
- Top 3 issues to address
- Specific refactoring suggestions with file locations

Pre-PR Review¶

Before I submit this PR, check for architecture violations:
1. Run ctx-architecture with our team's rules
2. Run ctx-cycles to ensure no new cycles
3. Run ctx-metrics and compare to baseline

Flag any issues that would fail our architecture checks.

Dependency Analysis¶

I need to understand what will break if I change module X:
1. Use ctx-dsm to see what depends on X (check column for X)
2. Use ctx-graph with graphType=class for detailed view
3. Use ctx-find-symbol to find all uses of X's public APIs

List all affected files and the specific functions/classes that need updates.

Modernization Roadmap¶

Create a modernization roadmap for this legacy codebase:
1. Use ctx-metrics to assess current state
2. Use ctx-layers to understand architecture
3. Use ctx-cycles to identify blocking issues
4. Use ctx-analyze to inventory key components

Create a phased plan with:
- Phase 1: Critical cycle breaking
- Phase 2: Layer enforcement
- Phase 3: Module extraction
- Phase 4: Framework updates

Include effort estimates based on coupling and complexity.

Best Practices¶

1. Start with DSM¶

Use DSM to get a quick overview of coupling patterns:

repo-ctx dsm ./src --type module -f text

2. Address Cycles First¶

Cycles are often the root cause of other problems:

repo-ctx cycles ./src

3. Define Architecture Rules Early¶

Create architecture.yaml to enforce boundaries:

layers:
  - name: api
    patterns: ["*.api.*", "*.controller.*"]
    above: service
  - name: service
    patterns: ["*.service.*"]
    above: repository
  - name: repository
    patterns: ["*.repository.*", "*.dao.*"]

forbidden:
  - from: "*.repository.*"
    to: "*.api.*"
    reason: "Repository layer cannot depend on API layer"

4. Track Metrics Over Time¶

Regularly check XS score to catch degradation:

repo-ctx metrics ./src -r architecture.yaml -f json >> metrics_history.jsonl

5. Focus on Hotspots¶

Address highest-severity hotspots first:

repo-ctx metrics ./src | grep "Hotspots" -A 10

References¶

DSM Overview - DSM theory
Tarjan's Algorithm - SCC detection
Clean Architecture - Layered architecture