Architecture Analysis Guide¶
This guide provides in-depth documentation of repo-ctx's architecture analysis capabilities, including algorithms, data structures, and practical examples.
Table of Contents¶
- Overview
- Use Cases
- Dependency Structure Matrix (DSM)
- Cycle Detection
- Layer Detection
- Architecture Rules
- Structural Metrics (XS)
- Hotspot Detection
- Dependency Graphs
- Implementation Reference
- CLI and MCP Tools
- LLM Integration for Software Modernization
- Best Practices
Overview¶
repo-ctx provides comprehensive architecture analysis capabilities for understanding, visualizing, and enforcing code structure:
               Dependency Graph
                       │
     ┌─────────────────┼─────────────────┐
     │                 │                 │
     ▼                 ▼                 ▼
┌─────────┐       ┌──────────┐      ┌─────────┐
│   DSM   │       │  Cycles  │      │ Layers  │
│ Matrix  │       │ Tangles  │      │ Detect  │
└─────────┘       └──────────┘      └─────────┘
     │                 │                 │
     └─────────────────┼─────────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │  Architecture   │
              │      Rules      │
              └─────────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │   XS Metrics    │
              │   & Hotspots    │
              └─────────────────┘
Analysis Types¶
| Analysis | Purpose | Use Case |
|---|---|---|
| DSM | Visualize dependencies as matrix | Identify coupling patterns |
| Cycles | Detect cyclic dependencies | Find architectural tangles |
| Layers | Discover natural layering | Understand code structure |
| Rules | Enforce architecture | Prevent violations |
| XS Metrics | Quantify complexity | Track technical debt |
| Hotspots | Find problematic nodes | Prioritize refactoring |
Use Cases¶
This section explains when to use each analysis tool and what problems they help solve.
When to Use DSM¶
Best for: Understanding coupling patterns, visualizing dependencies, identifying clusters of tightly-coupled code.
| Scenario | DSM Helps By |
|---|---|
| "How tangled is this codebase?" | Showing dependency density as a matrix - dense matrices = high coupling |
| "Which modules are too interconnected?" | Highlighting off-diagonal marks that indicate cross-module dependencies |
| "Is this a layered architecture?" | A triangular matrix indicates clean layers; scattered marks reveal violations |
| "What will break if I change this?" | Column shows what depends on a component; row shows what it depends on |
| Code review / onboarding | Quick visual overview of codebase structure |
Example Use Case: Before a major refactoring, generate a DSM to understand which components are tightly coupled. Target the densest areas first to reduce blast radius of changes.
When to Use Cycle Detection¶
Best for: Finding circular dependencies that prevent modularization, block independent testing, or cause maintenance nightmares.
| Scenario | Cycle Detection Helps By |
|---|---|
| "Why can't I test this module in isolation?" | Cycles mean you can't test A without B and vice versa |
| "Why does changing X break seemingly unrelated Y?" | Cycles create hidden coupling paths |
| "How do I split this monolith?" | Cycles must be broken before modules can be extracted |
| "Why is compilation so slow?" | Modules in a cycle must be rebuilt together, which limits incremental builds |
| CI/CD pipeline failures | Enforce no-new-cycles policy in CI |
Example Use Case: Your microservice extraction project is stuck because services and repositories have cycles. Use cycle detection to find the specific edges to break.
When to Use Layer Detection¶
Best for: Understanding implicit architecture, discovering natural boundaries, planning reorganization.
| Scenario | Layer Detection Helps By |
|---|---|
| "What's the structure of this inherited codebase?" | Auto-discovers layers without needing documentation |
| "Are we following our stated architecture?" | Compares detected layers against documented design |
| "How should I organize new code?" | Shows where similar code naturally belongs |
| "Which components are truly foundational?" | Level-0 nodes have no outgoing dependencies; the higher layers build on them |
| "What's the dependency direction?" | Higher layers depend on lower, never reverse |
Example Use Case: You joined a project with no architecture docs. Use layer detection to reverse-engineer the actual structure and understand the dependency hierarchy.
When to Use Architecture Rules¶
Best for: Enforcing architectural constraints, preventing drift, defining team standards.
| Scenario | Architecture Rules Help By |
|---|---|
| "Enforce Clean Architecture" | Define layers and block upward dependencies |
| "Prevent feature coupling" | Forbid dependencies between feature modules |
| "Legacy code migration" | Gradually enforce new structure while allowing old |
| "Team alignment" | Document and automatically enforce architecture decisions |
| "PR reviews" | Catch violations before they merge |
Example Use Case: Your team agreed on Clean Architecture but violations keep appearing. Define rules in YAML and run in CI to catch violations early.
When to Use XS Metrics¶
Best for: Quantifying technical debt, tracking improvement over time, prioritizing refactoring efforts.
| Scenario | XS Metrics Help By |
|---|---|
| "How bad is our technical debt?" | Single score + grade for quick assessment |
| "Is code quality improving or degrading?" | Track XS score over commits/sprints |
| "Where should we focus refactoring?" | Breakdown shows cycles vs coupling vs violations |
| "Justify refactoring to management" | Quantifiable metrics instead of "feels bad" |
| "Compare modules" | Run on different directories to compare health |
Example Use Case: You're planning a refactoring sprint. Use XS metrics to identify which module has the worst score, then use the breakdown to understand why.
When to Use Hotspot Detection¶
Best for: Finding the most problematic components, prioritizing tactical fixes.
| Scenario | Hotspot Detection Helps By |
|---|---|
| "What's the riskiest part of the codebase?" | High-severity hotspots = highest risk |
| "Which class should I refactor first?" | Hotspots sorted by severity |
| "What's causing cascading failures?" | Cycle participants often trigger ripple effects |
| "God class detection" | High coupling hotspots are often too-large classes |
Example Use Case: Your bug count is concentrated in certain areas. Use hotspot detection to see if those areas correlate with architectural problems.
Decision Matrix¶
| I want to... | Use |
|---|---|
| Get a visual overview of dependencies | DSM |
| Find and break circular dependencies | Cycles |
| Understand the layered structure | Layers |
| Enforce architectural rules in CI | Architecture |
| Quantify and track technical debt | Metrics |
| Find worst offenders to fix first | Hotspots |
Combined Workflow¶
For comprehensive architecture analysis, use tools in this order:
- DSM - Get the big picture
- Layers - Understand the natural structure
- Cycles - Identify critical problems
- Architecture - Define and enforce rules
- Metrics - Quantify and track progress
Dependency Structure Matrix (DSM)¶
Concept¶
A Dependency Structure Matrix (DSM) is a square matrix representation of dependencies where:
- Rows and columns represent code elements (classes, modules, files)
- Cell (i, j) indicates element i depends on element j
- If the elements can be ordered so that the matrix is triangular (lower or upper), there are no cycles: a clean layered architecture
- Cells that stay on the wrong side of the diagonal under every ordering reveal cyclic dependencies
Example¶
Consider this code structure:
DSM Representation:
Controller Service Repository Database Utility
Controller . 1 0 0 1
Service 0 . 1 0 0
Repository 0 0 . 1 0
Database 0 0 0 . 0
Utility 0 0 0 0 .
Reading: "Controller depends on Service (1) and Utility (1)"
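The row and column reading can be sketched in a few lines of Python. This is a minimal illustration, not repo-ctx code: `NAMES`, `MATRIX`, `dependencies_of`, and `dependents_of` are hypothetical names, and the diagonal (shown as `.` above) is stored as 0.

```python
# Hypothetical in-memory copy of the example matrix above.
NAMES = ["Controller", "Service", "Repository", "Database", "Utility"]
MATRIX = [
    [0, 1, 0, 0, 1],  # Controller
    [0, 0, 1, 0, 0],  # Service
    [0, 0, 0, 1, 0],  # Repository
    [0, 0, 0, 0, 0],  # Database
    [0, 0, 0, 0, 0],  # Utility
]


def dependencies_of(name):
    """Row lookup: the elements that `name` depends on."""
    i = NAMES.index(name)
    return [NAMES[j] for j, cell in enumerate(MATRIX[i]) if cell]


def dependents_of(name):
    """Column lookup: the elements that depend on `name`."""
    j = NAMES.index(name)
    return [NAMES[i] for i, row in enumerate(MATRIX) if row[j]]


print(dependencies_of("Controller"))  # ['Service', 'Utility']
print(dependents_of("Repository"))    # ['Service']
```

Reading a row answers "what does this element need?", while reading a column answers "what breaks if this element changes?".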
With Cycles¶
If Repository also imports Controller (cycle):
Controller Service Repository Database Utility
Controller . 1 0 0 1
Service 0 . 1 0 0
Repository 1 0 . 1 0 ← Cycle!
Database 0 0 0 . 0
Utility 0 0 0 0 .
The 1 at (Repository, Controller) breaks the triangular pattern, indicating a cycle.
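With the row order used here, every intended dependency sits above the diagonal, so a non-zero cell below it points back "up" the ordering. The sketch below lists such cells; the names `NAMES`, `MATRIX`, and `below_diagonal` are illustrative assumptions, and a cell found this way is only a hint unless the ordering already reflects the intended layers.

```python
# Hypothetical copy of the matrix with the extra Repository -> Controller edge.
NAMES = ["Controller", "Service", "Repository", "Database", "Utility"]
MATRIX = [
    [0, 1, 0, 0, 1],  # Controller
    [0, 0, 1, 0, 0],  # Service
    [1, 0, 0, 1, 0],  # Repository (now also depends on Controller)
    [0, 0, 0, 0, 0],  # Database
    [0, 0, 0, 0, 0],  # Utility
]


def below_diagonal(matrix, names):
    """Return (source, target) pairs for non-zero cells below the diagonal."""
    return [
        (names[i], names[j])
        for i, row in enumerate(matrix)
        for j in range(i)  # only columns left of the diagonal
        if row[j]
    ]


print(below_diagonal(MATRIX, NAMES))  # [('Repository', 'Controller')]
```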
Algorithm¶
def build_dsm(graph):
    """Build DSM from dependency graph.

    1. Collect all nodes
    2. Sort nodes (optionally by layer/partitioning)
    3. Build NxN matrix
    4. For each edge (a → b): matrix[index(a)][index(b)] += 1
    5. Detect cycles via non-triangular cells (not shown in this excerpt)
    """
    nodes = sorted(graph.nodes.keys())  # alphabetical here; a layer order reads better
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    node_index = {node: i for i, node in enumerate(nodes)}
    for edge in graph.edges:
        i = node_index[edge.source]
        j = node_index[edge.target]
        matrix[i][j] += 1  # Count dependencies, so a cell may exceed 1
    return DSMResult(matrix, nodes)
Complexity: O(N² + E) where N = nodes, E = edges. Allocating the N×N matrix dominates; filling it from the edge list costs O(E).
CLI Usage¶
# Generate DSM for local code
repo-ctx dsm ./src --type class
# Output as JSON
repo-ctx dsm ./src -f json
# Different graph types
repo-ctx dsm ./src --type file
repo-ctx dsm ./src --type module
Output Example¶
DSM: ./src (class graph)
Size: 5x5 | Cycles: 1
Ctrl Svc Repo DB Util
Ctrl . 1 0 0 1
Svc 0 . 1 0 0
Repo 1 0 . 1 0 ← Cycle indicator
DB 0 0 0 . 0
Util 0 0 0 0 .
Cycles detected in: Controller ↔ Repository
Cycle Detection¶
Concept¶
Cyclic dependencies (tangles) are a major source of architectural problems. They:
- Make code harder to understand
- Prevent independent testing
- Create ripple effects during changes
- Block modularization efforts
Tarjan's Algorithm¶
We use Tarjan's Strongly Connected Components (SCC) algorithm to detect cycles:
def tarjan_scc(graph):
    """Find all strongly connected components.

    An SCC with more than one node indicates a cycle.
    Time complexity: O(V + E)
    """
    index_counter = [0]
    stack = []
    lowlinks = {}
    index = {}
    on_stack = {}
    sccs = []

    def strongconnect(node):
        index[node] = index_counter[0]
        lowlinks[node] = index_counter[0]
        index_counter[0] += 1
        stack.append(node)
        on_stack[node] = True
        for neighbor in graph.neighbors(node):
            if neighbor not in index:
                strongconnect(neighbor)
                lowlinks[node] = min(lowlinks[node], lowlinks[neighbor])
            elif on_stack.get(neighbor, False):
                lowlinks[node] = min(lowlinks[node], index[neighbor])
        # If node is root of SCC
        if lowlinks[node] == index[node]:
            scc = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                scc.append(w)
                if w == node:
                    break
            if len(scc) > 1:  # Cycle exists
                sccs.append(scc)

    for node in graph.nodes:
        if node not in index:
            strongconnect(node)
    return sccs
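Because the listing keeps only components with more than one node, a node that depends on itself (for example, a recursive function in a function graph) would not be reported. If such self-dependencies matter, a separate check could look like the sketch below. The adjacency-dict input and the name `self_dependencies` are assumptions for illustration, not part of repo-ctx.

```python
def self_dependencies(adjacency):
    """Return the nodes that list themselves among their own dependencies."""
    return sorted(node for node, deps in adjacency.items() if node in deps)


# Hypothetical function-level graph: node -> set of nodes it depends on.
adjacency = {
    "parse": {"parse", "tokenize"},  # recursive: depends on itself
    "tokenize": set(),
    "render": {"parse"},
}

print(self_dependencies(adjacency))  # ['parse']
```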
Breakup Suggestions¶
For each cycle, we calculate which edge removal would have minimal impact:
def suggest_breakup(cycle, graph):
    """Suggest edges to remove to break the cycle.

    Strategy: Find edge with lowest "impact score":
    - Impact = importance_of_source + importance_of_target
    - Importance = in_degree + out_degree (connectivity)
    Removing edges between less central nodes has less impact.
    """
    suggestions = []
    for edge in cycle.edges:
        source_importance = graph.degree(edge.source)
        target_importance = graph.degree(edge.target)
        impact = source_importance + target_importance
        suggestions.append(BreakupSuggestion(
            edge_to_remove=edge,
            impact_score=impact,
            reason=f"Remove {edge.source} → {edge.target}"
        ))
    # Lowest impact first: the cheapest edge to cut leads the list
    return sorted(suggestions, key=lambda s: s.impact_score)
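To make the ranking concrete, here is a small worked example that scores the edges of one cycle by the sum of their endpoint degrees, as the code above does. The edge list is invented for illustration, and plain tuples stand in for the graph and suggestion classes.

```python
from collections import Counter

# Hypothetical dependency edges as (source, target); the first three form a cycle.
edges = [
    ("Controller", "Service"),
    ("Service", "Repository"),
    ("Repository", "Controller"),
    ("Controller", "Utility"),
    ("Service", "Utility"),
    ("Service", "Database"),
]
cycle_edges = edges[:3]

# Degree = in-degree + out-degree, counted over the whole graph.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Rank cycle edges by impact, lowest first.
ranked = sorted(cycle_edges, key=lambda e: degree[e[0]] + degree[e[1]])
for source, target in ranked:
    print(f"{source} -> {target}: impact {degree[source] + degree[target]}")
```

Here Repository has the fewest connections, so the edge from Repository to Controller gets the lowest impact (5) and is suggested first.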
CLI Usage¶
# Detect cycles
repo-ctx cycles ./src --type class
# JSON output with breakup suggestions
repo-ctx cycles ./src -f json
Output Example¶
Cycle Detection: ./src
Found 2 cycles:
Cycle 1: (3 nodes, impact: 8.5)
Nodes: Controller → Service → Repository → Controller
Edges: 3
Breakup suggestions:
1. Remove Repository → Controller (lowest impact)
2. Remove Service → Repository
Cycle 2: (2 nodes, impact: 4.0)
Nodes: ModelA ↔ ModelB
Edges: 2
Breakup suggestions:
1. Remove ModelB → ModelA
Layer Detection¶
Concept¶
Automatically discover the natural layering of code based on dependency patterns:
- Bottom layers (level 0): Nodes with no outgoing dependencies (providers)
- Top layers (higher levels): Nodes that depend on lower layers (consumers)
- Cycles are collapsed into single "super-nodes" before analysis
Algorithm¶
from collections import defaultdict

def detect_layers(graph):
    """Detect layers using topological analysis.

    1. Detect cycles and collapse into super-nodes
    2. Calculate level for each super-node:
       level(node) = max(level(dependencies)) + 1
       level(leaf) = 0
    3. Group nodes by level
    """
    # Step 1: Collapse cycles
    cycles = tarjan_scc(graph)
    super_nodes = collapse_cycles(graph, cycles)

    # Step 2: Calculate levels (memoized depth-first traversal)
    levels = {}

    def get_level(node, visited):
        if node in levels:
            return levels[node]
        if node in visited:
            return 0  # Cycle in super-graph (shouldn't happen)
        visited.add(node)
        max_dep_level = -1  # A leaf keeps -1, so its level becomes 0
        for dep in super_nodes.dependencies(node):
            dep_level = get_level(dep, visited)
            max_dep_level = max(max_dep_level, dep_level)
        levels[node] = max_dep_level + 1
        return levels[node]

    for node in super_nodes:
        get_level(node, set())

    # Step 3: Group by level
    layers = defaultdict(list)
    for node, level in levels.items():
        layers[level].extend(super_nodes.original_nodes(node))
    return [LayerInfo(f"Layer {l}", l, nodes)
            for l, nodes in sorted(layers.items())]
Example¶
A → B → C → D
    ↓
    E → F
Detected Layers:
- Layer 0 (bottom): D, F (no outgoing deps)
- Layer 1: C, E
- Layer 2: B
- Layer 3 (top): A
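The layer assignment for this example can be reproduced with a short, self-contained sketch. It assumes the graph is acyclic (so no super-node collapsing is needed) and that the downward arrow in the diagram is the edge B → E, which is what the listed layers imply. `DEPENDENCIES` and `level` are illustrative names.

```python
from collections import defaultdict
from functools import lru_cache

# Node -> nodes it depends on. B -> E is inferred from the listed layers.
DEPENDENCIES = {
    "A": ["B"],
    "B": ["C", "E"],
    "C": ["D"],
    "D": [],
    "E": ["F"],
    "F": [],
}


@lru_cache(maxsize=None)
def level(node):
    """Level 0 for leaves, otherwise one above the deepest dependency."""
    deps = DEPENDENCIES[node]
    return 0 if not deps else 1 + max(level(dep) for dep in deps)


layers = defaultdict(list)
for node in sorted(DEPENDENCIES):
    layers[level(node)].append(node)

for lvl in sorted(layers):
    print(f"Layer {lvl}: {', '.join(layers[lvl])}")
```

The output matches the list above: D and F at Layer 0, C and E at Layer 1, B at Layer 2, and A at Layer 3.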
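With a cycle B ↔ C, the two nodes collapse into one super-node and land in the same layer: D and F stay at Layer 0, E stays at Layer 1, B and C share Layer 2, and A sits at Layer 3.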
CLI Usage¶
# Detect layers
repo-ctx layers ./src --type class
Output Example¶
Detected 4 layer(s) in ./src
Graph type: class
Total nodes: 42
Level 3: Layer 3
Nodes (5): AppController, MainController, ApiController ...
Level 2: Layer 2
Nodes (12): UserService, AuthService, DataService ...
Level 1: Layer 1
Nodes (15): UserRepository, ConfigRepository ...
Level 0: Layer 0
Nodes (10): DatabaseConnection, Logger, Constants ...
Architecture Rules¶
Concept¶
Define and enforce architectural constraints using a YAML-based DSL:
- Layer rules: Define which layers can depend on which
- Forbidden rules: Block specific dependency patterns
- Allowed rules: Exceptions to forbidden rules
Rule Types¶
1. Layer Ordering Rules¶
layers:
  - name: presentation
    patterns: ["*.controller.*", "*.view.*"]
    above: business
  - name: business
    patterns: ["*.service.*", "*.usecase.*"]
    above: data
  - name: data
    patterns: ["*.repository.*", "*.dao.*"]
Meaning: presentation can depend on business, but business cannot depend on presentation.
2. Forbidden Dependency Rules¶
forbidden:
  - from: "*.controller.*"
    to: "*.repository.*"
    reason: "Controllers must not access repositories directly"
  - from: "*.data.*"
    to: "*.ui.*"
    reason: "Data layer must not depend on UI"
3. Allowed Rules (Exceptions)¶
allowed:
  - from: "*.ui.*"
    to: "*.domain.*"
    reason: "UI can access domain services"
Pattern Matching¶
Patterns support:
- Exact match: "UserService" matches node UserService
- Wildcards: "*.service.*" matches com.app.service.UserService
- Prefix match: "ui" matches ui.View, ui.Controller
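One way to picture the three pattern styles is the sketch below, built on Python's standard `fnmatch` module. It is an assumption about how such a matcher might behave, not the repo-ctx implementation: `matches` is a hypothetical helper, and the real matcher may treat name boundaries differently.

```python
from fnmatch import fnmatchcase


def matches(node_name, pattern):
    """Hypothetical matcher covering exact, wildcard, and prefix patterns."""
    if any(ch in pattern for ch in "*?["):
        # Wildcard pattern: shell-style, case-sensitive matching.
        return fnmatchcase(node_name, pattern)
    # No wildcard: exact name, or a dotted prefix such as "ui" for "ui.View".
    return node_name == pattern or node_name.startswith(pattern + ".")


print(matches("UserService", "UserService"))                  # True
print(matches("com.app.service.UserService", "*.service.*"))  # True
print(matches("ui.View", "ui"))                               # True
print(matches("uikit.View", "ui"))                            # False
```

Requiring the dot after a prefix keeps `ui` from accidentally matching unrelated packages such as `uikit`.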
Violation Detection Algorithm¶
def check_rules(graph, rules):
    """Check all rules against dependency graph.

    For each edge (source → target):
    1. Check forbidden rules: if matches both patterns → violation
    2. Check layer rules: if source in lower layer, target in upper → violation
    3. Check allowed rules: if explicitly allowed → skip violation
    """
    violations = []
    for edge in graph.edges:
        # Check forbidden rules
        for rule in rules.forbidden_rules:
            if matches(edge.source, rule.from_pattern) and \
               matches(edge.target, rule.to_pattern):
                if not is_explicitly_allowed(edge, rules):
                    violations.append(Violation(
                        rule_name="forbidden",
                        source=edge.source,
                        target=edge.target,
                        message=rule.reason
                    ))
        # Check layer rules
        for rule in rules.layer_rules:
            source_in_lower = matches(edge.source, rule.lower_layer)
            target_in_upper = matches(edge.target, rule.upper_layer)
            if source_in_lower and target_in_upper:
                violations.append(Violation(
                    rule_name="layer_order",
                    source=edge.source,
                    target=edge.target,
                    message=f"{rule.lower_layer} cannot depend on {rule.upper_layer}"
                ))
    return violations
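The interplay of forbidden rules and allowed exceptions can be tried out with a self-contained sketch. Everything here is illustrative: the rule tuples, the `Health*` exception, and the edge list are invented, `fnmatchcase` stands in for the real pattern matcher, and layer-order checks are left out for brevity.

```python
from fnmatch import fnmatchcase

# Hypothetical rules as (from_pattern, to_pattern, reason) tuples.
FORBIDDEN = [
    ("*.controller.*", "*.repository.*",
     "Controllers must not access repositories directly"),
]
# Hypothetical exception: health checks may touch repositories.
ALLOWED = [("*.controller.Health*", "*.repository.*")]

EDGES = [
    ("app.controller.UserController", "app.service.UserService"),
    ("app.controller.UserController", "app.repository.UserRepository"),
    ("app.controller.HealthController", "app.repository.PingRepository"),
]


def hits(edge, from_pattern, to_pattern):
    """True when both ends of the edge match the rule's patterns."""
    source, target = edge
    return fnmatchcase(source, from_pattern) and fnmatchcase(target, to_pattern)


def check(edges):
    """Return (source, target, reason) for each forbidden, non-exempt edge."""
    violations = []
    for edge in edges:
        if any(hits(edge, f, t) for f, t in ALLOWED):
            continue  # explicitly allowed, so forbidden rules are skipped
        for f, t, reason in FORBIDDEN:
            if hits(edge, f, t):
                violations.append((edge[0], edge[1], reason))
    return violations


for source, target, reason in check(EDGES):
    print(f"{source} -> {target}: {reason}")
```

Only the UserController to UserRepository edge is reported; the HealthController edge matches the forbidden rule but is exempted by the allowed entry.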
Complete Example¶
Architecture Rules File (architecture.yaml):
name: "Clean Architecture"
description: "Layered architecture with strict boundaries"
layers:
  - name: ui
    patterns: ["*.ui.*", "*.view.*", "*.controller.*"]
    above: domain
  - name: domain
    patterns: ["*.domain.*", "*.service.*", "*.usecase.*"]
    above: data
  - name: data
    patterns: ["*.data.*", "*.repository.*", "*.dao.*"]
forbidden:
  - from: "*.data.*"
    to: "*.ui.*"
    reason: "Data layer must not depend on UI"
  - from: "*.controller.*"
    to: "*.dao.*"
    reason: "Controllers should use services, not DAOs"
allowed:
  - from: "*.ui.*"
    to: "*.domain.*"
    reason: "UI can access domain services"
CLI Usage¶
# Check architecture rules
repo-ctx architecture ./src --rules architecture.yaml
# JSON output
repo-ctx architecture ./src -r rules.yaml -f json
Output Example¶
Architecture Analysis: ./src
Graph type: class
Total nodes: 42
Rules: architecture.yaml
Architecture: Clean Architecture
Layers (3):
Level 2: ui (8 nodes)
Level 1: domain (18 nodes)
Level 0: data (16 nodes)
Violations (2):
[ERROR] layer_order: data cannot depend on ui
data.UserRepository -> ui.UserView
at src/data/user_repository.py:45
[ERROR] forbidden: Controllers should use services, not DAOs
controller.UserController -> dao.UserDao
at src/controller/user_controller.py:23
Structural Metrics (XS)¶
Concept¶
XS (eXcess Structural complexity) quantifies architectural health as a single score:
- Higher score = more complexity/problems
- Score is broken down into contributing factors
- Grade (A-F) provides quick assessment
XS Score Formula¶
XS = cycle_contribution + coupling_contribution + size_contribution + violation_contribution
Where:
- cycle_contribution = cycle_count × 15.0
- coupling_contribution = max(0, avg_coupling - 3.0) × node_count × 2.0
- size_contribution = max(0, node_count - 50) × 0.1
- violation_contribution = violation_count × 5.0
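A worked example helps check the arithmetic. The sketch below applies the formula to invented inputs, taking average coupling as edges divided by nodes; the function name `xs_score` and the sample numbers are assumptions for illustration.

```python
def xs_score(node_count, edge_count, cycle_count, violation_count):
    """Return the XS score and its four contributions, per the formula above."""
    avg_coupling = edge_count / node_count if node_count else 0.0
    parts = {
        "cycles": cycle_count * 15.0,
        "coupling": max(0.0, avg_coupling - 3.0) * node_count * 2.0,
        "size": max(0, node_count - 50) * 0.1,
        "violations": violation_count * 5.0,
    }
    return sum(parts.values()), parts


# Hypothetical codebase: 60 nodes, 195 edges, 1 cycle, 2 rule violations.
score, parts = xs_score(node_count=60, edge_count=195,
                        cycle_count=1, violation_count=2)
for name, value in parts.items():
    print(f"{name:>10}: {value:.1f}")
print(f"{'XS':>10}: {score:.1f}")  # 56.0
```

Average coupling here is 3.25, so only the 0.25 above the threshold counts: 0.25 × 60 × 2.0 = 30.0. Adding 15.0 for the cycle, 1.0 for the ten nodes over 50, and 10.0 for the violations gives 56.0.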
Component Explanations¶
| Component | Weight | Meaning |
|---|---|---|
| Cycles | 15/cycle | Each cycle adds significant complexity |
| Coupling | 2.0/excess | High interconnection makes changes risky |
| Size | 0.1/node | Very large modules harder to maintain |
| Violations | 5/violation | Architecture violations indicate problems |
Grade Thresholds¶
| Grade | XS Score | Description |
|---|---|---|
| A | 0-20 | Excellent - Clean architecture |
| B | 20-40 | Good - Well-structured |
| C | 40-60 | Moderate - Notable issues |
| D | 60-80 | Poor - Significant problems |
| F | 80+ | Critical - Major refactoring needed |
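The table's ranges share their boundary values (20 appears under both A and B), so any implementation has to pick a side. The sketch below assumes each upper bound is exclusive, meaning a score of exactly 20 is graded B. That choice and the name `grade` are assumptions, not confirmed behavior.

```python
def grade(xs_score):
    """Map an XS score to a letter grade; upper bounds are treated as exclusive."""
    for upper_bound, letter in ((20, "A"), (40, "B"), (60, "C"), (80, "D")):
        if xs_score < upper_bound:
            return letter
    return "F"


for sample in (0, 19.9, 20, 47.5, 80):
    print(sample, grade(sample))
```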
Algorithm¶
class XSCalculator:
    CYCLE_WEIGHT = 15.0
    COUPLING_WEIGHT = 2.0
    SIZE_WEIGHT = 0.1
    VIOLATION_WEIGHT = 5.0
    COUPLING_THRESHOLD = 3.0
    SIZE_THRESHOLD = 50

    def calculate(self, graph, violations=None):
        violations = violations or []
        # Detect cycles
        cycles = CycleDetector().detect(graph)
        cycle_contribution = len(cycles) * self.CYCLE_WEIGHT
        # Calculate coupling
        avg_coupling = len(graph.edges) / len(graph.nodes) if graph.nodes else 0
        excess_coupling = max(0, avg_coupling - self.COUPLING_THRESHOLD)
        coupling_contribution = excess_coupling * len(graph.nodes) * self.COUPLING_WEIGHT
        # Calculate size penalty
        excess_size = max(0, len(graph.nodes) - self.SIZE_THRESHOLD)
        size_contribution = excess_size * self.SIZE_WEIGHT
        # Calculate violation contribution
        violation_contribution = len(violations) * self.VIOLATION_WEIGHT
        # Total score
        xs_score = (cycle_contribution + coupling_contribution +
                    size_contribution + violation_contribution)
        # Assign grade
        grade = self.grade(xs_score)
        return XSMetrics(
            xs_score=xs_score,
            grade=grade,
            cycle_count=len(cycles),
            # ... other fields
        )
CLI Usage¶
# Calculate XS metrics
repo-ctx metrics ./src --type class
# With architecture rules (violations add to score)
repo-ctx metrics ./src --rules architecture.yaml
# JSON output
repo-ctx metrics ./src -f json
Output Example¶
Structural Metrics: ./src
Grade: C - Moderate - Notable structural issues that should be addressed
XS Score: 45.0
Nodes: 42 | Edges: 68
Cycles: 2 | Violations: 3
Score Breakdown:
Cycles: 30.0
Coupling: 0.0
Size: 0.0
Violations: 15.0
Hotspots (3):
ServiceManager (cycle_participant) - severity: 6.0
DataAccess (high_coupling) - severity: 5.5
Controller (cycle_participant) - severity: 5.0
Hotspot Detection¶
Concept¶
Hotspots are nodes that contribute disproportionately to complexity:
- High coupling: Nodes with many incoming/outgoing dependencies
- Cycle participants: Nodes involved in cyclic dependencies
Detection Algorithm¶
from collections import defaultdict

class HotspotDetector:
    HIGH_COUPLING_THRESHOLD = 5  # Total connections

    def detect(self, graph):
        hotspots = []
        # Calculate node degrees
        in_degree = defaultdict(int)
        out_degree = defaultdict(int)
        for edge in graph.edges:
            out_degree[edge.source] += 1
            in_degree[edge.target] += 1
        # Detect high coupling hotspots
        for node_id, node in graph.nodes.items():
            total = in_degree[node_id] + out_degree[node_id]
            if total >= self.HIGH_COUPLING_THRESHOLD:
                hotspots.append(Hotspot(
                    node_id=node_id,
                    reason="high_coupling",
                    severity=min(10.0, total / 2.0),
                    details={"connections": total}
                ))
        # Detect cycle participants
        cycles = CycleDetector().detect(graph)
        cycle_counts = defaultdict(int)
        for cycle in cycles:
            for node_id in cycle.nodes:
                cycle_counts[node_id] += 1
        for node_id, count in cycle_counts.items():
            hotspots.append(Hotspot(
                node_id=node_id,
                reason="cycle_participant",
                severity=min(10.0, count * 2.0 + 3.0),
                details={"cycle_count": count}
            ))
        # Most severe hotspots first
        return sorted(hotspots, key=lambda h: h.severity, reverse=True)
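To see how the two severity formulas compare, the following sketch computes them on a small invented graph. The edge list and helper names are assumptions; only the threshold and the two formulas come from the listing above.

```python
from collections import Counter

HIGH_COUPLING_THRESHOLD = 5

# Hypothetical edges as (source, target): "Hub" is connected to six other nodes.
edges = [
    ("A", "Hub"), ("B", "Hub"), ("C", "Hub"),
    ("Hub", "D"), ("Hub", "E"), ("Hub", "F"),
    ("A", "B"),
]

# Total connections per node (in-degree + out-degree).
connections = Counter()
for source, target in edges:
    connections[source] += 1
    connections[target] += 1

# High-coupling severity: half the connection count, capped at 10.
high_coupling = {
    node: min(10.0, total / 2.0)
    for node, total in connections.items()
    if total >= HIGH_COUPLING_THRESHOLD
}
print(high_coupling)  # {'Hub': 3.0}


def cycle_severity(cycle_count):
    """Cycle-participant severity: 3.0 base plus 2.0 per cycle, capped at 10."""
    return min(10.0, cycle_count * 2.0 + 3.0)


print(cycle_severity(1))  # 5.0
print(cycle_severity(4))  # 10.0
```

With these formulas, membership in a single cycle (5.0) already outranks a node with six connections (3.0); a high-coupling hotspot needs 20 connections to reach the cap.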
Severity Scale¶
| Severity | Meaning | Action |
|---|---|---|
| 8-10 | Critical | Immediate refactoring |
| 5-7 | High | Plan refactoring |
| 3-4 | Moderate | Monitor |
| 1-2 | Low | Note for future |
Dependency Graphs¶
Concept¶
Dependency graphs visualize relationships between code elements (classes, functions, files, modules) as directed graphs. repo-ctx supports multiple graph types and automatically extracts various edge types from code analysis.
Graph Types¶
| Type | Description | Best For |
|---|---|---|
| `class` | Class-level dependencies including inheritance, calls, and usage | Understanding class relationships |
| `function` | Function/method call graph | Tracing execution flow |
| `file` | File-level import dependencies | Build order, modularization |
| `module` | Package/module dependencies | High-level architecture |
Edge Types (Relationship Types)¶
The class dependency graph extracts these relationship types:
| Edge Type | Description | Example |
|---|---|---|
| `INHERITS` | Class inheritance | `class Dog extends Animal` |
| `IMPLEMENTS` | Interface implementation | `class Service implements IService` |
| `CALLS` | Method/function calls between classes | `userService.findUser()` |
| `USES` | Type usage (field, parameter, return type) | `def process(user: User)` |
| `INSTANTIATES` | Object creation | `user = User()` |
| `IMPORTS` | Import/require statements | `import UserService from './user'` |
CLI Usage¶
# Generate class dependency graph (default)
repo-ctx graph ./src --type class
# Generate function call graph
repo-ctx graph ./src --type function
# File-level dependencies
repo-ctx graph ./src --type file
# Module/package dependencies
repo-ctx graph ./src --type module
# Output formats
repo-ctx graph ./src --format json # JSON graph data
repo-ctx graph ./src --format dot # GraphViz DOT format
repo-ctx graph ./src --format graphml # GraphML for yEd/Gephi
repo-ctx graph ./src --format mermaid # Mermaid diagram syntax
Example: Class Dependency Graph¶
Consider this Python code:
# models.py
class Entity:
    pass

class User(Entity):
    def __init__(self, name: str):
        self.name = name

# services.py
from models import User

class UserRepository:
    def find(self, id: int) -> User:
        return User("John")

class UserService:
    def __init__(self, repo: UserRepository):
        self.repo = repo

    def get_user(self, id: int) -> User:
        return self.repo.find(id)

# controller.py
from services import UserRepository, UserService  # both are used below

class UserController:
    def __init__(self):
        self.service = UserService(UserRepository())

    def handle_request(self, user_id: int):
        user = self.service.get_user(user_id)
        return user.name
Generated Class Dependency Graph:
flowchart LR
Entity[Entity]
User[User]
UserRepository[UserRepository]
UserService[UserService]
UserController[UserController]
User -->|inherits| Entity
UserRepository -->|calls| User
UserService -->|calls| UserRepository
UserService -->|calls| User
UserController -->|calls| UserService
UserController -->|calls| UserRepository
Running the command:
repo-ctx graph ./src --type class --format mermaid
Output:
flowchart LR
N0[Entity]
N1[User]
N2[UserRepository]
N3[UserService]
N4[UserController]
N1 -->|inherits| N0
N2 -->|calls| N1
N3 -->|calls| N2
N3 -->|calls| N1
N4 -->|calls| N3
N4 -->|calls| N2
Visualizing with GraphViz¶
# Generate DOT file
repo-ctx graph ./src --type class --format dot > class_deps.dot
# Render to PNG
dot -Tpng class_deps.dot -o class_deps.png
# Render to SVG (better for large graphs)
dot -Tsvg class_deps.dot -o class_deps.svg
JSON Output Structure¶
{
"graph_type": "class",
"nodes": [
{
"id": "src/models.py:User",
"name": "User",
"type": "class",
"file_path": "src/models.py",
"labels": ["Symbol", "Class"]
}
],
"edges": [
{
"source": "src/models.py:User",
"target": "src/models.py:Entity",
"relation": "inherits",
"metadata": {}
},
{
"source": "src/services.py:UserService",
"target": "src/services.py:UserRepository",
"relation": "calls",
"metadata": {"from_method": "UserService.get_user"}
}
],
"stats": {
"node_count": 5,
"edge_count": 6
}
}
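Once exported, the JSON can be post-processed with a few lines of Python. The sketch below answers "what depends on this node?" by inverting the edge list. The embedded JSON reuses the two edges shown above, and `dependents` is an illustrative name.

```python
import json
from collections import defaultdict

# Trimmed sample in the shape shown above (only the fields used here).
graph_json = """
{
  "graph_type": "class",
  "edges": [
    {"source": "src/models.py:User",
     "target": "src/models.py:Entity",
     "relation": "inherits"},
    {"source": "src/services.py:UserService",
     "target": "src/services.py:UserRepository",
     "relation": "calls"}
  ]
}
"""

data = json.loads(graph_json)

# Invert the edges: target id -> list of (source id, relation).
dependents = defaultdict(list)
for edge in data["edges"]:
    dependents[edge["target"]].append((edge["source"], edge["relation"]))

for target, sources in dependents.items():
    for source, relation in sources:
        print(f"{target} <- {source} ({relation})")
```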
MCP Tool Usage¶
// Class dependency graph
await mcp.call("ctx-graph", {
path: "./src",
graphType: "class",
outputFormat: "json"
});
// Function call graph for specific file
await mcp.call("ctx-graph", {
path: "./src/services.py",
graphType: "function",
depth: 3 // Limit traversal depth
});
// For indexed repositories
await mcp.call("ctx-graph", {
repoId: "/owner/project",
graphType: "module"
});
Integration with Architecture Analysis¶
Dependency graphs are the foundation for other architecture tools:
Dependency Graph
│
├──► DSM (matrix visualization)
├──► Cycle Detection (find tangles)
├──► Layer Detection (discover structure)
├──► XS Metrics (quantify complexity)
└──► Architecture Rules (enforce constraints)
Export to .repo-ctx Directory¶
The dump command exports dependency graphs as part of the architecture analysis:
# Full dump includes dependency graphs
repo-ctx dump ./my-project --level full
# Created files:
# .repo-ctx/architecture/
# ├── class_dependencies.mmd # Mermaid class graph
# ├── function_dependencies.mmd # Mermaid function graph
# ├── file_dependencies.mmd # Mermaid file graph
# └── architecture.md # Summary with embedded diagrams
Persist to Neo4j Graph Database¶
For advanced querying and visualization, persist the graph to Neo4j:
# Dump with graph persistence
repo-ctx dump ./my-project --persist-graph
# Configure Neo4j connection
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=your-password
Cypher Queries After Persistence:
// Find all classes that call UserService
MATCH (caller:Class)-[:CALLS]->(target:Class {name: 'UserService'})
RETURN caller.name, caller.file_path;
// Find inheritance hierarchy
MATCH path = (child:Class)-[:INHERITS*]->(parent:Class)
WHERE child.name = 'AdminUser'
RETURN path;
// Impact analysis: what depends on User class
MATCH (dependent)-[r:CALLS|USES|INSTANTIATES]->(target:Class {name: 'User'})
RETURN dependent.name, type(r) AS relationship;
// Find cycles in class dependencies
MATCH path = (a:Class)-[:CALLS*2..]->(a)
RETURN path
LIMIT 10;
Implementation Reference¶
Core Classes¶
| Class | File | Purpose |
|---|---|---|
| `DSMBuilder` | `architecture.py` | Build DSM matrix |
| `DSMResult` | `architecture.py` | DSM data + visualization |
| `CycleDetector` | `architecture.py` | Tarjan's SCC algorithm |
| `CycleInfo` | `architecture.py` | Cycle data + breakup suggestions |
| `LayerDetector` | `architecture_rules.py` | Topological layer detection |
| `LayerInfo` | `architecture_rules.py` | Layer data |
| `ArchitectureRules` | `architecture_rules.py` | Rule definition + checking |
| `RuleParser` | `architecture_rules.py` | YAML rule parsing |
| `XSCalculator` | `structural_metrics.py` | XS score calculation |
| `XSMetrics` | `structural_metrics.py` | Metrics data |
| `HotspotDetector` | `structural_metrics.py` | Complexity hotspot detection |
Data Flow¶
# Typical analysis flow
from repo_ctx.analysis import (
    CodeAnalyzer, DependencyGraph, GraphType,
    DSMBuilder, CycleDetector, LayerDetector,
    RuleParser, XSCalculator, HotspotDetector,
    XSInput,  # used in step 7; its import location is assumed here
)
# 1. Analyze code
analyzer = CodeAnalyzer()
results = analyzer.analyze_files(files)
symbols = analyzer.aggregate_symbols(results)
dependencies = analyzer.aggregate_dependencies(results)
# 2. Build dependency graph
graph_builder = DependencyGraph()
graph = graph_builder.build(
symbols=symbols,
dependencies=dependencies,
graph_type=GraphType.CLASS
)
# 3. Generate DSM
dsm = DSMBuilder().build(graph)
# 4. Detect cycles
cycles = CycleDetector().detect(graph)
# 5. Detect layers
layers = LayerDetector().detect(graph)
# 6. Check rules (optional)
rules = RuleParser().parse_file("architecture.yaml")
violations = rules.check(graph)
# 7. Calculate metrics
metrics = XSCalculator().calculate_from_input(
XSInput(graph=graph, violations=violations)
)
# 8. Find hotspots
hotspots = HotspotDetector().detect(graph)
CLI and MCP Tools¶
CLI Commands¶
| Command | Description |
|---|---|
| `repo-ctx graph <target>` | Generate dependency graph |
| `repo-ctx dsm <target>` | Generate DSM matrix |
| `repo-ctx cycles <target>` | Detect cycles |
| `repo-ctx layers <target>` | Detect layers |
| `repo-ctx architecture <target>` | Check architecture rules |
| `repo-ctx metrics <target>` | Calculate XS metrics |
| `repo-ctx dump <target>` | Export analysis to .repo-ctx directory |
Common Options¶
--type, -t {file,module,class,function} # Graph type (default: class)
--format, -f {text,json} # Output format (default: text)
--rules, -r <file> # Architecture rules YAML file
MCP Tools¶
| Tool | Description |
|---|---|
| `ctx-graph` | Generate dependency graph (class, function, file, module) |
| `ctx-dsm` | Generate DSM matrix |
| `ctx-cycles` | Detect cycles with breakup suggestions |
| `ctx-layers` | Detect architectural layers |
| `ctx-architecture` | Check architecture rules |
| `ctx-metrics` | Calculate XS metrics |
MCP Tool Examples¶
// DSM Analysis
await mcp.call("ctx-dsm", {
path: "./src",
graphType: "class",
outputFormat: "json"
});
// Cycle Detection
await mcp.call("ctx-cycles", {
path: "./src",
graphType: "class"
});
// Layer Detection
await mcp.call("ctx-layers", {
repoId: "/owner/repo",
graphType: "module"
});
// Architecture Rules
await mcp.call("ctx-architecture", {
path: "./src",
rulesYaml: `
layers:
  - name: ui
    above: domain
  - name: domain
    above: data
forbidden:
  - from: "*.data.*"
    to: "*.ui.*"
`
});
// XS Metrics
await mcp.call("ctx-metrics", {
path: "./src",
rulesFile: "architecture.yaml",
outputFormat: "json"
});
LLM Integration for Software Modernization¶
This section explains how to use repo-ctx's MCP tools with Large Language Models for software modernization, refactoring, and code analysis tasks.
Why MCP + LLM?¶
| Challenge | How MCP Tools Help |
|---|---|
| LLMs lack codebase context | MCP provides structured code analysis on-demand |
| Token limits for large codebases | Filtered, focused results stay within context |
| Hallucinated dependencies | Real dependency graphs from actual code |
| Architecture decisions without data | Metrics and cycles provide quantified evidence |
Core Integration Patterns¶
Pattern 1: Architecture Assessment¶
Use this pattern when evaluating a codebase for modernization.
User: "Assess this codebase's architecture health"
LLM Workflow:
1. ctx-metrics → Get XS score and hotspots
2. ctx-cycles → Identify circular dependencies
3. ctx-layers → Understand current structure
4. ctx-dsm → Visualize coupling patterns
5. Synthesize findings into assessment report
MCP Tool Sequence:
// Step 1: Get overall health metrics
const metrics = await mcp.call("ctx-metrics", {
path: "./src",
outputFormat: "json"
});
// Step 2: Find cycles that need breaking
const cycles = await mcp.call("ctx-cycles", {
path: "./src",
graphType: "module"
});
// Step 3: Understand layer structure
const layers = await mcp.call("ctx-layers", {
path: "./src",
graphType: "class"
});
// LLM can now provide data-driven assessment
Example LLM Response:
## Architecture Assessment
**Overall Health: Grade C (XS Score: 45.0)**
### Key Findings:
1. **Cyclic Dependencies (2 cycles)**
- `services ↔ repositories` - Blocks independent testing
- `models ↔ validators` - Causes build order issues
2. **Detected Layers (4)**
- API → Services → Repositories → Database
- ⚠️ 3 violations of layer ordering detected
3. **Hotspots**
- `ServiceManager` (cycle participant, severity: 6.0)
- `DataAccess` (high coupling: 12 connections)
### Recommended Actions:
1. Break `services → repositories` cycle by introducing interfaces
2. Refactor `DataAccess` to reduce coupling
3. Move validation logic to dedicated layer
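The synthesis step of this pattern can be sketched in Python. This is a minimal illustration, not repo-ctx code: the field names (`grade`, `xs_score`, `hotspots`, `cycles`, `nodes`) and the sample payloads are assumptions, and the JSON actually returned by `ctx-metrics` and `ctx-cycles` may be shaped differently.

```python
def summarize_assessment(metrics: dict, cycles: dict, top_n: int = 3) -> str:
    """Build a short plain-text summary from (assumed) tool results."""
    lines = [
        f"Overall Health: Grade {metrics['grade']} (XS Score: {metrics['xs_score']})"
    ]

    cycle_list = cycles.get("cycles", [])
    lines.append(f"Cyclic dependencies: {len(cycle_list)}")
    for cycle in cycle_list:
        lines.append("  - " + " <-> ".join(cycle["nodes"]))

    # Highest-severity hotspots first.
    hotspots = sorted(
        metrics.get("hotspots", []), key=lambda h: h["severity"], reverse=True
    )
    lines.append("Top hotspots:")
    for hotspot in hotspots[:top_n]:
        lines.append(f"  - {hotspot['name']} (severity: {hotspot['severity']})")
    return "\n".join(lines)


# Hypothetical payloads, loosely mirroring the example response above.
metrics = {
    "grade": "C",
    "xs_score": 47.5,
    "hotspots": [
        {"name": "DataAccess", "severity": 4.0},
        {"name": "ServiceManager", "severity": 6.0},
    ],
}
cycles = {
    "cycles": [
        {"nodes": ["services", "repositories"]},
        {"nodes": ["models", "validators"]},
    ]
}
report = summarize_assessment(metrics, cycles)
print(report)
```

Keeping the summary deterministic like this gives the LLM a compact, verifiable starting point before it writes the narrative parts of the report.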
Pattern 2: Refactoring Planning¶
Use this pattern when planning a specific refactoring.
User: "I want to extract the user module into a microservice"
LLM Workflow:
1. ctx-cycles → Check if user module is in a cycle
2. ctx-dsm → See what depends on user module
3. ctx-analyze → Get user module's public interface
4. ctx-graph → Visualize dependencies to cut
5. Create extraction plan with specific steps
MCP Tool Sequence:
// Step 1: Check for cycles involving user module
const cycles = await mcp.call("ctx-cycles", {
  path: "./src",
  graphType: "module"
});
// Look for cycles containing "user"
// Step 2: Get DSM to see all dependencies
const dsm = await mcp.call("ctx-dsm", {
  path: "./src",
  graphType: "module",
  outputFormat: "json"
});
// Step 3: Analyze user module's public interface
const analysis = await mcp.call("ctx-analyze", {
  path: "./src/user",
  language: "python",
  outputFormat: "json"
});
// Step 4: Get detailed dependency graph
const graph = await mcp.call("ctx-graph", {
  path: "./src",
  graphType: "class",
  outputFormat: "json"
});
Example LLM Response:
## Microservice Extraction Plan: User Module
### Current State Analysis
**Dependencies TO user module (must be updated):**
- `auth.login()` → `user.authenticate()`
- `billing.get_customer()` → `user.find_by_id()`
- `notifications.send()` → `user.get_email()`
**Dependencies FROM user module (must break or replicate):**
- `user.save()` → `database.execute()`
- `user.hash_password()` → `crypto.hash()`
⚠️ **Cycle Detected:** `user ↔ auth`
Must break before extraction!
### Extraction Steps
1. **Break user ↔ auth cycle**
- Move `auth.validate_token()` to user module
- Or: Create shared `TokenService` interface
2. **Define API boundary**
- Public endpoints: `find_by_id`, `authenticate`, `create`, `update`
- Internal: `hash_password`, `validate_email`
3. **Create anti-corruption layer**
- Replace direct calls with REST/gRPC clients
- Update 3 calling modules
4. **Database separation**
- Extract users table to separate schema
- Create data migration scripts
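The "Current State Analysis" part of such a plan can be derived mechanically once the dependency data is available. The sketch below assumes the graph has already been reduced to `(source, target)` module pairs and the cycles to plain lists of module names; both shapes are illustrative assumptions rather than the documented output of `ctx-graph` or `ctx-cycles`.

```python
def cycles_involving(cycles: list[list[str]], module: str) -> list[list[str]]:
    """Return only the cycles that contain the given module."""
    return [cycle for cycle in cycles if module in cycle]


def split_dependencies(
    edges: list[tuple[str, str]], module: str
) -> tuple[list[str], list[str]]:
    """Return (modules that depend on `module`, modules that `module` depends on)."""
    incoming = sorted({src for src, dst in edges if dst == module and src != module})
    outgoing = sorted({dst for src, dst in edges if src == module and dst != module})
    return incoming, outgoing


# Hypothetical module-level edges matching the example plan above.
edges = [
    ("auth", "user"),
    ("billing", "user"),
    ("notifications", "user"),
    ("user", "database"),
    ("user", "crypto"),
    ("user", "auth"),
]
cycles = [["user", "auth"], ["models", "validators"]]

blocking = cycles_involving(cycles, "user")
incoming, outgoing = split_dependencies(edges, "user")
print("Cycles to break first:", blocking)
print("Callers to update:", incoming)
print("Dependencies to break or replicate:", outgoing)
```

Here `incoming` lists the callers that need a client for the new service, while `outgoing` lists what the extracted service must bring along or reach over the network.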
Pattern 3: Code Quality Enforcement¶
Use this pattern in CI/CD or code review workflows.
User: "Check if this PR violates our architecture rules"
LLM Workflow:
1. ctx-architecture → Check rules against current code
2. Report violations with specific file locations
3. Suggest fixes for each violation
MCP Tool Sequence:
// Check architecture rules
const violations = await mcp.call("ctx-architecture", {
  path: "./src",
  rulesYaml: `
name: "Clean Architecture"
layers:
  - name: api
    patterns: ["*.api.*", "*.controller.*"]
    above: service
  - name: service
    patterns: ["*.service.*"]
    above: repository
  - name: repository
    patterns: ["*.repository.*", "*.dao.*"]
forbidden:
  - from: "*.controller.*"
    to: "*.repository.*"
    reason: "Controllers must use services"
`,
  outputFormat: "json"
});
Example LLM Response:
## Architecture Violation Report
❌ **2 violations detected**
### Violation 1: Forbidden Dependency
- **Rule:** Controllers must use services
- **Location:** `src/api/user_controller.py:45`
- **Issue:** `UserController` directly calls `UserRepository.find_by_id()`
- **Fix:** Inject `UserService` and call `UserService.get_user()`
### Violation 2: Layer Order
- **Rule:** `repository` sits below `api`, so it cannot depend on API code
- **Location:** `src/data/order_repository.py:23`
- **Issue:** Imports `from api.serializers import OrderDTO`
- **Fix:** Move `OrderDTO` to `models/` or create `repository/dto.py`
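To make the `forbidden` rule concrete, here is a minimal Python sketch of how such a rule could be evaluated against a list of dependency edges. It uses glob matching from the standard library's `fnmatch` module; whether repo-ctx matches patterns the same way is an assumption, and the qualified names in `edges` are hypothetical.

```python
from fnmatch import fnmatchcase


def find_forbidden(edges: list[tuple[str, str]], rules: list[dict]) -> list[dict]:
    """Return one record per edge that matches a forbidden from/to pattern pair."""
    violations = []
    for src, dst in edges:
        for rule in rules:
            if fnmatchcase(src, rule["from"]) and fnmatchcase(dst, rule["to"]):
                violations.append({"from": src, "to": dst, "reason": rule["reason"]})
    return violations


rules = [
    {
        "from": "*.controller.*",
        "to": "*.repository.*",
        "reason": "Controllers must use services",
    }
]

# Hypothetical fully qualified names; only the first edge breaks the rule.
edges = [
    ("app.controller.user", "app.repository.user"),
    ("app.controller.user", "app.service.user"),
    ("app.service.user", "app.repository.user"),
]

violations = find_forbidden(edges, rules)
for v in violations:
    print(f"{v['from']} -> {v['to']}: {v['reason']}")
```

Note that `*` in `fnmatch` patterns also matches dots, so `*.controller.*` matches a `controller` segment at any depth of a qualified name, as long as it is neither the first nor the last segment.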
Pattern 4: Legacy Code Understanding¶
Use this pattern when onboarding to an unfamiliar codebase.
User: "Help me understand this legacy codebase structure"
LLM Workflow:
1. ctx-layers → Discover implicit architecture
2. ctx-docs → Get any existing documentation
3. ctx-analyze → Extract key classes and functions
4. ctx-dsm → Visualize relationships
5. Create architectural overview document
MCP Tool Sequence:
// Step 1: Discover layers
const layers = await mcp.call("ctx-layers", {
  path: "./src",
  graphType: "module"
});
// Step 2: Get existing docs
const docs = await mcp.call("ctx-docs", {
  repository: "/owner/legacy-app",
  max_tokens: 8000,
  include: ["code", "diagrams"]
});
// Step 3: Analyze main components
const analysis = await mcp.call("ctx-analyze", {
  path: "./src",
  outputFormat: "json"
});
// Step 4: Get dependency overview
const dsm = await mcp.call("ctx-dsm", {
  path: "./src",
  graphType: "module"
});
Modernization Workflows¶
Workflow 1: Monolith to Microservices¶
graph TD
A[Analyze Current State] --> B[Identify Bounded Contexts]
B --> C[Detect Cycles to Break]
C --> D[Plan Extraction Order]
D --> E[Execute & Validate]
A --> |ctx-metrics| A1[XS Score]
A --> |ctx-dsm| A2[Dependency Matrix]
B --> |ctx-layers| B1[Layer Analysis]
B --> |ctx-analyze| B2[Module Boundaries]
C --> |ctx-cycles| C1[Cycle Detection]
D --> |LLM Synthesis| D1[Extraction Plan]
E --> |ctx-architecture| E1[Validate Rules]
E --> |ctx-metrics| E2[Track Improvement]
Prompt Template:
I'm modernizing a monolithic application. Use these MCP tools to analyze:
1. Run ctx-metrics on ./src to get the overall health
2. Run ctx-dsm on ./src to see module coupling
3. Run ctx-cycles to find circular dependencies
4. Run ctx-layers to understand the current structure
Then create a microservice extraction plan that:
- Lists modules in order of extraction (least coupled first)
- Identifies cycles that must be broken before extraction
- Estimates complexity based on coupling scores
- Suggests API boundaries based on current interfaces
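The "least coupled first" ordering requested in this prompt can also be computed directly, which gives the LLM a baseline to reason from. The following Python sketch uses a deliberately simple coupling measure (incoming plus outgoing edges per module); both the measure and the sample edge list are assumptions for illustration, not the coupling score repo-ctx reports.

```python
from collections import Counter


def extraction_order(edges: list[tuple[str, str]]) -> list[str]:
    """Order modules by total coupling (incoming + outgoing edges), lowest first."""
    coupling: Counter = Counter()
    for src, dst in edges:
        if src != dst:
            coupling[src] += 1  # outgoing (efferent) dependency
            coupling[dst] += 1  # incoming (afferent) dependency
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(coupling, key=lambda module: (coupling[module], module))


# Hypothetical module-level edges; each pair is assumed to appear once.
edges = [
    ("api", "services"),
    ("services", "repositories"),
    ("services", "user"),
    ("api", "user"),
    ("user", "repositories"),
    ("notifications", "user"),
]

order = extraction_order(edges)
print(order)
```

Modules that take part in a cycle should still be handled first regardless of their position in this list, since a cycle blocks extraction of every module in it.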
Workflow 2: Framework Migration¶
graph TD
A[Inventory Current Usage] --> B[Identify Patterns]
B --> C[Plan Replacement Strategy]
C --> D[Track Migration Progress]
A --> |ctx-analyze| A1[Symbol Inventory]
A --> |ctx-find-symbol| A2[Usage Search]
B --> |ctx-graph| B1[Dependency Patterns]
C --> |LLM Plan| C1[Migration Steps]
D --> |ctx-metrics| D1[Complexity Reduction]
Prompt Template:
I need to migrate from Framework X to Framework Y. Use these MCP tools:
1. Run ctx-find-symbol to find all uses of "FrameworkX"
2. Run ctx-analyze to inventory current framework patterns
3. Run ctx-graph to see how framework usage is distributed
Then create a migration plan that:
- Lists all files/classes using Framework X
- Groups them by migration complexity (simple, moderate, complex)
- Identifies shared utilities that can be migrated once
- Suggests migration order (least dependent first)
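Grouping files by migration complexity can start from something as simple as counting framework usages per file. The Python sketch below assumes those counts have already been collected (for example from `ctx-find-symbol` results); the thresholds and file paths are arbitrary illustrations and should be tuned per project.

```python
def bucket_by_usage(
    usage_counts: dict[str, int], simple_max: int = 2, moderate_max: int = 10
) -> dict[str, list[str]]:
    """Group files into simple/moderate/complex by number of framework usages."""
    buckets: dict[str, list[str]] = {"simple": [], "moderate": [], "complex": []}
    for path, count in sorted(usage_counts.items()):
        if count <= simple_max:
            buckets["simple"].append(path)
        elif count <= moderate_max:
            buckets["moderate"].append(path)
        else:
            buckets["complex"].append(path)
    return buckets


# Hypothetical usage counts per file.
usage_counts = {
    "src/api/views.py": 14,
    "src/utils/http.py": 2,
    "src/services/orders.py": 6,
}

buckets = bucket_by_usage(usage_counts)
print(buckets)
```

Usage count is only a rough proxy for effort; a file with few usages can still be hard to migrate if it relies on framework internals.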
Workflow 3: Technical Debt Reduction¶
graph TD
A[Measure Current Debt] --> B[Prioritize Hotspots]
B --> C[Plan Refactoring Sprint]
C --> D[Execute & Remeasure]
A --> |ctx-metrics| A1[XS Score Baseline]
B --> |Hotspot Detection| B1[Priority List]
B --> |ctx-cycles| B2[Cycle Impact]
C --> |LLM Plan| C1[Sprint Tasks]
D --> |ctx-metrics| D1[Score Improvement]
Prompt Template:
I want to reduce technical debt in this codebase. Use these MCP tools:
1. Run ctx-metrics to get current XS score and hotspots
2. Run ctx-cycles to find all cyclic dependencies
3. Run ctx-architecture with our rules to find violations
Then create a debt reduction plan that:
- Ranks hotspots by severity and fix effort
- Calculates expected XS score improvement per fix
- Creates sprint-sized work packages
- Defines acceptance criteria (target XS scores)
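Ranking hotspots "by severity and fix effort" and cutting the result into sprint-sized packages can be sketched as follows. The `effort_days` values are a team's own estimates rather than anything a tool produces, and the severity-per-day ratio is just one possible prioritization heuristic; all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    name: str
    severity: float     # as reported by the metrics tool (assumed field)
    effort_days: float  # manual estimate; must be greater than zero


def rank_hotspots(hotspots: list[Hotspot]) -> list[Hotspot]:
    """Highest severity per day of effort first."""
    return sorted(hotspots, key=lambda h: h.severity / h.effort_days, reverse=True)


def pack_sprints(ranked: list[Hotspot], capacity_days: float) -> list[list[Hotspot]]:
    """Greedily fill sprints in ranked order without exceeding capacity."""
    sprints: list[list[Hotspot]] = []
    current: list[Hotspot] = []
    used = 0.0
    for hotspot in ranked:
        if current and used + hotspot.effort_days > capacity_days:
            sprints.append(current)
            current, used = [], 0.0
        current.append(hotspot)
        used += hotspot.effort_days
    if current:
        sprints.append(current)
    return sprints


hotspots = [
    Hotspot("ServiceManager", severity=6.0, effort_days=3.0),
    Hotspot("DataAccess", severity=4.0, effort_days=4.0),
    Hotspot("OrderValidator", severity=3.0, effort_days=1.0),
]
ranked = rank_hotspots(hotspots)
sprints = pack_sprints(ranked, capacity_days=5.0)
for number, sprint in enumerate(sprints, start=1):
    print(f"Sprint {number}:", [h.name for h in sprint])
```

A hotspot whose effort exceeds the sprint capacity still gets a sprint of its own here; splitting such work further is left to the team.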
Best Practices for LLM Integration¶
1. Use JSON Output for Structured Analysis¶
// Always request JSON for programmatic processing
await mcp.call("ctx-metrics", {
  path: "./src",
  outputFormat: "json" // Not "text"
});
2. Limit Token Usage for Large Codebases¶
// For docs, always set max_tokens
await mcp.call("ctx-docs", {
  repository: "/owner/repo",
  max_tokens: 8000,
  include: ["code"] // Only what you need
});
3. Chain Tools Logically¶
Good: metrics → cycles → architecture (progressive detail)
Bad: dsm → docs → analyze (unrelated sequence)
4. Cache Results for Multi-Turn Conversations¶
LLMs should store analysis results in context rather than re-running tools:
Turn 1: User asks for assessment
→ Run all analysis tools, store in context
Turn 2: User asks follow-up about cycles
→ Use cached cycle data, don't re-run
Turn 3: User asks about specific module
→ Run targeted analyze on that module only
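When the tools are driven from client code rather than from the model's own context, the same idea can be implemented as a small result cache keyed by tool name and arguments. This is a sketch under assumptions: `call_tool` stands in for whatever function your MCP client exposes, and `fake_call_tool` exists only so the example runs on its own.

```python
import json
from typing import Any, Callable


class ToolCache:
    """Memoize tool results per (tool name, arguments) for one conversation."""

    def __init__(self, call_tool: Callable[[str, dict], Any]) -> None:
        self._call_tool = call_tool
        self._results: dict[tuple[str, str], Any] = {}

    def call(self, tool: str, args: dict) -> Any:
        # Serialize with sorted keys so argument order does not change the key.
        key = (tool, json.dumps(args, sort_keys=True))
        if key not in self._results:
            self._results[key] = self._call_tool(tool, args)
        return self._results[key]


calls: list[str] = []


def fake_call_tool(tool: str, args: dict) -> dict:
    """Stand-in for a real MCP client call; records each real invocation."""
    calls.append(tool)
    return {"tool": tool, "args": args}


cache = ToolCache(fake_call_tool)
cache.call("ctx-cycles", {"path": "./src", "graphType": "module"})  # Turn 1: runs
cache.call("ctx-cycles", {"graphType": "module", "path": "./src"})  # Turn 2: cached
cache.call("ctx-analyze", {"path": "./src/user"})                   # Turn 3: runs
print(calls)
```

The cache should be discarded whenever the analyzed code changes, otherwise follow-up answers would be based on stale results.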
5. Combine with CPGQL for Deep Analysis¶
// High-level architecture
const cycles = await mcp.call("ctx-cycles", {
  path: "./src",
  graphType: "module"
});
// Deep data flow analysis for a specific cycle
const dataflow = await mcp.call("ctx-cpg-query", {
  path: "./src/problematic_module",
  // CPGQL is Scala-based, so string literals use double quotes
  query: 'cpg.method.name("save").reachableBy(cpg.method.name("validate")).l'
});
Example Prompts for Common Scenarios¶
Architecture Health Check¶
Analyze this codebase's architecture:
1. Use ctx-metrics to get the XS score
2. Use ctx-cycles to find circular dependencies
3. Use ctx-layers to understand the structure
4. Use ctx-architecture with standard Clean Architecture rules
Provide a report with:
- Overall health grade and score
- Top 3 issues to address
- Specific refactoring suggestions with file locations
Pre-PR Review¶
Before I submit this PR, check for architecture violations:
1. Run ctx-architecture with our team's rules
2. Run ctx-cycles to ensure no new cycles
3. Run ctx-metrics and compare to baseline
Flag any issues that would fail our architecture checks.
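The "no new cycles" check in this prompt amounts to a set difference between the cycles on the base branch and those on the PR branch. The Python sketch below assumes each cycle is available as a list of node names, which is an illustrative shape rather than the documented `ctx-cycles` output.

```python
def new_cycles(
    baseline: list[list[str]], current: list[list[str]]
) -> list[list[str]]:
    """Return cycles in `current` whose node set is not present in `baseline`."""
    # Compare as sets so the same cycle reported from a different
    # starting node is not flagged as new.
    known = {frozenset(cycle) for cycle in baseline}
    return [cycle for cycle in current if frozenset(cycle) not in known]


# Hypothetical results from the base branch and the PR branch.
baseline = [["services", "repositories"], ["models", "validators"]]
current = [
    ["repositories", "services"],   # same cycle, different starting node
    ["models", "validators"],
    ["billing", "user", "auth"],    # introduced by the PR
]

introduced = new_cycles(baseline, current)
print(introduced)
```

Comparing node sets treats two different cycles over exactly the same nodes as one; for a pass/fail gate on new tangles that simplification is usually acceptable.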
Dependency Analysis¶
I need to understand what will break if I change module X:
1. Use ctx-dsm to see what depends on X (check column for X)
2. Use ctx-graph with graphType=class for detailed view
3. Use ctx-find-symbol to find all uses of X's public APIs
List all affected files and the specific functions/classes that need updates.
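"What will break if I change X" is a reachability question over reversed dependency edges: direct dependents come from the DSM, and indirect ones follow transitively. A minimal Python sketch, assuming the graph is available as `(source, target)` pairs where `source` depends on `target` (the module names are hypothetical):

```python
from collections import defaultdict, deque


def affected_modules(edges: list[tuple[str, str]], changed: str) -> list[str]:
    """Return every module that depends on `changed`, directly or transitively."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        dependents[dst].add(src)  # reverse the edge: who depends on dst?

    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents[node]:
            if dependent not in seen and dependent != changed:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


edges = [
    ("auth", "user"),
    ("billing", "user"),
    ("api", "auth"),
    ("api", "billing"),
    ("user", "database"),
    ("reports", "database"),
]

print(affected_modules(edges, "user"))
print(affected_modules(edges, "database"))
```

The breadth-first traversal terminates even when the graph contains cycles, because each module is queued at most once.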
Modernization Roadmap¶
Create a modernization roadmap for this legacy codebase:
1. Use ctx-metrics to assess current state
2. Use ctx-layers to understand architecture
3. Use ctx-cycles to identify blocking issues
4. Use ctx-analyze to inventory key components
Create a phased plan with:
- Phase 1: Critical cycle breaking
- Phase 2: Layer enforcement
- Phase 3: Module extraction
- Phase 4: Framework updates
Include effort estimates based on coupling and complexity.
Best Practices¶
1. Start with DSM¶
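Use DSM (`ctx-dsm`) to get a quick overview of coupling patterns before drilling into details.
2. Address Cycles First¶
Cycles are often the root cause of other problems, such as blocked independent testing and build-order issues, so find them with `ctx-cycles` and break them before other refactoring.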
3. Define Architecture Rules Early¶
Create architecture.yaml to enforce boundaries:
layers:
  - name: api
    patterns: ["*.api.*", "*.controller.*"]
    above: service
  - name: service
    patterns: ["*.service.*"]
    above: repository
  - name: repository
    patterns: ["*.repository.*", "*.dao.*"]
forbidden:
  - from: "*.repository.*"
    to: "*.api.*"
    reason: "Repository layer cannot depend on API layer"
4. Track Metrics Over Time¶
Check the XS score regularly with `ctx-metrics` and compare it to a stored baseline to catch degradation early.
5. Focus on Hotspots¶
Address the highest-severity hotspots first.
References¶
- DSM Overview - DSM theory
- Tarjan's Algorithm - SCC detection
- Clean Architecture - Layered architecture