# CPG Analysis Output Guide
This guide explains how to interpret the Code Property Graph (CPG) analysis output generated by repo-ctx using Joern.
## Overview
The CPG analysis produces structured markdown reports designed for:

- LLM code analysis: Clean, contextual information suitable for AI assistants
- Code review: Understanding codebase structure at a glance
- Documentation: Auto-generated code maps
## Output Files
When analyzing code with `analyze_examples.py` or using the CPG queries directly, the following files are generated:

| File | Description |
|---|---|
| `cpg-analysis.md` | Combined report with all sections |
| `cpg-methods.md` | Functions and methods only |
| `cpg-types.md` | Classes and type declarations |
| `cpg-calls.md` | Call graph (who calls whom) |
## Report Sections
### 1. Classes/Types Section
Shows type declarations (classes, interfaces, structs) organized by inheritance.
```markdown
## Classes/Types

### Inheritance Hierarchy

| Type | Extends | File | Line |
|------|---------|------|------|
| `Shape` | ABC | sample.py | 17 |
| `Rectangle` | Shape | sample.py | 53 |

### Standalone Types

| Type | File | Line |
|------|------|------|
| `ShapeManager` | sample.py | 120 |
```
How to interpret:
- Inheritance Hierarchy: Classes that extend other classes. The "Extends" column shows parent class(es).
- Standalone Types: Classes with no explicit parent (they may still have an implicit base such as `object`).
- File/Line: Exact location for navigation.
LLM usage:
- Use inheritance to understand class relationships
- Identify abstract base classes (like ABC) vs concrete implementations
- Find leaf classes (classes that no other class extends)
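Inverting the "Extends" column gives a map from each parent to its subclasses, which makes abstract bases and leaf classes easy to spot. The following is a minimal sketch over `(type, parent)` pairs copied from the example tables, with `None` standing in for standalone types; `subclass_map` and `leaf_classes` are hypothetical helpers, not part of repo-ctx, and types with multiple parents are not handled.

```python
from collections import defaultdict

# (type, parent) pairs from the example tables; None marks a standalone type.
ROWS = [
    ("Shape", "ABC"),
    ("Rectangle", "Shape"),
    ("ShapeManager", None),
]


def subclass_map(rows):
    """Map each parent type to the list of types that extend it."""
    children = defaultdict(list)
    for name, parent in rows:
        if parent is not None:
            children[parent].append(name)
    return dict(children)


def leaf_classes(rows):
    """Types that extend something but are never extended themselves."""
    children = subclass_map(rows)
    return [
        name for name, parent in rows
        if parent is not None and name not in children
    ]


if __name__ == "__main__":
    print(subclass_map(ROWS))
    print(leaf_classes(ROWS))
```

Here `ABC` shows up only as a key in the subclass map, which is a quick signal that `Shape` is the abstract root of this small hierarchy.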
### 2. Methods/Functions Section
Lists all functions and methods grouped by file.
```markdown
## Methods/Functions

### `sample.py`

| Method | Lines | Parameters |
|--------|-------|------------|
| `__init__` | 20-23 | color:str |
| `__init__` | 56-61 | width:float, height:float, color:str |
| `add_shape` | 126-131 | shape:Shape |
| `filter_by` | 139-143 | predicate:typing.Callable |
```
How to interpret:
- Method: Function or method name
- Lines: Start-end line range (useful for gauging method size)
- Parameters: Method parameters with types (from type hints in Python, or from type declarations in Java and other statically typed languages)
LLM usage:
- Identify long methods (large line ranges) for refactoring candidates
- Find constructors (`__init__`, `constructor`, etc.)
- Locate specific functionality by name
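Spotting long methods is simple arithmetic on the Lines column. The sketch below assumes each range is inclusive at both ends, so a method spanning lines 20-23 has 4 lines, and borrows the 50-line threshold from the refactoring prompt pattern as its default. `method_length` and `long_methods` are hypothetical helper names.

```python
# Rows copied from the Methods/Functions example: (method, start, end).
METHODS = [
    ("__init__", 20, 23),
    ("__init__", 56, 61),
    ("add_shape", 126, 131),
    ("filter_by", 139, 143),
]


def method_length(start, end):
    """Lines spanned by a start-end range, assuming both ends are inclusive."""
    return end - start + 1


def long_methods(rows, threshold=50):
    """(name, length) for methods longer than `threshold` lines, longest first."""
    sized = [(name, method_length(start, end)) for name, start, end in rows]
    return sorted(
        (row for row in sized if row[1] > threshold),
        key=lambda row: row[1],
        reverse=True,
    )


if __name__ == "__main__":
    for name, start, end in METHODS:
        print(f"{name}: {method_length(start, end)} lines")
    print(long_methods(METHODS))               # none of the samples exceed 50 lines
    print(long_methods(METHODS, threshold=5))  # lower threshold for demonstration
```

Because Python's sort is stable even with `reverse=True`, methods of equal length keep their report order, which keeps the output easy to match back to the table.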
### 3. Call Graph Section
Shows which functions call which other functions.
```markdown
## Call Graph

| Caller | Calls | Line |
|--------|-------|------|
| `main` | `create_shapes` | 195 |
| `main` | `process_async` | 200 |
| `process_async` | `calculate_total` | 165 |
```
How to interpret:
- Caller: The function making the call
- Calls: The function being called
- Line: The line where the call occurs
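LLM usage:
- Trace execution flow from entry points
- Identify heavily called functions (popular call targets)
- Find potentially unused functions (those that never appear in the "Calls" column; verify before removing them, since runtime-determined calls may be missing)
- Understand dependencies between modules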
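Several of these uses reduce to small graph operations on `(caller, callee)` pairs: counting incoming edges for popular targets, subtracting the set of callees from the set of defined functions for dead-code candidates, and a breadth-first traversal from an entry point for execution flow. This is a minimal sketch using the sample rows. `DEFINED` is a hypothetical set of all known functions (for example, collected from the Methods/Functions section), and `legacy_export` is invented so the dead-code check has something to find.

```python
from collections import Counter, deque

# (caller, callee) edges from the Call Graph example.
CALLS = [
    ("main", "create_shapes"),
    ("main", "process_async"),
    ("process_async", "calculate_total"),
]

# Hypothetical set of all defined functions; "legacy_export" is invented.
DEFINED = {"main", "create_shapes", "process_async", "calculate_total", "legacy_export"}


def fan_in(calls):
    """Count call sites targeting each function (one per call-graph row)."""
    return Counter(callee for _, callee in calls)


def never_called(calls, defined, entry_points=("main",)):
    """Defined functions absent from the Calls column, excluding entry points."""
    called = {callee for _, callee in calls}
    return sorted(defined - called - set(entry_points))


def reachable(calls, start):
    """Breadth-first traversal of the call graph from an entry point."""
    graph = {}
    for caller, callee in calls:
        graph.setdefault(caller, []).append(callee)
    seen, queue = {start}, deque([start])
    while queue:
        for callee in graph.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


if __name__ == "__main__":
    print(fan_in(CALLS).most_common())
    print(never_called(CALLS, DEFINED))
    print(sorted(reachable(CALLS, "main")))
```

Treat the `never_called` result as a list of candidates only: in dynamic languages, a function missing from the Calls column may still be invoked at runtime.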
## Using the Output for LLM Code Analysis
### Recommended Prompt Patterns
1. Architecture Understanding:

```text
Given this code structure:
[paste cpg-analysis.md]

Explain the overall architecture and main responsibilities of each class.
```
2. Code Review:

```text
Based on this call graph:
[paste cpg-calls.md]

Identify potential issues:
- Circular dependencies
- God classes (too many incoming or outgoing calls)
- Dead code (functions never called)
```
3. Refactoring Suggestions:

```text
Given these methods:
[paste cpg-methods.md]

Identify:
- Methods that might be too long (>50 lines)
- Potential extraction candidates
- Methods that could be moved to different classes
```
### Combining with Source Code
For deeper analysis, combine the CPG output with the actual source code:
```text
Here is the code structure:
[paste cpg-analysis.md]

And here is the implementation of the Shape class:
[paste actual code]

Analyze how well the implementation follows the class hierarchy.
```
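Since the reports give a file and a line range for every method, the implementation can be pulled into the prompt automatically instead of pasted by hand. The following is a minimal sketch in plain Python that assumes 1-based, inclusive line ranges. `extract_lines`, `build_prompt`, and `demo_prompt` are hypothetical helpers, not part of repo-ctx, and the demo writes a throwaway stand-in for `sample.py` so it runs anywhere.

```python
import tempfile
from pathlib import Path


def extract_lines(path, start, end):
    """Return lines start..end (1-based, inclusive) of a text file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])


def build_prompt(structure, snippet, label):
    """Fill the 'combining with source code' prompt pattern."""
    return (
        "Here is the code structure:\n"
        f"{structure}\n\n"
        f"And here is the implementation of {label}:\n"
        f"{snippet}\n\n"
        "Analyze how well the implementation follows the class hierarchy."
    )


def demo_prompt():
    """Build a prompt from a temporary file standing in for sample.py."""
    source = (
        "from abc import ABC\n"
        "\n"
        "class Shape(ABC):\n"
        "    def __init__(self, color: str):\n"
        "        self.color = color\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.py"
        path.write_text(source, encoding="utf-8")
        # In real use, take the file and range from the Methods/Functions table.
        snippet = extract_lines(path, 4, 5)
    return build_prompt("[contents of cpg-analysis.md]", snippet, "Shape.__init__")


if __name__ == "__main__":
    print(demo_prompt())
```

Sending only the relevant ranges rather than whole files keeps the prompt small while still giving the model the code it needs to check against the structure.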
## Query Output Format
The underlying queries produce pipe-delimited output that the formatter processes:
### Methods Query Output
```text
name|fullName|file|lineStart|lineEnd|parameters
__init__|sample.Shape.__init__|sample.py|20|23|color:str
__init__|sample.Rectangle.__init__|sample.py|56|61|width:float, height:float, color:str
add_shape|sample.ShapeManager.add_shape|sample.py|126|131|shape:Shape
```
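Each output line is one record, with fields in the order given by the header line. Below is a minimal parsing sketch, not the `CPGFormatter` implementation; `parse_records` and `METHODS_FIELDS` are hypothetical names. It splits with a maxsplit so that a `|` inside the final `parameters` field (for example, a Python `int | None` annotation) stays in that field, and it skips the header row if the output happens to include one.

```python
# Field order from the methods query header shown above.
METHODS_FIELDS = ("name", "fullName", "file", "lineStart", "lineEnd", "parameters")

SAMPLE = """\
__init__|sample.Shape.__init__|sample.py|20|23|color:str
__init__|sample.Rectangle.__init__|sample.py|56|61|width:float, height:float, color:str
add_shape|sample.ShapeManager.add_shape|sample.py|126|131|shape:Shape
"""


def parse_records(output, fields):
    """Split pipe-delimited query output into one dict per record."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # maxsplit keeps any extra '|' inside the last field
        values = line.split("|", len(fields) - 1)
        if values == list(fields):
            continue  # header row, if the output includes one
        if len(values) != len(fields):
            continue  # malformed row
        records.append(dict(zip(fields, values)))
    return records


if __name__ == "__main__":
    methods = parse_records(SAMPLE, METHODS_FIELDS)
    # Example: keep one file's methods and convert line numbers to integers.
    for m in (r for r in methods if r["file"] == "sample.py"):
        print(m["fullName"], int(m["lineStart"]), int(m["lineEnd"]), m["parameters"])
```

Values stay as strings, so callers convert `lineStart` and `lineEnd` only where they need arithmetic.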
### Types Query Output
```text
name|fullName|file|line|inheritsFrom
Shape|sample.Shape|sample.py|17|ABC
Rectangle|sample.Rectangle|sample.py|53|Shape
```
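The types output maps directly onto the two tables in the Classes/Types section. Rows with a non-empty `inheritsFrom` go under Inheritance Hierarchy, and the rest go under Standalone Types. This is a minimal sketch that assumes standalone types are emitted with an empty last field; the `ShapeManager` row below is constructed for illustration, not copied from real output. `split_types` and `to_markdown` are hypothetical and may differ from what `CPGFormatter` actually does.

```python
HEADER = "name|fullName|file|line|inheritsFrom"

# The ShapeManager row is constructed for illustration
# (assumption: a type with no parent has an empty inheritsFrom field).
TYPES_OUTPUT = """\
Shape|sample.Shape|sample.py|17|ABC
Rectangle|sample.Rectangle|sample.py|53|Shape
ShapeManager|sample.ShapeManager|sample.py|120|
"""


def split_types(output):
    """Return (inheritance_rows, standalone_rows) from pipe-delimited output."""
    inherited, standalone = [], []
    for record in output.splitlines():
        if not record.strip() or record == HEADER:
            continue
        name, _full_name, file, line_no, parents = record.split("|", 4)
        row = {"type": name, "file": file, "line": int(line_no)}
        if parents.strip():
            inherited.append({**row, "extends": parents.strip()})
        else:
            standalone.append(row)
    return inherited, standalone


def to_markdown(inherited, standalone):
    """Render the two groups in the table layout of the Classes/Types section."""
    out = [
        "### Inheritance Hierarchy",
        "| Type | Extends | File | Line |",
        "|------|---------|------|------|",
    ]
    out += [f"| `{r['type']}` | {r['extends']} | {r['file']} | {r['line']} |" for r in inherited]
    out += ["", "### Standalone Types", "| Type | File | Line |", "|------|------|------|"]
    out += [f"| `{r['type']}` | {r['file']} | {r['line']} |" for r in standalone]
    return "\n".join(out)


if __name__ == "__main__":
    print(to_markdown(*split_types(TYPES_OUTPUT)))
```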
### Calls Query Output
## Filtering Applied
The output is automatically cleaned to remove Joern internal artifacts:
| Filtered Pattern | Description |
|---|---|
| `<operator>.*` | Built-in operators |
| `<meta*` | Metaclass handlers |
| `<fake*` | Synthetic nodes |
| `<body>` | Body markers |
| `<module>` | Module markers |
| `<lambda>N` | Anonymous functions (`N` is a number) |
| `ANY` | Type placeholders |
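The patterns in the table can be expressed as regular expressions over method, callee, and type names. The regexes below are an interpretation of the table, not the rules repo-ctx ships. `ARTIFACT_PATTERNS`, `is_artifact`, and `drop_artifacts` are hypothetical names, and the sample names are illustrative.

```python
import re

# Hypothetical regex versions of the filtered patterns in the table above;
# the rules repo-ctx actually applies may differ.
ARTIFACT_PATTERNS = [
    re.compile(r"^<operator>\."),  # <operator>.*  built-in operators
    re.compile(r"^<meta"),         # <meta*        metaclass handlers
    re.compile(r"^<fake"),         # <fake*        synthetic nodes
    re.compile(r"^<body>$"),       # <body>        body markers
    re.compile(r"^<module>$"),     # <module>      module markers
    re.compile(r"^<lambda>\d+$"),  # <lambda>N     anonymous functions
    re.compile(r"^ANY$"),          # ANY           type placeholders
]


def is_artifact(name):
    """True if a name matches any of the artifact patterns."""
    return any(pattern.search(name) for pattern in ARTIFACT_PATTERNS)


def drop_artifacts(names):
    """Keep only names that do not match an artifact pattern."""
    return [name for name in names if not is_artifact(name)]


NAMES = [
    "main",
    "<operator>.assignment",
    "<lambda>0",
    "ANY",
    "calculate_total",
    "<module>",
    "<metaClassAdapter>",
]

if __name__ == "__main__":
    print(drop_artifacts(NAMES))
```

Anchoring each pattern (`^`, `$`) avoids dropping ordinary identifiers that merely contain words like `lambda` or `ANY`.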
## Programmatic Usage
```python
from repo_ctx.joern import (
    CPGFormatter,
    QUERY_LLM_METHODS,
    QUERY_LLM_TYPES,
    QUERY_LLM_CALLS,
)
from repo_ctx.analysis import CodeAnalyzer

source_path = "/path/to/code"

analyzer = CodeAnalyzer()
formatter = CPGFormatter()

# Run each LLM-friendly query against the same source tree
methods = analyzer.run_cpg_query(source_path, QUERY_LLM_METHODS)
types = analyzer.run_cpg_query(source_path, QUERY_LLM_TYPES)
calls = analyzer.run_cpg_query(source_path, QUERY_LLM_CALLS)

# Format the raw pipe-delimited output as a single markdown report
report = formatter.format_combined_report(
    methods_output=methods.get("output", ""),
    types_output=types.get("output", ""),
    calls_output=calls.get("output", ""),
    source_path=source_path,
    language="python",
)

print(report)  # Markdown suitable for an LLM
```
### Available LLM-Friendly Queries
| Query | Purpose |
|---|---|
| `QUERY_LLM_METHODS` | Methods with file, line, and signature |
| `QUERY_LLM_TYPES` | Types with inheritance hierarchy |
| `QUERY_LLM_CALLS` | Call graph (caller → callee) |
| `QUERY_LLM_MEMBERS` | Class fields and members |
| `QUERY_LLM_SUMMARY` | Classes with their methods listed |
## Best Practices
- Start with `cpg-analysis.md` for an overall understanding
- Use the call graph to trace specific functionality
- Combine the reports with source code for detailed analysis
- Filter by file when analyzing large codebases
- Look for patterns in inheritance hierarchies
## Limitations
- Dynamic languages: In Python and JavaScript, calls determined at runtime may be missing from the call graph
- External libraries: Only the analyzed project's own code is included; external code is filtered out
- Generics/Templates: Complex type parameters may be simplified
- Macros/Preprocessing: C/C++ macros may not be fully resolved