CPG Analysis Output Guide

This guide explains how to interpret the Code Property Graph (CPG) analysis output generated by repo-ctx using Joern.

Overview

The CPG analysis produces structured markdown reports designed for:

  • LLM code analysis: Clean, contextual information suitable for AI assistants
  • Code review: Understanding codebase structure at a glance
  • Documentation: Auto-generated code maps

Output Files

When analyzing code with analyze_examples.py or using the CPG queries directly, the following files are generated:

| File | Description |
|------|-------------|
| `cpg-analysis.md` | Combined report with all sections |
| `cpg-methods.md` | Functions and methods only |
| `cpg-types.md` | Classes and type declarations |
| `cpg-calls.md` | Call graph (who calls whom) |

Report Sections

1. Classes/Types Section

Shows type declarations (classes, interfaces, structs) organized by inheritance.

## Classes/Types

### Inheritance Hierarchy

| Type | Extends | File | Line |
|------|---------|------|------|
| `Shape` | ABC | sample.py | 17 |
| `Rectangle` | Shape | sample.py | 53 |

### Standalone Types

| Type | File | Line |
|------|------|------|
| `ShapeManager` | sample.py | 120 |

How to interpret:

  • Inheritance Hierarchy: Classes that extend other classes. The "Extends" column shows parent class(es).
  • Standalone Types: Classes with no explicit parent (may still have implicit base like object).
  • File/Line: Exact location for navigation.

LLM usage:

  • Use inheritance to understand class relationships
  • Identify abstract base classes (like ABC) vs concrete implementations
  • Find leaf classes (classes with no children)

2. Methods/Functions Section

Lists all functions and methods grouped by file.

## Methods/Functions

### `sample.py`

| Method | Lines | Parameters |
|--------|-------|------------|
| `__init__` | 20-23 | color:str |
| `__init__` | 56-61 | width:float, height:float, color:str |
| `add_shape` | 126-131 | shape:Shape |
| `filter_by` | 139-143 | predicate:typing.Callable |

How to interpret:

  • Method: Function or method name
  • Lines: Start-end line range (useful for understanding method size)
  • Parameters: Method parameters with types (from type hints in Python, type declarations in Java/etc.)

LLM usage:

  • Identify long methods (large line ranges) for refactoring candidates
  • Find constructors (__init__, constructor, etc.)
  • Locate specific functionality by name
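The "Lines" column can also be checked mechanically. The following is a minimal sketch, assuming the table layout shown in the sample above (backticked method name, then a `start-end` range); the function names `method_sizes` and `long_methods` are hypothetical helpers, not part of repo-ctx.

```python
import re

# Rows copied from the sample Methods/Functions table above.
METHODS_TABLE = """\
| Method | Lines | Parameters |
|--------|-------|------------|
| `__init__` | 20-23 | color:str |
| `__init__` | 56-61 | width:float, height:float, color:str |
| `add_shape` | 126-131 | shape:Shape |
| `filter_by` | 139-143 | predicate:typing.Callable |
"""

# Matches a data row: backticked name, then a "start-end" line range.
ROW = re.compile(r"^\|\s*`(?P<name>[^`]+)`\s*\|\s*(?P<start>\d+)-(?P<end>\d+)\s*\|")


def method_sizes(table: str) -> list[tuple[str, int]]:
    """Return (method name, line count) for every data row in the table."""
    sizes = []
    for line in table.splitlines():
        match = ROW.match(line)
        if match:  # header and separator rows do not match the pattern
            start, end = int(match["start"]), int(match["end"])
            sizes.append((match["name"], end - start + 1))
    return sizes


def long_methods(table: str, threshold: int = 50) -> list[str]:
    """Names of methods whose line count exceeds the threshold."""
    return [name for name, size in method_sizes(table) if size > threshold]


sizes = method_sizes(METHODS_TABLE)
print(sizes)
```

None of the sample methods exceeds the default threshold of 50 lines, so `long_methods(METHODS_TABLE)` returns an empty list here. Rows whose "Lines" cell is not a `start-end` range are skipped by this sketch.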

3. Call Graph Section

Shows which functions call which other functions.

## Call Graph

| Caller | Calls | Line |
|--------|-------|------|
| `main` | `create_shapes` | 195 |
| `main` | `process_async` | 200 |
| `process_async` | `calculate_total` | 165 |

How to interpret:

  • Caller: The function making the call
  • Calls: The function being called
  • Line: Where the call occurs

LLM usage:

  • Trace execution flow from entry points
  • Identify heavily-called functions (popular targets)
  • Find potentially unused functions (those that never appear in the "Calls" column; entry points such as main never appear there either)
  • Understand dependencies between modules
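Two of these checks, heavily-called functions and functions that never appear in the "Calls" column, reduce to simple counting over (caller, callee) pairs. The sketch below uses the rows from the sample table; `fan_in`, `never_called`, and the name `unused_helper` are hypothetical and only serve the illustration.

```python
from collections import Counter

# (caller, callee) pairs taken from the sample Call Graph table above.
CALLS = [
    ("main", "create_shapes"),
    ("main", "process_async"),
    ("process_async", "calculate_total"),
]


def fan_in(calls):
    """Count how many call sites target each function."""
    return Counter(callee for _, callee in calls)


def never_called(calls, known_functions):
    """Functions from known_functions that never appear as a callee."""
    called = {callee for _, callee in calls}
    return sorted(set(known_functions) - called)


counts = fan_in(CALLS)

# The list of known functions would normally come from the Methods/Functions
# section; "unused_helper" is a made-up name added for illustration.
uncalled = never_called(
    CALLS,
    ["main", "create_shapes", "process_async", "calculate_total", "unused_helper"],
)
print(counts.most_common(3))
print(uncalled)
```

The result lists `main` alongside `unused_helper`, so entry points need to be excluded by hand before treating the list as dead code.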

Using Output for LLM Code Analysis

1. Architecture Understanding:

Given this code structure:
[paste cpg-analysis.md]

Explain the overall architecture and main responsibilities of each class.

2. Code Review:

Based on this call graph:
[paste cpg-calls.md]

Identify potential issues:
- Circular dependencies
- God classes (too many incoming calls)
- Dead code (functions never called)
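Circular dependencies can also be checked directly before handing the call graph to an LLM. This is a minimal sketch of a depth-first search over (caller, callee) pairs; `find_cycle` is a hypothetical helper, and the function names `a`, `b`, `c` in the second example are invented.

```python
from collections import defaultdict


def find_cycle(calls):
    """Return one call cycle as a list of names, or None if the graph is acyclic."""
    graph = defaultdict(list)
    for caller, callee in calls:
        graph[caller].append(callee)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)  # every node starts as UNVISITED
    path = []                 # current DFS path from the start node

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for callee in graph[node]:
            if state[callee] == IN_PROGRESS:
                # Back edge: the cycle is the path segment from callee onward.
                return path[path.index(callee):] + [callee]
            if state[callee] == UNVISITED:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for start in list(graph):
        if state[start] == UNVISITED:
            found = visit(start)
            if found:
                return found
    return None


# The sample call graph from this guide has no cycle.
sample = [("main", "create_shapes"), ("main", "process_async"),
          ("process_async", "calculate_total")]
print(find_cycle(sample))

# Invented example with a cycle a -> b -> c -> a.
print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))
```

The recursion depth grows with the longest call chain, so a very deep graph would need an iterative variant.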

3. Refactoring Suggestions:

Given these methods:
[paste cpg-methods.md]

Identify:
- Methods that might be too long (>50 lines)
- Potential extraction candidates
- Methods that could be moved to different classes

Combining with Source Code

For deeper analysis, combine CPG output with actual code:

Here is the code structure:
[paste cpg-analysis.md]

And here is the implementation of the Shape class:
[paste actual code]

Analyze how well the implementation follows the class hierarchy.
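When this prompt is assembled repeatedly, a small template helper keeps the wording stable. The sketch below is one possible way to do it; `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names, not part of repo-ctx.

```python
PROMPT_TEMPLATE = """\
Here is the code structure:
{structure}

And here is the implementation of the {class_name} class:
{source}

Analyze how well the implementation follows the class hierarchy.
"""


def build_prompt(structure: str, source: str, class_name: str) -> str:
    """Fill the template with a CPG report and the source of one class."""
    return PROMPT_TEMPLATE.format(
        structure=structure.strip(),
        source=source.strip(),
        class_name=class_name,
    )


# Placeholder inputs; in practice these would be the contents of
# cpg-analysis.md and of the relevant source file.
prompt = build_prompt("## Classes/Types\n", "class Shape: ...", "Shape")
print(prompt)
```

In practice both inputs could be read with `pathlib.Path(...).read_text()`.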

Query Output Format

The underlying queries produce pipe-delimited output that the formatter processes:

Methods Query Output

name|fullName|file|lineStart|lineEnd|parameters
__init__|sample.Shape.__init__|sample.py|20|23|color:str
__init__|sample.Rectangle.__init__|sample.py|56|61|width:float, height:float, color:str
add_shape|sample.ShapeManager.add_shape|sample.py|126|131|shape:Shape
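For custom processing, this output can be turned into records with plain string splitting. The sketch below assumes the header row is present exactly as shown above; `parse_pipe_output` is a hypothetical helper, not the formatter's own parser.

```python
METHODS_OUTPUT = """\
name|fullName|file|lineStart|lineEnd|parameters
__init__|sample.Shape.__init__|sample.py|20|23|color:str
__init__|sample.Rectangle.__init__|sample.py|56|61|width:float, height:float, color:str
add_shape|sample.ShapeManager.add_shape|sample.py|126|131|shape:Shape
"""


def parse_pipe_output(text: str) -> list[dict[str, str]]:
    """Parse header-prefixed, pipe-delimited query output into dicts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    rows = []
    for line in lines[1:]:
        # Limit the split so any extra "|" stays inside the last field.
        values = line.split("|", len(header) - 1)
        if len(values) == len(header):  # skip malformed rows
            rows.append(dict(zip(header, values)))
    return rows


rows = parse_pipe_output(METHODS_OUTPUT)
for row in rows:
    length = int(row["lineEnd"]) - int(row["lineStart"]) + 1
    print(row["fullName"], length)
```

The same function works for the types and calls output, since it takes the column names from whatever header row it is given.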

Types Query Output

name|fullName|file|line|inheritsFrom
Shape|sample.Shape|sample.py|17|ABC
Rectangle|sample.Rectangle|sample.py|53|Shape
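The `inheritsFrom` column is enough to rebuild the hierarchy as a parent-to-children map. The sketch below is illustrative only: `children_by_parent` and `leaf_types` are hypothetical helpers, and the handling of multiple parents as a comma-separated list is an assumption, since the sample shows only single parents.

```python
from collections import defaultdict

TYPES_OUTPUT = """\
name|fullName|file|line|inheritsFrom
Shape|sample.Shape|sample.py|17|ABC
Rectangle|sample.Rectangle|sample.py|53|Shape
"""


def children_by_parent(output: str) -> dict[str, list[str]]:
    """Map each parent type name to the types that extend it."""
    children = defaultdict(list)
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, _full_name, _file, _line, inherits = line.split("|", 4)
        # Assumption: multiple parents, if present, are comma-separated.
        for parent in inherits.split(","):
            parent = parent.strip()
            if parent:
                children[parent].append(name)
    return dict(children)


def leaf_types(output: str) -> list[str]:
    """Types that no other listed type extends."""
    names = [line.split("|", 1)[0] for line in output.strip().splitlines()[1:]]
    parents = children_by_parent(output)
    return [name for name in names if name not in parents]


hierarchy = children_by_parent(TYPES_OUTPUT)
print(hierarchy)
print(leaf_types(TYPES_OUTPUT))
```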

Calls Query Output

caller|callee|line
main|create_shapes|195
process_async|calculate_total|165
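Tracing execution flow from an entry point is a breadth-first walk over these rows. The following sketch works on the sample output exactly as shown above; `reachable` is a hypothetical helper name.

```python
from collections import defaultdict, deque

CALLS_OUTPUT = """\
caller|callee|line
main|create_shapes|195
process_async|calculate_total|165
"""


def reachable(output: str, entry: str) -> list[str]:
    """Functions reachable from entry, in breadth-first order (entry first)."""
    graph = defaultdict(list)
    for line in output.strip().splitlines()[1:]:  # skip the header row
        caller, callee, _line = line.split("|", 2)
        graph[caller].append(callee)

    order = [entry]
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


print(reachable(CALLS_OUTPUT, "main"))
print(reachable(CALLS_OUTPUT, "process_async"))
```

With only the two rows shown in this sample, `process_async` is not reachable from `main`, because the sample omits the `main|process_async` row that appears in the report table earlier in this guide.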

Filtering Applied

The output is automatically cleaned to remove Joern internal artifacts:

| Filtered Pattern | Description |
|------------------|-------------|
| `<operator>.*` | Built-in operators |
| `<meta*` | Metaclass handlers |
| `<fake*` | Synthetic nodes |
| `<body>` | Body markers |
| `<module>` | Module markers |
| `<lambda>N` | Anonymous functions |
| `ANY` | Type placeholders |
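A similar filter can be applied to raw query output that has not passed through the formatter. The sketch below is not the repo-ctx implementation. It reads the patterns in the table as shell-style globs matched against a whole name, which is an assumption, and `is_artifact` and `drop_artifacts` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Patterns from the table above, read as shell-style globs (an assumption).
# "<lambda>N" is expressed as a bare "<lambda>" plus a numbered variant.
ARTIFACT_PATTERNS = [
    "<operator>.*",
    "<meta*",
    "<fake*",
    "<body>",
    "<module>",
    "<lambda>",
    "<lambda>[0-9]*",
    "ANY",
]


def is_artifact(name: str) -> bool:
    """True if the whole name matches one of the filtered patterns."""
    return any(fnmatchcase(name, pattern) for pattern in ARTIFACT_PATTERNS)


def drop_artifacts(names: list[str]) -> list[str]:
    """Keep only names that are not Joern-internal artifacts."""
    return [name for name in names if not is_artifact(name)]


# Illustrative input mixing real function names with artifact-style names.
names = ["main", "<operator>.assignment", "<module>", "<lambda>0", "add_shape", "ANY"]
print(drop_artifacts(names))
```

Matching is case-sensitive and anchored to the whole name, so an identifier such as `ANYTHING` is kept.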

Programmatic Usage

from repo_ctx.joern import (
    CPGFormatter,
    QUERY_LLM_METHODS,
    QUERY_LLM_TYPES,
    QUERY_LLM_CALLS,
)
from repo_ctx.analysis import CodeAnalyzer

analyzer = CodeAnalyzer()
formatter = CPGFormatter()

# Run queries
methods = analyzer.run_cpg_query("/path/to/code", QUERY_LLM_METHODS)
types = analyzer.run_cpg_query("/path/to/code", QUERY_LLM_TYPES)
calls = analyzer.run_cpg_query("/path/to/code", QUERY_LLM_CALLS)

# Format for LLM
report = formatter.format_combined_report(
    methods_output=methods.get("output", ""),
    types_output=types.get("output", ""),
    calls_output=calls.get("output", ""),
    source_path="/path/to/code",
    language="python"
)

print(report)  # Markdown suitable for LLM

Available LLM-Friendly Queries

| Query | Purpose |
|-------|---------|
| `QUERY_LLM_METHODS` | Methods with file, line, signature |
| `QUERY_LLM_TYPES` | Types with inheritance hierarchy |
| `QUERY_LLM_CALLS` | Call graph (caller → callee) |
| `QUERY_LLM_MEMBERS` | Class fields and members |
| `QUERY_LLM_SUMMARY` | Classes with their methods listed |

Best Practices

  1. Start with cpg-analysis.md for overall understanding
  2. Use call graph to trace specific functionality
  3. Combine with source code for detailed analysis
  4. Filter by file when analyzing large codebases
  5. Look for patterns in inheritance hierarchies
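Filtering by file (item 4) can be done on the pipe-delimited query output before it is formatted or pasted into a prompt. This is a minimal sketch under stated assumptions: `rows_for_file` is a hypothetical helper, and the `circle.py` row is invented to give the filter something to remove.

```python
METHODS_OUTPUT = """\
name|fullName|file|lineStart|lineEnd|parameters
__init__|sample.Shape.__init__|sample.py|20|23|color:str
area|circle.Circle.area|circle.py|10|12|
add_shape|sample.ShapeManager.add_shape|sample.py|126|131|shape:Shape
"""


def rows_for_file(output: str, filename: str) -> str:
    """Keep the header plus the rows whose 'file' column equals filename."""
    lines = output.strip().splitlines()
    header, rows = lines[0], lines[1:]
    file_index = header.split("|").index("file")  # locate the column by name
    kept = [row for row in rows if row.split("|")[file_index] == filename]
    return "\n".join([header, *kept])


filtered = rows_for_file(METHODS_OUTPUT, "sample.py")
print(filtered)
```

Because the column is located by name in the header row, the same helper applies to any query output that has a `file` column.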

Limitations

  • Dynamic languages: Runtime-determined calls in Python/JavaScript may be missed
  • External libraries: Only internal code is analyzed; external code is filtered out
  • Generics/Templates: Complex type parameters may be simplified
  • Macros/Preprocessing: C/C++ macros may not fully resolve