Architecture governance toolkit — a 7-script Python pipeline for risk scoring, drift detection, ownership analysis, and SBOM generation across any codebase.
Most codebases accumulate architectural debt invisibly. By the time a team notices, the drift between intended architecture and actual dependencies has calcified into technical risk that blocks every initiative. This toolkit makes that drift visible, quantifiable, and actionable — before it becomes an emergency.
This repository sits within ORGAN-I: Theoria — the organ of the organvm system concerned with theory, epistemology, recursion, and ontology. Architecture governance belongs here because it is fundamentally a question about knowledge: what does a system know about itself, and what has it forgotten?
A codebase is a living structure that drifts from its intended design under the pressure of deadlines, personnel changes, and incremental feature work. The intended architecture — the one in the wiki, the one the tech lead describes in onboarding — becomes a fiction. The actual architecture is encoded in import graphs, commit histories, and vulnerability surfaces. The gap between these two is architectural ignorance: the system no longer knows its own shape.
This toolkit operationalizes the epistemological question. It treats the intended architecture (declared in service_paths.yaml) as a hypothesis and the actual dependency graph as empirical evidence. Drift detection is then hypothesis testing: does reality still match the model? Risk scoring quantifies the consequences of divergence. Ownership analysis maps where institutional knowledge has concentrated and where it has evaporated.
The name — reverse-engine-recursive-run — reflects this recursive quality. The toolkit reverse-engineers a codebase’s actual structure, then runs that structure against its declared intent, producing a self-referential report: the system examining itself. This is the same recursive pattern explored in recursive-engine, the flagship ORGAN-I repository, applied not to abstract computation but to the concrete problem of software governance.
In the broader organvm model, this toolkit provides the theoretical foundation that ORGAN-IV (Taxis — orchestration) operationalizes at scale. Where ORGAN-IV coordinates governance across organizations, this repository provides the analytical primitives: the scoring models, detection heuristics, and reporting templates that make governance measurable rather than aspirational.
Software architecture degrades through three invisible mechanisms:
Knowledge concentration — critical subsystems understood by one person, creating single points of failure that only surface during attrition or incident response. The git history contains this information, but nobody reads git log to assess organizational risk.
Dependency drift — the actual import graph and service boundaries diverge from the documented or intended architecture, introducing coupling that contradicts design decisions. This coupling is invisible until someone tries to extract a service or change a shared module and discovers unexpected consumers.
Security surface expansion — vulnerability density in specific modules grows unchecked because scanning tools produce raw findings without contextual prioritization. A critical CVE in a dead-code module and a critical CVE in the authentication middleware are treated identically, diluting the signal.
Traditional approaches treat these as separate concerns: bus-factor analysis in one tool, dependency graphing in another, vulnerability scanning in a third. The result is three dashboards that nobody synthesizes. This toolkit unifies all three into a single pipeline that produces a weighted, composite risk score per service or module — so you can see which parts of your codebase are simultaneously poorly understood, architecturally drifting, and accumulating vulnerabilities.
The output is a prioritized remediation backlog, not a dashboard. It is designed to feed into sprint planning, not sit in a monitoring tab.
This is a working set of 7 Python/shell scripts, orchestrated by a Makefile, that analyze a codebase and produce structured risk assessments. The pipeline is functional and has been used for real analysis. It is not a SaaS product, a web application, or a platform — it is a command-line toolkit that runs locally or in a container.
What exists today:
- A working 7-script pipeline, runnable end to end via `make full-analysis`
- A compiled methodology reference in `docs/summary_compiled.md`

What does not exist yet (see Roadmap):

- Package distribution (no `pyproject.toml`)

The toolkit follows a staged pipeline architecture where each script reads from upstream outputs or external tool results and writes structured JSON or YAML for the next stage. There is no shared state, no database, and no runtime coordination — each script is a standalone CLI tool that communicates through the filesystem.
```
            ┌─────────────────┐
            │ Target Codebase │
            └────────┬────────┘
                     │
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ Trivy JSON │ │Semgrep JSON│ │  git log   │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
      ▼              ▼              ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│parse_trivy.py│ │parse_semgrep │ │ownership_diff│
│              │ │     .py      │ │     .py      │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       ▼                ▼                │
   ┌──────────────────────────┐          │
   │  Security Findings JSON  │          │
   └────────────┬─────────────┘          │
                │                        │
    ┌───────────┼────────────┐           │
    │           │            │           │
    ▼           ▼            ▼           ▼
┌────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ radon  │ │git churn │ │scan_drift│ │ownership │
│  (CC)  │ │          │ │   .py    │ │  .json   │
└───┬────┘ └────┬─────┘ └────┬─────┘ └───┬──────┘
    │           │            │           │
    ▼           ▼            │           │
┌───────────────────────┐    │           │
│   hotspot_merge.py    │    │           │
│ (weighted risk scores)│    │           │
└──────────┬────────────┘    │           │
           │                 │           │
           ▼                 ▼           ▼
┌─────────────────────────────────────────┐
│             risk_update.py              │
│      (consolidated risk register)       │
└──────────────────┬──────────────────────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
┌────────────┐ ┌───────────┐ ┌──────────────┐
│ Executive  │ │Remediation│ │     ADR      │
│  Summary   │ │  Backlog  │ │  Generation  │
└────────────┘ └───────────┘ └──────────────┘
```
Key architectural properties:

- Standard library only (optional `pyyaml` for YAML config). No `pip install` required for core functionality.
- Every stage writes its output to the `artifacts/` directory. This makes the pipeline trivially debuggable — you can inspect any intermediate artifact.
- `scan_drift.py` exits with code 2 when drift exceeds the configured threshold, enabling CI/CD integration via standard Unix conventions.

The pipeline consists of 7 scripts that run sequentially via `make full-analysis`. Each script is independently executable.
| # | Script | Purpose | Input | Output |
|---|---|---|---|---|
| 1 | `parse_trivy.py` | Normalize Trivy vulnerability scan results into a standard schema | Trivy JSON output | Normalized vulnerability records |
| 2 | `parse_semgrep.py` | Normalize Semgrep static analysis results into the same schema | Semgrep JSON output | Normalized finding records |
| 3 | `gen_sbom.sh` | Generate Software Bill of Materials using Syft/CycloneDX across 5 ecosystems | Target codebase | SBOM in CycloneDX + SPDX format |
| 4 | `scan_drift.py` | Detect architecture drift by comparing dependency graph snapshots | Two graph JSON snapshots + threshold | Drift report with boundary violations |
| 5 | `ownership_diff.py` | Analyze git blame and commit history to identify knowledge concentration | Git repository + time window | Ownership concentration scores per directory |
| 6 | `hotspot_merge.py` | Merge churn, complexity, coverage, and criticality signals into composite risk scores | Churn data + complexity JSON + optional coverage/criticality | Weighted risk scores per file |
| 7 | `risk_update.py` | Aggregate all upstream reports into a prioritized consolidated risk register | Any combination of hotspots, drift, ownership, and security JSONs | Timestamped risk register with severity classifications |
The canonical invocation is:
make full-analysis
This runs stages 4-7 in order (hotspots, ownership, drift, risk). Security normalization (stages 1-2) and SBOM generation (stage 3) are run separately because they depend on external scanner output. Individual stages can also be run independently if you only need a subset of the analysis.
The composite risk score for each file is computed by `hotspot_merge.py` using a weighted linear formula. Four input signals are combined:

```
risk = (norm_churn * W_churn) + (norm_complexity * W_complexity)
     + (coverage_penalty * W_coverage) + (norm_criticality * W_criticality)
```
Where:

- `norm_churn = file_churn / max_churn` — how frequently this file has changed in the analysis window (default: 90 days). High churn indicates instability.
- `norm_complexity = avg_cyclomatic_complexity / max_cc` — the average cyclomatic complexity of functions in the file, as measured by radon or equivalent. High complexity indicates maintenance burden.
- `coverage_penalty = 1 - test_coverage` — inverted test coverage. Low coverage means high penalty. Files with unknown coverage default to 0.5 (uncertain, not assumed-bad).
- `norm_criticality = business_criticality / max_criticality` — a manually assigned 1-5 score reflecting business impact. A payment processing module scores higher than an internal admin tool.

The default weights in `config/risk_weights.yaml` are:
| Signal | Weight | Rationale |
|---|---|---|
| Churn | 0.30 | Volatile files are harder to reason about |
| Complexity | 0.35 | Complex code is the primary maintenance burden |
| Coverage gap | 0.15 | Missing tests compound other risks |
| Criticality | 0.10 | Business impact provides prioritization context |
| Security hotspot | 0.10 | Optional extension for vulnerability presence |
Weights can be overridden via environment variables (RISK_W_CHURN, RISK_W_COMPLEXITY, RISK_W_COVERAGE, RISK_W_CRITICALITY) or by editing the YAML config. The script normalizes weights to sum to 1.0 regardless of input values.
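As an illustration, the scoring model can be sketched in plain Python. This is a simplified re-implementation for clarity, not the actual `hotspot_merge.py` code, and it omits the optional security signal:

```python
def risk_score(churn, max_churn, avg_cc, max_cc, coverage, criticality, max_crit,
               w_churn=0.30, w_cc=0.35, w_cov=0.15, w_crit=0.10):
    """Weighted linear risk model (illustrative sketch, not hotspot_merge.py itself)."""
    norm_churn = churn / max_churn if max_churn else 0.0
    norm_cc = avg_cc / max_cc if max_cc else 0.0
    # Unknown coverage defaults to a penalty of 0.5 (uncertain, not assumed-bad)
    cov_penalty = (1 - coverage) if coverage is not None else 0.5
    norm_crit = criticality / max_crit if max_crit else 0.0
    # Weights are normalized to sum to 1.0 regardless of the input values
    total_w = w_churn + w_cc + w_cov + w_crit
    score = (norm_churn * w_churn + norm_cc * w_cc
             + cov_penalty * w_cov + norm_crit * w_crit)
    return score / total_w
```

With every signal at its maximum and zero test coverage, the score is exactly 1.0 — a useful sanity check when tuning weights.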
risk_update.py converts numeric risk scores into severity labels using configurable thresholds:
| Score Range | Severity | Action |
|---|---|---|
| >= 0.70 | HIGH | Immediate attention — schedule remediation this sprint |
| >= 0.50 | MEDIUM | Plan remediation within 90 days |
| < 0.50 | LOW | Monitor; address opportunistically |
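The thresholding itself is simple; a sketch mirroring the default thresholds (illustrative, not the actual `risk_update.py` source):

```python
def severity(score, high=0.70, medium=0.50):
    """Map a composite risk score to a severity label.

    Default thresholds mirror config/risk_weights.yaml."""
    if score >= high:
        return "HIGH"
    if score >= medium:
        return "MEDIUM"
    return "LOW"
```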
Certain conditions trigger automatic severity escalation regardless of the base score:
`hotspot_merge.py` writes a JSON file with full transparency into the scoring:

```json
{
  "meta": {
    "weights": { "churn": 0.3, "complexity": 0.35, "coverage": 0.15, "criticality": 0.1 }
  },
  "hotspots": [
    {
      "file": "src/core/payment_validator.py",
      "churn": 47,
      "avg_complexity": 8.3,
      "coverage": 0.42,
      "criticality": 5,
      "risk_score": 0.7821,
      "components": {
        "churn": 0.2100,
        "complexity": 0.2905,
        "coverage_penalty": 0.0870,
        "criticality_factor": 0.1000
      }
    }
  ]
}
```
The components breakdown lets you see exactly which signals drove the score, making it possible to have informed conversations about remediation strategy — should you reduce complexity, add tests, or spread knowledge?
`scan_drift.py` is the most theoretically interesting script in the pipeline. It operates on a premise drawn from control theory: if you can declare what your system boundaries should be, the tool can tell you where reality has diverged.

The script compares two dependency graph snapshots — a “previous” baseline and a “current” state — and computes, among other outputs, a drift churn ratio: `(added_edges + removed_edges) / previous_edge_count`, a single number summarizing the rate of structural change.

Both snapshots use a simple JSON schema:
```json
{
  "nodes": [
    { "id": "moduleA", "group": "serviceX" }
  ],
  "edges": [
    { "from": "moduleA", "to": "moduleB", "type": "import" }
  ],
  "meta": {
    "ref": "abc123",
    "generated_at": "2025-10-29T00:00:00Z"
  }
}
```
This schema is deliberately minimal. You can generate it from AST parsing (for Python or JavaScript import analysis), from go mod graph, from explicit service declarations, or from any other dependency extraction tool. The drift detector does not care how the graph was produced — it only compares two snapshots.
The exit-code convention (exit 2 on threshold breach) enables CI/CD integration: a GitHub Actions step can run drift detection and fail the build if the architecture has changed more than expected.
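A hypothetical workflow step could gate on that exit code (this YAML is not shipped with the repository; the file paths and threshold are placeholders):

```yaml
# Hypothetical GitHub Actions step — paths and threshold are examples
- name: Architecture drift gate
  run: |
    python3 scripts/scan_drift.py \
      --current artifacts/current_graph.json \
      --previous artifacts/previous_graph.json \
      --threshold 0.1 \
      --out artifacts/drift_report.json
  # scan_drift.py exits 2 when drift exceeds the threshold,
  # which GitHub Actions treats as a step failure.
```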
The output includes a SHA-256 hash of the summary fields, allowing downstream consumers to verify that the drift report has not been tampered with between generation and consumption.
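The hashing step can be sketched as follows, assuming canonical JSON serialization of the summary fields (the actual canonicalization in `scan_drift.py` may differ):

```python
import hashlib
import json

def summary_hash(summary):
    """Tamper-evidence sketch: hash the drift summary fields.

    Assumes canonical JSON (sorted keys, compact separators); the real
    scan_drift.py may canonicalize differently."""
    blob = json.dumps(summary, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Because the keys are sorted before hashing, two logically identical summaries always produce the same digest.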
`ownership_diff.py` answers the question that org charts cannot: who actually understands this code?

The script runs `git log` over a configurable time window (default: 90 days) and computes per-directory authorship concentration. For each directory (at a configurable depth), it calculates the top author's share of activity and assigns a concentration flag: `HIGH_CONCENTRATION` (top author > 60% but not sole contributor) or `SINGLE_CONTRIBUTOR` (only one person has touched the directory).

The flag thresholds are configurable. The default 60% threshold is deliberately aggressive — it flags directories where a single person has done most of the work even if others have contributed. This is because a 60/20/20 split means two people could leave and the remaining person still would not fully understand the module.
An optional criticality mapping (YAML) allows you to weight the importance of each directory. A SINGLE_CONTRIBUTOR flag on a criticality-5 payment processing directory is far more concerning than the same flag on an internal utility script.
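The flagging logic reduces to a small function. The sketch below assumes commit counts per author have already been extracted from `git log`; the `"OK"` label for unflagged directories is illustrative, not the script's actual output naming:

```python
from collections import Counter

def concentration_flag(author_commits, threshold=0.60):
    """Classify authorship concentration for one directory (sketch).

    author_commits maps author name -> commit count in the analysis window;
    the real ownership_diff.py derives these counts from git log."""
    counts = Counter(author_commits)
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) == 1:
        return "SINGLE_CONTRIBUTOR"
    top_share = counts.most_common(1)[0][1] / total
    # "OK" is an illustrative label; the real script's naming may differ
    return "HIGH_CONCENTRATION" if top_share > threshold else "OK"
```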
The toolkit does not perform security scanning itself — it normalizes the output of existing scanners into a unified schema that feeds into the risk aggregator.
Trivy normalization (`parse_trivy.py`) — ingests Trivy JSON output and produces normalized findings, including a component coordinate of the form `target::package@version`.

Semgrep normalization (`parse_semgrep.py`) — ingests Semgrep JSON output with severity mapping:

- `ERROR` maps to `HIGH`
- `WARNING` maps to `MEDIUM`
- `INFO` maps to `LOW`

Both parsers produce JSON arrays with the same schema, making them interchangeable inputs to `risk_update.py`. This normalization layer means you can swap Trivy for Grype, or Semgrep for CodeQL, by writing a single parser script that outputs the same schema.
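A sketch of such a parser, using Semgrep's standard output fields (`results[].check_id`, `path`, `extra.severity`, `extra.message`) and the unified `{ id, severity, component, desc, recommendation }` schema; treat the details as illustrative rather than the toolkit's exact code:

```python
SEVERITY_MAP = {"ERROR": "HIGH", "WARNING": "MEDIUM", "INFO": "LOW"}

def normalize_semgrep(raw):
    """Convert raw Semgrep JSON into the unified finding schema (sketch)."""
    findings = []
    for result in raw.get("results", []):
        extra = result.get("extra", {})
        findings.append({
            "id": result.get("check_id"),
            "severity": SEVERITY_MAP.get(extra.get("severity", ""), "LOW"),
            "component": result.get("path"),
            "desc": extra.get("message", ""),
            "recommendation": None,  # Semgrep findings rarely carry a fix hint
        })
    return findings
```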
The core pipeline requires only the Python 3 standard library (`pyyaml` optional for YAML config).

Optional for extended features:

- `radon` (`pip install radon`) for Python complexity metrics

```sh
# Clone the repository
git clone https://github.com/organvm-i-theoria/reverse-engine-recursive-run.git
cd reverse-engine-recursive-run

# Edit service_paths.yaml to declare your architecture
# (see Configuration Reference below)

# Run the full analysis pipeline
make full-analysis

# Results appear in artifacts/
cat artifacts/consolidated_risk.json | python3 -m json.tool
```
```sh
# Build the analysis image
docker build -f Dockerfile.analysis -t reverse-engine .

# Mount your target codebase and run
docker run --rm \
  -v /path/to/your/codebase:/workspace \
  reverse-engine \
  make full-analysis
```
```sh
# Ownership analysis only (runs git log internally)
python3 scripts/ownership_diff.py --days 90 --depth 2 --out ownership.json

# Hotspot analysis only (requires pre-generated churn and complexity data)
python3 scripts/hotspot_merge.py \
  --churn churn.txt \
  --complexity complexity.json \
  --out hotspots.json

# Drift detection only
python3 scripts/scan_drift.py \
  --current current_graph.json \
  --previous previous_graph.json \
  --threshold 0.1 \
  --out drift_report.json

# Risk aggregation (accepts any subset of inputs)
python3 scripts/risk_update.py \
  --hotspots hotspots.json \
  --drift drift_report.json \
  --ownership ownership.json \
  --out consolidated_risk.json
```
See QUICKSTART.md for detailed setup instructions with troubleshooting guidance, and scripts/README.md for per-script documentation.
All configuration is file-based. There are no environment variables required (though hotspot_merge.py accepts optional weight overrides via RISK_W_* env vars).
`config/risk_weights.yaml` — controls the risk scoring model:

```yaml
weights:
  churn: 0.30              # Code volatility (git activity)
  complexity: 0.35         # Cyclomatic complexity
  coverage_gap: 0.15       # Missing test coverage
  criticality: 0.10        # Business impact weight
  security_hotspot: 0.10   # Vulnerability presence

thresholds:
  hotspot_high: 0.70             # Score >= this = HIGH severity
  hotspot_medium: 0.50           # Score >= this = MEDIUM severity
  ownership_concentration: 0.60  # Flag above this %
  drift_churn_high: 0.30         # Drift ratio >= this = HIGH
  drift_churn_medium: 0.12       # Drift ratio >= this = MEDIUM
```
Tuning guidance: increase the complexity weight during a refactoring phase to surface the worst tangles. Increase the churn weight before a release freeze to identify volatile modules that need stabilization.
`config/service_paths.yaml` — declares your intended architecture, the service boundaries that drift detection compares against:

```yaml
services:
  payments-service:
    paths:
      - "src/payments/"
      - "src/core/payment"
  billing-service:
    paths:
      - "src/billing/"
      - "src/core/billing"
  admin-portal:
    paths:
      - "src/ui/admin/"
```
This file answers: what services exist, and which directories belong to each? When `scan_drift.py` finds an import that crosses these boundaries unexpectedly, it flags a boundary violation.
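Boundary checking can be sketched as a longest-prefix match from paths to declared services (illustrative; it assumes graph node ids are file paths, which the real matching logic may not):

```python
def service_of(path, service_paths):
    """Resolve a path to its declared service by longest-prefix match (sketch)."""
    best, best_len = None, -1
    for service, prefixes in service_paths.items():
        for prefix in prefixes:
            if path.startswith(prefix) and len(prefix) > best_len:
                best, best_len = service, len(prefix)
    return best

def boundary_violations(edges, service_paths):
    """Return edges whose endpoints belong to different declared services.

    Edges follow the snapshot schema's {"from": ..., "to": ...} shape;
    nodes outside any declared service are ignored."""
    flagged = []
    for edge in edges:
        a = service_of(edge["from"], service_paths)
        b = service_of(edge["to"], service_paths)
        if a and b and a != b:
            flagged.append(edge)
    return flagged
```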
The Makefile provides named targets for each analysis stage:
| Target | What It Does |
|---|---|
| `make full-analysis` | Run the complete pipeline: hotspots, ownership, drift, risk aggregation |
| `make hotspots` | Generate churn data from git, create placeholder complexity, merge into hotspot report |
| `make ownership` | Run ownership concentration analysis |
| `make drift` | Run drift detection (creates baseline graph on first run) |
| `make risk` | Aggregate all available reports into consolidated risk register |
| `make sbom` | Generate SBOM via `gen_sbom.sh` |
| `make build-analysis-image` | Build the Docker analysis image |
| `make clean` | Remove the `artifacts/` directory |
| `make adr-new TITLE='...'` | Create a new Architecture Decision Record |
All targets create the artifacts/ directory if it does not exist.
The templates/ directory contains two structured output templates:
Executive summary (`templates/executive_summary_template.md`) — a comprehensive Markdown template with 13 sections designed for architecture review boards, leadership briefings, and audit submissions.

Remediation backlog (`templates/remediation_backlog.yaml`) — a structured YAML template for tracking prioritized remediation items. Each item carries a priority score computed as `(impact * urgency * criticality) / effort_weight`.

The template ships with 5 example items covering code quality, security, architecture drift, knowledge concentration, and observability gaps. Replace these with your actual findings.
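The priority formula is straightforward to reproduce (the function is illustrative; score scales follow the 1-5 convention used elsewhere in the toolkit):

```python
def backlog_priority(impact, urgency, criticality, effort_weight):
    """Priority formula from templates/remediation_backlog.yaml.

    impact, urgency, criticality are scores (assumed 1-5);
    effort_weight is a relative effort estimate."""
    return (impact * urgency * criticality) / effort_weight
```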
The `Dockerfile.analysis` provides a reproducible analysis environment based on Ubuntu 22.04 with the required analysis tooling preinstalled.
The image is designed for ephemeral analysis runs, not as a long-running service. Mount your target codebase at /workspace and run any Make target or individual script.
gen_sbom.sh automatically detects and generates SBOMs for up to 5 ecosystems:
| Ecosystem | Detection | Tool | Output Format |
|---|---|---|---|
| Node.js | `package.json` | `@cyclonedx/cyclonedx-npm` | CycloneDX JSON |
| Python | `requirements.txt` or `pyproject.toml` | `cyclonedx-py` + `pipdeptree` | CycloneDX JSON |
| Go | `go.mod` | `syft` | CycloneDX JSON |
| Java | `pom.xml` or `*.gradle` | Maven `dependency:tree` | Text |
| Rust | `Cargo.toml` | `cargo metadata` + `syft` | CycloneDX JSON |
The script consolidates all CycloneDX fragments into a single sbom_combined.cyclonedx.json with deduplicated components, a UUID serial number, and the source git ref embedded in metadata. If Syft is available, it also generates an SPDX JSON output.
The SBOM outputs contain only dependency coordinates (package names, versions, licenses) — never source code. This makes them safe for distribution in compliance contexts (SOX, PCI DSS, HIPAA, ISO 27001) without exposing proprietary implementation details.
Architecture Decision Records provide a paper trail for governance decisions. The toolkit includes:
- `scripts/adr_new.sh` — creates a new ADR from the template with auto-incrementing four-digit numbering. Usage: `make adr-new TITLE='Adopt Event-Driven Architecture'`
- `docs/adr/ADR_TEMPLATE.md` — a structured template with sections for Context, Decision, Alternatives Considered, Consequences, Implementation Plan, and Metrics/Validation
- `docs/adr/0000-record-architecture-decisions.md` — the meta-ADR documenting the decision to use ADRs, serving as both the index and the workflow reference

ADR scaffolding is included alongside the analysis toolkit because governance decisions should be recorded in the same context where problems are detected. When drift analysis identifies a boundary violation, the ADR scaffolding provides an immediate place to document the response: was the boundary intentionally changed, or was this accidental coupling that needs to be reversed?
```
reverse-engine-recursive-run/
├── Makefile                  # Pipeline orchestration
├── Dockerfile.analysis       # Reproducible environment (Ubuntu 22.04)
├── README.md                 # This document
├── QUICKSTART.md             # Setup and first-run guide
├── config/
│   ├── risk_weights.yaml     # Tunable risk signal weights and thresholds
│   └── service_paths.yaml    # Declared service boundaries for drift detection
├── scripts/
│   ├── README.md             # Per-script documentation and CI pipeline diagram
│   ├── hotspot_merge.py      # Risk scoring — 4-signal weighted linear model
│   ├── scan_drift.py         # Architecture drift — graph diff with boundary flags
│   ├── ownership_diff.py     # Knowledge concentration — git authorship analysis
│   ├── risk_update.py        # Aggregation — consolidates all upstream reports
│   ├── parse_trivy.py        # Security normalization — Trivy output
│   ├── parse_semgrep.py      # Security normalization — Semgrep output
│   ├── gen_sbom.sh           # SBOM generation — 5-ecosystem auto-detection
│   └── adr_new.sh            # ADR scaffolding — numbered record creation
├── templates/
│   ├── executive_summary_template.md   # 13-section report template for leadership
│   └── remediation_backlog.yaml        # Prioritized backlog with scoring formula
├── docs/
│   ├── summary_compiled.md   # 3,000+ word methodology reference
│   └── adr/
│       ├── 0000-record-architecture-decisions.md   # ADR index and workflow
│       └── ADR_TEMPLATE.md   # Template for new ADRs
└── .gitignore
```
22 files, ~158KB total. Pure Python + shell. No external Python dependencies beyond the standard library.
Each script is a standalone CLI tool with no cross-imports. This means you can copy any single script into another project and use it immediately. The cost is some duplication (each script has its own argparse setup and JSON loading) and no shared utilities. The benefit is zero coupling between tools — you can adopt hotspot analysis without drift detection, or vice versa.
JSON is universally readable, trivially debuggable (cat artifacts/hotspots.json | python3 -m json.tool), and requires no additional libraries. Every intermediate artifact is human-inspectable. The alternative — an in-process pipeline with shared Python objects — would be faster but opaque. For a governance toolkit, transparency matters more than performance.
The weighted linear model is explainable. When you present a risk score to a team lead, they can see exactly which signals drove it and by how much. A neural network or random forest might produce more accurate rankings, but it cannot answer “why is this file risky?” in a way that leads to actionable conversation. Governance tools must be interpretable.
No pip install step means no dependency conflicts, no virtual environment setup, and no supply chain risk from third-party packages. The only optional dependency is pyyaml for YAML config loading, and even that falls back gracefully (the scripts accept JSON configuration as well). This is a governance tool — it should not itself be a source of dependency risk.
Exit code 1 conventionally means “general error.” Exit code 2 means “drift threshold exceeded” — a structured, expected condition that CI/CD can act on. This follows the convention of tools like grep (exit 1 = no match, not an error) and allows pipelines to distinguish between “the tool crashed” and “the tool ran successfully and found a problem.”
The toolkit is designed for extension through new scripts that follow the same conventions:
- Add a new risk signal as an extra input to `hotspot_merge.py` or as a new input to `risk_update.py`.
- Add a new scanner parser following the pattern of `parse_trivy.py` — read the scanner's native JSON, output a list of `{ "id", "severity", "component", "desc", "recommendation" }` objects.
- Add a new reporter that reads `consolidated_risk.json` and produces your desired output (Jira tickets, Slack messages, Backstage annotations, etc.).
- Supply coverage data as a `coverage.json` file with `{ "files": { "path": fraction } }` and pass it to `hotspot_merge.py --coverage`.
- Supply business criticality via `hotspot_merge.py --criticality` or `ownership_diff.py --criticality`.

The following capabilities are planned but do not exist in the current codebase:
- Package distribution: a `pyproject.toml` with proper dependency declaration, versioned releases, and `pip install` support

Contributions toward any of these are welcome. The current scripts are deliberately simple (single-file, standard library only) to make extension straightforward.
| Resource | Organ | Relationship |
|---|---|---|
| recursive-engine | I — Theoria | Flagship ORGAN-I repository exploring recursive system theory and self-referential computation. This toolkit applies that recursive principle to software governance: the system examines itself. |
| ORGAN-I: Theoria | I — Theoria | Parent organization. Theory, epistemology, recursion, ontology. |
| ORGAN-IV: Taxis | IV — Taxis | Orchestration organ. This toolkit’s risk primitives feed into ORGAN-IV’s cross-organ governance routing. |
| agentic-titan | IV — Taxis | Agentic orchestration system. Could consume this toolkit’s outputs as governance signals for automated decision-making. |
| meta-organvm | VIII — Meta | Umbrella governance across all 8 organs. This toolkit exemplifies the meta-governance principle at the individual-repo scale. |
This repository belongs to ORGAN-I because architecture governance is fundamentally a theoretical concern: it asks “what should the system look like?” and measures deviation from that ideal. The toolkit operationalizes architectural intent — making it a bridge between theory (ORGAN-I) and orchestration (ORGAN-IV). It does not orchestrate; it observes, measures, and reports. The orchestration — deciding what to do about the findings — is ORGAN-IV’s domain.
The toolkit builds on several external tools:

- radon — used alongside `hotspot_merge.py` to generate complexity metrics.
- Trivy — `parse_trivy.py` normalizes its output.
- Semgrep — `parse_semgrep.py` normalizes its output.
- Syft / CycloneDX — used by `gen_sbom.sh`.

Contributions are welcome, particularly toward the Roadmap items. The toolkit is designed to be extended by adding new scripts that follow the existing conventions:
- Use `argparse` with `--out` for the output path

Please open an issue before starting work on a major feature to discuss approach.
MIT License. See LICENSE for full text.
Author: @4444J99 / Part of ORGAN-I: Theoria