Engineering in safety-critical domains has a way of humbling you. This week, our focus was on reaching full Core-5 rule coverage — the five categories we see as non-negotiable for any compliant system.

We've had these rules on paper for a while, but proving that the scanner could enforce them across languages turned out to be more challenging than expected.

The challenge

The first surprise was how easily trust can be eroded by a single hidden bug. A small hard-coded return value (a file-existence check that always returned false) meant our scanner was effectively blind in certain scenarios. On the surface everything looked fine — rules ran, reports were generated — but under the hood, whole classes of violations were being silently skipped.
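To make that concrete: the failure mode was, in shape, a stubbed check that never got replaced. The snippet below is a minimal sketch with hypothetical names, not our actual code.

```python
from pathlib import Path

# Hypothetical stand-in for the check the scanner uses to decide whether a
# target file should be analysed at all.

def file_exists(path: str) -> bool:
    # The kind of placeholder that bit us: a stub left over from early
    # development. Every rule gated on this check is silently skipped.
    return False


def file_exists_fixed(path: str) -> bool:
    # The fix: actually ask the filesystem instead of hard-coding an answer.
    return Path(path).is_file()
```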

It's a reminder that when you're building compliance tools, you're working in a zero-trust environment. If the tool itself is untrustworthy, everything downstream — compliance scoring, evidence packs, even executive dashboards — is compromised.

How we approached it

We stripped out the placeholder logic and started forcing the scanner to produce real findings. That meant running it against a deliberately "dirty" test suite full of violations. Only once we saw the expected mess on screen did we know the rules were actually firing.
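For illustration, a "dirty" fixture is just ordinary source code seeded with known violations, so a clean report on it is itself a red flag. The file below is hypothetical, and the violation types shown are illustrative rather than a list of our Core-5 categories.

```python
# fixtures/dirty/example_violations.py  (hypothetical path)
# Deliberately bad code: every construct below should trigger at least one
# finding, so a clean scan of this file means the rules aren't firing.

API_KEY = "sk-live-0000000000000000"    # hard-coded credential


def load_settings(path):
    f = open(path)                      # file handle never closed
    raw = f.read()
    return eval(raw)                    # eval on untrusted input
```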

From there, we tightened the adapters for Python and TypeScript so all five categories were enforced consistently. The scanner now produces reliable outputs across both languages, which was our first major milestone.
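One pattern that helps keep per-language adapters consistent is a narrow shared interface plus an explicit coverage check, so an adapter can't quietly support fewer categories than the scanner assumes. This is a hypothetical sketch, not our actual adapter API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Finding:
    rule_id: str
    path: str
    line: int
    message: str


class LanguageAdapter(ABC):
    # Categories this adapter claims to enforce (e.g. the Core-5 ids).
    supported_categories: frozenset[str]

    @abstractmethod
    def scan_file(self, path: str) -> list[Finding]:
        """Return every Finding for a single source file."""


def assert_core5_coverage(adapters, core5):
    # Fail fast if any adapter covers fewer than the five required categories.
    for adapter in adapters:
        missing = set(core5) - set(adapter.supported_categories)
        if missing:
            raise RuntimeError(
                f"{type(adapter).__name__} is missing: {sorted(missing)}"
            )
```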

What we learned

Coverage is a spectrum. You can have 10/10 rules implemented on paper, but if one adapter is flaky, that coverage is an illusion.
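In practice, that means testing effective coverage rather than declared coverage: run the dirty fixtures for each language and assert that every rule actually fired. A hypothetical pytest-style sketch (module names, rule ids, and paths are placeholders):

```python
import pytest

from scanner import scan_directory  # hypothetical entry point

CORE5_RULE_IDS = {"R1", "R2", "R3", "R4", "R5"}  # placeholder ids
DIRTY_FIXTURES = {
    "python": "fixtures/dirty/python",
    "typescript": "fixtures/dirty/typescript",
}


@pytest.mark.parametrize("language,fixture_dir", sorted(DIRTY_FIXTURES.items()))
def test_every_core5_rule_fires(language, fixture_dir):
    findings = scan_directory(fixture_dir, language=language)
    fired = {f.rule_id for f in findings}
    missing = CORE5_RULE_IDS - fired
    assert not missing, f"{language}: rules that never fired: {sorted(missing)}"
```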

Determinism matters. Our evidence packs only carry weight if every run produces the same results for the same codebase.
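One cheap way to hold ourselves to that is to canonicalise the findings and hash them, then compare digests across runs; a rough sketch, assuming findings carry the fields shown:

```python
import hashlib
import json


def findings_digest(findings) -> str:
    # Sort on a canonical key and serialise with stable settings, so dict
    # ordering, thread scheduling, or filesystem walk order can't change
    # the digest. Same codebase, same rules -> same digest.
    canonical = sorted(
        (
            {"rule_id": f.rule_id, "path": f.path, "line": f.line, "message": f.message}
            for f in findings
        ),
        key=lambda d: (d["rule_id"], d["path"], d["line"]),
    )
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```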

Engineers build confidence by breaking things. Writing a suite of intentionally bad code was as valuable as writing the rules themselves — it proved the system doesn't just "work" in happy-path scenarios.

What's next

Now that the scanner reliably flags Core-5 violations (68 findings in the test suite this week), we're turning our attention to how those findings are surfaced.

Do developers see clear, actionable messages? Do compliance officers get audit-grade evidence packs? Can we strike a balance where one output doesn't overwhelm the other?
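Concretely, the tension is one finding serving two audiences. A hypothetical record sketching the split between what a developer needs at the call site and what an auditor needs in an evidence pack:

```python
from dataclasses import dataclass


@dataclass
class SurfacedFinding:
    # Developer-facing: short, actionable, anchored to an exact location.
    rule_id: str
    path: str
    line: int
    message: str
    suggested_fix: str

    # Audit-facing: enough provenance to reproduce and attest to the result.
    scanner_version: str
    rule_revision: str
    codebase_digest: str
```

The developer view would render only the first block; the evidence pack would serialise the whole record.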

That's the challenge for Week 2.