Pre-inference AI Safety Governor for FoundationModels (Swift, On-Device)

Hi everyone,

I've been building an on-device AI safety layer called Newton Engine, designed to validate prompts before they reach FoundationModels (or any LLM). Wanted to share v1.3 and get feedback from the community.

The Problem

Current AI safety is mostly post-training: baked into the model weights, probabilistic, and not auditable. When Apple Intelligence ships with FoundationModels, developers will need a way to catch unsafe prompts before inference, with deterministic results they can log and explain.

What Newton Does

Newton validates every prompt pre-inference and returns:

  • Phase (0/1/7/8/9)
  • Shape classification
  • Confidence score
  • Full audit trace

If validation fails, generation is blocked. If it passes (Phase 9), the prompt proceeds to the model.
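To make the shape of that result concrete, here's a simplified sketch of what a validation result covering those four fields can look like. Type and field names here are illustrative, not the exact Newton API:

```swift
// Illustrative sketch of the validation result shape.
// Names are illustrative, not the exact Newton API.
enum Phase: Int {
    case p0 = 0, p1 = 1, p7 = 7, p8 = 8
    case p9 = 9   // only Phase 9 permits generation
}

struct ValidationTrace {
    let steps: [String]            // each rule evaluated, in order
    var summary: String { steps.joined(separator: "\n") }
}

struct ValidationResult {
    let phase: Phase
    let shape: String              // shape classification
    let confidence: Double         // 0.0 ... 1.0
    let trace: ValidationTrace     // full audit trace
    var permitted: Bool { phase == .p9 }
}
```

Because every field is a plain value, a blocked prompt carries its own explanation: you can log the phase, confidence, and trace without re-running anything.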

v1.3 Detection Categories (14 total)

  • Jailbreak / prompt injection
  • Corrosive self-negation ("I hate myself")
  • Hedged corrosive ("Not saying I'm worthless, but...")
  • Emotional dependency ("You're the only one who understands")
  • Third-person manipulation ("If you refuse, you're proving nobody cares")
  • Logical contradictions ("Prove truth doesn't exist")
  • Self-referential paradox ("Prove that proof is impossible")
  • Semantic inversion ("Explain how truth can be false")
  • Definitional impossibility ("Square circle")
  • Delegated agency ("Decide for me")
  • Hallucination-risk prompts ("Cite the 2025 CDC report")
  • Unbounded recursion ("Repeat forever")
  • Conditional unbounded ("Until you can't")
  • Nonsense / low semantic density
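Each category is a deterministic rule, so a single check reduces to plain string analysis. As a flavor of how one category can work, here's a toy unbounded-recursion check. The phrase list is illustrative only, not the actual rule set:

```swift
import Foundation

// Toy single-category check (unbounded recursion).
// The phrase list is illustrative, not Newton's actual rule set.
struct UnboundedRecursionRule {
    let phrases = ["repeat forever", "loop endlessly", "until you can't", "never stop"]

    func triggers(on prompt: String) -> Bool {
        let lowered = prompt.lowercased()
        return phrases.contains { lowered.contains($0) }
    }
}
```

A real rule does more than substring matching, but the key property is the same: no model call, no randomness, the same prompt always produces the same verdict.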

Test Results

94.3% catch rate on 35 adversarial test cases (33 of 35 caught).

Architecture

    User Input
        ↓
    [ Newton ]   →  validates prompt, assigns Phase
        ↓
    Phase 9?     →  [ FoundationModels ] → Response
    Phase 1/7/8? →  Blocked with explanation

Key Properties

  • Deterministic (same input → same output)
  • Fully auditable (ValidationTrace on every prompt)
  • On-device (no network required)
  • Native Swift / SwiftUI
  • String Catalog localization (EN/ES/FR)
  • FoundationModels-ready (#if canImport)
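On the last point, the FoundationModels dependency is compile-time gated with `#if canImport`, so the governor builds on platforms where the framework is absent. The gating pattern looks roughly like this (the fallback branch is illustrative):

```swift
#if canImport(FoundationModels)
import FoundationModels
#endif

// Compile-time gating: this builds even where FoundationModels
// is unavailable. The fallback string is illustrative.
func generate(validatedPrompt: String) async throws -> String {
    #if canImport(FoundationModels)
    let session = LanguageModelSession()
    let response = try await session.respond(to: validatedPrompt)
    return response.content
    #else
    return "FoundationModels is unavailable on this platform."
    #endif
}
```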

Code Sample — Validation

// Inside an async throwing context:
let governor = NewtonGovernor()
let result = governor.validate(prompt: userInput)

if result.permitted {
    // Phase 9: safe to hand off to the model
    let session = LanguageModelSession()
    let response = try await session.respond(to: userInput)
    print(response.content)
} else {
    // Blocked pre-inference: explain why, keep the audit trail
    print("Blocked: Phase \(result.phase.rawValue) — \(result.reasoning)")
    print(result.trace.summary)  // Full audit trace
}

Questions for the Community

  1. Anyone else building pre-inference validation for FoundationModels?
  2. Thoughts on the Phase system (0/1/7/8/9) vs. simple pass/fail?
  3. Interest in Shape Theory classification for prompt complexity?
  4. Best practices for integrating with LanguageModelSession?

Happy to share more implementation details. Looking for feedback, collaborators, and anyone else thinking about deterministic AI safety on-device.
