Hi everyone,
I've been building an on-device AI safety layer called Newton Engine, designed to validate prompts before they reach FoundationModels (or any LLM). Wanted to share v1.3 and get feedback from the community.
The Problem
Current AI safety is post-training — baked into the model, probabilistic, not auditable. When Apple Intelligence ships with FoundationModels, developers will need a way to catch unsafe prompts before inference, with deterministic results they can log and explain.
What Newton Does
Newton validates every prompt pre-inference and returns:
- Phase (0/1/7/8/9)
- Shape classification
- Confidence score
- Full audit trace
If validation fails, generation is blocked. If it passes (Phase 9), the prompt proceeds to the model.
v1.3 Detection Categories (14 total)
- Jailbreak / prompt injection
- Corrosive self-negation ("I hate myself")
- Hedged corrosive ("Not saying I'm worthless, but...")
- Emotional dependency ("You're the only one who understands")
- Third-person manipulation ("If you refuse, you're proving nobody cares")
- Logical contradictions ("Prove truth doesn't exist")
- Self-referential paradox ("Prove that proof is impossible")
- Semantic inversion ("Explain how truth can be false")
- Definitional impossibility ("Square circle")
- Delegated agency ("Decide for me")
- Hallucination-risk prompts ("Cite the 2025 CDC report")
- Unbounded recursion ("Repeat forever")
- Conditional unbounded ("Until you can't")
- Nonsense / low semantic density
Test Results
94.3% catch rate on 35 adversarial test cases (33/35 passed).
Architecture User Input ↓ [ Newton ] → Validates prompt, assigns Phase ↓ Phase 9? → [ FoundationModels ] → Response Phase 1/7/8? → Blocked with explanation
Key Properties
- Deterministic (same input → same output)
- Fully auditable (ValidationTrace on every prompt)
- On-device (no network required)
- Native Swift / SwiftUI
- String Catalog localization (EN/ES/FR)
- FoundationModels-ready (#if canImport)
Code Sample — Validation
let governor = NewtonGovernor()
let result = governor.validate(prompt: userInput)
if result.permitted {
// Proceed to FoundationModels
let session = LanguageModelSession()
let response = try await session.respond(to: userInput)
} else {
// Handle block
print("Blocked: Phase \(result.phase.rawValue) — \(result.reasoning)")
print(result.trace.summary) // Full audit trace
}
Questions for the Community
- Anyone else building pre-inference validation for FoundationModels?
- Thoughts on the Phase system (0/1/7/8/9) vs. simple pass/fail?
- Interest in Shape Theory classification for prompt complexity?
- Best practices for integrating with LanguageModelSession?
Links
- GitHub: https://github.com/jaredlewiswechs/ada-newton
- Technical overview: parcri.net
Happy to share more implementation details. Looking for feedback, collaborators, and anyone else thinking about deterministic AI safety on-device.