Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

FoundationModels guardrailViolation on Beta 3
Hello everybody! I’m encountering an unexpected guardrailViolation error when using Foundation Models on macOS Beta 3 (Tahoe) with an Apple M2 Pro chip. This issue didn’t occur on Beta 1 or Beta 2 using the same codebase. Reproduction Context I’m developing an app that leverages Foundation Models for structured generation, paired with a local database tool. After upgrading to macOS Beta 3, I started receiving this error consistently, despite no changes in the generation logic. To isolate the issue, I opened the official WWDC sample project from the Adding intelligent app features with generative models and the same guardrailViolation error appeared without any modifications. Simplified Working Example I attempted to narrow down the issue by starting with a minimal prompt structure. This basic case works fine: import Foundation import Playgrounds import FoundationModels @Generable struct GeneableLandmark { @Guide(description: "Name of the landmark to visit") var name: String } final class LandmarkSuggestionGenerator { var landmarkSuggestion: GeneableLandmark.PartiallyGenerated? private var session: LanguageModelSession init(){ self.session = LanguageModelSession( instructions: Instructions { """ generate a list of landmarks to visit """ } ) } func createLandmarkSuggestion(location: String) async throws { let stream = session.streamResponse( generating: GeneableLandmark.self, options: GenerationOptions(sampling: .greedy), includeSchemaInPrompt: false ) { """ Generate a list of landmarks to viist in \(location) """ } for try await partialResponse in stream { landmarkSuggestion = partialResponse } } } #Playground { let generator = LandmarkSuggestionGenerator() Task { do { try await generator.createLandmarkSuggestion(location: "New york") if let suggestion = generator.landmarkSuggestion { print("Suggested landmark: \(suggestion)") } else { print("No suggestion generated.") } } catch { print("Error generating landmark suggestion: \(error)") } } } But as soon as I use the Sample ItineraryPlanner: #Playground { // Example landmark for demonstration let exampleLandmark = Landmark( id: 1, name: "San Francisco", continent: "North America", description: "A vibrant city by the bay known for the Golden Gate Bridge.", shortDescription: "Iconic Californian city.", latitude: 37.7749, longitude: -122.4194, span: 0.2, placeID: nil ) let planner = ItineraryPlanner(landmark: exampleLandmark) Task { do { try await planner.suggestItinerary(dayCount: 3) if let itinerary = planner.itinerary { print("Suggested itinerary: \(itinerary)") } else { print("No itinerary generated.") } } catch { print("Error generating itinerary: \(error)") } } } The error pops up: Multiline Error generating itinerary: guardrailViolation(FoundationModels.LanguageModelSession. >GenerationError.Context(debug Description: "May contain sensitive or unsafe content", >underlyingErrors: [FoundationModels. LanguageModelSession. Gene >rationError.guardrailViolation(FoundationMo dels. >LanguageModelSession.GenerationError.C ontext (debugDescription: >"May contain unsafe content", underlyingErrors: []))])) Based on my tests: The error may not be tied to structure complexity (since more nested structures work) The issue may stem from the tools or prompt content used inside the ItineraryPlanner The guardrail sensitivity may have increased or changed in Beta 3, affecting models that worked in earlier betas Thank you in advance for your help. Let me know if more details or reproducible code samples are needed - I’m happy to provide them. Best, Sasha Morozov
2
1
418
Jul ’25
WWDC25 combining metal and ML
WWDC25: Combine Metal 4 machine learning and graphics Demonstrated a way to combine neural network in the graphics pipeline directly through the shaders, using an example of Texture Compression. However there is no mention of using which ML technique texture is compressed. Can anyone point me to some well known model/s for this particular use case shown in WWDC25.
2
0
487
Jul ’25
ModelManager received unentitled request. Expected entitlement com.apple.modelmanager.inference
Just tried to write a very simple test of using foundation models, but it gave me the error like this "ModelManager received unentitled request. Expected entitlement com.apple.modelmanager.inference establishment of session failed with Missing entitlement: com.apple.modelmanager.inference" The simple code is listed below: let session: LanguageModelSession = LanguageModelSession() let response = try? await session.respond(to: "What is the capital of France?") print("Response: (response)") So what's the problem of this one?
2
0
265
Jul ’25
Foundation Model - Change LLM
Almost everywhere else you see Apple Intelligence, you get to select whether it's on device, private cloud compute, or ChatGPT. Is there a way to do that via code in the Foundation Model? I searched through the docs and couldn't find anything, but maybe I missed it.
2
1
167
Jul ’25
Problem running NLContextualEmbeddingModel in simulator
Environment MacOC 26 Xcode Version 26.0 beta 7 (17A5305k) simulator: iPhone 16 pro iOS: iOS 26 Problem NLContextualEmbedding.load() fails with the following error In simulator Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"] filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"] Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289' assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation})) in #Playground I'm new to this embedding model. Not sure if it's caused by my code or environment. Code snippet import Foundation import NaturalLanguage import Playgrounds #Playground { // Prefer initializing by script for broader coverage; returns NLContextualEmbedding? guard let embeddingModel = NLContextualEmbedding(script: .latin) else { print("Failed to create NLContextualEmbedding") return } print(embeddingModel.hasAvailableAssets) do { try embeddingModel.load() print("Model loaded") } catch { print("Failed to load model: \(error)") } }
2
2
1.3k
Jan ’26
Setting Required Capabilities for Foundation Models
Is there any way to ensure iOS apps we develop using Foundation Models can only be purchasable/downloadable on App Store by folks with capable devices? I would've thought there would be a Required Capabilities that App Store would hook into, but I don't seem to see it in the documentation here: https://developer.apple.com/documentation/bundleresources/information-property-list/uirequireddevicecapabilities The closest seems to be iphone-performance-gaming-tier as that seems to target all M1 and above chips on iPhone & iPad. There is an ipad-minimum-performance-m1 that would more reasonably seem to ensure Foundation Models is likely available, but that doesn't help with iPhone. So far, it seems the only path would be to set Minimum Deployment to iOS 26 and add iphone-performance-gaming-tier as a required capability, but I'm a bit worried that capability might diverge in the future from what's Foundation Model / Apple Intelligence capable. While I understand for the majority of apps they'll want to just selectively add in Apple Intelligence features and so can be usable by folks whose devices don't support it, the app experience I'm building doesn't make sense without the Foundation Models being available and I'd rather not have a large number of users downloading the app to be told "Sorry, you're not Apple Intelligence capable"
2
2
262
Aug ’25
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
2
0
440
1w
A specific mlmodelc model runs on iPhone 15, but not on iPhone 16
As we described on the title, the model that I have built completely works on iPhone 15 / A16 Bionic, on the other hand it does not run on iPhone 16 / A18 chip with the following error message. E5RT encountered an STL exception. msg = MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED. E5RT: MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED (11) It consumes 1.5 ~ 1.6 GB RAM on the loading the model, then the consumption is decreased to less than 100MB on the both of iPhone 15 and 16. After that, only on iPhone 16, the above error is shown on the Xcode log, the memory consumption is surged to 5 to 6GB, and the system kills the app. It works well only on iPhone 15. This model is built with the Core ML tools. Until now, I have tried the target iOS 16 to 18 and the compute units of CPU_AND_NE and ALL. But any ways have not solved this issue. Eventually, what kindof fix should I do? minimum_deployment_target = ct.target.iOS18 compute_units = ct.ComputeUnit.ALL compute_precision = ct.precision.FLOAT16
2
0
224
May ’25
How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
2
0
217
3w
Downloading my fine tuned model from huggingface
I have used mlx_lm.lora to fine tune a mistral-7b-v0.3-4bit model with my data. I fused the mistral model with my adapters and upload the fused model to my directory on huggingface. I was able to use mlx_lm.generate to use the fused model in Terminal. However, I don't know how to load the model in Swift. I've used Imports import SwiftUI import MLX import MLXLMCommon import MLXLLM let modelFactory = LLMModelFactory.shared let configuration = ModelConfiguration( id: "pharmpk/pk-mistral-7b-v0.3-4bit" ) // Load the model off the main actor, then assign on the main actor let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in print("Downloading progress: \(progress.fractionCompleted * 100)%") } await MainActor.run { self.model = loaded } I'm getting an error runModel error: downloadError("A server with the specified hostname could not be found.") Any suggestions? Thanks, David PS, I can load the model from the app bundle // directory: Bundle.main.resourceURL! but it's too big to upload for Testflight
1
0
557
Oct ’25
no tensorflow-metal past tf 2.18?
Hi We're on tensorflow 2.20 that has support now for python 3.13 (finally!). tensorflow-metal is still only supporting 2.18 which is over a year old. When can we expect to see support in tensorflow-metal for tf 2.20 (or later!) ? I bought a mac thinking I would be able to get great performance from the M processors but here I am using my CPU for my ML projects. If it's taking so long to release it, why not open source it so the community can keep it more up to date? cheers Matt
1
1
434
Nov ’25
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
1
0
54
7h
Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM — likely using QEMU backed by Hypervisor.framework. The guest would see this device as something like /dev/npu0, and it would use a kernel driver + userspace library to submit inference requests. On the macOS host, these requests would be executed using CoreML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does the macOS allow this kind of "fake" NPU / GPU
1
0
441
Aug ’25
What's the best way to load adapters to try?
I'm new to Swift and was hoping the Playground would support loading adaptors. When I tried, I got a permissions error - thinking it's because it's not in the project and Playgrounds don't like going outside the project? A tutorial and some sample code would be helpful. Also some benchmarks on how long it's expected to take. Selfishly I'm on an M2 Mac Mini.
1
0
304
Jul ’25
KV-Cache MLState Not Updating During Prefill Stage in Core ML LLM Inference
Hello, I'm running a large language model (LLM) in Core ML that uses a key-value cache (KV-cache) to store past attention states. The model was converted from PyTorch using coremltools and deployed on-device with Swift. The KV-cache is exposed via MLState and is used across inference steps for efficient autoregressive generation. During the prefill stage — where a prompt of multiple tokens is passed to the model in a single batch to initialize the KV-cache — I’ve noticed that some entries in the KV-cache are not updated after the inference. Specifically: Here are a few details about the setup: The MLState returned by the model is identical to the input state (often empty or zero-initialized) for some tokens in the batch. The issue only happens during the prefill stage (i.e., first call over multiple tokens). During decoding (single-token generation), the KV-cache updates normally. The model is invoked using MLModel.prediction(from:using:options:) for each batch. I’ve confirmed: The prompt tokens are non-repetitive and not masked. The model spec has MLState inputs/outputs correctly configured for KV-cache tensors. Each token is processed in a loop with the correct positional encodings. Questions: Is there any known behavior in Core ML that could prevent MLState from updating during batched or prefill inference? Could this be caused by internal optimizations such as lazy execution, static masking, or zero-value short-circuiting? How can I confirm that each token in the batch is contributing to the KV-cache during prefill? Any insights from the Core ML or LLM deployment community would be much appreciated.
1
0
269
May ’25
jax-metal failing due to incompatibility with jax 0.5.1 or later.
Hello, I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental. After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1 My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1., jax 0.5.0 and jaxlib 0.5.0. Thank you!
1
0
870
Feb ’26
Is MCP (Model Context Protocol) supported on iOS/macOS?
Hi team, I’m exploring the Model Context Protocol (MCP), which is used to connect LLMs/AI agents to external tools in a structured way. It's becoming a common standard for automation and agent workflows. Before I go deeper, I want to confirm: Does Apple currently provide any official MCP server, API surface, or SDK on iOS/macOS? From what I see, only third-party MCP servers exist for iOS simulators/devices, and Apple’s own frameworks (Foundation Models, Apple Intelligence) don’t expose MCP endpoints. Is there any chance Apple might introduce MCP support—or publish recommended patterns for safely integrating MCP inside apps or developer tools? I would like to see if I can share my app's data to the MCP server to enable other third-party apps/services to integrate easily
1
1
499
Dec ’25
FoundationModels guardrailViolation on Beta 3
Hello everybody! I’m encountering an unexpected guardrailViolation error when using Foundation Models on macOS Beta 3 (Tahoe) with an Apple M2 Pro chip. This issue didn’t occur on Beta 1 or Beta 2 using the same codebase. Reproduction Context I’m developing an app that leverages Foundation Models for structured generation, paired with a local database tool. After upgrading to macOS Beta 3, I started receiving this error consistently, despite no changes in the generation logic. To isolate the issue, I opened the official WWDC sample project from the Adding intelligent app features with generative models and the same guardrailViolation error appeared without any modifications. Simplified Working Example I attempted to narrow down the issue by starting with a minimal prompt structure. This basic case works fine: import Foundation import Playgrounds import FoundationModels @Generable struct GeneableLandmark { @Guide(description: "Name of the landmark to visit") var name: String } final class LandmarkSuggestionGenerator { var landmarkSuggestion: GeneableLandmark.PartiallyGenerated? private var session: LanguageModelSession init(){ self.session = LanguageModelSession( instructions: Instructions { """ generate a list of landmarks to visit """ } ) } func createLandmarkSuggestion(location: String) async throws { let stream = session.streamResponse( generating: GeneableLandmark.self, options: GenerationOptions(sampling: .greedy), includeSchemaInPrompt: false ) { """ Generate a list of landmarks to viist in \(location) """ } for try await partialResponse in stream { landmarkSuggestion = partialResponse } } } #Playground { let generator = LandmarkSuggestionGenerator() Task { do { try await generator.createLandmarkSuggestion(location: "New york") if let suggestion = generator.landmarkSuggestion { print("Suggested landmark: \(suggestion)") } else { print("No suggestion generated.") } } catch { print("Error generating landmark suggestion: \(error)") } } } But as soon as I use the Sample ItineraryPlanner: #Playground { // Example landmark for demonstration let exampleLandmark = Landmark( id: 1, name: "San Francisco", continent: "North America", description: "A vibrant city by the bay known for the Golden Gate Bridge.", shortDescription: "Iconic Californian city.", latitude: 37.7749, longitude: -122.4194, span: 0.2, placeID: nil ) let planner = ItineraryPlanner(landmark: exampleLandmark) Task { do { try await planner.suggestItinerary(dayCount: 3) if let itinerary = planner.itinerary { print("Suggested itinerary: \(itinerary)") } else { print("No itinerary generated.") } } catch { print("Error generating itinerary: \(error)") } } } The error pops up: Multiline Error generating itinerary: guardrailViolation(FoundationModels.LanguageModelSession. >GenerationError.Context(debug Description: "May contain sensitive or unsafe content", >underlyingErrors: [FoundationModels. LanguageModelSession. Gene >rationError.guardrailViolation(FoundationMo dels. >LanguageModelSession.GenerationError.C ontext (debugDescription: >"May contain unsafe content", underlyingErrors: []))])) Based on my tests: The error may not be tied to structure complexity (since more nested structures work) The issue may stem from the tools or prompt content used inside the ItineraryPlanner The guardrail sensitivity may have increased or changed in Beta 3, affecting models that worked in earlier betas Thank you in advance for your help. Let me know if more details or reproducible code samples are needed - I’m happy to provide them. Best, Sasha Morozov
Replies
2
Boosts
1
Views
418
Activity
Jul ’25
WWDC25 combining metal and ML
WWDC25: Combine Metal 4 machine learning and graphics Demonstrated a way to combine neural network in the graphics pipeline directly through the shaders, using an example of Texture Compression. However there is no mention of using which ML technique texture is compressed. Can anyone point me to some well known model/s for this particular use case shown in WWDC25.
Replies
2
Boosts
0
Views
487
Activity
Jul ’25
ModelManager received unentitled request. Expected entitlement com.apple.modelmanager.inference
Just tried to write a very simple test of using foundation models, but it gave me the error like this "ModelManager received unentitled request. Expected entitlement com.apple.modelmanager.inference establishment of session failed with Missing entitlement: com.apple.modelmanager.inference" The simple code is listed below: let session: LanguageModelSession = LanguageModelSession() let response = try? await session.respond(to: "What is the capital of France?") print("Response: (response)") So what's the problem of this one?
Replies
2
Boosts
0
Views
265
Activity
Jul ’25
Album segmentation model
I have a question. In China, long pressing a picture in the album can segment the target. Is this model a local model? Is there any information? Can developers use it?
Replies
2
Boosts
0
Views
311
Activity
Jul ’25
Foundation Model - Change LLM
Almost everywhere else you see Apple Intelligence, you get to select whether it's on device, private cloud compute, or ChatGPT. Is there a way to do that via code in the Foundation Model? I searched through the docs and couldn't find anything, but maybe I missed it.
Replies
2
Boosts
1
Views
167
Activity
Jul ’25
Problem running NLContextualEmbeddingModel in simulator
Environment MacOC 26 Xcode Version 26.0 beta 7 (17A5305k) simulator: iPhone 16 pro iOS: iOS 26 Problem NLContextualEmbedding.load() fails with the following error In simulator Failed to load embedding from MIL representation: filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"] filesystem error: in create_directories: Permission denied ["/var/db/com.apple.naturallanguaged/com.apple.e5rt.e5bundlecache"] Failed to load embedding model 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289' assetRequestFailed(Optional(Error Domain=NLNaturalLanguageErrorDomain Code=7 "Embedding model requires compilation" UserInfo={NSLocalizedDescription=Embedding model requires compilation})) in #Playground I'm new to this embedding model. Not sure if it's caused by my code or environment. Code snippet import Foundation import NaturalLanguage import Playgrounds #Playground { // Prefer initializing by script for broader coverage; returns NLContextualEmbedding? guard let embeddingModel = NLContextualEmbedding(script: .latin) else { print("Failed to create NLContextualEmbedding") return } print(embeddingModel.hasAvailableAssets) do { try embeddingModel.load() print("Model loaded") } catch { print("Failed to load model: \(error)") } }
Replies
2
Boosts
2
Views
1.3k
Activity
Jan ’26
Setting Required Capabilities for Foundation Models
Is there any way to ensure iOS apps we develop using Foundation Models can only be purchasable/downloadable on App Store by folks with capable devices? I would've thought there would be a Required Capabilities that App Store would hook into, but I don't seem to see it in the documentation here: https://developer.apple.com/documentation/bundleresources/information-property-list/uirequireddevicecapabilities The closest seems to be iphone-performance-gaming-tier as that seems to target all M1 and above chips on iPhone & iPad. There is an ipad-minimum-performance-m1 that would more reasonably seem to ensure Foundation Models is likely available, but that doesn't help with iPhone. So far, it seems the only path would be to set Minimum Deployment to iOS 26 and add iphone-performance-gaming-tier as a required capability, but I'm a bit worried that capability might diverge in the future from what's Foundation Model / Apple Intelligence capable. While I understand for the majority of apps they'll want to just selectively add in Apple Intelligence features and so can be usable by folks whose devices don't support it, the app experience I'm building doesn't make sense without the Foundation Models being available and I'd rather not have a large number of users downloading the app to be told "Sorry, you're not Apple Intelligence capable"
Replies
2
Boosts
2
Views
262
Activity
Aug ’25
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
Replies
2
Boosts
0
Views
440
Activity
1w
A specific mlmodelc model runs on iPhone 15, but not on iPhone 16
As we described on the title, the model that I have built completely works on iPhone 15 / A16 Bionic, on the other hand it does not run on iPhone 16 / A18 chip with the following error message. E5RT encountered an STL exception. msg = MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED. E5RT: MILCompilerForANE error: failed to compile ANE model using ANEF. Error=_ANECompiler : ANECCompile() FAILED (11) It consumes 1.5 ~ 1.6 GB RAM on the loading the model, then the consumption is decreased to less than 100MB on the both of iPhone 15 and 16. After that, only on iPhone 16, the above error is shown on the Xcode log, the memory consumption is surged to 5 to 6GB, and the system kills the app. It works well only on iPhone 15. This model is built with the Core ML tools. Until now, I have tried the target iOS 16 to 18 and the compute units of CPU_AND_NE and ALL. But any ways have not solved this issue. Eventually, what kindof fix should I do? minimum_deployment_target = ct.target.iOS18 compute_units = ct.ComputeUnit.ALL compute_precision = ct.precision.FLOAT16
Replies
2
Boosts
0
Views
224
Activity
May ’25
How can I change the output dimensions of a CoreML model in Xcode when the outputs come from a NonMaximumSuppression layer?
After exerting a custom model with nms=True. In Xcode, the outputs show as: confidence: MultiArray (0 × 5) coordinates: MultiArray (0 × 4) I want to set fixed shapes (e.g., 100 × 5, 100 × 4), but Xcode does not allow editing—the shape fields are locked. The model graph shows both outputs come directly from a NonMaximumSuppression layer. Is it possible to set fixed output dimensions for NMS outputs in CoreML?
Replies
2
Boosts
0
Views
217
Activity
3w
Downloading my fine tuned model from huggingface
I have used mlx_lm.lora to fine tune a mistral-7b-v0.3-4bit model with my data. I fused the mistral model with my adapters and upload the fused model to my directory on huggingface. I was able to use mlx_lm.generate to use the fused model in Terminal. However, I don't know how to load the model in Swift. I've used Imports import SwiftUI import MLX import MLXLMCommon import MLXLLM let modelFactory = LLMModelFactory.shared let configuration = ModelConfiguration( id: "pharmpk/pk-mistral-7b-v0.3-4bit" ) // Load the model off the main actor, then assign on the main actor let loaded = try await modelFactory.loadContainer(configuration: configuration) { progress in print("Downloading progress: \(progress.fractionCompleted * 100)%") } await MainActor.run { self.model = loaded } I'm getting an error runModel error: downloadError("A server with the specified hostname could not be found.") Any suggestions? Thanks, David PS, I can load the model from the app bundle // directory: Bundle.main.resourceURL! but it's too big to upload for Testflight
Replies
1
Boosts
0
Views
557
Activity
Oct ’25
no tensorflow-metal past tf 2.18?
Hi We're on tensorflow 2.20 that has support now for python 3.13 (finally!). tensorflow-metal is still only supporting 2.18 which is over a year old. When can we expect to see support in tensorflow-metal for tf 2.20 (or later!) ? I bought a mac thinking I would be able to get great performance from the M processors but here I am using my CPU for my ML projects. If it's taking so long to release it, why not open source it so the community can keep it more up to date? cheers Matt
Replies
1
Boosts
1
Views
434
Activity
Nov ’25
Download the Foundation Models Adaptor Training Toolkit
Download the Foundation Models Adaptor Training Toolkit Hi, after I clicked on the download button, I was redirected to this page https://developer.apple.com and did not download the toolkit.
Replies
1
Boosts
0
Views
475
Activity
Jul ’25
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
Replies
1
Boosts
0
Views
54
Activity
7h
Is it possible to create a virtual NPU device on macOS using Hypervisor.framework + CoreML?
Is it possible to expose a custom VirtIO device to a Linux guest running inside a VM — likely using QEMU backed by Hypervisor.framework. The guest would see this device as something like /dev/npu0, and it would use a kernel driver + userspace library to submit inference requests. On the macOS host, these requests would be executed using CoreML, MPSGraph, or BNNS. The results would be passed back to the guest via IPC. Does the macOS allow this kind of "fake" NPU / GPU
Replies
1
Boosts
0
Views
441
Activity
Aug ’25
What's the best way to load adapters to try?
I'm new to Swift and was hoping the Playground would support loading adaptors. When I tried, I got a permissions error - thinking it's because it's not in the project and Playgrounds don't like going outside the project? A tutorial and some sample code would be helpful. Also some benchmarks on how long it's expected to take. Selfishly I'm on an M2 Mac Mini.
Replies
1
Boosts
0
Views
304
Activity
Jul ’25
KV-Cache MLState Not Updating During Prefill Stage in Core ML LLM Inference
Hello, I'm running a large language model (LLM) in Core ML that uses a key-value cache (KV-cache) to store past attention states. The model was converted from PyTorch using coremltools and deployed on-device with Swift. The KV-cache is exposed via MLState and is used across inference steps for efficient autoregressive generation. During the prefill stage — where a prompt of multiple tokens is passed to the model in a single batch to initialize the KV-cache — I’ve noticed that some entries in the KV-cache are not updated after the inference. Specifically: Here are a few details about the setup: The MLState returned by the model is identical to the input state (often empty or zero-initialized) for some tokens in the batch. The issue only happens during the prefill stage (i.e., first call over multiple tokens). During decoding (single-token generation), the KV-cache updates normally. The model is invoked using MLModel.prediction(from:using:options:) for each batch. I’ve confirmed: The prompt tokens are non-repetitive and not masked. The model spec has MLState inputs/outputs correctly configured for KV-cache tensors. Each token is processed in a loop with the correct positional encodings. Questions: Is there any known behavior in Core ML that could prevent MLState from updating during batched or prefill inference? Could this be caused by internal optimizations such as lazy execution, static masking, or zero-value short-circuiting? How can I confirm that each token in the batch is contributing to the KV-cache during prefill? Any insights from the Core ML or LLM deployment community would be much appreciated.
Replies
1
Boosts
0
Views
269
Activity
May ’25
Create ML training data gives unreadable error message
Can't import data in create ML word tagging project training data is 100% correct I guarantee it: I mean look this one has one entry in it. [ { "tokens": [ "a", "august", "gruters" ], "labels": [ "BUILDER", "BUILDER", "BUILDER" ] } ]
Replies
1
Boosts
0
Views
186
Activity
Mar ’25
jax-metal failing due to incompatibility with jax 0.5.1 or later.
Hello, I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental. After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22 -:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1 My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1., jax 0.5.0 and jaxlib 0.5.0. Thank you!
Replies
1
Boosts
0
Views
870
Activity
Feb ’26
Is MCP (Model Context Protocol) supported on iOS/macOS?
Hi team, I’m exploring the Model Context Protocol (MCP), which is used to connect LLMs/AI agents to external tools in a structured way. It's becoming a common standard for automation and agent workflows. Before I go deeper, I want to confirm: Does Apple currently provide any official MCP server, API surface, or SDK on iOS/macOS? From what I see, only third-party MCP servers exist for iOS simulators/devices, and Apple’s own frameworks (Foundation Models, Apple Intelligence) don’t expose MCP endpoints. Is there any chance Apple might introduce MCP support—or publish recommended patterns for safely integrating MCP inside apps or developer tools? I would like to see if I can share my app's data to the MCP server to enable other third-party apps/services to integrate easily
Replies
1
Boosts
1
Views
499
Activity
Dec ’25