Apple Intelligence

RSS for tag

Apple Intelligence is the personal intelligence system that puts powerful generative models right at the core of your iPhone, iPad, and Mac and powers incredible new features to help users communicate, work, and express themselves.

Posts under Apple Intelligence subtopic

Post

Replies

Boosts

Views

Activity

Accessibility & Inclusion
We are developing Apple AI for foreign markets and adapting it for iPhone models 17 and above. When the system language and Siri language are not the same—for example, if the system is in English and Siri is in Chinese—it can cause a situation where Apple AI cannot be used. So, may I ask if there are any other reasons that could cause Apple AI to be unavailable within the app, even if it has been enabled?
0
0
498
Dec ’25
Image Playground files suddenly not available
My app lets you create images with Image Playground. When the user approves an image I move it to the documents dir from the temp storage. With over a year of usage I’ve created a lot of images over time. Out of nowhere the app stopped loading my custom creations from Image Playground saying it couldn’t find the files. It still had my VoiceOver strings I had added for each image and still had the custom categories I assigned them. Debug code to look in the docs dir doesn’t find them. I downloaded the app’s container and only see the images I created as a test after the problem started. But my ~70MB app is still taking up 300MB on my iPhone so it feels like they’re there but not accessible. Is there anything else I can try?
2
0
937
Jan ’26
Siri not calling my INExtension
Things I did: created an Intents Extension target added "Supported Intents" to both my main app target and the intent extension, with "INAddTasksIntent" and "INCreateNoteIntent" created the AppIntentVocabulary in my main app target created the handlers in the code in the Intents Extension target class AddTaskIntentHandler: INExtension, INAddTasksIntentHandling { func resolveTaskTitles(for intent: INAddTasksIntent) async -> [INSpeakableStringResolutionResult] { if let taskTitles = intent.taskTitles { return taskTitles.map { INSpeakableStringResolutionResult.success(with: $0) } } else { return [INSpeakableStringResolutionResult.needsValue()] } } func handle(intent: INAddTasksIntent) async -> INAddTasksIntentResponse { // my code to handle this... let response = INAddTasksIntentResponse(code: .success, userActivity: nil) response.addedTasks = tasksCreated.map { INTask( title: INSpeakableString(spokenPhrase: $0.name), status: .notCompleted, taskType: .completable, spatialEventTrigger: nil, temporalEventTrigger: intent.temporalEventTrigger, createdDateComponents: DateHelper.localCalendar().dateComponents([.year, .month, .day, .minute, .hour], from: Date.now), modifiedDateComponents: nil, identifier: $0.id ) } return response } } class AddItemIntentHandler: INExtension, INCreateNoteIntentHandling { func resolveTitle(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let title = intent.title { return INSpeakableStringResolutionResult.success(with: title) } else { return INSpeakableStringResolutionResult.needsValue() } } func resolveGroupName(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let groupName = intent.groupName { return INSpeakableStringResolutionResult.success(with: groupName) } else { return INSpeakableStringResolutionResult.needsValue() } } func handle(intent: INCreateNoteIntent) async -> INCreateNoteIntentResponse { do { // my code for handling this... let response = INCreateNoteIntentResponse(code: .success, userActivity: nil) response.createdNote = INNote( title: INSpeakableString(spokenPhrase: itemName), contents: itemNote.map { [INTextNoteContent(text: $0)] } ?? [], groupName: INSpeakableString(spokenPhrase: list.name), createdDateComponents: DateHelper.localCalendar().dateComponents([.day, .month, .year, .hour, .minute], from: Date.now), modifiedDateComponents: nil, identifier: newItem.id ) return response } catch { return INCreateNoteIntentResponse(code: .failure, userActivity: nil) } } } uninstalled my app restarted my physical device and simulator Yet, when I say "Remind me to buy dog food in Index" (Index is the name of my app), as stated in the examples of INAddTasksIntent, Siri proceeds to say that a list named "Index" doesn't exist in apple Reminders app, instead of processing the request in my app. Am I missing something?
2
0
129
5h
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
2
0
184
3w
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
0
0
80
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
2
0
552
1w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
2
0
436
1w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
513
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
328
2d
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
0
0
297
2d
Unified Use Case Mail Categories & Spam
Hi Apple product owners. I am missing a unified concept which might be derived from the use cases for mail categories and mail spam for the app "Mail" on Mac. I need a recommendation on how to use categories in combination with the spam filter to get most out of it. So I was looking for the use cases for the 2 functionality areas in order to figure out how to organise my mails by using as much automation as possible before I start creating intelligent folders in addition. What can you recommend where I get this information from? I don't want to guess or read a lot of forum contributions which are based on guesses.
1
0
97
Apr ’25
AppIntentsSampleApp Failed to refresh AppShortcut parameters
I've been struggling and Siri support to an application. I have developed it kept getting this error when I run it on MacOS: Failed to refresh AppShortcut parameters with error: Error Domain=Foundation._GenericObjCError Code=0 "(null)" So I found AppIntentsSampleApp and downloaded and buil it and I get a similar, but larger, error: Failed to refresh AppShortcut parameters with error: Error Domain=RBSServiceErrorDomain Code=1 "(originator doesn't have entitlement com.apple.private.xpc.launchd.app-server AND originator doesn't have entitlement com.apple.assertiond.system-shell AND originator doesn't have entitlement com.apple.runningboard.launchprocess)" UserInfo={NSLocalizedFailureReason=(originator doesn't have entitlement com.apple.private.xpc.launchd.app-server AND originator doesn't have entitlement com.apple.assertiond.system-shell AND j And it goes on and on. What am I missing? I'm using Xcode 16. I don't see an option to add a Siri framework. I have tried adding both the intent and tap, intent frameworks, which does not seem to make a difference.
2
0
170
Apr ’25
Will Apple Intelligence Support Third-Party LLMs or Custom AI Agent Integrations?
Hi everyone, I’m an AI engineer working on autonomous AI agents and exploring ways to integrate them into the Apple ecosystem, especially via Siri and Apple Intelligence. I was impressed by Apple’s integration of ChatGPT and its privacy-first design, but I’m curious to know: • Are there plans to support third-party LLMs? • Could Siri or Apple Intelligence call external AI agents or allow extensions to plug in alternative models for reasoning, scheduling, or proactive suggestions? I’m particularly interested in building event-driven, voice-triggered workflows where Apple Intelligence could act as a front-end for more complex autonomous systems (possibly local or cloud-based). This kind of extensibility would open up incredible opportunities for personalized, privacy-friendly use cases — while aligning with Apple’s system architecture. Is anything like this on the roadmap? Or is there a suggested way to prototype such integrations today? Thanks in advance for any thoughts or pointers!
4
0
530
May ’25
Ways I can leverage AI when the user asks Siri, "What does this word mean"
I'm the creator of an app that helps users learn Arabic. Inside of the app users can save words, engage in lessons specific to certain grammar concepts etc. I'm looking for a way for Siri to 'suggest' my app when the user asks to define any Arabic words. There are other questions that I would like for Siri to suggest my app for, but I figure that's a good start. What framework am I looking for here? I think AppItents? I remember I played with it for a bit last year but didn't get far. Any suggestions would be great. Would the new Foundations model be any help here?
2
0
146
Jun ’25
App Shortcuts Limit (10 per app) — Can This Be Increased?
Hi Apple team, When using AppShortcutsProvider, I hit the hard limit: Each app may have at most 10 App Shortcuts. This feels limiting for apps that offer multiple workflows and would benefit from deeper Siri integration. Could this cap be raised — ideally to 30 — to support broader use of AppIntents, enhance Siri automation, and unlock more system-level capabilities? AppShortcuts are a fantastic tool. Increasing the limit would make them even more powerful. Thanks!
1
0
216
Jun ’25
Accessibility & Inclusion
When the system language and Siri language are not the same, Apple AI may not be usable. For example, if the system is in English and Siri is in Chinese, it may cause Apple AI to not work. May I ask if there are other reasons why the app still cannot be used internally even after enabling Apple AI?
Replies
0
Boosts
0
Views
478
Activity
Dec ’25
Accessibility & Inclusion
We are developing Apple AI for foreign markets and adapting it for iPhone models 17 and above. When the system language and Siri language are not the same—for example, if the system is in English and Siri is in Chinese—it can cause a situation where Apple AI cannot be used. So, may I ask if there are any other reasons that could cause Apple AI to be unavailable within the app, even if it has been enabled?
Replies
0
Boosts
0
Views
498
Activity
Dec ’25
Image playground stuck
Got new iPhone Boxing Day all works bar image playground uninstalled/reinstalled turns ai on/off still stuck
Replies
1
Boosts
0
Views
508
Activity
Dec ’25
Image Playground files suddenly not available
My app lets you create images with Image Playground. When the user approves an image I move it to the documents dir from the temp storage. With over a year of usage I’ve created a lot of images over time. Out of nowhere the app stopped loading my custom creations from Image Playground saying it couldn’t find the files. It still had my VoiceOver strings I had added for each image and still had the custom categories I assigned them. Debug code to look in the docs dir doesn’t find them. I downloaded the app’s container and only see the images I created as a test after the problem started. But my ~70MB app is still taking up 300MB on my iPhone so it feels like they’re there but not accessible. Is there anything else I can try?
Replies
2
Boosts
0
Views
937
Activity
Jan ’26
Siri not calling my INExtension
Things I did: created an Intents Extension target added "Supported Intents" to both my main app target and the intent extension, with "INAddTasksIntent" and "INCreateNoteIntent" created the AppIntentVocabulary in my main app target created the handlers in the code in the Intents Extension target class AddTaskIntentHandler: INExtension, INAddTasksIntentHandling { func resolveTaskTitles(for intent: INAddTasksIntent) async -> [INSpeakableStringResolutionResult] { if let taskTitles = intent.taskTitles { return taskTitles.map { INSpeakableStringResolutionResult.success(with: $0) } } else { return [INSpeakableStringResolutionResult.needsValue()] } } func handle(intent: INAddTasksIntent) async -> INAddTasksIntentResponse { // my code to handle this... let response = INAddTasksIntentResponse(code: .success, userActivity: nil) response.addedTasks = tasksCreated.map { INTask( title: INSpeakableString(spokenPhrase: $0.name), status: .notCompleted, taskType: .completable, spatialEventTrigger: nil, temporalEventTrigger: intent.temporalEventTrigger, createdDateComponents: DateHelper.localCalendar().dateComponents([.year, .month, .day, .minute, .hour], from: Date.now), modifiedDateComponents: nil, identifier: $0.id ) } return response } } class AddItemIntentHandler: INExtension, INCreateNoteIntentHandling { func resolveTitle(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let title = intent.title { return INSpeakableStringResolutionResult.success(with: title) } else { return INSpeakableStringResolutionResult.needsValue() } } func resolveGroupName(for intent: INCreateNoteIntent) async -> INSpeakableStringResolutionResult { if let groupName = intent.groupName { return INSpeakableStringResolutionResult.success(with: groupName) } else { return INSpeakableStringResolutionResult.needsValue() } } func handle(intent: INCreateNoteIntent) async -> INCreateNoteIntentResponse { do { // my code for handling this... let response = INCreateNoteIntentResponse(code: .success, userActivity: nil) response.createdNote = INNote( title: INSpeakableString(spokenPhrase: itemName), contents: itemNote.map { [INTextNoteContent(text: $0)] } ?? [], groupName: INSpeakableString(spokenPhrase: list.name), createdDateComponents: DateHelper.localCalendar().dateComponents([.day, .month, .year, .hour, .minute], from: Date.now), modifiedDateComponents: nil, identifier: newItem.id ) return response } catch { return INCreateNoteIntentResponse(code: .failure, userActivity: nil) } } } uninstalled my app restarted my physical device and simulator Yet, when I say "Remind me to buy dog food in Index" (Index is the name of my app), as stated in the examples of INAddTasksIntent, Siri proceeds to say that a list named "Index" doesn't exist in apple Reminders app, instead of processing the request in my app. Am I missing something?
Replies
2
Boosts
0
Views
129
Activity
5h
Parallel/Steam processing of Apple Intelligence
I have built a MAC-OS machine intelligence application that uses Apple Intelligence. A part of the application is to preprocess text. For longer text content I have implemented chunking to get around the token limit. However the application performance is now limited by the fact that Apple Intelligence is sequential in operation. This has a large impact on the application performance. Is there any approach to operate Apple Intelligence in a parallel mode or even a streaming interface. As Apple Intelligence has Private Cloud Services I was hoping to be able to send multiple chunks in parallel as that would significantly improve performance. Any suggestions would be welcome. This could also be considered a request for a future enhancement.
Replies
2
Boosts
0
Views
184
Activity
3w
CoreML Instrument Testing Native Clawbot using FM.SyML & OAIC & Diffusion
After running performance test on my CoreML qwen3 vision, I appreciated the update where results were viewable... ON Mac it mentions Ios18 and im not sure if or how to change.. that bottle neck lead to rebuilding CoreML view. I woke up and realized I have all the pieces together... and ended up with a swift package working demo of Clawbot.. the current issue is Im trying to use gguf 3b to code it.. I have become well aware that everything I create using the big models, they soon become the default themes /layouts for everyone else simply asking for this or that (I appoligise) so here I am asking (while looking to schedule meet with dev) if its possible to speak with anyone about th 1000s of Apple Intelligence PCC, Xcode, and vision reports and feedback ive sent , in terms of just general ways I can work more efficiently without the crash... ive already build a TUI for MLX but the tools for coreML while seems promising are not intuitive, but the vision format instruction was nice to see. Anyway my question is:
Replies
0
Boosts
0
Views
80
Activity
3w
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
Replies
2
Boosts
0
Views
552
Activity
1w
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
Replies
2
Boosts
0
Views
436
Activity
1w
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
513
Activity
1w
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
328
Activity
2d
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
Replies
0
Boosts
0
Views
297
Activity
2d
Unified Use Case Mail Categories & Spam
Hi Apple product owners. I am missing a unified concept which might be derived from the use cases for mail categories and mail spam for the app "Mail" on Mac. I need a recommendation on how to use categories in combination with the spam filter to get most out of it. So I was looking for the use cases for the 2 functionality areas in order to figure out how to organise my mails by using as much automation as possible before I start creating intelligent folders in addition. What can you recommend where I get this information from? I don't want to guess or read a lot of forum contributions which are based on guesses.
Replies
1
Boosts
0
Views
97
Activity
Apr ’25
AppIntentsSampleApp Failed to refresh AppShortcut parameters
I've been struggling and Siri support to an application. I have developed it kept getting this error when I run it on MacOS: Failed to refresh AppShortcut parameters with error: Error Domain=Foundation._GenericObjCError Code=0 "(null)" So I found AppIntentsSampleApp and downloaded and buil it and I get a similar, but larger, error: Failed to refresh AppShortcut parameters with error: Error Domain=RBSServiceErrorDomain Code=1 "(originator doesn't have entitlement com.apple.private.xpc.launchd.app-server AND originator doesn't have entitlement com.apple.assertiond.system-shell AND originator doesn't have entitlement com.apple.runningboard.launchprocess)" UserInfo={NSLocalizedFailureReason=(originator doesn't have entitlement com.apple.private.xpc.launchd.app-server AND originator doesn't have entitlement com.apple.assertiond.system-shell AND j And it goes on and on. What am I missing? I'm using Xcode 16. I don't see an option to add a Siri framework. I have tried adding both the intent and tap, intent frameworks, which does not seem to make a difference.
Replies
2
Boosts
0
Views
170
Activity
Apr ’25
AI ecosystem
I’m sure someone though about it already. But let’s have ecosystem, where Apple Intelligence uses your most capable (Apple) hardware at first and the cloud service as second.
Replies
1
Boosts
0
Views
220
Activity
May ’25
Will Apple Intelligence Support Third-Party LLMs or Custom AI Agent Integrations?
Hi everyone, I’m an AI engineer working on autonomous AI agents and exploring ways to integrate them into the Apple ecosystem, especially via Siri and Apple Intelligence. I was impressed by Apple’s integration of ChatGPT and its privacy-first design, but I’m curious to know: • Are there plans to support third-party LLMs? • Could Siri or Apple Intelligence call external AI agents or allow extensions to plug in alternative models for reasoning, scheduling, or proactive suggestions? I’m particularly interested in building event-driven, voice-triggered workflows where Apple Intelligence could act as a front-end for more complex autonomous systems (possibly local or cloud-based). This kind of extensibility would open up incredible opportunities for personalized, privacy-friendly use cases — while aligning with Apple’s system architecture. Is anything like this on the roadmap? Or is there a suggested way to prototype such integrations today? Thanks in advance for any thoughts or pointers!
Replies
4
Boosts
0
Views
530
Activity
May ’25
Does Apple's new foundation models include a Vision API for accessing on-device LLM capabilities?
I couldn't find information about this in the documentation. Could someone clarify if this API is available and how to access it?
Replies
2
Boosts
0
Views
226
Activity
Jun ’25
Ways I can leverage AI when the user asks Siri, "What does this word mean"
I'm the creator of an app that helps users learn Arabic. Inside of the app users can save words, engage in lessons specific to certain grammar concepts etc. I'm looking for a way for Siri to 'suggest' my app when the user asks to define any Arabic words. There are other questions that I would like for Siri to suggest my app for, but I figure that's a good start. What framework am I looking for here? I think AppItents? I remember I played with it for a bit last year but didn't get far. Any suggestions would be great. Would the new Foundations model be any help here?
Replies
2
Boosts
0
Views
146
Activity
Jun ’25
App Shortcuts Limit (10 per app) — Can This Be Increased?
Hi Apple team, When using AppShortcutsProvider, I hit the hard limit: Each app may have at most 10 App Shortcuts. This feels limiting for apps that offer multiple workflows and would benefit from deeper Siri integration. Could this cap be raised — ideally to 30 — to support broader use of AppIntents, enhance Siri automation, and unlock more system-level capabilities? AppShortcuts are a fantastic tool. Increasing the limit would make them even more powerful. Thanks!
Replies
1
Boosts
0
Views
216
Activity
Jun ’25
Determining which new features use AI/ML under the hood
iOS26 is supported by a wider range of devices than are able to run AI, e.g iPhone 12 runs iOS26, but does not support AI. How do we determine in code if AI is supported on a device ? How do we determine what features use AI under the hood ? Thanks, Steve.
Replies
1
Boosts
0
Views
170
Activity
Jun ’25