Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

The standalone Siri app and cross-surface continuity
The new standalone Siri app keeps conversation history synced via iCloud across iPhone, iPad, and Mac. Can third-party content, results, or an app's agent surface appear inside the Siri app (e.g., as referenced sources or follow-up actions), and can the user deep-link from a Siri-app result back into the originating app with state intact? Is any conversation context from the Siri app exposed to a developer's intent when an action is invoked, so the app can act with the relevant context, and what are the privacy boundaries on that? When the same action is invoked from different surfaces (in-app, system Siri, the Siri app) and across synced devices, how should developers reason about execution location and idempotency to avoid duplicate side effects?
0
0
21
2w
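On the idempotency part of the question above, one common approach (not something Apple has confirmed for Siri AI) is to attach a stable request identifier to each invocation and let the app or its backend refuse to run the same identifier twice. A minimal sketch, assuming a hypothetical `IdempotencyLedger` type and a hypothetical `requestID` that the invoking surface or the app itself would supply; a real deployment would persist the ledger somewhere shared by all synced devices, typically the server:

```swift
import Foundation

/// Hypothetical in-memory ledger that remembers which request IDs already ran.
/// Assumption: a production version would be persisted and shared across devices.
final class IdempotencyLedger {
    private var results: [String: String] = [:]
    private let lock = NSLock()

    /// Runs `action` only the first time `requestID` is seen;
    /// later calls with the same ID return the cached result instead.
    func perform(requestID: String, action: () -> String) -> String {
        lock.lock()
        defer { lock.unlock() }
        if let cached = results[requestID] {
            return cached
        }
        let result = action()
        results[requestID] = result
        return result
    }
}

// Usage: the same logical request arrives twice (for example from two surfaces).
var sideEffects = 0
let ledger = IdempotencyLedger()
let placeOrder = { () -> String in
    sideEffects += 1          // stands in for the real side effect
    return "order-confirmed"
}
let first = ledger.perform(requestID: "req-123", action: placeOrder)
let second = ledger.perform(requestID: "req-123", action: placeOrder)
print(first, second, sideEffects)  // prints "order-confirmed order-confirmed 1"
```

The design choice here is that deduplication lives with the side effect rather than with the caller, so it does not matter which surface or device invoked the action.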
Foundation Models framework — the unified API for third-party cloud providers
The 2026 framework lets apps call cloud models like Claude and Gemini (or "any provider that conforms to Apple's Language Model protocol") through the same Swift API as the on-device model. What exactly must a provider implement to conform to the Language Model protocol, and can developers register a custom/self-hosted endpoint and their own API keys, or is routing limited to an Apple-curated provider list? Does the unified API normalize provider-specific capabilities — tool/function calling formats, system-prompt handling, streaming tokens, JSON/structured output, multi-turn state — or do these degrade to a lowest common denominator across providers? When a request is routed to a third-party cloud model, what is the data path and privacy boundary? Does it transit Private Cloud Compute, or go direct to the provider, and what is disclosed to the user about where their prompt is processed? If an app supplies a conforming provider, does that provider become selectable by Siri AI for system actions, or is custom-provider routing confined to in-app LanguageModelSession use only? With the framework slated to open-source this summer, will the provider/protocol surface be stable enough to build against now, or should developers expect breaking changes between the beta and the open-source release?
1
0
119
2w
Private Cloud Compute trust model across multiple cloud vendors
Reports indicate PCC now extends to NVIDIA hardware in Google Cloud datacenters, and the flagship cloud model is refined using Gemini outputs. Now that PCC spans infrastructure outside Apple's own datacenters, what attestation or verifiable transparency is available to developers and users about where a given request was processed, and do the original "data unreachable even by Apple" guarantees hold unchanged across all hardware vendors? For apps with enterprise or regulated users, is there documented data residency behavior for PCC and for third-party model routing, and any contractual/compliance posture (e.g., regional pinning) developers can rely on? Given the EU and China availability gaps at launch, what is the recommended graceful-degradation path for apps that must function in those regions — fall back to on-device only, to a developer-supplied provider, or disable AI features? Does routing to a third-party cloud provider through the framework carry the same PCC privacy guarantees, or are those guarantees specific to Apple's own cloud models?
0
0
16
2w
Spotlight semantic index & entity schemas — privacy and dynamic/remote content
Entity schemas add app content to the Spotlight semantic index so Siri can find information inside apps. Is the semantic index built and stored entirely on-device, or is any indexed entity content transmitted to Apple or to Private Cloud Compute for embedding/retrieval? How should developers index content that does not live on the device — data that resides on a remote server or is fetched on demand? Is there a provider/just-in-time pattern, or must entities be materialized locally first? What is the freshness/update latency of the index when entities change frequently, and what are the practical limits on entity count and update rate before indexing is throttled? What controls exist to exclude sensitive entities from the semantic index or from Siri's personal-context reach, on a per-entity or per-field basis? How is indexed app content scoped per user/account on shared or multi-account devices, and is it cleared on sign-out?
0
0
71
2w
Clarifying the "Weight List"
In the WWDC26 AI Group Lab, it was mentioned as a 'spoiler alert' that the 'weight list applies only to Siri' and not to the Private Cloud Compute (PCC) language model. Could you clarify whether there is a technical path for a developer’s custom adapter, running via the Language Model Protocol, to ever be added to this weight list to handle system-originated Siri requests?
0
0
30
2w
App Intents — exposing conversational and agentic actions to Siri AI
App Intents now connect app content and actions to Apple Intelligence, and Siri AI can take action directly inside third-party apps without fixed trigger phrases. Can an app expose a single conversational/agent-style entry point to Siri AI, or must all capabilities be modeled as discrete intents? If discrete, how does Siri AI chain multiple intents to fulfill a compound natural-language request? What is the supported pattern for long-running or asynchronous intents — actions that acknowledge immediately but complete and return a result seconds or minutes later? Is there a progress/continuation/callback model? How are an intent's results rendered — inline in the Siri app, via a snippet/App Intent UI, or by deep-linking into the app? What control do developers have over that presentation? For intents whose parameters are ambiguous, what disambiguation and follow-up affordances does Siri AI provide, and can developers supply candidate resolutions dynamically at runtime? Is there an eligibility or review process for apps to participate in systemwide Siri AI actions, beyond simply adopting App Intents?
0
0
40
2w
Local Agentic AI on Mac using MLX: issues solved with Gemma-4
I was keen to try local models with Xcode agents after watching the WWDC 2026 session "Run Local agentic AI on the Mac using MLX": https://developer.apple.com/videos/play/wwdc2026/232/

I ran into a few issues while following the three setup steps shown in the session, so I put together a small project with the workarounds I used: https://github.com/jdhark-com/opencode_mlx_bridge/

I needed to use a different model from the one demonstrated. The main issue I hit was that running

    mlx_lm.server --model mlx-community/gemma-4-e4b-it-4bit

failed with:

    ValueError: Received 126 parameters not in model

The workaround in start_xcode_server.py is to load the model with strict=False, which resolved the issue for me. The prompt config in opencode.json also really helped to get more verbose feedback from the model. Hopefully this helps anyone else trying to get a local MLX model working as an Xcode agent.
0
1
101
2w
Unions and @Schemas
Hello, I'm still working on my @addToAlbum schema implementation, and I'm exploring how multiple entities could be "destinations" for the intent. I considered using a @UnionValue for this, but I'm running into compiler difficulties trying to get a @UnionValue to conform to @AppEntity(schema: .photos.album). Am I out of luck on a unionized "target" for the add-to-album intent?
0
0
74
2w
Visual Intelligence for VisionOS in 3rd Party Apps
During the keynote, we saw an amazing example of Siri using Visual Intelligence to identify items in the user's physical space and make inferences based on their size. Do 3rd party apps have the ability to perform the same or similar actions?

For example: The user loads a photo of an item or product and clicks a button that says 'Find Item In My Space'. Apple Intelligence is then used to analyze the user's surroundings and notify the user whether the item is present, along with some positional or physical context. The response is shown on the user interface as text: "This item is in your room, 1 meter to your right."

Goal: Developers currently cannot access the passthrough camera on Apple Vision Pro to run AI/ML vision processing models, for privacy reasons. If Apple Intelligence can look through the camera for the developer, in a privacy-preserving, isolated black box, without providing the image texture to the developer in any way, the user can make use of Visual Intelligence features based on their physical surroundings without sacrificing their privacy.

Purpose: Visual Intelligence is a key feature that exemplifies the benefits of Spatial Computing, and examples like the one shown in the keynote are a perfect use case for the medium. Since Siri now has this capability, users will come to expect that all apps across VisionOS will be able to perform the same kinds of actions. Developers don't generally want or need direct access to the images of a user's surroundings, and having a local/private method of processing these requests is ideal both for developers concerned with data privacy management and for users concerned with developers having too much access to their surroundings. Wearable devices with cameras are a foundational accelerator for users adopting AI in useful ways in their daily lives. It is the most natural way to communicate with AI about what is relevant to you at any given time, removes the friction of manually scanning good data for AI inferencing, and brings purpose to wearing this class of device every day. As these devices become more common and capable, data privacy becomes even more important. Users will need reassurance that the devices they choose to wear will only observe their surroundings when they choose to allow it, while retaining the powerful features that make them worthwhile.

Accessibility: Visual Intelligence is an extremely powerful accessibility tool (for example, for individuals who have low vision) and can meaningfully improve quality of life. Various applications beyond Siri AI can be designed by developers with very specific inferencing capabilities powered by AI. The future of visually intelligent apps should have intentional, unique purposes that users can choose to incorporate in their lives. This will not be a one-size-fits-all Visual Intelligence approach, and will require specific design, training and development to create meaningful capabilities.

If this is already possible, amazing! Any resources to learn more would be greatly appreciated. If this is not yet possible, please let us know what we can do to encourage Apple to consider it. Thank you.
2
0
144
2w
Adapter Problem - compatibleAdapterNotFound
Hello. I have a problem with the FoundationModels adapter and the Apple-hosted managed asset pack via TestFlight. I have created an adapter that works fine locally when I create a model via (fileURL: URL) on a real device, but I cannot create a model using Background Assets when the adapter is downloaded via TestFlight. Every time I try to get the adapter, its creation is interrupted by the compatibleAdapterNotFound error.

I created the .aar archive with this command:

    xcrun ba-package foundation-models package --adapter-path aurelius1.fmadapter --asset-pack-id fmadapter-aurelius1-9799725 --output-path ./aurelius1.aar --platforms iOS --on-demand

After that, I replaced "OnDemand": null with "OnDemand": {} in the manifest so that Transporter could send my archive to App Store Connect. I followed all the recommendations in this topic, https://origin-devforums.apple.com/forums/thread/823148, but unfortunately without success. I would appreciate any help in solving this problem. Here is the code that I use in my app -
5
0
243
2w
backDeploy SystemLanguageModel.tokenCount
SystemLanguageModel.contextSize is back-deployed, but SystemLanguageModel.tokenCount is not. The custom adapter toolkit ships with a ~2.7MB tokenizer with a vocabulary of ~150,000 entries, but the LICENSE.rtf permits its use only for training LoRAs. Is it possible to back-deploy tokenCount, or for Apple to permit the use of the tokenizer.model for counting tokens? This is important for avoiding context overflow errors.
1
1
757
2w
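Until an exact count is available on older OS versions, one stopgap is a deliberately pessimistic estimate based on UTF-8 byte length. The sketch below is only a heuristic: `estimatedTokenCount`, `fitsContext`, and the default of 3 bytes per token are assumptions chosen for illustration, not properties of Apple's tokenizer, and the estimate can be wrong in either direction for unusual text.

```swift
import Foundation

/// Heuristic estimate of the token count from UTF-8 byte length.
/// Assumption: `bytesPerToken` is a tuning value, not derived from the real tokenizer.
func estimatedTokenCount(_ text: String, bytesPerToken: Int = 3) -> Int {
    precondition(bytesPerToken > 0)
    let bytes = text.utf8.count
    return (bytes + bytesPerToken - 1) / bytesPerToken  // ceiling division
}

/// Returns true when the estimated prompt size plus a reserved response budget
/// fits inside the given context window.
func fitsContext(prompt: String, contextSize: Int, reservedForResponse: Int) -> Bool {
    return estimatedTokenCount(prompt) + reservedForResponse <= contextSize
}

// Usage with made-up numbers: a 300-byte prompt and a 4096-token window.
let prompt = String(repeating: "a", count: 300)
print(estimatedTokenCount(prompt))  // prints "100"
print(fitsContext(prompt: prompt, contextSize: 4096, reservedForResponse: 512))  // prints "true"
```

Choosing a small `bytesPerToken` makes the check err on the side of trimming too early rather than overflowing the context.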
Autocorrection and predictive text support for additional Cyrillic languages
Hello Apple Keyboard / Internationalization team, I would like to ask about autocorrection and predictive text support for additional Cyrillic-based languages, especially Kazakh, Kyrgyz, Chuvash, and Ingush. These languages use Cyrillic scripts with their own letters, spelling rules, and word-frequency patterns. When users type in these languages, Russian-based autocorrection or missing language-specific correction can produce incorrect suggestions or replacements. My questions are: Are there plans to expand autocorrection and predictive text support for more Cyrillic-based languages? Is there a recommended way for developers or language communities to provide dictionaries, word-frequency lists, corpora, or other linguistic data to help improve autocorrection? Should this type of request be submitted through Feedback Assistant, Developer Forums, or another Apple channel? I have corpus-based frequency data and language resources for multiple Cyrillic-based languages and would be happy to share them if useful. Thank you. Ali Kuzhuget
1
3
143
2w
New Siri AI and indexing stuck
The new Siri AI and indexing have been stuck for over 65 hours. I’ve tried everything: hard restarting my phone, putting it in airplane mode, running the diagnostics, but nothing helps. I also installed the iOS 27 beta on my iPad Air 13-inch (M3) and the same thing happened to it; I’ve been waiting over 24 hours now. Does anybody know how to resolve this? I am tired of waiting.
0
0
146
2w
Approaching Custom VST GUI Automation: Combining local Vision OCR with the new FoundationModels framework for screen-grounding
Hello everyone, I’m working on a project to automate software controls inside non-standard macOS applications, specifically custom-drawn audio plugins (like the Roland TR-909 VST).

The Challenge: These VST interfaces do not expose their buttons, knobs, or dials via the standard macOS Accessibility tree (NSAccessibility / event taps). Because they are custom-rendered, standard automation tools are blind to them.

My Current Hybrid Approach: I am combining two of Apple's local machine learning technologies to solve this without sending data to the cloud:

Step 1: Text-Based Layout Mapping (Vision Framework)
I capture a screenshot of the targeted window using Quartz Window Services and run a local VNRecognizeTextRequest to extract coordinates for all text labels. This works exceptionally well for text buttons like "OPTION" or "ABOUT".

Step 2: Contextual & Non-Text Element Interpretation (FoundationModels Framework)
For controls that lack text labels (such as blank step sequencer buttons, parameter knobs, or toggle light states), I pass the screenshot as an Attachment into the new local LanguageModelSession. I ask the model to ground coordinates relative to the text landmarks mapped in Step 1.

Here is a simplified snippet of how I am feeding the visual context into the local model:

    import Foundation
    import FoundationModels
    import Cocoa

    func analyzePluginInterface(cgImage: CGImage) async {
        guard SystemLanguageModel.default.isAvailable else {
            print("Local model not downloaded or available.")
            return
        }

        let instructions = """
        You are a screen-aware assistant. Your job is to locate GUI controls on a custom 1024x802 VST window.
        """

        let session = LanguageModelSession(instructions: instructions)

        do {
            let response = try await session.respond {
                "Look at this screenshot of the VST window."
                Attachment(cgImage)
                "Locate the blank step-sequencer buttons located below the instrument channel labels."
                "What are the center coordinates (X, Y) for the first active step?"
            }
            print("Model Grounding Output: \(response.content)")
        } catch {
            print("Inference failed: \(error)")
        }
    }

My Questions for the Community:

Performance & Latency: The local LanguageModelSession.respond call takes several seconds to run on device. For real-time DAW automation, this is a bottleneck. Has anyone experimented with using a custom LoRA adapter or a smaller model profile to speed up spatial coordinate inference?

Coordinate Stability: Multimodal models can sometimes hallucinate coordinates (bounding box values). What strategies are you using to constrain the model output to precise pixel boundaries on varying display scaling configurations (Retina vs non-Retina)?

Alternative Solutions: Are there newer on-device vision APIs (perhaps in CoreML or Vision) that are better suited for bounding-box grounding of abstract graphics (like dials/knobs) than a general language model session?

Would love to hear how others are approaching screen-aware GUI interpretation with these new frameworks! Thanks!
0
0
54
2w
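On the coordinate-stability question above, one way to limit the damage from hallucinated values is to post-process whatever the model returns: convert screenshot pixels to window points using the display scale, clamp to the window, and accept the result only if it lands near a control position already known from some other source. A minimal sketch, where `Point`, `Size`, `groundedPoint`, `snap`, and the `knownControls` list (for example from a one-time manual calibration) are all hypothetical names introduced for illustration:

```swift
import Foundation

struct Point { var x: Double; var y: Double }
struct Size { var width: Double; var height: Double }

/// Converts a model-reported pixel coordinate into window points and clamps it
/// inside the window. `scale` is the display backing scale factor
/// (2.0 on Retina displays, 1.0 otherwise).
func groundedPoint(pixel: Point, windowSize: Size, scale: Double) -> Point {
    precondition(scale > 0)
    let x = pixel.x / scale
    let y = pixel.y / scale
    return Point(x: min(max(x, 0), windowSize.width),
                 y: min(max(y, 0), windowSize.height))
}

/// Returns the nearest known control center if it lies within `tolerance` points
/// of the candidate, or nil so the caller can reject the model's answer.
func snap(_ candidate: Point, to controls: [Point], tolerance: Double) -> Point? {
    let nearest = controls.min(by: { a, b in
        hypot(a.x - candidate.x, a.y - candidate.y) < hypot(b.x - candidate.x, b.y - candidate.y)
    })
    guard let hit = nearest,
          hypot(hit.x - candidate.x, hit.y - candidate.y) <= tolerance else { return nil }
    return hit
}

// Usage with made-up values for the 1024x802 window on a Retina display.
let window = Size(width: 1024, height: 802)
let knownControls = [Point(x: 1020, y: 198), Point(x: 100, y: 100)]
let grounded = groundedPoint(pixel: Point(x: 2100, y: 400), windowSize: window, scale: 2.0)
let snapped = snap(grounded, to: knownControls, tolerance: 8)
print(grounded.x, grounded.y)  // prints "1024.0 200.0"
```

Returning nil instead of the raw coordinate keeps a bad model answer from turning into a click at an arbitrary position.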
Siri to be interoperable with Copilot’s version control systems
Thank the elders for their knowledge and teachings. Is there a consensus regarding Siri’s use with the agentic and/or Copilot version control systems, for example the Copilot within the Edge browser, the standalone app, the Xbox Copilot, and the M365 Copilot app? Does the team have a standardized approach for the 'start' feature that can be prompted while using Copilot’s build and generate capabilities? Thank you all and best regards.
1
0
29
2w
Performance and customization of alternate options
Performance-wise, what are the trade-offs when running an MLX-backed model on-device compared to using the system's AFM Core model? Also, somewhat related: how do I use the 'model judge evaluator' to compare the accuracy of a custom LoRA adapter against the system's Private Cloud Compute models?
Replies
1
Boosts
0
Views
199
Activity
2w
Hybrid assistant architecture (on-device model + server tools)
We run a conversational assistant where answers depend on live API data, not just static knowledge. What is Apple’s recommended split between on-device Foundation Models (intent, routing, summarization, privacy-sensitive context) and server-side tool execution? Is there an official pattern for a local planner with a remote executor?
Replies
0
Boosts
0
Views
22
Activity
2w
Python 3.13 macOS wheel for coreai-core
Will there be a wheel published on pypi.org for Python 3.13 on macOS? There is a 3.13 wheel for Linux, but not macOS.
Replies
0
Boosts
0
Views
97
Activity
2w
Adapter Problem - compatibleAdapterNotFound
Hello. I have a problem with the FoundationModels adapter and the Apple-hosted managed asset pack via TestFlight. I have created an adapter that works fine locally by creating a model via (fileURL: URL) on a real device, but I cannot create a model using background assets by downloading the adapter via TestFlight. Every time I try to get an adapter, the creation of the adapter is interrupted by the compatibleAdapterNotFound error. The aar. archive i created using a special command - xcrun ba-package foundation-models package --adapter-path aurelius1.fmadapter --asset-pack-id fmadapter-aurelius1-9799725 --output-path ./aurelius1.aar --platforms iOS --on-demand\ after that, I replaced "OnDemand": null with "OnDemand": {} in the manifest so that the Transporter could send my archive to the App Store Connect. I followed all the recommendations in this topic - https://origin-devforums.apple.com/forums/thread/823148 ...but unfortunately unsuccessfully I would appreciate any help in solving this problem. here is the code that I use in my app -
Replies
5
Boosts
0
Views
243
Activity
2w
backDeploy SystemLanguageModel.tokenCount
SystemLanguageModel.contextSize is back-deployed, but SystemLanguageModel.tokenCount is not. The custom adapter toolkit ships with a ~2.7MB tokenizer with a ~150,000 vocabulary size, but the LICENSE.rtf exclusively permits it's use for training LoRAs. Is it possible to back-deploy tokenCount or for Apple to permit the use of the tokenizer.model for counting tokens? This is important to avoiding context overflow errors.
Replies
1
Boosts
1
Views
757
Activity
2w
Autocorrection and predictive text support for additional Cyrillic languages
Hello Apple Keyboard / Internationalization team, I would like to ask about autocorrection and predictive text support for additional Cyrillic-based languages, especially Kazakh, Kyrgyz, Chuvash, and Ingush. These languages use Cyrillic scripts with their own letters, spelling rules, and word-frequency patterns. When users type in these languages, Russian-based autocorrection or missing language-specific correction can produce incorrect suggestions or replacements. My questions are: Are there plans to expand autocorrection and predictive text support for more Cyrillic-based languages? Is there a recommended way for developers or language communities to provide dictionaries, word-frequency lists, corpora, or other linguistic data to help improve autocorrection? Should this type of request be submitted through Feedback Assistant, Developer Forums, or another Apple channel? I have corpus-based frequency data and language resources for multiple Cyrillic-based languages and would be happy to share them if useful. Thank you. Ali Kuzhuget
Replies
1
Boosts
3
Views
143
Activity
2w
New Siri AI and indexing stuck
The new Siri AI and indexing have been stuck for over 65 hours. I've tried everything: hard restarting my phone, turning on airplane mode, running the diagnostics. Nothing helps. I also installed the iOS 27 beta on my 13-inch iPad Air (M3) and the same thing happened there; I have been waiting for over 24 hours now. Does anybody know how to resolve this? I am tired of waiting.
Replies
0
Boosts
0
Views
146
Activity
2w
Approaching Custom VST GUI Automation: Combining local Vision OCR with the new FoundationModels framework for screen-grounding
Hello everyone,

I'm working on a project to automate software controls inside non-standard macOS applications, specifically custom-drawn audio plugins (like the Roland TR-909 VST).

The Challenge: These VST interfaces do not expose their buttons, knobs, or dials via the standard macOS Accessibility tree (NSAccessibility / event taps). Because they are custom-rendered, standard automation tools are blind to them.

My Current Hybrid Approach: I am combining two of Apple's local machine learning technologies to solve this without sending data to the cloud:

Step 1: Text-Based Layout Mapping (Vision Framework)
I capture a screenshot of the targeted window using Quartz Window Services and run a local VNRecognizeTextRequest to extract coordinates for all text labels. This works exceptionally well for text buttons like "OPTION" or "ABOUT".

Step 2: Contextual & Non-Text Element Interpretation (FoundationModels Framework)
For controls that lack text labels (such as blank step sequencer buttons, parameter knobs, or toggle light states), I pass the screenshot as an Attachment into the new local LanguageModelSession. I ask the model to ground coordinates relative to the text landmarks mapped in Step 1.

Here is a simplified snippet of how I am feeding the visual context into the local model:

    import Foundation
    import FoundationModels
    import Cocoa

    func analyzePluginInterface(cgImage: CGImage) async {
        guard SystemLanguageModel.default.isAvailable else {
            print("Local model not downloaded or available.")
            return
        }

        let instructions = """
            You are a screen-aware assistant. Your job is to locate GUI controls on a custom 1024x802 VST window.
            """

        let session = LanguageModelSession(instructions: instructions)

        do {
            let response = try await session.respond {
                "Look at this screenshot of the VST window."
                Attachment(cgImage)
                "Locate the blank step-sequencer buttons located below the instrument channel labels."
                "What are the center coordinates (X, Y) for the first active step?"
            }
            print("Model Grounding Output: \(response.content)")
        } catch {
            print("Inference failed: \(error)")
        }
    }

My Questions for the Community:

Performance & Latency: The local LanguageModelSession.respond call takes several seconds to run on device. For real-time DAW automation, this is a bottleneck. Has anyone experimented with using a custom LoRA adapter or a smaller model profile to speed up spatial coordinate inference?

Coordinate Stability: Multimodal models can sometimes hallucinate coordinates (bounding box values). What strategies are you using to constrain the model output to precise pixel boundaries on varying display scaling configurations (Retina vs non-Retina)?

Alternative Solutions: Are there newer on-device vision APIs (perhaps in CoreML or Vision) that are better suited for bounding-box grounding of abstract graphics (like dials/knobs) than a general language model session?

Would love to hear how others are approaching screen-aware GUI interpretation with these new frameworks! Thanks!
Replies
0
Boosts
0
Views
54
Activity
2w
Siri to be interoperable with Copilot’s version control systems
Thank the elders for their knowledge and teachings. Is there a consensus regarding Siri's utilization for the Agentic and/or Copilot version control systems? For example, the Copilot within the Edge browser, the standalone app, the Xbox Copilot, and the M365 Copilot app. Does the team have a standardized approach for the 'start' feature that can be prompted whilst utilizing Copilot's build and generate capabilities? Thank you all and best regards.
Replies
1
Boosts
0
Views
29
Activity
2w
A question about new iOS 27 Siri and Apple Intelligence
I have a question about the new Siri on iOS 27. I'm developing an app where people can order different audio products, such as speakers, earphones, and so on. Can I integrate it with the new Siri so that customers can place an order through Siri and have Siri complete it?
Replies
0
Boosts
0
Views
66
Activity
2w
Speech generation by the new Foundation Model
During the Keynote (at 30m:20s) Craig Federighi mentions the second, "even more powerful version of our on-device model" and that this model lets supported products understand and generate speech. Is there any public API for generating speech using this model?
Replies
0
Boosts
0
Views
37
Activity
2w