Media Technologies

Explore the integration of media technologies within your app. Discuss working with audio, video, camera, and other media functionalities.

.sf3 sound font

...when will they be supported natively?

Media Technologies Audio

AVPlayerNode + AVAudioEngine

What's the best way to detect whether a player node or audio engine is in a broken state (for example, as a result of AVAudioSession.mediaServicesWereResetNotification)? Sometimes it needs to be reset, but from time to time it is not enough to just reset it, for example with playerNode.reset() engine.reset(). It also has to be reinitialized: playerNode.reset() playerNode = .init() engine.reset() engine = .init(). Can we subscribe to the broken states and reinitialize the objects proactively?
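One possible shape for the "subscribe and reinitialize" idea is sketched below. It is not an Apple-provided API: `RebuildTrigger` is a hypothetical helper that runs a caller-supplied closure whenever one of a set of notifications is posted, and the closure is where fresh objects would be created. The sketch uses only Foundation so it stays self-contained.

```swift
import Foundation

/// Hypothetical helper (not part of AVFoundation): runs `rebuild`
/// every time one of the notifications in `names` is posted.
final class RebuildTrigger {
    private let center: NotificationCenter
    private var tokens: [NSObjectProtocol] = []

    init(names: [Notification.Name],
         center: NotificationCenter = .default,
         rebuild: @escaping () -> Void) {
        self.center = center
        for name in names {
            // queue: nil runs the block synchronously on the posting thread.
            let token = center.addObserver(forName: name, object: nil, queue: nil) { _ in
                rebuild()
            }
            tokens.append(token)
        }
    }

    deinit {
        // Block-based observers must be removed explicitly.
        for token in tokens {
            center.removeObserver(token)
        }
    }
}
```

On iOS, one might pass AVAudioSession.mediaServicesWereResetNotification and Notification.Name.AVAudioEngineConfigurationChange as `names`, and have `rebuild` replace the engine and player node with new instances, reattach and reconnect them, and reactivate the session. Apple's documentation for the media-services-reset notification recommends disposing of orphaned audio objects and creating new ones, which matches the observation that reset() alone is sometimes not enough. Whether this covers every broken state is an open question.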

Media Technologies Audio

kAudioUnitProperty_ParameterList vs AUParameterTree

This is kind of related to my other question, where I think AUv3's KVO to AUv2 C API notification mapping doesn't seem to work. For example, Reaper supports AUv3 plug-ins but I think it uses AUv2 APIs, and the developer told me that they observe these: kAudioUnitEvent_PropertyChange / kAudioUnitScope_Global / kAudioUnitProperty_ParameterList If one of them changes, they will rescan the parameter list/tree. My AUv3 Mela has a dynamic parameter tree, and changes based on the preset loaded, and it works well enough in Logic Pro for iPad/Mac and AUM on iOS. I know it sends out KVOs for the parameterTree property and I've also tried sending out allParameterValues KVOs but it seems Reaper is not getting these notifications. Anything I can do? (Reaper forum discussion, check last few messages https://forum.cockos.com/showthread.php?t=300840)

Media Technologies Audio

Are there plans to evolve the AUv3 API?

AUv3 was introduced back when iOS 9 came out, and it has had only very minor updates since. In a way, it didn't seem important enough for the desktop world, which continues to use AUv2; only iOS musicians and developers embraced AUv3. Even Logic Pro doesn't feel like it fully embraces it. It is almost as if there was never a good enough reason for the desktop world to switch over. Because of this, I think many bugs remain. One of the bugs I keep hitting is, I think, an AUv2-to-AUv3 API mapping issue: AUv3 uses KVO for certain properties, and that somehow doesn't work well if the host uses the C API. Here's one example: https://developer.apple.com/forums/thread/828549

Media Technologies Audio

a problem

Handling multilingual text. When a user selects a block of text that is mostly Chinese but contains embedded English words (e.g., technical terms in parentheses), the system reader often stutters, stops, or skips the English entirely. What is the best way to handle mixed-language text processing so that the speech engine can seamlessly and fluidly read Chinese and English together without dropping words?
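One approach that might help is to split the selection into per-language runs before handing it to the speech engine, so each run can be spoken with a matching voice. The sketch below is an assumption-laden illustration, not a known fix: `LanguageRun` and `splitByScript` are hypothetical names, only the main CJK Unified Ideographs block and basic Latin letters are classified, and spaces, digits, and punctuation simply stay with the run they follow.

```swift
import Foundation

/// One contiguous run of text plus the BCP 47 language code to speak it in.
struct LanguageRun: Equatable {
    let text: String
    let language: String
}

/// Classifies a character as Chinese, English, or neutral (nil).
/// Simplification: only U+4E00...U+9FFF and A-Z / a-z are recognized.
func languageHint(for character: Character, chinese: String, english: String) -> String? {
    guard let scalar = character.unicodeScalars.first else { return nil }
    let value = scalar.value
    if (0x4E00...0x9FFF).contains(value) { return chinese }
    if (0x41...0x5A).contains(value) || (0x61...0x7A).contains(value) { return english }
    return nil
}

/// Splits mixed text into runs; a new run starts whenever the script changes.
func splitByScript(_ text: String,
                   chinese: String = "zh-CN",
                   english: String = "en-US") -> [LanguageRun] {
    var runs: [LanguageRun] = []
    var buffer = ""
    var current: String? = nil  // language of the run being built

    for character in text {
        let hint = languageHint(for: character, chinese: chinese, english: english)
        if let hint = hint, let active = current, hint != active {
            // Script changed: close the run collected so far.
            runs.append(LanguageRun(text: buffer, language: active))
            buffer = ""
        }
        if let hint = hint { current = hint }
        buffer.append(character)
    }
    if !buffer.isEmpty {
        // Text with no classified characters defaults to the Chinese voice.
        runs.append(LanguageRun(text: buffer, language: current ?? chinese))
    }
    return runs
}
```

Each run could then become its own AVSpeechUtterance whose voice is AVSpeechSynthesisVoice(language: run.language), queued in order on a single AVSpeechSynthesizer. Whether this removes the stutter on every voice is untested here, and very short runs may introduce audible pauses between utterances.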

Media Technologies Audio

Native diarization in '27?

I'm working on a macOS transcription utility that uses Apple's Speech framework (SpeechAnalyzer) for speech-to-text. This is for meetings/interviews/podcasts where speaker identification is critical. The current limitation is speaker attribution — I need to identify which speaker is producing each segment of transcribed text. I have three questions: Native diarization in iOS 27 / macOS Golden Gate Is native diarization coming in the fall release? I've reviewed the WWDC 2026 session catalog and found no mention of diarization in SpeechAnalyzer or elsewhere. I'm probably going to use FluidAudio for speaker attribution, but I'd strongly prefer a native solution if one exists or is planned. Do I need to stay with third-party libraries, or is this coming? Core AI and custom models The new Core AI framework was announced for on-device model deployment. Can I train or integrate a custom diarization model via Core AI? If yes, are there sample implementations or documentation for audio-processing models? Core Audio framework updates Were there any Core Audio API-level additions announced at WWDC 2026 that might support audio analysis or speaker detection downstream? I saw no dedicated session, but wanted to verify. Thanks for any guidance on this.

Media Technologies Audio

How to Fix the Emotionless and Cold Tone of Machine-Read Text?

I am designing an educational app. I notice that current system text-to-speech (such as AVSpeechSynthesizer) often sounds too mechanical because the time intervals between characters are strictly equal, so it lacks natural human prosody, phrasing, and warmth, which is a huge dealbreaker for sensitive users like children. How can we customize text-to-speech to break this uniform word spacing, manage prosody dynamically, and make the AI voice sound more emotionally engaging and natural rather than like a cold robot? I really want to create an elegant listening experience that feels like real human storytelling, not just machine reading.

Media Technologies Audio

Cannot generate 2048-bit FairPlay Streaming certificate

Hello, I have a problem generating a 2048-bit FairPlay Streaming certificate. I tried generating an SDK v26.x certificate in two ways: (1) use an existing certificate, and (2) create a new certificate. In both cases, Apple gives me a certificate bundle containing a 1024-bit certificate (fps_certificate.bin), even though I uploaded a 2048-bit CSR when creating the certificate. Just to note, I created an SDK v4.x certificate a few years ago. Has anyone run into the same issue? Or am I missing something?

Media Technologies Streaming FairPlay Streaming

1.4k

Native diarization in '27?

I'm working on a macOS transcription utility that uses Apple's Speech framework (SpeechAnalyzer) for speech-to-text. This is for meetings/interviews/podcasts where speaker identification is critical. The current challenge is speaker attribution — I need to identify which speaker is producing each segment of transcribed text, and Apple doesn't support this in '26. I have three questions: Is native diarization coming in the fall release? I've reviewed the WWDC 2026 session catalog and found no mention of diarization in SpeechAnalyzer or elsewhere. I'm probably going to use FluidAudio for speaker attribution, but strongly prefer a native solution if one exists or is planned. Do I need to stay with third-party libraries, or is this coming? Core AI and custom models The new Core AI framework was announced for on-device model deployment. Can I train or integrate a custom diarization model via Core AI? If yes, are there sample implementations or documentation for audio-processing models? Core Audio framework updates Were there any Core Audio API-level additions announced at WWDC 2026 that might support audio analysis or speaker detection downstream? I saw no dedicated session, but wanted to verify. Thanks for any guidance on this.

Media Technologies Audio

Facing issues with response from FairPlay SDK based service

Currently we are building a service based on FairPlay SDK version 26.0. Our existing solution uses version 4.5.4. When we run the request below to get the version, we get a proper response:

curl http://xx.xx.xx.xx:8080/fps/v
Response - V26.0

Our client applications call the two APIs below:

https://GW_HOST:8080/fairplay_cert
https://GW_HOST:8080/fairplay_license

Within the cert API call, we return the FairPlay public certificate. Currently we are trying to use the test certificate provided with the FairPlay SDK (test_fps_certificate_v26.bin). Then, within the fairplay_license API call, we try to reach the FairPlay service based on FairPlay SDK v26. We are seeing issues with the request below (the request JSON payload is attached):

curl -v -X POST \
  -H "Content-Type: application/json" \
  -d @SDKValidation.json \
  http://xx.xx.xx.xx:8080/fps

SDKValidation.json

We get "Empty response from server". When we checked the Apache error log at "/etc/httpd/logs/error_log", we saw an exception. We are sharing the traces in a file (ApacheErrorLogs.txt).

ApacheErrorLogs.txt

Also, if we use the old public key used with version 4.5.4, we get a different error from the service:

{"fairplay-streaming-response":{"create-ckc":[{"id":1,"status":-42605}]}}

Can you please help us understand the reason for this failure?

Media Technologies Streaming FairPlay Streaming

1.9k

Test post

testing

Media Technologies Video VideoToolbox

125

test-title-1

test-post-1

Media Technologies Video

Real-time Audio Analysis of Audio Played by Other Apps on iPhone

I’m evaluating a simple iOS application that would perform real-time beat detection and audio analysis. My question is: Can an App Store-compliant iOS application access or analyze audio that is being played by other applications on the same device (e.g. Spotify, Apple Music, YouTube, TikTok, Safari, etc.) in real time, without using the microphone? Specifically: Is there any Apple-supported framework that allows access to system audio for real-time beat detection or frequency analysis? Can ReplayKit be used to analyze audio buffers from other applications in real time without recording or saving the audio? If direct access is not permitted, what Apple-approved architecture would be recommended for synchronizing external hardware with music being played on the iPhone? Would such an implementation be acceptable under App Store Review Guidelines? I am trying to determine whether real-time beat detection from audio played by other apps is technically and policy-wise supported on iOS. Thank you.

Media Technologies Audio Audio ReplayKit AVFoundation iOS

214

import AVFoundation

var player: AVAudioPlayer?

func playBackgroundAudio() {
    // Configure and activate the shared session for playback.
    do {
        try AVAudioSession.sharedInstance().setCategory(.playback, mode: .default)
        try AVAudioSession.sharedInstance().setActive(true)
    } catch {
        print("Audio session setup failed: \(error)")
    }

    // Load the bundled file and loop it.
    if let url = Bundle.main.url(forResource: "background_music", withExtension: "mp3") {
        do {
            player = try AVAudioPlayer(contentsOf: url)
            player?.numberOfLoops = -1  // negative value loops indefinitely
            player?.play()
        } catch {
            print("Error playing audio: \(error)")
        }
    }
}

playBackgroundAudio()

Media Technologies General Swift Packages

516

Push Notification sounds with AVAudioSession, AVAudioEngine

I am using AVAudioSession, AVAudioEngine and SpeechAnalyzer to listen for commands, including when the phone is locked. At the same time, I can receive push notifications with a predefined sound. However, the predefined sound is not played when the AVAudioEngine is running and the phone is locked. With the code below I ran many experiments, all of the form "receive a push notification while the phone is locked", with the following results: If audioEngine has started, I only see the alert, with no sound. If I comment out audioEngine.start, everything works as expected and I hear the APNs sound on the speaker. If I change the AVAudioSession category to 'record', I don't receive the push message at all! I wonder if anyone has seen this. Here is my code:

private func doStartListening() async {
    print("SpeechService: doStartListening called")
    guard !audioEngine.isRunning else {
        print("SpeechService: Audio engine already running")
        return
    }
    do {
        try configureAudioSession()
        let recordingFormat = audioEngine.inputNode.outputFormat(forBus: 0)
        audioEngine.inputNode.removeTap(onBus: 0)
        guard let locale = await SpeechTranscriber.supportedLocale(equivalentTo: Locale(identifier: "en-US")) else {
            print("English is not supported on this device")
            return
        }
        let transcriber = SpeechTranscriber(locale: locale, preset: .transcription)
        if let installationRequest = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
            try await installationRequest.downloadAndInstall()
        }
        let (inputSequence, inputBuilder) = AsyncStream.makeStream(of: AnalyzerInput.self)
        let audioFormat = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
        let analyzer = SpeechAnalyzer(modules: [transcriber]) // Initialize the modern SpeechAnalyzer
        self.analyzer = analyzer
        task = Task {
            print("SpeechService: Starting analyzer results loop")
            do {
                for try await result in transcriber.results {
                    if Task.isCancelled { break }
                    self.handleAnalyzerResult(result)
                }
            } catch {
                print("SpeechService: Analyzer error: \(error.localizedDescription)")
                let nsError = error as NSError
                if nsError.domain == "kAFAssistantErrorDomain" && nsError.code == 203 {
                    self.addLog(NSLocalizedString("error_siri_disabled", comment: ""))
                    Task { await self.stopListening() }
                } else if self.isListening {
                    self.restartRecognition()
                }
            }
        }
        audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] buffer, _ in
            // Bail out instead of force-unwrapping if self has been released.
            guard let self, let audioFormat else { return }
            do {
                let converted = try self.converter.convertBuffer(buffer, to: audioFormat)
                inputBuilder.yield(AnalyzerInput(buffer: converted))
            } catch {
                print("Exception when converting audio")
            }
        }
        audioEngine.prepare()
        try audioEngine.start()
        print("SpeechService: Audio engine started")
        try await analyzer.start(inputSequence: inputSequence)
        isListening = true
        addLog(NSLocalizedString("waiting_wakeup", comment: ""))
    } catch {
        print("SpeechService: Error starting listening: \(error.localizedDescription)")
        addLog("Error starting listening: \(error.localizedDescription)")
        lastError = error.localizedDescription
        isListening = false
    }
}

private func configureAudioSession() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord, mode: .default, options: [.mixWithOthers, .defaultToSpeaker])
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
}

Media Technologies Audio APNS Media Player AVAudioSession AVAudioEngine

561

Entitlement "com.apple.developer.carplay-driving-task" not allowing audio playback for voice controlled interaction

According to https://developer.apple.com/download/files/CarPlay-Developer-Guide.pdf, apps with the entitlement com.apple.developer.carplay-driving-task are allowed to use voice control. In my current implementation the voice recording is working fine, but the voice response (an AVPlayer with the category set to "playback") does not output any audio. I suspect that it is an entitlement limitation, because if I quickly tap to play some music while the voice assistant AVPlayer is "playing", I can hear the response; without this trick it keeps playing but stays mute. In parallel I have now requested the com.apple.developer.carplay-voice-based-conversation entitlement, but I don't even know whether, once approved, I will be able to use two entitlements for the same CarPlay app. Long story short: 1 - Should an app be able to play audio responses when its CarPlay entitlement is com.apple.developer.carplay-driving-task? 2 - If not, can I combine the entitlements com.apple.developer.carplay-driving-task and com.apple.developer.carplay-voice-based-conversation?

Media Technologies Audio CarPlay Audio Siri and Voice

977

macOS 26 – NSSound/CoreAudio causes SIGILL crash in caulk allocator

Hi everyone,

We are the engineering team behind an enterprise communications application for macOS. We are experiencing a critical crash on macOS 26 that did not occur on any previous macOS version. We are seeking clarification from Apple engineers or anyone who may have insight into this behaviour.

Environment

Architecture: x86_64
macOS: 26.4.1 (25E253)
Hardware: Mac15,13 (MacBook Pro)
Exception: SIGILL / ILL_ILLOPC
Crashed Thread: Thread 0 (Main Thread)
Trigger: Playing a notification sound via NSSound during an incoming call

Crash Stack

0 caulk consolidating_free_map::maybe_create_free_node + 119 ← SIGILL
1 caulk tiered_allocator + 1469
2 caulk exported_resource::do_allocate + 15
3 AudioToolboxCore EABLImpl::create + 204
4 CoreAudio AUNotQuiteSoSimpleTimeFactory + 33267
8 AudioToolboxCore AudioUnitInitialize + 189
9 AudioToolbox XAudioUnit::Initialize + 19
10 AudioToolbox MESubmixGraph::initialize + 125
11 AudioToolbox MESubmixGraph::connectInputChannel + 1172
12 AudioToolbox MEDeviceStreamClient::AddRunningClient + 509
15 AudioToolbox AudioQueueObject::StartRunning + 194
16 AudioToolbox AudioQueueObject::Start + 1447
22 AudioToolbox AQ::API::V2Impl::AudioQueueStartWithFlags + 805
23 AVFAudio AVAudioPlayerCpp::playQueue + 354
24 AVFAudio AVAudioPlayerCpp::DoAction + 134
25 AVFAudio -[AVAudioPlayer play] + 26
26 AppKit -[NSSound play] + 100
27 Our App -[AudioHelper tryToStartSound:ofType:] + 569
28 Our App block_invoke + 59

Behaviour Difference Between macOS Versions

The exact same code path that triggers this crash on macOS 26 works without any issue on macOS 14 and macOS 15: no crash, no warning, no log output of any kind. The crash occurs inside Apple's private caulk memory allocator during CoreAudio audio engine initialisation, triggered by a call to [NSSound play]. The SIGILL / ILL_ILLOPC at maybe_create_free_node + 119 suggests a hard ud2 trap, that is, an intentional abort guard inserted at compile time. This strongly suggests that something changed in macOS 26 within NSSound / CoreAudio / caulk that causes this code path to fail in a way it previously did not.

Questions

We have the following specific questions:

Was there a deliberate threading policy change in NSSound / CoreAudio in macOS 26?
Is the SIGILL in caulk::consolidating_free_map::maybe_create_free_node an intentional thread-affinity assertion introduced in macOS 26?
Are there any other NSSound / AVAudioPlayer / AudioQueue APIs that have similarly tightened their requirements in macOS 26 that we should be aware of?
Is there a migration guide, release note, or WWDC session that covers CoreAudio changes in macOS 26 that we may have missed?
Has anyone else in the developer community encountered a similar SIGILL crash in caulk on macOS 26 during audio playback?

Media Technologies Audio AudioToolbox Audio Sound and Haptics AVFoundation

2.7k

AVSpeechSynthesisVoice.speechVoices() - different behavior on Mac (Designed for iPhone) and iOS and MANY errors checking .audioFileSettings properties.

We recently started working on getting an iOS app to work on Macs with Apple Silicon as a "Designed for iPhone" app and are having issues with speech synthesis. Specifically, voices returned by AVSpeechSynthesisVoice.speechVoices() do not all work on the Mac. When we build an utterance and attempt to speak, the synthesizer falls back on a default voice and says some very odd text about voice parameters (that is not in the utterance speech text) before it says the intended speech. Here is some sample code to set up the utterance and speak:

func speak(_ text: String, _ settings: AppSettings) {
    let utterance = AVSpeechUtterance(string: text)
    if let voice = AVSpeechSynthesisVoice(identifier: settings.selectedVoiceIdentifier) {
        utterance.voice = voice
        print("speak: voice assigned \(voice.audioFileSettings)")
    } else {
        print("speak: voice error")
    }
    utterance.rate = settings.speechRate
    utterance.pitchMultiplier = settings.speechPitch
    do {
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.playback, mode: .default, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        self.synthesizer.speak(utterance)
        return
    } catch let error {
        print("speak: Error setting up AVAudioSession: \(error.localizedDescription)")
    }
}

When running the app on the Mac, this is the kind of error we get with "com.apple.eloquence.en-US.Rocko" as the selectedVoiceIdentifier:

speak: voice assigned [:]
2023-05-29 18:00:14.245513-0700 A.I.[9244:240554] [aqme] AQMEIO_HAL.cpp:742 kAudioDevicePropertyMute returned err 2003332927
2023-05-29 18:00:14.410477-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.412837-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.413774-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.414661-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.415544-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.416384-0700 A.I.[9244:240554] Could not retrieve voice [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null)
2023-05-29 18:00:14.416804-0700 A.I.[9244:240554] [AXTTSCommon] Audio Unit failed to start after 5 attempts.
2023-05-29 18:00:14.416974-0700 A.I.[9244:240554] [AXTTSCommon] VoiceProvider: Could not start synthesis for request SSML Length: 140, Voice: [AVSpeechSynthesisProviderVoice 0x6000033794f0] Name: Rocko, Identifier: com.apple.eloquence.en-US.Rocko, Supported Languages ( "en-US" ), Age: 0, Gender: 0, Size: 0, Version: (null), converted from tts request [TTSSpeechRequest 0x600002c29590] <speak><voice name="com.apple.eloquence.en-US.Rocko">How much wood would a woodchuck chuck if a wood chuck could chuck wood?</voice></speak> language: en-US footprint: premium rate: 0.500000 pitch: 1.000000 volume: 1.000000
2023-05-29 18:00:14.428421-0700 A.I.[9244:240360] [VOTSpeech] Failed to speak request with error: Error Domain=TTSErrorDomain Code=-4010 "(null)". Attempting to speak again with fallback identifier: com.apple.voice.compact.en-US.Samantha

When we run AVSpeechSynthesisVoice.speechVoices(), "com.apple.eloquence.en-US.Rocko" is absolutely in the list but fails to speak properly. Notice that the line:

print("speak: voice assigned \(voice.audioFileSettings)")

shows:

speak: voice assigned [:]

The .audioFileSettings being empty seems to be a common factor for the voices that do not work properly on the Mac. For voices that do work, we see this kind of output and values in the .audioFileSettings:

speak: voice assigned ["AVFormatIDKey": 1819304813, "AVLinearPCMBitDepthKey": 16, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMIsNonInterleaved": 0, "AVNumberOfChannelsKey": 1]

So we added a function to check the .audioFileSettings for each voice returned by AVSpeechSynthesisVoice.speechVoices():

// The voices are set in init():
var voices = AVSpeechSynthesisVoice.speechVoices()
...
func checkVoices() {
    DispatchQueue.global().async { [weak self] in
        guard let self = self else { return }
        let checkedVoices = self.voices.map { ($0.0, $0.0.audioFileSettings.count) }
        DispatchQueue.main.async {
            self.voices = checkedVoices
        }
    }
}

That looks simple enough, and it does work to identify which voices have no data in their .audioFileSettings. But we have to run it asynchronously because on a real iPhone device it takes more than 9 seconds and produces a tremendous amount of error spew in the console.

2023-06-02 10:56:59.805910-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:56:59.971435-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:57:00.122976-0700 A.I.[17186:910118] [catalog] Query for com.apple.MobileAsset.VoiceServices.VoiceResources failed: 2
2023-06-02 10:57:00.144430-0700 A.I.[17186:910116] [AXTTSCommon] MauiVocalizer: 11006 (Can't compile rule): regularExpression=\Oviedo(?=, (\x1b\\pause=\d+\\)?Florida)\b, message=unrecognized character follows \, characterPosition=1
2023-06-02 10:57:00.147993-0700 A.I.[17186:910116] [AXTTSCommon] MauiVocalizer: 16038 (Resource load failed): component=ttt/re, uri=, contentType=application/x-vocalizer-rettt+text, lhError=88602000
2023-06-02 10:57:00.148036-0700 A.I.[17186:910116] [AXTTSCommon] Error loading rules: 2147483648
...

This goes on and on and on ... There must be a better way?

Media Technologies General iOS Speech Apple Silicon

3.7k

May ’26

Android Music SDK published to maven

Hi, I'm an Android Developer at Radio France, and we're currently integrating Apple Music into our Android application. We noticed that the Android SDK artifacts are currently distributed as raw .aar files, such as: mediaplayback-release-1.1.1.aar musickitauth-release-1.1.2.aar For Android projects, publishing these libraries through a Maven repository would greatly simplify integration and maintenance. It would provide a cleaner setup for dependency management, versioning, and future updates through Gradle. A Maven distribution model such as: implementation("com.apple.music:mediaplayback:1.1.1") implementation("com.apple.music:musickitauth:1.1.2") would make adoption significantly easier for Android teams. Thanks for your work on the SDK and for considering this improvement.

Media Technologies General MusicKit

639

May ’26

iOS Camera app code

What's the underlying code for the iOS Camera app? For example, how does it assign metadata and filenames? Is it Java or something else? Can another app be created that uses the same basic code with extra features added?

Media Technologies Photos & Camera

640

May ’26

.sf3 sound font

...when will they be supported natively?

Media Technologies Audio

Replies: 1
Boosts: 0
Views: 44
Activity: 2w

AVPlayerNode + AVAudioEngine

Media Technologies Audio

Replies: 1
Boosts: 0
Views: 77
Activity: 2w

kAudioUnitProperty_ParameterList vs AUParameterTree

Media Technologies Audio

Replies: 1
Boosts: 0
Views: 55
Activity: 2w

Are there plans to evolve the AUv3 API?

Media Technologies Audio

Replies: 1
Boosts: 0
Views: 84
Activity: 2w

a problem

Media Technologies Audio

Replies: 0
Boosts: 0
Views: 56
Activity: 2w

Native diarization in '27?

Media Technologies Audio

Replies: 0
Boosts: 0
Views: 32
Activity: 2w

How to Fix the Emotionless and Cold Tone of Machine-Read Text?

Media Technologies Audio

Replies: 0
Boosts: 0
Views: 54
Activity: 2w

Cannot generate 2048-bit FairPlay Streaming certificate

Media Technologies Streaming FairPlay Streaming

Replies: 6
Boosts: 0
Views: 1.4k
Activity: 2w
