Stretching an audio file to a certain length can be useful for fixing lip-sync issues. It isn't immediately obvious how to do this using Swift. Here is a simple way to do it.
Mathijs Kadijk, Tom Lokhorst
[Update November 2024] The approach described below doesn't work when stretching very long audio by small amounts. See our follow-up post for a more solid solution.
TL;DR: Add your audio file to an AVMutableComposition and use its scaleTimeRange method to stretch the audio to the desired duration.
Recently we encountered faulty microphone hardware that doesn't provide enough audio samples during recording. This results in an audio file that is slightly shorter than the simultaneously recorded video file. When the audio and video are played back together, the audio slowly drifts out of sync.
Since every sample of audio delivered by the microphone misses a tiny bit of audio, it's possible to stretch out the file without audible distortion. This brings the audio back in sync with the other recorded sources.
To validate that stretching would indeed work, we first tried to fix a corrupt audio file using the following ffmpeg command:
ffmpeg -i input.m4a -filter:a "atempo=0.9998691805" -vn output.m4a
The atempo parameter is the audio tempo that will be applied to the output file. The above example slows down the audio slightly, which makes the output slightly longer than the input. You can calculate the tempo parameter using the following formula: atempo = current duration / target duration
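To make the direction of that factor concrete: atempo is a speed factor, so the output duration equals the input duration divided by atempo, and a value below 1 makes the audio longer. The factor used above, 0.9998691805, corresponds to audio that is about 0.47 seconds short for every hour of video. Here is a minimal Swift sketch of the calculation; the helper name atempoFactor and the example durations are made up for illustration.

```swift
import Foundation

/// Tempo factor for ffmpeg's `atempo` filter.
/// Values below 1.0 slow the audio down (longer output), values above 1.0 speed it up.
/// Hypothetical helper; both durations are in seconds.
func atempoFactor(currentDuration: Double, targetDuration: Double) -> Double {
    precondition(currentDuration > 0 && targetDuration > 0, "Durations must be positive")
    return currentDuration / targetDuration
}

// Example values (made up): the audio is roughly 0.47 s short of a one hour video.
let videoDuration = 3600.0
let audioDuration = 3599.529
let tempo = atempoFactor(currentDuration: audioDuration, targetDuration: videoDuration)

// Print the value in the form the ffmpeg filter argument expects
print(String(format: "atempo=%.10f", tempo))
```

The printed value can be pasted into the -filter:a argument of the ffmpeg command.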
This made the audio align perfectly with the video. To be able to correct this automatically, I wanted a solution written in Swift that we can embed in our application.
Apple platforms provide a comprehensive set of audio frameworks and technologies. This is very powerful, but it wasn't immediately clear to me which audiovisual framework can stretch an audio file quickly and easily.
After some research it turned out that AVMutableComposition is an easy to implement API that can be used to export audio at a different tempo. In combination with an AVAssetExportSession it can convert the audio file at a high speed in quite a compact method:
import Foundation
import AVFoundation

func scaleAudio(inputURL: URL, toDuration targetDuration: CMTime, outputURL: URL) async throws {
    // Load info from the input audio file (loading asset properties can throw)
    let inputAudioAsset = AVAsset(url: inputURL)
    let inputAudioDuration = try await inputAudioAsset.load(.duration)
    let inputAudioTimeRange = CMTimeRange(start: .zero, duration: inputAudioDuration)
    let inputAudioTracks = try await inputAudioAsset.loadTracks(withMediaType: .audio)
    guard let inputAudioTrack = inputAudioTracks.first else {
        fatalError("No audio track in input file.")
    }

    // Create a composition with the current audio added to it on a track
    let composition = AVMutableComposition()
    guard let audioTrack = composition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid) else {
        fatalError("Failed to add mutable audio track.")
    }
    try audioTrack.insertTimeRange(inputAudioTimeRange, of: inputAudioTrack, at: .zero)

    // Scale the whole composition to the target duration, this stretches the audio
    composition.scaleTimeRange(inputAudioTimeRange, toDuration: targetDuration)

    // Set up an export session that will write the composition to the given output URL
    let exportSession = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetAppleM4A)
    exportSession?.outputURL = outputURL
    exportSession?.outputFileType = .m4a

    // Do the actual export and check for completion
    await exportSession?.export()
    guard exportSession?.status == .completed else {
        fatalError("Export failed, check `exportSession.error` for details.")
    }
}
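In our case the target duration is simply the duration of the video that was recorded alongside the audio. The sketch below shows one way to obtain it; the helper names drift and stretchTarget are hypothetical and not part of our app, and the URLs in the usage comment are placeholders.

```swift
import Foundation
import AVFoundation

/// Seconds by which the audio falls short of the video (positive means the audio is too short).
func drift(audio: CMTime, video: CMTime) -> Double {
    CMTimeGetSeconds(CMTimeSubtract(video, audio))
}

/// Loads both durations, reports the drift, and returns the video duration as the stretch target.
func stretchTarget(audioURL: URL, videoURL: URL) async throws -> CMTime {
    let audioDuration = try await AVURLAsset(url: audioURL).load(.duration)
    let videoDuration = try await AVURLAsset(url: videoURL).load(.duration)
    print("Audio is \(drift(audio: audioDuration, video: videoDuration)) seconds shorter than the video")
    return videoDuration
}

// Intended use together with the scaleAudio function above (URLs are placeholders):
//
//   let target = try await stretchTarget(audioURL: audioURL, videoURL: videoURL)
//   try? FileManager.default.removeItem(at: outputURL) // remove a stale output file first
//   try await scaleAudio(inputURL: audioURL, toDuration: target, outputURL: outputURL)
```

Removing a stale output file first is a precaution: in our experience an AVAssetExportSession fails when a file already exists at the output URL.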
It is also possible to adjust the speed of audio with an AVAudioUnitVarispeed that you attach to an AVAudioEngine. This works great for realtime playback scenarios, but it's really not designed to convert "as fast as possible" and write to a file. For our use case, converting in real time was too slow, but AVAudioEngine might be a great choice if you want to play back the resulting audio immediately.
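For the realtime playback scenario, a minimal sketch could look like the following. This is not code from our app: the function names playbackRate and playStretched are hypothetical. Keep in mind that varispeed changes the pitch together with the speed, which should be negligible for a correction this small.

```swift
import Foundation
import AVFoundation

/// Playback rate for AVAudioUnitVarispeed: below 1.0 plays slower, so the audio lasts longer.
/// Hypothetical helper; both durations are in seconds.
func playbackRate(currentDuration: Double, targetDuration: Double) -> Float {
    Float(currentDuration / targetDuration)
}

/// Builds a player -> varispeed -> main mixer graph and starts playback.
/// The caller must keep the returned engine alive for as long as the audio should play.
func playStretched(fileURL: URL, rate: Float) throws -> AVAudioEngine {
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let varispeed = AVAudioUnitVarispeed()
    varispeed.rate = rate // varispeed changes pitch along with speed

    let file = try AVAudioFile(forReading: fileURL)

    // Attach the nodes to the engine and wire them up in order
    engine.attach(player)
    engine.attach(varispeed)
    engine.connect(player, to: varispeed, format: file.processingFormat)
    engine.connect(varispeed, to: engine.mainMixerNode, format: file.processingFormat)

    // Schedule the whole file and start playing right away
    player.scheduleFile(file, at: nil, completionHandler: nil)
    try engine.start()
    player.play()
    return engine
}
```

Because the rate is a Float, a factor like 0.9998691805 is only stored to about seven significant digits. That is another reason this route suits playback better than precise file conversion.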
It seems to be possible to do faster conversion using lower level audio APIs like AudioUnitRenderer, but that results in much more complex code than the above approach. This approach might be of interest when you want to mix audio from different sources and apply more complex effects.