Enhancing VoIP Apps with CallKit - WWDC 2016

App Frameworks • iOS • 35:53

CallKit is a new framework that lets your VoIP app integrate tightly with the native Phone UI. Learn how you can have your incoming calls displayed fully on the lock screen. Get details on how people can choose to use your app when making calls from the native Phone app's contacts, favorites, and recents. See how adopting CallKit lets your app coexist seamlessly with other active calls, and allow your calls to interact with CarPlay and Bluetooth accessories.

Speakers: Sirisha Pillalamarri, Stuart Montgomery, Nick Fraioli

Open in Apple Developer site

Downloads from Apple

HD Video (1.11 GB)
SD Video (287.1 MB)
PDF Slides (1.9 MB)

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Hello everyone. Welcome to Session 230. I'm Sirisha, an engineer on CallKit and I'm joined by my colleagues, Stewart and Nick. And we are excited to introduce a brand new framework today, CallKit. A lot of you here have created VoIP apps before. You have changed the face of telephony and have made the world smaller. And you want your app to be the primary way your users make and receive calls on iOS and we want to help. CallKit is a framework that's going to elevate your third party VoIP apps to a first party experience.

[ Applause ]

Thank you. So while a lot of you have created VoIP apps, for the next 40 minutes let's assume that you and I have created a brand new app called Speakerbox. So Speakerbox is a simple app that can make and receive voice calls on iOS as iOS exists today.

So before we take a look, let me setup the stage for you. So Jane has been traveling around Europe and her concerned parents are trying to reach her to make sure she's doing okay. But because she's international they're using Speakerbox to make calls to her. So let's see how an incoming call to Jane looks like today.

So this is Jane's locked screen. And at first, she gets an iMessage from Dad. And before she could even respond, she gets an incoming call from mom. Jane can't discern the difference between this incoming iMessage notification and the incoming phone call notification because that's just what VoIP calls are today. Just a notification. And if Jane wants to actually answer this call, she has to slide on Speakerbox, type in password, and then that gets her taken to the app and then she can begin speaking.

And this is Jane's unlocked screen and the experience is just as bad. She gets an incoming call from Mom. Did you miss that notification by any chance? That's right. It was just a banner from the top. So wouldn't it be nice if instead the incoming call to Jane looked more like this? This is Jane's locked screen still. And she gets an incoming call from Mom. Notice the full screen Native UI.

[ Applause ]

And Jane can just slide to answer and start talking to her mom. And from the unlock screen, it's the same, rich, Native UI with answer and decline buttons and your custom ringtone playing.

[ Applause ]

And wouldn't it be nice if VoIP calls could interplay with other calls on the systems? Perhaps at the telephony call or a FaceTime audio call or maybe another VoIP call? And even have VoIP calls get started from recents, favorites, and even get assigned to contacts. And get started from Siri, Bluetooth, and even get do not disturb and blocked functionality. That is CallKit.

[ Applause ]

All right. So today we're going to go over the architecture of CallKit and incoming call flow, and outgoing call flow, and then end with some more details about the API. So let's get started. All right. So over here, we have all our system services like Bluetooth, Siri, CarPlay, and then our Native UI. And over here, we have all our VoIP apps like Speakerbox. These are two separate entities right now. Calls made on Speakerbox are non known to our system and then our services.

In iOS 10, we have adopted CallKit in all our system services. So now calls made on Bluetooth are known to our system UI via CallKit. So if Speakerbox wants a similar experience, it needs to adopt CallKit. So now calls made on Speakerbox will get known to our system via CallKit and then system can publish these calls to the rest of our services. So let's talk a little bit more about Speakerbox.

So here we have Speakerbox and all of its code. It talks to its network and has its own app UI and we're going to link CallKit. So there are two primary classes in CallKit that we care about. The first is the CXProvider. So the provider is the class that Speakerbox will use to let the system about any out of band notifications that have happened. The second class is the CXCallController. The CXCallController is the class that Speakerbox will use to let the system know about local user actions.

So let's take a deeper look at these two classes starting with the provider. So the provider, like I said earlier, is the class that we use to let the system know about out of band notifications. That is these are not user actions but actually external events like, for example, an incoming call coming to Speakerbox. Contrasts this with the CXCallController. The CXCallController is the class that Speakerbox will use to let the system know about requests from within the app itself. That is these are actually user actions and like internal events, like a start call action.

By using the CallController, Speakerbox gets to interplay with other calls on the system. Say, for example, there's already an active telephony call and the user wants to start a Speakerbox call from Speakerbox's UI. By using the controller, the system gets to know about the start call action and then the system can tell the telephony provider to hold its call so as to let the Speakerbox start its call.

So let's take a look at some examples of this. So the provider is used to report out of band notifications. Like, for example, an incoming call coming to Speakerbox or maybe an outgoing call getting connected. Or that outgoing call ending on the remote side. Whereas the controller is used to request actions from the system like the user wanting to start an outgoing call or the user wanting to answer the call from Speakerbox or maybe ending the call from within Speakerbox.

So when the provider wants to communicate to the system it uses the CSXCallUpdate class. And when the system wants to let Speakerbox about any user interactions, it uses the CXAction classes to let Speakerbox know. And the controller communicates to the system user actions bundled up into a CSTransaction to let the system to know about these user actions.

So that was a lot of information. So let's take a look all of that in an incoming call flow. So here we have Speakerbox and Jane gets that incoming call from Mom. The incoming call comes to Speakerbox and then Speakerbox creates a CXCallUpdate and using the provider sends that to the system. And then the system can then publish that incoming call to all our services including our UI.

And if Jane wants to answer the call from within our UI, the answer action comes to our system. Our system can then tell Speakerbox, we have the provider to the CXAnswerCallAction and then Speakerbox can answer the call however it needs. And if Jane now wants to end that call from within the app's own UI, the end action comes to the controller.

The controller bundles it up into a CXTransaction and that comes to the system and if everything looks okay, the system sends that back up to the Speakerbox via the provider and then Speakerbox can end the call as necessary. All right. Now let's take a look all of that in a demo with Stewart.

[ Applause ]

Thanks, Sirisha. So now that you've heard about the benefits of CallKit, I'd like to show you how to adopt it in an existing VoIP application, the Speakerbox app which Sirisha mentioned. I'll first show you how to adopt it to handle an incoming call. So I'll first open up the Speakerbox Xscript Project.

Now before I dive into adopting CallKit in this app, let me just show you a little bit about how the app is structured so you have a frame of reference. So we have two main classes in the app right now. The SpeakerboxCallManager class which maintains a list of calls in the app and has certain operations such as starting a call and ending a call.

And our other primary class is SpeakerboxCall. This is our model class which represents a single call in the app and has metadata about it as well as some call back blocks so we can be notified about the life cycle events of the call as it progresses. So like Sirisha mentioned, the first we need to do when adopting CallKit is to create a CXProvider and set its delegate. So I'll do that by creating a new file called ProviderDelegate.

So into this new file, I'll bring in some code that we've already written but let me walk you through what this is doing. So in our initializer we pass in a reference to the SpeakerboxCallManager class. This allows the ProviderDelegate to access the app's list of calls and references them by UUID which we'll show later.

Next, we have -- we create a CXProvider instance and we pass it -- something called a provider configuration which we see here. The provider configuration is something we'll go into more detail later in the session but this allows our app to configure to the system a few options about how it should behave.

Now back in our initializer, we set this class as the delegate of our provider and then if necessary, we request authorization to use the provider. So great. So now we have our provider and our delegate set but we need to create this in our app delegate. So I'll declare a property for our provider delegate and I'll instantiate that in the application did finishing launching with options method.

Cool. So now we have a provider in our app. How does the app respond to an incoming call? So if I scroll down, we can see that the app currently uses PushKit to learn about an incoming call via push notification. And if we look at what this code is doing, we can see that it looks at the dictionaryPayload from our push notification and gets some metadata about the incoming call such as UUID and handle which is an identifier representing who's calling.

And then we call the display incoming call method and we can see here that this is where the app hosts the local notification for -- to show the user that incoming call. But when using CallKit we no longer have to rely on a local notification to show this.

We can, instead, use the system's full screen native incoming call UI and we want to do instead because it's so much richer of an experience. So to do that, I'll go back to the ProviderDelegate and I'm going to create the helper method which will allow us to call the API on our provider.

I'll call that report IncomingCall and in this method, I'll start by creating a CXCallUpdate which contains metadata to represent that incoming call. And then we'll call the report NewIncomingCall method on our provider and this is the step that notifies the system about that incoming call. Now, in the completion block, we'll check if there is an error and if there wasn't, we create a SpeakerboxCall instance and configure it and we add that call to our app's list of calls. We'll go into more detail later in the session about why there could be an error here but suffice it to say, there's certain conditions in which the device will not be prepared to accept an incoming call.

Okay. So now that we have this helper method on our ProviderDelegate, I'll go back to my app delegate and just replace this code that posted the local notification with a call to our helper. So great. So now we're using CallKit to report an incoming call that we learned about from a push notification to the system and the system is showing the full screen Native incoming call UI. Well what happens when the user presses that green button and answers the incoming call? When that happens, our ProviderDelegate will receive another method that we need to implement.

And that is the ProviderPerformAnswerCallAction method. Let me walk through what this is doing. So we start by getting an instance of our Speakerbox call class corresponding to the UUID of the call that we're answering. Next, we call the answer Speakerbox call method and this is some code that was elsewhere app prior to now and this talks to our network to tell it to actually answer that call. And now we do it here in our ProviderDelegate call back.

And last, we call fulfill on our action. In CallKit, every action must either be fulfilled if it was successful or failed it there was an error in processing in that. And we can actually see a few lines above here, if we were unable to find a Speakerbox call for this UUID, we call the failed method on our action to indicate that to the system.

So this method handles answering the call but what about when the user wants to end the call? For that, we have a similar delegate method called ProviderPerformEndCallAction. And this is very similar. It looks up a call based on the UUID. It talks to our network using endSpeakerBoxCall method. It signals that that was successful to the system by calling fulfill. And it removes the call from the app's list of calls.

So we're almost done handling the incoming call but there's one other thing we need to consider when handling that incoming call. And that is our call's audio. So when using CallKit, you will no longer activate your app's audio session directly. Instead you will only configure the audio session and the system will actually activate your app's audio session for you at an elevated priority. So let me show you how that works.

Back in our PerformAnswerCallAction method, I'll insert a call to this function configure audio session. And this does like it says, it configures the app's audio session but does not activate it. Instead our audio session will be activated by the system and after that happens, we'll receive a delegate call back called Provider didActivate audioSession. And this is the point where we begin processing our call's audio.

Now the last step is to stop processing our call's audio in the PerformEndCallAction method. Okay. So that's all the code we need to handle an incoming call using CallKit. And now I've got on my device setup to mirror to the screen and let's build and run the app on the device.

So for the purpose of this demo, I'll just simulate an incoming call using this button at the bottom. And now when I press this, we'll see our call using Speakerbox presented using the full screen Native incoming call UI. And I can just accept that call and our ProviderDelegate receives the PerformAnswerCallAction method. It fulfills that and then finally, when I'm done talking to Jane I can just end the call and our app, ProviderDelegate, fulfills that as well. So that's a demo of handling an incoming call using CallKit. And I'll now pass it back to Sirisha. Thanks.

[ Applause ]

Thank you, Stewart. So let's take a look at what Stewart just showed us. So first, we reported incoming calls to the system via the report new incoming call API. Then we handled an answer call action by implementing the delegate method perform action answer call action. Then once we've actually answered the call, we fulfilled that action by calling the fulfill API. So CallKit can do more than just answer calls. Here a list of all our other actions that we support. As you can see, there's holds, group, play DTMF, and many more.

So now let's spend a few seconds talking about multiple calls. So say Speakerbox can handle more than one call. Right here, in this example, there's already an active Speakerbox call going on and then there's an incoming call waiting call. Now if the user wants to end the active call and the answer incoming waiting call from within the Native UI, the system will send Speakerbox a CXTransaction.

A CXTransaction is nothing more than just a list of one or more actions. In this case, it's a list of end and answer actions. And once Speakerbox has handled each of these and performed each of these actions, it needs to fulfill each of them individually so that the system knows to transition the UI. All right. Now I'll hand this off to Nick to take us through an outgoing call flow.

[ Applause ]

Thanks, Sirisha. So let's check back in with Jane. She talked to her mom yesterday but today she's feeling a little homesick and would like to check in with the home front. So let's see what it takes to make an outgoing call. So the first thing that happens when Jane goes into recents and taps to call her mom back is our app is going to get launched with a start call intent.

Now some of you may have already seen the introducing SiriKit session where we introduced intents but if you'd like to find out more information, you can always watch the video online. In a nutshell though, an intent is an object that represents a desired user action, is wrapped up in an NSUser activity, and then passed back to our app.

So our app has received the start call intent and now we've constructed a start call action based on the information on that intent. We'll take that action and request it via the CallController. The CallController will then pass that action through to the system and if it's accepted, it will come back to our app via the ProviderDelegate.

Then finally our app can take that action and use the necessary commands on our network to make that outgoing call. So now let's take a look at the life cycle of the outgoing call from this point. So we've just begun performing the start call action. So the call is in a starting state.

From here, we'll finish executing the action and fulfill the action to move the call to a started state. Then when the remote side answers the call, we'll notify the provider that the call has started connecting. And then finally, we'll tell the provider that the call has connected to inform the system that the two parties can begin talking to each other. So now I'd like to bring Stewart back us -- back up to give us another demo.

[ Applause ]

Thanks, Nick. So now I'd like to give you part two of our demo of adopting CallKit and Speakerbox. This time how to use it to handle an outgoing call. I'll open up the Speakerbox Xscript project again. Now back in our app delegate class, we can see that Speakerbox already handles being launched with a URL to start a new call.

But when using CallKit, the process of dialing an outgoing call is similar but when the user starts a call from places like the phone apps, recents tab, or a contact card or from Siri, the app will be launched with an intent and that will be given to us via an NSUser activity. So the first step in using CallKit here is to implement the application continue user activity method.

And taking a look at what this is doing, we look at our NSUser activity and we get the value of the startCallHandle property. This is some code we've already written to look at the NSUser activity, get the intent, and return the handle which is a string which represents the person we want to call. Now once we have our handle, the process of starting a new call is identical to the URL handler above. We just call the start call method on our call manager. So now let's look at what this method is doing.

So we can see in the SpeakerboxCallManager class that we start a call by creating a new instance of our model class, Speakerbox call, and then we call the startSpeakerBoxCall method which talks to our network and actually starts that call. And finally we add this call to our list of calls.

But now this is not yet using CallKit to notify the system about our intention to start a new call. And we need to do that. So I'm going to remove this code for now and I'm going to add parts of it back later. The first step of adopting CallKit in this class is to import the framework.

And then I need that second class which Sirisha mentioned, the CXCallController class. And now that I have that, in our start call method, I need to create a startCallAction and configure it with the handle that we want to dial. Then I create a CXTransaction containing that action and finally I call request transaction on our callController to request that from the system. Now just to reiterate a point that Sirisha mentioned earlier, you maybe wondering why we need to request this transaction from the system when it seems like everything that's happening here is happening within our own app.

And the reason for that is that when you attempt to begin an outgoing call, there may already be a call elsewhere on the system. For instance, if the user is in a phone or a FaceTime call, or even a call from another VoIP app. If that happens, the system needs to hold that call before your call can begin. So that's why we need to request action from the system to let it know about those intentions.

So now, once the system receives and improves our start call action, it's going to send that back to our app via our ProviderDelegate. So I need to implement another method on our ProviderDelegate class. This time it's the provider perform StartCallAction method. And let's fill this out together. So we start same as we did before of creating a Speakerbox call model instance and configuring it with the handle which we're going to dial.

Then we configure our audio session just like we did when answering a call previously and next, we need to configure a few properties on our call. And there's a lot going on in this one so let's walk through it. We set two call back blocks on our call. The hasStartedConnectingDidChange and hasConnctedDidChange. These are asynchronous call back blocks which will be invoked when the call progresses to connected and finally to connecting and then to connected.

And in these call back blocks, we report to the system the progress of the call. And it allows the system to be aware of that and reflect it in the UI. So with that set, we can now call the startSpeakerboxCall method on our call. And this again, talks to our network and it starts it outgoing. We fulfill the action to indicate success to the system and add the call to the callManager's list.

Okay. So this handles starting the outgoing call but what about when the user wants to end that call and this time from within our own -- or the app's own UI? So for that, we need to go back to our Speakerbox callManager class and look at the end method. And we can see here that just like the start call method before, this is not yet using CallKit. So I need to replace it with code that does.

I'll just drag this in and we can see that this creates an end to call action and it wraps it in a transaction and then requests that transaction from the callController. But this time, we don't need to do anything further back in our ProviderDelegate because as you can see, we already implemented this earlier in the demo. So that's all the code that we need to handle an outgoing call and I'll now build and run the app on the device. And give you another demo.

So I've built and updated the app on the device but to show you an outgoing call, I'd actually like to go back to the phone app under a contact card. And we can now see the Speakerbox app listed right on the contact card and I just can tap this to launch our app. And when we do, our app will be launched.

It will receive that intent and it will start a call using the callController. It will request a transaction which will be approved by the system and fed to our ProviderDelegate which will then fulfill that action. And here we go. We just saw that happen. The call is now active.

So now once the call is active, if I home out, we can see something new. For the first time, the green double height status bar is shown for our app. Previously this was reserved for the native phone and FaceTime calls only but if I tap this, we'll be sent right back to Speakerbox.

[ Applause ]

Thank you. Then when I'm done talking, I can end the call and this will request an end call action from the system which our ProviderDelegate will fulfill. So that's all demo for handling an outgoing call using CallKit. I'll pass it back to Nick to recap and go over a few other API details. Thanks.

[ Applause ]

Thanks, Stewart. So first let's take a quick look at what we just saw. The first thing that happened is Speakerbox received a start call intent, created a start call action based on that intent, and then requested that start call action. Then the start call action was received via the ProviderDelegate, executed, and then fulfilled. And then finally, Speakerbox reported that the call moved to connecting and then finally connected.

So, now that we've seen a few basic flows let's dive into some of the details of the API to really round out our usage of CallKit. And in particular, we'll take a look the provider authorization and configuration to help customize our app in the native UI. We'll take a look at how to handle action errors and system restrictions and then finally, we'll take a look at how CallKit plays a role in our call's audio on our app.

So as with other APIs like contacts and core location, CallKit requires permission from the user to use. And because of this, one of the first things that your app should do when it launches is check its current authorization status. This may have changed since the last time your app was launched. If the user went into settings and maybe enabled or disabled your app.

Then from here, if you discover that your app's authorization status has not yet been determined, you should request authorization for your app. And what this does is it tells the system to display an alert to the user to request permission. And this is done on behalf of your app and because it's done on behalf of your app, you should always be sure to include an informative usage string in your app's info.plist. And lastly while your app is launched, you should always be sure to observe and listen to any authorization status changes that may happen so that you can always show the most up-to-date UI to your user.

So now let's talk about the provider configuration. So the provider configuration is a way for your app to customize its in call experience directly in the Native and Call UI. Some of the things that can be customized are your app's localized name to display for your calls. This also includes certain capabilities such as whether your app supports video calling.

And this even includes things like specifying your own masked image icon to show directly in one of the buttons of the end call UI. And when tapped, this will take the user directly to your app. Just a note though that this support for this app icon will be available in a future seed [phonetic].

So we've taken a look at what happens when things go smoothly when executing actions but what happens when we run into a problem? Well take a look at the outgoing example from before. We've just started performing the start call action. Let's say that in the process of performing that, we run into an error.

Maybe we don't have good connectivity with our network server, so we can't actually make the outgoing call. Well in this case, we should fail the start call action. And the reason this is important is because it informs a system that something went wrong. And then the system can, in turn, inform the user via things like call failure UI.

And along with these action errors are action time outs. So each action on the system has a particular time out associated with it. And these time outs are important because they assure that actions initiated by the user are performed in a performance and responsive manner. So because of this, your app should always be sure to perform those actions in a timely manner. However, if an action does time out, your app will be notified via an appropriate provider delegate method at which point it can react appropriately.

So based on the current state of the device, certain system restrictions may be in place. And we'll take a look at an incoming call as an example. One of the reasons why your calls -- your app's incoming call might be denied is, perhaps, the user has disabled your app and it's no longer authorized. Or maybe the remote caller for the incoming call is in the user's blocked list. Or perhaps the user has enabled do not disturb and doesn't want to see any incoming calls at the current time.

Well for all of these cases, you app will be notified via completion handlers on the API. For example, the reportNewIncomingCall API passes back an error in its completion handler. And as you can see here, our app checks the error that's returned. Sees that the error code is filtered by do not disturb and then handles that appropriately. So now let's take a look at audio with CallKit.

So with CallKit your app gets a lot of great benefits with its call's audio. And one of the biggest benefits it gets is that its audio session will be at a boosted priority on the system on par with phone and FaceTime calls. And what this means is other apps on the system will not be able to interrupt your app's call audio. And in addition to this, CallKit is aware of certain audio routing hints on this system which means it can decide where to route audio based on things like the user's current accessibility settings or currently connected Bluetooth devices.

Let's take a look at an incoming call flow as an example. We know that, at some point, during an incoming call our app will receive an answer call action and then fulfill that answer call action. Well after receiving the answer call action is a great time to configure our audio session. And that's because we know that the call will soon move to connected.

Then when we fulfill the answer call action, the system will automatically start an audio session for our app at a boosted priority and then let our app know that this has happened via the did activate audio session provider delegate call back. And this is an indication to our app that it is time to start media for the call. So that was just a quick look at some of the details of the API to help us complete our adoption of CallKit.

So now we invite you to adopt CallKit in your existing VoIP apps or to build a new VoIP app from the ground up using CallKit. With CallKit you'll integrate directly into the system's calling infrastructure and once you've adopted CallKit you'll maintain feature parity with native calling services. But most importantly, with CallKit your app will increase its visibility across the system whether it's with full screen, incoming alerts on the locked screen, appearances in recents, favorites, and contacts, or integration with Siri, CarPlay, and Bluetooth.

For more information you can check out our session's webpage at developer.apple.com where we'll also have the Speakerbox sample code that we've been referencing throughout this presentation. We've got a lot of great related sessions for you. So be sure to check out more information about Siri, intents, networking, and audio. Thank you all so much for joining us and we'll see you in the labs.

[ Applause ]