Core OS • iOS, OS X • 43:39
Learn about new iOS and OS X privacy features and get details on new and updated APIs. Hear best practices for delivering great features and respecting your customers' privacy.
Speaker: Erik Neuenschwander
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
In terms of its behavior, well, it's backed up. We're going to maintain that in the backup set. If it gets restored back onto that same device, that whole table basically will come back down. So it persists across a restore from backup. However, we won't restore that identifier onto a different device.
So if someone backs up an iPhone and then purchases, say, a new iPhone and restores that backup set onto that new, different device, then these identifiers won't come along for the ride. This is a long way of saying they behave just like the UDID does today, in that the UDID changes across devices.
Similarly, the vendor identifier for your apps will change across devices. So it's the behavior that you've had with the UDID. So now we have two replacements: one that can be used for your individual application, a random number, the application identifier; and a second one which is shared across your applications. But this still doesn't cover all of the key uses of the UDID. So we have one more new API, and we call that the advertising identifier.
This, again, is new API. It also hangs out on UIDevice's current device, and you can call it with identifier for advertising. This identifier is unique to the device, and it's the same across all applications on the device.
It's used for advertising. And, in fact, we know it works well for advertising because iAd, that's Apple's ad network, has already converted over to using it. For every device running iOS 6 or later, iAd now uses the advertising identifier exclusively, and no longer uses the UDID as it once did for advertising purposes. If you're familiar with ads, things like frequency capping, conversion tracking, this identifier works for those purposes.
Because it's not a hardware identifier, it's something managed by software, that means the user can reset it by tapping Erase All Content and Settings, and that will create a new advertising identifier on that same device. And then lastly, again, to compare it to the other ones: this identifier is backed up, and if it's restored back onto the same device, it will persist that way.
But if you restore that backup onto a new device, it will be a new advertising identifier. So this one, like the vendor ID, behaves just like the UDID across devices. So for the replacements for the UDID, we have these three new APIs: the application identifier for single apps, the vendor identifier across all of your applications, and the advertising identifier for advertising. And they have these behaviors.
This is exactly what I said before; it's here so you can go back in the video: the scope, the lifetime, the fact that all of them are backed up, and how each behaves when a backup is restored onto a different device. So you've heard me say replacements a few times over the past ten minutes, and so this slide shouldn't be too much of a surprise: the future of the UDID is no UDID.
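To make the comparison concrete, here is a minimal sketch in Swift of reading the three replacements. It assumes you generate and persist the per-app identifier yourself (the UserDefaults key is hypothetical), and it shows the advertising identifier on the AdSupport framework's ASIdentifierManager, where it shipped in the final iOS 6 SDK, rather than on UIDevice as described from the stage.

```swift
import UIKit
import AdSupport   // home of the advertising identifier in the shipping iOS 6 SDK

func identifierExamples() {
    // 1. Application identifier: a random UUID you generate and persist yourself,
    //    scoped to this one app on this one device.
    let defaults = UserDefaults.standard
    let appIdentifier: String
    if let saved = defaults.string(forKey: "com.example.appIdentifier") {   // hypothetical key
        appIdentifier = saved
    } else {
        appIdentifier = UUID().uuidString
        defaults.set(appIdentifier, forKey: "com.example.appIdentifier")
    }

    // 2. Vendor identifier: shared by all of your apps on this device.
    let vendorID = UIDevice.current.identifierForVendor?.uuidString ?? "unknown"

    // 3. Advertising identifier: one per device, shared across all apps,
    //    intended only for advertising (frequency capping, conversion tracking).
    let adID = ASIdentifierManager.shared().advertisingIdentifier.uuidString

    print("app: \(appIdentifier), vendor: \(vendorID), advertising: \(adID)")
}
```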
There are these new replacement APIs available in iOS 6, and I encourage you to begin your transition now. For any new application you create, you should build it on top of these new APIs. And as you update your existing applications, move them over. And you can see that the UDID is at the top of the list of existing identifiers.
And that's because the UDID and similar identifiers will be disallowed for use in the future. And when we do that, it's not going to change anything about the legacy behavior of existing applications on users' devices. They'll still be able to call, for instance, the UDID and get back that same number, so it won't be disruptive. However, for those new applications and updates, use of the UDID and similar identifiers won't be allowed. So let that motivate you to start your transition now.
Let me talk next about data isolation. And I'll actually start by defining the term. Data isolation is just when the OS gets between your application and sensitive user data. It does this in a way which is largely transparent to your application: you just call existing API, and that will trigger a user consent dialog or alert. And you've seen these, actually, for Location Services. I have two examples up there. On the left, we have one from iOS where Camera is requesting access to location. On the right, you can see Safari on OS X Lion doing the same thing.
So with these alerts, you also saw that the user could make a yes or no choice. And if the user says Don't Allow, then the application doesn't get access to whatever data is covered by that permission dialog. On the other hand, the user can say yes, and it works.
And regardless of whether the user says yes or no to that permission dialog, both OSes offer a way for the user to review and change his or her decision afterward, a no to a yes or a yes to a no, either in iOS Settings or in OS X System Preferences.
[Transcript missing]
So let's start with the really dead simple way of how it works today. This is just grounding for the next one, right? If you want data today, in a non-data-isolation world, you call createX and you get back the thing with data, right? It's just as simple as that. You do it every day. So let's talk about how Contacts works on OS X Mountain Lion with data isolation.
It starts off the same: you call some Address Book API, say, and then what happens is that OS X will go and ask permission. That dialog box will come up requesting the user's permission. Let's say the user says, "Okay," so you get permission. Then the API that you called will return to you the object with all of that data.
So this covers the API that you can call. It's the purpose-specific API, which is a whole host of APIs inside the Address Book framework. Two examples might be the shared address book, or initializing an ABPerson or ABGroup. These are the kinds of API that will trigger that permission check.
And as you saw in that last slide, the call does block while the user makes that choice about permission. And this is a user action, right? It will take a relatively long time compared to running code. And so if your application is in a state where it needs to remain responsive, you should wrap the call to these APIs inside of a dispatch block.
But when it comes back, if you've been granted access, if the user said yes, it's just like it was before data isolation: you'll get back a populated object. It has all the data. If the user denies access to Contacts, then you'll get back a nil object. So this is something you're going to want to check for, whereas in the past it would have been unlikely that you'd get back a nil object in the same way.
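Here is a minimal sketch of that Mountain Lion flow, written in Swift for brevity (the shipping framework is Objective-C, and the bridged names here are an assumption): call the purpose-specific API off the main queue, since it blocks while the alert is up, and check for nil if the user denies access.

```swift
import Foundation
import AddressBook   // the OS X Address Book framework

func loadContacts() {
    DispatchQueue.global().async {
        // This call triggers the Contacts permission check the first time through.
        guard let addressBook = ABAddressBook.shared() else {
            // The user said Don't Allow: you get nil back, so degrade gracefully.
            return
        }
        let people = addressBook.people() ?? []
        print("Access granted; \(people.count) contacts available")
    }
}
```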
And then in addition to the purpose-specific API for the Address Book framework, any other APIs which specifically use Contacts data, so Sync Services, Spotlight if you're searching for a contact, AppleScript, these will also trigger those same permission dialogs. So it's both the purpose-specific APIs and other system services that explicitly access Contacts data.
So let's talk about sandboxing and how it relates. And they're really separate things; the sandbox check still applies. So for those of you that are submitting to the App Store and have sandboxed your application, you know that the address book is, by default, outside of the sandbox. And to bring it into the sandbox, you have to request an entitlement. And you should only use the entitlements that you need.
And so in this case for Contacts, that entitlement is com.apple.security.personal-information.addressbook. Descriptive long string. It's actually easier just to check inside the Xcode UI: there's a checkbox which says allow address book data access. You check that, and the exact same thing happens in your entitlements file.
So if you have that entitlement, then the sandbox check will pass, and you'll get to that permission check that I just went through. But the sandbox check occurs first. If your application is sandboxed and does not have the entitlement for address book data access, then the sandbox will fail that request even before the user is prompted for permission. So the two work together: you sandbox your application with the necessary entitlements, and then the user decides whether or not to give you access to the Contacts data. And that's how it works in OS X Mountain Lion.
So this is the same slide, but now let's take it from here and talk about iOS. And in iOS, you have the call to create the object, and then you actually get back an empty object immediately. And that's because, if you think about it, a lot of this data request happens at launch in some of your applications.
Maybe it doesn't need to, and we'll talk about that more later. But your application has a limited amount of time to launch and needs to remain responsive. And so that permission check is actually done separately by the OS. And then if you receive access to the data, then you'll receive a change notification stating that there's now data in this object.
So that initial access, it's synchronous in the sense that you call and it comes back. But the permission check is occurring outside of that. That initial object may be empty. So it's very important that for the classes that support it, you handle change notifications. And though I'm bringing it up here in the context of data isolation, the change notification is something you already should have been handling, right? Because in a world with iCloud and data changing from underneath your application, these change notifications were how you kept in sync already. But in this case, a change notification will be more commonly occurring for all of your applications that are accessing contacts data. And so it's important that you really pay attention to it now.
Because that story that I just told, we're finding that it might be a little bit confusing or difficult to handle that permission check out of band, we actually have a new API we'll be adding to iOS 6. That said, it's not in the seed that you have now, so don't go looking for it.
And I'm now going to describe something that we're working on. And so if, when you read about it in the release notes of a later seed, it doesn't look like this, well, you know, don't blame me. I'm trying to give you the early idea of it. But who knows? We may still tweak it. But the point of this API is that you can call it for each data class and get your individual application's status.
So you could say, okay, does my application have access to contacts? And you might think that's a yes or no question, but as we're designing it, this API will probably have four different return values. The first one is not determined. That means your application has never requested access to contacts, so we literally have no idea what the state is. It's simply indeterminate.
The next two are more straightforward. If you've requested permission and been given it, then you've been granted permission to the data class, and we'll return that. Or if you requested permission and the user said don't allow, then we'll say, no, you've been denied. But then there's that last one, restricted. We've also enhanced restrictions, kind of the parental controls or the things that enterprises can do with managed devices, to also cover each of these data classes and whether or not the user is allowed to even answer that permission question.
So in the case where, say, access to Contacts has actually been disallowed for you and that's done by restrictions, this means that your user may not even be able to make that choice to allow your application access to Contacts. And so we're proposing this fourth return value to allow you to tailor your user experience to the cases where your user may not have control and therefore not be able to give your application access to the data. So four return values: don't know; yes; no; and no, but the user may not be able to change it. From your application's perspective, denied and restricted will look the same in that you don't have access to that data class.
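Here is a sketch of how such a status check might be used, written against the function that ultimately shipped in iOS 6, ABAddressBookGetAuthorizationStatus; as the talk notes, the API was not in the current seed and its final shape was still subject to change, so treat the names as illustrative.

```swift
import AddressBook

// Check this app's Contacts access before deciding whether to prompt, proceed,
// or adjust the UI. Denied and restricted both mean "no data" for the app.
func contactsAccessDescription() -> String {
    let status = ABAddressBookGetAuthorizationStatus()
    if status == .notDetermined {
        return "never asked"        // the first create call will prompt the user
    } else if status == .authorized {
        return "granted"            // address book objects will contain data
    } else if status == .denied {
        return "denied by the user" // offer a way to explain or revisit later
    } else {
        return "restricted"         // parental controls or managed-device policy
    }
}
```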
So I talked about the classes that we're adding to iOS: it's contacts, calendars, reminders, and photos. Let me go through each of those now, starting on the iOS side with contacts. So in iOS, it's access to the ABAddressBookRef that's managed. That's where the permission check will happen. So you know this function, ABAddressBookCreate.
Well, we've actually deprecated it in iOS 6, and so we have a new function, ABAddressBookCreateWithOptions. The options parameter is reserved for future use, and that pointer to a CFError is something that can tell you if you've been denied access. So that'll actually be set to a value if your application is denied access to Contacts.
Now, of course, deprecated APIs, you should move off of them, and so you should move on to the new one, but I do want to be really clear that even if you call the deprecated API, the permission check occurs there as well. So permission happens no matter which one of these you call, but that second one gives you a little bit of additional information in the case where you've been denied access.
Initially, the first time your app runs, it will be in that not-determined state. And what this means concretely is that ABAddressBookCreateWithOptions will hand back to you, kind of in that immediate, synchronous way, an empty read-only object. What you should then do is register that address book instance you got with a callback by calling ABAddressBookRegisterExternalChangeCallback.
You pass in your callback function. And then when your callback gets called, you should call ABAddressBookRevert. And what that does is it takes the ABAddressBookRef that you got initially and makes sure that it's in sync, which in the case of a permissions check means that you go from an empty read-only object to one with, you know, 500 contacts suddenly in it, because it's as though the entire address book was just synced into your copy of it.
And so you'll definitely want to sync up the address book reference that you got initially, because that's how you'll really get access to the data after the user has said yes. Otherwise you're just holding on to a stale instance.
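Putting those steps together, here is a minimal sketch of the iOS flow just described: create the (possibly empty) address book, register an external change callback, and revert the reference when the callback fires. The Swift bridging of these C calls and the function name are assumptions, not sample code from the session.

```swift
import AddressBook

// The external change callback must be a C-style function: no captured state.
let contactsChanged: ABExternalChangeCallback = { addressBook, _, _ in
    // Re-sync the ABAddressBookRef we created earlier. After the user grants
    // access, this is where the empty read-only object picks up the real data.
    ABAddressBookRevert(addressBook)
}

func openAddressBook() {
    var error: Unmanaged<CFError>?
    guard let addressBook = ABAddressBookCreateWithOptions(nil, &error)?.takeRetainedValue() else {
        // Access was denied; `error` carries the reason. Fail gracefully.
        return
    }
    // Register for change notifications so we learn when data arrives.
    ABAddressBookRegisterExternalChangeCallback(addressBook, contactsChanged, nil)
    // At this point the object may still be empty while the permission alert
    // is on screen; don't treat "no contacts yet" as "no contacts".
}
```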
[Transcript missing]
If you change the checkbox that says whether or not you want the application to have access, then the user gets to choose whether to quit the application or not.
On iOS, inside Settings at the top level, we have a new top-level Privacy item there. You can see it with the hand icon. And if you tap into that, there are the five different data classes that I've been talking about. And then I show you Photos there on the right. And note that text at the top that's shown to users inside the Photos privacy area: that photos stored on your phone may contain other information, such as when and where the photo was taken.
And that's present there to remind the users, just as I reminded you, that photos, along with their metadata, are given to applications to which the user has given permission here. I also talked about how restrictions have been updated. So if you go into Restrictions inside of Settings on iOS, then there's a new Privacy section where those same data classes are shown.
And if you tap into one of them, you see at the top "allow changes" and "don't allow changes." And this is where, through parental controls or through device management in an enterprise, someone can choose both whether or not the individual applications should have access and then also whether or not the user of that device is allowed to change or respond to new access requests from new applications on the device.
So this is how restrictions plays into the data isolation. So let me now just broaden it and talk not about particular changes we've made to our platforms, but instead go through some comments on best practices that you can apply when you're doing data collection or doing things that may impact users' privacy.
And most of you are probably client-side developers, but a lot of what I'm saying also applies to business decisions and to server programmers as well; I think you'll see that as I go through the examples. I'm going to cover three areas: first transparency, then user control, and then data collection techniques that you can use to best respect privacy.
So starting with transparency, privacy policy or statement. These things are really good. It's important for your company to have one, and in fact, there's a new feature now in iTunes Connect that you can submit a link to your privacy policy or your privacy statement to Apple. And in a future release, an upcoming release of the App Store, the page for your application will display that link so that a user is able to view your privacy policy before ever even downloading or purchasing your application.
So this is a feature coming to the App Store, but you can start submitting your privacy statements now through iTunes Connect, and then they will become visible to users in that App Store update. So this is a very good thing to do. But even after explaining to the user up front how you're going to collect data, how you're going to use data, if possible, it's important to give the user opportunities to inspect precisely what your application is providing about them. So if you can provide UI inside your application or a method on the website for the user to inspect that, again, it just goes to open yourselves up, show more transparency, and have the users be better informed about the data collection.
Now, you can be as transparent as you want, but it's also important that you give users control. And so part of that is asking permission, and really asking permission with context. So I went through the purpose strings for the iOS and the OS X permissions dialogs. Those are part of it. It's also important, and this is more under your control, to ask when you need access to the data.
So it's a bad situation in many ways to just ask for everything up front when your application launches. Number one, it's not a great user experience. User just bought your application, launches it, gets a bunch of alerts. But from a performance perspective, if you're not needing access to the data right then, remember, we're only requesting permission when you're instantiating one of these objects to access the data. So if you don't need access to the data, that actually means you're instantiating objects you don't need to use right then, and that's slowing down your application launch. So it has a performance impact.
It's better if you ask just before you need it. Maybe the user taps a button that says, "Oh, I want to upload a photo or something," and then you say, "Oh, well, let me go get access to the photos database." A user's more likely to understand why your application needs access to that data, not only if there's a purpose string in the permissions dialog, but also if they have an idea that the action that they just took is the one that's causing your application to request that access. If they can connect these two actions in their mind, they're more likely to understand the context of your request.
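As a small illustration of asking just before you need it, the sketch below (with hypothetical names) triggers the Contacts request from the user's own action rather than at launch:

```swift
import UIKit
import AddressBook

// Hypothetical view controller: the permission-triggering call happens when the
// user taps Import, so the alert appears in an understandable context and the
// launch path never pays for an address book the app isn't using yet.
class ContactImportViewController: UIViewController {
    @IBAction func importContactsTapped(_ sender: Any) {
        var error: Unmanaged<CFError>?
        guard let addressBook = ABAddressBookCreateWithOptions(nil, &error)?.takeRetainedValue() else {
            // Denied (or restricted): explain why the import can't proceed.
            return
        }
        // Proceed with the import using `addressBook` (details omitted).
        _ = addressBook
    }
}
```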
And then lastly, even if you've given them that initial choice of yes or no, it's also important to give them control thereafter. Maybe they changed their mind, and it could go either way, right? They could have said no initially, done some additional research, decided that they really do want to give your application access, or they want to turn it off, maybe even temporarily for some reason. And so allowing the user to remake that choice over time is another way to increase their control over their privacy.
And I'm just going to keep hammering this home: fail gracefully, because your application should try to provide the best experience in light of the control that you give the user. So in terms of data collection techniques, I really believe it's true that all collection efforts reduce privacy. If you're getting data, that's impacting the user's privacy.
That doesn't mean that the data collection is necessarily bad or wrong or anything of that nature, but you do have to think hard about the negative that's coming from the data collection and make sure that that's outweighed by the positive in terms of how you're improving your application, providing a better experience to the user or a more tailored experience to the user so that the positives outweigh the negatives from the data collection. And as I mentioned, I think that's true both on the app and the server side.
And as you collect data from users, it's important to remember that holding on to rich data has risks. I think you guys read tech news. You've probably seen in the past couple weeks a number of passwords being leaked. We've seen other data breaches in the past. And if you have that data, then an attacker might be able to get access to that data.
And wouldn't it be just really annoying for you and sad for your company's brand if an attacker ferreted out data that you were holding on to that you didn't even need? So trying to reduce the data down to just what you need to provide your products or services reduces the risk to your company as well.
So let me go through some techniques that you can keep in mind. You're never going to use all of these. They're never each appropriate all the time, but six different ways that you can think about how to reduce the data that you're collecting while still satisfying your business or your engineering goals. So I'll go through each one of these in turn, starting with anonymization.
So I'm going to use log lines. Let's start with this first log line. And you can see it's something that could be in a server log. It could be in syslog on the device. It could just be something you're writing down and submitting to some back-end analytics of your application.
But in this case, it says error, and there's an illegal token, and then a whole path, right? And paths are very interesting or scary things because if something's in the user's home directory, well, the home directory is named with the user account name, and a lot of Mac users, that's exactly their name.
So here we've collected the fact that this user is John Appleseed. And in addition to that, we know a couple other interesting things out of this log line. We know that there's some Project Zanzibar, whatever that is, and that there's probably something going on in fiscal year 2013. All of this when we were really perhaps focused on this illegal token. So we could do better and not collect that whole path.
Maybe you do want to know the files that are causing illegal tokens. Okay, so we could log this instead. And this is clearly better because we're not collecting all this ancillary data that wasn't related to the actual document. But we still have a file name. That file name could be named Project Zanzibar fiscal 13, right? There's still a lot of information there.
And so if you're really maybe focused on when these illegal tokens occur, you might want to just step back and think, I want to know what the type of the file is. I don't care about the file name at all, but I do want to know that it was a keynote file that had this token. That maybe tells me where I need to go test more. And so this would even further reduce and make sure that there's no user-specific string there in the way of the path or of the file name that you're collecting, just the type.
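A sketch of that reduction, with a hypothetical logging call: record only the file type, never the path or file name.

```swift
import Foundation

// Anonymization: drop everything user-specific before the line is ever written.
func logIllegalToken(in fileURL: URL) {
    // Bad:  "ERROR: illegal token in \(fileURL.path)" -- leaks the user name,
    //        project name, and fiscal-year hints along the way.
    // Better: just the piece the engineering question needs.
    let fileType = fileURL.pathExtension.isEmpty ? "unknown" : fileURL.pathExtension
    NSLog("ERROR: illegal token in a .%@ file", fileType)
}
```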
Let's think about aggregation and now take that same log line but realize that it's registering a particular event. It might even have a timestamp associated with it, but it's a particular time and instance where that token was seen. And maybe this isn't important to you either. If we think back to the fact maybe all you cared about was that it was a keynote file, maybe what you really care about is the frequency at which these are occurring. So you might just log over the lifetime of your application or once a day or once an hour.
You might just write down the number of times you've seen an illegal token in each different file type. So here we have it occurring 21 times in a keynote file, three times in some other kind of file, and this would point you to the fact that the keynote files are a much more interesting area to investigate than this other file type.
And in this log line, you haven't logged anything precisely about a single event. It's just a record of a set of events that occurred over a period of time. So you're getting much less data about the usage pattern while still getting all the data about the occurrence of these illegal tokens. That's aggregation.
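A sketch of the aggregation idea, with a hypothetical in-memory counter flushed periodically instead of one log line per event:

```swift
import Foundation

// Aggregation: count events per file type and write out totals once a day
// (or once per app lifetime) rather than a time-stamped line per occurrence.
var illegalTokenCounts: [String: Int] = [:]

func recordIllegalToken(fileType: String) {
    illegalTokenCounts[fileType, default: 0] += 1
}

func flushIllegalTokenCounts() {
    for (type, count) in illegalTokenCounts {
        NSLog("%@", "illegal tokens in .\(type) files: \(count)")   // e.g. key: 21
    }
    illegalTokenCounts.removeAll()
}
```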
You can also think in terms of sampling, statistical sampling. So if we consider this again as a particular event, you can ask yourself, well, do I need each and every event from every single user who's opted into my collection? And maybe you don't. Maybe you could say, well, I could actually sample from just one in ten or one in 100 or one in 1,000 of my users' computers.
And if it's a statistically random sample, it will be representative of your population, and you'll get overall the same distribution and same shape of the events without actually gathering data from the vast majority of your users. So you've improved privacy. You know, 90% or more of your users aren't going to have any data collected from them.
But if you think about it a little harder, you're still collecting complete pictures of those individual users. So if you really only care about the events, you could go one step further and instead sample on the basis of an operation. So you just roll that random die each time you're parsing a file in this case, and if you get one in ten or one in 100 or 1,000 operations, then you choose to log it then. You're getting the same quantity of data, but you're no longer collecting a complete picture of any of your users. And so this is even better from a privacy perspective than sampling on a per-user or per-device basis.
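A sketch of per-operation sampling, with an assumed 1-in-100 rate: each parse rolls the die independently, so no single user contributes a complete record of their activity.

```swift
import Foundation

// Statistical sampling per operation rather than per user or per device.
let samplingDenominator = 100   // assumed rate: log roughly 1 in 100 operations

func maybeLogIllegalToken(fileType: String) {
    guard Int.random(in: 0..<samplingDenominator) == 0 else { return }
    NSLog("%@", "sampled illegal-token event in a .\(fileType) file")
}
```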
Let's take a different log line now, and I want to talk about derezzing, or de-resolving, the data. So here's a log line. Maybe you might see this one on a server or on a client: you know, some action succeeded at that particular time, and a precise number there, 22,341 bytes were sent. And this looks like a pretty anonymous log message.
It doesn't have that same sort of path information in it. But especially with that precise byte count, and let's say that this was a client message and there's a corresponding server message, between that time stamp and the bytes, you might be able to start to recombine those two logs that you thought were independent because you can see that at 3:03, 22,341 bytes were sent. And that would enable you to re-correlate it. It's a very precise sort of action.
And maybe you don't need this level of precision in either of those two dimensions. So you might instead say, well, we're really just doing this for load testing and, you know, planning for scale. So I want to know the hour of the day when this event occurred and I want to know, you know, to the kilobyte, what data was sent.
But now if it was 22,341 or 22,342, those would look the same in terms of the number of kilobytes because you've backed off the resolution that you're logging that at. And you should think about what business or engineering decision is driving your collection and maybe you can again go even further and say, well, I don't need to know it was May 4th. I just want to know what my heavy day of the week is. And actually, I don't need to know it to the nearest kilobyte. I can do it to the nearest 10 kilobytes.
Some simple math as you're writing out that log line can really reduce the resolution and therefore the amount of privacy that's impacted by logging this kind of message. So you can de-resolve virtually any precise thing, a duration, a time stamp, a size, any of those sorts of things.
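A sketch of de-resolving that line: keep only the hour of the day (or the day of the week) and round the byte count to the nearest 10 KB. The exact granularities here are assumptions chosen to match the example.

```swift
import Foundation

// De-resolution: strip precision you don't need before the value is logged.
func logTransferSucceeded(at date: Date, bytesSent: Int) {
    let hour = Calendar.current.component(.hour, from: date)        // 15, not 15:03 on May 4
    let weekday = Calendar.current.component(.weekday, from: date)  // 1...7, for "heavy day" planning
    let roundedKB = ((bytesSent + 5_120) / 10_240) * 10             // nearest 10 KB, expressed in KB
    NSLog("%@", "INFO: action succeeded (weekday \(weekday), hour \(hour), ~\(roundedKB) KB sent)")
}
```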
Maybe it is the case, though, that you do need this level of precision in your log message, that what you're actually doing is, say, something to track abuse or an anti-fraud mechanism. You say, "No, no, no, it's absolutely critical, actually, that I know the precise number of bytes that were sent." And that may be true initially, but you can also step back and think, "Well, how long does that initial purpose last for?" A couple days later, do I really care about the precise time that something happened? Because by then, either I was already under a denial-of-service attack or I wasn't.
It's past. And so maybe you can go through and reprocess your logs, say, after a week, and say, "I don't need that exact timestamp anymore. I don't still have that need, that purpose to hold on to the timestamp." And then you can keep on doing this. So maybe after 30 days, you say, "Well, the time doesn't matter at all now.
And in fact, whatever I was using the individual bytes for, now I don't need that either, so I can back it off to the nearest kilobyte." And this is something that you have the opportunity to do on any timescale, right? After six months, after two years, you can continue to reduce the amount that's held in each of these log files or each of these kinds of data collections.
And lastly, let's consider this log message one more time. And this time, notice that it's an info-level message that says, "Action succeeded," right? It's not actually even talking about an error. And so you should ask yourself, "Well, why am I collecting this?" And you may well have an answer.
But if you can't answer to yourself what purpose you have this data for, then maybe it's better just not to log it at all. If you can't justify yourself why you're holding onto this data, it presents risk to you. And the best thing you can do for yourself and your users is simply not to collect the data. I put this one last so you didn't just write me off as a privacy nut on slide one. But a lot of times, you can really just not collect it at all, and that's the very best thing that you can do from a privacy perspective.
So I've covered these six different techniques. And like I said, you're never going to use all of them. But as you look at a piece of data that you're collecting, you should always reflect back on what the business or engineering decision is that's driving you to collect the data. And insofar as possible, use one of these techniques or something similar to reduce the impact on privacy while still bringing the benefits that you're trying to bring for your users.
I covered a whole bunch of very different things together, so I wanted to put the headline bullets together for you on one slide at the end. First one: discontinue the use of the UDID API and adopt the replacements. Pretty simple. You want to take these new data isolation classes on both OS X and iOS and make sure that you're testing with them. And one more time, fail gracefully if the user says no. Give them a good experience. We have support for purpose strings inside of those consent or permission dialogs, so add those to your Info.plist.
Go ahead and submit a privacy statement to the App Store now; it'll be displayed when we update the App Store. And as far as data collection, make sure users know what you're collecting and have some control over it. And lastly, collect only the data you need. And that's it, so thank you very much.