Video hosted by Apple at devstreaming-cdn.apple.com

WWDC12 • Session 710

Privacy Support in iOS and OS X

Core OS • iOS, OS X • 43:39

Learn about new iOS and OS X privacy features and get details on new and updated APIs. Hear best practices for delivering great features and respecting your customers' privacy.

Speaker: Erik Neuenschwander

Unlisted on Apple Developer site

Downloads from Apple

HD Video (99.7 MB)

Transcript

This transcript was generated using Whisper, it may have transcription errors.

All right, hello, everyone. For the next hour, let's talk privacy. This is a talk on privacy support in iOS and OS X. I'm Erik Neuenschwander, like it says on the screen, and I manage Apple's product security team. So today I'm going to go through four major topics for you. First of all, I'll give you an update on the UDID, which may be familiar to some of you. You might be interested in that. Then I'll talk about the increased support we have for data isolation in both iOS and OS X. I'll take you on a really brief tour of some of the new privacy UI in both of the platforms, and lastly, close with some discussion of best practices for privacy and data collection.

So with that, let's start with the UDID. Now, who in here is already familiar with the UDID? You've heard of this thing? Yeah, okay. So I'm going to keep this recap super short then, because I think most of you have it already.

The UDID is an API that we added in the first version of iPhone OS that supported third-party applications, iPhone OS 2, and it is a unique and persistent device identifier. It was there, it was API, and then one year ago, with the release of iOS 5, we deprecated it. When we deprecated it, we explained (we have that quote in the developer documentation), "Do not use the uniqueIdentifier property. Instead, you can call this other function, and then you can call this other function and write it down and save it, and cool, right? That's your replacement." Over the past year, I think we've heard the feedback, which I'll try to sum up really briefly as something along the lines of, "But wait! Wait! That doesn't do even like half of what we did with the UDID, and if this is what we're supposed to move to, that's confusing, and we don't understand," and things like that. I've gotten emails. We know that.

So let me tell you what we've changed in iOS 6 and what the future of UDID is. It boils down to three new APIs. Each of them is new in iOS 6, and I'm going to describe each of them now, starting with the application identifier.

The application identifier is new API. In fact, it's a whole new class, NSUUID. You can call the UUID method on it and get back an autoreleased UUID. Most of you are probably familiar with what a UUID is as well. It's just a unique 128-bit number which doesn't have any hardware details in it. So, looking at this, it's really a Cocoa-level replacement or cover over that CFUUIDCreate function we talked about before. And fundamentally, yes, it's just a random number. Every time you call this API, it'll give you back a new one. And then you can identify things with it. If you send it off to a server, you can use that in communications with the server. Or if you persist it into your application data, well, it'll behave just like your other application data does.

And because this will differ across the three APIs I'm going to talk about, let me just briefly go through that. If you take this random number that you get from the UUID and you save it, it's part of your application data. In terms of lifetime, that means that this identifier will last as long as the user has your application installed on his or her device, right? It's part of the app data. If a user uninstalls your application, well, this goes away along with the rest of the app data.
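
The "generate once, write it down, reuse it" pattern described here can be sketched roughly as follows. This is a minimal Python model, not the NSUUID API itself: the function name, the file name, and the JSON layout are all assumptions made for illustration, and a temporary directory stands in for the app's data container.

```python
import json
import tempfile
import uuid
from pathlib import Path


def load_or_create_app_identifier(app_data_dir: Path) -> str:
    """Return the identifier saved in app data, creating it on first use.

    Hypothetical helper: the file name and layout are illustrative only.
    """
    store = app_data_dir / "identifier.json"
    if store.exists():
        return json.loads(store.read_text())["app_identifier"]
    # uuid4() yields a random 128-bit value with no hardware details in it.
    identifier = str(uuid.uuid4())
    store.write_text(json.dumps({"app_identifier": identifier}))
    return identifier


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_data = Path(tmp)  # stands in for the app's data container
        first = load_or_create_app_identifier(app_data)
        second = load_or_create_app_identifier(app_data)
        print("stable while app data exists:", first == second)
    # Deleting the directory models uninstalling the app: the saved
    # identifier goes away with the rest of the app data.
```

Because the value lives in ordinary app data, it follows whatever happens to that data, which is the behavior the talk goes on to describe.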

It's also backed up, so if you erase the phone and put the backup set onto that same device, this identifier, of course, comes along for the ride. And lastly, again, like application data, if you restore that backup set onto a new phone or an iPad, some other device, then, of course, all the data comes along, and that means that this application identifier, this UUID you've written down, will be the same across two devices for your application. So with this new API, you have an identifier that's specific to your application that you can use within your application to identify things. And it's Cocoa level, so that's better than what we had before, and so it's part of the replacement set. But clearly, it isn't the whole story.

So let me go on to the second identifier that we've added in iOS 6, which we call the vendor identifier. This is also a new API. It hangs out in a familiar location, though: UIDevice currentDevice.

This is where the UDID API is as well. So you can call this one with identifierForVendor. This identifier is device unique, and it's per team ID. For those of you that hang out on iTunes Connect, which is, you know, kind of the more corporate people for your companies maybe, the team ID is that thing on the App Store that identifies basically your company, and it's the same for every application your company makes. So that means that if you call this API from one app or a second of your applications, you'll actually get back the same identifier. It's an identifier that will go across your applications. This is because iOS is maintaining a mapping of that identifier under the covers to your team ID. So that's managed by the OS. And in terms of its lifetime, iOS will persist that identifier for as long as any of your applications is installed on the user's device.

So I want to take you through a quick example of this just to make it really clear. Here's a made-up implementation of how iOS might manage this. We have the team IDs on the left and the installed app count, because, of course, iOS knows how many applications from each developer are installed. And then you can see the UUID, or the vendor identifier, off there on the right. So here's the table iOS is managing. And let's say that the user of this device uninstalls one of those applications from that first developer.

Small change, that installed app count goes down to two. You uninstall a second application, that count goes down to one. And now the user uninstalls that last application from this first developer. And it's at that point that iOS will forget the vendor identifier. That gets erased. And now at this point, the user might install another application from that same developer again. It might be one of those three applications that just came off or it might be some new application from that developer. Well, at that point, it comes back into the table with a count of one, and this is the key point. It's a new identifier that gets created at that point. This isn't a hash of the team ID. It's not tied to the team ID. It is a random number. It's actually just another UUID, but it's one that iOS is maintaining an association for tied to that team ID. So as long as any application from your company under your developer account is installed on the user's device, any of your applications can call this new API and get back that vendor ID. It's something you can share across all of your applications.
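
The made-up table from this walkthrough can be modeled in a few lines. This is a toy Python sketch of the bookkeeping as described, not how iOS actually implements it: the class and method names are assumptions, and "TEAM1" is a placeholder team ID.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class VendorEntry:
    installed_app_count: int
    vendor_identifier: uuid.UUID


class VendorIdentifierTable:
    """Toy model of the team ID -> (installed app count, identifier) table."""

    def __init__(self) -> None:
        self._entries: Dict[str, VendorEntry] = {}

    def install_app(self, team_id: str) -> None:
        entry = self._entries.get(team_id)
        if entry is None:
            # First app from this team: create a fresh random identifier.
            # It is not a hash of the team ID or derived from it in any way.
            self._entries[team_id] = VendorEntry(1, uuid.uuid4())
        else:
            entry.installed_app_count += 1

    def uninstall_app(self, team_id: str) -> None:
        entry = self._entries[team_id]
        entry.installed_app_count -= 1
        if entry.installed_app_count == 0:
            # Last app from this team removed: forget the identifier.
            del self._entries[team_id]

    def identifier_for_vendor(self, team_id: str) -> uuid.UUID:
        return self._entries[team_id].vendor_identifier


if __name__ == "__main__":
    table = VendorIdentifierTable()
    for _ in range(3):
        table.install_app("TEAM1")
    before = table.identifier_for_vendor("TEAM1")
    for _ in range(3):
        table.uninstall_app("TEAM1")
    table.install_app("TEAM1")  # reinstall after the count reached zero
    after = table.identifier_for_vendor("TEAM1")
    print("identifier changed after full uninstall:", before != after)
```

The key point the model captures is that the identifier survives as long as the count stays above zero and is regenerated, not recomputed, once the team's last app has been removed.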

In terms of its behavior, well, it's backed up. We're going to maintain that in the backup set. If it gets restored back onto that same device, that whole table basically will come back down. So it persists across erases and restores from backup. However, we won't restore that identifier onto a different device. So if someone backs up an iPhone and then purchases, say, a new iPhone and restores that backup set onto that new, different device, then these identifiers won't come along for the ride. This is a long way of saying they behave just like the UDID does today, in that the UDID changes across devices.

Similarly, the vendor identifier for your apps will change across devices. So it's the behavior that you've had with the UDID. So now we have two replacements. We have that one that can be used for your individual application, random number, the application identifier, and a second one which is shared across your applications. But this still doesn't cover all of the key uses of the UDID, so we have one more new API, and we call that the advertising identifier.

This, again, is new API. It also hangs out on UIDevice currentDevice, and you can call it with identifierForAdvertising. This identifier is unique to the device, and it's the same across all applications on the device. It's used for advertising. And in fact, we know it works well for advertising, because iAd, that's Apple's ad network, has already converted over to using it.

For every device running iOS 6 or later, iAd now uses the advertising identifier exclusively and no longer uses the UDID as it once did for advertising purposes. If you're familiar with ads, things like frequency capping, conversion tracking, this identifier works for those purposes. Because it's not a hardware identifier, it's something managed by software, that means the user can reset it by tapping Erase All Contents and Settings, and that'll create a new advertising identifier on that same device. And then lastly, again, to compare it to the other ones, this identifier is backed up, it's restored back onto the same device. It'll persist that way. But if you restore that backup onto a new device, it'll be a new advertising identifier. So this one, like the vendor ID, behaves like the UDID in terms of its behavior across devices. So for the replacements for UDID, we have these three new APIs-- the application identifier for single apps, the vendor identifier across all of your applications, and the advertising identifier for advertising.
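
As a recap, the behaviors of the three identifiers as stated in the talk so far can be collected into one small structure. This is just a summary sketch in Python; the field names and the wording of each entry are paraphrases chosen for illustration, not API names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifierBehavior:
    scope: str
    lifetime: str
    backed_up: bool
    restored_to_other_devices: bool


# Values paraphrase the talk; keys are informal labels, not API names.
IDENTIFIERS = {
    "application": IdentifierBehavior(
        scope="a single application",
        lifetime="while the application is installed",
        backed_up=True,
        restored_to_other_devices=True,  # it travels with the app data
    ),
    "vendor": IdentifierBehavior(
        scope="all applications from one team ID",
        lifetime="while any application from that team is installed",
        backed_up=True,
        restored_to_other_devices=False,
    ),
    "advertising": IdentifierBehavior(
        scope="all applications on the device",
        lifetime="until the user erases all content and settings",
        backed_up=True,
        restored_to_other_devices=False,
    ),
}

if __name__ == "__main__":
    for name, behavior in IDENTIFIERS.items():
        print(name, behavior)
```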

And they have these behaviors. This is exactly what I said before. It's here mostly so you can come back to it in the video. It just covers the scope, the lifetime, the fact that all of them are backed up, and the fact that only the application identifier will be restored across devices.

So you've heard me say replacements a few times over the past ten minutes, and so this slide shouldn't be too much of a surprise: the future of UDID is no UDID. There are these new replacement APIs available in iOS 6, and I encourage you to begin your transition now. For any new application you create, you should build it on top of these new APIs. And as you update your existing applications, move them off of that UDID API onto one of these three new APIs. That's because the UDID and similar identifiers will be disallowed for use in the future. When we do that, it's not going to change anything about the legacy behavior of existing applications on users' devices. They'll still be able to call, for instance, the UDID and get back that same number. So it won't be disruptive. However, for new applications and updates, use of the UDID and similar identifiers won't be allowed. So that's the motivation: start your adoption now with these new APIs. So that's the update on UDID.

Let me talk next about data isolation. And I'll actually start by defining the term. Data isolation is just when the OS is going to get between your application and sensitive user data. It does this in a way which is largely transparent to your application. It's just that you call existing API, and that will trigger a user consent dialog or alert. And you guys have seen these, actually, for location services. I have two examples up there. On the left, we have one from iOS, where Camera is requesting access to location. And on the right, you can see Safari in OS X Lion doing the same thing.

So with these alerts, you also saw that the user could make a yes or no choice there. If the user says don't allow, then the application doesn't get access to whatever data is covered by that permission dialog. On the other hand, if the user says yes, it works. And regardless of whether the user says yes or no to that permission dialog, the OSs offer a way for the user to review and change his or her decision afterward, a no to a yes or a yes to a no, either by going into iOS Settings or inside of OS X System Preferences.

In terms of the implementation, they differ between the platforms. I'll be talking here back and forth about both iOS 6 and Mountain Lion, and I'll start with iOS 6. There, data isolation builds on top of the sandbox, and so the data that is covered by data isolation actually starts off outside of the sandbox and remains outside until the user gives permission, which is an activity which iOS does on behalf of your application. And if the permission is denied, the apps have no access to that data.

I mentioned a moment ago that a user can go inside of iOS Settings and change that yes or no decision. For those of you that develop on top of iOS, you know that sudden death for applications is a common occurrence. And so it occurs here: if you have a background task, your expiration handler will get called to give you an opportunity to shut down. But then, as you probably know, the OS will kill your application. If it's not running a background task, the OS will just kill your application. This ensures that when you reawaken, when you're relaunched, you're in a consistent state, either with or without access to the data class that the user changed. So that's the behavior on iOS.

On OS X, data isolation covers the purpose-specific APIs for each data class. It gathers permission in a similar way and is ultimately an aid to you: when you're trying to access one of these classes covered by data isolation, OS X will gather the user's permission for you. And because applications don't get killed in quite the same way on OS X, we offer through System Preferences an option for the user to quit the application if he or she changes the choice, to quit the application then or quit the application later. But in this case, on OS X, it's up to the user. I'll talk more about sandboxing in a little bit, but this works well with sandboxing.

The sandbox checks actually still apply separately from these permission dialogs that I'm talking about. So in both iOS 5 and OS X Lion, location services, that is, the current location of the device, was already covered with data isolation. What we're adding now, for both iOS 6 and Mountain Lion, is coverage for contacts, and then for iOS 6 only, we're also extending data isolation to cover calendars, reminders, and photos. And so I'll talk about each of these classes in turn.

Oh, right, one thing first. This is new support, but it actually occurs with the existing API, so you don't have to resubmit your application to our app stores. You don't even have to recompile. So for those of you running the iOS 6 beta, you may have already started to see some of these alerts even for existing applications that you have on your device. That said, even though you don't have to make any change, there are some coding changes that you could make that might improve the user experience, and that's why I'll go through each of the classes.

So let's start with the really dead simple way of how it works today. This is just to be grounding for the next one, right? If you want data today in a non-data isolation world, you call createX, you get back the thing with data, right? It's just as simple as that. You do it every day. So let's talk about how contacts works on OS X Mountain Lion with data isolation. Where it starts off again, you call some address book API, say, and then what happens is that OS X will go and ask permission. That dialog box will come up requesting the user's permission. Let's say the user says, "Okay," so you get permission. Then the API that you called will return to you the object with all of that data.

So this covers the API that you can call. It's the purpose-specific API, which is a whole host of APIs inside the Address Book framework. Two examples might be sharedAddressBook, or if you init an ABPerson or ABGroup. These are the kinds of API that will trigger that permission check. And as you saw in that last slide, the call does block while the user makes that choice about permission. This is a user action, so it will take a relatively long time compared to running code. And so if your application is in a state where it needs to remain responsive, you should wrap the call to these APIs inside of a dispatch block. But when it comes back, if you've been granted access, if the user said yes, it's just like it was before data isolation: you'll get back a populated object. It has all the data. If the user denies access to contacts, then you'll get back a nil object. So this is something you're going to want to check for, whereas it would have been unlikely in the past that you would have gotten back a nil object in the same way.

And then in addition to the purpose-specific API for the Address Book framework, any other APIs which specifically use contacts data (so Sync Services, Spotlight if you're searching for a contact, AppleScript) will also trigger those same permission dialogs. So it's both the purpose-specific APIs and other system services that explicitly access contacts data.

So let's talk about sandboxing and how it relates. They're really separate things. The sandbox check still applies. For those of you that are submitting to the App Store and have sandboxed your application, you know that the address book is by default outside of the sandbox. To bring it into the sandbox, you have to request an entitlement. And you should only use the entitlements that you need. In this case, for contacts, that entitlement is com.apple.security.personal-information.addressbook. Descriptive long string. It's actually easier just to check inside the Xcode UI. There's a box which says "Allow Address Book Data Access." You check that, and the exact same thing happens in your entitlements file.

So if you have that entitlement, then the sandbox check will pass and you'll get to that permission check that I just went through. But the sandbox check occurs first. If your application is sandboxed and does not have the entitlement for address book data access, then the sandbox will fail that request even before the user is prompted for permission. So the two work together. You sandbox your application with the necessary entitlements, and then the user decides whether or not to give you permission to access the contacts data. And that's how it works in OS X Mountain Lion.

So this is the same slide, but now let's take it from here and talk about iOS. And in iOS, you have the call to create the object, and then you get back actually immediately an empty object. And that's because if you think about it, a lot of this data request happens at launch in some of your applications.

Maybe it doesn't need to, and we'll talk about that more later. But your application has a limited amount of time to launch and needs to remain responsive, and so that permission check is actually done separately by the OS. And then if you receive access to the data, then you'll receive a change notification stating that there's now data in this object. So that initial access, it's synchronous in the sense that you call and it comes back, but the permission check is occurring outside of that. That initial object may be empty. So it's very important that for the classes that support it, you handle change notifications. And though I'm bringing it up here in the context of data isolation, the change notification is something you already should have been handling, right? Because in a world with iCloud and data changing from underneath your application, these change notifications were how you kept in sync already. But in this case, a change notification will be more commonly occurring for all of your applications that are accessing contacts data, and so it's important that you really pay attention to it now.

Because we're finding that the story I just told, handling that permission check out of band, might be a little bit confusing or difficult, we actually have a new API we'll be adding to iOS 6. That said, it's not in the seed that you have now, so don't go looking for it. I'm now going to describe something that we're working on, so if you read about it in the release notes of a later seed and it doesn't look like this, well, you know, don't blame me. I'm trying to give you the early idea of it, but who knows? We may still tweak it. But the point of this API is that you can call it for each data class and get your individual application's status. So you could say, okay, does my application have access to contacts? And you might think that's a yes or no question, but as we're designing it, this API will probably have four different return values. The first one is not determined. That means your application has never requested access to contacts, so we literally have no idea what the state is. It's simply indeterminate. The next two are more straightforward.

If you've requested permission and been given it, then you've been granted permission to the data class, and we'll return that. Or if you requested permission and the user said don't allow, then we'll say, no, you've been denied. But then that last one, restricted. We've also enhanced restrictions, kind of the parental controls or things that enterprises can do with managed devices, to also include each of these data classes and whether or not the user is allowed to even answer that permission question. So in the case where, say, access to contacts has actually been disallowed for you and that's done by restrictions, this means that your user may not even be able to make that choice to allow your application access to contacts.

And so we're proposing this fourth return value to allow you to tailor your user experience to the cases where your user may not have control and therefore not be able to give your application access to the data. So four return values: don't know, yes, no, and then no, but the user may not be able to change it. From your application's perspective, denied and restricted will look the same in that you don't have access to that data class.
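
The four proposed return values and how an application might branch on them can be sketched as below. This is a Python model only: the enum and function names are hypothetical, and, as the talk itself notes, the real API was still being designed and may not look like this.

```python
from enum import Enum


class AuthorizationStatus(Enum):
    """Hypothetical names modeled on the four values described in the talk."""

    NOT_DETERMINED = "not determined"  # the app has never requested access
    GRANTED = "granted"                # the user said yes
    DENIED = "denied"                  # the user said don't allow
    RESTRICTED = "restricted"          # restrictions block the choice itself


def has_access(status: AuthorizationStatus) -> bool:
    # Denied and restricted look the same from the app's perspective.
    return status is AuthorizationStatus.GRANTED


def next_step(status: AuthorizationStatus) -> str:
    if status is AuthorizationStatus.NOT_DETERMINED:
        return "request access; the system will prompt the user"
    if status is AuthorizationStatus.GRANTED:
        return "read the data"
    if status is AuthorizationStatus.DENIED:
        return "fail gracefully; the user can change this in Settings"
    return "fail gracefully; the user may not be able to change this"


if __name__ == "__main__":
    for status in AuthorizationStatus:
        print(status.value, "->", next_step(status))
```

Keeping denied and restricted as separate values, even though both mean no access, is what lets the UI avoid telling a user to flip a setting they are not allowed to change.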

So I talked about the classes that we're adding to iOS. It's contacts, calendars, reminders, and photos. Let me go through each of those now, starting on the iOS side with contacts. In iOS, it's access to the ABAddressBookRef that's managed. That's where the permission check will happen. So you know this function, ABAddressBookCreate. Well, we've actually deprecated it in iOS 6, and so we have a new function, ABAddressBookCreateWithOptions. The options parameter is reserved for future use, and that pointer to the CFError is something that can tell you if you've been denied access. So that'll actually be set to a value if your application is denied access to contacts.

Now, of course, deprecated APIs, you should move off of them, and so you should move on to the new one, but I do want to be really clear that even if you call the deprecated API, the permission check occurs there as well. So permission happens no matter which one of these you call, but that second one gives you a little bit of additional information in the case where you've been denied access.

Initially, the first time your app runs, it'll be in that not-determined state. What this means concretely is that ABAddressBookCreateWithOptions will hand back to you, in that immediate, synchronous way, an empty read-only object. What you should then do is register a callback on that address book instance by calling ABAddressBookRegisterExternalChangeCallback. You pass in your callback function. Then, when your callback gets called, you should call ABAddressBookRevert. What that does is take the ABAddressBookRef you got initially and make sure that it's in sync, which in the case of a permission check means that you go from an empty read-only object to one with, you know, 500 contacts suddenly in it, because it's as though the entire address book was just synced into your copy of it. So you'll definitely want to sync up the address book reference that you got initially, because that's how you'll really get access to the data after the user has said yes. Otherwise, you're just holding on to a stale instance.
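
The sequence described here (receive an empty object, register for change notifications, revert when notified) can be illustrated with a small simulation. This is a toy Python model, not the Address Book C API: the class, its methods, and the simulate_user_grants_access hook are all hypothetical stand-ins for what the OS does out of band.

```python
from typing import Callable, List


class AddressBookModel:
    """Toy stand-in for an address book reference held by an app."""

    def __init__(self, system_contacts: List[str]) -> None:
        self._system_contacts = system_contacts  # data held by the OS
        self._granted = False
        self._callbacks: List[Callable[["AddressBookModel"], None]] = []
        self.contacts: List[str] = []  # empty while permission is pending
        self.read_only = True

    def register_external_change_callback(
        self, callback: Callable[["AddressBookModel"], None]
    ) -> None:
        self._callbacks.append(callback)

    def revert(self) -> None:
        # Bring this copy back in sync with what the OS will now share.
        if self._granted:
            self.contacts = list(self._system_contacts)
            self.read_only = False

    def simulate_user_grants_access(self) -> None:
        # Hypothetical hook: models the user tapping "OK" out of band.
        self._granted = True
        for callback in self._callbacks:
            callback(self)


def on_external_change(book: AddressBookModel) -> None:
    # The app's callback: re-sync the instance it is already holding.
    book.revert()


if __name__ == "__main__":
    book = AddressBookModel(["Ann", "Bob"])
    book.register_external_change_callback(on_external_change)
    print("before permission:", book.contacts)
    book.simulate_user_grants_access()
    print("after permission:", book.contacts)
```

Without the callback and the revert step, the app in this model would keep reading from the empty instance it received at launch, which is the stale-instance problem the talk warns about.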

So in calling that method to get the address book object, you might get back a nil object. That's going to be true if permission was denied in the past. And that means that the error will be set if you call the new API. With that nil object, you should just keep in mind that adding entries to it is not going to be useful. It's not going to do what you expect. Similarly, you may get back an address book that's completely empty if permission is still pending. And in that case, if you were to aggressively start a two-way sync inside of your application against a nil or empty data set that looks like the user deleted all of his contacts, that again might not do what you expect. So you'll definitely want to test all the possibilities and check how your application behaves, and whether the behavior is appropriate, in each of those four cases I talked about.

So let me move on to calendars now. With calendars, it's EKEventStore where we manage access, similar to the address book. You probably know the existing EKEventStore initializer. We've gone ahead and deprecated that as well in iOS 6. So we have a new method, and it's initWithAccessToEntityTypes, and this allows you to specify within the event store what you're asking for access to. In particular, in the case of calendars, you want to pass EKEntityMaskEvent, and the event part is what specifies that it's calendars that you're after. That will generate that same permission check, again, with this call.

Reminders is really simple, similar to calendar events, because they're part of EventKit as well. Instead of passing EKEntityMaskEvent, you're going to pass EKEntityMaskReminder, and that will indicate that you want access to reminders rather than calendar entries, and that'll enable iOS to tailor that permission dialog to say which one your application is asking for access to.

For those of you that work on both platforms, you probably know that EventKit is supported on OS X as well, so I just want to clarify that although EventKit is on both platforms, the data isolation applies to iOS only. It's not something that applies to EventKit in Mountain Lion.

Lastly, let me talk about photos. It's actually not access to ALAssetsLibrary that's managed in the photos case. Instead, you get back the assets library, and permission requests are deferred until you call one of the get or set methods. If you've looked through the API documentation, you know that many of them are asynchronous already, and they have failure blocks. So if you, for instance, call ALAssetsLibrary enumerateGroupsWithTypes, then you can pass in the block to do your work. You can also pass in a block that's called on failure, and that denial case is when we'll call the failure block. So when you try to use one of these get or set methods on the assets library, if, when the permission check comes back, the user has denied access for your application to photos, then your failure block will be called. And one thing to point out is that photos do include all the metadata.

So you probably know that in those EXIF headers on photos, the time that the photo was taken and the location where the photo was taken can be stamped in as part of the metadata. When your application gets access to photos, you are receiving access to the actual asset. That means the photo and all of its metadata. So location services is what's covering access to the device's current location, but the photo's location is managed by this access to the photos data class.

So let me go back to these consent alerts. You saw them before, but now look at the fact that they have these purpose strings in both OS X and iOS. Purpose strings are something that you can add as part of your application development that will go into these dialogs that the OS presents.

They're optional. You don't have to set one. In that case, the dialog will just have a blank space. But I certainly encourage you to set a purpose string when you're asking for access to users' data, because it gives the user more context. If the user understands why your application is asking for access to this data, they'll be able to make a better informed decision. You can choose a purpose string for each different class of data, so contacts has one, reminders has one, and so on and so forth. You set the purpose string inside of your application's Info.plist, which you're welcome to do by hand, but Xcode also offers some UI. In the Info tab of the Xcode project editor, you can look for these keys, which are helpfully named starting with the string "Privacy," and you can provide a value. So for instance, you can have Photo Library Usage Description, and there's one available for each kind of data. You can also see there the UI inside of Xcode where you can specify a value for the string.

So let me talk about testing on each of the two platforms, going back now to OS X. On OS X (you'll think it's silly) I say just run your application, but wait for the next slide. On OS X, you can just run your application. That's how you test. Of course, applications can only trigger this prompt once.

The user makes a choice, and then they can go manage it afterward, but that's not the same thing as prompting again, and you developers are going to want to actually prompt repeatedly. You're going to tweak your purpose string and want to see how the new purpose string looks. So we have a new command-line utility. It's called tccutil. If you run tccutil reset AddressBook from Terminal, that will clear out all of the previous choices that you've made on that system for applications to access contacts. And it has a man page, of course. You can type man tccutil for more information about that utility.

I want to remind you that on OS X, you should be testing all the cases: when access was already previously denied to you (previously granted is kind of easy), and when you haven't yet requested access to the data and the user either says yes or says no. In those cases where a user denies your application access to contacts, please fail gracefully. Make sure your application works as much as possible and provides as much of the experience as it can. You're welcome to, you know, abort and just quit and say, "Ah, you know, forget you." But you should really try to make your application behave as much as possible, even though you don't necessarily have access to certain data classes like contacts. So that's OS X.

On iOS, well, why did I say run your app before? Because it's not true on iOS. Data isolation is not supported inside the iOS Simulator. So if you want to test data isolation and those permission dialogs and the behavior, you need to run it on an actual device. So run your iOS applications on a device to test data isolation. Again, the prompting happens only once, so inside iOS, you can also do a reset. You do that by going into Settings, General, Reset, and then down at the bottom, there's Reset Location and Privacy. You tap that, and that'll erase the consents for all the data classes, and you can again see that alert pop up the next time your application tries to access a protected data class. You want to test all cases on iOS as well. The first three are the same, but remember that fourth one as well: you want to test when permissions have been restricted. That's when somebody has imposed a restriction on the device that prevents the user from even seeing, and therefore responding to, that permission dialog. And of course, on this platform as well, please fail gracefully if your data access request is denied.

So I've been talking about some of these new capabilities. I want to just give you a quick couple of screenshots of the new privacy UI. This one is OS X System Preferences, and under the Security & Privacy pref pane, you can see that we've now expanded the privacy one. Once you've selected it, Location Services is there, as is Contacts, and then Diagnostics & Usage. For Contacts, you see the applications listed. If you change the checkbox that says whether or not you want the application to have access, then the user gets to choose whether to quit the application or not.

On iOS, inside settings at the top level, we have a new top-level privacy item there. You can see it with the hand icon. And if you tap into that, there are the five different data classes that I've been talking about. And then I show you photos there on the right. And note that text at the top that's shown to users inside the photos privacy area, that photos stored on your phone may contain other information, such as when and where the photo was taken.

And that's present there to remind the users, just as I reminded you, that photos, along with their metadata, are given to applications to which the user's given their permission here. So we also -- I talked about how restrictions have been updated. So if you go into restrictions inside of settings on iOS, then there's a new privacy section where those same data classes are shown.

And if you tap into one of them, you see at the top "Allow changes" and "Don't allow changes." And this is where, through parental controls or through device management in an enterprise, someone can choose both whether or not the individual applications should have access and also whether or not the user of that device is allowed to change those settings or respond to new access requests from new applications on the device. So this is how restrictions play into data isolation.

So let me now broaden it and talk not about particular changes we've made to our platforms but instead go through some comments on best practices that you can apply when you're doing data collection or doing things that may impact users' privacy. Most of you are probably client-side developers, but a lot of what I'm saying also applies to business decisions and to server programmers as well. I think you'll see as I go through the examples. I'm going to cover three areas: first transparency, then user control, and then data collection techniques that you can use to best respect privacy.

So starting with transparency: a privacy policy or statement. These things are really good. It's important for your company to have one, and in fact, there's a new feature now in iTunes Connect where you can submit a link to your privacy policy or your privacy statement to Apple. And in an upcoming release of the App Store, the page for your application will display that link so that a user is able to view your privacy policy before ever even downloading or purchasing your application. So this is a feature coming to the App Store, but you can start submitting your privacy statements now through iTunes Connect, and then they will become visible to users in that App Store update. So this is a very good thing to do.
But even after explaining to the user up front how you're going to collect data, how you're going to use data, if possible, it's important to give the user opportunities to inspect precisely what your application is providing about them. So if you can provide UI inside your application or a method on the website for the user to inspect that, again, it just goes to open yourselves up, show more transparency, and have the users be better informed about the data collection. Now, you can be as transparent as you want, but it's also important that you give users control. And so part of that is asking permission, and really asking permission with context. So I went through the purpose strings for the iOS and the OS X permissions dialogs. Those are part of it. It's also important-- and this is more under your control-- to ask when you need access to the data. So it's a bad situation in many ways to just ask for everything up front when your application launches. Number one, it's not a great user experience. User just bought your application, launches it, gets a bunch of alerts. But from a performance perspective, if you're not needing access to the data right then, remember, we're only requesting permission when you're instantiating one of these objects to access the data. So if you don't need access to the data, that actually means you're instantiating objects you don't need to use right then, and that's slowing down your application launch, so it has a performance impact.

It's better if you ask just before you need it. Maybe the user taps a button that says, "Oh, I want to upload a photo or something," and then you say, "Oh, well, let me go get access to the photos database." A user's more likely to understand why your application needs access to that data, not only if there's a purpose string in the permissions dialog, but also if they have an idea that the action that they just took is the one that's causing your application to request that access. If they can connect these two actions in their mind, they're more likely to understand the context of your request.

And then lastly, even if you've given them that initial choice of yes or no, it's also important to give them control thereafter. Maybe they changed their mind. And it could go either way, right? They could have said no initially, done some additional research, and decided that they really do want to give your application access, or they want to turn it off, maybe even temporarily for some reason. And so allowing the user to remake that choice over time is another way to increase their control over their privacy. And I'm just going to keep on beating this home: fail gracefully, because your application should try to provide the best experience in light of the control that you've given the user.

So in terms of data collection techniques, I really believe it's true that all collection efforts reduce privacy. If you're getting data, that's impacting the user's privacy. That doesn't mean that the data collection is necessarily bad or wrong or anything of that nature, but you do have to think hard about the negative that's coming from the data collection and make sure that it's outweighed by the positive in terms of how you're improving your application, providing a better or more tailored experience to the user, so that the positives outweigh the negatives from the data collection.

And as I mentioned, I think that's true both on the app and the server side. And as you collect data from users, it's important to remember that holding on to rich data has risks. I think you guys read tech news. You've probably seen in the past couple weeks a number of passwords being leaked. We've seen other data breaches in the past. And if you have that data, then an attacker might be able to get access to that data. And wouldn't it be just really annoying for you and sad for your company's brand if an attacker ferreted out data that you were holding onto that you didn't even need. So trying to reduce the data down to just what you need to provide your products or services reduces the risk to your company as well. So let me go through some techniques that you can keep in mind. You're never going to use all of these. They're never each appropriate all the time. But six different ways that you can think about how to reduce the data that you're collecting while still satisfying your business or your engineering goals. So I'll go through each one of these in turn, starting with anonymization. So I'm going to use log lines here.

So let's start with this first log line. And you can see it's something that could be in a server log, it could be in syslog on the device, it could just be something you're writing down and submitting to some back-end analytics of your application. But in this case, it says error, and there's an illegal token, and then a whole path, right? And paths are very interesting or scary things because if something's in the user's home directory, well, the home directory is named with the user account name, and for a lot of Mac users, that's exactly their name. So here we've collected the fact that this user is John Appleseed. And in addition to that, we know a couple other interesting things out of this log line. We know that there's some Project Zanzibar, whatever that is, and that there's probably something going on in fiscal year 2013. All of this when we were really perhaps focused on this illegal token.

So we could do better and not collect that whole path. Maybe you do want to know the files that are causing illegal tokens, so you log just the file name instead. This is clearly better because we're not collecting all this ancillary data that wasn't related to the actual document. But we still have a file name there. That file name could be named Project Zanzibar fiscal 13. There's still a lot of information there. And so if you're really maybe focused on when these illegal tokens occur, you might want to just step back and think, I want to know what the type of the file is. I don't care about the file name at all, but I do want to know that it was a Keynote file that had this token. That maybe tells me where I need to go test more. And so this would even further reduce what you collect and make sure that there's no user-specific string there in the way of the path or the file name, just the type.
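As a rough illustration of that last step, here is a minimal Python sketch that keeps only the file type when logging. The function names, the log wording, and the example path are all hypothetical; the slide's actual log line is not reproduced in this transcript.

```python
from pathlib import PurePosixPath


def file_type_only(path: str) -> str:
    """Reduce a full path to its extension, dropping user and file names."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix.lstrip(".") or "unknown"


def illegal_token_log_line(path: str) -> str:
    """Build a log line that carries the file type and nothing else."""
    return f"ERROR illegal token in file of type {file_type_only(path)}"


# Hypothetical path, made up to resemble the kind of path described in the talk.
example = "/Users/johnappleseed/Documents/Project Zanzibar FY2013.key"
print(illegal_token_log_line(example))
```

The account name, project name, and fiscal year never reach the log, while the fact that a Keynote-style file triggered the error does.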

Let's think about aggregation, and now take that same log line, but realize that it's registering a particular event. It might even have a timestamp associated with it, but it's a particular time and instance where that token was seen. And maybe this isn't important to you either. If we think back to the fact that maybe all you cared about was that it was a Keynote file, maybe what you really care about is the frequency at which these are occurring. So you might just log over the lifetime of your application or once a day or once an hour. You might just write down the number of times you've seen an illegal token in each different file type. So here we have it occurring 21 times in a Keynote file, three times in some other kind of file, and this would point you to the fact that the Keynote files are a much more interesting area to investigate than this other file type. And in this log line, you haven't logged anything precisely about a single event. It's just a record of a set of events that occurred over a period of time, so you're getting much less data about the usage pattern while still getting all the data about the occurrence of these illegal tokens. That's aggregation.
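A minimal Python sketch of that idea: count events per file type in memory and only ever write out the totals. The class name, the output format, and the "txt" type standing in for "some other kind of file" are assumptions made for illustration.

```python
from collections import Counter


class IllegalTokenTally:
    """Counts illegal-token events per file type instead of logging each one."""

    def __init__(self) -> None:
        self._counts = Counter()

    def record(self, file_type: str) -> None:
        # No timestamp, path, or file name is kept; only a counter is bumped.
        self._counts[file_type] += 1

    def flush(self) -> str:
        """Return one summary line for the period and reset the counters."""
        parts = [f"{t}={n}" for t, n in sorted(self._counts.items())]
        self._counts.clear()
        return "illegal tokens: " + ", ".join(parts)


tally = IllegalTokenTally()
for _ in range(21):
    tally.record("key")
for _ in range(3):
    tally.record("txt")
print(tally.flush())
```

How often flush is called (hourly, daily, once per application lifetime) is the knob that controls how much usage pattern leaks through.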

You can also think in terms of sampling, statistical sampling. So if we consider this again as a particular event, you can ask yourself, "Well, do I need each and every event from every single user who's opted into my collection?" And maybe you don't. Maybe you could say, "Well, I could actually sample from just one in 10 or one in 100 or one in 1,000 of the users' computers." And if it's a statistically random sample, it'll be representative of your population, and you'll get overall the same distribution and same shape of the events without actually gathering data from the vast majority of your users.

So you've improved privacy. You know, 90% or more of your users aren't going to have any data collected from them. But if you think about it a little harder, you're still collecting complete pictures of those individual users. So if you really only care about the events, you could go one step further and instead sample on the basis of an operation. So you just roll that random die each time you're parsing a file in this case. And if you get one in ten or one in 100 or thousand operations, then you choose to log it then. You're getting the same quantity of data, but you're no longer collecting a complete picture of any of your users. And so this is even better from a privacy perspective than sampling on a per user or per device basis.
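Sampling per operation can be sketched in a few lines of Python. The function names and the parsing loop are placeholders; the only point is that the random draw happens once per operation rather than once per user or device.

```python
import random


def should_log(sample_rate: float, rng: random.Random) -> bool:
    """Roll the die independently for each operation."""
    return rng.random() < sample_rate


def parse_files(file_types, sample_rate: float, rng: random.Random):
    """Pretend to parse files; return the file types that were sampled for logging."""
    logged = []
    for file_type in file_types:
        # ... parse the file here; assume it produced an illegal-token event ...
        if should_log(sample_rate, rng):
            logged.append(file_type)
    return logged


rng = random.Random()  # pass a seeded Random only for reproducible tests
sampled = parse_files(["key"] * 1000, sample_rate=0.1, rng=rng)
print(f"logged {len(sampled)} of 1000 operations")
```

Because every user contributes only a random fraction of their operations, no single user's log is a complete record of what they did.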

Let's take a different log line now, and I want to talk about derezzing or de-resolving the data. So here's a log line. Maybe you might see this one on a server or on a client: some action succeeded at that particular time, and there's a precise number there, 22,341 bytes were sent. And this looks like a pretty anonymous log message. It doesn't have that same sort of path information in it. But especially with that precise byte count, and let's say that this was a client message and there's a corresponding server message, between that timestamp and the bytes, you might be able to start to recombine those two logs that you thought were independent, because you can see that at 3:03, 22,341 bytes were sent, and that would enable you to re-correlate it. It's a very precise sort of action. And maybe you don't need this level of precision in either of those two dimensions. So you might instead say, well, we're really just doing this for load testing and, you know, planning for scale.

So I want to know the hour of the day when this event occurred, and I want to know, you know, to the kilobyte what data was sent. But now if it was 22,341 or 22,342, those would look the same in terms of the number of kilobytes because you've backed off the resolution that you're logging that at.

And you should think about what business or engineering decision is driving your collection and maybe you can, again, go even further and say, well, I don't need to know it was May 4th. I just want to know what my heavy day of the week is. And actually I don't need to know it to the nearest kilobyte. I can do it to the nearest 10 kilobytes. Some simple math as you're writing out that log line can reduce the resolution and therefore the amount of privacy that's impacted by logging this kind of message. So you can de-resolve virtually any precise thing, a duration, a time stamp, a size, any of those sorts of things.
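The "simple math" can be sketched in Python as below. The helper names are hypothetical, a kilobyte is taken to be 1,000 bytes, and the year and afternoon hour in the example timestamp are assumptions, since the talk only mentions May 4th and 3:03.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def to_hour(ts: datetime) -> str:
    """Keep the date and hour; drop minutes and seconds."""
    return ts.strftime("%Y-%m-%d %H:00")


def to_weekday(ts: datetime) -> str:
    """Keep only the day of the week."""
    return DAY_NAMES[ts.weekday()]


def to_nearest(value: int, step: int) -> int:
    """Round a non-negative value to the nearest multiple of step."""
    return (value + step // 2) // step * step


ts = datetime(2012, 5, 4, 15, 3, 27)   # assumed year and hour
print(to_hour(ts), to_nearest(22341, 1000))
print(to_weekday(ts), to_nearest(22341, 10000))
```

With a 1,000-byte step, 22,341 and 22,342 both become 22,000, so the byte count can no longer be used to match a client record to a server record.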

Maybe it is the case, though, that you do need this level of precision in your log message, that what you're actually doing is, say, something to track abuse or an antifraud mechanism. You say, no, no, no, it's absolutely critical, actually, that I know the precise number of bytes that were sent. And that may be true initially. But you can also step back and think, well, how long does that initial purpose last for? A couple of days later, do I really care about the precise time that something happened? Because by then, you know, I was either under a denial-of-service attack or I wasn't. It's past. So maybe you can go through and reprocess your logs, say, after a week, and say, I don't need that exact time stamp anymore. I don't still have that need, that purpose to hold on to the time stamp. And then you can keep on doing this. So maybe after 30 days you say, well, the time doesn't matter at all now. And, in fact, whatever I was using the individual bytes for, now I don't need that either. So I can back it off to the nearest kilobyte. And this is something that you have the opportunity to do on any time scale, right? After six months, after two years, you can continue to reduce the amount that's held in each of these log files or each of these kinds of data collections.

And lastly, let's consider this log message one more time. And this time notice that it's an info-level message that says action succeeded, right? It's not actually even talking about an error. And so you should ask yourself, well, why am I collecting this? And you may well have an answer. But if you can't answer to yourself what purpose you have this data for, then maybe it's better just not to log it at all. If you can't justify to yourself why you're holding on to this data, it presents risk to you. And the best thing you can do for yourself and your users is simply not to collect the data. I put this one last, so you didn't just write me off as a privacy nut on slide one. But a lot of times, you can really just not collect it at all, and that's the very best thing that you can do from a privacy perspective.
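Going back to the idea of reprocessing logs as they age, here is a minimal Python sketch of a staged reduction. The 7-day and 30-day thresholds come from the talk; what exactly is kept at each stage (hour only after a week, date only after 30 days), the 1,000-byte kilobyte, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def reduce_for_age(when: datetime, sent_bytes: int, now: datetime):
    """Return (timestamp_text, byte_count) at a precision that fits the record's age."""
    age = now - when
    if age >= timedelta(days=30):
        # Time of day no longer matters; bytes rounded to the nearest kilobyte.
        return when.strftime("%Y-%m-%d"), (sent_bytes + 500) // 1000 * 1000
    if age >= timedelta(days=7):
        # The exact time stamp is no longer needed; keep the hour only.
        return when.strftime("%Y-%m-%d %H:00"), sent_bytes
    # Fresh records keep full precision, e.g. for abuse or fraud detection.
    return when.strftime("%Y-%m-%d %H:%M:%S"), sent_bytes


event_time = datetime(2012, 5, 4, 15, 3, 27)  # assumed example timestamp
for days in (1, 8, 31):
    print(days, reduce_for_age(event_time, 22341, event_time + timedelta(days=days)))
```

A periodic job could apply a function like this to stored records, and further stages (six months, two years) would just be additional branches.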

So I've covered these six different techniques. And like I said, you're never going to use all of them. But as you look at a piece of data that you're collecting, you should always reflect back on what the business or engineering decision is that's driving you to collect the data, and insofar as possible, use one of these techniques or something similar to reduce the impact on privacy while still bringing the benefits that you're trying to bring for your users.

Okay, I covered a whole bunch of very different things together, so I wanted to put the headline bullets together for you on one slide at the end. First one, discontinue the use of the UDID API, and you should adopt the replacements. Pretty simple. You want to take these new data isolation classes on both OS X and iOS and make sure that you're testing them, and one more time, fail gracefully if the user says no. Give them a good experience. We have support for purpose strings inside of those consent or permission dialogs, so add those to your Info.plist. Go ahead and submit a privacy statement to the App Store now. It'll be displayed when we update the App Store. And as far as data collection, make sure users know what you're collecting and have some control over it. And lastly, collect only the data you need. That's it, so thank you very much.