System Frameworks • iOS, macOS, tvOS, watchOS • 38:51
Learn about new developments in Privacy on iOS, macOS, watchOS, and tvOS that impact you and your apps. Explore techniques to respect your users' privacy while building great features into your apps.
Speakers: Jessie Pease, Julien Freudiger
Transcript
Hello everyone, and welcome to Engineering Privacy for Your Users. I am Julien from Privacy Engineering and I'm super excited to be here today. Do you ever wonder how to learn from your users without creeping them out? Do you ever argue with your team over which controls and transparency to offer in order to gain user trust? Well, to answer these questions, today we're going to go through privacy technologies that we adopt and how you can adopt them too.
As many of you know, we at Apple care deeply about privacy. Just last year, Tim Cook said, quote: "People have entrusted us with their most personal information. We owe them nothing less than the best protections that we can possibly provide." And it's not just Tim; all of us at Apple, from the top down, believe that privacy is important to gain user trust and to build a healthy ecosystem that protects your data.
So, privacy engineering: we're a team that works with many teams across Apple in order to build privacy into our products. And year after year we learn that the best products and services are those that respect user privacy without compromising the user experience. They are great services that have amazing features and also respect user privacy.
They encourage users to engage with your app by offering transparency, control, consent, but also securing the data, minimizing the data that you collect and limiting the use of the data you collect. As more and more sensitive information is put on our devices, we believe that privacy is becoming a more valuable commodity and it's worth your investment.
But privacy is also more than just a set of rules; it's an opportunity to do smarter engineering, to consider other types of designs for the benefit of our users. For example, it encourages you to do more on the device, to bring the intelligence down to the device for a faster user experience and less stress on the network.
So it encourages you to only collect the data that you need in order to improve your services. So today we're going to talk about identifiers, how to collect data tied to those identifiers, which kind of controls and transparency to offer to your users, as well as meaningful choices and how to build privacy into your app. Let's start with identifiers.
Many of you every day rely on identifiers in order to get to know your users and analyze their behavior over time. And this is great, after all great stories start with good identifiers. But, if you track everything that your users do all the time, it creates an uncomfortable atmosphere for your users, sort of a chilling effect where people start to wonder, if I tap on this link is that link going to be tracked back to me? And as soon as your users hesitate you start to lose them.
Identifiers are very helpful for building customized services; they are essential to get to know your users and offer them useful predictions. But if you use them for too long to track everything, you start to build creepy suggestions that may put your users ill at ease. Identifiers are also super helpful for fighting fraud and for preventing and detecting malicious use as early as possible.
But they do not justify collecting data indiscriminately from all your users. So what can you do? Well there are a number of best practices that we recommend. First of all, favor short-lived identifiers. You don't systematically need to make use of persistent identifiers that never change. So you can make them change automatically over time, or make it very easy to reset them.
Stick to random identifiers instead of always and systematically using somebody's username, email address, or phone number. Most of the time you don't need that. And ask yourself do I really need to authenticate my users, or is it enough to have an anonymous session, an anonymous experience with my users. By following some of these best practices you're going to make it very easy for your users to engage with your app without having to worry about their privacy. So let's have a look at some examples.
In our own products we seek to adopt these very techniques. For example, in Spotlight and Maps we have identifiers that change periodically and automatically, so users can engage with these products without having to worry about their privacy. We customize the experience for those products within the sessions where the identifier stays the same.
We also seek to make it easy to reset identifiers at any time. For example, in Siri, it's as simple as toggling Siri off and on: you get a new identifier and can start over again; you're in control. We also generally seek to identify sessions and not users, because most of the time that's all we need to provide our services.
We also want to make it easy for you to adopt these kinds of techniques. So we have a number of APIs that allow you to very easily generate identifiers. For example, the universally unique identifier API is just a simple line of code in Swift, and it creates a new identifier whenever you call it.
So a quick example: whenever you call this API you get a new 128-bit random number. The space is so large that it's going to be a globally unique random number every time. If you call it again, you get another number; call it again, et cetera. It's then up to you to decide what to do with this random number. You can use it to identify objects, sessions, or users.
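In Swift, that's Foundation's UUID type; here's a minimal sketch:

```swift
import Foundation

// Each call to UUID() returns a fresh 128-bit random identifier.
let sessionID = UUID()
print(sessionID.uuidString)   // e.g. "68753A44-4D6F-1226-9C60-0050E4C00067"

// Calling it again yields a different, globally unique value.
let anotherID = UUID()
assert(sessionID != anotherID)
```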
But what if you're looking for something more stable? You're not looking to generate an identifier; you want to access one. Well, the vendor ID is here for that. The problem it solves is that you can just call this API and it provides an identifier for the device, so it's very easy to customize your services; you don't even have to create an identifier. And the way it works is that whenever you call this API for a certain app from a certain vendor, you will get the same string returned back to you, the same UUID.
So if you have two apps on the App Store made by the same vendor, they can access the same UUID, and this just makes it easier for you to customize your services across your apps. Now, an app from a different vendor will get a different UUID altogether. Note that users have control over this by uninstalling your apps.
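A minimal sketch of reading it, using UIKit's identifierForVendor (the value can be nil briefly, for example right after a reboot, so it's optional):

```swift
import UIKit

// identifierForVendor returns the same UUID for every app from the
// same vendor on this device, and changes if the user removes them all.
if let vendorID = UIDevice.current.identifierForVendor {
    print(vendorID.uuidString)
}
```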
Another stable identifier persisted by the operating system is the advertising identifier. This one is to be used for advertising purposes only, and the way it works is that whenever an app calls this API, it gets a UUID identifying the current device. This number is constant over time, unless the user decides to reset it or turn it off altogether. So, in summary, there are a number of APIs that we provide, and each has a different goal: either generating an identifier or accessing an identifier persisted by the system.
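As a concrete sketch of that "accessing" category, here's the advertising identifier via the AdSupport framework, honoring the Limit Ad Tracking setting Jessie discusses later in this session:

```swift
import AdSupport

// The advertising identifier is persisted by the system and is for
// advertising use only.
let manager = ASIdentifierManager.shared()
if manager.isAdvertisingTrackingEnabled {
    print(manager.advertisingIdentifier.uuidString)
} else {
    // Limit Ad Tracking is on; in iOS 10 the identifier reads as all zeros.
}
```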
In addition to identifiers from the system, there are network identifiers. Whenever you scan to try to connect to a Wi-Fi network, you reveal your MAC address for everyone to see, and that can be a location privacy issue. So in iOS 8 we changed the behavior of scanning to start using locally administered MAC addresses that change over time automatically, which is great for your privacy: you don't have to do anything, and we make it much harder for people to track you. And we're very glad and proud to adopt this technique not only on iOS but on macOS as well. So if you're a Wi-Fi vendor and you have some equipment, we encourage you to test your equipment not only with the iOS seeds, but with the macOS seeds too.
You must have heard about Messages and its support for app extensions. This is a super cool feature, and it is privacy friendly. App extensions will not be able to access your contacts; what they will see is a random UUID instead. And that UUID will be different for each app extension and in each chat. So it's super privacy friendly.
Now, even if you're not trying to access or generate identifiers, you may just be collecting some data that looks innocuous. The problem is that by putting this data together, you might actually end up with a persistent and stable device fingerprint, an implicit identifier. And that can be a problem in case of a data breach, where you thought your data was anonymous and then some researchers out there show that it was actually very easy to identify users in it.
So you don't want to be in that situation so you should seek to minimize the data you collect. And we also try to help you there. We seek to build a strong and secure Sandbox that protects you and your app from malicious third parties and also protects you from unintended user fingerprinting.
So what we do there is provide you with strong and amazing APIs to build features, but we also provide protection and limit the set of APIs that could be used for fingerprinting. For example, this year we deprecated some APIs that could be used for fingerprinting, and we also whitelisted some properties that are helpful for you while blacklisting other properties that could be used for fingerprinting.
So in summary what does this all mean to you? Identifiers are great, but we recommend that you favor short-lived ones whenever possible and that you also stick to OS provided APIs in order to access or generate identifiers. We think that the users will find it easier to engage with your app and to contribute their data. So talking about data, let's look at data collection.
Many of you collect data from your users every day. This is very helpful for taking a data-driven approach to product development, and that's super cool. But one day you may end up in a weird situation where you learn so much about your users that you know more about them than they know about themselves. And that's weird.
So ask yourselves how would you feel if they knew all the data that you had about them? Would you feel embarrassed? What if you had to collect this data from your dad, or your mom, or your siblings? Data is as much a benefit as a risk. And you need to balance these two. So what can you do? We recommend you put in place privacy friendly data collection practices.
And in particular, instead of collecting speculative data, just grabbing everything you can because maybe it will be useful in the future, try to stick to key performance indicators and collect data for those. Here are a number of techniques that you can adopt to do so: bucketing, sampling, and aggregation. Let's look at bucketing.
The idea of bucketing is to only collect the level of detail that is sufficient for your needs. Let's say you wanted to know: how often are your users opening your in-app settings? Ask this question for a particular user first. It's quite simple: you implement a counter and you measure. Let's say this test user went to your in-app settings 86 times on a given day. This information is helpful, right, because it tells you that maybe something is wrong with your configuration; maybe you want to make something more discoverable.
But do you really need to collect this with this level of precision? Would it be sufficient to only collect that somebody went to your in-app settings more than 50 times? That would give you the same signal: that maybe there's something to improve and further refine within your app.
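A minimal sketch of that idea in Swift; the bucket boundaries here are made up for illustration:

```swift
// Hypothetical bucket boundaries; pick ones that match your needs.
func bucketed(_ count: Int) -> String {
    switch count {
    case 0:        return "0"
    case 1..<10:   return "1-9"
    case 10..<50:  return "10-49"
    default:       return "50+"
    }
}

// The test user's 86 visits are reported simply as "50+".
print(bucketed(86))
```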
But what if you're interested in learning not just about a specific user, but about a trend across users? Well, you can ask the same question of all users: how often do my users open the in-app settings on a given day? It turns out that if you're interested in the average, you really don't need to collect this data from all of your users in order to get a good, statistically significant estimate.
It turns out that if you only collect data from 0.1% of your users, you already get a good estimate of the average. In fact, if you collect from just 10% of your users, your estimate is precise to within about 0.1. So you really don't need to systematically collect data from all of your users; it depends on what kind of performance indicator you're looking to collect.
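Here's one way that sampling decision might look in code; the rate and the reporting hook are hypothetical:

```swift
import Foundation

// Hypothetical rate: only 0.1% of devices report this metric.
let samplingRate = 0.001

// In practice you'd make this decision once and persist it,
// so the same user isn't re-rolled on every launch.
let isSampled = Double.random(in: 0..<1) < samplingRate

if isSampled {
    // Send the (bucketed) counter to your analytics endpoint.
}
```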
But what if you're looking to collect some data that's very sensitive? Something that you really shouldn't be collecting in the first place. But you're interested to learn about it, what could you do? Well, what if there was a technology that allowed you to learn from the crowd while protecting individual data.
Well there is such a technology and it's called differential privacy. So, differential privacy is one of the safest ways for users to donate their data. It is one way to learn from the crowd while protecting individuals. It provides strong and mathematical guarantees of privacy. And this year we are adopting it in iOS and macOS.
So, before I get started, two things. First, we believe that this is yet another tool in our privacy toolbox; we have a number of tools that we use for privacy, and this is a very exciting and promising one. Second, this is going to be a quick intro into a fascinating technology, and I'll do my best to inspire you and get you excited about it. So let's get started. I said it gives you strong mathematical guarantees, so let's start with some theorems.
No, I think we can skip those and get the idea another way. The idea is that there's some data, something sensitive, and you add noise to it and get some privatized data. Then you can aggregate that privatized data to extract some signal. Let's take an example: how many hours did I work last week to prepare these slides? It was a busy week, so let's say I worked 128 hours.
So I can add noise to that, and what that really means is that I change this number into something that looks different. Let's look a little more closely at what I mean by noise here. Disclaimer: this is a simplified description, but it's going to give you the gist of how it works.
So I have this number, 128, and I compute a projection of it, sort of a hash. What I do is write out a very long vector full of zeros, and I write a single 1 at position 128. So it's a huge vector full of zeros; there's only one 1.
Then by noise, what I mean is I'm going to flip some of these zeros and some of these ones. So some zeros are going to become ones. And some ones are going to become zeros. The trick here is that zeros have slightly more probability to remain zeros than to become ones. And ones have slightly more probability to remain ones than to become zeros.
So this is one way to obscure your data prior to sharing it with Apple. Another interesting observation here is that if I were to obscure your data twice, even if it's the same number, each time it will give a different result. And unlike with a typical hash function, I cannot run a dictionary attack to try to figure out what this value corresponds to.
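Here's a simplified sketch of that local step in Swift; the domain size and flip probability are illustrative, not Apple's actual parameters:

```swift
import Foundation

let domainSize = 1024        // illustrative domain size, not Apple's
let flipProbability = 0.25   // illustrative; must stay below 0.5

// Encode the value as a one-hot vector, then flip each bit with
// probability `flipProbability`: zeros stay zeros slightly more often
// than they flip, and likewise for ones.
func privatize(_ value: Int) -> [Int] {
    var vector = [Int](repeating: 0, count: domainSize)
    vector[value] = 1
    return vector.map { bit in
        Double.random(in: 0..<1) < flipProbability ? 1 - bit : bit
    }
}

// Privatizing the same value twice gives two different vectors.
let first = privatize(128)
let second = privatize(128)
```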
So, this is all done locally on your device prior to sharing it with Apple. In this example, let's say I want to know how many hours people worked at Apple last week: I worked 128 hours, Jessie 48 hours, and Timmy 130 hours. We collect this from all Apple employees, and the noise is added locally on device. What Apple ends up with is a cloud of binary, noisy vectors, where individually these vectors are pretty much useless to us, but we're able to combine them, average the noise out, and learn some statistical properties.
For example, here the average number of hours people worked at Apple last week was 41 hours. The way we compute that is we sum up all these vectors and normalize, and then we look, column by column, for a statistically significant value that is larger than random.
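And a sketch of that server-side aggregation, reusing the hypothetical flipProbability from the sketch above:

```swift
// Sum the noisy vectors column by column, then undo the expected
// effect of the flips: observed ≈ true*(1-p) + (n-true)*p,
// so true ≈ (observed - n*p) / (1 - 2p). Columns whose estimate
// rises significantly above zero mark genuinely popular values.
func estimatedCounts(from reports: [[Int]], flipProbability p: Double) -> [Double] {
    guard let width = reports.first?.count else { return [] }
    let n = Double(reports.count)
    var sums = [Double](repeating: 0, count: width)
    for report in reports {
        for (column, bit) in report.enumerated() {
            sums[column] += Double(bit)
        }
    }
    return sums.map { ($0 - n * p) / (1 - 2 * p) }
}
```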
So why is this cool? We think it's really cool because it's one way to learn popular items privately. You can learn averages, the presence of certain attributes, or even frequencies such as histograms. And this is done in a way where each individual contribution is kept private from Apple; we can only learn from the aggregate.
But you might ask yourself: what happens if I'm the only one contributing data here? At some point, a single user could become an aggregate in themselves. To avoid this issue we use the concept of a privacy budget, which restricts the number of contributions that can be made by any given device during a period of time. We also don't associate any identifiers at all with this information, and we permanently delete the data from our servers. So let's have a look at an example.
Say we wanted to learn popular emojis across the people that have opted in to the diagnostics and usage data that we collect from customers. When people use their keyboards and type emojis, we observe that locally on device; then Julien here, Jessie, and Timmy contribute their data, noised up locally on device, prior to sending it to Apple. And as more and more people share their data, we're able to extract information such as which emojis are the most popular.
What's cool here is that, as one use case, we can then reorder the UI to reflect that, and we can do better predictions on device. So this year we're adopting this technology in four types of use cases. We're using it for emojis, as just discussed, and we're also using it for new words, where we learn words that are not in our local dictionary, such as slang.
We also use it for deep links, where we can see which deep links are popular within apps in order to promote them in Spotlight, and also for Lookup Hints. So this is super exciting, and we're looking forward to collecting this with privacy. But it is only collected from people who have opted in to diagnostics and usage. If you have not opted in, we're not collecting that data from you. You are in control.
So in summary what does this all mean for you? Well we think that differential privacy is a very promising technology in order to do privacy friendly data collection. We're very excited to adopt it this year on iOS and macOS to do crowd-driven insight at scale, with privacy. We don't use any identifiers, we don't collect any raw measurements.
This is yet another tool in our privacy toolbox, and we hope to use it more and more in the future to build our future features. We think that your users will care, because they will see the lengths to which we're willing to go in order to protect user privacy. So we hope this is going to encourage you to develop privacy friendly practices as well. To speak to transparency and control, I'd like to invite Jessie on stage next. Thank you. [Applause]
Hi. I'm Jessie. I'm also on the privacy engineering team. I work on privacy for products such as Apple Music, Photos, and education. So, Julien went into great detail about how you can identify your users and collect necessary data about them in order to build great features, but in a privacy friendly way.
In addition to that, it's also important to be transparent with your users about these identifiers and the data that you collect. When you are, they have a better understanding of their privacy controls within your app. And when users have that better understanding, you gain their trust and you encourage their engagement within your app. So one way that we're increasing transparency and control is with advertising in iOS 10.
Now, in News and in the App Store, users are able to see this blue Ad button on any advertisement. When they tap on it, they see more information about why they're seeing this ad. When you and other developers purchase advertisements, you pick specific targeted groups of users to show your advertisements to. Well, now users are able to scroll through About This Ad to see exactly which groups they fell into for this advertisement.
In addition, users can also go to Privacy in Settings, then Advertising, View Ad Information, and we are being completely transparent with them about all the user data we have about them to serve advertisements in News and in the App Store. This will be available in a later seed. One way that we give users control in advertising is through Limit Ad Tracking. As Julien talked about earlier, we have the advertising identifier, and you can use it to track users, for advertising purposes only, on our platform.
When users toggle the Limit Ad Tracking flag on, they are signaling to us that they do not want to be tracked for advertising purposes. In iOS 10 we're taking this a step further: when a user enables Limit Ad Tracking, the advertising identifier becomes a UUID of all zeros. It is now technically enforced in code that when a user enables Limit Ad Tracking, you cannot track them using the advertising identifier. If the user later disables Limit Ad Tracking, your app can get access to a new random identifier.
So you may be wondering: can you still show advertisements to users with Limit Ad Tracking on? Of course you can. You can show contextual ads within your app, things that make sense in the context. You just cannot track users via the advertising identifier if they have Limit Ad Tracking enabled. And remember that it is forbidden to cache the advertising identifier in your app's storage, as this does not give users proper control over resetting it.
So, what does this all mean for you? Well, there's an increase in transparency for ads served to users in News and in the App Store, and Limit Ad Tracking is now enforced in code. We know that when users are able to make these types of choices, like enabling Limit Ad Tracking, it empowers them to make the privacy decisions that are right for them. Another way that we give users the ability to make these types of decisions is through consent alerts.
In iOS, macOS, tvOS, and watchOS, users have the ability to decide which apps get access to certain protected classes of data. Protected classes of data are things such as Calendars, Contacts, Photos, and HomeKit. The first time that your app requests access to a protected class of data and calls the API, iOS shows an alert like this one. And at this point the user has the ability to decide whether your app should get access or not.
When the timing and the context of the question make sense, the user can make the choice that's right for them. We call this a just-in-time alert. I want to talk to you about all the new features we have this year that also come with privacy settings and consent alerts. The first is the media library privacy setting. This gives access to Apple Music subscriber status, the subscriber storefront, adding content to a playlist, and now reading from the media library.
We know that users' media libraries and Apple Music subscription information are extremely personal and can be identifying, so we want users to have the ability to decide which apps get access and which don't. Now, if your app got access to the media library in iOS 9.3, it will continue to have access in iOS 10.
So how do you know if you already have access? With most of the protected classes of data we have what's called an authorization status API. It has a slightly different name for each protected class of data, but this is the example for the media library. You can call this API and get back one of these enum values.
If you see notDetermined, this can happen for one of two reasons: either you haven't asked yet, or you've asked for access to this protected class of data but the user has since reset their privacy settings. If you see denied, this means that the user said Don't Allow; they do not want your app to have access to this protected class of data. And at this point you have a few options.
You can take the user down a different route in your app and show them all the features that they can use without giving you access to this protected class of data. Or, you can open up the privacy settings for your app. And at this point the user could possibly make a different decision if they so choose. Now if you see restricted, this can happen for one of two reasons.
Either one, there's a parental restriction set on this device, saying that your app cannot get access to this protected class of data, or two, this app is supervised, maybe it's in an education mode, or an ed enterprise context. And the device administrator has said that your app cannot get access to this protected class of data. And of course if you see authorized you're good to go.
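Putting those cases together, a minimal sketch with the media library APIs might look like this:

```swift
import MediaPlayer

switch MPMediaLibrary.authorizationStatus() {
case .notDetermined:
    // We haven't asked yet, or the user reset their privacy settings;
    // ask at a moment where the context makes sense.
    MPMediaLibrary.requestAuthorization { status in
        // React to the user's decision here.
    }
case .denied:
    // Route the user to the features that work without access,
    // or offer to open your app's privacy settings.
    break
case .restricted:
    // Parental restrictions or device management forbid access.
    break
case .authorized:
    break // Good to go.
@unknown default:
    break
}
```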
We have some other awesome new features that also have privacy settings. One is speech recognition. This allows your app to send audio to Apple to be transcribed into text. The reason we want users to decide which apps get access and which don't is that we want them to understand that if they grant your app access, you may be sending audio that is theirs to Apple for transcription. You can request access ahead of time with this API. We also have SiriKit. This is really exciting: users can now make requests in Siri that are completed with your app.
But in order for Siri to understand what types of requests a user might make, your app must first send up contextual information to Siri about the user and about your app. And we want users to have the ability to decide which apps send up this contextual information about them and which don't. So you need to request access ahead of time, before any Siri requests can be made, and you can do this using the requestSiriAuthorization API.
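A sketch of both requests; where exactly you place them in your flow is up to you:

```swift
import Speech
import Intents

// Speech recognition may send the user's audio to Apple's servers,
// so it needs explicit consent first.
SFSpeechRecognizer.requestAuthorization { status in
    // status: .authorized, .denied, .restricted, or .notDetermined
}

// Siri only receives your app's contextual information once the
// user approves the request.
INPreferences.requestSiriAuthorization { status in
    // Handle the INSiriAuthorizationStatus here.
}
```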
You've also probably heard about the TV provider single sign-on feature. This allows enabled television apps to request access to a user's subscription information and sign them in to the app automatically. Because this is only for enabled apps, reach out to your WWDR partner manager if you're interested in becoming one of these apps.
And everything you loved about photos on iOS is now on tvOS as well, and same with HomeKit. And both of those also have privacy settings, on tvOS. So, in addition to these awesome new features, we also have some updates to how your app requests access to certain protected classes of data.
So in 2014, we started requiring all apps that requested access to location to declare so ahead of time and to declare a purpose string. We find that users are more likely to make the choice that's right for them when they understand why an app needs access to location.
So, in iOS 10 we are extending this to all protected classes of data. If your app is accessing any of these protected classes of data, you will need to declare so ahead of time in your app's Info.plist, along with a clear and meaningful purpose string. So take a moment and look at this list.
Think: is my app accessing any of these protected classes of data? How many of your apps are accessing one? Yeah. How about two or three? Okay, well, you're going to need to declare that ahead of time in your app's Info.plist. And if you're a tvOS developer, these four protected classes of data are available for you to request on tvOS.
And you'll also need to declare that ahead of time in your app's Info.plist. Now, I'm going to quickly go through all of the purpose string key names that you'll need to put in your app's Info.plist. You don't need to write them down right now, because they'll be available in the developer documentation and in Xcode.
So, let's say you have the next best disco light app. The first time a user downloads your app they can scroll through the various features and look at what they can do. And then the first time that they go and add their first disco light, at this point, when it makes sense, your app will request access to HomeKit.
When the timing and the context of the question make sense, the user will see the consent alert appear. And at this point you have the ability to write a clear and meaningful purpose string to go in that consent alert. I'm going to show you now how you can easily add your purpose string using Xcode. You go to your app's target, the Info section, Custom iOS Target Properties, and add a new key. In this case we're going to choose Privacy - HomeKit Usage Description.
And at this point we're going to write a clear and meaningful purpose string. We're going to put: "This will allow you to create disco light shows using HomeKit." When the context and the timing make sense, the user now sees why we need access to HomeKit, and they're more likely to make the choice that's right for them.
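For reference, in the raw Info.plist source that entry is just this key/value pair (the wording of the string is up to you):

```xml
<key>NSHomeKitUsageDescription</key>
<string>This will allow you to create disco light shows using HomeKit.</string>
```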
So, what's going to happen to your app if you don't declare access ahead of time? Well, your app is going to crash. To preserve privacy, any app built on or after iOS 10 that attempts to request access to a protected class of data must declare so ahead of time in its Info.plist. If you don't, your app is going to exit; your user will just see your app exit on screen. But this is the error that you will see in the Xcode console.
It will tell you exactly which protected class of data you tried to access and which key you need to add to your Info.plist. Now, the reason we crash your app is that, in the event your app was compromised and you did not intend for it to request access to this protected class of data, we do not want your app to get access to any data you did not intend it to access. And remember, your app is responsible for any third party libraries that you build into it. So be mindful, when you choose those libraries, that they do not breach your users' privacy.
So what does this all mean for you? Well, there are some awesome new features in iOS 10 and tvOS, and they also have privacy settings. You need to declare ahead of time, in your app's Info.plist, that you're going to request access to a certain protected class of data.
And having a clear and meaningful purpose string will help users make better decisions. In addition to these awesome new features, we also have some changes to existing frameworks to help you build privacy into your app from the ground up. First I want to talk about some changes to the pasteboard.
You can now set an expiration date and a local-only flag on any values you put on the pasteboard. Setting the expiration date will make it so the value put on the pasteboard is removed at that date. And setting the local-only value to true will exclude those values from the universal clipboard. Here's an example: in this case we have a UIImage that we're going to put on the pasteboard, along with the string "Hello, world." We're going to set the expiration to 120 seconds from now.
And we're going to set the local-only value to true. So in two minutes, this value will no longer be on the pasteboard, and these values cannot be pasted using the universal clipboard. Why would you want to do this? Well, if you know for a fact that a user is copying something really sensitive, such as a password, a phone number, or an email address, and you don't think it should be on the clipboard for very long, set the expiration date. And if you don't think it's appropriate for those values to be used with the universal clipboard, set the local-only value to true.
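A sketch of that example with UIPasteboard's options dictionary; for brevity this shows just the string item from the slide:

```swift
import UIKit
import MobileCoreServices

// Put a sensitive string on the general pasteboard with a
// two-minute lifetime, excluded from the universal clipboard.
UIPasteboard.general.setItems(
    [[kUTTypeUTF8PlainText as String: "Hello, world"]],
    options: [
        .expirationDate: Date(timeIntervalSinceNow: 120),
        .localOnly: true
    ]
)
```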
We've made a couple more changes to the pasteboard. The first is that all named pasteboards now default to non-persistent, and setting the persistence of a pasteboard will cause a deprecation warning. In addition, trying to get an item off of the find pasteboard will yield an empty object with no value. And we really want to encourage you to use shared containers instead of pasteboards when you're sharing information between apps on the same device that are all within your one team; it's a much more secure way to do it than putting items on a pasteboard.
A lot of your apps also integrate with Core Spotlight, and we have a couple of things to talk to you about there as well. The first is that not every user tap should result in an NSUserActivity being published; you don't want to flood the system with a bunch of events that don't make sense for the user. Only publish really significant events that you think users will want to look back at later. And do not set eligibleForPublicIndexing to true if the data is sensitive user data.
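As a sketch, with a hypothetical activity type, that guidance might look like this:

```swift
import UIKit

// Publish an activity only for a significant event, and keep
// sensitive content out of the public index.
let activity = NSUserActivity(activityType: "com.example.view-recipe")
activity.title = "Chocolate Cake"
activity.isEligibleForSearch = true
activity.isEligibleForPublicIndexing = false  // sensitive user data stays private
activity.becomeCurrent()
```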
You also probably heard about our new support for widgets in iOS 10. When a user enables a widget, it's available on the home screen and the lock screen. Because of this, I want to talk to you about some best practices, things you can do when creating your widgets to remember that they're also available on the lock screen.
So the first is that you should evaluate the sensitivity of data you put in your widget. If you don't think it's appropriate to be on the lock screen, don't put it in your widget, or engineer a way for it to not be there when it's on the lock screen.
Make sure that the data is consistent and predictable. So when I enable your widget today, do I expect to see something similar tomorrow? How about a month from now? It's okay if it's not the same data every day, but I should understand what type of data could be there.
And remember that data with file protection type complete will not be available on the lock screen. I want to show you an example of what we're doing today with the Find My Friends widget. On the lock screen, the location is not available. We know that the location of your friends is extremely sensitive and should not be available on the lock screen.
So we have a string that says "unlock to view location," and the location is not there. Then, once I use Touch ID to unlock my device, we show the location. Consider doing something like this if you think that some of the data in your widget could be sensitive.
So, Julien and I talked to you about a lot of different things today. We started out by talking about identifiers, the real root of how you identify your users within your app, and how you should favor short-lived and OS-provided identifiers. We talked about how, when you're collecting data to build your features, you should do it in privacy friendly ways, such as bucketing, sampling, or using new technology like differential privacy. And when you give users transparency and control over those identifiers and that data, you gain their trust and you encourage engagement with your app.
And that when you let them make decisions about their privacy this empowers them to make good privacy decisions that are right for them. And of course we showed you some awesome new features and some great tools to help you build privacy into your app from the ground up. For more information about our talk you can visit this web page. Here are some related sessions you might want to check out this week. Thank you.
[ Applause ]