Fundamentals of Data Synchronization - WWDC 2004

Application • 54:45

Learn about the new data synchronization services in Mac OS X. Sync Services make it easy to synchronize your application's data between computers, and with other applications. This session will introduce the fundamental concepts of synchronization and introduce the Sync Services architecture and API set so you can begin to incorporate synchronization into your applications.

Speaker: Toby Paterson

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

development team. Two years ago, we introduced iSync. iSync was an end user application that let you synchronize your contacts, your calendars, with a phone, a PDA, an iPod. With .Mac, you could synchronize your contacts and your calendars to other computers. And we soon introduced Safari bookmarks. Today, it's my pleasure to be able to tell you about Sync Services, with which you can add sync to your applications.

That kind of begs the question, why would I want to sync? Well, for those of us with more than one computer, it's a great way of never having to worry about, did I leave the changes on this computer or that computer or over here? Portable devices are becoming more and more powerful, and so it's convenient to be able to have my information available on my phone, or my Palm, or wherever.

But there's another reason too, and it's not one that's immediately obvious when you think about synchronization. People have got different applications that kind of do the same thing, but some apps are better suited to one particular kind of task than another. Sync can be used to share data between different applications.

What we're going to talk about today is I'm going to tell you what Sync Services is, going to get into the architecture, a bit of the data model, and just give you a roadmap to the APIs. I'm not going to get too deeply into what the APIs are.

We've got a lot of great documentation in the SDK that's on the Tiger DVD that you've got with you in your bags, and also available on the web. There's a second session that directly follows this one that's going to be a much more hands-on oriented approach into getting actually into how you write, how you incorporate Sync Services into your application.

So let's jump in and answer the question, what is Sync Services? It's a system service built directly into Tiger for data synchronization. The basic gist is, you provide us with access to your data, and we do the rest. We take care of all of the syncing and all of the sync smarts for that. It's a single solution for everyone. Today, if you want to get your contacts from your phone onto a Palm or some other PDA, you often need two, three, sometimes more different solutions.

We want to provide one solution for everyone. The problem with the existing solutions today is that they operate outside of your application. And this leads to a number of different problems. There's the multiple writers issue, meaning if you've got your application and there's a separate sync process, both trying to access your data store, you have to solve the problem of multi-process concurrent access to your data.

And that's a hard problem to solve. If you're going for a file sync solution, you want to keep your Word documents in sync with your laptop so you get some kind of file sync solution. There's a granularity issue there. What happens if you change the document in both places? What happens if you change just part of the document on one and part of the document on the other?

It'd be great if you could merge those two. Proprietary formats. If you're a device developer and you want to sync your device to some third-party application, you have to go in, reverse engineer their format, and figure out how they store their things. And that doesn't really create a lasting solution there.

What we're doing with Sync Services is giving you the ability to build sync directly into your application. You focus on syncing your app, and we'll take care of the rest, getting it to interoperate with other applications, other devices, syncing between computers. Sync's not just about applications. It's for devices, too.

We've worked a lot with devices with various kinds of memory limitations, devices that can only store a certain number of records, devices that can only store fields of a certain length. We've got some good solutions to help you manage that filtering, that formatting. You provide the application to manage and configure your device. You incorporate Sync into that application, and it's the same way. You sync the same way that you would any other kind of application.

You synchronize your devices the same way you synchronize your applications. We're going for one solution for everyone here. I want to talk a little bit about the design goals because if you understand how, what we were trying to do when we set out to accomplish things, I think it will really help you get into the mode of what we were trying to do. We started with a really simple precept. Your data, where you want it, when you want it.

And from that we derived three basic design goals. Syncs should be decoupled. Syncs should be extensible. And syncs should be invisible. Let's jump into what these mean a little bit more. Syncs should be decoupled. Now there's two things that we mean by this. Applications and devices should be able to sync independently of each other.

Sometimes when I want to sync with .Mac, my phone's not turned on. Later, when my phone's available, I don't have a network connection. And yet I still want my changes to flow back and forth between the two of them. So we want to be able to synchronize these things independently of each other. Now that being said, we also want to try to synchronize these things at the same time. If I'm syncing two devices independently, I have to do three syncs.

One to get the changes from, say, my phone into my computer, one to get them from my computer into Address Book and Address Book's changes back into my computer, and then I have to sync my phone again to get the changes back onto the phone. So we want a way to be able to try and sync both of these things at the same time. It reduces the use of system resources. It makes things go faster. It's easier all around.

The second kind of thing, the second kind of decoupling that we're talking about is that schemas are independent of applications and devices. The bookmark data model is not owned by Safari, it's not owned by Mozilla. Address Book does not own the contact schema. There's a clear separation between the two there.

The data model needs to be extensible. We provide some standard schemas for contacts, for bookmarks, for calendar entries. If you want to be able to exchange data between different applications and devices, you need to conform to that schema. At the same time, we want you to be able to achieve perfect synchronization.

We're not going to be able to think of everything that you're going to be able to synchronize in your application. We can't think of that ahead of time. And so you need a way to extend the standard data types to add your fields. You need to be able to add new data types if you want to synchronize photos, for example.

And the most important point is that syncs should be invisible. The user should not have to explicitly think about synchronization. The idea is that changes should just flow back and forth naturally between the two. To that end, we put your application in control. You sync when you want to sync. Now, it's important that you try and maintain application responsiveness during this. Sometimes, syncs can take a while. A device might be slow in responding. There might be a lot of changes going on. You don't want to give users the spinning beach ball of death there.

The model that we've come up with is what we call trickle syncing. There's nothing that we do to provide trickle syncing. It's a way that you write your applications to synchronize. The idea is that we do lots of frequent syncs. The more we sync, the fewer changes we do on each sync. The fewer changes we have to do on each sync, the faster the syncs go. The more frequently you can sync, and so on and so forth.

It creates kind of a virtuous cycle there. So let's come back to the question of what is Sync Services and answer it again from a bit more of a geek perspective. It's a public framework. This framework provides administration. You register to synchronize. You register your schemas. It provides an API with which you can give changes to the Sync Engine. The Sync Engine processes those changes and gives changes back to you.

It's a daemon. The Sync Engine runs inside the Sync Server there. It's a field level differencing engine, and I'll come back to that point a little bit later to explain exactly what we mean by that. But we're also coordinating syncs between multiple applications, devices, servers, and the Sync Server takes care of coordinating all of those. And finally, Sync Services is a UI.

We provide a standard UI for conflict resolution. We provide an airbag panel so that the user can protect their data from rogue devices. We provide an API for you to reset a specific device or an application. What I'm going to be focusing on in this talk is mostly the Sync Server and the Sync Framework there.

So let's get into some of the details of Sync. There's three things I'm going to cover here. We're going to talk about schemas. We're going to talk about clients. And then we're actually going to get into the meat of how synchronization works. So what can you synchronize? I don't know how many of you went to the Core Data talk a couple of days ago, but for those of you who were there, I'm going to do a quick recap over what we can synchronize. We use the entity relationship model. This is the same model used by the Core Data framework, and it's sort of based on the precept that if you can save it, you can sync it. On top of that, we've added some extras just for sync.

Now, the entity relationship model is an industry standard way of decomposing data. An entity describes a single, discrete thing, a contact, a phone number, a bookmark, an MP3, etc. An entity has a name. In our case, the name must be unique across the whole space. There's a global naming space for entities there. So we recommend using a DNS-style syntax, com.apple.contacts.phone.number, for example, to avoid or to minimize the risk of collisions.

Entities are composed of properties. A property is an attribute which describes a single characteristic of the entity, a first name, a last name, something along those lines. Attributes are strongly typed. You see on the monitor here the list of types that we support. It's a fairly rich set, and we can look at adding new types in the future.

The basic types: strings, numbers, data, date, etc. We've got some aggregate types. You can create arrays of these primitive types. You can create dictionaries of these primitive types. And we have an enum type. An enum is basically a string with a fixed set of values that are allowed. And the engine will enforce a certain level of data typing on this.

Relationships are very interesting. Often data by itself is not all that useful. What's interesting are the roles, the relationships between the data. A relationship describes a particular role between two entities. For example, a contact has one or more phone numbers. A bookmark has a parent. So a relationship is directional.

There are two kinds of ordinality for a relationship. You can have a to-one relationship, or you can have a to-many relationship. A to-one relationship is a one-to-one association between two entities. A bookmark has a single parent. In a to-many relationship, you've got multiple targets. A contact can have multiple phone numbers, for example.

There's the concept also of an inverse relationship. If you have a relationship from a bookmark to its parent folder, there's also going to be an interesting relationship from a folder down to the bookmarks that it contains. And we'll come a little bit later into why that's important to maintain. So, what happens when an entity is deleted?

This is, again, where relationships come into play. When you delete a particular entity, we find all of the relationships that point to that entity. And we nullify them out. You may also optionally say, "When this entity is deleted, I want you to traverse through this relationship and delete all of the guys that he points to." That's what we call a cascading delete rule.
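The two delete behaviors described here can be modeled in a few lines. This is a toy Python sketch, not the Sync Services API: the store layout, the `delete_record` function, and its `cascade` argument are all invented for illustration.

```python
def delete_record(store, record_id, cascade=()):
    """Delete a record from a toy store.

    store maps record id -> {relationship name: [target ids]}.
    cascade lists the relationship names on the deleted record that
    should be followed (a hypothetical stand-in for a cascading delete rule).
    """
    record = store.pop(record_id, None)
    if record is None:
        return
    # Default rule: nullify every relationship that pointed at the deleted record.
    for other in store.values():
        for name in list(other):
            other[name] = [t for t in other[name] if t != record_id]
    # Optional rule: follow the named relationships and delete their targets too.
    for name in cascade:
        for target in record.get(name, []):
            delete_record(store, target, cascade)


bookmarks = {
    "folder": {"children": ["bm1", "bm2"]},
    "bm1": {"parent": ["folder"]},
    "bm2": {"parent": ["folder"]},
}
delete_record(bookmarks, "bm1")  # nullify only: the folder forgets bm1
```

Calling `delete_record(bookmarks, "folder", cascade=("children",))` afterwards would remove the folder and every bookmark it still contains.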

Some of the extra stuff that we've added for Sync is the notion of an identity property. When you add a new record into the Sync Services engine there, we want to try and match that record up against existing records already provided by other clients. Otherwise, we'll end up with duplicates. In the case of a contact, we want to find a contact that looks the same as this guy. And we use the identity properties for that kind of matching.

For a contact, your identity properties will most likely be the first name and the last name. Any record in the database that has the same first name and last name is probably going to be the same record. And so the Sync Engine will merge those two records together.

An identity property can be a relationship or an attribute. For example, a phone number is bound to a specific contact. You don't want to match the phone number from one person to the phone number of another person.

Even if they're the same phone number, they might be two roommates. And when one roommate moves away and you change the phone number, you don't want the other person's phone number to change at the same time. If you put the to-contact relationship into the identity set there, what it's saying is that we will limit the set of records we look at to all of the records matched on that relationship.
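A small sketch can make the roommate example concrete. This is illustrative Python, not the engine's actual matching code: `find_match`, the record layout, and the key names `contact` and `value` are assumptions made up for the example.

```python
def find_match(new_record, existing, identity_keys):
    """Return the first existing record that agrees with new_record on
    every identity property, or None if nothing matches."""
    for candidate in existing:
        if all(candidate.get(k) == new_record.get(k) for k in identity_keys):
            return candidate
    return None


truth_numbers = [
    {"contact": "alice", "value": "555-0100"},
    {"contact": "bob", "value": "555-0100"},  # roommate with the same number
]
incoming = {"contact": "bob", "value": "555-0100"}

# With the contact relationship in the identity set, only Bob's record can match.
match = find_match(incoming, truth_numbers, identity_keys=("contact", "value"))
```

If the identity set held only `value`, the same call would match Alice's record first, which is exactly the cross-person match the talk warns about.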

There's the notion of dependent properties. For example, let's say that I have a calendar event. He's got a start date, and he also has a bit specifying whether or not he's an all-day event. These are two separate fields, and so I don't want to try and merge them together in the sync engine, but there is a semantic relationship between the two.

If I change the start date on one event, and I change the fact that it's an all-day event on another--on a different client, I want the sync engine to generate a conflict for that. By putting those two in the dependent properties set there, by marking them as dependent properties, the sync engine can catch and generate conflicts for those, even though they're two different fields.
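The calendar example can be sketched as a conflict check over two clients' change sets. This is a simplified model in Python, assuming each client reports a dictionary of changed fields; `find_conflicts` and the field names are hypothetical and do not come from the Sync Services API.

```python
def find_conflicts(changes_a, changes_b, dependent_sets):
    """Return the property names that conflict between two clients' changes.

    changes_a and changes_b map property name -> new value.
    dependent_sets is a list of sets of property names that must be
    treated as a unit (a stand-in for dependent properties).
    """
    conflicts = set()
    # Ordinary case: the same field changed to different values on both sides.
    for name in set(changes_a) & set(changes_b):
        if changes_a[name] != changes_b[name]:
            conflicts.add(name)
    # Dependent case: different fields changed, but they belong to one group.
    for group in dependent_sets:
        touched_a = group & set(changes_a)
        touched_b = group & set(changes_b)
        if touched_a and touched_b and touched_a != touched_b:
            conflicts |= touched_a | touched_b
    return conflicts


client_one = {"start date": "2004-07-01"}
client_two = {"all day": True}
dependent = [{"start date", "all day"}]
conflicts = find_conflicts(client_one, client_two, dependent)
```

Without the dependent set, the two edits touch different fields and would merge silently.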

I mentioned before the ability to extend existing entities to add your own fields, to add your own attributes. These are what we call entity extensions. You can even create new relationships on an existing entity to another entity or to a brand new entity. But be very careful that you don't change the fundamental properties of the entity that you're extending. If you add a new cascading delete rule or something like that, you may end up causing bugs in some other clients that were depending on the original behavior.

Finally, we've introduced the notion of a data class. A data class is just an informal association of entities, an informal grouping of entities. There tends to be a lot of entities that are going to fall out in your schema. You've got contacts, phone numbers, URLs, AIM addresses, all of these kinds of things.

And yet, in the mind of the user, what they're thinking about for all of those things is the notion of a contact. And so the data class gives you a way of presenting a user-friendly name to your collection of classes there. The Sync Engine itself doesn't actually use data classes for anything intrinsic to the Sync operations.

Your schema is described in a file. This file contains the description of all of your entities, your extensions, the attributes on the entities, the relationships. It's a standard plist file. The format's well documented in our documentation, and it's contained in a Sync Schema bundle. That bundle can be located anywhere. You can include it in a framework. You can include it in your app wrapper. You can put it in a standard system location.

The key point to remember here is that your schema is decoupled from your application. Even if you're writing your application, your schema, and that's all that's going to be syncing that schema, in the eyes of the sync engine, the two are not related to each other. Your app does not own the schema.

We provide three standard schemas for contacts, calendars, and bookmarks, and those are located in /System/Library/SyncServices/Schemas. And I encourage you to go in, open up the schema bundle, find the plist, and have a look through it to see the standard formats. Unfortunately, we don't have any documentation on that yet, but we hope to be addressing that at some point in the near future.

So let's come back again to the what can you synchronize question. We've talked about entities, relationships, attributes. This defines the data model that you can synchronize. What you actually synchronize are records. A record is the basic unit of exchange. In terms of the API, it's expressed as an NSDictionary. The keys are the attribute and relationship names. The values are the types that correspond to the associated attribute or relationship.

A record has an identifier, and that identifier must be unique across your entire entity space. If you have a contact entity and you have a phone number entity, the record identifiers must be unique across both. And this is where we differ a little bit from your standard database terminology.

One key point is that your record dictionary must contain an entity name. You have to tell us what kind of entity your record is. The Sync Engine depends on being able to know what kind of record it is so it can do the right kind of matching, the right things with the field values.
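As a rough picture of what such a record looks like, here is a Python dictionary standing in for the NSDictionary, with a check that it names its entity. The key `entity_name`, the `com.example.contact` entity, and `validate_record` are placeholders invented for this sketch; the real key and schema format are defined by the Sync Services documentation.

```python
ENTITY_NAME_KEY = "entity_name"  # hypothetical key, not the real constant


def validate_record(record, schema):
    """Check a record against a toy schema (entity name -> set of property names)
    and return the entity name."""
    entity = record.get(ENTITY_NAME_KEY)
    if entity is None:
        raise ValueError("record must name its entity")
    if entity not in schema:
        raise ValueError(f"unknown entity: {entity}")
    unknown = set(record) - schema[entity] - {ENTITY_NAME_KEY}
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    return entity


schema = {"com.example.contact": {"first name", "last name", "phone numbers"}}
contact = {
    ENTITY_NAME_KEY: "com.example.contact",
    "first name": "Ann",
    "last name": "Lee",
    "phone numbers": ["p1", "p2"],  # a to-many relationship holds record ids
}
entity = validate_record(contact, schema)
```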

So we've talked a lot about applications, devices. We sort of mentioned that the Sync Engine is a little bit agnostic. It doesn't much care whether your client is an application or a device. And so we came up with this generic name for them. We call them Sync Clients. A Sync Client has a unique identifier. That is how your client is identified to Sync Services. You can also give it a user-friendly display name and an image for display in some sync UI.

There's a one-to-one correspondence generally between a client and a data source. In the case of an application, the association is pretty clear. But in the case of a device, you've got to think that I'm writing a client for a specific kind of device, but the user may end up with multiple copies, with multiple kinds of devices. I've got many different kinds of phones. I've got a couple of PDAs, two iPods. Each of those corresponds to a unique client that you register with the Sync Engine.

A client description file provides a template description for your client. It generally contains just the static details. The type of your client, is it an application or a device? The list of entities that you synchronize, and the specific properties on those entities that you synchronize. That's important to note.

Just because an entity defines a set of fields doesn't mean that your client is going to want to synchronize all of those fields. And so you can specify, I'm only interested in synchronizing, say, the first name and the last name on the contact entity. Some clients only push changes. Some clients only pull changes.

The iPod is basically a pull-only device. You only pull information onto the iPod. The iPod is never going to change that information. By specifying that in the client description file, you can help the Sync Engine optimize some of its processes. Now, a lot of the times there's going to be a one-to-one association between the client and the client description.

And so you can also include a display name and an image directly in the client description file. And if you're writing an application, that's probably what you're going to be using most of the time. You can also specify that information dynamically using the Sync Services API. So when a phone is registered, when the user decides, I want to synchronize a phone, your client can figure out what kind of phone it is and register the appropriate name and image for that phone.

Let's get now into the actual meat of how Sync works. There's basically five phases of Sync. I'm going to cover them briefly here just so we have a framework for the conversation, and we'll start diving into the nitty gritty. The first thing you do is you create a Sync session. You then negotiate how you're going to sync.

You push your changes into Sync Services. We process all of those changes, and then you pull what changes are due to you back out of Sync Services. Now, there's a couple of things that you have to know first. And what I'm going to cover here is the truth database. The Truth database is an aggregate of all of the information from all of the clients.

If you have a client that is synchronizing contacts and he pushes in just the first name and the last name, you've got another client who's synchronizing the first name, the last name, and the company name. What we store in the truth is the aggregate of all of those fields, the first name, the last name, and the company name.
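The aggregation can be pictured as a per-record union of fields. This is a toy Python model, assuming both clients' records have already been matched to the same truth record; `merge_into_truth` and the dictionary layout are illustrative only.

```python
def merge_into_truth(truth, record_id, pushed_fields):
    """Merge one client's fields into the truth record, keeping fields
    that other clients contributed earlier."""
    merged = truth.setdefault(record_id, {})
    merged.update(pushed_fields)
    return merged


truth = {}
# First client syncs only first and last name.
merge_into_truth(truth, "t1", {"first name": "Ann", "last name": "Lee"})
# Second client also syncs the company name.
merge_into_truth(truth, "t1", {"first name": "Ann", "last name": "Lee",
                               "company name": "Acme"})
```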

The truth is what the clients sync to, not with each other. If you remember I mentioned earlier that clients are decoupled from each other, and this is how we accomplish it. A client can sync into the truth, another client can come along and sync into the truth, and then they can pull their changes directly out of the truth.

Now we are storing a copy of the data here, and that's worth keeping in mind. If you're going to be synchronizing photos, if you're going to be synchronizing large data files, or things like that, you probably don't want to push that information into the truth. Because then you're going to end up with multi-gigabytes of data lying around on the user's disk. We're going to come up with a solution for that at some point in the future.

The Client State Database. What this contains is a snapshot of all of the records on a device. We need this information for a couple of reasons. What we do when we know a record is on a device, when the device gives a record to us, or when we push a record to the client, we store a copy of that record in the client state. The reason we do this is so that on the next sync, if the client gives that record back to us, we can pull what we knew was on the client before out of the client state and compare the two of them.

From that, we can figure out has this record changed at all? We can figure out specifically what fields on that record have changed. What we push into the sync server are just the field level differences. If you change just the first name, we're not going to push the whole record across. We're going to push just the first name across into the mingler.

The other place where this is used is when we're formatting records. I mentioned earlier that some devices have limitations on the lengths of the fields that they can store. A phone may truncate names at 20 characters, for example. So if we take a really long name and we push it onto the phone, the phone's going to truncate it.

If the phone then gives that record back to us, what we would do is we would look at the shortened name, we'd compare it to the longer name in the client state, we'd say, "Hey, this has changed," and we'd end up propagating the truncated name everywhere. And that would make people generally pretty unhappy.

So what we do is we store in the client state the formatted record. We're going to store the truncated name in the database there, in the client state, so that when the client gives that record back to us, we'll compare the two fields, we'll say, "Those are the same. It hasn't changed." Unless, of course, the user has actually physically changed the name on the device.

Now you're probably thinking, "Oh great, they're storing yet another copy of all of my photos and my contacts and the things." Well, it's not that bad actually. What we store in the client state is really just a hash of the information that we push to the device. Just enough information so that we can do that comparison successfully.
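The truncation case can be sketched as follows. This is a simplified Python model: the 20-character limit comes from the talk's example, while `format_for_device`, `fingerprint`, `has_changed`, the choice of SHA-256, and truncating every string field are assumptions made for illustration.

```python
import hashlib

MAX_FIELD_LENGTH = 20  # the phone limit used as an example in the talk


def format_for_device(record):
    """Truncate string fields the way the hypothetical phone would."""
    return {k: v[:MAX_FIELD_LENGTH] if isinstance(v, str) else v
            for k, v in record.items()}


def fingerprint(record):
    """Hash a record so it can be compared later without storing a full copy."""
    canonical = repr(sorted(record.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(client_state, record_id, record_from_device):
    return client_state.get(record_id) != fingerprint(record_from_device)


truth_record = {"first name": "Bartholomew-Maximilian Alexander"}
sent = format_for_device(truth_record)
client_state = {"rec1": fingerprint(sent)}  # remember the formatted record

# The phone hands back the truncated name: no change is detected.
unchanged = not has_changed(client_state, "rec1", dict(sent))
```

Because the fingerprint is taken from the formatted record, the truncated name coming back from the phone is not mistaken for an edit, while a name the user really changed on the device still is.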

One important thing to understand is that the record identifiers are scoped to a particular name space. Each client has its own name space. The truth database has a name space, and there is no correlation whatsoever between any of these name spaces. So if a client has a record called foo, another client may have a record called foo, and those can be two completely different records. There is no association between the two of them there.

So putting everything together, the way things work is this. A client is going to take a record, give it to Sync Services. Sync Services is going to pull the record out of the client state and compare them. If the record isn't in the client state, then we know that this must be a new record, and we push what we call an add into the Sync Server.

If the record exists in the client state, we compare the two and push just the field level differences. The Sync Server processes those changes, merges them into the truth, and clients then pull all of the changes out of the truth.
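The add-versus-modify decision can be sketched like this. It is a toy Python model, not the engine's code: it keeps full snapshots in the client state for readability, and `diff_record` and its return values are invented names.

```python
_MISSING = object()


def diff_record(snapshot, current):
    """Compare a pushed record with what the client state remembers.

    Returns ("add", all fields) when there is no snapshot,
    ("modify", changed fields only) when some fields differ,
    and ("unchanged", {}) otherwise.
    """
    if snapshot is None:
        return "add", dict(current)
    changes = {
        key: value
        for key, value in current.items()
        if snapshot.get(key, _MISSING) != value
    }
    # Fields that disappeared from the record are reported as cleared.
    for key in snapshot:
        if key not in current:
            changes[key] = None
    return ("modify", changes) if changes else ("unchanged", {})


client_state = {"c1": {"first name": "Ann", "last name": "Lee"}}
pushed = {"first name": "Anne", "last name": "Lee"}
kind, changes = diff_record(client_state.get("c1"), pushed)
```

Only `first name` ends up in `changes`, which mirrors the point that a one-field edit does not push the whole record across.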

So, coming back to the start of the process, creating a sync session. The first thing you have to understand is you may not be allowed to synchronize, and there's many reasons for this. It could be that some other client is already synchronizing. Now, because the sync server is writing into a common database, we can't allow multiple people to all synchronize at the same time. We need to maintain a certain state of integrity of the truth database there, and so we can only process one set of clients at a time. So, if a client is already in the middle of synchronizing, other clients must wait until that guy has finished.

It's important to be able to maintain application responsiveness throughout this. So we provide blocking APIs for convenience, but we also provide non-blocking APIs so that you can basically make a request to Sync Services: "I'd like to start a sync session now, please." Generally, you'll probably be able to go straight away, but if you can't, we'll call you back when you can go.

Now that being said, I also mentioned earlier that we want to synchronize clients simultaneously. This is not a contradiction. What Sync Services provides is the notion of a sync alert. When you register your client, you can specify the kinds of clients that you want to synchronize with. Address Book pretty much wants to synchronize with anything.

So it says, "I'll synchronize with apps, I'll sync with devices, I'll sync with servers." A server would probably only synchronize when other servers are synchronizing. A phone would synchronize when a server is syncing, or when another phone is syncing. So you specify at registration time who you want to sync with. When one of those guys then starts syncing, Sync Services delivers an advisory notice to the clients that have registered an interest. This is an advisory notice only. You're free to ignore it if you're not ready to sync. We definitely encourage you to sync if you can.
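The registration and alert idea can be modeled with a small lookup. This is an illustrative Python sketch; the registry layout, the client ids, and `clients_to_alert` are made up and are not how Sync Services exposes sync alerts.

```python
def clients_to_alert(registrations, syncing_client):
    """Return the ids of clients that asked to be told when a client of
    the syncing client's type starts to sync.

    registrations maps client id -> (own type, set of types it syncs alongside).
    """
    syncing_type = registrations[syncing_client][0]
    return sorted(
        client_id
        for client_id, (_own_type, interests) in registrations.items()
        if client_id != syncing_client and syncing_type in interests
    )


registrations = {
    "address_book": ("application", {"application", "device", "server"}),
    "dotmac": ("server", {"server"}),
    "phone": ("device", {"server", "device"}),
}
alerted = clients_to_alert(registrations, "dotmac")
```

Here a server sync alerts both Address Book and the phone, while a phone sync alerts only Address Book. Each alerted client remains free to ignore the notice.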

There are two ways that the notice can be delivered. We can launch a tool that you've registered. We specify on the command line to that tool the ID of the client that's being synchronized and the list of entities that are being synchronized with that client.

Alternatively, you can register a callback directly in your application, an object and a selector, and we will invoke that selector saying, "Hey, now's a good time to sync if you like." If you don't want to sync, simply return without doing anything, and we'll pass you by this time. You can always sync later.

Now, why would you want to choose one method over another? Something like a server or a device is probably going to register a tool to actually do the synchronization. They don't necessarily have any multiple writers issues to worry about or anything like that. When they want to synchronize, we just launch the tool, and all of the logic's embedded in that tool. An application like iCal, on the other hand, when it launches, will register a dynamic callback. While iCal is running, we can call that callback to tell iCal to synchronize. When iCal quits, the callback is deregistered implicitly, and iCal won't sync anymore until the next time it launches.

After you've successfully created your sync session, you go through and you negotiate the sync modes. Now there are four basic sync modes that we need to talk about here. The first of these is fast syncing. Fast syncing is the preferred mode of synchronization. When you're fast syncing, you're basically just telling the engine what changes have happened since the last time you synchronized.

You tell the engine, "These are the records that were added since I last synced. These are the records that were modified since I last synced. These records have been deleted since I last synchronized." That kind of implies that you can maintain all of that state information, and not all applications, not all devices are set up to do that. Sometimes, even when you can maintain that information, you may not trust it. If you synchronized a device with another machine, that information may be out of date.

In that case, you will want to slow sync. When you slow sync, you basically give all of your records to Sync Services, and we figure out what's changed. You remember in the client state, we store a complete copy of all of the records that we knew to be on your device or in your application the last time we synchronized.

When you give us all of the records, we basically go through and we check off the records in the client store one by one. Anything left in the client store afterwards is a record that used to be in your client but isn't anymore, and we will generate a delete for those records. And that's a very important point to keep in mind. When you slow sync, you tell us about everything, we figure out what the changes are, and delete the records that you didn't tell us about anymore.
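The check-off procedure can be sketched as follows. This is a toy Python model that keeps full records in the client state; `slow_sync` and its return shape are assumptions for illustration.

```python
def slow_sync(client_state, all_records):
    """Work out adds, modifies, and deletes from a full dump of client records.

    client_state maps record id -> the record as of the last sync.
    all_records maps record id -> the record the client has now.
    """
    adds, modifies = {}, {}
    unseen = set(client_state)  # everything we expect the client to mention
    for record_id, record in all_records.items():
        if record_id not in client_state:
            adds[record_id] = record
        else:
            unseen.discard(record_id)  # check this record off
            if client_state[record_id] != record:
                modifies[record_id] = record
    # Whatever was never checked off is no longer on the client.
    deletes = sorted(unseen)
    return adds, modifies, deletes


client_state = {"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}}
now_on_client = {"a": {"n": 1}, "b": {"n": 20}, "d": {"n": 4}}
adds, modifies, deletes = slow_sync(client_state, now_on_client)
```

Passing an empty `all_records` yields a delete for every remembered record, which is the hazard a slow sync carries after a device reset.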

Sometimes bad things happen. A device can be reset. The user can accidentally delete your data store. If you were to slow sync at that point, what would happen is this. We knew you had all of these records before. Now you tell us you've got nothing. You must have deleted everything. So we delete everything in the truth. .Mac synchronizes. We delete everything on .Mac. By the time you get home, all your data's gone. I'm sure this has happened to some of you in the past.

In this case, what you want to do is do a refresh sync. When you do a refresh sync, we throw away everything in the client store. We forget everything we ever knew about you, and we go through this process of rediscovery. You give us all of your records. We're going to pass those into the sync server. We're going to let him figure it out. He's going to take each of those records, compare them to existing records in the truth to try to find a match.

No deletes will be generated, but what will happen is anything in the truth that you didn't give us is going to come back to you at that point. So if your data store has been reset, if your device has been erased, you need to be able to tell us that so that we can do a refresh sync.
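A refresh sync can be modeled in the same toy style. The talk does not say how records are matched against the truth, so matching on first and last name below is purely an assumption for illustration, as are the function name and data layout.

```python
def refresh_sync(truth, pushed_records, identity_keys=("first", "last")):
    """Rediscover a client: match its records against the truth, never delete.

    Matching on identity_keys is an assumed rule for this sketch only.
    Returns (matched, new_records, to_send_back).
    """
    def identity(record):
        return tuple(record.get(key) for key in identity_keys)

    truth_by_identity = {identity(rec): truth_id for truth_id, rec in truth.items()}
    matched, new_records = {}, {}
    for client_id, record in pushed_records.items():
        truth_id = truth_by_identity.get(identity(record))
        if truth_id is not None:
            matched[client_id] = truth_id        # same record, already known
        else:
            new_records[client_id] = record      # genuinely new to the truth
    # Truth records the client did not give us come back to it; none are deleted.
    unmatched = set(truth) - set(matched.values())
    to_send_back = {truth_id: truth[truth_id] for truth_id in sorted(unmatched)}
    return matched, new_records, to_send_back


truth = {
    "t1": {"first": "Ann", "last": "Lee"},
    "t2": {"first": "Bob", "last": "Ray"},
}
pushed = {
    "c1": {"first": "Ann", "last": "Lee"},
    "c9": {"first": "Dee", "last": "Fox"},
}
matched, new_records, to_send_back = refresh_sync(truth, pushed)
after_reset = refresh_sync(truth, {})   # an erased device pushes nothing
```

In the erased-device case, nothing is matched and nothing is deleted; the whole truth is simply queued to go back to the client.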

We also have this notion of pushing and pulling the truth. There are times where a user just wants to erase everything on a device or an application or a computer and say, replace it with the contents of this computer. I've got all of these contacts in Address Book. I know they're in a good state.

I want all of those on my phone. That's a mode that we call pulling the truth. What happens when you pull the truth is we expect you to delete all of the records in your client's data store and replace them with the records that Sync Services gives you.

The converse to this is pushing the truth. When you push the truth, what you're saying is, I've got a known good state in this specific client here that I want to replicate everywhere, through .Mac to all my other computers, to all of my other devices, and to all of my other applications. When one client is pushing the truth, Sync Services will tell all other clients to pull the truth. This is a very destructive operation. So you only want the user to initiate this operation. Clients themselves should never try to push the truth.

Now that being said, none of these sync modes are actually reflected directly in the API. Instead, what we did was we looked at these and said there's a lot of commonality between all of these different sync modes. When you're slow syncing or refresh syncing or pushing the truth, we want you to push all of your records out into Sync Services. In some cases, when you're pulling the truth, we don't want you to push any records. Sometimes we don't want you to pull any records at all.

And so what we've done in the API is we've focused in on those specific actions, and we've oriented our API around those actions. So don't be surprised if you go looking in the API and you don't see fast sync or slow sync or refresh sync mentioned anywhere. It's the concepts that are important.

So let's come back to pushing changes now. You've got a choice when you push your changes. First thing you're going to do is you're going to ask Sync Services, should I push all of my records, or can we fast sync here? If you can fast sync, then we only want you to tell us about the records that have been added, modified, or removed since the last time you synchronized. You've got a choice here too. You can do the hard work. If you know what specific fields have changed, you can tell us. We just changed the first name on this guy, we deleted this record, and we changed the company name on this guy.

Alternatively, if you don't want to go to all of that extra effort, you can just give us the whole record, and we'll figure out what's happened by pulling the information out of the client state there. We're going to package those things up and push the field level differences over to the Sync Server.
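Working out field-level differences from a whole record amounts to comparing it with the copy held in the client state. A minimal sketch, assuming records are plain dictionaries and using `None` to mark a removed field (both are conventions invented for this example):

```python
def field_deltas(previous, current):
    """Return {field: new_value} for every field that differs.

    A value of None means the field was removed from the record.
    """
    changes = {}
    for field in previous.keys() | current.keys():
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = new
    return changes


stored = {"first": "Jon", "last": "Doe", "company": "Acme"}
pushed = {"first": "John", "last": "Doe"}
changes = field_deltas(stored, pushed)
```

Only the first name and the dropped company field end up in `changes`; the unchanged last name is not sent on.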

So what happens if something goes wrong right now? You're in the middle of pushing all of your changes and the device runs out of battery, or your application crashes, or God forbid, Sync Services crashes and takes you down with it. What happens at this point? When you first start pushing changes, we create an implicit transaction scope.

All of the changes that you push are going to fall into that transaction scope, which is closed when you tell us, "I'm done. I've got no more changes for you." And we ship the whole thing off to the Sync Server. If something goes wrong in the middle of that transaction scope, we're going to unwind the whole thing. We're going to roll them back. We're going to forget all of the changes you made.
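The all-or-nothing behavior of the push can be pictured as a transaction that only hands its contents over when it closes cleanly. The class below is a hypothetical model, not how Sync Services is implemented; `PushTransaction` and `server_inbox` are names made up for the sketch.

```python
class PushTransaction:
    """Collects pushed changes; they reach the server only if the push finishes."""

    def __init__(self, server_inbox):
        self.server_inbox = server_inbox
        self.pending = []

    def __enter__(self):
        return self

    def push(self, change):
        self.pending.append(change)

    def __exit__(self, exc_type, exc, traceback):
        if exc_type is None:
            # The client said "I'm done": ship the whole batch at once.
            self.server_inbox.extend(self.pending)
        # On failure the pending changes are simply forgotten.
        self.pending = []
        return False   # let any error propagate to the caller


inbox = []
try:
    with PushTransaction(inbox) as txn:
        txn.push(("modify", "1", {"first": "John"}))
        raise ConnectionError("device ran out of battery")
except ConnectionError:
    pass
after_failure = list(inbox)   # nothing from the failed push survives

with PushTransaction(inbox) as txn:
    txn.push(("modify", "1", {"first": "John"}))
    txn.push(("delete", "3", None))
```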

The next time you synchronize, if you're smart enough to be able to figure out, "Well, I was halfway through pushing at that point, so I need to re-push all of those changes again," then by all means, go ahead and fast sync. It might be safer to slow sync at that point, however. You can tell the engine, "I'm just going to give you everything. You figure out what's changed."

Now, some of you might be wondering, why do they roll back all of the changes that we've already given them? Why don't they just take what we've given them, process them at that point, and we'll pick up where we left off? The problem is that when you introduce relationships into the question, there's a whole set of data integrity issues that come up, that come into play.

You might have pushed a couple of records in that refer to some records that haven't been pushed yet, because the device ran out of batteries, so you couldn't get those records. And so we erred on the side of safety, and just said, we're going to replay the whole thing to get back to a known good state. What we want to do is protect the data in the truth.

Mingling is the heart of Sync. This is where we take all of the changes from all of the clients and we merge them into the truth. We process the changes on a client by client basis. So we take all of the changes from Address Book, we merge them into the truth. Take all of the changes from .Mac, merge them in.

All of the changes from the phone, and merge them in. It's here that we do our conflict detection. If a phone has changed the first name, and the first name has also changed on .Mac since the last time it synchronized, we need to generate a conflict at that point.
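One way to picture the conflict check during mingling is shown below: a field conflicts when the client changed it and the truth also changed it since that client last synchronized. The data layout and the exact rule are assumptions for illustration; the talk does not spell out the engine's algorithm.

```python
def mingle(truth_record, client_last_seen, client_changes):
    """Merge one client's field changes into a truth record.

    truth_record     -- current field values in the truth
    client_last_seen -- field values this client had after its last sync
    client_changes   -- {field: new_value} pushed by the client
    Returns (merged_record, conflicts), where conflicts maps
    field -> (truth_value, client_value).
    """
    merged = dict(truth_record)
    conflicts = {}
    for field, new_value in client_changes.items():
        truth_value = truth_record.get(field)
        changed_elsewhere = truth_value != client_last_seen.get(field)
        if changed_elsewhere and truth_value != new_value:
            conflicts[field] = (truth_value, new_value)   # set aside, not merged
        else:
            merged[field] = new_value
    return merged, conflicts


# Another client already changed the first name in the truth.
truth_record = {"first": "Jonathan", "last": "Doe"}
phone_last_seen = {"first": "Jon", "last": "Doe"}
phone_changes = {"first": "John", "last": "Dough"}
merged, conflicts = mingle(truth_record, phone_last_seen, phone_changes)
```

The last name merges cleanly, while the first name is reported as a conflict and the truth keeps its current value until the conflict is resolved.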

Again, let's ask the question, what happens if something goes bad here? The answer is, you don't have to worry about it. That's our problem. Once those changes have been handed off to us, we're responsible for them. We will make sure they get into the truth, or we will take steps to recover by asking for all records from all of the clients again. So, we're going to use the schema bundle to specify some code that gets loaded into the Sync Server to handle those conflicts. He gets first crack at them.

When we detect a conflict, we're going to ask this code, can you deal with this? If he says yes, we'll merge the response into the truth. If he says no, we're going to store the conflict off on the side. We don't want to pop a panel up in the user's face right in the middle of sync.

Remember, applications and things can be synchronizing at any time, and having panels popping up saying, what do you want to do about this? What do you want to do about this? It's going to get really stale really quickly. Instead, what we do is we save the conflicted records off to the side, and we notify the user through a little UI element that he's got some conflicts. We need his attention. And when the user's ready, they can pull those conflicts up and resolve them, and they'll be merged in the next time they synchronize.

Pulling changes is pretty much the easy part. Clients pull changes directly from the truth database. They don't pull them from the Sync Server. Once the Sync Server has finished mingling, he's done. He's off, and someone else can come in and synchronize at that point. What we do is we maintain a snapshot of the truth database for some self-consistency there. It's held as long as clients are pulling truth out of it.

When you're getting changes out of the truth, you have a choice. We give you both the deltas. We also give you the full record. So you can go in and look. Did I change just the first name or the last name? Or you can take the whole record and push it onto the device or into your application. You can filter out records that you don't want.

There's two ways this can be done. We can give you all of the records, and you can tell us, I want this one, I want this one, I don't want this one. Sometimes it's much easier if you just write a little filter independently that's loaded into Sync Services that does that filtering for you so that you only get the records that you're concerned with. For example, in the phone device configuration UI, I might want to specify I'm only going to synchronize contacts in this one specific group.

We've got some standard filters for that kind of thing. And so your UI can say, let's use this filter for this client. That gets loaded into Sync Services. It gets rid of the records that we don't want. And only gives to the client the records that pass through that filter.
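The group example can be sketched as a predicate that runs before records reach the client. This is a toy illustration; the `groups` field and both function names are hypothetical and do not come from the Sync Services schema or API.

```python
def make_group_filter(group_name):
    """Build a predicate that passes only contacts belonging to one group."""
    def passes(record):
        return group_name in record.get("groups", ())
    return passes


def records_for_client(records, record_filter):
    """Hand the client only the records that pass its filter."""
    return [record for record in records if record_filter(record)]


contacts = [
    {"first": "Ann", "groups": ["Family"]},
    {"first": "Bob", "groups": ["Work"]},
    {"first": "Cy", "groups": ["Family", "Work"]},
    {"first": "Dee"},
]
phone_contacts = records_for_client(contacts, make_group_filter("Family"))
```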

When the engine gives you a new record, we're going to make up an ID for that record. The reason is that there may be relationships referring to that guy. We need to know what to call him. We're going to use a UUID for that, but that may not always be convenient for you.

If you're going to push a record onto a phone or store it in your own database, you probably want to use your own identifier for that. And so record identifiers can be changed. Any earlier references in a relationship that we've already given to you, we can't change those, of course. But once you change a record identifier, any references that we give you after that will use the new record identifier.
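The identifier change can be thought of as a mapping that applies only to references handed out after the change. A minimal sketch, assuming a simple lookup table; the class and method names are invented for this example.

```python
import uuid


class RecordIdMap:
    """Tracks client-chosen identifiers that replace engine-assigned ones."""

    def __init__(self):
        self._client_ids = {}   # engine id -> identifier chosen by the client

    def rename(self, engine_id, client_id):
        self._client_ids[engine_id] = client_id

    def outgoing(self, engine_id):
        """Identifier to use in any reference given to the client from now on."""
        return self._client_ids.get(engine_id, engine_id)


id_map = RecordIdMap()
engine_id = str(uuid.uuid4())                   # the engine makes up a UUID
reference_before = id_map.outgoing(engine_id)   # already handed out, unchanged
id_map.rename(engine_id, "phone-17")            # client prefers its own id
reference_after = id_map.outgoing(engine_id)
```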

Sync Services uses a two-phase commit process at this point. Again, to answer the question, what happens if something goes wrong? By two-phase commit, in this case, I mean when the engine gives you a record, you tell the engine, yes, I want this record, or no, I don't want this record.

Up to you to decide, but you've got to tell us one way or another. If you don't tell us that you accepted or rejected a record, we're going to give it to you on the next sync, and the next sync, until you tell us what to do with that record.

Now, if you're talking to a high-latency device, over a USB connection to a phone, over a dial-up connection to a server, it's not going to be terribly efficient if you have to push the record and then tell us you accepted it, and push the record and tell us, yes, it made it there, okay. And so we allow you to do this batching process. What you can do is just tell us you accepted a record, got this one, got this one, thanks.

And then you tell us, I've committed the records that I told you I accepted or rejected. What happens if something goes bad is we unwind that implicit transaction scope. When you first start pulling changes, we create a new transaction.

As you accept and reject changes, we write them into the transaction. And when you commit those acknowledgements, we commit and close that transaction and implicitly create a new one. So if something bad happens, we're going to roll back to the last time you told us you committed those changes, and we're going to give them to you again on the next sync.
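The accept, reject, and commit cycle can be modeled as follows. This is a toy model of the behavior described, with made-up names (`PullSession`, `fail`); it is not the actual session API.

```python
class PullSession:
    """Changes stay outstanding until the client commits its acknowledgements."""

    def __init__(self, outstanding):
        self.outstanding = dict(outstanding)   # changes still owed to the client
        self.acknowledged = set()              # accepted or rejected, uncommitted

    def changes(self):
        return dict(self.outstanding)

    def accept(self, record_id):
        self.acknowledged.add(record_id)

    def reject(self, record_id):
        self.acknowledged.add(record_id)

    def commit(self):
        # Close the current transaction; a new empty one starts implicitly.
        for record_id in self.acknowledged:
            self.outstanding.pop(record_id, None)
        self.acknowledged = set()

    def fail(self):
        """Simulate a crash: uncommitted acknowledgements are rolled back."""
        self.acknowledged = set()


session = PullSession({"1": {"first": "Ann"}, "2": {"first": "Bob"}, "3": {"first": "Cy"}})
session.accept("1")
session.reject("2")
session.commit()          # records 1 and 2 are settled
session.accept("3")
session.fail()            # acknowledgement of record 3 is lost
next_sync = session.changes()
```

Batching acknowledgements and committing once keeps the number of round trips down, at the cost of seeing a few records again if something fails before the commit.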

So the five phases of sync, you can think of them as a finite state machine. The phases must be traversed in order, but they can be canceled or finished at any time. The typical application sync model that we recommend for people is this. When you first launch, do a sync to pick up any changes that have been made since the last time your application was run. Do it in the background. Again, remember to maintain application responsiveness.

We give you the methods to query whether you need to slow sync, and we can tell you whether we think a sync is going to take a long time. If so, pop up a panel to the user and say, "Hey, this may take a long time. Do you want to do this now?" Or if you're going to be even more sophisticated, just do the sync in the background and let the user carry on with the app normally.

As they make changes throughout the course of the application, trickle sync periodically to push those changes out. When you save to disk, do a sync to get the changes out. When you quit, what we want is to get those changes out to the sync engine again. You don't necessarily at that point want to wait for the whole sync to complete. The user's quitting.

You want to get out of there as quickly as possible. What you can do is create a session, push your changes, and then finish it. You're done. You don't have to wait for the mingling. You're definitely not going to pull any changes at that point because they're quitting the application. They don't need it.
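The state machine idea, including the quit-time shortcut of pushing and then finishing, can be sketched like this. The phase names below are assumptions chosen for the example, since this part of the transcript does not list the five phases by name.

```python
# Assumed phase names; this part of the talk does not enumerate them.
PHASES = ("negotiate", "push", "mingle", "pull", "commit")


class SyncSession:
    """Phases run in order; the session can be finished or cancelled anytime."""

    def __init__(self):
        self.phase = PHASES[0]
        self.visited = [self.phase]

    def _require_active(self):
        if self.phase in ("finished", "cancelled"):
            raise RuntimeError(f"session is already {self.phase}")

    def _move_to(self, phase):
        self.phase = phase
        self.visited.append(phase)
        return phase

    def advance(self):
        self._require_active()
        position = PHASES.index(self.phase)
        is_last = position + 1 == len(PHASES)
        return self._move_to("finished" if is_last else PHASES[position + 1])

    def finish(self):
        self._require_active()
        return self._move_to("finished")

    def cancel(self):
        self._require_active()
        return self._move_to("cancelled")


# Quitting the application: push the changes, then finish without pulling.
quitting = SyncSession()
quitting.advance()
quitting.finish()

full = SyncSession()
full_run = [full.advance() for _ in range(5)]
```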

The device has got a much simpler model, typically. When a user initiates a sync, there's an explicit action on the part of the user there. The device is plugged in, they hit the hot sync button, a sync alert comes in because some other device is synchronizing. Just go through the whole sync at that point.

So let's talk a little bit about the API. What I'm going to do here is not actually get into the details. As I said, we've got some great reference documentation on the Tiger DVD. What I want to do is just give you a roadmap to some of the more important classes. Now, the API is Cocoa-based, but it's procedurally oriented for a number of reasons. What this means is it's easily wrapped in Java.

I've got a lot of experience doing that, so we kept that in mind while designing this API. You can also use it easily from C. We support almost all of the Core Foundation types, the Core Foundation toll-free bridged types. Most of the more important data types are toll-free bridged types. And so again, using this from Carbon is no problem.

There are five classes that we're interested in: ISyncManager, ISyncClient, ISyncSession, ISyncChange, and the Record Snapshot. ISyncManager is the singleton object. He is your basic administrative point of contact there. He's where you go to register your schemas. He's where you go to register your clients. He's where you go to look up the clients that have been registered. Not a lot to him there.

ISyncClient represents a registered device or application. This is where you can get information. The identifier, the display name, the image of the guy, what entities does he support, how is he going to sync. I want him to reset the next time he syncs. You can specify how he's going to synchronize. And use ISyncClient to set up sync alerts to specify, I want this tool to be launched when these kinds of clients start synchronizing.

An ISyncSession encapsulates the whole sync process that we just talked about. He's got all of the methods to walk you through the state machine there. He's got the methods that you can use to query, how should I sync? He's got the methods to allow you to pull the changes out, to accept them, and to commit them.

And the key point to know here is that there is only one sync session per client, per machine allowed. We've got a Sync Server that's going to gate that, and we will not let the same client sync multiple times. So you don't have to worry about preserving that kind of semantics.

An ISyncChange encapsulates all of the changes to a single record. The change specifies whether it's a new record, an existing record being modified, or a record being deleted. And he contains all of the field level changes to that specific record. The changes that you get from the Sync Server will contain those field deltas.

It will also contain a full copy of the record. If you're smart enough to be able to tell the Sync Server what the field level changes are to your records, you can create one of these. You only need to specify the field level deltas. You don't need to give us the whole record.

[Transcript missing]

The times where you might want to use the snapshot are, to give you an example, in the case of a phone that's synchronizing calendar events. A phone doesn't actually synchronize the calendar lists themselves. And yet, when you create an event on the phone, it has to be filed in some calendar. So there's a bit of a paradox here.

What you can do is you can use the snapshot to get the list of calendars out of the truth. You can let the user in the configuration UI choose a specific calendar and remember the ID of that guy. And when your device, when your client is pushing those new calendar events into Sync Services, you just set up a relation saying, this guy belongs in this calendar, even though you're not syncing him. One thing to note is that the truth database is organized to be efficient for sync, not efficient necessarily for you. So don't use this too often as a general purpose database API. You'll find the results a bit disappointing in that respect.

So let's have a quick recap. What do you do? You register your schemas, you register your clients, you push data into Sync Services, you pull your changes out of Sync Services, and you provide the UI to configure your client. We take care of all of the rest. We synchronize your data, we detect conflicts, and we provide a standard UI for the user to resolve those conflicts. We give you an airbag to preserve the data integrity, and we provide a .Mac client to synchronize data between multiple machines. The design goals that underlie everything that we've done with Sync are these: decoupled clients, schemas separate from the applications, extensible schemas, and syncs must be invisible.

Now, if you have questions, you can contact Patrick Collins or Xavier, and we definitely encourage you to file bugs. Radar is our friend in that respect. For more information, we've got a lot of great documentation. The reference documentation is on your Tiger DVD. It's also available on connect.apple.com.

The concept documentation, which I highly recommend you read, some great docs there, is only available on the web at this point. We didn't manage to get it onto the DVD. And we've got some sample code for some sample applications. It's in the usual place, in Developer Examples, Sync Services.