Sync Services: A Complete Tour - WWDC 2008

Integration • 1:05:58

The Sync Services framework provides data synchronization between Macs, mobile devices and the .Mac service. Get to know the APIs, tools, and techniques available to keep your application data in sync across multiple Macs and learn best practices for syncing Contacts, Calendars, and Bookmarks to iPhone. Master the art of designing a sync schema for your application, and find out how Leopard makes syncing effortless for Core Data applications.

Speakers: Andy Belk, Bruce Nilo

Unlisted on Apple Developer site

Downloads from Apple

SD Video (612 MB)


Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

If you weren't intending to attend this, you're probably in the wrong room. So what I'm going to do is go through a number of aspects of sync. First I'm going to talk about what's new overall. Then I'm going to go through sync services for those of you who are new to sync, because I imagine there are quite a few people who have come for the iPhone and haven't used sync services on the Mac before.

Then we'll talk a little bit about what we had in mind for Snow Leopard and what kind of things we've done there. What new and improved features have been added, new APIs. And then Bruce is going to come on stage and talk about some of the advanced sync stuff. So for those of you who are experts, who've done this before, who have already written a sync client or are developing one, that should be useful for you.

What's new outside of Snow Leopard? A little bit of history. So back in 10.3, we had this thing called iSync that would let you sync to your phone and to .Mac, and it would sync things like address books and contacts. So it was kind of restricted to that, and it was all Apple provided.

In 10.4, the big thing that we did is add the ability for you to write your own sync client. So that gave you access to our generic extensible sync engine, public API. We had data classes built into OS X that would sync to the phone, that would sync to .Mac, that could sync to other applications. And you got a bunch of benefits just by leveraging the framework that we supplied. Now in 10.5, we added some additional data classes like Dashboard, Preferences, Dock Syncing, Mail Notes, and additionally some other applications that sync. And then along came the iPhone.

So, since this time last year, a number of things have gone on. One of the things that we talked about at WWDC 2007 was a thing called the .Mac transition. That was where .Mac had been syncing some data types with an old mechanism that was sort of backwards compatible with Panther, and we were switching everybody over to the new one, which was more performant and more reliable.

We went ahead and did that, and that was pretty transparent for everybody concerned. Certainly none of the developers had to do anything, so that was the main goal of that. In addition, related to iPhone sync, we added Yahoo! Contact Sync, so that was available in 10.4 for iPhone and iPod Touch customers and available on Windows. And in 10.5, we made it available for everybody.

And in 10.5.3, we introduced Google Contact Sync. We've been working with Google on that. And that will be coming out shortly on Windows as well. And in Snow Leopard, it's going to be available for everybody. So what you can see is the sync ecosystem, as we call it, has moved along over time, and we continue to evolve this. So now, of course, .Mac has become MobileMe. We've got Microsoft Exchange in the picture, syncing to the iPhone.

And now we've got Windows in the picture because we've got Windows syncing to the iPhone and we've got Windows now syncing to MobileMe as well. So the whole sync environment is getting kind of complicated. So again, just to summarize, MobileMe, same sync capabilities for the Mac as you had before, so all the same data classes are going to be syncing. They've added new online applications, which give you all this great friendly access to your data online. With Windows, the addition of MobileMe for Windows gives you sync between Mac and Windows and between Windows and Windows.

And there's also over the air for the iPhone. And the way that works is that your desktop, be it a Mac or Windows, is syncing up to MobileMe, which is syncing down to the iPhone. So there's some other things that we haven't really got time to cover or we haven't got around to thinking about. With iPhone Sync, you get Bookmarks syncing, Contacts, Calendars, it's all built in.

When the next release of the iPhone software comes out on July 11th, Tethered is built in, Over the Air will be built in with MobileMe. One thing to be aware of is that you'll have to choose one or the other. And that's to avoid what we call sync loops or triangle problems. Some of the existing sync developers will be familiar with that.

As regards third party iPhone application sync, if you're writing a third party application for the iPhone, you might have been wondering whether you can sync it with your desktop. And at present, we don't have any API on the phone for sync. So the current suggested solution is that you have some sort of server running, and you sync directly to that server.

So, what I wanted to do now was to just very quickly go through a summary of what sync services is for those of you who are completely unfamiliar with it. We weren't given a lot of time for this. We've only got one session. So, I'm going to have to be very, very quick.

Any questions? Okay, no, right. So that was actually the presentation from 2005 in its entirety, which was a great coverage of sync services, but it was a little bit condensed. So I will actually really go through this. So the big picture of sync is all of these cooperating applications and devices exchanging data.

So a few concepts and terminology so that at least when we get on to the later part of the talk, some of it will be more familiar to people than it is right now. So we're syncing all this data around. What's going on? What's happening to it? Well, first of all, when you give us this data as part of the sync process, we store it in something that we call the truth. So we're keeping a copy of any data that's provided to us. That enables us to do differencing when people send us new changes and to see what's changed since the last time they synced with us.
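As a rough illustration of the differencing described here, the following Python sketch compares what a client sends with a stored copy of the truth. It is not the Sync Services API; the function name, record identifiers, and property names are all hypothetical.

```python
def diff_against_truth(truth, client_records):
    """Compare a client's full record set with the stored truth copy.

    Both arguments map a record identifier to a dictionary of properties.
    Returns (added, modified, deleted).
    """
    added = {rid: rec for rid, rec in client_records.items() if rid not in truth}
    deleted = [rid for rid in truth if rid not in client_records]
    modified = {}
    for rid, rec in client_records.items():
        if rid in truth and rec != truth[rid]:
            # Keep only the properties that were added or changed.
            modified[rid] = {k: v for k, v in rec.items() if truth[rid].get(k) != v}
    return added, modified, deleted


# Hypothetical data: the truth as of the last sync, and what the client has now.
truth = {
    "c1": {"first name": "Ann", "last name": "Lee"},
    "c2": {"first name": "Bob", "last name": "Ray"},
}
client = {
    "c1": {"first name": "Ann", "last name": "Li"},
    "c3": {"first name": "Cy", "last name": "Poe"},
}
added, modified, deleted = diff_against_truth(truth, client)
```

Because the engine keeps its own copy, a client can hand over everything it has and still end up with a small set of changes.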

How do you as the developer, or we as Apple, describe that data? Well, we use an entity relationship model. And so you can think of the schema as sort of sitting between the application and the sync engine and its truth data. And the schema sort of represents what we put in the truth. So this is a sort of overall picture of what a schema would look like.

The schema consists of one or more data classes, which is sort of a collection of entities, and entities which represent the kind of thing that you want to sync, and they have attributes and properties and relationships. So, the sync schema is describing what you have to the sync engine. An entity is an element of the data model that you're describing.

The data class is a collection of those entities. We have relationships. So, for example, if you have contacts and phone numbers, you can have a relationship between those two things. And then we have properties, which are things like name, phone number, type of phone number, start date of an event, something like that.

So this is a simple example. Say you have a piece of data that you want to describe, and it has two things: the name of the event and the start of the event. What your schema contains is the name of the property, which is name, and its type. And again, start is typed as a date, because it's the date of a vacation.

Now this is what a more complicated schema would look like. These are the sort of entities which might exist in the contacts schema. As you can see, we have email addresses, street addresses, because any particular contact can have more than one. So those represent relationships to other entities which contain those email addresses. Contacts can also be in groups, another relationship, so you can group these things together.
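To make the vocabulary concrete, here is a minimal Python sketch of a data class made of entities with attributes and relationships. It models the concepts only; the class names, field names, and type strings are assumptions for illustration and do not reproduce the actual Sync Services schema file format.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: dict                                    # property name -> type name
    relationships: dict = field(default_factory=dict)   # relationship name -> target entity


@dataclass
class DataClass:
    name: str
    entities: list


# Hypothetical contacts-like data class: a contact can have several
# email addresses and can belong to groups.
contacts = DataClass("Contacts", [
    Entity("Contact",
           {"first name": "string", "last name": "string"},
           {"email addresses": "EmailAddress", "groups": "Group"}),
    Entity("EmailAddress",
           {"value": "string", "type": "string"},
           {"contact": "Contact"}),
    Entity("Group",
           {"name": "string"},
           {"members": "Contact"}),
])


def dangling_targets(data_class):
    """List relationships that point at an entity the data class does not define."""
    names = {e.name for e in data_class.entities}
    return [(e.name, rel, target)
            for e in data_class.entities
            for rel, target in e.relationships.items()
            if target not in names]
```

A check like `dangling_targets` is the kind of consistency rule a schema makes possible: every relationship has to land on a known entity.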

Now, that part covered describing the entirety of the sync data that was available for any particular kind of data class. But how do you describe what your client does? Well, we use a thing called a client description. It's a property list that you store in the file system and you register with us when you first start up your client. So it kind of sits between the sync client and the sync engine, but it's more associated with the sync client.

So again, the sync schema is describing what data is for the sync engine, whereas the client description is kind of a bridge between the application and the sync engine. The sync schema is describing the entirety of what you can possibly sync, whereas the client description is specific to your client and what you want to sync. So that may well be a subset. You may not want to sync the middle name of a contact, because your device doesn't support middle names.
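The subset idea can be sketched in a few lines of Python. This is a conceptual model, not the client description property list itself; the property names and the helper function are hypothetical.

```python
# Everything the (hypothetical) schema allows for a contact.
SCHEMA_PROPERTIES = {"first name", "middle name", "last name", "phone numbers"}

# What one particular client declares it syncs: a device without middle names.
CLIENT_PROPERTIES = {"first name", "last name", "phone numbers"}


def project_for_client(record, client_properties):
    """Return only the properties this client has declared it supports."""
    return {k: v for k, v in record.items() if k in client_properties}


full_record = {"first name": "Ann", "middle name": "Q", "last name": "Lee"}
device_view = project_for_client(full_record, CLIENT_PROPERTIES)
```

The client never sees the middle name, and equally it never gets the chance to overwrite it.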

Now, in your application, you have some data which is in, I don't know what format. It could be in SQLite, it could be in some proprietary database format, it could be just written out in property list format, something like that. Sync Services uses a very generic format in order to represent data.

We just use dictionaries essentially as a record format, and we use key value kind of coding. So, one of the main things that your sync client will have to do as it interacts with the Sync Services API is convert the data to and from your proprietary format back into sync services format and the converse.
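A minimal sketch of that conversion, in Python, using the vacation example from earlier (a name and a start date). The `Vacation` class and its field names stand in for an application's own format and are assumptions; only the idea of a dictionary keyed by property name comes from the talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Vacation:
    """The application's own representation (hypothetical)."""
    title: str
    begins: date


def to_sync_record(vacation):
    """Application object -> generic dictionary keyed by schema property name."""
    return {"name": vacation.title, "start": vacation.begins}


def from_sync_record(record):
    """Generic dictionary -> application object."""
    return Vacation(title=record["name"], begins=record["start"])


trip = Vacation("Summer trip", date(2008, 7, 11))
record = to_sync_record(trip)
```

Keeping both directions next to each other makes it easy to check that a record survives a round trip unchanged.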

Okay, so we figured out we've got to translate our data around, we've got our client description, we've got our schema. Now we're ready to go. What do we do? How does it actually start to sync? Well, a sync session represents a transaction with the sync engine. It consists of five phases.

So at the beginning we start the session. Now the session can be started either by the client itself requesting a sync, or it can be actually started by the sync engine telling your client that a sync is about to start and would it like to participate. So for clients like .Mac, when .Mac syncs, and I probably should use the term MobileMe, when MobileMe syncs, it's usually syncing a whole bunch of different things, and quite often it's going to invite all sorts of applications. So on a basic Mac OS X system, we're going to start up Address Book Sync, Calendar Sync, Safari Sync, Mail Sync, etc.

So once your session started, from the client's perspective, what you do next is that you have to decide what kind of sync session it's going to be, what you're going to do. There are different kinds of sessions that you can perform depending upon the state of both the client and the server.

So these are the basic four listed. I don't want to go into these in too much detail because Bruce is going to cover those later. But essentially, you need to know what you want to do, and the sync server, the sync engine might override what you think you're going to do and tell you to do something different. And this basically boils down to whether you send things that have changed as far as your client is concerned, or whether you send the whole data.

So the next phase is the push phase. These five states basically represent a finite state machine that the sync engine is managing. So in the push phase, it's your client's responsibility to provide the data to the sync engine. So graphically speaking, it looks something like this. You've got your data in a database, and over it goes to the sync engine.

Now that's when we get involved. We then look at the changes that you've made. These might be updates or adds, deletes. You may have added something to a relationship, you may have removed something, you may have reordered things, etc. So the sync engine looks at those, makes sure that you're not breaking any rules, and then propagates those into the truth. Any of the resulting changes come back into the sync engine, ready for the next phase of the operation, where the client then pulls the changes from the sync engine.

At the end of all that, we'll either finish the session and we're good to go, or you might cancel the session, or the sync engine might cancel the session if something was amiss. And we can actually cancel the session in most of those different phases for some reason or another. If there's a runtime exception or something serious like that, that will usually cause the session to be canceled. In which case, the transaction didn't happen and everybody goes back to where they started.

At the end of the sync session, the idea is that your application data is now looking the same as what was in the truth and vice versa. So everybody's in sync. So that's the five phases. I'm assuming some of you are beginners with sync. What's the best way to get started in this endeavor?
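The session life cycle can be pictured as a small state machine. The Python sketch below is a conceptual model under stated assumptions: the phase names (negotiating, pushing, mingling, pulling, finished) are one reading of the phases described in the talk, and the `SyncSession` class is hypothetical, not the framework's session object. It also shows the transactional behavior: cancelling puts the application data back where it started.

```python
from enum import Enum


class Phase(Enum):
    NEGOTIATING = "negotiating"
    PUSHING = "pushing"
    MINGLING = "mingling"
    PULLING = "pulling"
    FINISHED = "finished"
    CANCELLED = "cancelled"


ORDER = [Phase.NEGOTIATING, Phase.PUSHING, Phase.MINGLING,
         Phase.PULLING, Phase.FINISHED]


class SyncSession:
    def __init__(self, app_data):
        self.app_data = app_data
        # Snapshot taken at the start so a cancel can undo everything.
        self._snapshot = {rid: dict(rec) for rid, rec in app_data.items()}
        self.phase = Phase.NEGOTIATING

    def _require_active(self):
        if self.phase in (Phase.FINISHED, Phase.CANCELLED):
            raise RuntimeError("session is already over")

    def advance(self):
        """Move to the next phase in order."""
        self._require_active()
        self.phase = ORDER[ORDER.index(self.phase) + 1]

    def cancel(self):
        """Abort the transaction and restore the starting state."""
        self._require_active()
        self.app_data.clear()
        self.app_data.update(self._snapshot)
        self.phase = Phase.CANCELLED


data = {"c1": {"name": "Ann"}}
session = SyncSession(data)
session.advance()                              # negotiating -> pushing
data["c1"]["name"] = "Changed mid-session"
session.cancel()                               # data is back to its starting state
```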

Well, if you're really new to sync, you might well be new to Cocoa programming and Mac OS X development, and you might be writing a new application, and you might be thinking about using Core Data for that application, because it's a kind of logical consequence of wanting to store stuff.

If you're in that category or you already have an existing application with Core Data and you want to sync that data around, well, we have this technology called Core Data Sync, and that is very straightforward. You simply annotate your Core Data model in Xcode. Tell it which properties you want to sync and which relationships you want to sync. And basically we'll go off and deal with most of the hard work, track the changes, etc. So this is great because it minimizes the amount of code you need to write and minimizes the amount of testing you need to go through.

Underneath we have some other additional things. We're making sure that the changes that your application makes don't result in you syncing every time. There's a bit of a throttling heuristic. And I wanted to note that the one caveat with this is that some of the existing schemas, because of the way they're set up, can't be synced like this. So this is primarily for your application data that's custom to your application.

So as I said, all you need to do is annotate the Core Data schema and add a few lines of code. It looks something like this. So the registration part, referring back to what I was talking about in the intro, registration is when you basically say this is the schema I'm interested in and this is my client description.

This is a bit of Core Data magic. It says, I want to track sync changes. And this is how you start a sync. And that's pretty much it. So, very little code involved. Some people say, "Well, Core Data is way too complicated. I don't want to do Core Data." Well, we now have Preferences Sync. In Leopard, we introduced a mechanism which if the user just turns on Preferences Sync in System Preferences, all their preferences for all of their applications pretty much automatically get synced over MobileMe back and forth between Mac and Mac.

So that's kind of interesting because you basically don't need to do anything to your application. As long as you're using NSUserDefaults or CFPreferences to store your preferences, we're going to sync them on your behalf. So for some applications, if you don't have a large amount of data yourself, or your application is fairly simple and most of what you want to sync is stored in preferences, then this will do it for you. And there are a couple of exclusions that we do automatically.

We don't sync ByHost preferences because they are meant to be specific to a host machine. We filter out some things that change too rapidly. And we have a built-in heuristic, again, to throttle syncing for those applications that change a given preference too frequently. But that's great. So, for some people, that just may give you your sync for free.
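The talk does not say how the throttling heuristic works. Purely as an illustration of the general idea, here is a Python sketch that refuses to start a new sync until a minimum interval has passed since the last one. The class name, the interval, and the policy itself are assumptions, not Apple's implementation.

```python
class SyncThrottle:
    """Allow at most one sync per `min_interval` seconds (hypothetical policy)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_sync = None

    def should_sync(self, now):
        """Return True and record the time if a sync may start at `now`."""
        if self.last_sync is not None and now - self.last_sync < self.min_interval:
            return False          # changed again too soon, skip this one
        self.last_sync = now
        return True


throttle = SyncThrottle(min_interval=60)
decisions = [throttle.should_sync(t) for t in (0, 10, 59, 61, 70)]
```

Timestamps are passed in explicitly so the behavior is easy to test; a real client would read a clock instead.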

Then there are some more complicated clients. And specifically, these are the ones where Core Data Sync isn't going to cut it because you want to sync Contacts, Calendar and Bookmarks, something like that. So this can get complicated. Sync is a highly cooperative endeavor. You know, your application, when it starts syncing with some other application, you're basically saying, "If that guy decides to delete this record, then I have to trust him and I have to delete mine." And vice versa. So you have to bear that in mind when you're writing a sync client, that you're inherently going to start trusting other applications to do the right thing. Not the least of which is trusting us to do the right thing.

So the first thing to think about is start simple. We don't want sync code to be the entirety of your app. We're trying to make this as simple as possible. So we provide this class called ISyncSessionDriver, which does most of what a sync client needs to do in a sort of packaged manner. It handles the control flow for you, that whole phase diagram thing. You don't really have to worry about the pull and the push and the mingle, etc.

And for 90% of applications, that's probably fine. And basically, the only thing you need to worry about is that whole data transformation thing. You know, I've got my data here, I need to put it in sync records and convert it back and forth. You also need to figure out how you're going to describe your schema. That again can get kind of complicated, so Bruce will talk a little bit about the details of what's involved in thinking about a schema and what we call identity properties.

So, okay, so we tried to persuade you to use ISyncSessionDriver, but no, you've got other things to do. You want something a bit more complicated, a bit more meaty. So, why would you not want to use ISyncSessionDriver? Well, you may want a little bit more control over how things work. That whole push-pull phase thing, you know, you may be talking to a remote server, you may be doing some other things, but you need a little bit more fine-grained control over what goes on there.

So, in that case, you may want to use our more complete API, which is available, where you get involved and you have to handle the control flow and the phases as well. The other reason for doing that is that you may want 10.4 compatibility because ISyncSessionDriver was only introduced in 10.5. So that's it.

Okay, so what's new in Snow Leopard 10.6? We have a few things going on. The focus for Snow Leopard for us is making life better for the end user. Improving performance, which of course is also, in principle, making things better for the end user. So we've got a couple of subcategories: Usability and Resilience.

So what are the complaints that we get with syncing, which again, if you're completely fresh to syncing, you probably won't be aware of it, but some of those veterans in the crowd will know about this. We have built-in checks so that if a large number of changes come through or we detect a conflict, we'll put up a panel and we'll ask the user, you know, which, whether they want to accept a large number of changes. Or if it's a conflict, which version of the records that are conflicting do they want to use? But there have been a few problems with that in the past because sometimes we've put up conflicts where the user can't really tell the difference between the two sides.

And for the data change alert, when we say, "Hey, you've got 100 contacts that are being deleted." We're showing things which aren't relevant. So those have been trimmed away, and we're trying to make sure that this is streamlined and only appearing when absolutely necessary and providing the user with completely relevant information.

I see there are some people who have experience with this. Interesting. And I want to make a point on that. There are things that we can do for our schemas, so for things that Apple is responsible for, like contacts and calendars, et cetera, we can make those improvements.

But if you're a developer and you've added schema extensions, for example, or you have your own schema and your own data that you're syncing, it's important for you to think about these things and about what you want to present to the user, and try and make sure that any dialogue that does appear is going to make sense.

So the other thing that we've done is focused a little bit on resilience. We've refactored the sync service so that parts of the process of syncing run in a completely separate process. So that sort of bulletproofed the whole system and generally makes performance better. We've made a number of fixes to filtering both in 10.5 updates and in Snow Leopard. We fixed some thread safety issues that we discovered after a bit of testing on 8-way Xeons, which was nice.

And we've been doing some work to make the Apple sync clients more bulletproof. As I said, the whole endeavor of sync is very cooperative. So you basically have to, when you're writing a client, you have to accept what you're given and be fairly polite about what you give other people. It's kind of like the Internet, but different.

So the other thing that we're focusing on for Snow Leopard is performance. And this is performance in sort of two ways. We want the sync to be performant for the user, both from the perspective of not using much CPU and memory and disk I/O on the system that they're using, but also from the perspective of they change something and something else happens.

So, partly prompted by the integration with MobileMe, we're looking to enable trickle syncing, which basically means I make a change in an app, it immediately syncs it to the sync engine on the desktop, and that will immediately get synced with MobileMe. So the idea is going to be that, you know, literally if somebody makes a change, ping, it goes up to .Mac, up to MobileMe, and ping, it's on your iPhone.

The other thing we focused on: many folks have noticed that with .Mac syncing, MobileMe syncing, we've now got a large number of data classes for the user to sync. I don't know how many people have actually done this, but if you're like me and you turn them all on and you do a sync, it can take quite a while to work through each one of them. So we're working towards pipelining all of those network operations. So the whole thing is going to be much, much faster. And of course, short cutting anything where there's nothing actually to sync.

Now, going back to the focus on performance again as regards user perception on the desktop, we've split the truth store out. So up until, you know, in 10.5 and in 10.4, we had the truth database, and it literally contained all the data that you had in it for every single data class.

And there were some occasions where we would end up having to sort of touch a bunch of records that were irrelevant to any particular operation. So what we've done is we've split these things out, so we've now got sort of contacts over here and calendars over here and bookmarks over here, etc. And that's streamlined the whole process, so we can do certain operations and we will only ever touch records that are relevant. So that just reduces memory footprint primarily.

We're also introducing a custom NSString class. This should be entirely transparent to you guys. But that basically lets us do quicker comparisons of identity.

[Transcript missing]

We've also added some logic that's going to short circuit some syncs, so where the client doesn't actually have anything to do, if the sync engine decides that it doesn't have anything to do, we can make the whole process go a little quicker.

And I mentioned this before, we've refactored the sync server. So one of the downstream effects of this is that because we've refactored the sync server and we've split up the truth store into different data classes, we have the potential to actually start running syncs in parallel. So if you have a client that syncs more than one particular data class, and of course on our side an example would be MobileMe, we could actually start running those in parallel. And that for the user would result in a much shorter time spent syncing.
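The effect of splitting the store per data class can be illustrated with a short Python sketch: once each data class has its own independent store, nothing stops the syncs from running side by side. The function names are hypothetical and `sync_data_class` is a placeholder for real work; this says nothing about how the sync server is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_data_class(name):
    """Placeholder for syncing one data class against its own truth store."""
    return name, "synced"


def sync_all(data_classes):
    """Run one sync per data class concurrently and collect the results."""
    workers = max(1, len(data_classes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(sync_data_class, data_classes))


results = sync_all(["Contacts", "Calendars", "Bookmarks"])
```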

So other Snow Leopard news, I wanted to, you know, the whole conference has been emphasizing the iPhone greatly, but I did want to mention that, you know, we have iSync Plugin Maker. That's a developer tool which is used by companies like Sony Ericsson and Nokia, and they can develop plug-ins for iSync which allow pretty much any SyncML-based phone to sync with the Mac. And it's been updated for Snow Leopard. There'll be more info available in the lab session this afternoon if anybody's interested in hearing about it.

So, Palm sync is going to be no more in Snow Leopard. It's PowerPC, it's CFM, it hasn't been updated in rather a long time. We'd love Palm to jump in and write a Sync Services client. But for the moment, we're going to be removing it in Snow Leopard, and we recommend existing Sync Services solutions for Palm right now.

So that's it for Snow Leopard. So what I'm going to do now is invite Bruce Nilo, who's our tech lead for Sync Services, up on stage. And he's going to go through some advanced sync topics for all you guys who are experts. And everybody's an expert now because I've covered it all. Thanks.

I'm Bruce Nilo. I'm the current tech lead for Sync Services, probably not the last. I'm glad that Andy left me a lot of time to go over this topic. A little bit of a difference between my talk and Andy's. There's not going to be much graphics, no short movies. To make up for that, I'm going to kind of share some bad puns with you about sync.

And it's kind of hard to work in sync services and not come up with bad puns. It's kind of a sport. I also want to talk about something very important. There's a little bit of contention about how you spell the word sync for sync services. Well, a former tech lead of sync services put it quite aptly. There's no I in team and there's no H in sync.

Okay. So now without further ado, let's get to the topics. As Andy said, the primary focus for Snow Leopard is going to be resilience, performance, and usability improvements. But we have made some API changes as well, and I'm going to quickly go over them. They're fairly minor. And then we're going to sync about some things together. And in particular, I basically want to share with you some of the aspects of sync that are worth understanding a bit before you go and write a sync client.

So, some of our changes in the API are really declarative changes regarding new keys that we're going to support in the schema. As Andy has already talked about, a schema is how you define to Sync Services what your application is interested in syncing. It defines the structure, the properties, the types, and so forth. We've added a new key for specifying a different kind of identity property, which we're calling a compound identity property, and I'll get into that in more detail in a bit.

To kind of support usability issues where conflicts perhaps appear when you don't want them to appear, we've added another key where you can declare that certain properties should be ignored for the purposes of a conflict. We've added a new container type called a set, which has the semantics one might expect of a set, such as no duplicates and the like.

And this is something we actually did in Leopard, but we're going to kind of de-emphasize this and slowly get rid of it. Strongly ordered relationships, for those of you who know what that is, was a great idea, but there were all kinds of difficulties associated with it, not the least of which was that there was no really good UI that we could come up with reasonably for showing what a conflict was in a relationship that had lots and lots of members. So that's now a no-op, which basically means it's going to devolve into being the same as a weakly ordered relationship.

Ping is now API. Some of our developers already know what this is all about. Some sync clients actually take a long time to sync. Maybe they're talking to a really slow device. Maybe they're talking to a server up in the cloud. And they, for control reasons, they actually need to start the sync session, then do their business with their own backing store.

Sometimes, if this takes too long a time, Sync Services says, you know what, I'm going to throw this client out of this particular sync session. And sometimes that's not what you want. This is a way to tell the sync server that you're still actually interested in syncing. It's just taking you longer than we might hope it would take you.
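The mechanism described here is a keep-alive. The Python sketch below models it in the abstract: a session has a deadline, and each ping pushes the deadline out again. The class, its method names, and the timeout value are hypothetical; the talk does not give the real API or the real timeout.

```python
class SessionWatchdog:
    """Drop a client that stays silent for longer than `timeout` seconds."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.deadline = now + timeout

    def ping(self, now):
        """The client signals it is still working; extend the deadline."""
        self.deadline = now + self.timeout

    def expired(self, now):
        return now > self.deadline


watchdog = SessionWatchdog(timeout=30, now=0)
watchdog.ping(now=25)                        # slow device, but still alive
still_in_session = not watchdog.expired(now=50)
thrown_out = watchdog.expired(now=56)
```

A slow client would call `ping` from inside its long-running loop, for example once per batch of records fetched from the device or server.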

Finally, we're going to improve filtering. This isn't API so much. It is a bit because up until now, you've had to really go through contortions to do filtering right. This is kind of personal to me, and in Snow Leopard, filtering is going to be a lot simpler and actually do what you expect it to do most of the time. So let's go into the API changes a bit.

The compound identity properties is just a new entry that you put in the schema plist. It's an array of arrays. Currently, identity properties is just an array of property names. Now you can specify a kind of a preferred ordering of those. And again, I'm going to go into that in more detail in a bit.
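One way to read "an array of arrays with a preferred ordering" is sketched below in Python: each inner list is a set of properties that together identify a record, and the sets are tried in order. This is an interpretation for illustration only; the matching rule, the function name, and the property names are assumptions, and the exact semantics are whatever the schema documentation specifies.

```python
# Preferred order: first try first name plus last name, then fall back to email.
COMPOUND_IDENTITY = [["first name", "last name"], ["email"]]


def find_match(candidate, existing_records, compound_identity):
    """Return (record id, property set used) for the first match, or None."""
    for props in compound_identity:
        if not all(p in candidate for p in props):
            continue  # the candidate lacks a property in this set, try the next
        for rid, rec in existing_records.items():
            if all(rec.get(p) == candidate[p] for p in props):
                return rid, props
    return None


existing = {
    "r1": {"first name": "Ann", "last name": "Lee", "email": "ann@example.com"},
    "r2": {"first name": "Bob", "last name": "Ray", "email": "bob@example.com"},
}
by_name = find_match({"first name": "Ann", "last name": "Lee"}, existing, COMPOUND_IDENTITY)
by_email = find_match({"email": "bob@example.com"}, existing, COMPOUND_IDENTITY)
no_match = find_match({"email": "nobody@example.com"}, existing, COMPOUND_IDENTITY)
```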

Ignore conflicts, same deal. You can type a particular kind of attribute, say a UUID, that really makes no sense to present to the user. You can have two GUIDs popping up in the conflict resolver, and the end user is going to have no idea what to do about that.

You can now say, "Don't show that for the purposes of conflict." And what that means is, we're going to choose one of them, and your application should be resilient to the fact that the UUID might actually change out from under you. This is something your client needed to be anyway. It's just that now we're taking the user out of the business of knowing what's going on. Okay, improved filtering. So we had some issues with filtering in the past. In particular, you could have relationships which would refer to objects that you thought you had filtered out.

Well, that's probably not a good thing to do. Relational integrity rules like cascade, delete, required relationships kind of didn't work the way you might expect them to with filtering. And so to work around that, clients had to actually filter effectively all of the -- all of the entities that were related to one another, even though really all they wanted to do was say, "If a contact isn't in a specific group or an event isn't in a specific calendar, I'm not interested." Now you'll be able to do that.
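The filtering problem can be shown with a small Python sketch: keep only the contacts in one group, and make sure no kept record still refers to a record that was filtered out. The record layout, the `related` key, and the function name are hypothetical; this models the desired outcome, not the framework's filter API.

```python
def filter_by_group(contacts, group_member_ids):
    """Keep contacts in the group and drop references to anything filtered out."""
    kept_ids = {cid for cid in contacts if cid in group_member_ids}
    filtered = {}
    for cid in kept_ids:
        record = dict(contacts[cid])
        # A kept record must not point at a record the client will never see.
        record["related"] = [r for r in record.get("related", []) if r in kept_ids]
        filtered[cid] = record
    return filtered


contacts = {
    "c1": {"name": "Ann", "related": ["c2", "c3"]},
    "c2": {"name": "Bob", "related": ["c1"]},
    "c3": {"name": "Cy", "related": []},
}
work_group = {"c1", "c2"}
visible = filter_by_group(contacts, work_group)
```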

Andy wanted me to emphasize that sync is cooperative. This is kind of a tautology. Andy spoke a lot about sync being an ecosystem. And it is. There's a bunch of different clients, some of which can really make your day pretty miserable when they decide to delete all the records that you rely on.

So we actually try to make an effort to prevent this from happening. We will raise exceptions of sundry sorts to your clients if they push in things that don't coincide with the schema. This is kind of why it's very important that when you're defining your schema, you type it appropriately and so forth. So we will raise an exception if you try to push in a property of the wrong type.

The same goes if your relationships are incorrect in the sense that you're pushing in a reference to an object of a wrong type. However, what we don't do, and this is a lack, is we don't do any semantic checking. So, for example, you might push in a start date and an end date for an event, and the end date might be earlier than the start date.

And maybe your client or your application is going to want to take the difference to figure out the interval here. It's going to come out to be a negative number and maybe you're going to crash.
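A defensive client might validate such fields before doing arithmetic on them. The sketch below is in Python for brevity rather than the Objective-C Sync Services API; the function name and the fallback policy (treating an inverted range as zero-length) are assumptions made for illustration, not something the talk prescribes.

```python
from datetime import datetime, timedelta


def safe_event_duration(start: datetime, end: datetime) -> timedelta:
    """Return the event's duration, never a negative interval.

    Hypothetical helper: another client may have pushed an end date
    earlier than the start date, so clamp instead of trusting the data.
    """
    duration = end - start
    if duration < timedelta(0):
        # Inverted range from a misbehaving client: treat as zero-length.
        return timedelta(0)
    return duration
```

Whether to clamp, swap the dates, or reject the record is a policy choice for each client; the point is only that the check happens before the value is used.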

Well, since it is a cooperative endeavor, this is why you need to be defensive about those types of things. Because some clients just are not going to interpret these fields necessarily the way you expect them to be. Kind of in this line, I wanted to share a little bit with you why we don't promote as API pushing the truth. Basically, you know from .Mac, or MobileMe, if you've ever synced with MobileMe, that you can tell MobileMe to reset your contacts, your calendars, or any data class from MobileMe onto your computer.

We provide the capability in Sync Services to extend your entities with new property types, even new entities. If you're going to be pushing the truth and you don't know about those new entities or new properties, other clients are potentially going to lose data. In fact, we don't even have to be in a situation where a schema has been extended in this way. Some clients don't sync all the properties of the public schemas.

So you may only sync a subset of something, and if your client was able to push the truth, again, another client like Address Book might lose data. So really, pushing the truth is a safe operation only in a certain subset of cases. It also is probably something that we always would want to be a user-initiated action. So for these reasons, to date we still have not, how should I say, accepted or decided to implement this as a public API.

Okay, now to the part of the talk that gives you some insight into some of the problems and some of the more key concepts that you are probably going to be interested in. So, sync seems simple, and in fact it is. Even if you use the most procedural API that we have, you can probably do 80% of what you need to do with a very small amount of code, and it's going to work for the most part. There are some exceptions, however.

And one of them is your records sometimes duplicate for inexplicable reasons. Sometimes the sync fails. And interestingly, if you were to Google one of these errors, there's quite a number of hits on them. Sometimes your records disappear and you have no idea why. Sometimes your records appear in meaningless conflicts.

Sometimes your client actually doesn't want to be in sync. Now, I'm going to talk about that a little bit right now, but I'm actually going to not talk about it in general. But suffice it to say, if you have a client that really doesn't want to be in sync with the truth or in sync with what Sync Services thinks you should be in, you need to take extra special care that you exercise all of the corner cases in sync. And in particular, you can end up in situations where you have two clients that are presumably in sync and the end user looks at them and they're not.

Okay, let's think about some things. To me, these are some of the key concepts that, as developers of sync clients, everyone should master, or at least understand, before they actually get about the task of writing code. So the negotiated sync modes. We're going to go through fast, slow versus refresh, and pulling the truth. We're going to talk about identity in sync services. We've talked a little bit about schemas, and we're going to talk a little bit more about them as well.

[Transcript missing]

... is doing in a sync session should really be what the sync server expects your client to do. In particular, you both have to agree on fast, slow, refresh, or pulling the truth. Another point to keep in mind is that at any point in a sync session, a session can be cancelled, and that might actually affect the next mode of the sync.

Some fallout of bad negotiations. These are problems that I've helped clients debug over and over again. The server thinks you're fast syncing. You push everything with different local identifiers. Your records are going to duplicate. The server thinks you're slow syncing, but you fast sync. All your records will be deleted.

You might say, "Well, how can my client ever get into this situation?" There's actually a couple of common patterns where this can happen. The most common being that somehow your backing store has been restored. Maybe you've restored it from a backup, maybe you have an unarchive facility, in which case it's very likely that your client is now going to be in a different state than what Sync Services thinks you're in. To handle some of these, we actually provide API in the form of Sync Anchors. I'm not going to go into the specifics of that, but I suggest that if you do have that kind of feature, you might consider using Sync Anchors.

This is actually a pretty accurate finite state diagram of sync services. Basically, the main line is you start, you negotiate, you push, you mingle, and you pull. There's some pauses in between this. And from any one of those states, you can either finish it cleanly or the sync can get canceled.

So, what this diagram shows is what can happen to you: what your next sync mode is going to be as far as the server is concerned if the sync fails to complete for some reason. And what's interesting is that if you're refresh syncing or slow syncing and you get to the point that you're about to be mingled, or you're being mingled by the sync server, your next sync is actually going to be a fast sync. That means that Sync Services is going to remember the changes that you've pushed previously even though the sync was cancelled on you.
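The one transition spelled out here can be written down as a small function. This is a Python sketch, not Sync Services API: the `Phase` and `Mode` names are invented for illustration, "about to be mingled" is modeled as having reached the mingle phase, and the fallback branch (the server expects the same mode again) is an assumption, since the talk does not describe the other transitions in the diagram.

```python
from enum import Enum


class Phase(Enum):
    NEGOTIATING = 1
    PUSHING = 2
    MINGLING = 3
    PULLING = 4


class Mode(Enum):
    FAST = "fast"
    SLOW = "slow"
    REFRESH = "refresh"


def next_mode_after_cancel(mode: Mode, cancelled_in: Phase) -> Mode:
    """Mode the server would expect on the next sync after a cancel."""
    reached_mingle = cancelled_in.value >= Phase.MINGLING.value
    if mode in (Mode.SLOW, Mode.REFRESH) and reached_mingle:
        # Stated in the talk: the pushed changes are remembered,
        # so the next sync is a fast sync.
        return Mode.FAST
    # Assumption: in every other case the same mode is expected again.
    return mode
```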

So I want to talk a little bit about slow syncing versus refresh syncing. This is a common source of confusion for many developers. First of all, from the client's perspective, slow syncing and refresh syncing look the same. Basically, the sync server will tell your client that you need to push every single record that you know about. The nuance between the refresh and the slow sync is really on the server side.

The sync client, at any point, can say, "You know what? I actually want a refresh or slow sync. I want to push all my records." And that's what those two methods on ISyncSession do. So, on the client side, the client can tell the server that it really wants to do a refresh. One of the fathers of sync services likened this to negotiation of rock, paper, scissors. Slow beats fast, refresh beats slow, and pull the truth beats refresh. Until we pointed out to him that that was only true if paper didn't beat rock.
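Unlike rock, paper, scissors, the precedence described here is a strict ordering, so negotiation reduces to taking the higher-ranked of the two requests. A minimal Python sketch, assuming the modes are represented as plain strings (the names and the `negotiate` function are illustrative, not part of the framework):

```python
# Lowest to highest precedence, as listed in the talk.
PRECEDENCE = ["fast", "slow", "refresh", "pull_the_truth"]


def negotiate(client_wants: str, server_wants: str) -> str:
    """Return whichever requested mode ranks higher."""
    return max(client_wants, server_wants, key=PRECEDENCE.index)
```

For example, a client asking to fast sync while the server expects a slow sync ends up slow syncing, which is why the client has to check the negotiated mode rather than assume its own request won.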

Okay, I still haven't gotten to what the difference is between a slow and refresh, and I'll do that right now. So a slow sync is an aid that we give to the users of sync services where, based on past syncs, we figure out the differences to send from your client to the sync server. So it's very possible, for example, if you slow sync and you have no changes at all in your backing store, that we will send absolutely no changes to the sync server. But to do that, we actually have to know what you synced before.
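The idea of deriving differences from a full push can be sketched as a comparison against the last-synced snapshot. This is a conceptual Python illustration only; the data layout (dictionaries keyed by local identifier) and the `diff_records` name are assumptions, and the real framework's internal representation is not described in the talk.

```python
def diff_records(previous: dict, pushed: dict) -> dict:
    """Compare full pushed records against the last-synced snapshot.

    Both arguments map local identifier -> {property: value}.
    Returns only what changed, keyed by the kind of change.
    """
    changes = {"added": {}, "modified": {}, "deleted": []}
    for local_id, record in pushed.items():
        if local_id not in previous:
            changes["added"][local_id] = record
        elif record != previous[local_id]:
            old = previous[local_id]
            # Keep only the properties whose values differ.
            # A property dropped from the record shows up as None.
            changes["modified"][local_id] = {
                key: record.get(key)
                for key in record.keys() | old.keys()
                if record.get(key) != old.get(key)
            }
    # Anything synced before but not pushed now is treated as deleted.
    changes["deleted"] = [i for i in previous if i not in pushed]
    return changes
```

Pushing an unchanged store yields three empty collections, which matches the "absolutely no changes" case. It also shows why the snapshot matters: without `previous`, every record would look new.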

On the other hand, a refresh sync is as if this is the first time you've synced. And all of the past information, assuming you have synced before, is going to get discarded. Now, what is that past information that we discard? Your local identifiers. It's probably once a month or maybe a little bit less frequently, I'm asked the question, "Why can't you remember my local identifiers if I'm refresh syncing?" And the main reason for that, and I'm going to get into identity in a bit, is that the local identifier is how your client specifies a record to sync services. A refresh is a brand new sync, so we don't know anything about your records.

We also will discard on a refresh sync all of the previous record values and record-specific client information. There's a fairly handy method on ISyncSession which allows you to associate an arbitrary bit of information with a specific record that your client can use as it deems necessary.

So what are some of the consequences of refresh syncing? Well, the biggest consequence is that if you've been syncing and you now refresh sync, and between the time of your last sync and your next refresh sync some other client deleted some records, those deletes are going to get lost.

In certain situations, records may duplicate. The duplication can happen for a couple of reasons. One reason is that once you've actually synced, you're free to change the identity keys of a record. But once you change identity keys and you start again and you refresh sync, those records are no longer going to identity match. Finally, it's a little bit more CPU and I/O intensive.

Negotiating the sync session correctly as far as pulling the truth. Basically, the only thing I want to say here is that you really don't want to delete your backing store until, at a minimum, prepareToPullChangesForEntityNames:beforeDate: returns YES, but you probably don't want to delete it until you've actually pulled all the changes and written out your backing store. Okay, identity.

Identity Keys are only used when a record is first pushed by a client to sync services. After that, we don't care what the Identity Keys are. It's used by the Mingler to match records that are pushed anew. And it matches those records against records pushed by other clients. And other is emphasized here because it's perfectly legal. In fact, it's a feature that a client can duplicate records. Sometimes developers are kind of surprised by the fact that we, in fact, allow them to duplicate records.

So, how does identity matching work? We require all of the identity properties to be equal, including the null values. So, for example, Sara Bellum from Synaptic Insights and Sara Bellum with no company name on Leopard would not identity match. Now with compound identity properties on Snow Leopard, these two records will identity match and there'll be a conflict on the company name.
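The two outcomes for the Sara Bellum records can be mimicked with a pair of small predicates. This is a Python sketch under stated assumptions: the property names and both function names are invented, and `relaxed_match` is only a stand-in that reproduces the Snow Leopard outcome described above; the actual compound identity rules are not spelled out in the talk.

```python
def strict_match(a: dict, b: dict, identity_keys: list) -> bool:
    """Leopard-style rule from the talk: every identity property must be
    equal, and a missing (None) value only equals another missing value."""
    return all(a.get(k) == b.get(k) for k in identity_keys)


def relaxed_match(a: dict, b: dict, identity_keys: list) -> bool:
    """Hypothetical stand-in for the Snow Leopard outcome: a value
    missing on one side does not block the match."""
    shared = [k for k in identity_keys
              if a.get(k) is not None and b.get(k) is not None]
    return bool(shared) and all(a[k] == b[k] for k in shared)


keys = ["first name", "last name", "company name"]
with_company = {"first name": "Sara", "last name": "Bellum",
                "company name": "Synaptic Insights"}
no_company = {"first name": "Sara", "last name": "Bellum"}
```

With these definitions, `strict_match(with_company, no_company, keys)` is false while `relaxed_match(with_company, no_company, keys)` is true, leaving the company name as the property on which a conflict would be raised.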

As I said, once a record is identity matched, or it's pushed or it's pulled and a client accepts that record, from that point on that record is identified to sync services by the local identifier that the client specifies. It's used both in fast syncing and slow syncing. And I realize in this presentation I didn't talk at all about fast syncing.

In many ways, fast syncing, from the point of view of sync services, is just like slow syncing. Slow syncing requires the client to push all records, but internally we're generating all of the deltas that we then feed off to the actual sync server. Fast syncing gives the client an ability to optimize this process, and we actually do that in two ways.

One is, there's API where a client can push in the full record, and then sync services will generate the deltas of that record that are passed. We also give another capability to the client, and that is they can actually generate the deltas themselves and pass those on to the sync server. And that's the most optimized form of fast syncing.

The Local Identifier is any string your client wants it to be. There's a couple of unspoken invariants about Local Identifiers. They have to always be associated with the same record. And you can't reuse them. Once you've associated one with a record, it has to be consistently associated with that record, unless you're refresh syncing or that record has been deleted.

If you don't do this and somehow get confused about what the Local ID is, there's a bunch of different runtime errors that you can run into. You might ask, "Well, how can you ever get into this situation?" Well, there are clients, for example, that have used the row ID out of a database to be their Local Identifier. And if that database ever got repopulated, the row ID would change, but they were still continuing to use that row ID as a Local ID in, say, a slow sync. This causes problems.
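One way to avoid the row ID problem is to generate an identifier once and store it as part of the record's own data, so it survives a repopulated database. A minimal Python sketch, assuming a hypothetical `Record` class and row format (neither comes from the talk):

```python
import uuid


class Record:
    """Hypothetical client record with an identifier that survives
    a database rebuild, unlike a row ID."""

    def __init__(self, fields: dict, local_id: str = None):
        self.fields = fields
        # Generated once and then persisted with the record's data.
        self.local_id = local_id or str(uuid.uuid4())

    def to_row(self) -> dict:
        """Serialize the record, identifier included."""
        return {"local_id": self.local_id, **self.fields}

    @classmethod
    def from_row(cls, row: dict) -> "Record":
        """Rebuild a record, reusing the stored identifier."""
        fields = {k: v for k, v in row.items() if k != "local_id"}
        return cls(fields, local_id=row["local_id"])
```

Because the identifier travels with the data, exporting and re-importing the rows gives back the same Local Identifier for the same record, whatever row ID the database assigns.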

Internally, we generate a global ID that is associated with the local ID of every client that syncs that record. Sometimes clients really want to get a handle on that global ID, and in particular clients that filter do this. We recommend that if you're in such a situation and you need the global ID of a record that you can get, for example, via the snapshot, that you look into using the ISyncRecordReference object that we provide.

Okay, this is one of the more common errors that are encountered. And I want to talk a little bit about how that error happens. Sometimes it's our fault. And we've had this because of filtering issues, as I've said before. We've had this because of the way that we sometimes refresh sync.

The entire API of Sync Services is entity-based. One of the things that's unique about Sync Services is that we actually sync relationships between other entities. It's those relationships that are important. And it's those relationships in the context of refresh sync that sometimes cause those kinds of problems. However, that's not the most common reason for this.

I want to walk through a very typical example, and it has to do with the order in which records are pulled. So imagine you have a client. There's a record X, it has a relationship R, and it pulls a value for that relationship R with those funny-looking UUIDs that are the generated local IDs that we hand back to a client. It pulls X, it saves that record out, and then it pulls the record that the relationship refers to.

And it decides to remap that local ID to some other arbitrary string. In a future sync, if that client pushes X with the relationship with those local IDs, the original local IDs that were pulled, you'll run into the exact same problem. So how do you work around that?

You can order how your client pulls the changes. That's one possibility, although sometimes that doesn't work. You can also keep an unaccepted record map. But in particular, you need to patch that back relationship one way or the other. One of the helpful methods that many clients use here is the setClientInfo:forRecordWithIdentifier: method. Okay, miscellaneous tips.

Type your properties and relationships. Use exclude from DCA and ignore conflicts. And if you're syncing to a back end which is, or records which are back ended on a server, consider not syncing the record itself, but information about how to access the server. For example, CalDAV accounts or CalDAV calendars are synced with a CalDAV account object so that other computers know how to access the back end server. An example where we don't do that is subscribed calendars.

MobileMe sync performance issues. We've had certain clients or experienced certain cases where sometimes a sync client wants to basically sync a big object graph. They can do that by archiving it out to a data object. And this data object or this object graph maybe is pretty volatile. And maybe it's large.

Maybe it's a megabyte or more. If this object graph is volatile, and let's say only a byte or two changes every time that object is updated and it gets synced every time with a megabyte object graph into sync services, that means that .Mac is going to get loaded with a lot of changes that are each a megabyte long. You may want to consider restructuring your schema to optimize that.
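The cost difference can be made concrete with a rough size comparison. This Python sketch uses `pickle` as a stand-in for archiving an object graph into a data object; the two function names, the dictionary-shaped graph, and the byte counts are illustrative assumptions rather than measurements of Sync Services or .Mac.

```python
import pickle


def changed_payload_as_blob(old: dict, new: dict) -> int:
    """Bytes pushed when the whole graph is one archived property."""
    old_blob, new_blob = pickle.dumps(old), pickle.dumps(new)
    return len(new_blob) if new_blob != old_blob else 0


def changed_payload_per_record(old: dict, new: dict) -> int:
    """Bytes pushed when each top-level item is its own record."""
    return sum(len(pickle.dumps(value))
               for key, value in new.items()
               if old.get(key) != value)


# A graph of 100 items where a single item changes between syncs.
old_graph = {i: f"item-{i}" * 100 for i in range(100)}
new_graph = dict(old_graph)
new_graph[7] = "edited" * 100
```

Here `changed_payload_as_blob(old_graph, new_graph)` covers the entire archive, while `changed_payload_per_record(old_graph, new_graph)` covers only the one edited item, which is the kind of saving a restructured schema is after.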

You should also write a sync client and, if at all possible, fast sync as opposed to slow sync. And even better is to trickle sync. The difference between fast syncing and trickle syncing is really that fast syncing specifies how you're syncing with sync services, while trickle syncing means that you actually sync your changes as they occur in your application.

I think it was mentioned by Andy, in Snow Leopard the MobileMe client is going to start trickle syncing. So this will actually help propagate your data, if your client trickle syncs, to other machines when the data changes. In Leopard we added a performance boost that you could add to your client description plist, which had to do with whether or not your client formats relationships. If you don't know what I mean when I say your client formats relationships, well, in Snow Leopard it doesn't matter because we're going to assume your client doesn't format them.

In Leopard we added a keyword which could indicate that you didn't format your relationships. And just to be clear, to support many kind of older phones, it's often the case that the values that are pulled by a client are not the values that they actually are able to keep. So you might have a client that has a relationship to phone numbers. It pulls 10 phone numbers, but the device can only store three. So we give you the ability to indicate which three of those 10 you've actually saved away.

And again, this is just added to the plist of your client description. We're trying to improve logging. For those of you who've been sync services developers, we log a lot of information when you turn it on. We also used to log personal information, which we now don't as of 10.5.3, but sometimes you actually want to see the actual data that's being synced, and there's now a default that you can use to turn that back on.

Syncrospector is a developer tool that all developers of sync clients should use. At a minimum, you should use it to put your client through the ropes and make it refresh sync, make it pull the truth, make it push the truth, and make sure that your client actually handles all those cases. It's a fully supported developer tool. Andy wanted me to put that in. I think that means that if there are any bugs, I have to fix it.

Syncing is also scriptable. You can script it in Perl, you can script it in Ruby, you can script it in other languages as well. I think I already mentioned Syncrospector allows you to test in different modes, and you can test with other clients. It also provides a user interface for you to set debugging defaults, go into the truth, and see what your records are there. It's really a pretty indispensable tool when you're developing a sync client.

So we have a pretty active dev list. We actually are fairly responsive to questions that come onto it. So we encourage you, if you're writing a sync client, to post questions. There's a number of developers that actually answer the questions, so we don't have to, which is always a good thing.

I'm not going to read this poem, but a sync intern of ours describes sync pretty nicely in this poem. There's one bit of it that is rather funny, which is the last line, which says, "To sync, perchance to dream, aye, there's the rub. For in that sync of MobileMe, what records may come."