Advanced Core Data Usages - WWDC 2005

Application Technologies • 1:08:06

Advanced Core Data techniques will be covered in depth. Learn how to implement multi-store techniques for writing data to multiple places on disk per document, create custom stores, manage object context migration, build models in code, thread Core Data implementations, and much more.

Speakers: Ben Trumbull, Melissa Turner

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Okay, hello and welcome to Advanced Core Data Usages, session 143. I'm not Ben Trumbull, I am, however, Melissa Turner. We helped build Core Data and they, in exchange, made us get up on stage and tell you all about it. So this is what we hope you already know about Core Data.

Core Data is a model-driven object graph management and persistency framework. It provides scalable object lifecycle management, automatic undo/redo support, user interface synchronization through our integration with the Cocoa bindings. We help ensure data consistency and correctness, and we take care of reading and writing to various types of files for you.

Why do you want to use Core Data? Okay, can I get a show of hands? Everybody here who's written a piece of code that, well, they knew somebody else out there had implemented, they maybe even knew the implementation, they just decided they could do it better, so they had to write it? Can I get a show of hands for this? And how many of you who aren't putting up your hands are just lying to me or yourselves? 'Cause I suspect it's the rest of you.

We relieve you of having to reinvent the wheel. We think we did a pretty good job of it, and quite honestly, it's probably better than you're going to do yourself, so we think you should use it. It'll give you a faster development cycle. Every line of code you don't have to write and test, well, that's time you can spend writing, you know, features that your customers are going to find unique and interesting about your application.

It's likely going to give you a more robust application. We test the heck out of this stuff. We've released it to all of you guys. You build stuff. You go down through code paths that we really didn't think people would ever try, so we're pretty sure there's not a lot of bugs, and we're devoted to fixing those, the ones that there are, and that's something you can't guarantee if you write the code yourself.

And, you know, it prepares your application for future features in Cocoa. We don't know what those features are, but, you know, if somebody comes along and, say, switches your architecture, you know, processor architecture out from under you, that's code you don't have to worry about porting. We're going to have to do it.

What are we going to cover this session? Well, a bunch of things. We're going to cover the general architecture of Core Data, the individual components, how they fit all together. We're going to talk about some advanced topics, some of the tougher questions you're going to have to ask yourselves or run across when you're building an application. And we're going to talk about some of the decision points and the things to think about when you're building a Core Data application.

So Core Data architecture is essentially built from a bunch of components. There's the models that define how data looks. There's the managed objects and the object context that are the living representations of data in your application. There's predicates and fetch requests which specify which particular object or objects you want to be working with at any point in time. And there's object stores and store coordinator that take care of putting your objects to sleep between runs of your application.

We're integrated with Cocoa through the NS document architecture and through the controllers. We've got a design tool in Xcode that if you were at any of the earlier Core Data sessions you've already seen, that can allow you to create data models. And Xcode includes templates for creating Core Data applications.

You hopefully have seen this slide before. This is the Core Data basic architecture, or what we call the persistent stack. Starts all the way up at the top with managed objects, which are contained in a managed object context, which fetch them using a fetch request from a persistent store coordinator, using the managed object model from a persistent store. Those are all the basic pieces. That's sort of how they connect left to right.

Managed Object Model. As Matt said in his intro session earlier, Managed Object Model is possibly the most important piece of the application. It's the piece that defines what the data in your application is and how it relates to itself or other pieces of it. It's an interaction diagram for interaction between application data types and objects. Designing a data model allows you to focus on the architecture of your application rather than control flow through it. And it's pretty much the baseline for doing model-based development. If you don't know what your model is, you pretty much can't be doing model-based development.

There's two main elements to a managed object model. There's entity descriptions and configurations, telling you how those entity descriptions are related to each other. Here's the basic Core Recipes model that is on the CD that all of you should have. We'll probably make references to this during the talk when we draw our little example cases.

So what's in a model? Models have entities, which have attributes, relationships, and fetched properties. Models also have fetch request templates, which are templates specifying how to fetch certain subsets of objects from your object stores. And they have configurations, which are strings used to label sets of entities, which allow you to specify which set of entities goes to which object store.

An entity, what's an entity? Well, it's a named collection of properties: attributes and relationships. There are several types of properties. There are attributes. These are essentially your ivars. They contain the value data for your objects: scalars, NSValue classes. There are also relationships. These contain information about how your objects interact. This chef object has a bunch of recipes. This recipe has a bunch of chefs. This recipe has a bunch of ingredients. Those are your relationships. They're essentially pointers to other managed objects.

There's also fetched properties, which are much like fetch requests: they are relationships specified by a predicate. This allows you to specify weak unidirectional relationships. And there's a number of special predicate variables that are in our documentation that you can read about that allow you to do interesting things specifying relationships from the target object. And one thing to note about fetched properties is they only refresh when the object, the instance of the entity they're on, is refreshed.

Properties. As I said, they're the data-containing elements of an entity. They're the values, the scalars, the NSValues, they're the pointers. They can be transient or persistent. Transient properties are really, really useful. They can be used for storing cached data that was calculated during your application processing that's not really important enough that you want to save it to disk anywhere. They can hold unsupported data types. This is a good way to store NSColor and NSRect or any of your custom data types, very simple ones that you want to have us do change tracking for.

You can use transient properties to model cross-store relationships. We'll get into that in a lot more detail later. And you can use them to store derived properties, which is properties whose value is computed from one or more other properties on an object or on related objects. Relationships. These are your pointers. Relationships are sets. They are not arrays. There's no ordering.

We'll talk about that on the next couple of slides. But it's important to remember that unless you explicitly specify an order in your UI, if you put an NSArray controller on top of a set, the order is not guaranteed to come back the same from time to time. It also gives you unique correlations, which means that an object can only be in that set once.

Core Data manages inverse relationships for you. Any relationship, when you think about it, really is two ways. A chef has recipes, recipes have chefs. What Core Data does is, when you create a recipe and set that recipe's chef to be chef A, we will automatically go in to chef A and say, "This chef has a new recipe," and we'll tidy up that back pointer for you. This means that you have to specify delete rules when you create a relationship, so we know what should happen if you delete an object that has relationships to other objects.

If you specify the nullify delete rule, we'll simply go through and find all the objects on the other end of the relationship and nullify the back pointer. This object no longer exists. We can have a cascade delete rule that says if we delete a Chef object, then we also want to delete all of the recipes associated with that Chef.

There's the deny delete rule, which is if I try and delete a Chef which has recipes, I'm not allowed to do that until I assign the recipes to some other Chef. There's also the no action delete rule, which says, okay, Core Data, don't you do anything. I'm going to take care of it myself. If you use this one, you are fundamentally going to be responsible for fixing up all of the back pointers.
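The four delete rules correspond to the `deleteRule` property of `NSRelationshipDescription`. As a minimal sketch, assuming a model built in code and compiled with ARC (the entity and relationship names `Chef`, `Recipe`, `recipes`, and `chef` are illustrative and not taken from the Core Recipes sample), the rules and the inverse relationship might be declared like this:

```objc
#import <CoreData/CoreData.h>

// Builds a tiny Chef <->> Recipe model in code. All names are illustrative.
static NSManagedObjectModel *MakeChefModel(void) {
    NSEntityDescription *chef = [[NSEntityDescription alloc] init];
    chef.name = @"Chef";
    NSEntityDescription *recipe = [[NSEntityDescription alloc] init];
    recipe.name = @"Recipe";

    // Chef.recipes: to-many. Deleting a chef also deletes that chef's recipes.
    NSRelationshipDescription *recipes = [[NSRelationshipDescription alloc] init];
    recipes.name = @"recipes";
    recipes.destinationEntity = recipe;
    recipes.minCount = 0;
    recipes.maxCount = 0;   // 0 is treated as "no upper bound", i.e. to-many
    recipes.deleteRule = NSCascadeDeleteRule;

    // Recipe.chef: to-one. Deleting a recipe only clears the chef's back pointer.
    NSRelationshipDescription *chefRel = [[NSRelationshipDescription alloc] init];
    chefRel.name = @"chef";
    chefRel.destinationEntity = chef;
    chefRel.minCount = 0;
    chefRel.maxCount = 1;
    chefRel.optional = YES;
    chefRel.deleteRule = NSNullifyDeleteRule;
    // The other two rules are NSDenyDeleteRule and NSNoActionDeleteRule.

    // Declare the two ends as inverses so Core Data keeps both consistent.
    recipes.inverseRelationship = chefRel;
    chefRel.inverseRelationship = recipes;

    chef.properties = @[recipes];
    recipe.properties = @[chefRel];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[chef, recipe];
    return model;
}
```

The same settings are normally made in the Xcode modeling tool; building the model in code is shown here only because it makes the rule on each end of the relationship explicit.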

Relationships, as I said, are not ordered. Why? Because when you really think about it, most data isn't ordered. If you have employees, what's the inherent order of employees? You can order them by first name, you can order them by last name. Employee number is a popular one. There's hire date. If you're having some kind of company photo op, you might want to order them by height. There is no inherent order in how employees are ordered. Same for books.

Title, ISBN, Library of Congress number, Dewey Decimal number. If you're an interior decorator, it may be color. Some data are inherently organized, though. If you have entries in a ledger, it doesn't really make sense to look at entry A without the context of all of the entries that came before it.

It's really hard to tell if you're going into overdraft on your checking account if you don't know what order you made transactions in. Customers do, however, like order. They like to see things in predictable ways in their UI. So you really have to ask yourself when you're designing an application, do you really need order or do you just need the appearance of order?

If what you really need fundamentally is the appearance of order, it's kind of arbitrary, users can switch it around however they want, there's a few ways to achieve that. You can use fetch requests and specify a sort order on the fetch request, which will say, this is the order in which the array of objects we return should be sorted. You can set sort orderings on the array controller using bindings.

Or, if you've just shoved all your managed objects into an array, you can use the NSArray sorting methods, which also use sort descriptors. The tricky problem is if you really, really need order. That's a complicated case, and in that case you're going to have to explicitly create an order attribute on your object.

And when you do that, you take responsibility for explicitly maintaining that attribute. Anytime something gets added to a set, you're going to have to figure out what the ordering is, and you're going to have to set that order attribute. Something gets removed, you're going to have to remove that and go through and clean up the order attributes on all of the other objects.
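The two cases can be sketched side by side. This is a minimal illustration, assuming ARC, an entity named `Recipe` with a `name` attribute, and a hypothetical integer attribute `position` that you added to the model yourself; none of these names come from the session.

```objc
#import <CoreData/CoreData.h>

// Appearance of order: ask the fetch request to sort the returned array.
static NSArray *RecipesSortedByName(NSManagedObjectContext *context, NSError **error) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Recipe"];
    request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"name"
                                                              ascending:YES]];
    return [context executeFetchRequest:request error:error];
}

// Real order: after an insert or a removal, rewrite the hypothetical
// "position" attribute so the values are contiguous again (0, 1, 2, ...).
// The caller passes the objects in the order they should end up in.
static void RenumberPositions(NSArray *objectsInDesiredOrder) {
    NSUInteger index = 0;
    for (NSManagedObject *object in objectsInDesiredOrder) {
        [object setValue:@(index) forKey:@"position"];
        index++;
    }
}
```

Once the attribute is maintained this way, fetching with a sort descriptor on `position` returns the objects in the explicit order.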

Skipping over to loading models, how do we load a model? We've created one, presumably in the Core Data modeling tool. Now I want to load it in my application's runtime. There's a few ways to do that. If you've just got a single model, use initWithContentsOfURL: on NSManagedObjectModel.

If you've got multiple models, you're a framework or you're an application loading a number of models from a number of frameworks, then there's a couple ways to do it. You can tell NSManagedObjectModel mergedModelFromBundles: and pass it an array of bundles. We'll go through, find all the managed object models in those, and create a merged model from them.

You can also use modelByMergingModels:. You can go off and specifically pick the models on the system that you want loaded into your application and pass them to modelByMergingModels:. There is one caveat to this, which is that there can't be any entity overlap. If those models that you're loading and trying to create a merged model with contain any entities with the same name, you're going to get an error.

If you're using configurations, those configurations will be merged during the merging as well. So if you have a configuration in Model A named public and you have a configuration in Model B named public, all entities in both of those public configurations in your merged model are going to end up in a public configuration. So that's something you have to think about if you're doing a configuration-based application.
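The three loading paths can be sketched as one helper. This assumes ARC; `modelURL` and `frameworkBundles` are hypothetical inputs supplied by the caller, and the bundles are assumed not to contain the model at `modelURL`, since merging two models that define an entity with the same name fails.

```objc
#import <CoreData/CoreData.h>

static NSManagedObjectModel *LoadModels(NSURL *modelURL, NSArray *frameworkBundles) {
    // Single model: load it directly from its compiled model file.
    NSManagedObjectModel *appModel =
        [[NSManagedObjectModel alloc] initWithContentsOfURL:modelURL];

    // Several models: merge every model found in the given bundles.
    NSManagedObjectModel *frameworkModel =
        [NSManagedObjectModel mergedModelFromBundles:frameworkBundles];

    if (appModel == nil || frameworkModel == nil) {
        return nil;
    }

    // Hand-picked models: merge exactly the ones you chose.
    return [NSManagedObjectModel modelByMergingModels:@[appModel, frameworkModel]];
}
```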

This is something we've had a few questions about this week, so we rushed and harassed our graphic designers to put in a slide about it. What's good model design? Well, there's a few things you should keep in mind when you're building a model, just in terms of how data is loaded from the disk and how objects relate to each other. If you've got a really large data object, you know, a movie, a picture, Chef's homepage in its entirety, you probably want to put that at the other end of a to-one relationship.

What this does is allow you, when your entity is instantiated, to avoid the overhead of actually loading in all of that data if you don't actually need to. This is a way of essentially lazy loading large blobs of data. Something else to think about is that you can't query an NSData type. It just doesn't work in most databases or data stores. So, if it's a large text blob, you might want to consider extracting the bits of information that you might want to search on.

These last two points are kind of complicated. You want to normalize your data sets so you can avoid having duplicate data in your database. In the Chef's Re-- the Core Recipes example, you don't want to have the chef name on every single instance of an entity because, well, it's space consuming and it's a pain in the neck if the chef goes off, gets married, and changes their name. So you want to normalize that out as much as possible.

The problem is that joins, which is what following relationships between objects amounts to, are expensive. So you also want to consider denormalizing data that you know you're going to be searching on frequently. So you have to play this balancing game. And there is no one right or wrong answer that will fit all applications. You really have to look at your data set and how you think your users are going to be querying it in order to determine where various pieces of information need to live.

And something we've also run across is the data migration question. You write version one of your application that has model version one. But when you listen to your customers, they say, "Oh, we need this new feature. Oh, we need this new feature. And why won't you let us store that piece of information?"

You come up with v2 of your model and it's got all that new information. How do you read those old model files? Users aren't going to be happy if you say, "Oh, well, this is v2 of my application. Just forget all of those old files. You're not allowed to use them anymore."

But the basic principle is you create two separate stacks, one with the old model, one with the new model, you load objects into the old stack, you copy the data out of them into new objects, which you then insert in the new stack and save. You'll notice implied there is the fact that you have to keep your old model around to do this. Don't just go edit your new model without having a copy stuffed off in CVS somewhere.
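The two-stack idea can be sketched as follows. This is a simplified illustration under several assumptions, not the upgrade example from the conference CD: it assumes ARC, SQLite stores, a call from the main thread, and it copies only attributes that exist under the same name in both models. Relationships would need a further pass. `MakeContext` and `CopyEntity` are hypothetical helper names.

```objc
#import <CoreData/CoreData.h>

// Builds a one-store stack for the given model. Returns nil on failure.
static NSManagedObjectContext *MakeContext(NSManagedObjectModel *model,
                                           NSURL *storeURL, NSError **error) {
    NSPersistentStoreCoordinator *coordinator =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    if (![coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                   configuration:nil
                                             URL:storeURL
                                         options:nil
                                           error:error]) {
        return nil;
    }
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSMainQueueConcurrencyType];
    context.persistentStoreCoordinator = coordinator;
    return context;
}

// One pass: copy every object of one entity from the old stack to the new one.
static BOOL CopyEntity(NSString *entityName, NSManagedObjectContext *oldContext,
                       NSManagedObjectContext *newContext, NSError **error) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:entityName];
    NSArray *oldObjects = [oldContext executeFetchRequest:request error:error];
    if (oldObjects == nil) {
        return NO;
    }
    NSEntityDescription *newEntity =
        [NSEntityDescription entityForName:entityName
                    inManagedObjectContext:newContext];
    if (newEntity == nil) {
        return NO;   // the new model dropped or renamed this entity
    }
    for (NSManagedObject *oldObject in oldObjects) {
        NSManagedObject *newObject =
            [[NSManagedObject alloc] initWithEntity:newEntity
                     insertIntoManagedObjectContext:newContext];
        for (NSString *key in oldObject.entity.attributesByName) {
            // Skip attributes that no longer exist in the new model.
            if (newEntity.attributesByName[key] != nil) {
                [newObject setValue:[oldObject valueForKey:key] forKey:key];
            }
        }
    }
    return [newContext save:error];
}
```

Calling `CopyEntity` once per entity and saving after each pass is one way to move the graph in discrete chunks rather than holding everything in memory at once.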

Some tips and tricks to make this a little bit less painful than it sounds and a little bit less memory consuming than it could be. You can do multiple passes across your old data, move discrete chunks of the object graph separately. If you've got little pieces that live off in their own world that are only connected to, you know, themselves, move those first, then move other chunks, and eventually it'll all be there. This helps minimize memory use, which can be really important if you've got a sufficiently large data set.

And models can be modified at runtime, so you can temporarily disable validation things that would make doing multiple passes hard. You can use managed objects instead of custom subclasses if you really don't need all the overhead that goes into the business logic on those. And like I said, there's an upgrade example on the WWDC source CD. And now I'm going to hand you over to Ben, who's going to tell you all about managed objects. Great. Thank you, Melissa.

So you're going to get the data from your rows or your XML nodes. These are the things that you're going to fetch. You're going to insert, save, delete. If you're looking for a Core Data noun, the subject of an operation, it's going to be a managed object.

And each managed object is described by a single entity description, and those are in your model. They possess a unique managed object ID, which we'll talk a little bit more about later, but every managed object has a managed object ID, and they're associated with a single managed object context, and the context is what's responsible for managing changes and stuff like that, and we'll have a whole section on that.

Managed objects are the objects that you're gonna subclass, so you can implement custom business logic on these classes, you can implement the framework callbacks, we'll go over those, and you can also implement custom validation. All managed objects respond to key value coding and key value observing, so you can use the root NSManagedObject class to get a whole lot of work done. You can just, you know, valueForKey:, setValue:forKey:, and that's gonna get you pretty far. So the life cycle of a managed object is a little bit different than an NSObject.

[Transcript missing]

So all the objects, like I mentioned, have key value coding and key value observing support. And Core Data provides this over all of the properties you've described in your model. So Core Data will look at the entity and make sure that all of those objects have space and respond to that.

So there are mutable proxies for key value coding. In Panther, there's a mutable array proxy, and in Tiger now there's a mutable set proxy. And these are the proxies that you're gonna use on your to-many relationships. As Melissa described, your to-many relationships are always sets and they're unordered. And you can use this mutable proxy to make changes to that that'll post KVO notifications for you and handle most of the issues. And with managed objects, you can use the standard KVO dependent key notifications. And there's just a little caveat down here at the bottom.

Remember that key value coding is only gonna do a retain. It's not gonna do a copy. So we've had a few people, you know, they'll pass in a mutable string to say a name property, right, and they didn't realize that we didn't create a copy for them 'cause we're just using standard key value coding. So that's not gonna work. So on managed objects, we also implement something called primitive key value coding. And the difference here is key value coding and your public accessors issue KVO notifications.

They maintain relationships for you. You'll look in, if you've seen the examples or in some of the other sessions, a lot of the inverse maintenance just happened for you. When you deleted objects in the hands-on session, they showed how you can propagate a delete. So if you delete a recipe, you also delete the chef that's associated with that recipe. And you can just see that all happen, right? Primitive key value coding doesn't do any of that.

This is here as a way, both for the framework and for yourself to just move properties in their backing store around in a very primitive way. It doesn't go through any of the key value coding accessors. It doesn't perform validation, argument checking, any kind of integrity maintenance or type coercion.

It's just a way to move properties around in a very primitive way. And it's basically here for when you subclass a managed object and you need to get at where the framework has put the storage and you need to move it around for implementing your own accessor methods.
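As an illustration of where primitive key value coding fits, here is a sketch of hand-written accessors on a subclass. It assumes ARC and a hypothetical `Chef` entity with a string attribute `name` whose storage is managed by Core Data. The public accessors post the access and change notifications themselves, and the setter copies its argument to sidestep the mutable-string caveat.

```objc
#import <CoreData/CoreData.h>

@interface Chef : NSManagedObject
- (NSString *)name;
- (void)setName:(NSString *)newName;
@end

@implementation Chef

- (NSString *)name {
    // Bracket the primitive read so the framework can fire the fault if needed.
    [self willAccessValueForKey:@"name"];
    NSString *value = [self primitiveValueForKey:@"name"];
    [self didAccessValueForKey:@"name"];
    return value;
}

- (void)setName:(NSString *)newName {
    // The primitive setter posts no notifications, so post them here.
    [self willChangeValueForKey:@"name"];
    // Copy so a caller's NSMutableString cannot change underneath us.
    [self setPrimitiveValue:[newName copy] forKey:@"name"];
    [self didChangeValueForKey:@"name"];
}

@end
```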

So as I mentioned, every managed object has a managed object ID. And this identifies the managed object across applications. It's scoped to the particular file that that managed object has been saved in. So it's unique to that file. And as I keep mentioning, they're vaguely analogous to a unique XML node or a unique SQLite row.

They can be archived, as shown in a bunch of the earlier sessions, using the URI representation. You can take that URI representation and you can customize it. You can pass it to Launch Services. If you saw in the Core Recipes example, you can export a recipe to RTF. And there's a little link in there that you can click and come back into Core Recipes.

And that's done by turning an object ID into its URI representation. So just a couple of things to note. When you're passing the URIs around, you can use object with ID on the context to get back the real object that's associated with that ID. And you can also do a fetch. So you can write a predicate that says self equals whatever the ID is. And the framework will take that predicate and we'll find the row where basically that object has that ID.

Using the fetch request is probably better when you're passing around the URI representations, because while someone's looking at the RTF file that has the little URL link, someone else might've gone through and in the Core Recipes example, deleted the object. So doing a fetch will make sure that the object's actually still there and will give you back zero results if it's been deleted.
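A sketch of that round trip, assuming ARC and a context whose coordinator already has the relevant store loaded. `LinkForObject` and `ObjectForLink` are hypothetical helper names; the fetch uses the "self equals ID" predicate so a deleted object comes back as no result rather than as a proxy for a missing row.

```objc
#import <CoreData/CoreData.h>

// Turn a saved object's ID into a URL that can be archived or put in a link.
static NSURL *LinkForObject(NSManagedObject *object) {
    return [[object objectID] URIRepresentation];
}

// Resolve the URL later. Returns nil if the store is unknown, the fetch
// fails, or the object has been deleted in the meantime.
static NSManagedObject *ObjectForLink(NSURL *uri, NSManagedObjectContext *context,
                                      NSError **error) {
    NSManagedObjectID *objectID = [context.persistentStoreCoordinator
        managedObjectIDForURIRepresentation:uri];
    if (objectID == nil) {
        return nil;
    }
    NSFetchRequest *request =
        [NSFetchRequest fetchRequestWithEntityName:objectID.entity.name];
    request.predicate = [NSPredicate predicateWithFormat:@"self == %@", objectID];
    NSArray *matches = [context executeFetchRequest:request error:error];
    return matches.firstObject;
}
```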

Object IDs are pretty much the only thing from Core Data you can pass around with impunity in a thread safe manner. And it's important to keep in mind that temporary objects, things that have been newly inserted, also have temporary IDs. So if you're going to hand off this object ID to something else, then you should probably save the object first or take into consideration that the ID will change when the object gets saved.

Similarly, if you do a store migration, which is very easy to do on the persistent store coordinator (convert a store, do a Save As, basically), all those object IDs are going to change because the object IDs are scoped to the file, to the store that that object is in.

So as I said, NSManagedObject is basically your opportunity to subclass here in the framework and get callbacks. And it's just important to note that the framework is kind of responsible for maintaining the lifecycle of these objects. When you do a fetch, you don't have to worry about initializing, alloc-initing these things. The framework goes off, finds them, takes the rows, and gives you back objects. It's also important not to use init.

You need to use initWithEntity:insertIntoManagedObjectContext:. And the reason for that is we have to know what the entity is. We can't do anything useful with a managed object if you just pass it a blank init. And another thing is you should do your cleanup in didTurnIntoFault, and I'll explain exactly what faults are a little bit later, but this is actually where the framework releases all the data associated with that managed object. And doing this instead of in dealloc is important because both within init and within dealloc, the object isn't in sort of the state that you're used to dealing with it. The framework isn't done, for instance, initializing the object at the end of init.

It's got to associate it with the context and do a bunch of other things. And in the same way, in dealloc, you may no longer have a managed object context. So the object is kind of, it's coming into existence or leaving existence, and it's not in the same state that you would expect it to be in when you're working with it most of the time. So you just need to be careful about that.
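For reference, creating an object through the designated initializer looks roughly like this. The sketch assumes ARC and an entity named `Recipe` in the context's model; `InsertRecipe` is a hypothetical helper name.

```objc
#import <CoreData/CoreData.h>

static NSManagedObject *InsertRecipe(NSManagedObjectContext *context) {
    // Look up the entity first: the object cannot exist without one.
    NSEntityDescription *entity =
        [NSEntityDescription entityForName:@"Recipe"
                    inManagedObjectContext:context];
    if (entity == nil) {
        return nil;   // no such entity in this context's model
    }
    // Designated initializer: supplies the entity and registers the new
    // object with the context in one step. A plain -init is not enough.
    return [[NSManagedObject alloc] initWithEntity:entity
                    insertIntoManagedObjectContext:context];
}
```

`+[NSEntityDescription insertNewObjectForEntityForName:inManagedObjectContext:]` is a convenience that performs the same two steps.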

And for that reason, we have a bunch of callbacks here that make it a little bit easier, awakeFromFetch and awakeFromInsert. And these are called after the object has been completely initialized and the framework is done doing everything it needs to do with it.

It hands you back your managed object, and it will call awakeFromFetch on it after it's fetched it. And this is a place where you can initialize things. You can set transients. You can set default values. You can hook up cross-store relationships, which is something that you kind of have to do a little work with yourself.

awakeFromInsert is called when the object is newly created. didTurnIntoFault is the method that basically the framework will call when, you know, the object is sort of going to sleep. It's being cleaned up. You don't need any of your resources anymore. This is a good place to do that cleanup. willSave and didSave will happen when the context is saving the object.

So if you want to update a last-saved timestamp or something, willSave is a good place for that. And didSave is just the inverse. And then the validation callbacks are there where you can do some very customized validation on your object. All these methods are in NSManagedObject.h, and there are comments in the header file.
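The callbacks can be sketched together on one subclass. This assumes ARC and a hypothetical `Recipe` entity with `name`, `createdAt`, and `lastSaved` attributes; the `_cachedTitle` ivar is an unmodeled cache invented for the example. The primitive setter is used in `willSave` as an assumption-level precaution, since changing a value through the normal accessors during a save can cause `willSave` to be called again.

```objc
#import <CoreData/CoreData.h>

@interface Recipe : NSManagedObject
@end

@implementation Recipe {
    NSString *_cachedTitle;   // unmodeled cache, rebuilt on fetch
}

- (void)awakeFromInsert {
    [super awakeFromInsert];
    // Runs once, when the object is first created: set default values here.
    [self setPrimitiveValue:[NSDate date] forKey:@"createdAt"];
}

- (void)awakeFromFetch {
    [super awakeFromFetch];
    // Runs after a fetch has fully initialized the object: rebuild caches.
    _cachedTitle = [[self valueForKey:@"name"] uppercaseString];
}

- (void)willSave {
    [super willSave];
    if (![self isDeleted]) {
        // Stamp the object without posting another change notification.
        [self setPrimitiveValue:[NSDate date] forKey:@"lastSaved"];
    }
}

- (void)didTurnIntoFault {
    // The object is "going to sleep": drop anything derived from its data.
    _cachedTitle = nil;
    [super didTurnIntoFault];
}

@end
```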

And I will be talking about the Core Data application in a little bit. It's a little more efficient if you let Core Data manage the space for the properties. So Core Data knows about all the properties you've declared in the model, and it's easier for us when we fetch a row or are reading from XML to just basically slap stuff down in our own storage. You can put attributes in your own ivars if you want to on a subclass. You can do that.

It's mandatory for to-many relationships and fetched properties that you let us do that because we don't actually expose the actual class that those objects are. And for to-many relationships, we encourage you to use the mutableSetValueForKey: proxy. It's gonna be more efficient than writing your own accessor methods. It'll be a little bit faster. You can use your own accessor methods if you have aesthetic reasons, you feel like it, but for the most part, we encourage you to let the framework just do its thing.
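Using the proxy is a one-liner in practice. A sketch, assuming ARC and a `Chef` entity with a to-many relationship named `recipes` (illustrative names); `AddRecipeToChef` is a hypothetical helper.

```objc
#import <CoreData/CoreData.h>

static void AddRecipeToChef(NSManagedObject *recipe, NSManagedObject *chef) {
    // The proxy stands in for the relationship's set. Mutating it posts the
    // KVO notifications and lets Core Data maintain the inverse relationship.
    NSMutableSet *recipes = [chef mutableSetValueForKey:@"recipes"];
    [recipes addObject:recipe];
}
```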

You can use these ivars, as I mentioned. They should be immutable value classes, like NSString, NSNumber, and you can also make them scalar types. Key value coding will automatically do boxing and unboxing across a large number of the standard C types. And when you're doing your public accessor methods on your subclass, just remember to post the KVO notifications that are appropriate. The examples and the templates that you build using the design tool will show you how to do that properly.

Unmodeled properties are basically IVARs that you've put on your subclass that you don't tell Core Data about. So you have a subclass, you make some IVARs, and you don't put them in the entity's description. You don't put them in your model. This is fine. You just have to manage all this yourself. So this is where you can stash data aside.

Basically, Core Data is not going to persist these. We won't know about them, so we won't include them in our undo management. It's free-form customization. You can maintain them in the same callbacks you would maintain a transient in. So you can maintain them in willSave and didSave, awakeFromFetch, awakeFromInsert, and you can get rid of them, clean these things up, in didTurnIntoFault. So NSManagedObject is your basic subject of these actions, and the managed object context is where you're gonna find a lot of the actions. So here's where it is in our architecture diagram.

And this provides a context, a scratch pad, as we've discussed in the previous sessions, for all of your changes. It does change tracking, relationship maintenance. It's where you're gonna find a lot of the verbs. If you wanna save, fetch, undo, reset everything, you'll find those actions described as methods on the managed object context. So this is a good place to go for your actions.

So for finding data, we've covered this in previous sessions, so I'll go over it briefly. Fetch requests have an entity, that's the type of data you're looking for, a predicate, which is optional, and a sort descriptor, which is also, again, optional. And predicates and sort descriptors are both described in Foundation. You pass the request to the context you want to fetch these objects and associate them with using the executeFetchRequest:error: method. There are two other ways that weren't covered very deeply in previous sessions.

If you have an object ID, you can ask a context for exactly that object. And you can use objectWithID:, which will always give you a proxy. So it's not going to go to the file and make sure that that row exists. It's going to give you a proxy for that ID. And whenever you use it, it will automatically go off and get the data.

You can also use objectRegisteredForID:, which will only give you back a result if it's already been registered with the context. So that's a way to distinguish between something you've already fetched, where you need to take an ID and turn it into an object, and something for which you just want a proxy.
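The difference between the two lookups can be sketched in a few lines, assuming ARC; `ObjectForID` is a hypothetical helper name.

```objc
#import <CoreData/CoreData.h>

static NSManagedObject *ObjectForID(NSManagedObjectID *objectID,
                                    NSManagedObjectContext *context) {
    // Returns nil unless this context already knows about the object.
    NSManagedObject *registered = [context objectRegisteredForID:objectID];
    if (registered != nil) {
        return registered;
    }
    // Always returns an object without touching the store. The data is only
    // loaded when the object is first used, so a row that has been deleted
    // is not detected here.
    return [context objectWithID:objectID];
}
```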

It's important to note unsaved changes affect fetching. So the context is providing you a view around a set of changes. And when you do fetches, those changes are gonna be reflected in the results. So what this means is that the context has to take the predicate that you've associated with the fetch request and check if any of the inserted objects match it because the XML file or SQLite isn't gonna be able to know about these newly inserted objects that you haven't saved yet.

And it's also gonna have to take into consideration deletions that you haven't saved yet and whether or not any changes you've made will change the results of that predicate. For this reason, if you have a lot of unsaved changes, it's gonna be a bit slower when you do a fetch.

Deleting an object from a context is pretty straightforward. There's just a delete method. And what I really want to drive home here is that a lot of these things are queued up until you actually save. So with the delete method, you have a pending delete, and that object is marked. You can ask it, you know, are you deleted? And it'll say, yeah, I'll be deleted in the next save. But those changes are coalesced, and you can still work with the object. It's perfectly valid until you save that deletion.

Saving changes, the managed object context, as I said, it coalesces all these changes together. So it'll grab all the inserts, the deletes, and it'll create a request, and it'll ask the persistent store coordinator to go off and handle that request. And the persistent store coordinator will figure out which objects should go to which file, which store, and send off appropriate requests to those stores, and each store will then use its own native mechanism for dealing with it. So our XML store uses Foundation's NSXML. The SQLite store obviously talks with SQLite, and the binary store is using keyed archiving.

So there are a bunch of different things if you've made some changes and you want to get rid of them. Rollback will set you back to the previous saved state. So that's, you've made some changes and you want to go back to what you're doing and continue working with the context.

Reset will turn the context back into its zero state, its initial state, and that will invalidate all of the objects associated with that context. So you won't be able to use any of those references anymore. This is really mostly useful if you have a context and it's wired into a nib or a controller and it's really difficult to just release it and have it dealloc and create a new one and replace it. If you're working with code, your own context, you'd probably just throw away the old context and create a new one.
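The pending-delete, rollback, and reset behavior can be sketched as follows. This uses the current Swift names (`delete(_:)`, `isDeleted`, `rollback()`, `reset()`); the `Recipe` model and `makeModel` helper are hypothetical and exist only to make the example self-contained.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
recipe.setValue("Pancakes", forKey: "name")
try context.save()

// The delete is only queued; the object stays usable until the next save.
context.delete(recipe)
let wasPendingDelete = recipe.isDeleted
print("pending delete:", wasPendingDelete)

// rollback() discards the queued changes and returns to the last saved state.
context.rollback()
let deletedAfterRollback = recipe.isDeleted
print("deleted after rollback:", deletedAfterRollback)

// reset() forgets every object; old references such as `recipe` must not be used afterwards.
context.reset()
print("registered objects after reset:", context.registeredObjects.count)
```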

And then we provide undo support for you. We can undo and redo across saves, handle relationships, deletions propagating through a series of relationships, and that all works. We use the standard NSUndoManager API provided in foundation. You can get an undo manager from the context. And configure it using the undo manager API. So for instance, you may, oh, and we also do event-based undo groupings with the undo manager.

So you can call process pending changes to alter that and to immediately coalesce all the changes that the context might make. And this is how you might, for instance, get the undo manager from the context, flush out any pending changes using process pending changes, and then tell the undo manager to disable further changes and do some stuff, flush out those changes, and then re-enable the undo management.
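The sequence just described, sketched with the current Swift API. The model and helper are hypothetical, and the undo manager is assigned explicitly because whether a context creates one by default depends on platform and SDK version.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

let undoManager = UndoManager()
context.undoManager = undoManager

// 1. Flush anything already pending so it is registered for undo as usual.
context.processPendingChanges()

// 2. Stop recording, make the changes that should not be undoable, and flush them.
undoManager.disableUndoRegistration()
let scratch = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
scratch.setValue("Inserted without undo", forKey: "name")
context.processPendingChanges()

// 3. Resume normal undo recording.
undoManager.enableUndoRegistration()
print("undo registration re-enabled:", undoManager.isUndoRegistrationEnabled)
```

The second `processPendingChanges()` call matters: without it, the context could coalesce the insert later, after registration has been switched back on.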

So for synchronizing changes, you have a bunch of different contexts in your application. There's some issues with keeping data fresh. Basically, if you've pulled in data into a context, we try not to disturb that unless you ask us to. So if you refetch the data, we're gonna give you the view that you were working with.

So you'll need to use refresh object with the merge changes parameter to force us to throw away the view of the object you're working with and get the latest. You can also use set staleness interval, which will kind of automatically tell us that you wanna go back to the file and refetch new data at a particular time interval. And the alternative here, which is kind of what we encourage, unless you are specifically poking at an object that you wanna refresh, is to just let the data get stale. And if there's a conflict, handle the conflict. And you do that with a merge policy.

So let's talk a bit about merge policies. What merge policies are is a way of handling conflicting writes. If you have multiple threads working with different contexts and they both try to write to the same data, you're gonna have to resolve that. And the merge policy is a way of telling the framework what you wanna do. But you can also get into a situation where you have a bunch of different contexts on just a single thread, say you have an inspector window, and they both manage the same object. They both do a save. You're going to have to do something about that.

So basically what the framework does is we detect these conflicts, and it's an optimistic locking paradigm. This is a fairly standard paradigm where basically we keep information about the objects you fetched and whether or not changes have been made to those objects. And we write out those changes and we assume that the saves are gonna succeed. And if the store says, nope, I can't do that 'cause the data's changed out from underneath you, we'll get this failure, we'll roll back the transaction, and we'll report it to you.

It's important to note that there's a slight difference between the store types. So the XML and the binary store types are atomic, which means you read and write them in their entirety. So they'll work like an NSDocument. They'll just overwrite whatever's on disk. So they will only detect conflicts within your application, within the same persistent store coordinator. But the SQLite store does partial reads and writes, and if you have multiple processes writing to the same SQLite file, it'll handle conflict detection there.

And the basic merge policies we have, we have five of them. They're fairly simple. The error merge policy is the default one. This is mostly a reminder that you haven't set one, but it also has a user info dictionary that will describe in great detail, which objects had conflicts, which properties they were. And you can use that information if you want a very customized approach. You can get that, you can refresh those objects and then save again using the information in the user info dictionary.

The next four are prepackaged. They'll do all the work for you. They won't even report a conflict. They'll just resolve it for you. And the overwrite merge policy is basically the last writer wins. This is if you want sort of an NSDocument-style behavior. The rollback merge policy is the first writer wins, and any subsequent writers, excuse me, basically will throw away their changes and pull in what the first person did. And then the next two are basically those, but on a property-by-property basis. So any properties that haven't been changed, you'll be able to write over, and then the properties which conflict, you'll resolve as either a first writer or last writer wins.
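A sketch of a conflict between two contexts that share a coordinator, using the current Swift spellings of the policies (`NSMergePolicy.error` is the default; `NSMergePolicy.mergeByPropertyObjectTrump` is the property-by-property, last-writer-wins flavor). The model, the helpers, and the string values are hypothetical. The code is written so that it behaves sensibly whether or not the first save attempt reports a conflict.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

func makeContext(_ coordinator: NSPersistentStoreCoordinator) -> NSManagedObjectContext {
    let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    context.persistentStoreCoordinator = coordinator
    return context
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
let first = makeContext(coordinator)
let second = makeContext(coordinator)

// Seed one saved object, then load it into the second context as well.
let seed = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: first)
seed.setValue("Draft", forKey: "name")
try first.save()
let copy = try second.existingObject(with: seed.objectID)

// Both contexts edit the same property; the first one saves first.
seed.setValue("First writer", forKey: "name")
copy.setValue("Second writer", forKey: "name")
try first.save()

do {
    try second.save()  // default policy: a conflict is reported as an error
    print("no conflict reported")
} catch {
    print("conflict reported, error code:", (error as NSError).code)
    // Pick a policy and try again: in-memory changes win, property by property.
    second.mergePolicy = NSMergePolicy.mergeByPropertyObjectTrump
    try second.save()
}

// Read the stored value through a fresh context.
let stored = try makeContext(coordinator)
    .existingObject(with: seed.objectID).value(forKey: "name") as? String
print("stored name:", stored ?? "nil")
```

Swapping in `NSMergePolicy.rollback` or `NSMergePolicy.mergeByPropertyStoreTrump` would instead keep what the first writer saved.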

Now, you can also have conflicts that are sort of semantic conflicts with data that you haven't changed. For instance, if you go to Expedia and you book a flight that has a bunch of different legs, then you try to check out, you'll notice that if the flight prices have changed in between the time you started the process (I find that when I book flights near midnight, this seems to happen a lot), then when I check out, it says, I can't do that because the prices have changed. So one thing you can use here is a method on the context.

It's called detect conflicts for object. And what this will do is flag an object that you want to make sure hasn't changed out from beneath you. So if you have a running total that's saved on a third object, and then you have cost one and cost two on separate objects, and the costs haven't changed when you make your running total, and you just want to write out a new running total, you can mark those cost one and cost two objects and ensure that they don't change out from underneath you, particularly with the SQLite store if you have another application working with that file.

So I'm going to talk a little bit about using multiple contexts for working with different change sets.

[Transcript missing]

So a few details. Inserted objects have temporary IDs, so only the context that has that inserted object will be able to see it, and other contexts won't. So again, same as if you're moving objects between applications using a URI representation, you have the same issue here, working with different contexts. You probably want to save the object before it's visible in the next context.

Also, if you're trying to decide whether or not you want to work with one persistent store coordinator or multiple persistent store coordinators, one thing to keep in mind is sharing a persistent store coordinator is going to share the caches that Core Data uses to help speed things up and know which objects you're working with and which you've already fetched. And multiple persistent store coordinators is going to duplicate that caching. All right, so I'm gonna hand you back over to Melissa right now to talk about the persistent store coordinator.

So what you've heard until now is pretty much the stuff that you are mostly going to be working with. But just because it's really useful and sometimes will avoid having you shoot yourself in the foot, it's helpful to know about how things work under the covers. And that would be the persistent store coordinator and the object stores and all the stuff down there at the bottom of the stack.

The persistent store coordinator is essentially a bridge between the object lifecycle management that is done by the managed object context and the persistence mechanism, which is the stores. In a very real sense, it's the center of the framework's world. You've got one persistent store coordinator per stack, but you can have any number of managed object contexts on top of it, and any number of persistent stores spewed out on the bottom. There's only ever going to be one persistent store coordinator.

Its main purpose is to provide a facade of there being a single store to the managed object context. The managed object context doesn't need to know or care about where its data is coming from. It just knows it talks to the persistent store coordinator, and the persistent store coordinator takes care of stuff.

When you create a persistent store coordinator, you initialize it with a managed object model, and at that point, that managed object model gets frozen. We talked earlier about modifying models at runtime. You can only do that until you attach one to a persistent store coordinator. If you try and do it after that, you'll get exceptions.

If you've already looked at the documentation, then yes, we've heard this. A lot of you have said, "What? There's no NSObjectStore, NSPersistentStore API?" No, there isn't. All interaction goes through the NSPersistentStoreCoordinator. And you create stores using the add store API, and we have a number of types that we support out of the box. It's the binary, XML, SQLite, and in-memory store types.

Which brings us to a decision when you're building an application. What kind of store do we want to use? Each type has its advantages and disadvantages. For example, the XML store is probably the slowest of all of our stores. As we all know, XML is big and slow to parse.

It takes in and writes out the whole object graph all at once, but it does have the unique advantage of being externally parsable, and you can transform the data in your XML store using an XSLT so that other applications can read it. We've also got a binary store, which is a lot faster than the XML store and even faster than the SQL store if we're talking pure qualification in memory.

It takes a bit longer to load in, a lot longer if you've got a huge object graph. And again, it works on the whole graph. This is probably the best kind of store for if you want to do application preferences or that kind of thing. Fairly small object graph that's neatly contained.

Doesn't have any of those other advantages though. There's the SQLite store, which is going to be the fastest for most things. It has a really, really huge advantage in scalability in that it's capable of having only pieces of your object graph read in at any one time. Just the subset of objects that you're actually working with now. It's really, really useful. You don't need to care or manage any of the other stuff.

We also have an in-memory store, which is fast and which also manages the whole object graph, but doesn't have a backing store. And why would you care about that? I mean, it's just like any other store. It doesn't save to disk. Why would I use that? It can be useful if you've got objects that you want to have transient.

We have an example of this in the Core Recipes application where we have the smart groups that appear in the left-hand browser for the library and for the imported files. We just create those and stuff them into an in-memory store, which we can then use and query and join relationships off of.

They never get saved because we don't really need to take up space on disk with it. I mean, they're always the same. We can create them up front every time, but we need them to act like managed objects. That's how you do it. They also have one other really huge use, which is handling legacy file formats.

Well, at this point, I'd say it's a safe bet that none of you have a Core Data application, so none of you have your data being stored in one of the Core Data specific formats. If you did, I'm pretty sure somebody wants to talk to you about violation of NDAs.

So how do you deal with Core Data? How do you use Core Data? There's a lot of advantages in Core Data that aren't just specific to persistence. You may want to use all of the object lifecycle management stuff, but you still, for one reason or another, need to live with your legacy file format.

You create your stack and back it with an in-memory store. And in your initialization methods for your application or your load methods for your application, you read in your data in your legacy format. You go through, create managed objects from it, and insert them into your context.

And then you save, and they get shoved off down to the in-memory store, and you can then register for and catch the NSManagedObjectContextDidSave notification. And at that point, you go off, pull all those objects associated with the in-memory store out, serialize them using whatever mechanism you were using before, and stuff them back out in your legacy file. This gives you the ability to switch slowly over to Core Data.

You don't have to go whole hog. You don't have to do it all at once. You don't have to write everything. You don't have to severely upset your management chain when they're going, well, we have this file format we've had for the last 10 years. We want to keep it. You can still use Core Data and get a lot of the benefits of it by using the in-memory store.
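The legacy-format pattern might look roughly like this in current Swift. Everything specific here is an assumption made for illustration: the "one recipe name per line" legacy format, the `Recipe` model, and the `LegacyExporter` class. Only the in-memory store type and the did-save notification come from the talk.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

// Hypothetical exporter: re-serializes every Recipe whenever the context saves.
final class LegacyExporter: NSObject {
    var exported = ""

    @objc func contextDidSave(_ note: Notification) {
        guard let context = note.object as? NSManagedObjectContext else { return }
        let request = NSFetchRequest<NSManagedObject>(entityName: "Recipe")
        request.sortDescriptors = [NSSortDescriptor(key: "name", ascending: true)]
        let all = (try? context.fetch(request)) ?? []
        // A real application would write this string to its legacy file here.
        exported = all.compactMap { $0.value(forKey: "name") as? String }
            .joined(separator: "\n")
    }
}

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

// "Load": parse the made-up legacy format into managed objects.
let legacyText = "Waffles\nPancakes"
for line in legacyText.split(separator: "\n") {
    let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
    recipe.setValue(String(line), forKey: "name")
}

// "Save": the in-memory store takes the objects, and the observer writes the legacy data.
let exporter = LegacyExporter()
NotificationCenter.default.addObserver(exporter,
                                       selector: #selector(LegacyExporter.contextDidSave(_:)),
                                       name: .NSManagedObjectContextDidSave,
                                       object: context)
try context.save()
NotificationCenter.default.removeObserver(exporter)
print(exporter.exported)
```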

Switching a little bit, we'll talk about working with multiple stores. Ben has talked about this. I've talked about this. You can have one persistent store coordinator, multiple stores. By default, if you have a stack that has multiple stores, we're going to send the fetch request when you request information to retrieve data from all stores that could possibly contain objects of the type you're looking for. You can use the affected stores accessor on the fetch request to narrow down your queries so you only get objects back from a specific store. We also, if you're using multiple stores, will automatically do object assignment at save time.

We'll basically go through all of the inserted objects in your object graph, figure out which stores they can be assigned to based on which entities are in which stores and which entities your object has relationships to. We'll figure out the best fit and put it in there. Sometimes we can't do this because you've done something that's a little not kosher and we'll return it and say, we've done what we can, you're going to have to do the rest. Or if you want to avoid that entirely, you can assign objects when you create them using the NSManagedObjectContext assign object to persistent store API.

You use configurations, we mentioned those earlier in the model, they're a way of creating named groups of entities to assign sets of entities to stores. You do this when you add the store to the persistent store coordinator, one of the parameters in that call is a configuration name which you'll use to say which entities go in that store.
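A sketch of those three pieces together (a named configuration, explicit store assignment, and a fetch narrowed with affected stores), using current Swift names. The `Library` configuration name, the file names, and the model are hypothetical; SQLite stores in a temporary folder are used only so the example can have two distinct stores.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    // A configuration is a named group of entities; it must be set before the model is used.
    model.setEntities([recipe], forConfigurationName: "Library")
    return model
}

let folder = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
// The configuration name passed here decides which entities the store may hold.
let library = try coordinator.addPersistentStore(
    ofType: NSSQLiteStoreType, configurationName: "Library",
    at: folder.appendingPathComponent("library.sqlite"), options: nil)
let imports = try coordinator.addPersistentStore(
    ofType: NSSQLiteStoreType, configurationName: nil,
    at: folder.appendingPathComponent("imports.sqlite"), options: nil)

let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

// Both stores can hold Recipe, so assign explicitly instead of relying on the best-fit guess.
let pancakes = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
pancakes.setValue("Pancakes", forKey: "name")
context.assign(pancakes, to: library)

let waffles = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
waffles.setValue("Waffles", forKey: "name")
context.assign(waffles, to: imports)
try context.save()

// By default a fetch consults every store that can contain the entity.
let request = NSFetchRequest<NSManagedObject>(entityName: "Recipe")
let everywhere = try context.fetch(request)

// affectedStores narrows the same request to one store.
request.affectedStores = [library]
let libraryOnly = try context.fetch(request)
print(everywhere.count, libraryOnly.count)
```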

There's some things you have to take into account if you're using multiple stores and Cocoa bindings, which is that out of the box, we don't support cross-store relationships. So tie this with the first bullet point on this slide, which is that we retrieve data from all stores. If you have, back in the Core Recipes example, chefs and recipes in a number of stores, you have a pop-up list backed by an NSArrayController that knows it is fetching chefs.

It may and will actually fetch chefs from both stores when you're in your recipes view and trying to figure out which chef to assign the recipe to. You're going to have to be careful and do some workarounds to, you know, make sure that the chefs that appear in that pop-up list don't accidentally create a cross-store relationship. So you're really going to have to think about this kind of thing.

This is where using the affected stores accessors to narrow down the sets of objects can come in really, really useful. So you have to think about that when you're using bindings and a Core Data application with multiple stores. There's lots and lots of stuff, I could go on for like 15 minutes about it, but that gives you enough of a pointer to know where to go look for things, what to think about.

As Ben mentioned earlier, store migration, save as, is pretty simple. There's one call on the persistent store coordinator. Some things you need to think about when you're doing it is that it works by pulling everything in the store into memory and resaving it. It's kind of time and memory expensive.

And once it's done, the old object store is going to be removed. The managed object context is designed to take care of this. We'll switch object IDs out from under it. But the object IDs themselves in those managed objects that you've got a hold of will be invalid, will change, among other things, the store ID when we do this.

So you're going to have to get rid of those object IDs and somehow retrieve them again. If you know that you're going to be doing this, it's best to hang onto managed objects rather than IDs. But I said it's simple, and that's the call. That's all you have to do to do a Save As.
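For reference, a sketch of the Save As call in current Swift, where it is spelled `migratePersistentStore(_:to:options:withType:)`. The file names and the `Recipe` model are hypothetical. The example holds on to the managed object rather than a copy of its ID, as suggested above, and simply prints whether the ID changed.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

let folder = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
let original = try coordinator.addPersistentStore(
    ofType: NSSQLiteStoreType, configurationName: nil,
    at: folder.appendingPathComponent("original.sqlite"), options: nil)

let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator
let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
recipe.setValue("Pancakes", forKey: "name")
try context.save()
let idBeforeMigration = recipe.objectID

// Save As: everything is read into memory and written to the new location.
let migrated = try coordinator.migratePersistentStore(
    original, to: folder.appendingPathComponent("copy.sqlite"),
    options: nil, withType: NSSQLiteStoreType)

// The old store has been removed from the coordinator; only the new one remains.
print("stores attached:", coordinator.persistentStores.count)
print("object ID unchanged:", recipe.objectID == idBeforeMigration)

let remaining = try context.fetch(NSFetchRequest<NSManagedObject>(entityName: "Recipe"))
print("recipes visible after migration:", remaining.count)
```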

This is just sort of the random stuff section. So at some point, you're going to have to decide what kind of Core Data stack you want to build. Do you want to use an NSPersistentDocument or not? The answer is actually pretty simple. You're going to want to use NSPersistentDocument if your data is completely encapsulated and doesn't have relationships to anything outside the document. You don't need to worry about data coming in from other sources. Otherwise, you're going to want a non-document-based application.

The big bug about cross-store relationships. We've had questions about this, and, well, it's not supported out of the box. But there is a way to implement it using the transient properties that we've talked about and/or using fetched properties. But in both cases, you're going to need a custom subclass of NSManagedObject to do the dirty work.

There's two basic patterns. One of which is that you update the persistent storage (I'll explain that in a sec) at set time, when you set the relationship. You override your custom accessor to push data into a backing attribute.

[Transcript missing]

You create an object that has two properties, one of which is the relationship property, and you mark that as transient. The other of which is going to be an attribute, and it's going to be an NSData, and it's going to contain all of the information about the objects in that relationship.

For example, say we have a model that we want to have look like this. We've got a Chef object, managed object, has first name, last name, and a bunch of recipes. We have a recipe object with a name, description, and a relationship back to that Chef. What do we do if we want to have that relationship cross a store boundary?

Well, the first thing we have to do is on both objects we have to add an NSData attribute; call them recipes_urls and chef_id_url. And then we get into the writing of code. We've chosen to use the second pattern here, in willSave. Call super's willSave. This is our access.

Go off and grab the managed object ID for, I believe we're working on the recipe side of things. So this is getting the recipe chef, getting its managed object ID, getting its URI representation, and we shove it into that.

[Transcript missing]

Grab the managed object context, get the coordinator. This is a complicated thing, sort of. Or at least it takes some code.

Get the managed object ID for that URI representation, and then ask the context to get that object. This really requires that both stores be there in the before and after case. If the store at the destination of this cross-store relationship has not been added to the coordinator at this point, you're going to get a nil back.

Set the managed object ID you got in the previous step in the Chef relationship, and you now have a relationship that bridges the stores. It's a little bit clunky, but it's probably the best way to do it.
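The two halves of that pattern can be sketched without a custom subclass. This is a simplification, not the code from the slides: the store and configuration names are hypothetical, the backing attribute is spelled `chefIDURL`, the URI is stored as UTF-8 text, and the steps that the talk places in `willSave` and in the fetch-time hook are written inline. The transient relationship itself is left out.

```swift
import CoreData

func attribute(_ name: String, _ type: NSAttributeType) -> NSAttributeDescription {
    let description = NSAttributeDescription()
    description.name = name
    description.attributeType = type
    description.isOptional = true
    return description
}

// Hypothetical model: Chef and Recipe live in different configurations (stores).
func makeModel() -> NSManagedObjectModel {
    let chef = NSEntityDescription()
    chef.name = "Chef"
    chef.properties = [attribute("lastName", .stringAttributeType)]

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [attribute("name", .stringAttributeType),
                         attribute("chefIDURL", .binaryDataAttributeType)]

    let model = NSManagedObjectModel()
    model.entities = [chef, recipe]
    model.setEntities([chef], forConfigurationName: "Chefs")
    model.setEntities([recipe], forConfigurationName: "Recipes")
    return model
}

let folder = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
for configuration in ["Chefs", "Recipes"] {
    _ = try coordinator.addPersistentStore(
        ofType: NSSQLiteStoreType, configurationName: configuration,
        at: folder.appendingPathComponent(configuration + ".sqlite"), options: nil)
}

let writer = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
writer.persistentStoreCoordinator = coordinator

// Save the chef first so its object ID is permanent.
let chef = NSEntityDescription.insertNewObject(forEntityName: "Chef", into: writer)
chef.setValue("Child", forKey: "lastName")
try writer.save()

// "willSave" half: push the chef's URI into the recipe's backing data attribute.
let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: writer)
recipe.setValue("Omelette", forKey: "name")
let uriText = chef.objectID.uriRepresentation().absoluteString
recipe.setValue(uriText.data(using: .utf8), forKey: "chefIDURL")
try writer.save()

// Fetch-time half: turn the stored URI back into an object ID, then into an object.
let reader = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
reader.persistentStoreCoordinator = coordinator
let fetched = try reader.fetch(NSFetchRequest<NSManagedObject>(entityName: "Recipe")).first

var resolvedChef: NSManagedObject?
if let data = fetched?.value(forKey: "chefIDURL") as? Data,
   let text = String(data: data, encoding: .utf8),
   let uri = URL(string: text),
   // Returns nil when the store that owns the ID has not been added to the coordinator.
   let chefID = coordinator.managedObjectID(forURIRepresentation: uri) {
    resolvedChef = try reader.existingObject(with: chefID)
}
print(resolvedChef?.value(forKey: "lastName") ?? "chef store not available")
```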

Great, so we're on the home stretch here. So just a quick note that there is Spotlight integration. As Matt mentioned in the intro session, these are very complementary technologies, and the Core Recipes example provides a sampling of how you can do this yourself.

The recommended way of doing this, particularly for small amounts of metadata, is to put that information in the store metadata. The coordinator has a method. You can set the metadata for the store when you save the store, and then you can get it back using metadata for persistent store with URL, and that will get it without creating a Core Data stack. That's a class method on NSPersistentStoreCoordinator, so you can call it as opposed to creating a whole stack yourself.

Now, this does duplicate the information that you put in the metadata, because the metadata for the stores is saved separately from the actual persistent data for those objects. It is, however, the recommended approach. It's the fastest way of working with Spotlight and keeping your Spotlight importer lean.
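A sketch of the store-metadata round trip with the current Swift API. The file name, the `Recipe` model, and the choice of the `kMDItemKeywords` key (written as a plain string here) are assumptions for illustration; the existing metadata dictionary is copied and extended so the keys Core Data maintains itself are preserved.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

let folder = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)
let storeURL = folder.appendingPathComponent("recipes.sqlite")

let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
let store = try coordinator.addPersistentStore(
    ofType: NSSQLiteStoreType, configurationName: nil, at: storeURL, options: nil)
let context = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
context.persistentStoreCoordinator = coordinator

let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: context)
recipe.setValue("Pancakes", forKey: "name")

// Duplicate the searchable bits into the store metadata; it is written on the next save.
var metadata = coordinator.metadata(for: store)
metadata["kMDItemKeywords"] = ["Pancakes"]
coordinator.setMetadata(metadata, for: store)
try context.save()

// What an importer would do: read the metadata straight from the file, with no stack.
let readBack = try NSPersistentStoreCoordinator.metadataForPersistentStore(
    ofType: NSSQLiteStoreType, at: storeURL, options: nil)
let keywords = readBack["kMDItemKeywords"] as? [String]
print(keywords ?? [])
```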

So for really large amounts of metadata, the Spotlight team doesn't really encourage this, but you can create a persistent store stack within your importer. It's gonna be a bit slower, but at the least, then you won't have to duplicate the metadata. So your mileage may vary and you can experiment with that approach. The Core Recipes example takes the first approach, and it's on your WWDC CD. I'm not gonna talk any more about that other than refer you to the example, and you can post questions on the cocoa-dev mailing list.

Again, so memory management for these advanced topics, I'm trying to help you understand what the right questions to ask are. We don't really have time to answer everything, I'm afraid. But one of the most important things to note is the managed objects and their associated context do not retain each other. There's basically a weak reference between the two. So the managed objects, when you're done using them, you release them, they can just go away. The context doesn't maintain a separate retain count on them.

And in this way, the framework also provides managed objects to work as sort of lightweight proxies. And we call them faults. And it's very analogous to your virtual memory paging, your demand paging. If you fetch an object and it has a relationship to a bunch of other objects, you don't have to fetch those other objects.

You can use key value coding and just walk the relationship like you might normally with value for key path. And Core Data will handle figuring out where all those objects are and bring them into memory. And when you're done using them, you can get the faults to go away.

Malcolm Crawford, who's over here, has done an excellent job in the Core Data Programming Guide. And the section Using Managed Objects talks very heavily about the nitty-gritty details about memory management with Core Data. And that actually should answer all of your questions on this.

So some notes on handling really large data sets. And by this, I mean many tens of thousands, hundreds of thousands, possibly even millions of records. The advice is basically to use the SQLite store when you can for this. So I do know people who have worked with tens of millions of rows with SQLite directly. And myself, the performance testing that we do is hundreds and hundreds of thousands of records through Core Data. So it does really pretty well. The binary store is your second best option if you need the NSDocument atomic reading and writing behavior. And that will probably get you to a few thousand objects. And your key note here is to minimize the working set.

So as you have more and more data, you're going to want your fetch requests to have predicates that are more and more specific. To only retain the things you need, once you're done with them, release the managed objects and the framework will clean up afterwards. And basically just let them get turned into faults.

So to talk briefly about threading, Malcolm has thoughtfully volunteered or been volunteered to write a programming guide on this. It's forthcoming, but this should get you started with some of the basics. So the Core Data classes, for the most part, except for the managed object ID, are not thread-safe intrinsically. A lot of fine-grained locking that you don't need would slow you down, so we don't do it. So NSManagedObject has similar restrictions to, say, an NSMutableDictionary.

The Core Data framework itself, internally, makes sure that it does the right thing when it's working, particularly between a context and that context's coordinator, so you don't have to worry about objects that you're not working with. The framework will do the right thing for the things that it does behind your back, but for the objects that you are working with, you're going to have to take care to use them in a thread-safe way. And you really either need to lock these objects or not share them at all if you use them. And by use them, I mean you pass a message to them. You send a message to one of these objects, or you access one of these objects.

So some threading tips is I strongly recommend that each thread has its own private context. If you're not sharing the context, then you don't have to lock it. It makes things much, much easier to deal with. And it also provides, as I'll talk about in a little bit, a little more consistency in the implications of what all those change sets are doing, because each context is tracking a different set of changes. So you can only pass managed object IDs between threads.

And it's okay to pass them as just pointers. You don't have to convert them into URIs and pass them back, although you can if you want to. And I recommend that you do not pass managed objects or the context between threads directly. That gets into a lot of very fine-grained locking and can be very difficult to get right.

So here's basically the approach that I recommend you start out with. If you want to work with multiple threads in Core Data, and each of those contexts is working with a different thread, they share a persistent store coordinator. So you share all the caching, and this works for the XML and the binary stores, as well as the SQLite store. And you can have as many persistent backends as you would like.

One alternative is you can instead have an entire stack per thread. This will get you a little more concurrency. As you see, there's no bottleneck at the persistent store coordinator level. This only works with the SQLite store, and there's no shared caching, so there's gonna be a little impact on your memory.
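The first arrangement (one coordinator, one private context per thread, only object IDs crossing between them) can be sketched as follows. Note the substitution: current Core Data expresses "a context owned by one thread" as a private-queue context used through `performAndWait`, which did not exist in 2005, so this is the modern equivalent rather than the API shown in the session. The model and values are hypothetical.

```swift
import CoreData

// Hypothetical one-entity model built in code: "Recipe" with an optional "name" string.
func makeModel() -> NSManagedObjectModel {
    let name = NSAttributeDescription()
    name.name = "name"
    name.attributeType = .stringAttributeType
    name.isOptional = true

    let recipe = NSEntityDescription()
    recipe.name = "Recipe"
    recipe.properties = [name]

    let model = NSManagedObjectModel()
    model.entities = [recipe]
    return model
}

// One shared coordinator, so both contexts share its caches.
let coordinator = NSPersistentStoreCoordinator(managedObjectModel: makeModel())
_ = try coordinator.addPersistentStore(ofType: NSInMemoryStoreType,
                                       configurationName: nil, at: nil, options: nil)

let mainContext = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
mainContext.persistentStoreCoordinator = coordinator
let recipe = NSEntityDescription.insertNewObject(forEntityName: "Recipe", into: mainContext)
recipe.setValue("Draft", forKey: "name")
try mainContext.save()

// Only the object ID is handed to the worker, never the object or the main context.
let recipeID = recipe.objectID

let worker = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
worker.persistentStoreCoordinator = coordinator
worker.performAndWait {
    // The worker gets its own managed object for the same ID.
    let copy = worker.object(with: recipeID)
    copy.setValue("Edited by the worker", forKey: "name")
    do {
        try worker.save()
    } catch {
        print("worker save failed:", error)
    }
}

// Back on the main thread, ask explicitly for the latest data.
mainContext.refresh(recipe, mergeChanges: true)
let refreshedName = recipe.value(forKey: "name") as? String
print(refreshedName ?? "nil")
```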

So the locking isn't just about thread safety. The contexts are tracking changes and essentially represent a scope of a transaction, a view of a working set of changes. So by preventing interspersing, like locking and unlocking of these contexts, you also prevent interspersing of changes. Otherwise you have different threads inserting changes into the context and maybe undoing another thread's changes, stuff like that. It gets very complicated.

So that's why... I recommend that you just have a private context per thread. It's important to note that the way the framework is designed, that a thread must own the context for a managed object in order to access the managed object. The managed objects themselves are treated as too finely grained to lock and unlock separately yourself. You should lock the context that owns them because the context is the one that's doing the change tracking.

One last note (and again, look for Malcolm's forthcoming programming guide) is that Cocoa Bindings is not thread safe. And if you wanna work with your controller, you must do that on the main thread. There's no way around that. There's nothing I can do about it. But if you wanna add or remove an object from a controller, you're gonna have to use perform selector on main thread. That's actually true for a lot of different AppKit issues. It's much easier to work with it on the main thread.

And some notifications. This is a summary of the notifications we presented to you. And this is the best place for you to do your own customization of the framework behavior outside of subclassing managed objects, if you want to change the way a context or coordinator is doing things.

So the managed object context has two notifications. It has an objects did change notification that gets sent when it coalesces the changes at the end of the event. So every time an event ends that has had changes, you'll get this notification. There are a bunch of keys in it: the inserted, updated, and deleted keys. These are sets of the managed objects that have changed during that past event. And the controller actually listens to this notification itself to figure out its own behavior.

And then there's the did save notification, which we've presented to you a number of times, for using the in-memory store to track when someone's actually saved a set of changes. The persistent store coordinator also posts a notification, and that's stores did change, and this is whenever a store gets added or removed, or the UUID, the real unique reference to that store, has changed. It provides an array of the stores that actually changed.

And primitive key value coding does not trigger key value observing notifications.

So, a section on some common errors that have come up. One of these is basically changing the model and trying to read the data again. I know we've had a lot of questions about schema versioning. Currently version one of the framework doesn't support this. So you can't do it. You'll have to bring up a stack as Melissa described, reading the old data and saving it again. If you're doing your own custom accessors, forgetting to send the KVO notifications will cause lots of things to go awry. The controllers won't know that you're doing that.

They won't know that you've changed things. Neither will the context be able to track those changes and update those inverses. So if you start misusing the primitive key value coding, you'll see things get out of sync. And then another sort of general kind of paradigm issue is not really letting the framework drive. Like I mentioned, the lifecycle of managed object is largely controlled by the framework.

It'll fetch them and save them and whatnot. So it's best to work within the callbacks we provide you or the notifications and let the framework do the heavy lifting. So one of these is that it could not merge changes. And this is if you come across a situation, maybe you have two contexts with different sets of changes in an inspector window, you try to save, and you haven't set a merge policy. You'll get this.

And then here, Core Data could not fulfill a fault. So here you've given Core Data an object ID and you've asked for an object back with it, and that object doesn't exist. It's been deleted out from underneath you. We had a proxy for that row, but the row's gone. So you'll get this message, and you're gonna have to discard the object because it's been deleted, basically.

And then here, this class is not key value coding compliant. This will happen if you have your standard typo in a key, or if you initialize the object accidentally with -init instead of initWithEntity:insertIntoManagedObjectContext:. Like I said, you have to initialize it with our designated initializer so we know what entity that object belongs to, and we can find all of its properties. So here you can't assign an object to a store that does not contain the object's entity. You can only assign them to stores that were configured, when you added the store to the coordinator, with a configuration that has that entity.

And then here you can't reassign an object to a store. Once it's been saved, you'll have to basically insert it again into a new store and then delete it from the old store. So you basically have to do a copy.

And at your fingertips here, the Core Data Programming Guide. A lot of these sessions, if you read the guide, will make a lot more sense. Malcolm's done a great job in explaining all the nitty-gritty details of various different issues with Core Data. And there are a bunch of other different guides. There are a bunch of examples installed with the developer tools on your system. And again, Core Recipes is on the developer CD you have for the conference.

[Transcript missing]