WWDC10 • Session 137

Optimizing Core Data Performance on iPhone OS

Frameworks • iOS • 50:31

Core Data provides fast, easy, and efficient management of your app's data on iPhone OS. Learn Core Data best practices to obtain great performance while reducing memory consumption. Gain a deeper understanding so you can use Core Data to its fullest.

Speaker: Melissa Turner

Unlisted on Apple Developer site

Downloads from Apple

HD Video (293.1 MB)

Transcript

This transcript has potential transcription errors. We are working on an improved version.

[Melissa Turner]

Hello, welcome to Session 137: Optimizing Core Data Performance on iPhone OS. I'm Melissa Turner. I'm one of the Core Data engineers who works on, well, Core Data and does a lot of optimizing, so they sent me up here to talk to you, because I think my manager didn't want to get up on stage this year.

So what are we going to talk about? Well, we're going to talk about performance: some design, starting from when you conceive your application, and the things you need to think about to make it a performant application. We'll do some analysis of an application that's not performing very well and see what tools we have to determine what the problems with that application are. We'll talk about some of the things you can do to manage space in your application, minimizing the amount of memory you're using, and what you can do to manage time, or minimize the amount of CPU you're using. Of course, as someone much smarter than me discovered about 95 years ago, space and time are related, so often by minimizing one, particularly by minimizing memory, you're also going to help your CPU performance.

So at some point in the code cycle somebody says, you know, wouldn't it be really cool if somebody built an application that does insert-really-cool-thing-here? That's all fine and good, that happens a lot, but at some point in the life of a special application, special code, somebody will say, yeah, that's really cool, I'm going to do that, and at that point design has to start. You start thinking about things like: how am I going to build this application? Where am I going to build this application? How do I do it? And depending on what device you've targeted, each device has different constraints, things like how much memory do I have to play with? If you're working with a desktop system, as we all know, you have effectively infinite memory, or close to it; if you fill up your available RAM it'll swap to disk. This isn't true on an iOS device.

Associated with memory there's memory bus speed, which is how quickly you can get data out of memory and into the CPU. A lot of the time you've got big memory, a big CPU, a big CPU bus, and a little tiny memory pipe going between them, so you really want to try not to load any data out of memory that you don't have to, because that's going to make your application slow. You also have processor speed to think of.

How much time, how much work, can you afford to do before your user gets bored and hits the home button? I/O speed is another constraint. There's the obvious case of loading data out of flash, but there's also pulling data off a friend's phone via Bluetooth, or reaching out into the 3G cloud and pulling something off a web server somewhere, and as we all know there are different responsiveness requirements on different platforms.

Things like the spinning pizza of doom are perfectly acceptable, or at least people have learned to live with them, on a desktop system. That doesn't work so well on iPhones, and really the first step in optimizing your application is actually knowing what your application does. It sounds really, really silly, but a lot of the time when you start studying what your application is doing you'll be going, wait a second, why is it doing that? Really understanding what it's doing is the first step to being able to figure out what it's doing wrong, or what it could be doing better, versus what it's doing that it simply needs to be doing.

You know, if you've got a recipes application and you find that it's going out to the network first thing when it launches, you might ask yourself why. If you've got an RSS feed reader that isn't, you might ask yourself why. So have this mental model of what your application is supposed to be doing, spend some time figuring out what it actually is doing, and if there are any places where you don't understand what it's doing, figure it out, and keep going until you have a pretty good sense of what your application does. Then you're ready to start optimizing, because then you'll have some sense of what you can optimize and what you can't. And when it comes to the stuff you can optimize, the first thing to do is figure out where the low-hanging fruit is, figure out what's really easy, and more importantly figure out where your time is spent, so you can avoid optimizing the 1% case.

If you've got something that's a heavyweight operation in your mind, but it's only taking 1% of your application's total run time, it's not that great a candidate for optimization. Make it 100% faster and it now takes half a percent of your application's run time, which is really not much of an improvement. Take something else that you'd dismissed that's taking half your application's run time, improve that by 10%, and hey, your application is 5% faster right there. Pick the right targets. How do you do that? Well, you can start with Instruments: Instruments on the simulator, Instruments on the iPhone, and you can supplement that using Core Data logging on the iPhone.

Instruments on the simulator: the thing to remember here is that the simulator isn't an emulator. You're not going to get perfect timing information about how your application will run on a device when you run it in the simulator, but the simulator and the devices are both 32-bit systems, so you can get a lot of information about your application's memory use patterns by running it in the simulator. The big reason for wanting to run in the simulator is that your build cycle is a lot faster.

You're not going to spend any time syncing stuff down to a device. You just build and go, build and go, build and go, and you're going to be doing this iteratively, because you don't want to just test once; you want to test once, make a change, and see if your change actually worked.

You can also use the simulator to find out about file system activity: find out if you're hitting the disk, find out if you're hitting the network, that kind of thing. In addition, the simulator lets you use the Core Data static probes, which are a bunch of DTrace probes we've embedded in the framework that allow you to follow Core Data's file system activity: when we're saving, when we're fetching, when you're firing a fault, and when that fault requires a round trip to the database because the data isn't already in the cache.

There's also Instruments on the iPhone. This is really good for determining where your application is spending its CPU cycles, how much time it's spending doing graphics, and what random system activity you've got going on, and it can be supplemented, because we don't have DTrace probes on the phone, with Core Data logging, which you enable using the -com.apple.CoreData.SQLDebug 1 launch argument. That will give you timing information about a lot of the operations within Core Data. And now I'm going to walk through an example of how to use Instruments to find some bottlenecks in a demo application that we've written just for you guys. Over here I have an iPad sitting on the projector, and I have a couple of applications.

You can tell they were built by an engineer and not somebody who knows graphic design, so the applications are pretty simple. One one-thousand, two one-thousand... it takes a little over 2 seconds to launch. It's a simple contact management application: you can scroll up and down in a master list view, select contacts and they'll show up in the detail view, and it takes a couple of seconds to launch, which can be an eternity in the iPhone world. So we spent some time with that application and we built this one. One one-thousand... not even a second. So you can see the kind of performance gains you can make, even with fairly simple applications, by using some of the tools we have in our system.

So again, we have the applications. Can I get a show of hands for people who've used Instruments before or are familiar with Instruments? Oh good, a lot of you will know what I'm talking about, but for some of you this will be new and interesting information, because it is interesting; I think Instruments is one of the best things about our system.

So I'm going to create a new template, and basically this allows me to specify what kinds of profiling I'm interested in doing. You can see over on the left-hand side here that there are a number of target platforms I can set up a template for: the iPhone, which has templates for memory, CPU, sound playing, and random system stuff.

I can target stuff in the iPhone simulator, or, if you're interested, you can target stuff on Mac OS X. And because I'm an Apple engineer and I know where all the bugs are buried, I happen to know that even though the Core Data tools show up only under Mac OS X, they can actually be used in the simulator as well, so I'm going to pick that as the template I use to examine my app in the simulator, because at this point I'm interested in looking at its memory use.

But I'm not just interested in the Core Data activity; I also want to see what it's doing in memory, so I'm going to come up here to the library, which will show me all of the possible instruments I could add to my template, and I'm going to pick the memory option. In here we have a bunch of instruments that specifically focus on analyzing memory, and if you come down to the bottom of the library you'll see the instrument name and some information about it, and specifically a couple of little labels telling you what platform you can use the instrument on. In this case I've got Allocations, and it can be used on the iPhone and the Mac, so I'm going to grab this, because I'm interested in seeing how my memory is being used. Then I come up here and specify that I want to run it against the demo app, because I want the simulator, and we have to go choose a target: my user, WWDC, nope, Shared, Desktop, Builds, [inaudible]... apparently I didn't quite get everything organized when I set up. There we go. So this will allow me to launch my application in the simulator.

We can see that memory shoots up and then plateaus shortly after launch, and since what I'm interested in is my launch performance, I can just stop it now and start doing some digging. Well, if I look at my allocations, the first thing I'll notice on this top line here is that I've used 9 1/2 megs of memory; that seems an awful lot for an application that has 1,000 contacts in it.

It shouldn't really need to use all that much up front, and overall I've churned through an awful lot, just over 100 megs. And again, because I'm an Apple engineer and know where the bugs are buried, and I came here to share this knowledge with you, I know that managed objects don't actually show up under the name NSManagedObject in this view; they show up as bytes, so I know they show up in the Malloc 80 Bytes and Malloc 84 Bytes collections.

You can see I've got 1,000 of those, well over 1,000 of those. If I go down and look at the Core Data fetches I can see why I have so many, because I did a fetch against Person and I got 1,000 people back, for an application that shows 16 lines on screen. It seems like overkill, doesn't it? Well, let's have a look now at actual performance on the device.

I have here an instrument that's set up to run against the first application sitting on the iPad over here, and again we can see memory shoot up and then level off in our first application, and that tells me my app has launched.

Again we can see we're using an awful lot of memory. Something else I might want to focus on as well: I know that I've got images in those contacts, and images are backed by CFDatas, and I've got 1,000 of those; that seems wrong too.

If I come down here and look at the CPU monitor, this is where I get the really interesting stuff, because this tells me: this is my application, it's currently using 20 megs of memory, and it took 2 and 3/4 seconds to launch. Well, given all of that information, that it's taking a long time to launch, plus some information about how we're using memory and loading too many objects, we went off and looked at the model in our application, and our original model looked a lot like this. It's fairly simple; it's what you might do as a first pass when you're creating a contact management application. It's got a person who has addresses, emails, and phone numbers.

Well, we know we want to move the picture off, because we don't need to load the picture until we're actually displaying the object, and maybe there's some other stuff we can do. Our second pass looks a lot more like this. We've still got a person, but now it's a detail person, and we've moved the picture off onto its own separate entity so I don't need to load the picture every time I load a person. I've also split out some of the information from the person and duplicated it on a list person, and the list person is what underlies the master view in the second version of the application; it has a display first name, a display last name, and a canonical sorting name.

The canonical sorting name, we'll talk a little bit about the concept behind it later, but basically it allows me to store the name of the person in a very simple, easy-to-compare format in the database, so I don't actually have to do a fully Unicode-aware string comparison when I'm trying to load the person view or when I'm trying to do a search against it. You can also see here that I've persisted the section information, and we'll talk about that in the section about the fetched results controller.

So what other changes did I make besides changing my model? Well, first up, remember I keep talking about loading 1,000 objects when I've only got 16 lines in my view. I decided I'm going to use Core Data's batching feature to only load a few of those objects, just enough for the ones in the view plus a few more hanging off the end. I've decided to set my batch size to 25, and this will cause Core Data to only load 25 of the objects underlying the list view at any one time. We'll talk more about that in the fetched results controller section.
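Roughly, that looks like this in code; the entity and attribute names here are illustrative, not taken from the actual demo project:

    // Fetch for the master list; only about one screenful of row data is
    // pulled in at a time, and the rest of the result array is lightweight
    // placeholders until it's accessed.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"ListPerson"
                                   inManagedObjectContext:context]];
    [request setSortDescriptors:[NSArray arrayWithObject:
        [NSSortDescriptor sortDescriptorWithKey:@"canonicalName" ascending:YES]]];
    [request setFetchBatchSize:25];

    NSError *error = nil;
    NSArray *people = [context executeFetchRequest:request error:&error];
    [request release];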

And one more thing I've done: I've gone into the detail view controller and set relationship key paths for prefetching, so that whenever I load an object into the detail view, since I know I'm going to want to access its addresses and its phone numbers, I load those at the same time instead of individually faulting them in afterwards.
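A minimal sketch of that, assuming a hypothetical DetailPerson entity with addresses and phoneNumbers relationships and a listPerson inverse back to the tapped row:

    // Prefetch the relationships the detail view is about to touch, so they
    // come back in the same round trip instead of faulting in one by one.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"DetailPerson"
                                   inManagedObjectContext:context]];
    [request setPredicate:[NSPredicate predicateWithFormat:@"listPerson == %@",
                                                           selectedListPerson]];
    [request setRelationshipKeyPathsForPrefetching:
        [NSArray arrayWithObjects:@"addresses", @"phoneNumbers", nil]];

    NSError *error = nil;
    NSArray *results = [context executeFetchRequest:request error:&error];
    [request release];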

So did it work? Did we make things better? Well, I have yet another trace, and I'm going to run this. Again I'm going to check my allocations and the CPU use of the application running on the device, and away we go. Memory spikes, memory spikes, and then it flatlines, and we're done.

Well, the first thing I can see here is that I'm down to under a meg of memory instead of the 9 we had originally, and overall I've only churned through 2.6 megs of memory instead of well over 100; a lot less churn. If I come down to my 80-byte objects, I've got 252 of them now instead of the 1,300 I think it was before.

If I look for my CFDatas, I've got 25 of them. Whoops, that was an accident. If I go to the CPU monitor and look down at the bottom line here, I can see that I've used a total of 6.5 megs instead of the 20 that the previous version was using, and I launched about 2 seconds faster than the original version.

So you can see the gains: doing just a little bit of performance analysis, and having some idea of what you should be looking for, can really help you make your application a lot faster. And just out of curiosity, as we were building the second version of the application, we asked: how much data can we add before it gets performance characteristics like the first one? We actually managed to stuff 100,000 rows into it, and if anybody has 100,000 contacts, I'd like to meet them.

So, having talked a little bit about how you can use Instruments to optimize the performance of your application, I'd like to talk more generally about different problems that you're going to run across, or that you may run across, while building an application, and some of the solutions to those problems. I'm going to start at the beginning, with data importing.

Why do you want to import data? Well, there can be a number of reasons. A lot of people want to do some kind of import when their application is first run. A recipes application may want to add a few default recipes so that when it first comes up it's not blank.

An RSS feed reader may have a few default feeds. Sometimes importing is part of an update process: once a month you go out to a website and get a whole bunch of new information. This is common for stuff like magazines or reference documents that have periodic updates. It could also be part of your application's standard workflow: if you have an application that lets the user access a web server and download data, and you're using Core Data as a local cache, that's also an import.

So how do you do an import most efficiently? Well, the easiest way to do it is just not to do it. Core Data allows you to set up stacks that have multiple stores, so if you're doing something like a periodic data update that comes in a batch, just ship it in a Core Data store format and have your application add that store to the coordinator.

We'll query it; it's completely transparent to you whether you've got 1 or 10 stores. This is best in the case where the data is disjoint, where, if you have user data that can be modified, it's not related to the data in your seed, because cross-store relationships are possible, but they're a little bit tricky.
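A rough sketch of that multiple-store setup, assuming a read-only seed store shipped in the app bundle and a hypothetical documentsDirectory path for the user's writable store:

    // Seed data ships in the bundle; user data lives in Documents.
    NSURL *seedURL = [NSURL fileURLWithPath:
        [[NSBundle mainBundle] pathForResource:@"Seed" ofType:@"sqlite"]];
    NSURL *userURL = [NSURL fileURLWithPath:
        [documentsDirectory stringByAppendingPathComponent:@"UserData.sqlite"]];

    NSDictionary *readOnly = [NSDictionary dictionaryWithObject:[NSNumber numberWithBool:YES]
                                                         forKey:NSReadOnlyPersistentStoreOption];
    NSError *error = nil;
    [coordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                        URL:seedURL options:readOnly error:&error];
    [coordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                        URL:userURL options:nil error:&error];
    // Fetches through a context on this coordinator now span both stores
    // transparently.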

But if you do have to get all of your data into a single store, this is the general recipe that we recommend. Use a separate context. Why do you want to do that? Well, because you don't want your main user context to ever be blocked. You want it to be as responsive as possible; you don't want some other process stuffing data into it. You also don't want users to get confused by partially imported data being returned by their fetch requests. Do batch saving. Don't save after every single object.

I/O is the mind-killer; you don't want to do that. Save every 10, 100, or 1,000 objects, whatever size makes your application responsive given the size of your objects. Put a nested autorelease pool around the batch import; that'll flush out all of the temporary objects and help keep your high-water mark on memory fairly low. Avoid doing unnecessary fetching from the database.

If you have to create relationships between the objects that are being imported and the objects that are already in your system, fetch the objects you know you're going to need up front. Don't go searching for them one by one, and try to cache unique objects that you know you're going to be creating relationships to.

How do you want to set the contexts up? Well, there are a couple of ways. You can set both contexts up to use the same persistent store coordinator and the same persistent store. This has a lot of benefits. First, it minimizes the amount of memory being used by the Core Data part of your application. Second, it puts the imported data into the row cache down in the persistent store, which allows the user, when they query from their main context, to access it without having to go all the way to the database.

You could also, if you have a really performance-critical application where you want the persistent store coordinator, or the persistence layer, locked for as tiny a window as possible, set up two parallel stacks. There's real overhead in this, in that Core Data stacks are not cheap, but if you really, really want your user to be able to access the data as quickly as possible, this will allow you to have the user's context locked out only for as long as it takes to do the transaction down in the SQL database. So it's a choice. Most of you, we think, will probably want to go with the first model, because most imports are actually relatively small, but the other one is out there if you need it.

How do you import data? It's actually pretty simple. You set up your persistent store, context, and coordinator, in whichever of the two patterns you've chosen, set up your autorelease pool, iterate through the collection of objects that you need to insert, create and insert them, and then do a batch save every N objects, pop the autorelease pool, reset it, and keep going.
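A rough, pre-ARC sketch of that loop, assuming a hypothetical Person entity, an array of dictionaries called recordsToImport, and an arbitrary batch size of 500:

    NSManagedObjectContext *importContext = [[NSManagedObjectContext alloc] init];
    [importContext setPersistentStoreCoordinator:coordinator]; // shared-coordinator pattern

    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSUInteger count = 0;
    NSError *error = nil;

    for (NSDictionary *record in recordsToImport) {
        NSManagedObject *person =
            [NSEntityDescription insertNewObjectForEntityForName:@"Person"
                                          inManagedObjectContext:importContext];
        [person setValue:[record objectForKey:@"firstName"] forKey:@"firstName"];
        [person setValue:[record objectForKey:@"lastName"] forKey:@"lastName"];

        if (++count % 500 == 0) {
            [importContext save:&error];   // batch save, not one save per object
            [pool drain];                  // flush temporary objects
            pool = [[NSAutoreleasePool alloc] init];
        }
    }

    [importContext save:&error];           // save the final partial batch
    [pool drain];
    [importContext release];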

So we've talked about a bunch of things you can do, and a lot of you at this point have probably been thinking, this sounds like the kind of thing that's best done on a background thread, and you're right. We'll get into how to use threads with Core Data after the next section, which is about deleting, because, you know, bookends; symmetry is nice.

How do you delete data? Well, when do you delete data? First, it can be a user-initiated delete: they can decide, for whatever reason, that they need to get rid of 10 or 15 or 20 objects, or a couple hundred objects, or a couple thousand objects. It can be part of an application update, where some amount of data is being replaced by whatever is in the new update. Or data could just be aging out: if you've got an application that shows upcoming events somewhere, you don't really care about keeping track of the events that happened yesterday or 4 months ago.

So how do you delete them? It's very similar to importing. If you can, especially in the application update case, just get rid of a whole store. Fast, simple; removing a file is much faster than removing rows in a database. If you can't, you want to use some of the same strategies again: batch up your deletes, use an autorelease pool, and prefetch cascaded objects. What do I mean by that? Well, this is probably a bad pattern.

The other approach, of course, is just to go through all of the objects to be deleted one by one and then call save on your context. There's a problem here, in that deleting an object can cause other objects to be loaded, and you can end up making repeated trips to the database trying to get all of the stuff you need to nullify keys in, or what have you, to make all of your relationship maintenance work. You really don't want to do that; you want to go to the database as few times as possible, because I/O is a killer. So you don't want to do this; what you actually want to do is something like this.

You want to create a fetch that goes off and loads all of the objects that are going to need to be loaded as a result of that batch. You set the relationship key paths for prefetching, and then you execute that fetch, and this tells Core Data to go out, up front, and get all of the objects you are about to touch and do relationship maintenance on. Then you can iterate, one by one, through all of the objects to be deleted; Core Data will process those deletes, propagate them out through the object graph, and then you can save. And again, this is something that's often best done on a background thread, which we'll talk about now.
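A sketch of that delete pattern, assuming a hypothetical Event entity whose attachments relationship has a cascade delete rule; cutoffDate is just an illustrative variable:

    // Fetch the doomed objects and prefetch the relationships that delete
    // propagation is going to touch, so it all comes in with one round trip.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"Event"
                                   inManagedObjectContext:context]];
    [request setPredicate:[NSPredicate predicateWithFormat:@"date < %@", cutoffDate]];
    [request setRelationshipKeyPathsForPrefetching:[NSArray arrayWithObject:@"attachments"]];

    NSError *error = nil;
    NSArray *doomed = [context executeFetchRequest:request error:&error];
    [request release];

    for (NSManagedObject *event in doomed) {
        [context deleteObject:event];   // relationship maintenance uses the prefetched data
    }
    [context save:&error];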

How do you do threading in Core Data? Well: how, when, and why? The big reason why is to avoid being terminated, either by SpringBoard or by the user, who's decided to hit the home button. When do you want to do it? When you've got a large operation.

Often it's an import or a delete; sometimes it's an update, where the user has done something that's caused a state change in your database, or in your object graph, that's going to ripple out and touch 20 or 40 or 100 or 1,000 objects. Also, if you're doing a network operation, you're probably going to want to spin that off onto a background thread rather than having the main thread blocked and unresponsive. How do you do it? Well, the easy answer is to use libdispatch, or GCD; they're really the same thing, actually.

Dispatch is new on iOS 4; it's been on the Mac for a while, for those of you who are Mac programmers. It allows you to push blocks onto a queue for execution on some thread; GCD will decide how many threads you've got and what's most efficient for the kernel. Dispatch queues default to serial. If you're targeting the iPhone OS 3 platform, or also on iOS 4, you have the option of using an NSOperationQueue, which is very similar but has existed on previous versions, using NSOperations instead of blocks.

It allows you to push an NSOperation onto a queue, and then it takes care of managing all of the threads behind your back. The thing to note is that NSOperationQueue defaults to concurrent execution rather than serial, and we'll see why that's important in a second, but you can also set the maximum concurrent operation count on an NSOperationQueue to 1 to turn it into a serial queue. Once you've got these queues, what do you need to know about Core Data and queues? Well, the first thing is that you need to confine a managed object context to a single thread.

Contexts aren't really designed to support having multiple things try to change data in them. It's a standard threading-and-data problem: having multiple threads try to modify the same piece of memory is a really good way to shoot yourself in the foot, or other people in the foot. You have cache coherency problems, where one thread won't be seeing what the other thread is seeing. You have write-timing issues.

It's very, very hard to get something very, very complicated correct. So we basically say: don't. All operations on a managed object context have to be done by a single thread, and that includes the managed objects that exist within that context; they, too, can only be accessed from the context's thread.

So what does this mean, sort of visually? Sometimes it's easier to understand that way. A serial queue guarantees that all blocks, or all NSOperations, on that queue will be run serially, so they can all use the same managed object context. But a separate serial queue, which may be running its operations concurrently with the operations in the first queue, needs to have its own context. And if we had a concurrent queue, each block or operation in that queue needs its very own managed object context, because who knows when those are going to run.

Other things to think about when you're doing concurrency with Core Data: background threads may be terminated with their work uncompleted; they're detached threads, they just get shot. So if you're running a large process on a background thread, particularly an import or a delete that you've batched, you may need to have the main thread postpone the exit, because you have something like 5 seconds after you've been notified that you need to exit in order to actually clean up your work and finish. If you think you can finish in that time, good, do that; otherwise you're going to need to track how far into an import or a delete you've managed to get, and then re-run part of it as necessary the next time your application starts.

It's actually very easy to set up an operation queue. It's 10 lines of code, total, for kicking off the import we saw earlier. Create the import queue somewhere; in this case I'm doing it in an init method. It's 3 lines of code: NSOperationQueue alloc, init, I configure the queue, and I kick off the import. The import basically adds a block to that queue.

The block loads the data that I'm going to be importing, from the network or wherever it gets it, goes through and kicks off the import method we saw earlier, and then notifies the main thread that it's done. The completed import can simply cause a variable to be set in the class, which the main thread can check before it exits to determine whether it's safe for the application to exit, or whether it needs to record some state so it can pick up the import later.
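A small sketch of that setup; loadRecordsFromNetwork, importRecords:, and the importCompleted flag are hypothetical names, not taken from the demo project:

    // In init: create the queue the background import will run on.
    importQueue = [[NSOperationQueue alloc] init];
    [importQueue setMaxConcurrentOperationCount:1];   // one import at a time

    // Kick off the import by pushing a block onto the queue (iOS 4 API).
    [importQueue addOperationWithBlock:^{
        NSArray *records = [self loadRecordsFromNetwork];  // hypothetical loader
        [self importRecords:records];                      // the batch import shown earlier

        // Let the main thread know we're done, so it can decide whether it's
        // safe to exit or whether it needs to record how far we got.
        [[NSOperationQueue mainQueue] addOperationWithBlock:^{
            importCompleted = YES;
        }];
    }];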

How do you communicate changes between threads, if you can only have managed object contexts living within their own threads and you want changes that happened in one context to show up in another? Well, you can only pass object IDs between threads; they're the thread-safe unit of operation. You can also pass the NSManagedObjectContextDidSaveNotification between threads. If you inspect it, you'll notice it contains managed objects.

But if you're calling mergeChangesFromContextDidSaveNotification:, we wrote it; we have done all of the special sauce to make sure it handles those objects correctly in that case. So those are your two mechanisms for getting notifications back and forth between contexts being managed by different threads. So we know how to import, we know how to delete.
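A minimal sketch of that wiring, assuming mainContext and importContext instance variables, with mainContext confined to the main thread:

    // During setup, on the main thread: listen for saves from the import context.
    - (void)registerForImportSaves {
        [[NSNotificationCenter defaultCenter] addObserver:self
                                                 selector:@selector(importContextDidSave:)
                                                     name:NSManagedObjectContextDidSaveNotification
                                                   object:importContext];
    }

    // Fires on whatever thread the import context saved on, so bounce the merge
    // over to the main thread, which owns mainContext.
    - (void)importContextDidSave:(NSNotification *)notification {
        [mainContext performSelectorOnMainThread:@selector(mergeChangesFromContextDidSaveNotification:)
                                      withObject:notification
                                   waitUntilDone:NO];
    }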

There's a whole bunch of stuff in the middle, so I'll start with: how do you manage memory? The first rule is to only load what you're actually going to need. If you're dealing with objects, your two strategies are going to be batching and partial faults.

What are they? Well, with batching, when you execute a fetch request you get back a collection that basically knows how to go get managed objects but doesn't retrieve all of the managed objects' data up front. This allows you to use a very small token instead of a very large managed object in things you pass to other APIs that need collections.

For example, the list view we saw earlier wants a collection, so if you use a batched fetch you can hand it this collection, and it knows how to respond appropriately when you ask it for an object at any given index. Basically, say the batch size is 3: when you try to access one of the first 3 objects, poof, there will be objects. You don't need to know or care how that happened. If you then try to access an object elsewhere in the array, poof, there are objects. Again, you don't need to know or care. So it gets big when it needs to be, and it's small until then.

Partial faults are sort of the same kind of thing. A partial fault allows you to load part of a managed object, because a lot of the overhead in loading a managed object is actually in the attribute data, and you don't always need all the attributes. If you've got a list view, you need whatever attribute you're displaying in the list view, but there's probably more detail associated with that object that you don't care about until you select the object and bring up the detail view, so you don't need to load all of those extra attributes into memory. Just set up your fetch request so that it only returns the attributes you actually want, and then when you try to access one of those other attributes, it will magically be there.
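A small sketch of a partial-fault fetch that only pulls in the attributes a list view displays; the entity and attribute names are illustrative:

    // Only the display attributes come back populated; anything else stays a
    // fault and fires lazily the first time it's accessed.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    NSEntityDescription *entity = [NSEntityDescription entityForName:@"ListPerson"
                                               inManagedObjectContext:context];
    [request setEntity:entity];
    [request setPropertiesToFetch:[NSArray arrayWithObjects:
        [[entity propertiesByName] objectForKey:@"displayFirstName"],
        [[entity propertiesByName] objectForKey:@"displayLastName"], nil]];

    NSError *error = nil;
    NSArray *people = [context executeFetchRequest:request error:&error];
    [request release];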

Of course, if you don't need an object at all, you can always hold on to just an object ID. We talked about those: it's a little reference you can use on the managed object context, with objectWithID: or existingObjectWithID:error:, to get a reference to the actual object when you need it later.

Sometimes what you need is meta-information, and we have other ways to get that. If you just need the number of objects, you can use countForFetchRequest:error:. If you just want to find out whether an object with a given ID exists, and you've got the object ID from somewhere, from another context or stored somewhere on disk, you can use existingObjectWithID:error:. This will either return an object, or return nil and give you an error telling you why it can't return that object.
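Two quick sketches of those calls; someObjectID stands in for an object ID you got from another context or had stored on disk:

    // How many Person objects are there, without instantiating any of them?
    NSFetchRequest *countRequest = [[NSFetchRequest alloc] init];
    [countRequest setEntity:[NSEntityDescription entityForName:@"Person"
                                        inManagedObjectContext:context]];
    NSError *error = nil;
    NSUInteger personCount = [context countForFetchRequest:countRequest error:&error];
    NSLog(@"%lu people", (unsigned long)personCount);
    [countRequest release];

    // Is the object behind this ID still in the store?
    NSManagedObject *object = [context existingObjectWithID:someObjectID error:&error];
    if (object == nil) {
        NSLog(@"Object is gone: %@", error);
    }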

You can also get dictionary results back. This allows you, for example, to get a list of unique values out of the database: get me all dates on which I imported photos; get me all dates on which there were Core Data labs at WWDC. You can also use the dictionary results option to get aggregate values.

Get me the total time spent talking about Core Data at WWDC. The way to find all of the options you can set on NSFetchRequest is to go dig through the NSFetchRequest documentation, or NSFetchRequest.h; there's a lot of stuff in there that will let you very specifically target your fetch request to get the data you're interested in, in the way that makes the most sense for your application. Something else to remember when you're talking about memory: it's not just don't fetch what you don't need, but also get rid of the stuff you don't need anymore. Use autorelease pools liberally; that'll release any intermediate objects created and keep your high-water mark down. Prune the object graph regularly. Get rid of objects you don't need.
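Picking up the dictionary-results point from a moment ago, here's a sketch of an aggregate fetch, assuming a hypothetical Session entity with a numeric duration attribute:

    // Let the database compute sum(duration) and hand back one dictionary,
    // instead of loading every Session object into memory.
    NSExpressionDescription *total = [[NSExpressionDescription alloc] init];
    [total setName:@"totalDuration"];
    [total setExpression:[NSExpression expressionForFunction:@"sum:"
        arguments:[NSArray arrayWithObject:[NSExpression expressionForKeyPath:@"duration"]]]];
    [total setExpressionResultType:NSDoubleAttributeType];

    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"Session"
                                   inManagedObjectContext:context]];
    [request setResultType:NSDictionaryResultType];
    [request setPropertiesToFetch:[NSArray arrayWithObject:total]];

    NSError *error = nil;
    NSArray *results = [context executeFetchRequest:request error:&error];
    NSLog(@"Total: %@", [[results lastObject] objectForKey:@"totalDuration"]);
    [total release];
    [request release];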

The overkill method is to use NSManagedObjectContext's reset. This will invalidate all objects, break all relationships, and cause everything to be flushed out. If you need more specific, more targeted pruning, use NSManagedObjectContext's refreshObject:mergeChanges:. That can be a little bit tricky if you're using refreshObject:mergeChanges: on an object that has changed relationships: you really want to pass YES for mergeChanges in that case, otherwise you risk getting your object graph into an inconsistent state, where an object has a relationship but the object it's related to doesn't know anything about that relationship.

So mergeChanges NO is good if your objects are unchanged and you just need to break relationships so other stuff can be released; mergeChanges YES is for objects that have changes. Also in the middle is how to manage your CPU time, and there are a number of ways to do that. The first and foremost of those is: let the database do it.

In most cases, for most of your applications, you'll be using the Core Data SQLite store type. It's a SQL database, and SQL databases are good at some things: they're good at filtering, they're good at sorting, they're good at doing calculations, and they do it all without having to instantiate any objects or malloc anything. So let the database do it if you can. Let Core Data translate your predicates into WHERE clauses. Give us a sort descriptor and we'll make the database do the sorting.

Use the dictionary result type with aggregates to let the database do the calculation. Other stuff: Unicode-aware string comparisons are expensive. I mentioned with the application earlier that sometimes you really don't want to search the way you think you want to search. This, for example, is kind of dangerous.

You'll note in here that we have, for those of you who are familiar with the predicate syntax, a case- and diacritic-insensitive search. That means that every time I try to compare a character in the database with a character in memory, I have to ask: is it the character? Is it the lowercase character? Is it the uppercase character? Is it the character with some kind of diacritic mark that I don't actually care about? And there can be a lot of those. You really don't want to have to do all of that work in the database. Instead, normalize it.

Strip all of that out and store a canonical form in the database that's very easy and very fast to search, and that allows you to use binary operators instead of string operators. This is basically a memcmp, which is a lot faster than doing any string operation: look for strings that are greater than or equal to "frank" and less than "frond". You'll notice I've set this up as a begins-with, and there's a very specific reason for that: a lot of Apple applications, and a lot of applications that emulate Apple applications, only do prefix searching when they search for things.
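A sketch of that normalized begins-with trick, assuming the canonical attribute holds a lowercased, diacritic-stripped copy of the name and that request is the master-list fetch request from earlier; the high sentinel character is one common way to express "begins with" as a pure binary range:

    // Normalize the user's search text the same way the stored canonical
    // form was normalized.
    NSString *needle = [searchText stringByFoldingWithOptions:
                            (NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch)
                                                       locale:[NSLocale currentLocale]];

    // canonicalName >= "fran" AND canonicalName < "fran\uFFFF" matches exactly
    // the strings that begin with "fran", using plain binary comparisons.
    NSString *upperBound = [needle stringByAppendingString:@"\uFFFF"];
    [request setPredicate:
        [NSPredicate predicateWithFormat:@"canonicalName >= %@ AND canonicalName < %@",
                                         needle, upperBound]];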

Users just type ahead on a prefix; they're not looking for stuff in the middle of the word. So try to avoid ENDSWITH, CONTAINS, LIKE, or MATCHES if possible. MATCHES in particular can be very dangerous, because it's a full regex, and you don't want your user accidentally tripping over the exponential performance a regex can have.

It's entirely possible you'll have a use case for it, but it's probably something you should make sure your user does intentionally, and possibly you'll want a background thread that does the search and then notifies the foreground thread; you can update the user along the way as their fetch is happening.

But, you're saying, I've got a title, and it's a string with lots of words in it; how do I do prefix searching, when prefix searching only looks at the beginning of the string? Well, in that case you probably want to normalize the keywords in that title that you care about searching on into an entirely separate entity. For example, say I have a book, and it has an author and a title.

Say I want to do a search like title CONTAINS "red"; as I said on the previous slide, that's slow. What I could do instead is create an entirely separate keyword entity that I store all the important title words in, and relate it to the book. Then I do the quick search we saw on the previous slide against the keywords, and follow the relationship from the keyword to all of the books that have that keyword in the title.

Something we've seen fairly often, and it's kind of an odd thing, is people will use a long list of: get me all objects where X equals this, or X equals that, or X equals this other thing. Often it's better to do an IN query for that: get me all objects where X is in this collection. The SQL database can do that a little more efficiently than a very long string of ORs, and it's also faster to construct and easier to read.
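A quick sketch of the difference, with illustrative attribute and value names:

    // Instead of a long chain of ORs...
    NSPredicate *orVersion =
        [NSPredicate predicateWithFormat:@"lastName == %@ OR lastName == %@ OR lastName == %@",
                                         @"Appleseed", @"Doe", @"Turner"];

    // ...hand the database a single IN clause.
    NSArray *names = [NSArray arrayWithObjects:@"Appleseed", @"Doe", @"Turner", nil];
    NSPredicate *inVersion = [NSPredicate predicateWithFormat:@"lastName IN %@", names];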

SQLite doesn't have much of a query optimizer, so you'll probably want to do your own predicate ordering and make sure you put the complex parts of a compound predicate last. If you've got something that can do an integer compare, order that first. If you need to do a regex, order it after you've already filtered the rows down to as small a set as you can, so you're running the regex against the minimal number of strings.

Try not to follow relationships if you don't have to, because that requires a join in the database, and that can also be expensive. But there are cases where what you're trying to do is filter on the contents of a relationship, and for those we've provided subqueries. They're correlated subqueries, and they allow you to do filtering by relationship content. For example, say you want to find all people who have a roommate named Jane Doe.

You can construct a subquery that takes all roommates, looks for individual roommates whose first name is Jane and whose last name is Doe, and returns true for all people for whom that count is greater than 0. You can also use aggregates on relationships: create a subquery that returns all people whose roommates are all old enough to drink.
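Sketches of those two predicates, assuming a Person entity with a to-many roommates relationship and firstName, lastName, and age attributes; the drinking age of 21 is just an example value:

    // People who have at least one roommate named Jane Doe.
    NSPredicate *hasJane = [NSPredicate predicateWithFormat:
        @"SUBQUERY(roommates, $r, $r.firstName == %@ AND $r.lastName == %@).@count > 0",
        @"Jane", @"Doe"];

    // People all of whose roommates are old enough to drink, expressed as:
    // no roommate is under the drinking age.
    NSPredicate *allOfAge = [NSPredicate predicateWithFormat:
        @"SUBQUERY(roommates, $r, $r.age < %d).@count == 0", 21];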

You want to make your managed object model reflect the workflow your application is going to have. It's really hard to overstate how important this is. Don't design a perfectly normalized ER model if that doesn't make sense for your application, but do take advantage of the fact that you've got a database there and use the things the database provides for you. Indexes, for example. An index allows a database to do a search on a column very quickly, and you can set up indexes in Core Data by going into the attribute inspector in the modeling tool, finding the index checkbox, and checking it.

This tells Core Data that you want to put an index on that column for optimal searching. When do you want to use an index? Well, if it's a column you're going to be searching frequently and your application is primarily centered on searching. Because, as with many things, there's a performance tradeoff involved.

Indexes require storing additional data in the database, and they need to be updated every time you change the contents of the column the index is on, so if you have an application that spends 90% or 95% of its time doing inserts and deletes, you're going to spend a lot of time updating that index data.

Whereas if you're searching all of the time, you don't really care that the index is expensive to maintain after an insert or a delete, because you're rarely doing them. So decide whether your application cares more about insert and delete performance or about search performance. If it cares about search performance, consider an index.

Normalization: separate unlike things. I actually once had a friend who had this kind of structure in his life. He had N phones; he had phones for everywhere. He was a salesman, he traveled all over the world, he was at a start-up, so it was better to get him a phone for every continent he was on than to have just one phone that he could take roaming everywhere. And it's a sign that you've done something wrong in your model if you have to change entities in your model every time somebody adds a phone. A better thing to do in that kind of situation, where you notice you've got the same kind of data being put into multiple attribute fields, is to create a relationship and move the repeated data to the other end of that relationship.

This is standard entity-relationship modeling, and some of you are familiar with it, and you're about to have the hair stand up on the back of your neck when you see the next slide, which talks about denormalization, which is moving information back across relationships. Sometimes this can be really important. Say, for example, I have an event, and in my application's view I want to display some information about a relationship that event has.

For example, whether or not there are attachments, like PDF files that need to be read before you show up at a meeting. You don't want to have to load the contents of that relationship just to figure out whether you should put the flag up, so what you're probably going to want to do is denormalize the information about that relationship, the fact that it has contents, into an attachment count field on the event entity itself, and keep that updated as you update the contents of the attachments relationship. This avoids having to load that relationship whenever you need to display a very small flag.

I talked about prefetching back in the delete part of the talk, where I mentioned how it can be important to prefetch before you start a delete in order to make as few trips to the database as possible. There are other times you want to do that as well. For example, in our contacts application we did prefetching to load all of the address and phone number information, because we had it split out into separate entities, as we just saw in the other slides. We want to load all of that when we load the detail view, and you do that like this.

Here we have another example of setting the relationship key paths for prefetching. Just set all of the relationship key paths whose information you're interested in, and Core Data will go off and grab all of that information from the database during the same fetch as when you're loading the objects. So, for my example, whenever I want to load information about my manager, I also want to load information about everybody who works for my manager, so I can set the minions relationship and all of his minions will be loaded.

If you have multiple stores, you can actually tell Core Data to only send a fetch request to a given subset of stores. This can be important if you've decided to go with the approach where you have separate sets of information in each store, and you know before you do the query which set of information you're interested in.

You can specify that the fetch should only go down to that store, and it won't waste time going down to all of the other stores, for which you know you're going to get an empty result. And, taking a slight diversion from pure performance, I'd like to talk about the fetched results controller.

The fetched results controller is something we've added to Core Data that will basically manage your data for you if you're using it in an iPhone application. It's integrated with UITableView, and what it basically does is keep your view updated when there are changes in the underlying context, or it can, anyway.

It manages your sections for you, it does change tracking, and it can do caching. It's actually very easy to configure. You give it an NSFetchRequest that specifies the information being controlled. You give it a section name key path, which is the key path on the managed objects it uses to sort them into sections. You give it a delegate, which is used to inform the list view about changes, and you give it a cache name; we'll get into both of those.
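A minimal configuration sketch along those lines; the entity name, sort key, section key, and cache name are illustrative:

    // The fetch request that drives the master list: batched, sorted, sectioned.
    NSFetchRequest *request = [[NSFetchRequest alloc] init];
    [request setEntity:[NSEntityDescription entityForName:@"ListPerson"
                                   inManagedObjectContext:context]];
    [request setSortDescriptors:[NSArray arrayWithObject:
        [NSSortDescriptor sortDescriptorWithKey:@"canonicalName" ascending:YES]]];
    [request setFetchBatchSize:25];

    NSFetchedResultsController *controller =
        [[NSFetchedResultsController alloc] initWithFetchRequest:request
                                            managedObjectContext:context
                                              sectionNameKeyPath:@"sectionKey"
                                                       cacheName:@"MasterList"];
    [controller setDelegate:self];   // delegate callbacks drive the table view updates

    NSError *error = nil;
    if (![controller performFetch:&error]) {
        NSLog(@"Fetch failed: %@", error);
    }
    [request release];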

I said it will do automatic change tracking. Basically, if an object in your object graph is changed, whether as a result of something that happens on a background thread or as a result of something that changed on another object, the contents of the fetched results controller and the list view it backs will automatically be updated, provided you set a delegate that responds to at least one of the change-tracking delegate methods. You can see those in the Core Data navigation application template: if you create a Core Data navigation-based application from the template, it gives you a fully functional application you can dig through to see how all of the fetched results controller pieces are connected.

One thing to note is that if you're using the dictionary result type for the objects returned by your fetch request, the controller cannot track changes to them, because Core Data can only track changes to NSManagedObjects; it doesn't do it for any other types. I keep mentioning that the controller does caching.

What does that really mean? The controller can persistently cache the results of the fetch request you configured it with. Why do you care? Because this can be really helpful in speeding up your application launch. You'll store a list of all the objects you need to display in the list view, and you won't have to re-perform that fetch against the database when your application starts.

This is enabled by setting the cache name, and if you're using different queries you need to have different cache names, otherwise somebody is going to get to that file first and your other controller is going to be really confused the next time your application launches. If you're using a fetched results controller with caching, you must not change the NSFetchRequest associated with it.

You're not allowed to change the predicate, you're not allowed to change the sort descriptors, you're not allowed to change the result type. The persistence mechanism requires that the fetch request remain immutable; otherwise the results of executing the fetch request may no longer be accurately reflected in the cache.

Bugreport.apple.com: because I'm at the end of my talk, I figured I'd put this up before the summary. If you've run across something in Core Data that's not working the way you expect, file a bug. If you run across something that crashes and you think it's our fault, file a bug. We can't know that you're having problems unless you tell us. If you file a bug report, it's going to be addressed faster if you give us, at the very least, steps to reproduce, and you get bonus points if you give us a sample project.

Say: run this project, look here, I'm expecting to see this, and what I'm seeing is this. Given that information, we can look at what you're seeing and what we think you should be seeing and say, oh right, that's a bug, we should fix that; or, no, really, you've done this wrong and it's supposed to work that way. Also use bugreport.apple.com for feature requests, for enhancement requests.

That includes documentation you think needs to be fixed, or documentation you would simply like to see, and if you've run across a performance issue, we're especially interested in those. So: we saw a bit about doing performance analysis, talked about managing memory and managing your CPU use, and how these are actually interrelated. And because, when I was walking through this presentation last week, my manager said there weren't enough puppies and kittens, there's a puppy and a kitten.

Is that good Ben?

[ applause ]

Unfortunately we're sort of at the end of the week for the Core Data stuff, so I can't point you at any labs or any subsequent sessions. I can, however, point you at the Apple Developer Forums; we pay attention to those.

We're there on a regular basis. If you've got questions, or you just want to talk over stuff, it's a good place to go. Our Evangelist is Michael Jurewitz; you can always send him an email. And there's a lot of Core Data documentation to help you get started on pretty much everything you could conceivably want. And we're done.