Tools • 43:56
Core Data is a rich framework for managing your persistent application data on Mac OS X. Learn the best strategies for structuring your Core Data model, and the design decisions that will yield the most scalable and efficient application data. An invaluable session for developers new to Core Data and for experienced Core Data developers looking to get the best performance out of their application.
Speaker: Melissa Turner
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
So the lights are up and the music is down, and I think that's my cue to get started. So this is session 922, Creating Efficient Data Models with Core Data. I'm Melissa Turner. I'm one of the software engineers on the Core Data team, and I'm here to tell you all about how to make your data models nice and shiny.
During this session, we're going to do an overview of some of the basics, both of Core Data modeling and relational databases in general. We're going to walk through a fairly extensive modeling example, show you how to take a data set and put it into a database schema. We're going to talk a little bit about some performance issues that come up when you've got a lot of data that you need to manage, and go through a couple of tricks that will help you do some debugging on your application once you've got something that you think you want to try turning into a real app.
So Core Data, well, it's technology that has some very old roots. We have multiple stores in Core Data. We have an XML store, we have a binary store, we support your custom atomic stores. But conceptually, really, Core Data was designed to work around a relational database. We've made it work with all of the other stores, and it works really well with all of them. But some of the core concepts, the entity description, the property descriptions, these are from the relational data modeling world.
Core Data takes care of doing all of the infrastructure work, putting stuff into databases, pulling stuff out, keeping track, managing relationships. But there's no way for us to take away the fact that in order to do a good model design, you need to have some awareness of the underlying technology you're working with and how to most efficiently make that work.
And the techniques are useful across all stores. Just because all of my examples use a SQLite store, and we're looking at tables in a SQLite database, there's no reason those wouldn't also apply to an XML or binary or an in-memory store. A lot of the same performance hurdles are in those other stores.
Data modeling basics. What is data modeling? It's the art of taking a large blob of conceptual data and fiddling it and tweaking it and turning it into something that's nice and efficient and can be used to make an application that's as optimized and quick and responsive as your user wants it to be.
If you've got a bad data model, that can really hurt your database performance, or your application performance, simply because you're spending a lot of time doing things you really don't need to be doing. Inversely, it can also really help your application performance when you've got your schema laid out as neatly as you can.
Your data model is going to be heavily influenced by your interface and vice versa. One of the primary mistakes a lot of people make is to go off and do a lot of design at the database level without ever really thinking about the application that they're going to be putting on top of it.
A canonical example of this, for those of you who are familiar with the Microsoft world, is the kitchen sink window that has everything on it. Well, one, that's really hard for the developer or the end user to figure out what they're supposed to be focused on in any one place.
And two, it means you have to haul all of that data out of the database and put it there in front of the user who doesn't need 90% of it when they're looking at that window. There's two core techniques used when you're doing data modeling. There's normalization, which is all about pushing data out, and there's denormalization, which is all about pulling data in. And we'll go into those in a bit more depth later.
As for relational databases, this is pretty much all you need to know about them. Relational databases are tables. They're collections of tables containing information. Each table represents one entity, or as we'll see later, it can possibly represent multiple entities. Each column represents an attribute on that entity, and each row represents a single instance of an object. In the Core Data case, each row would represent one managed object.
That's really all you need to know about Core Data, data modeling, and databases in order to get some use out of this session. Up front, I'm going to say this: There are many right answers. What does this mean? Well, it means that for any given data set, there's multiple different applications that can be used and built to handle and model and manage that data set. And for each and every single one of those applications, the schema may need to be a little bit different.
You may build an iTunes application that's focused on managing songs. You may build one that's focused on managing albums. Somebody back there may build one that's focused on managing artists. And all of these have subtle implications as to what your schema needs to look like, where the primary entities in your schema are, and what the relationships off them need to be.
As in so many things, there's probably a few of you in the audience who are familiar with data modeling going, "But, but, but!" Because there's always the theory and practice bridge. In theory, there is actually one perfect model: purely normalized, shiny, clean, and completely and utterly inefficient because every single table has one column. And frankly, this means that every single window you want to load is going to be a 15-way join, and that's slow.
The optimal model is clean and purely normalized. Normalization is the only strategy you'll ever hear about in a lot of relational data modeling books, and you can forget that. Really, honestly, denormalization is where it's at. It's the useful thing. It's bringing information back to where you want it.
There's always a tension. Normalization, denormalization. There's a lot of tensions in data modeling and in coding in general, I'm sure you're familiar with. You want to do one thing for efficiency, and another thing also has a call on efficiency, and you have to decide which one is right for you.
Some rules of thumb. Design with your interface in mind. Start building your data model. What is the main window in your application? What data do you want to see there? Put that down in an entity. And then build out from there. Minimize duplication of data as much as you can. That's where normalization comes in. It's all about minimizing duplication of data for saving space, for preserving data integrity. You want to maximize locality of data. That's where denormalization comes in. You've got more duplicated data, more risk, but sometimes it can be faster.
So, as some of you might have guessed, because I used it as an example earlier, we're going to work through our example using a data set involving a lot of information about songs. Songs ripped out of my personal iTunes library, actually. So, no criticism of my taste. As a first pass, because I'm building an application to manage songs, this is the root entity I've come up with. I thought, I want a window that can manage songs, so I'm going to stick everything on one entity.
What's that going to look like in the database? Because a lot of data modeling starts to make a lot more sense when you think about what the tables are going to look like, what the information in those tables is going to be. So our root table is going to be songs. I've added a few fields, added a few attributes: albums, artists, titles, song data.
And the first thing I'd like you to see when you look at this is the stuff down in this corner here. What you'll notice is that I've got multivalued entries in these tables. I've got multiple songs and multiple song data. If you ever see this in a relational database, it's generally a sign that somebody has done something wrong.
Multivalues are generally a bad idea because relational databases are all about searching. They're about being able to find field values very, very quickly. And by shoving multiple values into a single field, you interfere with that. Every time I want to look for the song "Vow" off the album "Garbage," I'm going to have to go off and do a substring search in that title field. And that's going to slow things down for no good reason. So I'm going to put, as my first step, all of those in separate rows.
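To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are made up for illustration, and the titles other than "Vow" are placeholder values; this is not the schema Core Data generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Multivalued layout: several titles crammed into one field (the anti-pattern).
conn.execute("CREATE TABLE songs_multi (album TEXT, titles TEXT)")
conn.execute(
    "INSERT INTO songs_multi VALUES ('Garbage', 'Vow, Queer, Stupid Girl')"
)

# One value per row: each song title gets its own row.
conn.execute("CREATE TABLE songs (album TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [("Garbage", "Vow"), ("Garbage", "Queer"), ("Garbage", "Stupid Girl")],
)

# The multivalued table forces a substring match over the whole field.
multi_hits = conn.execute(
    "SELECT album FROM songs_multi "
    "WHERE album = 'Garbage' AND titles LIKE '%Vow%'"
).fetchall()

# The one-row-per-song table allows a plain equality comparison.
row_hits = conn.execute(
    "SELECT title FROM songs WHERE album = 'Garbage' AND title = 'Vow'"
).fetchall()

print(multi_hits, row_hits)
```

Both queries find the song, but only the second one can be answered with a simple field comparison.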
Now I've got another problem, which is this block down here. And this should be one of your first instincts whenever you're looking at a table in a relational database. If you ever see duplicate data, this is a cue that there may be room to normalize. Get rid of duplicate information.
Why do you want to do this? Well, the first and most obvious reason is, well, it saves on space. Every time I duplicate that string, I've got another copy of it sitting down in the database. I've got another copy of it pulled up into the Core Data object layer.
For, you know, the string "Garbage," that may not be important, but if I've got the entire text of my thesis, that becomes more of a problem if it's in 10 different places, 20 different places. There's also a data integrity issue. And what do I mean by that? Well, if I've got, in the case of my example, "Garbage" in [Transcript garbled]
You'll think, "Yeah, why would that happen?" Well, iTunes is actually a great example for this. There's a certain subset of people who like to go off and fill CDDB with random garbage just for fun.
So if I download some corrupt data from CDDB, I'm going to have to go off and change it everywhere. That's not a huge issue with a model the size of the one we're going to be talking about, but most models really have more than two, three, four entities.
Some of them get up to 50, 100, 200 entities. And it can be hard to track where all of that duplicate information is unless you normalize it out and keep it in one spot. So how do you do that? What does that really mean? Well, it means you make a separate table.
And get rid of all the duplicates. But at this point, you're probably thinking, but you've lost something there. The previous table, it was very easy to tell what artist and what song, or what artist and what album a given song was from. You'd just follow across the row, and there it was.
But we've lost that relationship information here. It doesn't exist anymore. So how do we get that back?
[Transcript missing]
Pessimal choices would be strings, because there's actually a number of reasons for which strings are bad. You've got representation issues, EBCDIC, ASCII, you've got Unicode issues, normalization issues. Floats are bad because, well, they've also got representation issues. And byte arrays, well, they have some of the same problems as strings, where you have to compare every single byte in the array to determine if two byte arrays are equal.
We will laugh at you if you basically send us a problem and are doing this, and not in a Core Data app, but in a non-Core Data app. Don't pick database internal values that are not guaranteed to remain the same. Seriously. Because your relationships will all get messed up for reasons you're about to understand.
Relationships in a relational database describe the link between two tables, and they're key-based. You take a primary key from one table and insert it into another table as a foreign key. You can imagine if the primary key on one table got changed because your database went off and did something behind your back, all of your relationships are now messed up. Don't do that.
So how do we use keys? How do keys really work? Well, I need to put a primary key on one of these tables. In this case, I'm going to put it on the albums table, because one album can have many songs, and it's easier to do it that way. So we'll just assign an integer, one per row on the table.
And we'll assign a foreign key column over in the other table, call it Album, and enter the primary key of the album that each song is on. And this way I can go from Album 5 over to the songs table and say, which songs are on Album 5, and find those very easily. So that's a relationship in a relational database. And having said all of that, you can mostly forget about it because Core Data does all of this for you.
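As a rough illustration of the key mechanics just described, the sketch below builds the two tables by hand with Python's sqlite3 module. The table layout, the key value 6, and the song "Queer" are assumptions made for the example; Core Data assigns and manages its own keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key lives on the "one" side of the one-to-many relationship.
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT)")

# Foreign key column on the "many" side holds the album's primary key.
conn.execute(
    "CREATE TABLE songs ("
    "id INTEGER PRIMARY KEY, title TEXT, "
    "album INTEGER REFERENCES albums(id))"
)

conn.executemany(
    "INSERT INTO albums VALUES (?, ?)", [(5, "Garbage"), (6, "D-side")]
)
conn.executemany(
    "INSERT INTO songs (title, album) VALUES (?, ?)",
    [("Vow", 5), ("Queer", 5), ("Dare", 6)],
)

# "Which songs are on Album 5?" is a lookup on the foreign key column.
songs_on_album_5 = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM songs WHERE album = ? ORDER BY title", (5,)
    )
]
print(songs_on_album_5)
```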
We assign primary keys, we do all of your foreign key handling, all of your relationship management, updates, inserts, we clean it up if you do deletes. For those of you who are familiar with printing out Core Data objects in GDB, you've probably seen how our relationships are represented by NSManagedObjects. You've got a fault and it has a relationship, it's just got a UID, and that's the UID of the object at the other end of the relationship.
So, where are we now? We had this originally. This was our original entity. We've turned it into this. Now we've got two entities and a relationship between them. You can see the relationship here has one arrow on one end and two arrows on the other end. That's the way of specifying that it's a one-to-many relationship.
So what next? Where are we going next? Well, I said when we first started this that duplicate data was a bad thing. It was always a cue that you wanted to look at whatever it was you had and contemplate normalizing it. And we have more duplicate data here. This is where I talk about identity. Duplicate data is really only bad if it truly is duplicate. It's two copies of the same piece of information. But some information is duplicate because, well, there's two copies of different information that are coincidentally the same.
For example, it looks like I have two songs here whose title is Dare. But if I go off to Amazon.com and do a little bit of research, I'll find out that one of these, the one that's off the album D-side, is actually Dare Remix. Now, changing that, correcting it, doesn't automatically imply that I need to correct the other one. They're actually two separate, identity separate, pieces of information.
And if you have that kind of a case where you have information that's sort of coincidentally the same, that's not usually the best case for normalization. So we're just going to leave that the way it is. But there's other interesting stuff in this table, and that would be this column. It looks kind of obscure to you when you look at it, because, well, it's just a blob. It's actually the beginning of the MP3 that is the song data.
Why is that interesting? Well, because when I think of my interface, my songs interface that I want to display, it's not necessary for that interface. I don't need to load the MP3 data for a song in order to show somebody that they have that song in their library.
It's an excellent thing to break out. These are some other cues for when you want normalization: duplicates, or to push out extra or expensive information. It will be data that's not used in the main interface or, in a lot of cases, blob data. Especially large blob data. What's large blob data? Well, small blob data is something like a color, stuff that's down around a K. That's actually okay to have on the main table, though it might be worth considering pushing it out.
If you've got a medium-sized blob, you pretty much always want that to be on a separate table, especially if you don't always need it. If you've got a large blob, which is something over a megabyte, there is no good reason to have that on the main table. It's going to interfere with all of your searching.
Because when you think about how a database has to be represented, each row is actually going to be a structure. It's going to be a record. And in order to look at any field in that record, I'm going to have to load the entire record.
And you can see where this is going, because if I'm going to be looking through a lot of different records in order to find, you know, the name of a song, and I've got the blob that is the song data sitting in that record, I'm actually going to have to load all of that song data in order to look at the song titles. Move that out.
Push that somewhere else. Make it easy for me to look at the titles of my songs by getting all of the information that I don't need, that the user is never going to search on, out of the main table. So I'm going to do that. We already know how the keys are going to be set up for this, so we don't need to see those.
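For reference, a split like this could look as follows at the SQLite level. This is a sketch with hypothetical table names and a fake stand-in for the MP3 bytes, not the tables Core Data actually creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Main table: only what the song list interface needs.
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")

# Side table: the expensive blob, reached through a one-to-one key.
conn.execute(
    "CREATE TABLE song_data ("
    "song INTEGER PRIMARY KEY REFERENCES songs(id), data BLOB)"
)

fake_mp3 = b"\xff\xfb" + bytes(1024)  # placeholder bytes, not real audio
conn.execute("INSERT INTO songs VALUES (1, 'Vow')")
conn.execute("INSERT INTO song_data VALUES (1, ?)", (fake_mp3,))

# Listing titles never touches the blob table.
titles = [r[0] for r in conn.execute("SELECT title FROM songs ORDER BY title")]

def load_song_data(song_id):
    """Fetch the blob only when a specific song is actually needed."""
    row = conn.execute(
        "SELECT data FROM song_data WHERE song = ?", (song_id,)
    ).fetchone()
    return row[0] if row else None

print(titles, len(load_song_data(1)))
```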
Let's have a look at our model now. Now we've gone here. Song data is out. It's now on the destination of a one-to-one relationship. Things are getting cleaner. Of course, we're not done yet. We have a bunch of other attributes that I haven't even started to look at.
Like this column here, the lyrics for each song. Well, there's a few interesting things to note about this column, one of which is that, well, a lot of the fields are empty. And this brings us to nulls. Null in your average database is undefined. It is not equal to anything, including itself, and it is not not-equal to anything, including itself.
If I try and do a search that says title in dare, dare remix, and null, your average database is never going to return anything for that third hit. You have to do special queries to determine whether or not something is null. So having a null in a column is really an excellent sign that this is a good candidate for being normalized off the table.
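The null behavior described here can be checked directly in SQLite. The sketch below uses Python's sqlite3 module with a made-up three-row table; the row contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?)", [("Dare",), ("Dare Remix",), (None,)]
)

# Listing NULL in an IN clause does not match the row whose title is NULL.
in_hits = conn.execute(
    "SELECT COUNT(*) FROM songs WHERE title IN ('Dare', 'Dare Remix', NULL)"
).fetchone()[0]

# Finding the NULL row takes the special IS NULL form.
is_null_hits = conn.execute(
    "SELECT COUNT(*) FROM songs WHERE title IS NULL"
).fetchone()[0]

# Comparing NULL with NULL yields NULL (None in Python), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(in_hits, is_null_hits, null_equals_null)
```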
In this particular case, I've kind of got a special problem because, well, when I envision the interface that I'm putting on my application, I see I've got, you know, the song here and the length here. And over here, I've got a little badge that tells me whether or not the song has lyrics. So the user can click on that and see the lyrics.
If I push that all off into a separate table, I'm going to have to go look at that other table in order to decide whether or not I want to put a badge there. This is where denormalization comes in. Denormalization is all about pulling data in, duplicating it in some way to improve my locality of reference.
So instead of having to go both to my main table and to a separate table to look at lyrics to find out whether I need to put a badge on, I can just look at the main table. Some common uses are for flags or for derived attributes where I may want to have multiple copies for performance reasons.
So let's do that here. Instead of having the lyrics stored directly on that table, I'm going to put just a badge there saying whether or not there's lyrics, and I'm going to have lyrics off in their own entirely separate table like this. Gets the best of both worlds. I've got the normalization and moved the data I don't particularly need to search through all the time. And I still have information about whether there is data or not. So where next? This is always an interesting thing. Text is always interesting.
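One way to picture the badge is as a flag column that duplicates a fact already implied by the lyrics table. The sketch below assumes hypothetical names (has_lyrics, set_lyrics) and placeholder lyric text; the point is that the flag and the lyrics row are written together so they cannot drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs ("
    "id INTEGER PRIMARY KEY, title TEXT, "
    "has_lyrics INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE lyrics ("
    "song INTEGER PRIMARY KEY REFERENCES songs(id), text TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO songs (id, title) VALUES (?, ?)", [(1, "Vow"), (2, "Dare")]
)

def set_lyrics(song_id, text):
    """Store lyrics and update the denormalized flag in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO lyrics VALUES (?, ?)", (song_id, text)
        )
        conn.execute(
            "UPDATE songs SET has_lyrics = 1 WHERE id = ?", (song_id,)
        )

set_lyrics(1, "placeholder lyric text")

# The song list can draw its badges from the main table alone.
badges = conn.execute(
    "SELECT title, has_lyrics FROM songs ORDER BY id"
).fetchall()
print(badges)
```

The cost of the duplication is that every write path to the lyrics table has to remember the flag, which is the integrity risk mentioned earlier.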
Let's go look at our lyrics table in more detail. Let's try and imagine the data that could be in that table. Well, here's the stuff we saw originally. But there could be more in there. More with interesting characteristics. It's not all going to be the case I'm expecting all the time. Some of it may actually be in foreign languages.
Why is this important? Well, for a couple of reasons, because when your customers search, they're going to be searching in probably all lowercase, and they want hits to come back, whether you're camel case, lowercase, uppercase. They're not going to want to have to type in accents. They may not know how to type in accents because they may be using an English keyboard. And there's other reasons, like "I heart Unicode."
There's actually multiple representations for certain characters. E-accent aigu can be represented both as a single character and as an E followed by a combining mark, a combining accent aigu. And if you're doing simple string matching in a database, those are going to compare differently, even though they represent the same character. This is especially bad if you're doing regex, because most regex engines are very, very literal. They'll take you at your word if you say you want single character E.
So when you're doing searching, and if you haven't pre-canonicalized your database, you may not get all the hits because there may be representation issues down in the database. So something that's usually a really good idea if you know you're going to be searching on string data like this, is to create a copy of it and strip it out, all of these little impurities and annoyances.
Move it down into a canonical form, strip out the diacritics, case fold it, in one of those lovely moments in computer science, normalize it, 'cause we can always reuse the same word for completely different concepts, right? So over here we see a canonicalized text column that contains the information that was in the other table in a form that's going to be much more easily searchable.
And much more quickly searchable at that. Whenever I do the search, I'm just going to be able to strip information out of what the user's given me and pass that directly down to the database, rather than having to look at every individual row, transform the data there, and then do a comparison.
Because over time that really builds up as you normalize and normalize and normalize and normalize and normalize those strings over and over and over again. Just do it once up front, never do it again. So instead of just having one attribute on our lyrics, we're going to have two. We're going to have text, and we're going to have canonical text.
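A canonicalizing step of this kind might be sketched as below, using Python's standard unicodedata module. The function name and the exact recipe (decompose, drop combining marks, case fold) are assumptions for illustration, not the routine Core Data uses.

```python
import unicodedata

def canonicalize(text: str) -> str:
    """Return a diacritic-stripped, case-folded form suitable for matching."""
    # Decompose so accented letters become base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case fold so upper, lower, and mixed case all compare equal.
    return stripped.casefold()

precomposed = "caf\u00e9"   # single precomposed character
combining = "cafe\u0301"    # base letter followed by a combining acute accent

# The raw strings differ, but their canonical forms agree.
print(precomposed == combining)
print(canonicalize(precomposed) == canonicalize(combining))
```

The same function would be applied once to the stored text when it is saved, and once to each search term before it is handed to the database.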
So now we're going to go off and look at something else, other things that can be interesting. Artists. Our album has an artist field. But what is an artist? Let's go look at this table here. There's some interesting stuff in here. Namely, we've got some artists who are groups. I seem to have completely lost the thread of my presentation. I'm leaping ahead. We've got more duplicate data here. And we already know what to do with duplicate data. Break it out, move it into separate tables.
And I'm going to bring back the primary and foreign keys at this point, because I'm going to show you how a many-to-many relationship works in a database. And this is really kind of tricky unless you can actually see it happening. So I've now got my relationship set up.
One artist, many albums. Well, except my iTunes library at least has a lot of stuff that looks like this. I mean, the original data set was chosen specifically to fit on a slide, but I've got more room here so I can bring in some of my more interesting music.
Like albums that have multiple artists. But we already saw on the earlier slide that having multi-values is a bad idea. So what do we do? Well, we want to split those out, give it two separate rows. One for each artist. Now we've got a problem because we know we need to take our primary key and store it in the foreign key column on a separate table, but now we're back to multi-values because the artist's foreign key already has a value there.
Now what do I do? Well, this is where more primary keys come in. Add one to this table. Break that artist primary key off, put it in its own separate table, Copy the artist foreign keys over there. Copy the album keys here. That's how a many-to-many relationship looks. You've actually got three tables at play.
And you can figure out which artists put out which albums by following the keys over here: from the artist table, over to the correlation table, and off from the correlation table to the albums. So whenever you do a query in Core Data that goes across a many-to-many relationship, we actually have to write SQL to handle this in the background. That's going to be a little bit slower than just looking up an attribute on a single table.
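The three-table arrangement can be sketched with Python's sqlite3 module as follows. All table names, artist names, and album names here are hypothetical placeholders, and the join is only an approximation of the kind of SQL a many-to-many lookup needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT)")

# Correlation table: one row per (artist, album) pairing.
conn.execute(
    "CREATE TABLE artist_albums ("
    "artist INTEGER REFERENCES artists(id), "
    "album INTEGER REFERENCES albums(id), "
    "PRIMARY KEY (artist, album))"
)

conn.executemany(
    "INSERT INTO artists VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")]
)
conn.executemany(
    "INSERT INTO albums VALUES (?, ?)", [(1, "Solo Album"), (2, "Duet Album")]
)
conn.executemany(
    "INSERT INTO artist_albums VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)]
)

def albums_by(artist_name):
    """Follow artist -> correlation table -> albums."""
    rows = conn.execute(
        "SELECT albums.name FROM artists "
        "JOIN artist_albums ON artist_albums.artist = artists.id "
        "JOIN albums ON albums.id = artist_albums.album "
        "WHERE artists.name = ? ORDER BY albums.name",
        (artist_name,),
    )
    return [row[0] for row in rows]

print(albums_by("Artist A"), albums_by("Artist B"))
```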
So what have we done now? Well, we had our artists, and now we've broken it out into a separate table. But like I said earlier, there's many right answers. There's no particular reason that that's a better way, that this here is a better way, better design of your model, than this is. In some ways, this actually makes more sense. Artists are more likely to collaborate on individual songs than they are to collaborate on entire albums.
So like I said, there's a play. Depends on how you envision your data, how you want your users to interact with it. And now we're back to where I jumped ahead to. But artists are kind of interesting in another way. Because if you look at this table, you'll see that we've got two kinds of artists here. We've got groups, and we've got individuals.
Individuals tend to have things like birthdays that groups really don't have, although they've got names. If you look at groups, well, they tend to have formation dates and members in addition to their names. And if you're familiar with object modeling, you've probably immediately thought, well, I want my object model to look something like this.
I'm going to put the name up there on my parent entity and have two sub-entities come, or subclasses coming off it. This is a pretty good instinct. Entity relational modeling can have inheritance in it as well. It's very similar to object inheritance. Although something important to notice is that it does not need to reflect your object inheritance hierarchy. Just because your objects in your model in memory are in an inheritance hierarchy, the data in the database does not need to be stored that way. You can actually have two entirely separate inheritance hierarchies.
There's multiple ways to represent inheritance in a database, and Core Data has chosen to do single table inheritance, which means that for a given inheritance hierarchy, all of the entities in that hierarchy will be stored in a single table. This is actually most efficient for doing searches which could potentially return values.
I've got my artists, and let's add those extra fields, and let's populate them with some data. And immediately you're going to see something we discussed earlier. Nulls. We've got empty fields in the database, up here and down here. And in this particular case, those fields can never have data in them. A group is never going to have a birthday, an artist is never going to have a formation date.
So nulls are bad and may lead to normalization, but on the other hand, inheritance works this way. Where do you balance it? Well, a good way to tell how you should be balancing it is by looking at how your entities, how your instances are going to be used in your model.
In this particular case, I've got the artists relationship, which is going to want to return a heterogeneous collection. Artists is going to want to return both groups and individuals. I'm not going to need to care or want to care. I'm going to go ahead and add that inheritance hierarchy to my model.
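To see the tradeoff in table form, here is a sketch of a single table holding both sub-entities. The kind column, the names, and the dates are invented for illustration; the layout Core Data actually generates is not shown in the session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for the whole hierarchy: shared name, plus columns that only
# one sub-entity ever fills in.
conn.execute(
    "CREATE TABLE artists ("
    "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL, "
    "birthday TEXT, formation_date TEXT)"
)
conn.executemany(
    "INSERT INTO artists (kind, name, birthday, formation_date) "
    "VALUES (?, ?, ?, ?)",
    [
        ("group", "Example Group", None, "1993-01-01"),
        ("individual", "Example Person", "1970-01-01", None),
    ],
)

# A heterogeneous fetch of all artists is one select, with no join.
all_artists = conn.execute(
    "SELECT name, kind FROM artists ORDER BY name"
).fetchall()

# The price: columns that are always null for one kind of row.
groups_with_birthday = conn.execute(
    "SELECT COUNT(*) FROM artists "
    "WHERE kind = 'group' AND birthday IS NOT NULL"
).fetchone()[0]

print(all_artists, groups_with_birthday)
```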
But, you know, there's potential for other things that could have shared information, like I could go through and add a notes field to all of my entities. And, well, it's a notes field. It's going to be serving the exact same purpose in every single one of those entities.
But really, it doesn't make sense because apart from the fact that it's notes, it's called notes. There's no shared behavior or shared real state across all of the disparate entities in my model. And most importantly, there's no place in the model where it's important to return instances of all of those entities at the same time.
So I'm just going to leave these and leave with a final thought, which is that if you ever realize that you've built a model where everything is in one inheritance hierarchy, that means you're putting all of your data in a single table. And that means you've probably done something wrong, unless you've only got one entity, if you've only got one table.
That's not good. So that actually covers most of what you need to know, the basics of data modeling. From here on in, it becomes just a lot of trial and error and trying to develop instincts for, you know, what's good, what's not so good, when you need to use normalization, when you need to use denormalization, and how to best balance the competing needs for your application. There's a few other things that are down in the database layer that we're going to talk about.
First of which is indices. An index, as you might guess from the name, is a quick way to look up information. Sort of like an index in a book. You set up an index either on a column or a collection of columns. And from the index, you can basically look at a value, find a value in that, and quickly move to the row in the table that's associated with that value. There's tradeoffs, both in speed and in space.
The first tradeoff is that indexes aren't free. You have to actually maintain them when you change data in the table. It's no good if my index contains stale information. So every time I change a row in the table, every time I add a row, every time I delete a row, I'm going to have to go off and update the index.
If I've got an application that's used 95% of the time for writing data to the database, I don't need an index. I probably don't want to spend the time updating my indexes regularly. And there's also a size tradeoff. Indexes don't come for free space-wise. I need to store them somewhere, and that's on disk. It's going to increase the size of your database. If I've got an application that's being used 95% of the time to do reads and the customer is constantly querying and searching on, that may be a worthwhile tradeoff. It probably is a worthwhile tradeoff.
It will really speed up their data access, but it will increase the size of your file. So, again, you have the tension. Size, speed, customer performance. One thing to note in SQLite is that the indices are binary only, so you don't get fancy substring searching or text comparisons. It's going to be purely equal to, greater than, or less than.
How do you decide or how do you tell Core Data that you want an index on a particular column? Well, in the modeling tool, it's up here in the model inspector. In the attribute inspector, you can just go up and there's a field that allows you to tick it. And if you check that checkbox, Core Data will create an index on that column for you.
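At the SQLite level, the effect of an index can be observed with EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module; the table, the index name songs_by_rating, and the data are assumptions for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO songs (title, rating) VALUES (?, ?)",
    [("Dare", 5), ("Vow", 5), ("Dare", 3)],
)

def query_plan(sql):
    """Return SQLite's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)

lookup = "SELECT title FROM songs WHERE rating = 5"

# Without an index, the only option is to scan the table.
plan_before = query_plan(lookup)

# With an index on rating, the planner can search the index instead.
conn.execute("CREATE INDEX songs_by_rating ON songs (rating)")
plan_after = query_plan(lookup)

print(plan_before)
print(plan_after)
```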
Relationships. Well, you've seen what they look like in the database. And that may have made you think, and I have alluded to the fact, that there can be performance issues with relationships. If you're doing a lot of searching across them, or if you're deleting an object, for example. Deletion is a really excellent example.
We've got primary keys living on one table and foreign keys living on another table. In order to delete an object on the source table, the one with the primary key, I'm going to have to go through and clean up all references in the foreign key column in the other table. This is why Core Data has to load managed objects in order to propagate deletes from an object being deleted to an object it's related to. We need to do that cleanup, and the only way we can do that is by loading objects into memory.
You can do unidirectional relationships. If you try this, the modeling tool will warn you. It's kind of an advanced feature. And be warned, if you try and do this, there are some very specific cases in which it makes sense, but you have to be really, really careful because we won't know how to go clean up the information at the other end of that relationship. And that introduces the possibility of a data integrity problem in your database. If you don't do the cleanup, then at some point you may load an object, try and follow a relationship, and discover that the object at the other end of that relationship isn't there anymore.
Other things to think about. Data set management. Try and minimize the number of objects you're having to load into memory. Minimize your working set. Think about your where clause ordering. It's sort of related to data set management, only data sets in the database. What do I mean by that? Well, here's a hybridized, here's pseudocode for a select.
I want to select songs where the lyrics text contains "you" and the title is like "dare", case and diacritic insensitive, and the rating is equal to five. If I just pass that down to the database the way it is, what happens? Well, here are my three predicates. And let's look at the join.
First, I'm going to go through and look at my lyrics table. And I'm going to go look for all lyrics that contain the word you or contain the substring you. I'm going to find four of them. And then I'm going to go join those with the songs table.
And then I'm going to find all of the songs whose title is like dare, which is two songs. And then I'm going to go off and look at the ratings and find out that there's only one song that has you in the lyrics, the title Dare, and a rating of 5.
That doesn't look too bad, but really it's kind of completely nasty, because I've front-loaded all of the really heavy work. I've done a regex table scan across the lyrics table as my first operation. I've done substring matching across my second operation, and I've done my integer comparisons last. Let's invert those for a second.
Let's do the ratings first. There are only two potential songs I'm ever going to have to look at, because there are only two songs with rating five. Well, only one of those songs has the title Dare. At which point, I only have to do one regex search to discover whether or not the lyrics meet my criteria. This is going to be a much faster responding application than one going the other way, because, well, the first one has to do a lot more work.
And that's something to think about when you're building your where clauses. Minimize the amount of work you have to do by front-loading your predicates with integer comparisons, particularly on attributes that are on the source table. And only after you've passed through that, start following relationships and doing heavier text comparisons. Your users will thank you.
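To make the effect of predicate ordering concrete, here is a small sketch in plain Python. It is a simplified model, not how SQLite or Core Data actually executes a query: the rows are hypothetical, each predicate is checked row by row and evaluation stops at the first failure, and the title match is only case insensitive. The counters show how many times each comparison runs under the two orderings.

```python
from collections import Counter

# Hypothetical rows standing in for the joined songs and lyrics tables.
SONGS = [
    {"title": "Dare", "rating": 5, "lyrics": "do you dare"},
    {"title": "Dare", "rating": 3, "lyrics": "nobody dares"},
    {"title": "Truth", "rating": 5, "lyrics": "you know the truth"},
    {"title": "Home", "rating": 2, "lyrics": "take you home"},
    {"title": "Rain", "rating": 1, "lyrics": "rain on you"},
    {"title": "Dusk", "rating": 4, "lyrics": "the light fades"},
]

PREDICATES = {
    "rating": lambda s: s["rating"] == 5,                # integer comparison: cheap
    "title": lambda s: s["title"].casefold() == "dare",  # case-insensitive match
    "lyrics": lambda s: "you" in s["lyrics"],            # substring scan: expensive
}


def run_query(order):
    """Apply predicates in the given order, stopping at the first one that fails."""
    evaluations = Counter()
    matches = []
    for song in SONGS:
        for name in order:
            evaluations[name] += 1
            if not PREDICATES[name](song):
                break
        else:
            matches.append(song["title"])
    return matches, evaluations


heavy_first = run_query(["lyrics", "title", "rating"])
cheap_first = run_query(["rating", "title", "lyrics"])

print("heavy first:", heavy_first[0], dict(heavy_first[1]))
print("cheap first:", cheap_first[0], dict(cheap_first[1]))
```

Both orderings return the same song, but putting the rating check first means the substring scan runs once instead of once per row. In a Core Data fetch, the equivalent choice would be the order of the clauses in the predicate you hand to the fetch request.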
[Transcript missing]
There's no particular reason that just because you use one technology, you're bound to do everything in that technology. Pick the best of both worlds. Let the database do what it's good at. Let another technology do what it's good at. Hybridize. Store object URIs in whatever the external store is, or URIs out to paths in the file system. Let everybody be happy. Let everybody be efficient.
Sometimes you've got data coming from multiple places. You can actually have one model with many configurations. The configurations are set on the panel that I believe has the wrench in the modeling tool. This allows you to direct specific entities to specific stores. For example, you can have some public data in one store, private data in another store. Again, you end up using URIs to do cross-store relationships, but it allows you to partition your data sometimes more efficiently and allows you to use multiple stores with relationships between them.
Once you've got everything set up, you need to find out if you've got the best performance, you know, find out if there are any performance gotchas and widgets. Miguel talked in the last session, which unfortunately you've missed and will need to catch on video when it's released, about debugging your application and using Instruments to determine, you know, if you're hitting the database too often or not enough, or if you're loading too much data.
Here's where you would start. Turn on the com.apple.CoreData.SQLDebug flag. That will tell you when you're hitting the database, what kind of queries you're doing against the database, how much data you're pulling back, and how long it's taken. That'll allow you sort of an entree into figuring out, you know, what you might be able to do to tweak things.
If you realize you're loading lots and lots of one record, or if you're loading one record as the result of a lot of queries, you may be able to batch that. There are also the Core Data instruments in Instruments that allow you to get a sense of when you're hitting the database and where you're spending all of your time in your application.
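The batching idea can be sketched at the SQLite level with Python's sqlite3 module. The `song` table is hypothetical, and the trace callback here only plays a role loosely similar to the SQL debug logging: it records each statement as it runs, so the two approaches can be compared by how many times they hit the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO song (id, title) VALUES (?, ?)",
    [(i, f"Song {i}") for i in range(1, 6)],
)
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # record every statement SQLite runs

wanted = [1, 3, 5]

# One query per record.
one_by_one = [
    conn.execute("SELECT title FROM song WHERE id = ?", (i,)).fetchone()[0]
    for i in wanted
]
single_query_count = len(statements)

statements.clear()

# One batched query for the same records.
placeholders = ", ".join("?" for _ in wanted)
batched = [
    row[0]
    for row in conn.execute(
        f"SELECT title FROM song WHERE id IN ({placeholders}) ORDER BY id", wanted
    )
]
batched_query_count = len(statements)

print(single_query_count, "queries one by one;", batched_query_count, "when batched")
```

The two approaches return the same titles, but the batched version issues fewer statements for the same data.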
So we've gone through how model design affects application performance, or at least I've told you that it does, and you'll have to take my word for it and play with it a little bit. We've talked about normalization and denormalization, competing strategies that sort of allow you to move data around in your data model and try to figure out how to best make your data model suit the needs of your application. I've given you a few tips about things that you can also do to help improve performance that aren't directly related to the schema, but that are down at the database layer as well.
And I believe at this point I go off into the, for more information, talk to these people. Michael Jurewitz, who is the developer tools evangelist. The Cocoa Dev mailing list is an excellent place to get answers about technical questions. And talk to us people. If there's something you don't understand in our documentation, if you find a bug in our technology, if there's just a feature you really, really want, that you think would be really cool in Core Data, tell us. We can't read minds, especially we can't read minds of people we don't know at a distance. bugreporter.apple.com. Send us radar reports, send us emails, let us know what you're thinking.
A great place to start learning about Core Data, get lots and lots of details, is the Core Data documentation on developer.apple.com. Wikipedia actually has an excellent article on relational databases and some links off to things like normalization and denormalization. It will give you a bit more depth, a bit more understanding, help you go further in the field than I can give you in a one hour long session.
And if you're going off and having to write your own SQL database, because for whatever reason Core Data doesn't work for you, you need a cross-platform solution or what have you, I highly recommend SQL: Practical Guide for Developers by Donahoo and Speegle. It's a great hands-on book full of lots of examples about what SQL looks like for doing all kinds of queries. It's been immensely useful over the years.
As I said earlier, you've kind of missed our other sessions. You'll have to catch them on video. We had a what's new in Core Data for Snow Leopard and Core Data tips and tricks. But what you haven't missed is the lab tomorrow. If you're a Core Data developer and you've got questions, if you're thinking about using Core Data, you've got questions, we'll be in the OS X lab tomorrow, from two until six, if I remember correctly, before we run off and get drunk. And we'll be happy to answer your questions. The entire Core Data team will be there. Come see us.