WebObjects: EOF Caching and Synchronization - WWDC 2000

Tools • 35:11

This session provides a detailed discussion of database snapshot management for conflict detection and caching, including the balance between efficiency and data freshness in refreshing and synchronization across sessions or multiple application instances.

Speaker: Daniel Abrams

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good afternoon everyone, welcome back. You have a good lunch? Yeah, I see a lot of empty seats. That usually means either lunch was really good or really bad. But we have here the session that was hinted at several times yesterday. How many of you remember the EJB session?

How many of you have digested all of it yet? A handful of you, yeah. So you might also think about retitling this session Caching In on Synchronization, because as you learned yesterday, one of the real advantages WebObjects has is its incredibly powerful mechanisms for caching and synchronizing data across multiple object stores running on multiple systems, not just the sort of heavyweight "who am I, here am I" kind of things you get with EJB. And to tell you all about that, here is Daniel Abrams. Please give him a warm welcome.

Thanks. It's really great to see everyone. I'm frankly amazed at how many WebObjects people were getting into these sessions and a little intimidated too, but we'll go ahead. Like Ernie said, the name of this session is Caching and Synchronization. And at a very high level, it's a very simple concept.

Caching in the sense of grabbing data from an external data store like a relational database and displaying it to users. And the real issue is how fresh that data is that your users are seeing and a balance between hitting that external store and caching that data so that you're not hitting it too often.

But as a consequence, users might not be seeing the freshest data. And synchronization, which is taking changes that your users make to objects and applying them back to the database in an orderly, organized way. So we'll jump right into it and get started.

I want to divide the presentation into essentially three parts. The first part, at a high level, we'll use the slides to go over some of the caching and synchronization issues and solutions. We'll jump into a little demo app that I've prepared to demonstrate some of these things. And then we'll get into the Q&A and hopefully get a constructive discussion going about how to deal with some of these things.

Before we get started, I did want to sort of get a show of hands just to see the level we should go at. How many people here have a good sense of what an EOFetchSpecification is, how to deal with it, have used that before? Okay. So maybe half, I would say. Which means that we're probably going to lose some of you, at least in the PowerPoint part of the presentation. And hopefully we'll bring you back in with the demo. But let's get right to it.

So I'm Daniel Abrams. I work out in the field as a consulting engineer and I've been building web applications with WebObjects for about three years now, large scale, small scale, all sorts of different types. So I've run into these issues over and over again and I think I have a pretty good and broad perspective on how to fix these things.

And I think if you've built WebObjects applications out in the field at all, you've probably run into these issues as well. So this is what I want to cover. I want to go over the default WebObjects EOF deployment scenario that is out of the box when you go to deploy a WebObjects application. What does that look like and what effect does that have on caching and synchronization?

I want to go over fetching and snapshotting. Snapshotting is sort of at the heart of the synchronization and caching issues, committing and receiving changes and coordinating updates. So this is what the default deployment scenario looks like. You essentially have a whole bunch of client browsers out in the real world. You could have one or more web servers and WebObjects adapters, but let's take the simple situation where you have one.

And it's very likely that if you have any sort of volume at all coming into your application, you have multiple application instances. So if you look at this slide, you'll see we have instance one and instance two. And within each instance, we have multiple editing contexts sitting on top of a shared EOF stack. So that's essentially what you get out of the box with WebObjects without making any changes to the default deployment scenario.

So I wanted to spend a little time talking about snapshots because I think that if you understand how snapshots work, how broadcasts occur and what's going on behind the scenes with your data, then you can figure out any given caching or synchronization issue. They're the master repository for coordinating all data retrieval and updates. And by default, in a single instance, you have editing contexts sharing a set of snapshots.

So these are some of the issues that when I go in and I see clients who have built WebObjects applications, they talk about or they say, "Why is it that I'm not seeing the freshest data? Or why am I seeing the application hit the database, but my users aren't seeing that data show up?" So let's talk a little bit about how fetching occurs in WebObjects.

So one thing before we get into this that I wanted to go over was that when you do have multiple application instances in a deployment scenario, by default, they're not communicating with each other at all. So this leads into the first level of complexity of what would seem to be a relatively simple situation. We're about to see a number of behaviors where within a single application instance, editing contexts are communicating with one another because they share a shared stack and a shared set of snapshots. But when we're in multiple instances, there's no communication going on whatsoever between those instances.

So if an editing context in one instance makes an update to the database, the editing contexts that sit in the other instance in no way recognize that. And it's essentially the equivalent of another application going in and making a change to the database. There's no real difference from within the instance.

So let's talk about a fetch within a single application instance. For those who aren't familiar with the concept of fetch specifications, in EOF you can programmatically construct a fetch specification, and there's a series of parameters. When we go over the demo app, I'll show you how you do that.

That allows you to query a database, get back a group of objects, and display them to the user, if you want to. So after you've constructed this fetch specification and triggered a fetch, we'll obviously query the database to get the data. We'll snapshot that object within that shared stack.

So even though a given editing context, in this case, editing context one, has triggered a fetch, it's actually snapshotted in a shared location. So editing context two and editing context three will eventually become aware of that object, as we'll see, as they either fetch it or manipulate it. And finally, we create an object and pull it into the first editing context.

So now I want to talk about what happens when an additional editing context attempts to display the same object. So in this particular case, we have editing context two initiating a fetch on the same object, an object with global ID 1. Global ID is just EOF's way of displaying and packaging up primary keys.

So it's just a way of uniquely identifying an object. So in this case, we do the exact same thing. Editing context two triggers a fetch on an object with global ID 1. We query the database. And what's significant here is that when that data comes back, we actually ignore any updates to the data.

So if an external instance or even some other external application has changed that data and we simply do a fetch, the out-of-the-box behavior is that you're not going to see it. So this is the first thing that actually throws people off. They're not aware of this. They construct a fetch spec, they fetch their data, and they don't see updates, but they do see the application hitting the database. And I'll show you that in the demo. But it's important to be aware that that's the default behavior.
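The default fetch behavior described above can be sketched with a small toy model. This is not EOF code: `SharedStack`, `EditingContext`, and the dictionary standing in for the database are hypothetical names invented for illustration, and the sketch only assumes what the talk states, namely that the query always runs but an existing snapshot takes precedence over the freshly fetched row.

```python
class SharedStack:
    """Toy stand-in for the shared stack: one snapshot per global ID."""

    def __init__(self, database):
        self.database = database   # global ID -> row dict (the "real" data)
        self.snapshots = {}        # global ID -> row dict as first fetched
        self.queries = 0           # number of round trips to the database

    def fetch(self, gid):
        self.queries += 1
        row = dict(self.database[gid])   # the round trip always happens
        # Default behavior: an existing snapshot wins, the fresh row is dropped.
        return self.snapshots.setdefault(gid, row)


class EditingContext:
    """Toy editing context that builds its objects from the shared snapshot."""

    def __init__(self, stack):
        self.stack = stack
        self.objects = {}

    def fetch(self, gid):
        self.objects[gid] = dict(self.stack.fetch(gid))
        return self.objects[gid]


# Made-up sample data, loosely following the demo's movie overview field.
database = {1: {"overview": "labor union history"}}
stack = SharedStack(database)
ec1, ec2 = EditingContext(stack), EditingContext(stack)

ec1.fetch(1)
database[1]["overview"] = "labor union movie"   # someone else edits the row
stale = ec2.fetch(1)                            # hits the database, sees old data
```

In this model `stack.queries` ends up at 2 even though the second editing context still receives the old overview, which mirrors the "I see the query but not the change" confusion.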

So we've ignored updates. We create an object in Editing Context 2, but that object is actually created off the snapshot because we haven't specified otherwise. So I want to talk about what you can do to change that situation if you want to. Let's say that you always want users to have fresh data or you want users to have fresh data in this particular case. What are some of the things that you can do to make that happen?

Well, one of the things you can do is on your fetch specs you can use a flag called refreshesRefetchedObjects. And if you do that, when you query the database and the data comes back, it updates the snapshot. So in this case, Editing Context 3 has triggered a fetch with that refreshesRefetchedObjects flag set to on. So we query the database and we can see that the snapshot gets updated in that shared stack.

Now, there's an additional wrinkle when this happens. When we have an update to a snapshot, all of the editing contexts that share that stack, which are referred to as peer editing contexts, receive a broadcast of that change. So we can see that when this snapshot gets updated, we broadcast out to those editing contexts that are peers and that already have an instance of that object. And then we pull that object into Editing Context 3.

Now, we'll get into more detail later about what happens if Editing Context 1 or Editing Context 2 have made changes to that object before that broadcast has occurred. What actually happens is you get a merge, and we'll talk about that more when we go into synchronization.
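A rough sketch of the refresh-and-broadcast sequence follows. It is a toy model rather than EOF itself: the `refresh` argument stands in for the refresh flag on the fetch spec, the class and method names are hypothetical, and merging of unsaved edits in the peers is deliberately left out.

```python
class SharedStack:
    """Toy shared stack that broadcasts snapshot updates to peer contexts."""

    def __init__(self, database):
        self.database = database
        self.snapshots = {}
        self.peers = []            # editing contexts sharing this stack

    def fetch(self, gid, refresh=False):
        row = dict(self.database[gid])
        if gid not in self.snapshots:
            self.snapshots[gid] = row
        elif refresh and row != self.snapshots[gid]:
            self.snapshots[gid] = row
            for ec in self.peers:          # broadcast the snapshot change
                ec.snapshot_changed(gid, row)
        return self.snapshots[gid]


class EditingContext:
    def __init__(self, stack):
        self.stack = stack
        self.objects = {}
        stack.peers.append(self)

    def fetch(self, gid, refresh=False):
        self.objects[gid] = dict(self.stack.fetch(gid, refresh))
        return self.objects[gid]

    def snapshot_changed(self, gid, row):
        if gid in self.objects:            # only peers already holding the object
            self.objects[gid] = dict(row)


database = {1: {"overview": "old"}}
stack = SharedStack(database)
ec1, ec2, ec3 = (EditingContext(stack) for _ in range(3))

ec1.fetch(1)
ec2.fetch(1)
database[1]["overview"] = "new"            # external change
ec3.fetch(1, refresh=True)                 # refreshing fetch updates everyone
```

After the refreshing fetch, all three toy editing contexts hold the new overview, even though only the third one asked for it.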

There's an additional wrinkle I wanted to talk about, which is specific to 4.5. So, 4.5 has added a number of different ways that you can manipulate the way objects are refreshed or updated. And they sort of add an additional level of flexibility to what you can do, but also an additional layer of complexity that you have to be aware of. So, in 4.5, if Editing Context 3 were to trigger a fetch, this slide shows how the query database and check timestamp are actually inverted. So, what will happen is Editing Context 3 triggers a fetch. We will check the timestamp.

If that timestamp has expired, we'll actually go back to the database. We'll do a query. We'll update the snapshot. And again, because the snapshot has been updated, we'll broadcast those changes.

So I wanted to talk about one of the alternative methods that you can use to update snapshots, and that's invalidating objects.
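One way to picture the timestamp check is the sketch below. It assumes, as a reading of the slide description, that a snapshot that is still fresh enough is served without a query and that an expired one is refetched and restamped. The `now` and `max_age` parameters are hypothetical and are passed in explicitly so the example is deterministic; they are not names from WebObjects 4.5.

```python
class SharedStack:
    """Toy shared stack whose snapshots carry the time they were fetched."""

    def __init__(self, database):
        self.database = database
        self.snapshots = {}    # global ID -> (row, time the row was fetched)
        self.queries = 0

    def fetch(self, gid, now, max_age):
        cached = self.snapshots.get(gid)
        if cached is not None and now - cached[1] <= max_age:
            return cached[0]               # fresh enough: no query at all
        self.queries += 1
        row = dict(self.database[gid])
        self.snapshots[gid] = (row, now)   # expired or missing: requery, restamp
        return row


database = {1: {"overview": "old"}}
stack = SharedStack(database)

first = stack.fetch(1, now=0, max_age=60)     # initial fetch
database[1]["overview"] = "new"               # external change
second = stack.fetch(1, now=30, max_age=60)   # within the lag: snapshot reused
third = stack.fetch(1, now=90, max_age=60)    # expired: goes back to the database
```

The broadcast to peer editing contexts that follows a snapshot update is omitted here to keep the sketch short.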

So I've added a little bit of complexity here to the diagram. What you see in each of the editing contexts is the object that we were talking about before, and it now has a to-many relationship, because relationships are actually treated a little bit differently. And again, we're sort of layering complexity on top of what we were doing before.

So one thing that happens when you do a fetch with refresh turned on is that you actually don't update that object's to-many relationships. So you might see changes to that object, but you're not going to see changes to the to-many relationship or the other relationships that it has, even to-one relationships.

So I want to spend a little time dealing with both alternative ways of invalidating as well as methods for dealing with updating relationships. So in this particular case, we have editing context one invalidating the given object that we were talking about. And there's two ways to do invalidation.

One is on an individual object, and one is to invalidate every object in either the editing context or within the stack. We'll deal with invalidating individually first, and then we'll move on to invalidating globally. So when editing context one triggers this invalidation, we refault the object. We refault its to-many relationships, but we preserve its to-one relationship.

When editing context one trips that fault, we see a series of actions occur as a result. We first query the database against that object, so we now have a fault for that object and it specifically queries for that particular object. We update the snapshot and update that object, and then we broadcast those changes out because the snapshot has been updated. Now there's one additional wrinkle here, which is that the broadcast actually occurs a little differently than it does when you have refreshes turned on.

In the case of refresh, when you broadcast out to editing context two or editing context three, if you had a dirty object, that is an object that someone has modified, that broadcast would merge in those changes. When you invalidate an object, by default, if you don't do anything else, it will overwrite those changes and the users in the other editing contexts will lose their changes.

So in some ways, invalidating is a more powerful tool, but you have to be careful because you could potentially overwrite other users' changes. So right now we have this object with global ID 1 pulled into each of the editing contexts, but we still have the to-many relationship faulted, and I want to go over what happens when that fault is tripped as well.
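The difference between the two broadcast styles can be illustrated with two small functions. Both are hypothetical helpers written for this sketch, not EOF methods; they assume the merge works attribute by attribute, keeping any value the local user has changed relative to the old snapshot.

```python
def merge_on_refresh(old_snapshot, new_snapshot, local_object):
    """Refresh-style broadcast: take the new values, then reapply local edits."""
    merged = dict(new_snapshot)
    for key, value in local_object.items():
        if value != old_snapshot.get(key):   # the user changed this attribute
            merged[key] = value
    return merged


def overwrite_on_invalidate(old_snapshot, new_snapshot, local_object):
    """Invalidate-style broadcast (default): unsaved local edits are discarded."""
    return dict(new_snapshot)


# Made-up sample values for illustration.
old = {"title": "A", "overview": "history"}
new = {"title": "B", "overview": "history"}       # someone else changed the title
local = {"title": "A", "overview": "my edit"}     # this user edited the overview

after_refresh = merge_on_refresh(old, new, local)
after_invalidate = overwrite_on_invalidate(old, new, local)
```

With the refresh-style merge the user keeps "my edit" and also picks up the new title; with the invalidate-style overwrite the edit is silently gone.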

So when the to-many relationship fault is tripped, we will query the database for that to-many relationship, just like we did the first time, but we will actually discard the changes. So you will not see any updates to the to-many relationship when you invalidate an object like that, and then it will pull that relationship in from the existing snapshot.

So invalidating objects individually is an effective way to update the given object, but it will not work for updating changes to a to-many relationship. And actually, as an additional wrinkle, it will query the database against that relationship. So I'll show you the demo app, and users are sometimes confused because they see that query occur, but they don't see changes.

So finally, the most drastic thing you can do is invalidate all the objects. When you invalidate all the objects, essentially everything, either in the editing context or the shared stack, is refaulted. So every object is refaulted, every relationship is refaulted. Every time you trip one of those faults, you're going to have a new query. Every snapshot is updated. All those changes are broadcast, including to-manys.

So when you invalidate all the objects, you will update the to-manys. That's an effective way of pulling in new data through your to-manys, but there are a lot of really significant issues associated with invalidating all objects. One is that it's very expensive, period, to pull every single object back from the database as you trip the faults. And two is, it can actually be more expensive than your original queries.

So if you pull in a bunch of objects into an editing context through a series of queries, you pull those objects in in sets at a time, right? So you might pull in objects five or ten at a time. Unless you recreate every single one of those original queries when you invalidate all, it's going to trigger a fetch individually on each of those objects as you trip the faults.
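The cost argument can be made concrete by counting queries in a toy model. The `Database` class and `refault_and_touch_everything` function are hypothetical names for this sketch; the only assumption is the one stated in the talk, that each tripped fault refetches its own object individually.

```python
class Database:
    """Toy database that counts how many queries it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch_all(self):
        self.queries += 1          # one query returns every row
        return {gid: dict(row) for gid, row in self.rows.items()}

    def fetch_one(self, gid):
        self.queries += 1          # one query per tripped fault
        return dict(self.rows[gid])


def refault_and_touch_everything(db, gids):
    """After an invalidate-all, each fault that gets tripped refetches alone."""
    return {gid: db.fetch_one(gid) for gid in gids}


# Made-up sample rows for illustration.
db = Database({1: {"title": "A"}, 2: {"title": "B"}, 3: {"title": "C"}})

movies = db.fetch_all()                       # original fetch: 1 query, 3 rows
queries_after_fetch = db.queries
refault_and_touch_everything(db, list(movies))
queries_after_invalidate_all = db.queries     # 3 more queries, one per object
```

Relationship faults would add further queries on top of these, which is why the talk describes invalidating everything as potentially more expensive than the original fetches.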

The other issue is that it will wipe out any changes to any editing context that share those objects. So if editing context one invalidates all objects and editing context two, editing context three happen to be making changes to those objects or deleted those objects but haven't committed those changes, those changes will be lost.

So one user will have the freshest data, but another user may simply lose data behind the scenes without really realizing what's going on. So you have to be very careful when you do that to ensure that your users don't lose data. And then the other thing to be aware of is that with every single one of these mechanisms, when you actually go to deploy an application, a user may end up on one instance or multiple instances.

Actually, a better way to say that is that users may end up on a shared instance or they may end up across application instances. So if they end up on a shared application instance and you're doing things like updating fetches with refresh or invalidating objects, then those two users who are editing the same object are going to see changes as a result of those broadcasts. If by random chance they happen to end up on two different application instances, even if your code is exactly the same, they're going to see a different set of behaviors.

So potentially users are going to be very confused unless you're very careful about the way you're doing this because from their perspective, they're doing the exact same thing. But from the perspective of the WebObjects application instances that are running, they're either communicating with each other or they aren't, just depending on where those users ended up.

You particularly start to get into these issues when you talk about coordinating changes and different users having the ability to edit the same object at the same time. I want to start going over some of those things, talk a bit about the locking behavior in EOF and how it works, explain why sometimes users see that locking and sometimes they don't, and explain why relationships can change.

So let's talk about committing changes within a single application instance. So what we're looking at here is editing context one, modifying and committing changes to an object. It has a too many relationship. We won't worry about that right now. And as you can see, all of the other objects are in line with what's in the snapshot.

So what I mean is in editing context two, you can see that there haven't been any changes to the object. The data in editing context two is the same data that's in the snapshot and the same with editing context three. So right now, the only user who's making changes to an object is in editing context one.

He modifies the object and commits. He goes to save to the database. So we see an update to the database. The snapshot is updated. And again, every time the snapshot is updated, we're going to broadcast out those changes to other editing contexts, that is, other users who are sharing a stack.

So, okay, good. Before it was getting cut off, but I think it's okay. So in this particular case, I want to talk about what happens when two users within the same application instance modify and attempt to commit changes to an object at essentially the same time. So editing context one and editing context two modify an object, and they are now out of sync with what's in the snapshot. They haven't committed their changes, but they're carrying around locally dirty versions of this object with global ID 1.

Editing Context 1 goes to commit its change. We update the database, checking to make sure that we don't have a locking failure, which in this case we don't. The snapshot was in sync with what was in the database. The snapshot is then updated and the changes are broadcast out. You notice Editing Context 3 has received that broadcast, while Editing Context 2 receives that broadcast, but essentially maintains its own changes. We'll attempt to merge in those changes and where there's discrepancies, Editing Context 2 will reapply the changes that it's already made and maintain those changes.

And the other thing to note is that right now the snapshot is in sync with what's in the database. So, Editing Context 1 has committed a change. We updated the database. We updated the snapshot. So, those two are in sync. And that's important because this is essentially EOF's mechanism for doing optimistic locking. Right? So, what happens when you go to save a change is we compare what's in the snapshot with what's in the database.

And if they're out of sync, we have an optimistic locking failure. And as long as those two are in sync, we're not going to get an optimistic locking failure and we're going to be allowed to update those changes. So, let's look at what happens when Editing Context 2 goes to commit its changes.

The database is updated because, like I said, the snapshot was in sync with what's in the database and we broadcast those changes out to the other editing contexts. So the important thing to note here is that the out-of-the-box behavior is that even when you have an attribute on a given object marked for locking, editing contexts within the same application instance share the same stack by default and share the same set of snapshots, so you're not going to see one editing context that's a peer of another lock against it.

Now I want to talk about the exact same behavior across multiple application instances. So in this case we have editing context one and editing context three modifying an object. You can see that they're in different application instances and attempting to commit those changes. So editing context one and three modify the object, and you can see that.

Editing Context 1 updates the database and we lock against the snapshot. So in this particular case, the database is in sync with the snapshot. We don't have any problems with locking. The snapshot is updated, and we broadcast out to the other editing contexts in the same instance. So editing context 2 is now aware of the fact that we've committed this update to the database, whereas editing context 3 and editing context 4 are not.

And just to be perfectly clear, from a user's perspective, there's really no difference. They could have ended up in the first application instance, and they could have ended up in the second. They could be editing context 2 or 3 or 4. They really don't know what editing context they're going to end up in at the beginning.

So let's see what happens when editing context three commits changes to the database. We go to lock against the snapshot and in this case we fail, right, because we have a snapshot that has been updated, or rather a database that has been updated since the last time we've updated the snapshot. So editing context one, or rather application instance one, updated the database. Editing context three, in application instance two, goes to update that database and it's going to fail with an optimistic locking failure.
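The locking behavior in both scenarios can be sketched as a snapshot-versus-row comparison. This is a simplified toy model with hypothetical names (`Instance`, `OptimisticLockingFailure`, `locking_keys`); it compares the values directly in Python and makes no claim about how EOF actually issues the check against the database.

```python
class OptimisticLockingFailure(Exception):
    """Raised when the snapshot no longer matches the database row."""


class Instance:
    """Toy application instance: its own snapshots over a shared database."""

    def __init__(self, database):
        self.database = database
        self.snapshots = {}

    def fetch(self, gid):
        self.snapshots.setdefault(gid, dict(self.database[gid]))
        return dict(self.snapshots[gid])

    def save(self, gid, changed, locking_keys):
        snapshot, row = self.snapshots[gid], self.database[gid]
        # Compare snapshot and current row on the locking attributes only.
        if any(snapshot[k] != row[k] for k in locking_keys):
            raise OptimisticLockingFailure(gid)
        row.update(changed)
        snapshot.update(changed)   # snapshot and database are back in sync


database = {1: {"overview": "original"}}
one, two = Instance(database), Instance(database)
one.fetch(1)
two.fetch(1)

# Two editing contexts in instance one share its snapshots, so both saves pass.
one.save(1, {"overview": "from EC 1"}, ["overview"])
one.save(1, {"overview": "from EC 2"}, ["overview"])

# Instance two still holds the original snapshot, so its save is rejected.
try:
    two.save(1, {"overview": "from EC 3"}, ["overview"])
    conflict = False
except OptimisticLockingFailure:
    conflict = True
```

The same pair of edits is accepted silently inside one instance and rejected across instances, which is the inconsistency the talk warns about.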

So I wanted to go right to the demo and show you some of these behaviors. There's actually some additional wrinkles that come into play when you're doing this in a real world situation. So from a high level, or from an application instance level, this is exactly what happens. But because the web is a stateless medium, there's some additional wrinkles that are introduced when you have a web browser that has a chance to get out of sync with what's actually in your application. So if we could cut over to the demo machine, that'd be good.

So the demo is about as simple as you can get. This is the EO model for the demo. You can see we have an object called movie here with a to-many relationship to role and a to-one relationship to studio. And essentially I've constructed a demo that

[Transcript missing]

So these two different sessions, and I'm going to start to make some changes to the objects. So I've done a fetch, and I've pulled all the objects into each of the associated editing contexts.

Right now they're on the same application instance, and I'm going to make a change. So I'm going to go behind the scenes and I'm going to edit one of the objects directly. So I'm going to edit the movie's description and change this from labor union history, let's say, to labor union movie.

So right now you would expect that if I did a fetch, I probably wouldn't see that change according to what I told you. So let's go ahead and do that. And before we do that, I just want to pull up what's going on so you can see. So we do a fetch against those movies. You can see that we've hit the database. We've actually pulled back all three movies, but we don't see that change, right?

So I want to do the same thing, but this time I'll do a fetch and refresh the snapshot. So according to what I've told you, if you fetch and refresh the snapshot, you should pull back from the database, update that snapshot, broadcast out to the other editing contexts, and you should see the change. So let's do that. Sure enough, we see the change. But I want to introduce an additional wrinkle. So let's actually go to the other session and let's do something similar.

Let's do fetch and refresh, and you would expect that you might see the change, right? But you're not. So in this particular case, we're still seeing labor union movie, whereas in this particular case, we see the old value, labor union history. And if you go into the console, you can see that, and I'll demonstrate it just to be absolutely sure, when we go in and fetch, sure enough, we're hitting the database, but we're not seeing any updates. So what's going on here? Well, what's going on is that when this first session went and refreshed the snapshot, it broadcast out those changes to all the other editing contexts. So it pulled that data into its own editing context, so we saw the change.

It then broadcast that change out to the editing context that's sitting on the server that is represented by this particular session. But when this session went back to do a fetch and refresh, it synchronized the bindings. So it took the value, labor union history, that was saved within this overview field, compared it to the value it had, which had been broadcast out from the other editing context, noticed they were different, and assumed that this particular user was actually making changes. So from this user's perspective, he hasn't made any changes at all. And not only has he not made any changes at all, but he's committed an action that you would think, and that he would think, would send him to the latest data.

But in fact, it hasn't done that at all, and in fact, committing this action has cemented this older version of the object right back into the editing context. So if we were to hit save changes, which does nothing more than a save changes on the editing context, so it merely saves changed objects within his editing context, it will actually update the database at this point. So I'll go ahead and do that.

So we can see that it's updated the database. So without either user being aware of it, we've actually managed to overwrite the commit that the first user's done and replace it with an older value. So this isn't even a case where two users are attempting to edit the same data, but there's still one user fighting against another and overwriting the data.

And what's actually sort of interesting, I want to do the same thing, but rather than fetch and refresh the snapshot, which in this case is a button which submits the form, I want to follow this hyperlink that says "Do Nothing," which is essentially a no-op action. It simply returns the same page. But before I do that, I want to clear out all the changes. So in both cases, I'll invalidate all the objects. So we're completely up to date right now.

I will commit a change to the database from the back end. So we'll change this back to labor union. I'll just get rid of the word altogether. So we're up to date in that. In this case, we'll fetch and refresh snapshots exactly like we did before, and we can see that it's gone. But in this case, we're going to follow the do-nothing hyperlink.

And you can see we're up to date. So what's the difference here? The difference is that when we follow a hyperlink, we don't actually submit any of the data that's in the form. We don't update those bindings. And so we see the update that's been broadcast out to us in the editing context.
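The difference between the form submit and the hyperlink can be reduced to a few lines. `handle_request` is a hypothetical function invented for this sketch, not part of WebObjects; it only models the step described in the demo, where submitted form values are pushed into the object before the action runs.

```python
def handle_request(ec_object, submitted_form=None):
    """Toy request cycle: a form submit pushes field values into the object first."""
    if submitted_form is not None:
        for key, value in submitted_form.items():
            if ec_object[key] != value:
                ec_object[key] = value   # indistinguishable from a user edit
    return dict(ec_object)               # values rendered into the next page


# The second session's page was rendered before the broadcast arrived,
# so its form still carries the old overview.
stale_form = {"overview": "labor union history"}

via_submit = handle_request({"overview": "labor union movie"}, stale_form)
via_link = handle_request({"overview": "labor union movie"})
```

Submitting the stale form writes the old value back over the broadcast update, while following a plain link leaves the updated object untouched.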

So the other thing I want to demonstrate is a very similar behavior, but with regards to to-many relationships. So let's clear everything out again. This time I'm going to make a change to an EO, but I'm going to make the change directly to one of the relationships.

So I've now made a change to this role right here. And in this particular case, why don't we start by doing a fetch? So I'll do a fetch, and you can see that we don't see the change to the relationship, which is probably what you'd expect. And if we go to the console, you can see that even though we don't see that change, we've actually gone out and hit the database again.

So let's do the same thing, but this time let's do a fetch of the movies and refresh the snapshot. Well, what do we have in this case? We again go out and hit the database. We again pull back those three rows, but we again haven't seen that update. Okay, why don't we try invalidating that movie?

So we invalidate that movie. And again, we don't see that change. Let's look at what's going on behind the scenes. You can see that we do a fetch against the movie, so we pull back that particular movie, the one that we've invalidated. We've also refaulted its to-many relationship, so we actually pull back those new roles right here, but you're still not seeing it. So the point is that even when you invalidate, you're not necessarily getting updates to a to-many relationship. So this time, let's invalidate all.

So we do an invalidate all. You can see that we've actually gotten that update to that object. But I mean, we really have a flurry of database activity, right? We've hit the database once for every single movie, whereas before we were pulling back three movies at a time.

And that's because we're essentially iterating over an array with these movies in them. And rather than fetching, we're simply pulling all those movies back. And then we're also fetching back each of the relationships, so the to-manys and the to-ones. But in this particular case, when we pull back the to-many, we can see that it's updated.

So the other thing I wanted to show you was the difference in behavior between when you update within a single instance versus when you update across separate instances. So let's make a change. Actually, I think we're invalidated, but just to be sure, let's clear out the entire cache. Let's make a change.

So I have overview. I don't know if you noticed in the model, but I have overview designated as a locking attribute. So you would expect that if two different users make changes to an overview behind the scenes that you should lock on that attribute. But when you have editing contexts that share a stack and they commit saves, we're just going to see that they can overwrite each other.

So let's take a look at that in the console. So we have two sets of updates right there, and we haven't detected any conflicts because the snapshots are always in sync with the database. So even though those particular attributes are marked for locking, we're able to update. But let's simulate the exact same behavior across separate application instances. So we'll start over. Pull the plug. Make a change behind the scenes.

I'll attempt to make a change. So this is, like I said, analogous to a user in another application instance committing a change to the database. And we'll also commit a change in this editing context and attempt to save it. And what you can see is that we've got an optimistic locking failure occurring. So the exact same behavior could result in the user seeing either an update to the data or a locking failure, depending on which application instance they end up in.

So there's actually a couple other things we could demo, but I think I'd rather just jump right to questions and then as people have questions, maybe we'll demo that behavior in there. So I'd like to bring Steve Miner and Eric Noyau up to the stage. They're both part of the WebObjects engineering team, and we'll open it up for questions. First of all, a big hand for our presenters. Thank you very much.