Tools • 35:11
This session provides a detailed discussion of database snapshot management for conflict detection and caching, including the balance between efficiency and data freshness in refreshing and synchronization across sessions or multiple application instances.
Speaker: Daniel Abrams
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it may have transcription errors.
Good afternoon, everyone. Welcome back. You have a good lunch? Yeah, you see a lot of empty seats. That usually means either lunch was really good or really bad. But we have here the session that was hinted at several times yesterday. How many of you remember the EJB session?
How many of you have digested all of it yet? A handful of you, yeah. So you might also think about retitling this session Caching In on Synchronization, because as you learned yesterday, one of the real advantages WebObjects has is incredibly powerful mechanisms for caching and synchronizing data across multiple object stores running on multiple systems, not just the sort of heavyweight, who am I, here am I kind of things you get with EJB. And to tell you all about that, here is Daniel Abrams. Please give him a warm welcome.
Thanks. It's really great to see everyone. I'm frankly amazed at how many WebObjects people we're getting into these sessions, and a little intimidated too. But we'll go ahead. Like Ernie said, the name of this session is Caching and Synchronization. And at a very high level, it's a very simple concept. Caching in the sense of grabbing data from an external data store, like a relational database, and displaying it to users. And the real issue is how fresh that data is that your users are seeing, and a balance between hitting that external store and caching that data so that you're not hitting it too often. But as a consequence, users might not be seeing the freshest data.
And synchronization, which is taking changes that your users make to objects and applying them back to the database in an orderly, organized way. So we'll jump right into it and get started. I want to divide the presentation into essentially three parts. The first part, at a high level, we'll use the slides to go over some of the caching and synchronization issues and solutions.
We'll jump into a little demo app that I've prepared to demonstrate some of these things. And then we'll get into the Q&A and hopefully get a constructive discussion going about how to deal with some of these things. Before we get started, I did want to get a show of hands just to see the level we should go at. How many people here have a good sense of what an EOFetchSpecification is, how to deal with it, have used that before? Okay, so maybe half, I would say. Which means that we're probably going to lose some of you, at least in the PowerPoint part of the presentation, and hopefully we'll bring you back in with the demo. But let's get right to it.
So I'm Daniel Abrams. I work out in the field as a consulting engineer. And I've been building web applications with WebObjects for about three years now, large scale, small scale, all sorts of different types. So I've run into these issues over and over again. And I think I have a pretty good and broad perspective on how to fix these things. And I think if you've built WebObjects applications out in the field at all, you've probably run into these issues as well. So this is what I want to cover. I want to go over the default WebObjects EOF deployment scenario that is out of the box when you go to deploy a WebObjects application. What does that look like, and what effect does that have on caching and synchronization? I want to go over fetching and snapshotting.
Snapshotting is sort of at the heart of the synchronization and caching issues, committing and receiving changes and coordinating updates. So this is what the default deployment scenario looks like. You essentially have a whole bunch of client browsers out in the real world, and you could have one or more web servers and WebObjects adaptors. But let's take the simple situation where you have one. And it's very likely that if you have any sort of volume at all coming into your application, you have multiple application instances. So if you look at this slide, we'll see we have instance one and instance two. And within each instance, we have multiple editing contexts sitting on top of a shared EOF stack. So that's essentially what you get out of the box with WebObjects without making any changes to the default deployment scenario.
So I wanted to spend a little time talking about snapshots, because I think that if you understand how snapshots work, how broadcasts occur, and what's going on behind the scenes with your data, then you can figure out any given caching or synchronization issue. They're the master repository for coordinating all data retrieval and updates. And by default, in a single instance, you have editing contexts sharing a set of snapshots.
So these are some of the issues that when I go in and I see clients who have built WebObjects applications, they talk about or they say, why is it that I'm not seeing the freshest data? Or why am I seeing the application hit the database, but my users aren't seeing that data show up? So let's talk a little bit about how fetching occurs in WebObjects.
So one thing before we get into this that I wanted to go over was that when you do have multiple application instances in a deployment scenario, by default, they're not communicating with each other at all. So this leads into the first level of complexity of what would seem to be a relatively simple situation. We're about to see a number of behaviors where within a single application instance, editing contexts are communicating with one another because they share a stack and a set of snapshots. But when we're in multiple instances, there's no communication going on whatsoever between those instances.
So if an editing context in one instance makes an update to the database, the editing contexts that sit in the other instance in no way recognize that. And it's essentially the equivalent of another application going in and making a change to the database. There's no real difference from within the instance.
So let's talk about a fetch within a single application instance. For those who aren't familiar with the concept of fetch specifications: in EOF, you can programmatically construct a fetch specification, and there's a series of parameters. When we go over the demo app, I'll show you how you do that. That allows you to query a database, get back a group of objects, and display them to the user if you want to. So after you've constructed this fetch specification and triggered the fetch, we obviously query the database to get the data, and we'll snapshot that object within that shared stack. So even though a given editing context, in this case editing context one, has triggered a fetch, it's actually snapshotted in a shared location. So editing context two and editing context three will eventually become aware of that object, as we'll see, as they either fetch it or manipulate it. And finally we create an object and pull it into the first editing context.
So now I want to talk about what happens when an additional editing context attempts to display the same object. So in this particular case, we have editing context two initiating a fetch on the same object, an object with global ID 1. Global ID is just EOF's way of displaying and packaging up primary keys. So it's just a way of uniquely identifying an object. So in this case, we do the exact same thing. Editing context two triggers a fetch on an object with global ID 1. We query the database. What's significant here is that when that data comes back, we actually ignore any updates to the data.
If an external instance or even some other external application has changed that data and we simply do a fetch, the out of the box behavior is that you're not going to see it. So this is the first thing that actually throws people off. They're not aware of this. They construct a fetch spec, they fetch their data and they don't see updates but they do see the application hitting the database. And I'll show you that in the demo. But it's important to be aware that that's the default behavior.
So we've ignored updates. We create an object in editing context two, but that object is actually created off the snapshot because we haven't specified otherwise. So I want to talk about what you can do to change that situation if you want to. Let's say that you always want users to have fresh data, or you want users to have fresh data in this particular case. What are some of the things that you can do to make that happen? Well, one of the things you can do is on your fetch specs, you can use a method called refreshesRefetchedObjects. And if you do that, when you query the database and the data comes back, it updates the snapshot.
So in this case, editing context 3 has triggered a fetch, and they've set that flag for refreshing refetched objects to on. So we query the database, and we can see that the snapshot gets updated in that shared stack. Now, there's an additional wrinkle when this happens. When we have an update to a snapshot, all of the editing contexts that share that stack, which are referred to as peer editing contexts, receive a broadcast of that change. So we can see that when this snapshot gets updated, we broadcast out to those editing contexts that are peers that already have an instance of that object.
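The fetch behavior described so far can be sketched as a toy Java model. This is not EOF code: ToyStack, ToyContext, and their methods are hypothetical names invented for illustration, and the boolean refresh parameter merely stands in for the refreshesRefetchedObjects flag the speaker mentions. The sketch assumes one application instance and ignores uncommitted edits in peers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are NOT the EOF API.
class ToyStack {
    final Map<Integer, String> database;                    // stands in for the external store
    final Map<Integer, String> snapshots = new HashMap<>(); // shared by all peer contexts
    final List<ToyContext> peers = new ArrayList<>();
    int queryCount = 0;

    ToyStack(Map<Integer, String> database) { this.database = database; }

    String fetch(ToyContext requester, int globalId, boolean refresh) {
        queryCount++;                                // the database is queried every time
        String row = database.get(globalId);
        if (refresh || !snapshots.containsKey(globalId)) {
            snapshots.put(globalId, row);            // snapshot created or refreshed
            for (ToyContext peer : peers) {          // broadcast to peers that hold the object
                if (peer != requester && peer.objects.containsKey(globalId)) {
                    peer.objects.put(globalId, row);
                }
            }
        }
        // Without refresh, the fetched row is ignored and the old snapshot wins.
        return snapshots.get(globalId);
    }
}

class ToyContext {
    final ToyStack stack;
    final Map<Integer, String> objects = new HashMap<>();

    ToyContext(ToyStack stack) { this.stack = stack; stack.peers.add(this); }

    String fetch(int globalId, boolean refresh) {
        String value = stack.fetch(this, globalId, refresh);
        objects.put(globalId, value);
        return value;
    }
}

class FetchDemo {
    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "labor union history");
        ToyStack stack = new ToyStack(db);
        ToyContext ec1 = new ToyContext(stack);
        ToyContext ec2 = new ToyContext(stack);
        ToyContext ec3 = new ToyContext(stack);

        ec1.fetch(1, false);
        db.put(1, "labor union movie");           // change made outside this instance
        System.out.println(ec2.fetch(1, false));  // plain fetch: still the snapshot value
        System.out.println(ec3.fetch(1, true));   // refreshing fetch: the new value
        System.out.println(ec1.objects.get(1));   // ec1 was updated by the broadcast
    }
}
```

Note that queryCount increases on every fetch, which mirrors the point that the database is hit even when the returned data is ignored.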
And then we pull that object into editing context three. Now we'll get into more detail later about what happens if editing context one or editing context two have made changes to that object before that broadcast has occurred. And what actually happens is you get a merge. And we'll talk about that more when we get more into synchronization. There's an additional wrinkle I wanted to talk about, which is specific to 4.5. So 4.5 has added a number of different ways that you can manipulate the way objects are refreshed or updated. And they sort of add an additional level of flexibility to what you can do, but also an additional layer of complexity that you have to be aware of. So in 4.5, if editing context 3 were to trigger a fetch, the query-database and check-timestamp steps are actually inverted. So what will happen is editing context three triggers a fetch. We will check the timestamp. If that timestamp has expired, we'll actually go back to the database, we'll do a query, we'll update the snapshot, and again, because the snapshot has been updated, we'll broadcast those changes out.
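The timestamp-first ordering can be illustrated with a small self-contained Java sketch. It is an assumption-laden toy, not the WebObjects 4.5 API: TimedSnapshotCache, its fetch signature, and the maxAgeMillis parameter are hypothetical, and the current time is passed in explicitly so the behavior is deterministic. Broadcasting to peers is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: checks the snapshot's age first and queries the
// database only when the snapshot is missing or has expired.
class TimedSnapshotCache {
    private final Map<Integer, String> database;
    private final Map<Integer, String> snapshots = new HashMap<>();
    private final Map<Integer, Long> fetchedAt = new HashMap<>();
    int queryCount = 0;

    TimedSnapshotCache(Map<Integer, String> database) { this.database = database; }

    String fetch(int globalId, long nowMillis, long maxAgeMillis) {
        Long stamp = fetchedAt.get(globalId);
        boolean expired = stamp == null || nowMillis - stamp > maxAgeMillis;
        if (expired) {
            queryCount++;                                   // only now do we hit the database
            snapshots.put(globalId, database.get(globalId));
            fetchedAt.put(globalId, nowMillis);
        }
        return snapshots.get(globalId);
    }
}

class TimestampDemo {
    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "labor union history");
        TimedSnapshotCache cache = new TimedSnapshotCache(db);

        cache.fetch(1, 0L, 1000L);                          // first fetch queries the database
        db.put(1, "labor union movie");
        System.out.println(cache.fetch(1, 500L, 1000L));    // snapshot still fresh: old value
        System.out.println(cache.fetch(1, 1501L, 1000L));   // snapshot expired: new value
        System.out.println(cache.queryCount);               // number of database queries made
    }
}
```

The trade-off is the one the session opens with: a longer maximum age means fewer queries but staler data.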
So I wanted to talk about one of the alternative methods that you can use to update snapshots, and that's invalidating objects. So I've added a little bit of complexity here to the diagram. What you see in each of the editing contexts is the object that we were talking about before, and it now has a to-many relationship, because relationships are actually treated a little bit differently. And again, we're sort of layering complexity on top of what we were doing before. So one thing that happens when you do a fetch with refresh turned on is that you actually don't update that object's to-many relationships.
So you might see changes to that object, but you're not going to see changes to the to-many relationship or the other relationships that it has, even to-one relationships. So I want to spend a little time dealing with both alternative ways of invalidating as well as methods for dealing with updating relationships.
So in this particular case, we have editing context one invalidating the given object that we were talking about. And there are two ways to do invalidation. One is on an individual object, and one is to invalidate every object in either the editing context or within the stack. We'll deal with individual objects first, and then we'll move on to invalidating globally. So when editing context one triggers this invalidation, we refault the object. We refault its to-many relationships, but we preserve its to-one relationships.
When editing context one trips that fault, we see a series of actions occur as a result. We first query the database against that object, so we now have a fault for that object and it specifically queries for that particular object. We update the snapshot and update that object, and then we broadcast those changes out because the snapshot has been updated. Now there's one additional wrinkle here, which is that the broadcast actually occurs a little differently than it does when you have refreshes turned on. In the case of refresh, when you broadcast out to editing context two or editing context three, if you had a dirty object, that is an object that someone has modified, that broadcast would merge in those changes. When you invalidate an object, by default, if you don't do anything else, it will overwrite those changes. And the users in the other editing contexts will lose their changes. So in some ways, invalidating is a more powerful tool. But in some ways, you have to be careful, because you could potentially overwrite other users' changes. So right now, we have this object with global ID 1 pulled into each of the editing contexts. But we still have the to-many relationship faulted. And I want to go over what happens when that fault is tripped as well.
So when the to-many relationship fault is tripped, we will query the database for that to-many relationship, just like it did the first time. But it will actually discard the changes. So you will not see any updates to the to-many relationship when you invalidate an object like that. And then it will pull that relationship in from the existing snapshot. So invalidating objects individually is an effective way to update the given object.
But it will not work for updating changes to a to-many relationship. And actually, as an additional wrinkle, it will query the database against that relationship. I'll show you that in the demo app. And users are sometimes confused because they see that query occur, but they don't see changes.
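The difference between the two kinds of broadcast just described, merging after a refresh versus overwriting after an invalidation, can be sketched with a toy Java class. PeerCopy and its methods are hypothetical names for illustration only; real EOF tracks changes per object and snapshot in a more involved way.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of one peer editing context's copy of a single object.
class PeerCopy {
    final Map<String, String> values = new HashMap<>();       // what this user sees
    final Map<String, String> uncommitted = new HashMap<>();  // local edits not yet saved

    PeerCopy(Map<String, String> snapshot) { values.putAll(snapshot); }

    void edit(String key, String value) {
        values.put(key, value);
        uncommitted.put(key, value);
    }

    // Broadcast after a refreshing fetch: new data is merged in and the
    // local edits are reapplied on top of it.
    void receiveMerge(Map<String, String> newSnapshot) {
        values.clear();
        values.putAll(newSnapshot);
        values.putAll(uncommitted);
    }

    // Broadcast after an invalidation: the local edits are discarded.
    void receiveInvalidation(Map<String, String> newSnapshot) {
        uncommitted.clear();
        values.clear();
        values.putAll(newSnapshot);
    }
}

class BroadcastDemo {
    public static void main(String[] args) {
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("title", "Movie A");
        snapshot.put("overview", "labor union history");

        PeerCopy merged = new PeerCopy(snapshot);
        PeerCopy invalidated = new PeerCopy(snapshot);
        merged.edit("title", "My draft title");
        invalidated.edit("title", "My draft title");

        snapshot.put("overview", "labor union movie");   // snapshot updated elsewhere
        merged.receiveMerge(snapshot);
        invalidated.receiveInvalidation(snapshot);

        System.out.println(merged.values.get("title"));       // local edit survives
        System.out.println(invalidated.values.get("title"));  // local edit is gone
    }
}
```

In both cases the peer picks up the new overview; only the fate of the uncommitted title edit differs.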
So finally, the most drastic thing you can do is invalidate all the objects. When you invalidate all the objects, essentially everything, either in the editing context or the shared stack, is refaulted. So every object is refaulted. Every relationship is refaulted. Every time you trip one of those faults, you're going to have a new query. Every snapshot is updated.
All those changes are broadcast, including to-manys. So when you invalidate all the objects, you will update the to-manys. That's an effective way of pulling in new data through your to-manys. But there are a lot of really significant issues associated with invalidating all objects. One is that it's a very expensive operation to pull every single object back from the database as you trip the faults. And two is it can actually be more expensive than your original queries. So if you pull in a bunch of objects into an editing context through a series of queries, you pull those objects in in sets at a time, right? So you might pull in objects five or ten at a time. And unless you recreate every single one of those original queries when you invalidate all, it's going to trigger a fetch individually on each of those objects as you trip the faults.
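The cost argument is simple arithmetic, and a toy Java counter makes it concrete. QueryCounter and its methods are hypothetical and only count round trips; they assume, as described above, that a batch fetch costs one query and that each tripped fault costs one query of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: counts database round trips.
class QueryCounter {
    int queries = 0;

    // The original fetch: one query brings back a whole set of rows.
    List<Integer> fetchBatch(List<Integer> ids) {
        queries++;
        return new ArrayList<>(ids);
    }

    // After "invalidate all", each object is a fault; touching it costs one query.
    int fireFault(int id) {
        queries++;
        return id;
    }
}

class InvalidateAllCostDemo {
    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }

        QueryCounter original = new QueryCounter();
        original.fetchBatch(ids);                  // ten objects, one round trip

        QueryCounter afterInvalidateAll = new QueryCounter();
        for (int id : ids) {
            afterInvalidateAll.fireFault(id);      // ten objects, ten round trips
        }

        System.out.println(original.queries);
        System.out.println(afterInvalidateAll.queries);
    }
}
```

Relationship faults add further queries on top of the per-object ones, which is why the refetch can cost more than the queries that loaded the data in the first place.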
The other issue is that it will wipe out any changes in any editing contexts that share those objects. So if editing context one invalidates all objects and editing context two or editing context three happen to be making changes to those objects or have deleted those objects but haven't committed those changes, those changes will be lost.
One user will have the freshest data, but another user may simply lose data behind the scenes without really realizing what's going on. So you have to be very careful when you do that to ensure that your users don't lose data. And then the other thing to be aware of is that with every single one of these mechanisms, when you actually go to deploy an application, a user may end up on one instance or multiple instances. Actually, a better way to say that is users may end up on a shared instance or they may end up across application instances. If they end up on a shared application instance and you're doing things like updating fetches with refresh or invalidating objects, then those two users who are editing the same object are going to see changes as a result of those broadcasts. If by random chance they happen to end up on two different application instances, even if your code is exactly the same, they're going to see a different set of behaviors. The users are going to be very confused unless you're very careful about the way you're doing this, because from their perspective, they're doing the exact same thing. But from the perspective of the WebObjects application instances that are running, they're either not communicating with each other or they are, just depending on where those users ended up.
And this, you particularly start to get into these issues when you talk about coordinating changes and different users having the ability to edit the same object at the same time. So I want to start going over some of those things, talk a bit about the locking behavior in EOF and how it works, explain why sometimes users see that locking and sometimes they don't, and explain why relationships can change.
So let's talk about committing changes within a single application instance. So what we're looking at here is editing context one, modifying and committing changes to an object. It has a to-many relationship. We won't worry about that right now. And as you can see, all of the other objects are in line with what's in the snapshot. So what I mean is in editing context two, you can see that there haven't been any changes to the object. The data in editing context two is the same data that's in the snapshot and the same with editing context three. So right now, the only user who's committed changes to an object is in editing context one. He modifies the object and commits. He goes to save to the database. So we see an update to the database. The snapshot is updated. And again, every time the snapshot is updated, we're going to broadcast out those changes to other editing contexts, that is, other users who are sharing a stack.
So, okay, good. Before it was getting cut off, but I think it's okay. So in this particular case, I want to talk about what happens when two users within the same application instance modify and attempt to commit changes to an object at essentially the same time. So editing context one and editing context two modify an object. So editing context one and editing context two are now out of sync with what's in the snapshot. They haven't committed their changes, but they're carrying around locally dirty versions of this object with global ID 1.
Editing context one goes to commit its change. We update the database, checking to make sure that we don't have a locking failure, which in this case we don't. The snapshot was in sync with what was in the database. The snapshot is then updated, and the changes are broadcast out. Now you notice editing context three has received that broadcast, while editing context two receives that broadcast, but essentially maintains its own changes. So we'll attempt to merge in those changes. And where there's discrepancies, editing context 2 will reapply the changes that it's already made and maintain those changes.
And the other thing to note is that right now, the snapshot is in sync with what's in the database. So, editing context one has committed a change. We updated the database. We updated the snapshot. So those two are in sync. And that's important because this is essentially EOF's mechanism for doing optimistic locking. Right? So, what happens when you go to save a change is we compare what's in the snapshot with what's in the database. And if they're out of sync, we have an optimistic locking failure. And as long as those two are in sync, we're not going to get an optimistic locking failure and we're going to be allowed to update those changes. So let's look at what happens when editing context two goes to commit its changes.
The database is updated because, like I said, the snapshot was in sync with what's in the database, and we broadcast those changes out to the other objects. So the important thing to note here is that the out-of-the-box behavior is that even when you have an attribute on a given object marked for locking, editing contexts within the same application instance share the same stack by default, and they share the same set of snapshots, so you're not going to see one editing context that's a peer of another attempt to lock against it.
Now I want to talk about the exact same behavior across multiple application instances. So in this case we have editing context one and editing context three modifying an object. You can see that they're in different application instances and attempting to commit those changes. So editing context one and three modify the object. You can see that.
Editing context one updates the database, and we lock against the snapshot. So in this particular case, the database is in sync with the snapshot. We don't have any problems with locking. The snapshot is updated, and we broadcast out to the other editing contexts that share the instance. So editing context two is now aware of the fact that we've committed this update to the database, whereas editing context three and editing context four are not. And just to be perfectly clear, from a user's perspective, there's really no difference. They could have ended up in the first application instance, and they could have ended up in the second. They could be editing context two or three or four. They really don't know what editing context they're going to end up in. So let's see what happens when editing context three commits changes to the database. We go to lock against the snapshot. And in this case, we fail, right? Because we have a snapshot that has been updated-- or rather, a database that has been updated since the last time we've updated the snapshot. So editing context one-- or rather, application instance one-- updated the database. Editing context three goes to update that database. And it's going to fail with an optimistic locking failure.
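The snapshot-versus-database comparison behind optimistic locking can be sketched in self-contained Java. This is a toy model under stated assumptions, not EOF: AppInstance, LockingFailure, and the in-memory comparison are all invented for illustration, and each AppInstance stands for one application instance with its own private snapshots over one shared database.

```java
import java.util.HashMap;
import java.util.Map;

class LockingFailure extends RuntimeException {
    LockingFailure(String message) { super(message); }
}

// Toy model only: one application instance with its own snapshots.
class AppInstance {
    private final Map<Integer, String> database;                    // shared by every instance
    private final Map<Integer, String> snapshots = new HashMap<>(); // private to this instance

    AppInstance(Map<Integer, String> database) { this.database = database; }

    void fetch(int globalId) { snapshots.put(globalId, database.get(globalId)); }

    // A save succeeds only if the row still matches this instance's snapshot.
    void save(int globalId, String newValue) {
        String expected = snapshots.get(globalId);
        String actual = database.get(globalId);
        if (expected == null || !expected.equals(actual)) {
            throw new LockingFailure("row " + globalId + " changed since it was fetched");
        }
        database.put(globalId, newValue);
        snapshots.put(globalId, newValue);   // snapshot and database are back in sync
    }
}

class LockingDemo {
    public static void main(String[] args) {
        Map<Integer, String> db = new HashMap<>();
        db.put(1, "original overview");
        AppInstance one = new AppInstance(db);
        AppInstance two = new AppInstance(db);
        one.fetch(1);
        two.fetch(1);

        one.save(1, "edit from editing context one");
        one.save(1, "edit from editing context two");  // same instance: no conflict detected
        try {
            two.save(1, "edit from editing context three");
        } catch (LockingFailure e) {
            System.out.println("optimistic locking failure: " + e.getMessage());
        }
    }
}
```

Two saves through the same instance both succeed because the shared snapshot is updated after each one, while the save through the second instance fails because its snapshot no longer matches the row.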
So I wanted to go right to the demo and show you some of these behaviors. And there's actually some additional wrinkles that come into play when you're doing this in a real world situation. So from a high level, or from an application instance level, this is exactly what happens. But because the web is a stateless medium, there's some additional wrinkles that are introduced when you have a web browser that has a chance to get out of sync with what's actually in your application. So if we could cut over to the demo machine, that'd be good.
So the demo is about as simple as you can get. This is the EO model for the demo. You can see we have an object called movie here with a to-many relationship to role and a to-one relationship to studio. And essentially, I've constructed a demo that has one component that allows you to edit any of these EOs or their relationships. And it actually oversimplifies the case somewhat because in the real world you have cases where you're going through a workflow and on any given page you'll see some objects and you won't see other objects. But in this particular case, you see everything on one screen. It makes it a little simpler, but as you'll see in a second, it's still very complicated. So I have two different browsers here, IE and Netscape, and we all know how well they like to communicate with each other.
So these are two different sessions, and I'm going to start to make some changes to the objects. So I've done a fetch, and I've pulled all the objects into each of the associated editing contexts. Right now, they're on the same application instance, and I'm going to make a change. So I'm going to go behind the scenes, and I'm going to edit one of the objects directly. So I'm going to edit the movie's description and change this from labor union history, let's say, to labor union movie.
So right now you would expect that if I did a fetch, I probably wouldn't see that change according to what I told you. So let's go ahead and do that. And before we do that, I just want to pull up what's going on so you can see. So we do a fetch against those movies. You can see that we've hit the database. We've actually pulled back all three movies.
But we don't see that change. So I want to do the same thing, but this time I'll do a fetch and refresh the snapshot. So according to what I've told you, if you fetch and refresh the snapshot, you should pull back from the database, update that snapshot, broadcast out to the other instances, and you should see the change. So let's do that. Sure enough, we see the change. But I want to introduce an additional wrinkle. So let's actually go to the other application instance, and let's do something similar.
Let's do fetch and refresh. And you would expect that you might see the change, right? But you're not. So in this particular case, we're still seeing labor union movie. Whereas in this particular case, we see the old value, labor union history. And if you go into the console-- You can see that sure enough-- I'll demonstrate it just to be absolutely sure-- that when we go in and fetch, sure enough, we're hitting the database, but we're not seeing any updates. So what's going on here? Well, what's going on is that when this first session went and refreshed the snapshot, it broadcast out those changes to all the other instances. So it pulled that data into its own editing context, so we saw the change. It then broadcast that change out to the editing context that's sitting on the server that is represented by this particular session. But when this session went back to do a fetch and refresh, it synchronized the bindings. So it took the value, labor union history, that was saved within this overview, compared it to the value it had, which had been broadcast out from the other object, noticed they were different, and assumed that this particular user was actually making changes. So from this user's perspective, he hasn't made any changes at all.
And not only has he not made any changes at all, but he's committed an action that you would think, and that he would think, would send him to the latest data. But in fact, it hasn't done that at all. And in fact, committing this action has cemented this older version of the object right back into the editing context. So if we were to hit Save Changes, which does nothing more than an editing context save changes, so it merely saves changed objects within his editing context, it will actually update the database at this point. So I'll go ahead and do that.
So we can see that it's updated the database. So without either user being aware of it, we've actually managed to overwrite the commit that the first user's done and replace it with an older value. So this isn't even a case where two users are attempting to edit the same data, but there's still one user fighting against another and overwriting the data. And what's actually sort of interesting, I want to do the same thing, but rather than fetch and refresh the snapshot, which in this case is a button which submits the form, I want to follow this hyperlink that says do nothing, which is essentially a no op action. It simply returns the same page. But before I do that, I want to clear out all the changes. So in both cases, I'll invalidate all the objects. So we're completely up to date right now.
I will commit a change to the database from the back end. So we'll change this back to labor union. I'll just get rid of the word altogether. So we're up to date in that. In this case, we'll fetch and refresh snapshots exactly like we did before. And we can see that it's gone. But in this case, we're going to follow the do nothing hyperlink.
And you can see we're up to date. So what's the difference here? The difference is that when we follow a hyperlink, we don't actually submit any of the data that's in the form. We don't update those bindings. And so we see the update that's been broadcast out to us in the editing context. So the other thing I want to demonstrate is a very similar behavior, but with regards to to-many relationships. So let's clear everything out again.
And this time I'm going to make a change to an EO, but I'm going to make the change directly to one of the relationships. So I've now made a change to this role right here. And in this particular case, why don't we start by doing a fetch? So I'll do a fetch. And you can see that we don't see the change to the relationship, which is probably what you'd expect. And if we go to the console, you can see that even though we don't see that change, we've actually gone out and hit the database again. So let's do the same thing, but let's this time do a fetch movies and refresh the snapshot. Well, what do we have in this case? We again go out and hit the database. We again pull back those three rows. But we again haven't seen that update. OK. Why don't we try invalidating that movie?
So we invalidate that movie. And again, we don't see that change. Let's look at what's going on behind the scenes. You can see that we do a fetch against the movie. So we pull back that particular movie, the one that we've invalidated. We've also refaulted its to-many relationship. So we actually pull back those new roles right here. But you're still not seeing it. So the point is that even when you invalidate, you're not necessarily getting updates to a to-many relationship. So this time, let's invalidate all.
So we do an invalidate all. You can see that we've actually gotten that update to that object. But I mean, we really have a flurry of database activity, right? We've hit the database once for every single movie, whereas before we were pulling back three movies at a time. And that's because we're essentially iterating over an array with these movies in them. And rather than fetching, we're simply pulling all those movies back. And then we're also fetching back each of the relationships, so the to-manys and the to-ones. But in this particular case, when we pull back the to-many, we can see that it's updated.
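The difference between submitting the form and following a hyperlink can be sketched with a toy Java class. EditedObject and its methods are hypothetical and greatly simplified: the point is only that a form submission pushes the page's stale field value back into the object, which then looks like a user edit, while a hyperlink pushes nothing.

```java
// Toy model of one object in a session's editing context.
class EditedObject {
    String overview;
    boolean dirty = false;

    EditedObject(String overview) { this.overview = overview; }

    // A peer's broadcast updates a clean object in place
    // (merging into a dirty object is left out of this sketch).
    void receiveBroadcast(String newOverview) {
        if (!dirty) {
            overview = newOverview;
        }
    }

    // Submitting a form pushes the field value back through its binding.
    void submitForm(String fieldValue) {
        if (!fieldValue.equals(overview)) {
            overview = fieldValue;   // the stale page value looks like a user edit
            dirty = true;
        }
    }

    // Following a hyperlink carries no form data, so nothing is pushed.
    void followHyperlink() { }
}

class StaleFormDemo {
    public static void main(String[] args) {
        String renderedInBrowser = "labor union history";    // value on the user's page

        EditedObject viaForm = new EditedObject("labor union history");
        viaForm.receiveBroadcast("labor union movie");        // another session refreshed
        viaForm.submitForm(renderedInBrowser);                // e.g. a fetch-and-refresh button
        System.out.println(viaForm.overview + " dirty=" + viaForm.dirty);

        EditedObject viaLink = new EditedObject("labor union history");
        viaLink.receiveBroadcast("labor union movie");
        viaLink.followHyperlink();                            // the no-op link
        System.out.println(viaLink.overview + " dirty=" + viaLink.dirty);
    }
}
```

In the form case the object ends up holding the old value and marked dirty, so a later save writes the old value back to the database, which is the overwrite shown in the demo.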
So the other thing I wanted to show you was the difference in behavior between when you update within a single instance versus when you update within shared instances. So let's make a change. Actually, I think we're invalidated, but just to be sure, let's clear out the entire cache. Let's make a change.
So I have overview. I don't know if you noticed in the model, but I have overview designated as a locking attribute. So you would expect that if two different users make changes to an overview behind the scenes, that you should lock on that attribute. So when they're within the same editing context-- or sorry, when they have the same shared stack, so you have editing contexts that share a stack and we commit saves, we're just going to see that they can overwrite each other.
So let's take a look at that in the console. So we have two sets of updates right there. And we haven't detected any conflicts, because the snapshots are always in sync with the database. So even though those particular attributes are marked for locking, we're able to update. But let's simulate the exact same behavior across separate application instances. So we'll start over. Pull the plug. Make a change behind the scenes.
Attempt to make a change. So this is, like I said, analogous to a user in another application instance committing a change to the database. And we'll also commit a change in this editing context and attempt to save it. And what you can see is that we've got an optimistic locking failure occurring. So the exact same behavior could result in the user seeing either an update to the data or a locking failure, depending on which application instance they end up in.
So there's actually a couple other things we could demo, but I think I'd rather just jump right to questions, and then as people have questions, maybe we'll demo that behavior in there. So I'd like to bring Steve Miner and Eric Noyau up to the stage. They're both part of the WebObjects engineering team, and open it up for questions. Can I just slide, please? First of all, a big hand for our presenters. Thank you very much.