WWDC00 • Session 410

WebObjects: Optimization

Tools • 56:16

This session provides details on the WebObjects application server architecture with a focus on maximizing performance in a production environment. We cover common pitfalls and solutions, optimizations, sanity checks, and other issues related to scaling an application from the developer's desktop into a multiserver/multiuser environment. Discussions of memory management, resource usage analysis, and effective stress testing are also included.

Speakers: Alex Cone, Bill Bumgarner

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper; it may have transcription errors.

Good morning, everyone. Please take a seat. We're about ready to get started. If you are in the overflow room, we have plenty of room still here in the main hall. Probably the only time that's true today, so come and take advantage of it. So glad to see you all here this morning. How many of you made it to the community BOF last night? You have a good time? Get to meet some people, learn a few things? Excellent. So as you can see, we got tired of having Bill preaching at us from the aisles. We decided to just bring him up on stage and make it a lot easier on all of us. So we're pleased to have Alex Cone and Bill Bumgarner here from CodeFab. These people between them probably have a good fraction of the WebObjects experience on the planet. And so we're really pleased to have them here to talk about how to tune the last drop of performance out of your applications. Alex and Bill.

Well, thank you all for getting up at the crack of dawn and getting down here. I know next time we'll make sure that this session is later in the day, so everyone can sleep in in the morning. That's the first trick to optimizing is make sure you get a good night's sleep.

Okay, so, you know, basically, you know, this is about the good stuff. You know, you've got an opportunity to do a killer high-performance application. This is when somebody wants to do the big high transaction rate e-store that's going to make a bazillion dollars. And, you know, this thing has actually got to work. It's got to work well. It's got to work well with a lot of people doing a lot of transactions through this thing. And, you know, in truth, you have to do a little bit of thinking and planning before you can get something that works exactly the way you want it to, to be ultra-high performance and ultra-reliable, ultra-scalable, and so forth. And we're going to try and give you some good points on how to make a WebObjects app that can take all that punishment and make it look easy. And we want to help you guys build an application like that and make it look easy. Okay. So, this is what you're gonna learn: how to do leaner, meaner, faster, nastier WebObjects applications that will kick everyone's butt. Most importantly, how to optimize them before you build them. And then what to do when the application turns out not to be quite as fast as you wanted it to be, and how to get that last ounce of performance out of it.

Okay, so what could go wrong with an application? I mean, what will stop an application from running, you know, perfectly with zillions of transactions and, you know, without blowing up? Well, you know, it can work too hard. You know, the application server is cranking away at 100% and there's no spare CPU cycles and so the response time just starts to slow down because, you know, you're just competing for actual cycles on the server. You know, similarly, you can run out of memory and the machine is working too hard getting stuff in and out of swap space. You can be bound by the network. Literally, you just have trouble getting enough packets in and out of the machine. Or you can be bound by the actual code you've written. You're doing too much per request-response loop and it just takes too long to get those responses out. And of course, a common one is that the database is working too hard and it takes too long for your database calls to come back because it's doing too much.

The first three can be fixed by spending more money. CPU bound, we'll just buy bigger computers. Memory bound, stick those things full of RAM. Network bound, buy a bigger pipe. In general, when you're building a big site like this, it's important to say, "You have to have enough computer power to do this kind of thing." It's not going to run on some Linux box in the corner.

You're going to talk big, you know, Solaris machines with lots of CPUs in them. Similarly, you're going to talk about gigabytes of RAM. The basic idea is to make sure that, you know, you want to get as close as possible to not swapping at all. Similarly, you want to buy CPUs until you've got a lot of idle time left on your CPUs. Because, you know, you get that Slashdot effect going on.

You get mentioned in the popular press. And suddenly, you know, you get ten times the people hitting your site as you used to. And, you know, when it's running at sort of normal rate, you want to have, you know, the CPU usage down where you've got some extra space. But, you know, the last two problems require optimization and that's what we're here to talk about. You want to start from here?

Sure. OK, so one of the things with optimization is that you don't want to optimize something before it works.

However, there are some things you can do before you start coding that will lead to an application that is both relatively optimal, or at least passable, as well as something that can be optimized. One of the philosophies we work by is to make it work, then make it work right, and then make it fast.

And it's very important to make it fast last. A big mistake people make is trying to optimize something before it exists. In that, what we're going to talk about is design. What you can do in the design and in the initial coding process to really lead to an application or a server solution that can be scaled and managed and extended. Within that, understand usage patterns. You want to optimize the most used areas first. You really want to focus your design around the areas that you think the users are going to use. If you're talking about a large site and you have the budget to do it, do some use testing. Do some focus testing where you take your potential user audience and run them through the designs and get their feedback. There's companies out there that do this. The other thing is make your entry page fast. One of the things you'll see on the web, especially when you get nailed by the Slashdot effect, you get touted in AOL or something like that, is that you'll get hundreds and hundreds of users that will hit your entry page and go no deeper. Kind of depressing, but it's the reality. To that end, one of the things that Dave Newman mentioned in his security thing yesterday, which was very interesting: if you're talking about a site, well, not even a site where there's logins, but in general, if you're talking about a site where you don't need to create a user session when the user hits the entry page, you can avoid a tremendous amount of overhead by not creating an individual user session when the user hits that entry page. That can be a great boon to performance.

Plan your business logic around response generation. One of the things we commonly find ourselves doing is we're building back end applications like entry tools, etc. And while those are more complex, what you really want to do is design your business logic around how the front, high-traffic piece is going to be used.

You want to avoid repeating expensive calculations. Use caching. Just avoid the expensive ones altogether. Use less precision. Don't provide as much information. Or provide user interfaces where if the user wants the expensive information, they have to drill down a little bit. They have to ask for it. Retain and reuse data. Know when it is out of date. That's a huge issue.

There's a caching session, an EOF caching session. I recommend everyone go to that. And manage your cache data carefully. This is a huge issue as well. Think carefully about how often you really need to refresh that data and how you're going to go about doing that. One common mistake is invalidating your cache simultaneously across all your applications so every single app hits the database at the same time. Bad idea.
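
One way to avoid every instance refreshing at the same moment is to give each instance its own random offset on top of a shared expiry time. The sketch below is plain Java, not WebObjects or EOF API; `JitteredCache`, its loader, and its injectable clock are hypothetical names made up for illustration.

```java
import java.util.Random;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Caches one value and reloads it after a base TTL plus a random per-reload offset. */
final class JitteredCache<T> {
    private final Supplier<T> loader;   // hypothetical stand-in for a database fetch
    private final LongSupplier clock;   // current time in milliseconds; injectable for testing
    private final long baseTtlMillis;
    private final long maxJitterMillis;
    private final Random random;
    private T value;
    private long expiresAt;
    private boolean loaded;

    JitteredCache(Supplier<T> loader, LongSupplier clock,
                  long baseTtlMillis, long maxJitterMillis, Random random) {
        this.loader = loader;
        this.clock = clock;
        this.baseTtlMillis = baseTtlMillis;
        this.maxJitterMillis = maxJitterMillis;
        this.random = random;
    }

    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now >= expiresAt) {
            value = loader.get();
            loaded = true;
            // Each instance draws its own offset, so instances drift apart
            // instead of all reloading from the database at the same time.
            long jitter = (long) (random.nextDouble() * maxJitterMillis);
            expiresAt = now + baseTtlMillis + jitter;
        }
        return value;
    }
}
```

Each application instance would construct its cache with a differently seeded `Random`, so the reload times spread out across the jitter window.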

You also want to minimize your memory footprint. By doing that, you can run more instances, which gives you more opportunity for scaling by spreading your traffic out across instances. Share data across your sessions. What that means is when the application starts up, you pre-cache information. You want to clean up thoroughly and you want to clear transient instance variables when no longer needed. Now what that means is that if you're doing Java programming, just because you have a collector, don't be a lazy programmer. When you're done with something, set the pointers to null. Not pointers, set the variables to null.

Not only will that make your code cleaner, it also means that you're going to avoid issues like using a variable that you really didn't mean to use anymore. It also means that when the garbage collector does run, it has less of an object graph to traverse. Use stateless components.

A stateless component is a component that literally has no state within it. It's cached by the application, not by the session. Use shared sessions if appropriate. If you're talking about a new site, maybe you don't need user sessions at all. And set the session timeout value to something appropriate. You don't want these things sitting on your server forever.

You want to plan your data access, your queries, caching, cache updating, and understand your data latency. You really want to try for zero data requests per response. Now obviously you can't do that. I mean, you're never going to achieve zero, or else your app's never going to display anything. But you really want to minimize those, because if you can minimize the trips to the database, you can really improve app response time. It also leads to better scaling, if you have fewer applications talking to the database simultaneously as you hit that huge flood of traffic. If you're doing your caching and you're sharing that data across sessions, you can increase your scalability because as your traffic peaks, you don't have sudden bursts of activity against your database. Use in-memory searches where possible. Obviously, any time you can avoid traffic across the network in your server environment, you're going to get a huge boost in efficiency. You want to manage your faulting and manage your caching. Again, this is just about making sure your data is up to date. At the same time, making sure you're not expending huge amounts of processor time, network bandwidth, et cetera, updating these caches. And use the shared editing context for reference data. There's a shared editing context functionality that was new in 4.5. This allows you to relatively easily share data across sessions. I mean, if you've gotta go to the database and read the upcoming calendar events, every session doesn't need a copy of that.

And you can use time outside your request-response loop for housekeeping, or you can manage very carefully the time you spend doing housekeeping inside the request-response loop. For example, you can load reference data at application startup. Instead of forcing the first person to hit the site to refresh your caches or fill your caches, use the application-will-finish-launching notification to pre-fill those caches.

You can use timers or perform after delay to do database access or to do cache invalidation, cache updating, etc. You got to be a little careful with that because of the way WebObjects works and the way threading works. You can end up with a thread issue. But there's discussions of that on the Omni group mailing list which everyone should be subscribed to. I'm going to try to keep this high level. Serialize and lock request handling. That's very important.
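
As a rough illustration of keeping request handling and background cache updates from stepping on each other, here is a minimal sketch using a read-write lock from the standard Java library. It is an analogy only, not how WebObjects itself serializes requests; `GuardedReferenceCache` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Reference data read by request threads and replaced by a timer or background thread. */
final class GuardedReferenceCache {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, String> data = new HashMap<>();

    /** Called while handling a request: many readers may hold the read lock at once. */
    String lookup(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called from the updater: readers wait only for the brief swap, never see a half-built map. */
    void replaceAll(Map<String, String> fresh) {
        Map<String, String> copy = new HashMap<>(fresh); // build outside the lock
        lock.writeLock().lock();
        try {
            data = copy;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Building the new map before taking the write lock keeps the time that request threads are blocked as short as possible.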

And this is when you get into really advanced WebObjects programming and you start doing things like multi-threaded cache updates, cache invalidation, or the timers. And this is kind of a warning; we have scars. You want to make sure that you're locking your request handling when you're in situations where your caches are being updated, because you could find yourself in multi-threaded situations that can lead to some serious data destruction. And these are again things that have been discussed on the Omni Group list and we'll not really go into them.

You want to partition your functionality into multiple applications. One of the temptations with WebObjects, with the ease with which you can add functionality to applications, is to make a monolithic application that just does everything.

And part of it there is that it's very convenient. If you have a single session for the user and that session contains everything in the world, then it's very efficient. Well, yes it is, but at the same time it also greatly limits your scalability. If you talk about spreading the user session across multiple applications such as the user say comes into your site and browses, that's a different application than say drilling down into a product or doing searches. What this means is that you have much greater opportunities for scalability. If you need to, if the search tool proves to be a bottleneck, you can just run more of them versus having to run more of your monolithic application.

Move more expensive operations from live site to data entry, that's just about building administration tools. Think about when you're building larger sites, building a front end application that the user sees, it's all optimized and oriented to performance. And building a back end application, which is what the administrators see, which is optimized to flexibility and power in manipulating the business. And by doing that, you can move your expensive operations into the back end. Yeah, I mean if you're doing something like, you know, storing images off on the server and you keep track of information like how big the images are and things like that. You got to do other sorts of processing that's necessary to prepare the user interface for the person visiting your site to see. Move those calculations to like the data entry time. When I upload the image, let's calculate what the height and width of the image is and store that information in the database rather than grabbing that at run time. There's a variety of things like that. If you can compose images by compositing and put them, store them when you enter the data rather than when the user comes to view the data, you can save time. You know, if you construct cached HTML pages when you're doing the data entry and, you know, thus have sort of information predigested ready for the site to work because, you know, you've got a small number of people using administrative application relatively infrequently and you can move functionality from sort of presentation time to data entry time, you know, that will significantly increase the speed of your presentation time tool which is actually the thing that your speed is all about. Nobody really cares if the admin tool is fast or not. I mean yes, they'll complain a little bit if it takes too long to save something but the real throughput that you're looking to optimize is the thing that the customers see. 
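
The image example can be sketched like this: decode the image once when the administrator uploads it, keep the width and height alongside it, and let the customer-facing page use only the stored numbers. This is a minimal sketch in plain Java using `javax.imageio`; `StoredImage` and `ImageIntake` are hypothetical names, and a real application would persist the values in the database rather than in an object.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** What gets stored at data entry time: the bytes plus the precomputed dimensions. */
final class StoredImage {
    final byte[] bytes;
    final int width;
    final int height;

    StoredImage(byte[] bytes, int width, int height) {
        this.bytes = bytes;
        this.width = width;
        this.height = height;
    }
}

final class ImageIntake {
    /** Run once, in the admin tool, when the image is uploaded. */
    static StoredImage prepare(byte[] uploaded) {
        try {
            BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(uploaded));
            if (decoded == null) {
                throw new IllegalArgumentException("unsupported image format");
            }
            return new StoredImage(uploaded, decoded.getWidth(), decoded.getHeight());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Run on every page view: no decoding, just the stored numbers. */
    static String imgTag(String src, StoredImage image) {
        return "<img src=\"" + src + "\" width=\"" + image.width
                + "\" height=\"" + image.height + "\">";
    }
}
```

The expensive decode happens once per upload instead of once per page view.
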
And one of the real benefits of the WebObjects environment is the ease with which you can create modules and you can assemble these modules and put them all together. And one thing you can do is create a different view of your data for the front end application versus the back end. The front end application is primarily read only, which generally they are. I mean, if you're in a store, it's not like the customer can edit the price.

So if that's the case, then the front end application doesn't need to have the business logic or the expense of supporting that. So you can create an EO model or say two EO models, one that has a very simple view of the data that's optimized for speed and a second EO model that's used by the entry tool application or the administration application suite that is optimized to the functionality and the power required by the business managers. And all of this can be leveraged through frameworks. Generally, what we're finding is that our applications end up being extremely thin. There'll be almost no code in the applications themselves. All they do is load a bunch of frameworks. Everything is in the frameworks. And by doing that, you can reuse those frameworks across as many applications as you need to realize the site.

Next we have: minimize use of frames in the user interface. It's just an optimization. I mean, clearly, if frames are necessary, frames are necessary. But frames can cause a lot of issues. They can cause a lot of extra traffic against your site. As well, when you're doing dynamic applications where you have to update content across multiple frames, you'll find situations where you end up having to reload the whole page, which means you get one hit to load the frame set, one hit to load each frame, and that can be very, very expensive. It can also lead to a lot of bugs, because when those hits for the frames come in, you don't know what order the browser is going to load the frames in. It's going to load both of them simultaneously. Whichever one gets there first is going to be the first one to load. And if the user hits the stop button, then okay, one frame loaded, the other one didn't. How do you know? You don't. So that can just be a lot of confusion there. Use direct actions wherever you can. Direct actions are wonderful. Not only do they allow for bookmarkable sections within your application, they can also be very, very fast because they don't go through the full request-response handling.

They don't have to do form processing or things like that. You can certainly use form values with the direct action request handler, but you don't have to. And beware of mixing Java and Objective-C. Yes, certainly the environment does support fully mixing these things in just about any way you want to. There are a couple of little subtle limitations you can run into. However, there are some serious performance issues with going across that bridge between Java and Objective-C and it should be avoided. It's also very difficult to debug.

Okay, so I'm going to turn it back over to Alex here.

Okay. So, okay, so those are some good pointers just sort of up front. You know, think about when you're structuring your application, you're taking apart your problem and figuring out how am I going to go about building a solution. You know, we organize things well. We do some good planning about database access. And we do some good planning about caching of our data in order to, you know, minimize our round trips to the database. And we've thought through our framework design and everything else. And the app is done and it's up and running. But okay, you know, it's a bit of a pig. Or, you know, let's assume maybe the opposite situation: you've inherited a pig that somebody else has built, and now it's time to figure out a way to make that pig fly. So, you know, all the planning up front is all well and good, but as we all know, no good plan survives contact with, you know, the enemy or reality or your customers and what have you. There's just a limit to how much you can get right planning up front because you get halfway through this thing and the design changes remarkably or your client calls you up and says, look, our business model has changed. So it doesn't always end up working out by the time you have the app written that it's exactly the way you thought when you started out.

So it's a pig. It's a little too slow. It sucks more memory. You're like, damn, that's a big application instance size, isn't it? 50 megabytes and nobody's even started a session yet. You go off and do this request and it's like, it's on my machine and that's two seconds before it even started loading in my browser. You're testing this out with 100 users and the CPUs on your multiprocessor Sun are pegged and it's like, wow. All right, now what do you do? Okay. First of all, don't be silly. All right, you know, this seems like trivial advice here but this is actually good.

We've had some situations where we've had clients who got stuck with, you know, they had WOCachingEnabled turned off, you know, for debugging purposes and, you know, they managed to get into production with the big site on and caching was turned off, so every time somebody loaded any WebObjects component, it reloaded it from disk. In fact, the particular client I'm thinking of, you know, had actually coded into the code for the application to turn this off explicitly. In eight places. In eight places.

And so, you know, everybody kept saying, "Ah, I found it," you know, and they'd take this line out and it would still be, still suck. All right. But, yeah. Yeah. You know, this sort of thing doesn't really show up when you're doing the desktop development. The pages load fast enough, the app's right there anyway.

But, you know, once everybody starts hitting this, this uses up a lot of resources on the server, reading all the stuff off disk. Make sure WODebuggingEnabled is off. All right, so, you know, you go into Monitor, you add the application in, and it's running in Monitor and what have you. Well, Monitor doesn't automatically turn off WODebuggingEnabled. You have to actually go in there and edit the command line arguments and say, you know, we don't need all those debugging messages spewing to the log file while the application is running in production. So turn that off. And of course, the corollary is when you're actually doing logging, use debugWithFormat as opposed to, say, logWithFormat, which doesn't get turned off when you turn WODebuggingEnabled off. And in general, you can achieve some speed pickup by having your action methods that return to the same page return this.context().page() as opposed to returning nil. What that basically does is it short circuits the action processing within WebObjects. Basically what happens is whenever WebObjects invokes the invokeActionForRequest method, that's the thing that says, "Okay, which button did the user click on? Oh, it was this button. Okay, I got to do something."

Well, when you got to do something, if you return nil, WebObjects doesn't know that you actually did anything, so it's going to keep searching for whatever user interface element was clicked on. By returning something, in this case this.context().page(), you simply reload the page you're already on, which does exactly the same thing as returning nil. But what it does do is it gives WebObjects a signal: oh, you can stop searching for whatever the user clicked on or whatever the user did. And depending on how complex your page is and how many nested components and what have you, this can save significant amounts of CPU processing. Okay. So you got to start cleaning this stuff up.

Where do we start? Well, we got to start with the most frequently used bits. And this is the classic thing about all optimization: you know, you can have this one page that totally sucks, but nobody goes there. So don't bother worrying about that one for now. Start off with the stuff that they do usually. You know, you got a store, they're doing a certain amount of browsing, you know, they got the whole checkout process and so on and so forth. Handle that. You know, if you're dealing with the lost-your-password, you know, section of the site and trying to optimize that, you're optimizing the wrong thing. No, people don't spend their time there and they don't complain that it takes too long to fill out the little form to send me an email to give me my lost password. So log your user activity, know what they're actually doing. Use the WOStatisticsStore logging. This is a great thing, it's only gotten better in 4.5, and it gives you a tremendous amount of information on which pages the users are using and how long they're taking and what the average response time is. You look right down and say, oh look, this one's got an average response time of 10 seconds. It's like, okay, sure, sometimes it gets by with half a second, but we've got 35 seconds here and there. That's a good indication of where to go to start cleaning stuff up. Capture your direct action activity.

The direct action information is not fully captured by default. Most of the WOStatisticsStore logging deals with component actions. It'll also keep track of your direct actions and what's happening, but you can code your direct actions in such a way that you're always going to some direct action with the same action method and then it's got some other arguments to tell it what to do. In the statistics store it'll show that, okay, the default direct action had these 50,000 hits on it, but it doesn't really tell you that much if you go to the same method and then you branch based on other conditions. So put in some logging stuff so you can tell which direct actions are doing the most work. And then tune the most visited areas first. This is generally where your butt gets bit the most.
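
The extra logging for direct actions that branch on their arguments could be as simple as a counter keyed by the action name plus whatever argument selects the branch. A minimal sketch in plain Java follows; `ActionTally` and the `actionName/operation` key format are assumptions for illustration and are not part of WOStatisticsStore.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Counts hits and total time per (direct action, branch) pair. */
final class ActionTally {
    private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, LongAdder> nanos = new ConcurrentHashMap<>();

    private static String key(String actionName, String operation) {
        return actionName + "/" + operation;
    }

    /** Call at the end of the action method, once the branch taken is known. */
    void record(String actionName, String operation, long elapsedNanos) {
        String k = key(actionName, operation);
        hits.computeIfAbsent(k, x -> new LongAdder()).increment();
        nanos.computeIfAbsent(k, x -> new LongAdder()).add(elapsedNanos);
    }

    long hits(String actionName, String operation) {
        LongAdder n = hits.get(key(actionName, operation));
        return n == null ? 0 : n.sum();
    }

    long averageNanos(String actionName, String operation) {
        long count = hits(actionName, operation);
        if (count == 0) {
            return 0;
        }
        return nanos.get(key(actionName, operation)).sum() / count;
    }

    /** Hit counts sorted by key, suitable for dumping to a log. */
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, LongAdder> e : hits.entrySet()) {
            out.put(e.getKey(), e.getValue().sum());
        }
        return out;
    }
}
```

With this in place, 50,000 hits on one default action break down into how many were, say, product views versus searches, and how long each kind took on average.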

By and large, WebObjects applications are big database applications. And, you know, after you've cleared up the fact that you were running this on too small a computer or, you know, you cheaped out when it came to putting RAM in the machine or what have you, generally what it comes down to is the fact that you're bottlenecked on talking to the database.

So you need to start out by making sure that the app doesn't do amazingly stupid things with the database. So a common thing is go in there on that search page and if nobody fills in anything on any of the fields and hits search, don't go off and search and return all records. Say, "Oh, you got to put in at least one qualifier," something like that. That's a tremendous help because the big search and the big return result, it's going to take time to do the search, move the data across the wire, instantiate all those objects. You're just going to look at the first page and then say, "Oh, well, that's way too much information," and type something in anyway. So, you know, a little bit of sort of smart modifying the user's behavior can go a long way. All right. Use fetch limits.

This, you know, simplifies a bunch of things. I mean, net-net, you're mostly doing a bunch of the same work as doing, you know, a large query, because the database, you know, has to process the database request in the first place, but you can choke off returning back tens of thousands of records by putting in a fetch limit. I mean, nobody wants to look at, you know, more than 100 items on a return anyway, except in very rare circumstances. And if you put a fetch limit on there and bring back the first 100 or the first 20 records and then make sure that the user wants to see more, you know, you can limit the amount of data moved across the wire, the amount of objects you instantiated, the size of your cache, and so on and so forth. It's often useful to cache search results. This is kind of an interesting thing. Say the search is for all, you know, blue t-shirts for men that are medium or larger or what have you. You get one, you go down and look at that t-shirt and say, "I didn't really want that," and then you go back to the search page. It turns out to be very nice if you go back to the search page and the results of your previous search are there. And then they can go down to the second one in the list and go down there. This may involve, you know, you having to write code to keep that search around and keep the search results around on a per session basis, as opposed to when you go back to the search page, you clear everything out and they got to do a search again. Because one of the most expensive operations generally on your site is doing these sort of big database searches. That's when the user is expecting to have a long response time because you're connecting to the database and returning a bunch of stuff. And if you can just sort of minimize that, that's generally, you know, in the last half dozen apps I've done, that's turned out to be the page that had the worst performance: the big unlimited database search page.

And so just by having that thing come back, be there automatically when they come back, drastically reduces the number of searches an individual user will do. Last but not least on this particular subject, if you have a small enough set of objects, you're doing a store but you have 100 products or 200 products, maybe it makes sense to have a read-only set of product data in memory that you initialize when the app starts up and you do the searches against this cache using in-memory searches and don't bother going to the database. If you've got a CD store with 400,000 records in it, no, maybe it doesn't make sense to bring all of that into memory and do searches. But if they're doing searches on relatively small things, definitely use in-memory searches. They're fast and all sorts of precious resources, bandwidth going over to the database server, the database server's resources, et cetera, suddenly are disappeared from the equation.
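
The search advice above (refuse an empty search, cap the number of results, and search a small read-only set in memory) can be sketched together in a few lines. This is plain Java rather than EOF; `Product` and `ProductCatalog` are hypothetical names, and the list is assumed to have been loaded once at application startup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class Product {
    final String name;
    final String size;

    Product(String name, String size) {
        this.name = name;
        this.size = size;
    }
}

/** A small, read-only product list held in memory and shared by all sessions. */
final class ProductCatalog {
    private final List<Product> products;

    ProductCatalog(List<Product> loadedAtStartup) {
        this.products = Collections.unmodifiableList(new ArrayList<>(loadedAtStartup));
    }

    /** Searches by name fragment without touching the database. */
    List<Product> search(String nameFragment, int limit) {
        // Refuse the "nothing filled in" search instead of returning everything.
        if (nameFragment == null || nameFragment.trim().isEmpty()) {
            throw new IllegalArgumentException("at least one search term is required");
        }
        String needle = nameFragment.trim().toLowerCase(Locale.ROOT);
        List<Product> matches = new ArrayList<>();
        for (Product p : products) {
            if (matches.size() >= limit) {
                break; // the in-memory equivalent of a fetch limit
            }
            if (p.name.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(p);
            }
        }
        return matches;
    }
}
```

A session could hold on to the returned list so that going back to the search page shows the previous results without searching again.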

When you've got to do some fetching, let's optimize this a bit. It seems obvious, but it's definitely a good thing. You've got pop-ups, you've got reference data, you've got stuff that is constant across everybody's stuff. Fetch it at the application level in a shared editing context and keep it there.

It's really easy to start coding up using the default editing context for a session and start doing stuff there. You can end up with copies of data in every session in every editing context and you just don't need to do that. Use the session's editing context only, and I mean only, for data that the session's user will actually edit. If he's not actually changing the values in it, you can share the data with everybody else. You can have a session-specific list of things I'm interested in or what have you, but that doesn't mean you need a session-specific copy of the actual data. And that's just a general good rule of thumb. Is the user going to edit this piece of data in his session? No? Then we don't need it in the session editing context.

Okay, you get to the stage where you want to avoid doing fetches in order to draw the pages that the user is looking at. A good idea is to cache and share data that's used to draw the pages that the users are looking at and try and keep that cache data up to date. You end up with a situation like, we've done some financial sites where people are putting in bids and offers and doing trades and such, so that session A, you know, app instance A is going to put some data into the database that everybody else needs to see. You know, you need to find a good way to make sure that everybody's information is up to date in a timely fashion. You know, there's some really neat stuff: you can do inter-application messaging so that the individual applications don't have to fetch from the database every time. There was some good work that Dave Newman posted originally on doing snapshot updating. We've done a bunch of stuff to modify that. But that just avoids you having to go to the database to get the cache, you know, to update your data. You can also use the time between request-response loops. As mentioned up in the design section, you know, when nobody is actually requesting something, go in and do a fetch and update the cache data. That's a little less efficient than, you know, notifying the various app instances that the data has changed, but net-net, when it comes time to handle a request for a response, you know you've got up-to-date data in your application, and you don't have to go to the database.

And of course, if you've got to get some sort of non-object-based data out of the database, go ahead and use the raw row stuff. It's quite fast and doesn't involve instantiating objects. Don't try and get around the whole object mechanism using this stuff, but if you want to know if there have been any changes to the database in this particular time frame, you can use raw rows for certain specialized stuff like that and it's quite fast. Alright, the thing that really bites you in the butt is, you know, you have this picture of what the application is doing. And you think it's being very efficient because you optimized the design before you wrote the whole thing. And it's still slow and the database is still cranking away. So, obviously, you're doing fetching where you didn't expect to do fetching. So, EOAdapterDebugEnabled is your friend. Turn this on and you'll see all the SQL that's being generated. You go to this page and you think there's no SQL, no queries involved in this page. And you go hit this page, and it's like query, query, query, query, query, query, query, query, query, query, query, process, process, and you're like, "Where's this all coming from?" And it's very easy to discover that in your WOD file you're referencing object.relationship.relationship.value and your smart caching, where you preloaded the data up front, didn't use any prefetching or anything else like that. So all the stuff on the other ends of these relationships hasn't been fetched yet. And so you go to visit the page and you've got some binding here and that forces several fetches in order to get the data to answer the binding. Now especially bad is when you're just testing to see whether or not we should show this component. You know, does this object have one of these things on the end of this relationship?
And so you fault in the relationship only to find out no, it doesn't have anything and you're not going to display anything anyway. Be very careful about what you bind to and how you answer some of these questions. This actually raises an interesting point about WebObjects in general. I know virtually nothing about databases. I'm lost when I hit a relational database. You give me a raw SQL window, I don't know what to do. But with EOModeler, even I can set up a really complex database, generate it, use it, and do very useful things. Of course, I can't make it go fast. I mean, the power of these tools can be intoxicating. It can lead you into some trouble. And these things here, using EOAdapterDebugEnabled, looking at the database plans, things like that, are critical because it's very likely you're going to have someone on the project like me who can make the thing work at the object level and is going to realize an application that's just going to be a pig in production at the database level. So, yeah, there you are cleaning up after the pig. But, yeah, one of the good things you can do to avoid excess faulting goes back to when we were talking before about having separate data models for data entry and data display. You can have instances of, you know, the product that you're going to display on the screen, or the article that you're going to display on this article page, that you've tuned for the runtime application, and you do things like flatten relationships in. And so, say you're testing to see if the article has a picture, because then we need to put it in the picture component or what have you. If it's been flattened in, we can check the value without causing faulting.
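To make the "query, query, query" effect concrete, here is a small sketch that uses Python's `sqlite3` module as a stand-in for EOF and the adapter debug log. The schema, the function names, and the use of `set_trace_callback` as the "debug switch" are all assumptions for illustration; the join in `page_flattened` only approximates what a flattened attribute would give you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, image_id INTEGER);
    INSERT INTO image VALUES (1, '/i/a.gif');
    INSERT INTO article VALUES (1, 'First', 1), (2, 'Second', NULL),
                               (3, 'Third', NULL);
""")

statements = []
conn.set_trace_callback(statements.append)   # record every SQL statement run


def page_with_faults():
    """Fetch the articles, then follow the relationship once per article."""
    rows = conn.execute(
        "SELECT id, title, image_id FROM article ORDER BY id").fetchall()
    result = []
    for _id, title, image_id in rows:
        img = conn.execute(
            "SELECT path FROM image WHERE id = ?", (image_id,)).fetchone()
        result.append((title, img[0] if img else None))
    return result


def page_flattened():
    """Bring the image path in with the article row, in one round trip."""
    return conn.execute(
        "SELECT a.title, i.path FROM article a "
        "LEFT JOIN image i ON i.id = a.image_id ORDER BY a.id").fetchall()


statements.clear()
slow = page_with_faults()
faulting_queries = len(statements)

statements.clear()
fast = page_flattened()
flattened_queries = len(statements)

print(faulting_queries, "queries vs", flattened_queries)
```

Both functions produce the same page data; the difference only shows up when you watch the SQL log, which is the point of turning the debug output on.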

If you have this thing as a separate relationship, article.image, then in order to say if article.image is not equal to null, you have to fire a fault. So you can optimize your fetching behavior by tuning the EO model and flattening relationships in for presentation. One of the most common mistakes that we've had to clean up, and I'm sure none of you would do this, you're all very good: you're building the components up one at a time, and you think, "Oh, this component, I'm going to need this pop-up list of all the states in the country." And so in your init method, you write a little thing in there that does a fetch of all the states, all objects, so that you can populate the pop-up. Because you're just writing this one component at this moment, you're not thinking about it.

And six other developers on the project, for six other pages that have a list of all the states, also write the same thing. And so every time these components are initialized, they go off to the database and fetch, and components come out of the cache and they are recreated and they fetch, and components are cached in different sessions and each one in its init is fetching this list of 50 states. You only need one copy. It's not like the states change all that often. You go through and you clean all this stuff up and you move this stuff off to the application, in the shared editing context. Everybody who's got pop-ups or browsers or things like that, valid regions that we ship to and so on and so forth, can get this common reference data out of one place and not try and do this stuff in each component. If need be, fetch all the objects you need and then you can use filtering to produce the stuff that you need for each individual page. The other common mistake that involves excess fetching is, okay, you've got this thing in a shared editing context and you accidentally cause stuff to be fetched into the session's editing context by not managing which objects are in which editing context. If you must, when you've got to the point where the user's going to edit some object, use localInstanceOfObject to get local copies of the object without doing faulting.

Basically, without going to the database to get this. Basically, it's creating a new instance of the snapshot data in a particular editing context and doesn't require a round trip to the database. You carefully manage when you move things across the boundary between the shared editing context and the session's editing context. Follow this stuff all around. Have a policy. These objects are all here. We'll only have this object when we do this or whatever.
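The "one copy of the reference data" rule can be sketched like this. It is a plain-Python illustration under stated assumptions: `Application`, `ShippingPopup`, and `fake_fetch_states` are hypothetical names standing in for the application object, a page component, and the real database fetch.

```python
class Application:
    """Stands in for the application object that every session shares."""

    def __init__(self, fetch_states):
        self._fetch_states = fetch_states
        self._states = None
        self.fetches = 0

    def states(self):
        # Fetch the reference data once; every later caller shares the result.
        if self._states is None:
            self._states = tuple(self._fetch_states())
            self.fetches += 1
        return self._states


def fake_fetch_states():
    """Hypothetical stand-in for a fetch of all state objects."""
    return [
        {"code": "CA", "name": "California", "ships": True},
        {"code": "HI", "name": "Hawaii", "ships": False},
        {"code": "NY", "name": "New York", "ships": True},
    ]


class ShippingPopup:
    """A component that filters the shared list instead of fetching its own."""

    def __init__(self, app):
        self.choices = [s["name"] for s in app.states() if s["ships"]]


app = Application(fake_fetch_states)
popups = [ShippingPopup(app) for _ in range(6)]   # six pages, six components
print(app.fetches, popups[0].choices)
```

Each component still gets the subset it needs, but the filtering happens in memory against the shared list rather than through another round trip.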

Stick to it and that'll soup things up a bunch. Optimize your EO models. Again, there's a tendency to go batshit on your EO model or come up with the perfect normalized, abstracted EO model with everything as an object and so on and so forth. We had one client who had a table for gender objects with a row for male and a row for female so that they could have all the people who'd signed up for their site have a reference to either the male object or the female object, you know, it's like, oh please, you know, use flags, you know, simplify some of this stuff down. It may not make everything an object, but, you know, they were faulting these things in all over the place. And it was like, no. Okay, the other cool thing is EOF and inheritance, it's cool. You can do just amazing things with this. And I know I've been given the, you know, EOF inheritance abuse award a few times. You know, think seriously about, you know, how much of the inheritance stuff you absolutely need to have in your model. You know, if worst comes to worst, you can do a complex hierarchy for your editorial tool and simplify it for your application. But there are a lot of cases in which having a complex inheritance hierarchy, especially when you're doing deep fetches, which is, you know, I've got 15 types of users and I want to select all users who haven't been here since last week and I've got to do a fetch against each one of the 15 tables. Even when you're doing something like single table inheritance, it's going to do fetch against table A where flag equals one, fetch against table A where flag equals two. Each round trip is expensive. What you're trying to minimize is not the data that's pulled across, but the actual number of round trips to the data server. The complexity of your inheritance hierarchy, especially when you're using deep fetches, can cause a lot of round trips to the data server.

This again brings up another point where a person like me can get you in a lot of trouble. Because I think in objects. I look at a bunch of users and I think a big inheritance hierarchy, yeah, that makes total sense. But EOF provides a brilliant object-oriented interface to a relational database. And a relational database doesn't do inheritance well. Object-oriented databases do, but there are other issues there. So keep the object model simple, not because the object model being simple is great, but because it's going to make database that much faster. And you can overload these tables, too. I mean, you can have complex objects that you use for editing and then slap over on top of the same table a simplified object, maybe with flattened attributes and what have you, that you use for presentation. Maybe we're doing on this page some simple piece of information processing. And we can create a new user entity that spans the important shared part of all the other user entities. And we'll just do a query against that. It doesn't give us the whole complex hierarchy, but it gives us enough information to answer the questions that we need to do, and it doesn't have an inheritance hierarchy at all. There's tricks you can play like that that will simplify things. Again, think about what you're going to use these things for. Use batch faulting where appropriate. You can basically, what this does is you sit there and you set batch faulting in your EO to say when you're going to fetch this object, why don't you fetch the next 10 because we might need them. Basically what you're doing is you're pre-populating the cache that's stored by EOF, the snapshot dictionary of your objects. But then you need to make sure that you're using that appropriately.

If you've got to-one relationships in the same editing context and the object on the other side of the to-one relationship is already in your cache, it'll go find that without going to the database. But if you've got a to-many relationship, it's going to have to go to the database anyway. It can't tell that it's got all the children in there for the parent, because even though you know and I know that all three children have already been brought into the snapshot dictionary, it doesn't really have anything that can tell it that the list is complete. So it has to go to the database even if the result is that it could have satisfied the to-many relationship out of the cache. Use prefetching. This has transmogrified from the earlier days to the current days, from hints to actual directives. You can just say, when you're going to populate this object, populate these things on the relationship. This is useful for when you're building up your cache, to make sure that later, when people start using the objects and following the relationships, the objects on the other end of the relationship are already there.
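The idea behind batch faulting, resolving the object you were asked for plus some of its not-yet-fetched neighbors in a single round trip, can be sketched as below. This is not how EOF implements it; `BatchFaulter` and its methods are hypothetical, and a dictionary stands in for the database.

```python
class BatchFaulter:
    """Resolves objects by key, fetching up to batch_size keys per round trip."""

    def __init__(self, fetch_many, batch_size):
        self._fetch_many = fetch_many    # callable: list of keys -> {key: row}
        self._batch_size = batch_size
        self._snapshots = {}             # key -> fetched row (the cache)
        self._pending = []               # keys known about but not yet fetched
        self.round_trips = 0

    def register(self, key):
        """Note that an object with this key exists but has not been fetched."""
        if key not in self._snapshots and key not in self._pending:
            self._pending.append(key)

    def resolve(self, key):
        if key not in self._snapshots:
            # Fetch the requested key plus the next pending ones in one trip.
            others = [k for k in self._pending if k != key]
            batch = [key] + others[: self._batch_size - 1]
            self._snapshots.update(self._fetch_many(batch))
            self._pending = [k for k in self._pending if k not in batch]
            self.round_trips += 1
        return self._snapshots[key]


table = {n: f"product-{n}" for n in range(1, 26)}      # pretend database


def fetch_many(keys):
    return {k: table[k] for k in keys}


faulter = BatchFaulter(fetch_many, batch_size=10)
for key in table:
    faulter.register(key)

names = [faulter.resolve(key) for key in table]         # walk all 25 objects
print(faulter.round_trips)
```

With a batch size of 10, walking 25 objects costs three round trips instead of 25; whether that pays off depends on actually using the neighbors that were pulled in.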

Beware of excess complexity in your model in general. You end up with extra pointers to various objects in there that can then cause further fetching activity or, in certain cases, excess back pointers can prevent prefetching from working the way it's supposed to. So once you've set up all this prefetching, you've got to actually watch the stuff with EOAdapterDebugEnabled to make sure that the right objects are being fetched when you expect them to be. All right, so EOF does all this great stuff for you. It'll build your tables, it'll build your database, so on and so forth, really nice stuff. What it doesn't do for you out of the box is create indexes. You've gone off and you've created all these objects that have unique primary keys. You're doing all this fetching based on unique primary keys. Create indexes on those things. Also look at your queries and see what you're doing. People are doing these sort of queries where they've got fields they can type in values and do a search. What are they searching on? Create indexes on those values. You can speed up your database activity tremendously by properly indexing things. If you're not quite sure how the database is using stuff, this is a great thing. Everybody who's not a database geek doesn't know about this, but this is a database propeller head thing for sure. In Sybase it's showplan, and in Oracle it's EXPLAIN PLAN. You turn this on and run your query, and it says, "Well, you know, I was going to check in this table and that table, and then I was going to gather this information here, and then I was going to process it and do that stuff there." And it tells you exactly how it's going to go about giving you back the three rows of data you would actually get from your complex query. One of the useful things this will tell you is, "And then since you asked the question in just this way, I decided not to use your index and to do a table scan instead to get the results out."
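As a small, runnable illustration of asking the database for its plan, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. That is a stand-in chosen because it runs anywhere; Sybase's showplan and Oracle's EXPLAIN PLAN have their own syntax and output. The `product` table and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [(f"item-{n}", n * 1.5) for n in range(1000)])

QUERY = "SELECT * FROM product WHERE name = ?"   # a typical search-field query


def plan(sql, args):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column is the detail


before = plan(QUERY, ("item-42",))               # no index on name yet
conn.execute("CREATE INDEX product_name_idx ON product (name)")
after = plan(QUERY, ("item-42",))                # same query, index available

print("before:", before)
print("after: ", after)
```

The first plan reports a scan of the whole table and the second reports a search using the new index; the exact wording varies between SQLite versions.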
By doing explain plan, fiddling with your indexes and what have you, you can actually make sure the database is doing what you want it to do, not what it thinks it has to do. I've got plenty of time. Yeah, we can go over. It was a two hour session, right? Okay, no problem. No, we're getting close to the end. The other good tuning thing is to check that the database is running exactly the way it's supposed to. It should put most of the information that you're going to access on a regular basis into its memory cache. You can check the database statistics to find out whether it's doing that or going to the disk every time for your data, and you can tune that. Also, just more silliness, and you may have to get somebody who's a database whiz to come in and do some tuning in the operating system: if you've got a multi-processor machine, oftentimes you have to actually tell the database to use all the processors. Databases often have a bunch of parameters about how much memory they use, how much data they put in there, how many stored procedures they put into memory. Tune that appropriately, because otherwise you're going to have this big piece of iron that's basically sitting there idle because the database is trying to run in a little tiny slice of memory on one processor, and it doesn't do anyone any good. Speaking from experience, databases run really, really slow when they're tuned for, say, 512 megs of RAM, but you only have 256.

And it turns out not to work quite as well as you'd like. And last but not least, actually look at the generated SQL. It'll suggest additional indexes. You shouldn't ever need to do hand-optimized SQL and put that into the EO model. It's definitely a last resort. But once in a blue moon, based on the way you've constructed your object model and what have you, EOF may construct SQL that is less than completely optimal. And there may occasionally be special purposes where you need stored procedures. A stored procedure is basically compiled SQL; it runs on the server faster than on-the-fly SQL, and it can be useful in certain circumstances. Now that we've gotten out of the scary database part, I'll give this back to Bill. Okay, once you get the database going fast, because that is generally where most of the bottlenecks are, you need to start looking at optimizing your application itself and optimizing your components. One of the first things is there's a great temptation to componentize everything, make everything a reusable component. That's actually a significant performance hit. Do it carefully. Simplify your component nesting. You know, don't make every image in the nav bar an individual component. Make the nav bar itself a component, things like that. Define your own compiled subclass of WOComponent and put your common functionality there. What this does is simplify your overall component hierarchy. What we always do is have a subclass of WOComponent, and every single component, be it Java, Objective-C, WebScript, doesn't matter, inherits from that specific subclass. By doing that, not only do we gain the benefits of sharing all this functionality across the component hierarchy, we can also push some debugging information into there, some little debugging triggers, do some logging things. It's a great place, when you start to get into debugging and performance optimization, to be able to put breakpoints and print information, etc.
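The common-superclass idea can be sketched in a few lines. This is a language-neutral illustration written in Python, not the WOComponent API: `BaseComponent`, `render`, and `build_html` are hypothetical names, and the timing log is just one example of the debugging hooks such a class can carry.

```python
import time


class BaseComponent:
    """Hypothetical common superclass that every page component inherits from."""

    log = []                                   # shared debugging/timing log

    def render(self):
        # Single choke point: a good place for breakpoints, logging, timing.
        start = time.perf_counter()
        html = self.build_html()
        elapsed = time.perf_counter() - start
        BaseComponent.log.append((type(self).__name__, elapsed))
        return html

    def build_html(self):
        raise NotImplementedError


class NavBar(BaseComponent):
    """The whole nav bar is one component; each link is not its own component."""

    links = ["Home", "Artists", "Albums"]

    def build_html(self):
        return "<p>" + " | ".join(self.links) + "</p>"


html = NavBar().render()
print(html, BaseComponent.log[0][0])
```

Because every component goes through the same `render` method, per-component timings come for free once you start hunting for bottlenecks.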
You should also consider caching pages or using the new stateless components. Any page where you're not displaying information specific to the user, or even if you are to a limited degree, there's no reason to not use a stateless component. Stateless components are great. They're cached at the application, not in the session. The other thing is caching your pages. Again, say you're talking about a page where you're selecting a region for some store application or something.

Well regions in the United States aren't going to change that often. So cache that information. And finally, make static content static. And this is one thing that a lot of people miss. There's a great temptation to serve everything from WebObjects or everything from the dynamic content generator. If you use static content, you get an order of magnitude performance improvement.

Static content comes straight off the disk, goes straight out the web server. There's no state associated. It's blazingly fast because it's exactly what the web was designed to do. I mean, in effect, all these middleware things we're doing, all this WebObjects stuff, is making the web do something that it was never designed to do. And there's a big performance penalty for making something do what it was not designed to do.

Refactor your software. Once you get the thing built, once it's working right, and you've found where your bottlenecks are, start to, you know, compile anything that does serious calculations. Look at optimizing your calculation engines. Look at generalizing that and moving it out of the application layer and into the backend layer, and really treat it like a serious calculation engine that you want to maximize the performance of, and then use it from the upper layers. Simplify your application and session objects. This isn't really about optimization as much as facilitating optimization.

What you want to do is if you have, say, something that does region management, going back to the store thing where you've got multiple regions, at the application level, and there's some complex, say, product selection or product availability on regions. As an example, we did a record store.

Record stores, there's certain records you can't sell in certain countries in the world. So we have a region manager. We push that region manager into an object of its own that you can access through the application. By doing that, it moves that functionality out of the application level. And it means that as we're optimizing that, we're optimizing other things, we're modifying something that's relatively isolated.

And finally, don't forget about the web server. Is the web server optimized for the environment? A classic example of this is-- OK, so you're running against Apache. You've got that WebObjects adapter in there. You've tuned your application out to the nth degree. Oops, you're only running five Apache servers. I did that once. Tune it. Make sure it has the appropriate configuration for the amount of load you expect to have. Use a mixture of your static and dynamic content. Wherever you can use static content, again, that's just going to boost performance. Direct actions allow you to integrate the static content with the dynamic content. If you enable-- OK, in WebObjects, one of the great things about HTTP, since it's totally stateless, is that when you have a user having a user experience with your site, you have to pass around a user ID, a session identifier. Well, normally by default, that session identifier goes back and forth in every single URL. So every time a hyperlink is generated in the dynamic content, that hyperlink has to have this big, long, nasty number that identifies that user such that when the user clicks on that, WebObjects can figure out what session to associate that hit with. Well, if you move that to the cookie, you know, cookies have their own problems but pretty much they're supported everywhere now. By doing that, pretty much all the URLs in your content no longer have to have user specific information in them. This allows you to integrate your dynamic content and your static content. So for example, again using a record store example, we may have static pages that describe albums, static pages that describe artists.

Well, those don't change very often. Leave them on disk as static. Let the WebObjects application navigate over to them, and have direct actions in those pages that bring the users back into the WebObjects application. The more hits you can get against the static stuff, the better off you are. I'm going to move quickly so we can do some Q&A. Okay. Well, we actually--do we have 10 minutes after to do Q&A? Fifteen. Fifteen, great. Thank you. Optimize for fast browser display. This is another little war story here. We had a client, and the content generation, the content delivery, was really, really slow. And this was back in the days when tables didn't really quite work right and you couldn't really specify image sizes quite right. And to do layout, you had a spacer.gif everywhere. I mean, if anyone's been on the web for more than two years, you probably remember this. Well, the path to the spacer.gif was like /WebObjects/SomeApp.woa/WebServerResources/images/spacer.gif. And all we did is we put spacer.gif as s.gif in the root level of the web server and reduced the amount of HTML generated by about 40% across the entire site.
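The arithmetic behind the spacer story is easy to reproduce. The sketch below builds a made-up layout row with 200 spacer images and compares the page size with a long and a short image path. The page structure and the count of 200 are assumptions, so the percentage it prints describes this toy page, not the 40% figure from the client site.

```python
LONG_PATH = "/WebObjects/SomeApp.woa/WebServerResources/images/spacer.gif"
SHORT_PATH = "/s.gif"


def layout_page(src, spacers):
    """Build a hypothetical layout row that repeats one spacer image."""
    cell = f'<td><img src="{src}" width="1" height="1"></td>'
    return "<table><tr>" + cell * spacers + "</tr></table>"


long_page = layout_page(LONG_PATH, 200)
short_page = layout_page(SHORT_PATH, 200)

saved = len(long_page) - len(short_page)
percent = 100 * saved / len(long_page)
print(f"{len(long_page)} -> {len(short_page)} characters "
      f"({percent:.0f}% smaller)")
```

The saving per image is just the difference in path length, so it scales with how often the same resource is referenced on a page.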

I was just like, "Oh." Smaller pages display faster. Less HTML you generate, the faster it goes out. The less dynamic content you're generating, the faster it goes out. You want to batch your displays along sets of data. Not only does the user not want to see 3,000 products all at once, this makes things go faster.

Show them 10 at a time. Show them 15 at a time. Generate short URLs. This again gets back to the spacer.gif thing. Instead of /images, use /i. You can also do better with images and just everything surrounding the static resources that are associated with every website. Split installs in WebObjects are very, very convenient. They're very useful. We never do them. And it's not because they don't work or anything like that. We never do them because we put all of our static resources as close to the top level of the web server as we can. We leave them there and it reduces the amount of HTML we generate.
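Batching a display comes down to slicing the result list per page. A minimal sketch, with hypothetical helper names `batch` and `page_count` and a made-up product list:

```python
def batch(items, size, page_number):
    """Return the items for one page; pages are numbered from 1."""
    start = (page_number - 1) * size
    return items[start:start + size]


def page_count(total, size):
    """Number of pages needed to show `total` items, `size` per page."""
    return (total + size - 1) // size


products = [f"product-{n}" for n in range(1, 3001)]

pages = page_count(len(products), 10)
first_page = batch(products, 10, 1)
last_page = batch(products, 10, pages)
print(pages, first_page[0], last_page[-1])
```

Only the current slice is turned into HTML, so the page stays small no matter how many products match.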

You want to also improve the structure of your HTML. Now, this isn't so much optimizing for speed, but it is optimization, as in optimizing towards a working application, not a fast one. Use an HTML code checker, such as WebLint. Everyone should be on the WebObjects mailing list at OmniGroup, www.omnigroup.com.

If you're ever planning on doing any development with WebObjects or you're even interested, immediately sign up for that list. And the reason why I mention that as well is because we're going to be throwing a bunch of code out there next week when we get a chance to go back. And one of them is this thing called WebLint. What it does is it looks through your WebObjects HTML or your generated HTML, checks the structure of it, makes sure everything lines up.

Simplify your table structures. It's tempting to nest tables deeper and deeper and deeper, especially when you have an object hierarchy or a component hierarchy. You want to reuse all those components. Every component needs to guarantee that it displays correctly so it has its own little table. That's really slow. It's really, really slow in Netscape. It's just very slow in Internet Explorer.

And watch for nesting problems, especially things like nested forms. If you open a tag, don't close the tag until you've closed every other tag inside of it, and always make sure you close the tag. In the HTML standard that seemed to have come out of the early browser implementations, closing a table cell, closing a table row, even closing a table, closing forms is pretty much optional. That doesn't work when you're talking about dynamic content generation. And it's going to break things. Also, when it gets back to actual performance, one of the risks there, especially if you have forms that are mis-structured, is that you can get incomplete data back to your application or you can get broken data. You can get a performance hit as your application goes, "Oops, exception," and has to go and deal with maintenance stuff associated with an error condition or an exceptional state.
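A structural check of this kind can be sketched with a stack of open tags. The example below uses Python's standard `html.parser` module; it is a minimal illustration of the nesting rule, not how WebLint itself works, and `NestingChecker`, `check`, and the list of void tags are assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial list for this sketch).
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records structural problems."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "form" and "form" in self.stack:
            self.problems.append("nested <form>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()                   # properly nested close
        elif tag in self.stack:
            self.problems.append(
                f"</{tag}> closes before <{self.stack[-1]}> is closed")
            while self.stack.pop() != tag:     # discard the overlapped tags
                pass
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def check(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in checker.stack]
    return checker.problems + unclosed


good = "<form><table><tr><td>x</td></tr></table></form>"
overlap = "<form><table><tr><td>x</td></tr></form></table>"
print(check(good))
print(check(overlap))
```

Running the same check over the fully generated page, and not just each component's template, is what catches problems that span component boundaries.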

The classic one is the overlap problem. I can't tell you how many times we've worked with HTML producers or we have been doing the HTML ourselves and just a simple HTML overlap where you open a form, you open the table, then you close the form and close the table. There's a problem there and this can cause some serious problems. The forms don't work. The processing must be broken.

The code must be broken. It's like, no, it's actually in the HTML. And one of the things to keep in mind is that, especially when you're a developer, you focus on a single component. I'm doing this component. Well, sometimes problems can span across multiple components. And what we like to do is use WebLint on the components themselves as well as on the entire generated content. And for more information on this as well, please sign up to the OmniGroup mailing list, and there will be a lot more information coming out. After WWDC, there are always discussions on the mailing list, follow-ups from the sessions, etc. I'm sure Dave Newman, who's making a bunch of code available, will post information there as well. All right. Well, we hope this stuff was, you know, was a good start.

Just a little bit, okay. Hope this was a good start at optimizing your applications. We've got some time for a whole bunch of Q&A. Now, the usual who to contact. And let's do a little question and answer.

First of all, a big hand for our presenters.