WebObjects: Optimization - WWDC 2000

Tools • 56:16

This session provides details on the WebObjects application server architecture with a focus on maximizing performance in a production environment. We cover common pitfalls and solutions, optimizations, sanity checks, and other issues related to scaling an application from the developer's desktop into a multiserver/multiuser environment. Discussions of memory management, resource usage analysis, and effective stress testing are also included.

Speakers: Alex Cone, Bill Bumgarner

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Good morning, everyone. Please take a seat. We're about ready to get started. If you are in the overflow room, we have plenty of room still here in the main hall. Probably the only time that's true today, so come and take advantage of it. So glad to see you all here this morning. How many of you made it to the community BOF last night?

You have a good time? Get to meet some people, learn a few things? Excellent. So as you can see, we got tired of having Bill preaching at us from the aisles. We decided to just bring him up on stage and make it a lot easier on all of us.

So we're pleased to have Alex Cone and Bill Bumgarner here from CodeFab. Between them, these people probably have a good fraction of the WebObjects experience on the planet. And so we're really pleased to have them here to talk about how to tune the last drop of performance out of your applications. Alex and Bill.

Thank you all for getting up at the crack of dawn and getting down here. I know next time we'll make sure that this session is later in the day so everyone can sleep in in the morning. That's the first trick to optimizing is make sure you get a good night's sleep.

Okay, so, you know, basically, you know, this is about the good stuff. You know, you've got an opportunity to do a killer high-performance application. This is when somebody wants to do the big high transaction rate e-store that's going to make a bazillion dollars. And, you know, this thing has actually got to work.

It's got to work well. It's got to work well with a lot of people doing a lot of transactions through this thing. And, you know, and in truth, you know, you have to do a little bit of thinking and planning before you can get something that works exactly the way you want it to, to be ultra high performance and ultra reliable, ultra scalable, and so forth.

And we're going to try and give you some good points on how to make a WebObjects app that can take all that punishment and make it look easy. And we want to help you guys build an application like that and make it look easy. Okay. So, this is what you're going to learn.

How to do leaner, meaner, faster, nastier WebObjects applications that will kick everyone's butt. Most importantly, you know, how to optimize them before you build them. And then, you know, what to do when the application turns out not to be quite as fast as you wanted it to be. And how to get that last ounce of performance out of it.

[Transcript missing]

You want to start from here? Sure. Okay, so one of the things with optimization is that you don't want to optimize something before it works. However, there are some things you can do before you start coding that will lead to an application that is relatively optimal, or at least passable, as well as something that can be optimized.

One of the philosophies we work by is to make it work, then make it work right, and then make it fast. And it's very important to make it fast last. A big mistake people make is trying to optimize something before it exists. Given that, what we're going to talk about is design: what you can do in the design and in the initial coding process to really lead to an application or a server solution that can be scaled and managed and extended. Within that, understand usage patterns. You want to optimize the most used areas first.

You really want to focus your design around the areas that you think the users are going to use. If you're talking about a large site and you have the budget to do it, do some use testing. Do some focus testing where you take some of your potential user audience and run them through the designs and get their feedback.

There are companies out there that do this. The other thing is make your entry page fast. One of the things you'll see on the web, especially when you get nailed by the Slashdot effect, or you get touted on AOL or something like that, is that you'll get hundreds and hundreds of users that will hit your entry page and go no deeper.

Kind of depressing but it's the reality. To that end, one of the things that Dave Newman mentioned in his security thing yesterday which was very interesting. If you're talking about a site--well, not even a site where there's logins but in general, if you're talking about a site where you don't need to create a user session when the user hits the entry page, you can avoid a tremendous amount of overhead by not creating an individual user session when the user hits that entry page. That can be a great boon to performance.

Plan your business logic around response generation. One of the things we commonly find ourselves doing is building back-end applications like entry tools, etc. And while those are more complex, what you really want to do is design your business logic around how the front end is going to be used.

You want to avoid repeating expensive calculations. Use caching. Just avoid the expensive ones altogether. Use less precision. Don't provide as much information. Or provide user interfaces where if the user wants the expensive information, they have to drill down a little bit. They have to ask for it. Retain and reuse data. Know when it is out of date. That's a huge issue. There's a caching session, EOF caching session.

I recommend everyone go to that. And manage your cached data carefully. This is a huge issue as well. Think carefully about how often you really need to refresh that data and how you're going to go about doing that. One common mistake is invalidating your cache simultaneously across all your applications so every single app hits the database at the same time. Bad idea.
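
The two caching points above (know when data is out of date, and don't let every instance refresh at the same moment) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not WebObjects API: the class `StaggeredCache`, the instance-number scheme, and the injected clock are all hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical single-value cache whose expiry is staggered per app instance. */
class StaggeredCache<T> {
    private final Supplier<T> loader;   // the expensive fetch or calculation
    private final LongSupplier clock;   // current time in milliseconds
    private final long ttlMillis;       // longest time a loaded value is served
    private final long offsetMillis;    // this instance's slot inside the TTL window
    private T value;
    private boolean loaded;
    private long expiresAt;

    StaggeredCache(Supplier<T> loader, LongSupplier clock, long ttlMillis,
                   int instanceNumber, int instanceCount) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
        // With four instances, each expiry lands a quarter of the TTL after the previous one.
        this.offsetMillis = ttlMillis * instanceNumber / instanceCount;
    }

    synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now >= expiresAt) {
            value = loader.get();
            loaded = true;
            // Next boundary on this instance's own schedule, strictly after 'now'.
            expiresAt = (Math.floorDiv(now - offsetMillis, ttlMillis) + 1) * ttlMillis
                    + offsetMillis;
        }
        return value;
    }
}

class StaggeredCacheDemo {
    public static void main(String[] args) {
        // Instance 2 of 4, refreshing at most once a minute.
        StaggeredCache<String> headlines = new StaggeredCache<>(
                () -> "headlines loaded at " + System.currentTimeMillis(),
                System::currentTimeMillis, 60_000L, 2, 4);
        System.out.println(headlines.get());
        System.out.println(headlines.get()); // served from the cache, same text
    }
}
```

Because each instance reloads on its own phase of the schedule, a burst of traffic after an expiry sends one instance to the database at a time rather than all of them at once.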

You also want to minimize your memory footprint. By doing that, you can run more instances, which gives you more opportunity for scaling by spreading your traffic out across instances. Share data across your sessions. What that means is when the application starts up, you pre-cache information. You want to clean up thoroughly, and you want to clear transient instance variables when no longer needed. What that means is that if you're doing Java programming, just because you have a garbage collector, don't be a lazy programmer. When you're done with something, set the pointers to null. Not pointers, set the variables to null.

Not only will that make your code cleaner, it also means that you're going to avoid issues like using a variable that you really didn't mean to use anymore. It also means that when the garbage collector does run, it has less of an object graph to traverse. Use stateless components.

A stateless component is a component that literally has no state within it. It's cached by the application, not by the session. Use shared sessions if appropriate. If you're talking about a news site, maybe you don't need user sessions at all. And set the session timeout value to something appropriate. You don't want these things sitting on your server forever.

You want to plan your data access: your queries, caching, cache updating. And understand your data latency. You really want to try for zero data requests per response. Now, obviously, you can't do that. You're never going to achieve zero or else your app's never going to display anything. But you really want to minimize those, because if you can minimize the trips to the database, you can really improve app response time.

It also leads to better scaling. If you have fewer applications talking to the database simultaneously as you hit that huge flood of traffic, if you're doing your caching and you're sharing that data across sessions, you can increase your scalability. Because as your traffic peaks, you don't have sudden bursts of activity against your database. Use in-memory searches where possible. Obviously, any time you can avoid traffic across the network in your server environment, you're going to get a huge boost in efficiency.

You want to manage your faulting and manage your caching. Again, this is just about making sure your data is up to date. At the same time, making sure you're not expending huge amounts of processor time, network bandwidth, et cetera, updating these caches. And use the shared editing context for reference data. There's a shared editing context functionality that was new in 4.5. This allows you to relatively easily share data across sessions. If you've got to go to the database and read the upcoming calendar events, every session doesn't need a copy of that.

You can use time outside your request-response loop for housekeeping, or you can manage the time with which you're doing the housekeeping in the request-response loop very carefully. For example, you can load reference data at the application startup. Instead of forcing the first person to hit the site to refresh your caches or fill your caches, use the application "will finish launching" notification to pre-fill those caches.

You can use timers or perform-after-delay to do database access or to do cache invalidation, cache updating, etc. You've got to be a little careful with that because of the way WebObjects works and the way threading works. You can end up with a thread issue. But there are discussions of that on the Omni Group mailing list, which everyone should be subscribed to.

I'm going to try to keep this high level. Serialize and lock request handling. That's very important. This is when you get into really advanced WebObjects programming and you start doing things like multi-threaded cache updates, cache invalidation, or the timers. What you want to do is make sure, and this is kind of a warning, we have scars.

You want to make sure that you're locking your request handling when you're in situations where your caches are being updated, because you could find yourself in multi-threaded situations that can lead to some serious data destruction. And these are, again, things that have been discussed on the Omni Group list and we'll not really go into them. You want to partition your functionality into multiple applications.
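
The locking warning above can be illustrated with ordinary Java concurrency tools. This is a minimal sketch, assuming a cache of prices keyed by a SKU string; `LockedPriceCache` and its methods are hypothetical names, and it shows a general read-write lock rather than WebObjects' own request-handling locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical shared cache that a timer thread refreshes while requests read it. */
class LockedPriceCache {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, Integer> pricesInCents = new HashMap<>();

    /** Called by request-handling threads; many readers may hold the lock at once. */
    Integer priceFor(String sku) {
        lock.readLock().lock();
        try {
            return pricesInCents.get(sku);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called by a timer or background thread; readers wait only during the swap. */
    void replaceAll(Map<String, Integer> fresh) {
        Map<String, Integer> copy = new HashMap<>(fresh); // build the copy outside the lock
        lock.writeLock().lock();
        try {
            pricesInCents = copy;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class LockedPriceCacheDemo {
    public static void main(String[] args) {
        LockedPriceCache cache = new LockedPriceCache();
        Map<String, Integer> fresh = new HashMap<>();
        fresh.put("sku-1", 1999);
        cache.replaceAll(fresh);
        System.out.println(cache.priceFor("sku-1"));
    }
}
```

The design choice here is to do the slow work (building the new map) before taking the write lock, so request threads are held up only for the pointer swap and never see a half-updated cache.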

One of the temptations with WebObjects, given the ease with which you can add functionality to applications, is to make a monolithic application that just does everything. And part of it there is that it's very convenient. If you have a single session for the user and that session contains everything in the world, then it's very efficient.

Well, yes it is, but at the same time it also greatly limits your scalability. Think about spreading the user session across multiple applications: the user, say, comes into your site and browses, and that's a different application than, say, drilling down into a product or doing searches. What this means is that you have much greater opportunities for scalability. If the search tool proves to be a bottleneck, you can just run more instances of it versus having to run more of your monolithic application.

Move more expensive operations from the live site to data entry. That's just about building administration tools. When you're building larger sites, think about building a front-end application that the user sees, all optimized and oriented to performance, and a back-end application, which is what the administrators see, optimized for flexibility and power in manipulating the business. By doing that, you can move your expensive operations into the back end.

If you're doing something like storing images off on the server and you keep track of information like how big the images are and things like that, or you've got to do other sorts of processing that's necessary to prepare the user interface for the person visiting your site to see, move those calculations to the data entry time. When I upload the image, let's calculate what the height and width of the image is and store that information in the database rather than grabbing that at run time. There's a variety of things like that.
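
The image example above might look like the following in Java. It's a sketch under assumptions: `StoredImage`, `fromUpload`, and `imgTag` are hypothetical names, the database row is reduced to three fields, and `samplePng` only exists to fabricate an upload for the demo.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical row for an uploaded image: the bytes plus dimensions measured once. */
class StoredImage {
    final byte[] data;
    final int width;
    final int height;

    private StoredImage(byte[] data, int width, int height) {
        this.data = data;
        this.width = width;
        this.height = height;
    }

    /** Runs in the admin tool at upload time; the only place the image is decoded. */
    static StoredImage fromUpload(byte[] uploaded) {
        try {
            BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(uploaded));
            if (decoded == null) {
                throw new IllegalArgumentException("not a readable image");
            }
            return new StoredImage(uploaded, decoded.getWidth(), decoded.getHeight());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs in the live site; uses only the stored numbers, no decoding. */
    String imgTag(String url) {
        return "<img src=\"" + url + "\" width=\"" + width + "\" height=\"" + height + "\">";
    }
}

class StoredImageDemo {
    /** Builds a blank PNG in memory to stand in for an administrator's upload. */
    static byte[] samplePng(int width, int height) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB),
                    "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StoredImage image = StoredImage.fromUpload(samplePng(40, 30));
        System.out.println(image.imgTag("/images/product.png"));
    }
}
```

The expensive step (decoding the image) happens once, when a handful of administrators upload, and every page view afterwards just reads two integers.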

If you can compose it in a way that's more efficient, it's going to be a lot easier to do. If you compose images by compositing and store them when you enter the data rather than when the user comes to view the data, you can save time. You can construct cached HTML pages when you're doing the data entry and thus have information pre-digested, ready for the site to work. You've got a small number of people using the administrative application relatively infrequently.

You can move functionality from presentation time to data entry time that will significantly increase the speed of your presentation time tool, which is actually the thing that your speed is all about. Nobody really cares if the admin tool is fast or not. I mean, yes, they'll complain a little bit if it takes too long to save something, but the real throughput that you're looking to optimize is the thing that the customers see.

One of the real benefits of the WebObjects environment is the ease with which you can create modules and you can assemble these modules and put them all together. That's really the last three points here. One of the things you can do is create a different view of your data for the front end application versus the back end. The front end application is primarily read only, which generally they are. I mean, if you're in a store, it's not like the customer can edit the price.

So if that's the case, then the front end application doesn't need to have the business logic or the expense of supporting that. So you can create an EO model or say two EO models. One that has a very simple view of the data that's optimized for speed and a second EO model that's used by the entry tool application or the administration application suite that is optimized to the functionality and the power required by the business managers.

And all of this can be leveraged through frameworks. Generally, what we're finding is that our applications end up being extremely thin. There'll be almost no code in the applications themselves. All they do is load a bunch of frameworks. Everything is in the frameworks. And by doing that, you can reuse those frameworks across as many applications as you need to realize the site.

We have minimize use of frames in the user interface. It's just an optimization. I mean, clearly frames are sometimes necessary, but frames can cause a lot of issues, and they can cause a lot of extra traffic against your site. As well, when you're doing dynamic applications where you have to update content across multiple frames, you'll find situations where you end up having to reload the whole page, which means you get one hit to load the frame set and one hit to load each frame, and that can be very, very expensive. It can also lead to a lot of bugs, because when those hits for the frames come in, you don't know what order the browser is going to load the frames in.

And it's going to load both of them simultaneously; whichever one gets there first is going to be the first one to load. And if the user hits the stop button, then okay, one frame loaded, the other one didn't, how do you know? You don't. So that can just be a lot of confusion there. Use direct actions wherever you can. Direct actions are wonderful. Not only do they allow for bookmarkable sections within your application, they can also be very, very fast because they don't go through the full request response handling. They don't have to do form processing or things like that.

You can certainly use form values with the direct action request handler, but you don't have to. And beware of mixing Java and Objective-C. Yes, certainly the environment does support fully mixing these things in just about any way you want to. There are a couple of little subtle limitations you can run into. However, there are some serious performance issues with going across that bridge between Java and Objective-C and it should be avoided. It's also very difficult to debug.

Okay, so I'm going to turn it back over to Alex here. Okay. So those are some good pointers just sort of up front. Think about them when you're structuring your application, when you're taking apart your problem and figuring out how am I going to go about building a solution. So say we've organized things well, we've done some good planning about database access, and done some good planning about caching of our data in order to minimize our round trips to the database, and we've thought through our framework design and everything else.

And the app's done and it's up and running, but okay, you know, it's a bit of a pig. You know, let's assume maybe the opposite situation is you've inherited a pig that somebody else has built, and now it's time to figure out a way to make that pig fly.

So, you know, all the planning up front is all well and good, but as we all know, no good plan survives contact with, you know, the enemy or reality or your customers and what have you. You know, there's just a limit to how much you can get right planning up front.

You get halfway through this thing and the design changes remarkably or your client calls you up and says, "Look, our business model has changed." And, you know, so it doesn't always end up working out by the time you get the app written that it's exactly the way you thought when you started out.

So, okay, so it's a pig. It's a little too slow, you know, it sucks more memory, you know, you're like, "Damn, that's a big application instance size, isn't it? Fifty megabytes and nobody's even started a session yet." You know, you go off and do this request and it's like, "It's on my machine and that's two seconds before it even started loading the browser." You know, you're testing this out with a hundred users and the CPUs on your multiprocessor Sun are pegged and it's like, "Wow." All right, now what do you do? Okay. First of all, don't be silly. All right, you know, this seems like trivial advice here but this is actually good.

We've had some situations where we've had clients who got stuck with, you know, they had WOCachingEnabled turned off, you know, for debugging purposes. And, you know, they're like, "Oh, I'm not sure what I'm doing. I'm just going to do this." You know, they managed to get into production with a big site and caching was turned off. So every time somebody loaded any WebObjects component, it reloaded it from disk.

In fact, there's a particular client I'm thinking of, you know, they had actually coded into the code for the application to turn this off explicitly. In eight places. In eight places. And so, you know, everybody kept saying, "Ah, I found it," you know, and they'd take this line out and it would still suck. All right.

But, you know,

[Transcript missing]

When you got to do some fetching, let's optimize this a bit. It seems obvious but it's definitely a good thing. If you've got pop ups, you've got reference data, you've got stuff that is constant across everybody's stuff, fetch it at the application level in a shared editing context and keep it there. It's really easy to start coding up using the default editing context for a session and start doing stuff there. You can end up with copies of data in every session and every editing context and you just don't need to do that.

Use the session's editing context only, and I mean only, for data that the session's user will actually edit. If he's not actually changing the values in it, you can share the data with everybody else. You can have a session specific list of things I'm interested in or what have you, but it doesn't mean you need a session specific copy of the actual data. That's just a general good rule of thumb. Is the user just going to use the data that's in the session? Is the user just going to edit this piece of data in his session? No, then we don't need it in the session editing context.

Okay, you get to the stage where you want to avoid doing fetches in order to draw the pages that the user is looking at. A good idea is to cache and share data that's used to draw the pages that the users are looking at and try to keep that cached data up to date.

You end up with a situation, like we've done some financial sites where people are putting in bids and offers and doing trades and such, so that session A, app instance A, is going to put some data into the database that everybody else needs to see. You need to find a good way to make sure that everybody's information is up to date in a timely fashion. There's some really neat stuff for that: you can do inter-application messaging so that the individual applications don't have to fetch from the database every time.

There was some good work that Dave Newman posted originally on doing snapshot updating. We've done a bunch of stuff to modify that stuff, but that just avoids you having to go to the database to update your data. You can also use the time between request response loops. As we mentioned up in the design section, you can just, when nobody is actually requesting something, go in and do a fetch and update the cached data. That's a little less efficient than notifying the various app instances that the data has changed, but net-net, when it comes time to handle a request for a response, you know you've got up-to-date data in your application.

You don't have to go to the database. Of course, if you've got to get some sort of non-object based data out of the database, go ahead and use the raw row stuff. It's quite fast and doesn't involve instantiating objects. Don't try and get around the whole object mechanism using this stuff, but if you want to know whether there have been any changes to the database in a particular time frame, you know, you can use raw rows for certain specialized stuff like that and it's quite fast. All right.

The thing that really bites you in the butt is, you know, you have this picture of what the application is doing. And, you know, you think it's being very efficient because you optimized the design before you wrote the whole thing. And it's still slow and the database is still cranking away. So, you know, obviously you're doing fetching where you didn't expect to do fetching.

So EOAdaptorDebugEnabled is your friend. You know, turn this on and you'll see all the SQL that's being generated. You go to this page and you think there's no SQL, no queries involved in this page. And you go hit this page, it's like query, query, query, query, query, query, query, query, query, query, query, process, process, process. You're like, where's this all coming from? And, you know, it's very easy to discover that, you know, in your WOD file you're referencing object.relationship.relationship.value. And, you know, in your smart caching where you preloaded the data up front, you didn't use any prefetching or anything else like that.

And the data on the other end of these relationships hasn't been fetched yet. And so you go to visit the page and you've got some binding here and that forces several fetches in order to get the data to answer the binding. You know, especially bad is when you're just saying, you know, you're like testing to see whether or not we should show this component or not.

You know, does this object have, you know, one of these things on the end of this relationship? And so you fault in the relationship only to find out, no, it doesn't have anything and you're not going to display anything anyway. So be very careful about, you know, what you bind to and how you answer some of these questions.

This actually raises an interesting point about WebObjects in general. I know virtually nothing about databases. I'm lost when I hit a relational database. You give me a raw SQL window, I don't know what to do. But with EOModeler, even I can set up a really complex database, generate it, use it, and do very useful things.

Of course, I can't make it go fast. I mean, the power of these tools can be intoxicating. It can lead you to some trouble. And these things here, you know, using EOAdaptorDebugEnabled, looking at the database plans, things like that, are critical. Because it's very likely you're going to have someone on the project like me who can make the thing work at the object level and is going to realize an application that's just going to be a pig in production at the database level.

So, yeah, there you are cleaning up after the pig. But one of the good things you can do to avoid excess faulting is, when we were talking before about having separate data models for data entry and data display, you can have instances of, you know, like the product that you're going to display on the screen or whatever, the article that you're going to display on this article page that you've tuned for the runtime application, and you do things like flatten relationships in.

And so, you know, testing to see if you have a picture, and if the article has a picture, then we need to put it in the picture component or what have you. If it's been flattened in, you know, we can check the value without causing faulting. If you have this thing as a separate relation, you know, article.image, in order to say if article.image is not equal to null, you have to fire a fault. So you can optimize your fetching behavior by tuning the EO model and flattening relationships in for presentation.
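
The difference between testing a flattened attribute and testing a relationship can be sketched with a small simulation. None of this is EOF API: `LazyRelationship` stands in for a fault, `CountingImageTable` stands in for the database, and the `Article` fields are hypothetical.

```java
import java.util.function.Supplier;

/** Stand-in for the database; counts how many times we go to it. */
class CountingImageTable {
    int roundTrips;

    String imagePathFor(int articleId) {
        roundTrips++;
        // Pretend only even-numbered articles have a picture.
        return articleId % 2 == 0 ? "/images/" + articleId + ".png" : null;
    }
}

/** Stand-in for a to-one fault: nothing is fetched until somebody asks. */
class LazyRelationship<T> {
    private final Supplier<T> fetch;
    private boolean fired;
    private T value;

    LazyRelationship(Supplier<T> fetch) {
        this.fetch = fetch;
    }

    T get() {
        if (!fired) {
            value = fetch.get();
            fired = true;
        }
        return value;
    }
}

class Article {
    final int id;
    final String flattenedImagePath;      // arrived with the article's own row
    final LazyRelationship<String> image; // separate relationship, fetched on demand

    Article(int id, String flattenedImagePath, CountingImageTable db) {
        this.id = id;
        this.flattenedImagePath = flattenedImagePath;
        this.image = new LazyRelationship<>(() -> db.imagePathFor(id));
    }
}

class FlatteningDemo {
    public static void main(String[] args) {
        CountingImageTable db = new CountingImageTable();
        Article article = new Article(7, null, db);

        boolean showViaFlattened = article.flattenedImagePath != null;
        System.out.println(showViaFlattened + " after " + db.roundTrips + " round trips");

        boolean showViaRelationship = article.image.get() != null;
        System.out.println(showViaRelationship + " after " + db.roundTrips + " round trips");
    }
}
```

Both checks give the same answer (no picture for article 7), but only the second one costs a trip to the database just to decide that nothing needs to be drawn.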

One of the most common mistakes that we've had to clean up, I'm sure none of you would do this, you're all very good, is you're building the components up one at a time, and you think, oh, this component, I'm going to need this pop-up list of all the states in it, and the country.

And so, like in your init method, you write a little thing in there that does a fetch of all the states, all objects, so that you can populate the pop-up. Because, you know, you're just writing this one component at this moment, you're not thinking about it. And, you know, six other developers on the project for six other pages that have a list of all the states also write the same thing.

And so every time these components are initialized, they go off to the database and fetch this stuff in. And components come out of the cache, and then they are recreated, and they fetch in, and components are cached in different sessions. And each one in its init is fetching in this list of 50 states.

You only need one copy. It's not like the states change all that often. You know, you go through and you clean all this stuff up, and you move this stuff off to application in the shared editing context. Everybody's got a pop-up or browsers or things like that, you know, valid regions that we ship to, so on and so forth, can get this common reference data out of one place, and not try and do this stuff on each thing. If need be, you know, fetch all the units. You can fetch, you know, all the objects you need, and then you can use filtering, you know, to produce the stuff that you need for each individual page.
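
The "fetch once at the application level, then filter" idea above can be sketched as follows. This is plain Java rather than the shared editing context itself: `SharedReferenceData`, `Region`, and the supplied fetch function are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical reference row: a region we may or may not ship to. */
class Region {
    final String code;
    final String country;
    final boolean shippable;

    Region(String code, String country, boolean shippable) {
        this.code = code;
        this.country = country;
        this.shippable = shippable;
    }
}

/** Hypothetical application-level holder: one fetch, shared by every session and page. */
class SharedReferenceData {
    private final List<Region> regions;

    SharedReferenceData(Supplier<List<Region>> fetchAll) {
        // The only trip to the database, made once at application startup.
        this.regions = Collections.unmodifiableList(new ArrayList<>(fetchAll.get()));
    }

    List<Region> allRegions() {
        return regions;
    }

    /** Per-page pop-up lists are filtered in memory from the shared copy. */
    List<Region> regionsIn(String country, boolean shippableOnly) {
        List<Region> result = new ArrayList<>();
        for (Region region : regions) {
            if (region.country.equals(country) && (!shippableOnly || region.shippable)) {
                result.add(region);
            }
        }
        return result;
    }
}

class SharedReferenceDataDemo {
    public static void main(String[] args) {
        SharedReferenceData shared = new SharedReferenceData(() -> {
            List<Region> rows = new ArrayList<>();
            rows.add(new Region("NY", "US", true));
            rows.add(new Region("HI", "US", false));
            rows.add(new Region("ON", "CA", true));
            return rows;
        });
        System.out.println(shared.regionsIn("US", true).size() + " shippable US region(s)");
    }
}
```

Six components that each need a state list would all call `regionsIn` on the same shared object instead of each running its own fetch in `init`.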

The other common mistake that involves excess fetching is, okay, you've got this, you know, you've got this thing in the shared editing context, and, you know, you accidentally cause stuff to be fetched into the session's editing context by sort of not managing, you know, which objects are in which editing context.

You know, if you must, you know, if you've got to the point where the user's going to edit some object, you use localInstanceOfObject, you know, to get local copies of the object without doing faulting. I mean, basically without going to the database to get this.

You know, basically it's creating a new instance of the snapshot data in a particular editing context, and doesn't require a round trip to the database. So you carefully manage when you move things across the boundary between the shared editing context and the session's editing context. And, you know, follow this stuff all around. Have a policy.

You know, these objects are all here. We'll only have this object when we do this or whatever. And, you know, stick to it, and that'll soup things up a bunch. Okay, optimize your EO models. Again, there's a tendency to go batshit on your EO model or come up with like the perfect, you know, normalized abstracted EO model with everything as an object and so on and so forth.

I mean, you know, we had one client who had, you know, who had a table for gender objects with a row for male and a row for female so that they could have all the people who'd signed up for their site, you know, have a reference to either the male object or the female object. You know, it's like, oh, please, you know, use flags.

You know, simplify some of this stuff down. It may not make everything an object, but, you know, they were faulting these things in all over the place. And it was like, no. Okay, the other cool thing is EOF and inheritance. It's cool. You can do just amazing things with this. And I know I've been given the, you know, EOF inheritance abuse award a few times.

You know, think seriously about, you know, how much of the inheritance stuff you absolutely need to have in your model. You know, if worst comes to worst, you can do a complex hierarchy for your editorial tool and simplify it for your application. But there are a lot of cases in which having a complex inheritance hierarchy, especially when you're doing deep fetches, which is, you know, I've got 15 types of users and I want to select all users who haven't been here since last week. And I've got to do a fetch against each one of the 15 tables.

Even when you're doing something like single table inheritance, it's going to do, you know, fetch against table A where flag equals one, fetch against table A where flag equals two. Each round trip is expensive. What you're trying to minimize is not the data that's pulled across, but the actual number of round trips to the data server.

And so the complexity of your inheritance hierarchy, especially when you're using deep fetches, can cause a lot of round trips to the data server. This again brings up another point where a person like me can get you in a lot of trouble. I think in objects. You know, I look at a bunch of users and I think a big, you know, an inheritance hierarchy, yeah, that makes total sense.

But EOF provides a brilliant object-oriented interface to a relational database. And a relational database doesn't do inheritance well. The object-oriented databases do, but there are other issues there. So keep the object model simple, not because the object model being simple is great, but because it's going to make the database that much faster. And you can overload these tables too.

I mean, you can have complex objects that you use for editing and then slap over on top of the same table a simplified object, maybe with flattened attributes and what have you, that you use for presentation. You know, maybe we're doing on this page some simple piece of information processing and, you know, we can take a, you know, create a new user entity that, you know, spans the important shared part of all the other user entities and we'll just do a query against that. And it doesn't give us the whole complex hierarchy, but it gives us enough information to answer the questions that we need to do. And it's only, it doesn't have an inheritance hierarchy at all. You know, there's tricks you can play like that that will simplify things.
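
The round-trip arithmetic behind the deep-fetch warning can be made concrete with a toy counter. This is only a simulation under assumptions: `RoundTripCounter`, the table names, and the `last_visit` column are hypothetical, and no SQL is actually executed.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the database connection; every select is one round trip. */
class RoundTripCounter {
    int roundTrips;

    List<String> select(String table, String condition) {
        roundTrips++;
        return new ArrayList<>(); // rows omitted; only the trip count matters here
    }
}

class UserQueries {
    /** Deep fetch over an inheritance hierarchy: one select per concrete entity. */
    static void deepFetch(RoundTripCounter db, List<String> concreteUserTables) {
        for (String table : concreteUserTables) {
            db.select(table, "last_visit < ?");
        }
    }

    /** Simplified presentation entity over the shared columns: a single select. */
    static void flatFetch(RoundTripCounter db) {
        db.select("app_user", "last_visit < ?");
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        List<String> tables = new ArrayList<>();
        for (int i = 1; i <= 15; i++) {
            tables.add("user_type_" + i);
        }

        RoundTripCounter deep = new RoundTripCounter();
        UserQueries.deepFetch(deep, tables);

        RoundTripCounter flat = new RoundTripCounter();
        UserQueries.flatFetch(flat);

        System.out.println("deep: " + deep.roundTrips + ", flat: " + flat.roundTrips);
    }
}
```

With fifteen user types the deep fetch makes fifteen trips for the same question that the simplified entity answers in one, which is the cost the speakers are asking you to weigh against the elegance of the hierarchy.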

You know, again, think about what you're going to use these things for. Use batch faulting where appropriate. What this does is, you set batch faulting in your EOModel to say: when you're going to fetch this object, why don't you fetch the next 10, because we might need them. Basically what you're doing is pre-populating the cache that's stored by EOF, the snapshot dictionary of your objects.
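A toy model of the batch faulting idea, written in Python with sqlite3, may help. It is not EOF's implementation: the `BatchFaulter` class, the `product` table, and the `demo` function are hypothetical names, and the sketch assumes every registered key exists in the table. When one fault fires, it pulls up to `batch_size` pending keys in the same query and stores them in a dictionary standing in for the snapshot cache.

```python
import sqlite3


class BatchFaulter:
    """Toy model of batch faulting; not EOF's implementation."""

    def __init__(self, conn, batch_size=10):
        self.conn = conn
        self.batch_size = batch_size
        self.snapshots = {}   # primary key -> row, standing in for the snapshot cache
        self.pending = []     # primary keys of faults that have not been fetched yet
        self.round_trips = 0

    def register_fault(self, pk):
        if pk not in self.snapshots and pk not in self.pending:
            self.pending.append(pk)

    def fire(self, pk):
        """Return the row for pk, fetching other pending faults in the same query."""
        if pk in self.snapshots:
            return self.snapshots[pk]
        others = [p for p in self.pending if p != pk]
        batch = [pk] + others[: self.batch_size - 1]
        marks = ",".join("?" for _ in batch)
        rows = self.conn.execute(
            f"SELECT id, name FROM product WHERE id IN ({marks})", batch).fetchall()
        self.round_trips += 1
        for row in rows:
            self.snapshots[row[0]] = row
        self.pending = [p for p in self.pending if p not in batch]
        return self.snapshots[pk]


def demo(batch_size):
    """Touch 30 faults in order and report how many queries that took."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(i, f"product-{i}") for i in range(1, 31)])
    faulter = BatchFaulter(conn, batch_size)
    for pk in range(1, 31):
        faulter.register_fault(pk)
    names = [faulter.fire(pk)[1] for pk in range(1, 31)]
    return faulter.round_trips, names
```

With a batch size of 10, touching all 30 objects takes 3 queries instead of 30. The trade-off is that a batch may pull rows nobody ends up using.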

But then you need to make sure that you're using that appropriately. If you've got to-one relationships in the same editing context and the object on the other side of the to-one relationship is already in your cache, it'll go find that without faulting to the database. But if you've got a to-many relationship, it's going to have to go to the database anyway.

It can't tell that it's got all the children in there for the parent because, even though you know and I know that all three children have already been brought into the snapshot dictionary, it doesn't really have anything that can tell it the list is complete.

So it has to go to the database even if the result is that it can then satisfy the to-many relationship out of the cache. You know, use prefetching. This has transmogrified from the earlier days to the current days, from hints to actual directives.

And you can just say when you're going to populate this object, you know, populate these things on the relationship. This is useful for when you're building up your cache to make sure that later when people start using the objects and following the relationships to things, the objects on the other end of the relationship are already there.
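The effect of prefetching a to-many relationship can be sketched outside EOF as follows. This is an illustration in Python with sqlite3, not the EOF API: the `artist` and `album` tables and both function names are hypothetical. The lazy version follows the relationship one parent at a time; the prefetched version fetches the parents and then every related row in one additional query.

```python
import sqlite3
from collections import defaultdict


def make_store():
    """Five artists with three albums each (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE album (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT)")
    conn.executemany("INSERT INTO artist VALUES (?, ?)",
                     [(i, f"artist-{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO album (artist_id, title) VALUES (?, ?)",
                     [(a, f"album-{a}-{n}") for a in range(1, 6) for n in range(3)])
    return conn


def albums_lazily(conn):
    """Follow the to-many relationship one artist at a time: 1 + N queries."""
    artists = conn.execute("SELECT id FROM artist ORDER BY id").fetchall()
    queries = 1
    result = {}
    for (artist_id,) in artists:
        rows = conn.execute(
            "SELECT title FROM album WHERE artist_id = ? ORDER BY id",
            (artist_id,)).fetchall()
        queries += 1
        result[artist_id] = [title for (title,) in rows]
    return result, queries


def albums_prefetched(conn):
    """Fetch the artists, then all related albums in one extra query: 2 queries."""
    artists = conn.execute("SELECT id FROM artist ORDER BY id").fetchall()
    ids = [artist_id for (artist_id,) in artists]
    marks = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT artist_id, title FROM album WHERE artist_id IN ({marks}) ORDER BY id",
        ids).fetchall()
    grouped = defaultdict(list)
    for artist_id, title in rows:
        grouped[artist_id].append(title)
    return {artist_id: grouped[artist_id] for artist_id in ids}, 2
```

The two functions build the same dictionary. The prefetched one stays at two queries no matter how many artists come back.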

And just, you know, beware of excess complexity in your model in general. You end up with extra pointers to various objects in there that can then cause further fetching activity, or in certain cases excess back pointers can prevent prefetching from working the way it's supposed to. So once you set up all this prefetching, you've got to actually watch the stuff with adaptor debug enabled to make sure that the right objects are being fetched when you expect them to be. All right.

So EOF does all this great stuff for you. It'll build your tables. It'll build your database. So on and so forth. You know, really nice stuff. The one thing it doesn't do for you out of the box, it doesn't create indexes. All right. You've gone off and you've created all these objects that have unique primary keys.

You're doing all this fetching based on unique primary keys. Create indexes on those things. Also look at your queries and see what you're doing. People are doing these sort of queries where they've got fields they can type in values and do a search. What are they searching on? Create indexes on those values.

You can speed up your database activity tremendously by, you know, properly indexing things. If you're not quite sure how the database is using stuff, this is a great thing. Anybody who's not a database geek probably doesn't know about this; it's a database propeller-head thing, for sure. In Sybase, it's showplan. In Oracle, it's explain plan.

You turn this on and run your query, and it says, "Well, you know, I was going to check in this table and that table, and then I was going to gather this information here, and then I was going to process it and do that stuff there." And it tells you exactly how it's going to go about giving you back the three rows of data you would actually get from your complex query.

And one of the useful things this will tell you is, you know, and then since you asked the question in just this way, I decided not to use your index and to do a table scan instead to get the results out. And, you know, by sort of doing explain plan, fiddling with your indexes and what have you, you're going to actually make sure the database is doing what you want it to do, not what it thinks it has to be able to do.
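As a small stand-in for showplan or explain plan, SQLite has `EXPLAIN QUERY PLAN`, which can be driven from Python's sqlite3 module. The sketch below is an illustration under assumptions: the `customer` table, the `customer_email` index, and the helper names are made up, and the exact wording of the plan text differs between SQLite versions and from what Sybase or Oracle print.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the human-readable detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan text


def make_db(with_index):
    """Create the hypothetical customer table, optionally indexed on email."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
    if with_index:
        conn.execute("CREATE INDEX customer_email ON customer (email)")
    return conn


# A plain equality test on the indexed column.
BY_EMAIL = "SELECT * FROM customer WHERE email = ?"

# The same question asked through a function call on the column. The index on
# the raw email column cannot serve this predicate, so the table is scanned.
BY_LOWER_EMAIL = "SELECT * FROM customer WHERE lower(email) = ?"


if __name__ == "__main__":
    for label, with_index, sql in [
        ("no index", False, BY_EMAIL),
        ("indexed", True, BY_EMAIL),
        ("indexed, lower()", True, BY_LOWER_EMAIL),
    ]:
        print(label, "->", plan(make_db(with_index), sql, ("a@example.com",)))
```

Running it shows a table scan without the index, an index search once the index exists, and a scan again when the query wraps the column in `lower()`, which is the "you asked it just this way, so I skipped your index" case.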

I've got plenty of time. Yeah, we can go over. It was a two-hour session, right? Okay, no problem. No, we're getting close to the end. Another good tuning thing is to check that the database is running exactly the way it's supposed to. It should put most of the information that you're going to access on a regular basis into memory cache, and you can check the database statistics to find out: is it doing that, or is it going to the disk every time for your data? And you can tune that.

And also, just more silliness, and you may have to get somebody who's a database wiz to come in and do some, you know, tuning in the operating system. If you've got a multiprocessor machine, oftentimes you have to actually tell the database to use all the processors. Similarly, databases often have a bunch of parameters about how much memory they use, you know, how much data they put in there, how many stored procedures they put into memory.

Tune that appropriately because you're going to have this big piece of iron that's basically sitting there idle because the database is trying to run a little tiny slice of memory on one processor, and it doesn't do anyone any good. Speaking from experience, databases run really, really slow when they're tuned for, say, 512 megs of RAM, but you only have 256. So, you know, it's a little bit of a challenge.

[Transcript missing]

Refactor your software. Once you get the thing built, once it's working right, and you've found where your bottlenecks are, start to, you know, compile anything that does serious calculations. Look at optimizing your calculation engines. Look at generalizing that and moving it out of sort of the application layer and into the backend layer. And really treat it like a serious calculation engine that you want to maximize performance of and then use it from the upper layers. Simplify your application and session objects.

This isn't really about optimization as much as facilitating optimization. Say you have something that does region management, going back to the store thing where you've got multiple regions, and it lives at the application level, and there's some complex product selection or product availability by region. As an example, we did a record store. With record stores, there are certain records you can't sell in certain countries in the world.

So we have a region manager. We push that region manager into an object of its own that you can access through the application. By doing that, it moves that functionality out of the application level and it means that as we're optimizing that and we're optimizing other things, we're modifying something that's relatively isolated.

And finally, don't forget about the web server. Is the web server optimized for the environment? A classic example of this is, okay, so you're running against Apache. You've got that WebObjects adapter in there. You've tuned your application out to the nth degree. Oops, you're only running five Apache servers. I did that once. Tune it.

Make sure it has the appropriate configuration for the amount of load you expect to have. Use a mixture of static and dynamic content. Wherever you can use static content, again, that's just going to boost performance. Direct actions allow you to integrate the static content with the dynamic content. Okay, in WebObjects: one of the great things about HTTP, since it's totally stateless, is that when you have a user having a user experience with your site, you have to pass around a user ID, a session identifier.

Well, normally, by default, that session identifier goes back and forth in every single URL. So every time a hyperlink is generated in the dynamic content, that hyperlink has to have this big, long, nasty number that identifies that user such that when the user clicks on that, WebObjects can figure out what session to associate that hit with.

Well, if you move that to the cookie, you know, cookies have their own problems, but pretty much they're supported everywhere now. By doing that, pretty much all the URLs in your content no longer have to have user-specific information in them. This allows you to integrate your dynamic content and your static content. So, for example, again, using a record store example, we may have static pages that describe albums, static pages that describe artists. Well, those don't change very often. Leave them on disk as static.
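The difference between the two approaches can be sketched in a few lines of Python using the standard-library `http.cookies` module. The `sid` key, the URL shape, and the function names are hypothetical and are not WebObjects' own URL or cookie format. The point is only that once the session travels in a cookie, the generated link is the same for every user and can sit in a static page.

```python
from http.cookies import SimpleCookie


def link_with_session_in_url(path, session_id):
    """Every generated link carries the session, so the page differs per user."""
    return f"{path}?sid={session_id}"


def link_with_session_in_cookie(path):
    """The link is identical for every user; the browser sends the session itself."""
    return path


def session_cookie_header(session_id):
    """Build the Set-Cookie header value that carries the session instead."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()


def session_from_request(cookie_header):
    """Recover the session ID from an incoming Cookie header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return cookie["sid"].value if "sid" in cookie else None
```

With the URL variant, two users looking at the same album page get two different documents. With the cookie variant, the album page is byte-for-byte identical and the application recovers the session from the request header when the user clicks back into dynamic content.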

Let the WebObjects application navigate over to them, and have direct actions in those pages that bring the users back into the WebObjects application. The more hits you can get against the static stuff, the better off you are. I'm going to move quickly and do some Q&A. Okay. Well, we actually--do we have 10 minutes after to do Q&A? Fifteen.

Fifteen. Great. Thank you. Optimize for fast browser display. This is another little war story here. We had a client and the content generation, the content delivery was really, really slow. And this was back in the days when tables didn't really quite work right and you couldn't really specify image sizes. So, we had to do a lot of customizing.

Smaller pages display faster. The less HTML you generate, the faster it goes out. The less dynamic content you're generating, the faster it goes out. You want to batch your displays along sets of data. Not only does the user not want to see 3,000 products all at once, this makes things go faster. Show them 10 at a time, show them 15 at a time.
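Batching the display comes down to slicing the result set before generating any HTML. A minimal sketch in Python follows; the `batch` function and its 1-based batch numbering are assumptions made for illustration, not how WebObjects implements it.

```python
def batch(items, batch_number, batch_size=10):
    """Return the items for a 1-based batch number, plus the total batch count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_batches = max(1, -(-len(items) // batch_size))  # ceiling division
    # Clamp out-of-range requests to the first or last batch.
    batch_number = min(max(batch_number, 1), total_batches)
    start = (batch_number - 1) * batch_size
    return items[start:start + batch_size], total_batches
```

For 3,000 products at 10 per page, each response renders 10 rows and the page only needs "previous" and "next" links across 300 batches.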

Generate short URLs. This again gets back to the spacer.gif thing. Instead of /images, use /i. You also do better with images and just everything surrounding the static resources that are associated with every website. Split installs in WebObjects are very, very convenient. They're very useful. We never do them. And it's not because they don't work or anything like that. We never do them because we put all of our static resources as close to the top level of the web server as we can. We leave it there and it reduces the amount of HTML we generate.

You want to also improve the structure of your HTML. Now, this isn't as much an optim... This is optimization as in optimizing towards a working application, not a fast one. Use an HTML code checker, such as WebLint. And everyone should be on the WebObjects mailing list at OmniGroup, www.omnigroup.com. If you're ever planning on doing any development with WebObjects or you're even interested, immediately sign up for that list.

And the reason why I mention that as well is because we're going to be throwing a bunch of code out there next week when we get a chance to go back. And one of those things is WebLint. What it does is it looks through your WebObjects HTML or your generated HTML, checks the structure of it, makes sure everything lines up.

Simplify your table structures. It's tempting to nest tables deeper and deeper and deeper, especially when you have an object hierarchy or a component hierarchy. You want to reuse all those components. Every component needs to guarantee that it displays correctly, so it has its own little table. That's really slow. It's really, really slow in Netscape. It's just very slow in Internet Explorer.

And watch for nesting problems, especially things like nested forms. If you open a tag, don't close the tag until you've closed every other tag inside of it. And always make sure you close the tag. In the HTML standard that seemed to have come out of the early browser implementations, closing a table cell, closing a table row, even closing a table, closing forms was pretty much optional.

That doesn't work when you're talking about dynamic content generation. And it's gonna break things. And also, when it gets back to actual performance, one of the risks there, especially if you have forms that are mis-structured, is that you can get incomplete data back to your application, or you can get broken data. And you can get a performance hit as your application goes in, oops, exception, and has to go and deal with maintenance stuff associated with an error condition or an exceptional state.

The classic one is the overlap problem. I can't tell you how many times we've worked with HTML producers or we have been doing the HTML ourselves and just a simple HTML overlap where you open a form, you open the table, then you close the form and close the table. There's a problem there. And this can cause some serious problems. The forms don't work. The processing must be broken. The code must be broken. It's like, no, it's actually in the HTML.
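The overlap problem is easy to detect mechanically. The sketch below uses Python's standard-library `html.parser` with a simple stack. It is a toy check written for illustration and is not WebLint; the `NestingChecker` class, the `check` function, and the short list of void tags are assumptions.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a deliberately short, partial list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Reports tags closed out of order, nested forms, and unclosed tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "form" and "form" in self.open_tags:
            self.problems.append("nested <form>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        innermost = self.open_tags[-1] if self.open_tags else None
        if innermost == tag:
            self.open_tags.pop()
            return
        self.problems.append(
            f"</{tag}> does not match innermost open tag {innermost!r}")
        if tag in self.open_tags:
            # Drop the most recent matching open tag so checking can continue.
            index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[index]


def check(html):
    """Return a list of nesting problems found in the given HTML text."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"<{tag}> never closed" for tag in checker.open_tags]
```

Feeding it a form that is opened before a table and closed before the table closes produces one complaint about `</form>`. Because it treats every non-void tag as needing a close, it also flags the "optional" closing tags mentioned above. Running the same check over both a single component and the full generated page would catch overlaps that span components.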

And one of the things to keep in mind is that especially when you're a developer, you focus on a single component. I'm doing this component. Well, sometimes problems can span across multiple components. And what we like to do is we check, we use WebLint on the components themselves as well as on the entire generated content.

And for more information on this as well, please sign up to the OmniGroup mailing list, and there will be a lot more information coming out. After WWDC, there are always discussions on the mailing list, follow-ups from the sessions, etc. I'm sure Dave Newman, who's making a bunch of code available, will post information there as well.

Dave Newman: All right. Well, we hope this stuff was, you know, was a good start. I hope this was a good start at optimizing your applications. We've got some time for a whole bunch of Q&A. Now, the usual who to contact. Let's do a little question and answer. First of all, a big hand for our presenters.