Enterprise • 1:15:34
Learn about deploying WebObjects into real world situations. Topics include a tour of the tools available for determining where the bottlenecks are, configuration options for addressing scalability issues, and how to achieve scalability in a secure fashion. Load balancing, content aggregation, and other issues of deployment are also discussed.
Speakers: Bill Bumgarner, Max Muller
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
And we're also going to cover things like when good apps go bad. It'll be a Fox special this fall. Like, if you'll notice, if you deploy on, say, your eight-CPU Solaris box, you can see your CPU usage go to 800%. That's always fun. You can even see it go to 900%, and then it takes down the machine next to it.
There's also problems with like when your working set gets larger than physical memory, and then disk is obviously a lot slower than RAM, and when you start hitting the disk for memory, that's bad. Also, there's the situations that a lot of people seem to miss, which is like network saturation when packets or connections are being dropped.
I know of one example of a very large company who will remain unnamed who called the FBI because they thought they were being hacked, and it turned out that they had two Windows boxes, and their traffic rates were so high that the TCP/IP stack was falling over and dropping connections. So, you know, that gets back to the analysis thing, and calling the FBI to analyze the performance problems of your apps is probably not a good idea.
There's also just the basic situation of like when your responses require too much computation. If you're going off and calculating pi to 100,000 digits to bring up your welcome page, that's going to be a problem. Your database can be obviously overwhelmed. You're hitting it way too often or you're just pulling too much data from it or pushing too much data into it. That can be bad. And a great one is external services because no external service ever wants to admit they're wrong. So instrumenting your app such that when they're wrong, you can prove it will make your life a lot easier.
So how can we fix these problems? Well, unfortunately, dollars are always involved. The basic problem is you can just kind of throw money at it and get more CPUs, get more memory, get more network, get a faster database, that kind of thing. But that won't always work, and that's what we're going to focus on is when that doesn't work and you need to throw engineering at it.
And what you're really looking to do in that case is do less work to generate the response or to make more efficient use of the database, or ideally, don't hit the database at all. Or optimize your external service integration, and that becomes especially critical as your site grows in size and you need to start relying on those services more and more.
So good rules to code by. These are somewhat obvious, but they bear repeating because when you get into the thick of things, it's easy to forget. Test-driven development. Now this is something that's really become popular in recent years, and I can't emphasize enough how valuable this is. Put the unit tests together. Do it before you write the code.
Test the capabilities and the requirements of the code, and then run them every single time you build. And when you push to deployment, run them there, and run many of them in parallel if you can. They will uncover so many obvious problems and save you a huge amount of time.
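As a minimal illustration of that idea — this is a hypothetical sketch, not code from the session — a requirement can be encoded as a self-test that runs on every build, using nothing but plain Java:

```java
// Illustrative sketch of "test the requirements, run on every build".
// PriceRules and its requirements are made up for the example.
final class PriceRules {
    static long totalCents(long unitCents, int qty) {
        if (qty < 0) throw new IllegalArgumentException("qty must be non-negative");
        return unitCents * qty;
    }

    // The "unit test": encodes the requirement alongside the code,
    // so a build step can call it every single time.
    static void selfTest() {
        if (totalCents(99, 3) != 297) throw new AssertionError("total wrong");
        boolean rejected = false;
        try { totalCents(99, -1); }
        catch (IllegalArgumentException e) { rejected = true; }
        if (!rejected) throw new AssertionError("negative qty must be rejected");
    }
}
```

A real project would use a test framework and run the suite in the build and again at deployment; the point is only that the requirement is executable.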
Another obvious one that everyone misses, including myself, is make it work, make it right, make it fast. As developers, we often like to make problems a lot harder than they really are, because then our egos are boosted when we solve a hard problem. It's like, look, I've sorted something 20 times faster than the next guy. Never mind it's only 10 elements. Don't optimize without analysis. And I can't say this enough times. I've walked into so many situations where someone's got the world's most optimized means of writing out the HTML page that's only used once when the user signs up the first time.
It's like, come on, you know, go fix the problems that are really there. And also, optimize in small tests and test your results after each of those steps. And this gets back to unit testing. If you've got the unit tests in place, then this makes optimization a lot easier, because you can go off, you can do the optimization, and know immediately if you've broken things. And, you know, the last point is just, it's an obvious one, but everyone does it. If it ain't broke, don't fix it, which as object-oriented developers, that means if it works now, don't generalize it, because that's one that we commonly do.
Also, there's a lot of optimization that you can do at design time. If you understand your problem well, and any optimizations you can do before the first lines of code are written are always good, keeping in mind that you shouldn't optimize things too prematurely. So you really want to understand the usage patterns of your applications, know how users are going to use it, know how it's going to be administrated. Now, clearly, you can't be omniscient, but you can do a lot of upfront work there.
And of course, make your entry page fast. I can't tell you the number of times I walked into a client and I hit the entry page and it did 600 SQL queries. It's like, no, no, no, no. Make it static. Even if you have to have an entry page that just comes up that's just long enough to get them into the site, that's fine. But make that fast.
Your business logic in your application should be designed around the response generation. It should be designed around what the customers are going to be doing with the application, even if that makes the administrative tools less convenient. You know, the administrative tools, they need to be powerful, they need to be intuitive, but if they take a little bit of time, hey, that's okay, because it's all about administrating the content for the purposes of the customer. And if your business is like any business I've ever heard of, it's the customer that pays the bills. And, you know, I've seen a lot of people that lose sight of that. You also want to make sure you're retaining and reusing data.
and know when it's out of date. This will come up again, but it's something that bears repeating. You know, if you go into the database to pull out the list of states in the country, it's more likely not going to change in the next five minutes. So keep it around. That introduces some complexity in working with enterprise objects because, of course, you can't create relationships between entities that are in different editing contexts. So you have to be a little bit careful. And you want to manage that cache data carefully.
Because if you end up caching everything all the time and just leaving it in memory, then you're going to get back to the part where you run out of memory, your machine starts swapping, and then your performance goes to hell. So you also want to let the database server share in the work. Databases have been around a long time, and they do things like sorting stuff really fast. They also do indexes and all this other stuff, and you really want to leverage those wherever possible.
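The keep-it-around idea can be sketched with a hypothetical `TtlCache` class — plain Java, with a `Supplier` standing in for whatever EOF fetch you would normally do — where the loader only runs when the cached copy is older than its time-to-live:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of a time-to-live cache for reference data (e.g. the list of
// states), so the app doesn't round-trip to the database on every request.
final class TtlCache<T> {
    private final Supplier<T> loader;   // how to (re)load the data
    private final long ttlMillis;       // how long a fetch stays "fresh"
    private T value;
    private long loadedAt = Long.MIN_VALUE;

    TtlCache(Supplier<T> loader, long ttl, TimeUnit unit) {
        this.loader = loader;
        this.ttlMillis = unit.toMillis(ttl);
    }

    public synchronized T get() {
        long now = System.currentTimeMillis();
        if (value == null || now - loadedAt > ttlMillis) {
            value = loader.get();       // hit the "database" only when stale
            loadedAt = now;
        }
        return value;
    }
}
```

The TTL is how you "know when it's out of date": the list of states might get a five-minute (or five-day) lifetime, while volatile data gets a short one or none.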
So, one of the optimizations is to minimize your memory footprint. If you can minimize the footprint of the application, then you can have more instances running. You can balance the load more effectively. And this means sharing data across sessions, which is not something WebObjects does naturally out of the box. You're going to have to play some games there to get that kind of thing to work. It's not hard. It is detail-oriented. You also want to clean up thoroughly.
It annoys me to see a developer make the statement that because of garbage collection, they don't have to care about cleaning up their code. Okay, not every object that's going to be collected by the garbage collector is just using memory. It may be using scarce resources like connections to quote servers, or it may be keeping a file descriptor open, or something like that. And you want to let the code know that it's done and over with. Also, any nulled out reference, the garbage collector doesn't have to traverse to deal with collecting. So that's another good way to ensure cleaning up happens quickly.
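A minimal sketch of that cleanup idea — a hypothetical `QuoteFeed` wrapper that releases its scarce resource explicitly and nulls its references when it's done, rather than waiting on the garbage collector:

```java
// Sketch: even with garbage collection, scarce resources (a socket to a
// quote server, an open file descriptor) need explicit teardown.
// QuoteFeed and its fields are illustrative; the pattern is the point.
final class QuoteFeed implements AutoCloseable {
    private java.io.Closeable connection;   // stands in for a real socket/stream
    private java.util.List<String> cache = new java.util.ArrayList<>();
    private boolean closed = false;

    QuoteFeed(java.io.Closeable connection) { this.connection = connection; }

    public boolean isClosed() { return closed; }

    @Override
    public void close() {
        if (closed) return;                 // safe to call more than once
        closed = true;
        try {
            if (connection != null) connection.close();  // release the descriptor now
        } catch (java.io.IOException ignored) {
        } finally {
            // Nulled references are one less thing for the collector to traverse.
            connection = null;
            cache = null;
        }
    }
}
```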
You want to also clear your transient instance variables when they're no longer in use. And this is not just an optimization issue, this is also a debugging issue. If you have stale state in your object graph, and you don't clean up when you're done, and then you come along at some later point in time through some code path that you didn't really think about, and you run across that transient data, and now it's out of date and you didn't know it, you're hosed.
Speaking from experience, traders on trading desks get really irritated when they start seeing the wrong prices. I have some scars. You also want to set the right session timeout value. Sessions will stick around and then they'll automatically go away. Unfortunately, on the web, there's no quit button, so it's hard to know when the session should go away. This gets back to looking at the usage patterns. Look at the use patterns of your app, understand what they are, and understand when it's safe to make the session go away.
Instrument your applications. We've got some wonderful tools built in for instrumenting them, which Max is going to demonstrate shortly. They do a great job of allowing you to detect when resources are being used or when things are getting out of control. You want to review those results often. Ideally, you want the instrumentation to be something you can turn on dynamically in production so that if any customer calls up and goes, you know, your app's not working right.
You can turn this stuff on and figure out what's going on. And unfortunately, because we're building applications where it's guaranteed that our development environment is about as different from our deployment environment as is possible, there's going to be a whole series of problems that will only come up in production, which makes life an adventure. So instrument and collect and then analyze.
Also, you want to really plan the data access, plan when your queries, when your caches, when your cache updating is going to happen, and understand the data latency issues. Now, data latency is all about looking at your application and understanding when the data becomes stale, and how often do you really need to let app A know that app B's state has updated. And a great example of this is I ran into a client and a site, and they needed some optimization, and their site would just grind to a halt when it had a bunch of users.
And what was happening is they had all this user-specific state, a shopping cart. And every time the shopping cart got updated for user one, all other 30 app instances got notified that that user's shopping cart was updated, even though that user's shopping cart was only on one session in one app instance.
There was a case where they used a generic notification method, and by simply removing that, all of a sudden their app was stable, even under higher loads, and they could have more shopping carts. That's always good. You also want to do things in memory wherever possible, and to try to get zero queries per response. I mean, the fewer times you go to the database, the fewer times you go to disk, the better off you are. And especially when you get into the high-throughput sites like the music store, you know, any database hit is going to hurt.
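The shopping-cart fix can be sketched like this — an illustrative `CartNotifier` (all names made up) that records which instance owns a user's session and notifies only that one, instead of broadcasting to all 30:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of targeted vs. broadcast invalidation. Counters stand in for
// the real inter-app notification traffic.
final class CartNotifier {
    private final Map<String, Integer> sessionOwner = new HashMap<>(); // userId -> instance
    private final int[] notifications;  // per-instance notification counters

    CartNotifier(int instanceCount) { notifications = new int[instanceCount]; }

    void registerSession(String userId, int instance) {
        sessionOwner.put(userId, instance);
    }

    // The generic approach: every instance pays for every cart update.
    void broadcastUpdate(String userId) {
        for (int i = 0; i < notifications.length; i++) notifications[i]++;
    }

    // The targeted approach: only the instance holding the session hears about it.
    void targetedUpdate(String userId) {
        Integer owner = sessionOwner.get(userId);
        if (owner != null) notifications[owner]++;
    }

    int notificationsSeenBy(int instance) { return notifications[instance]; }
}
```

With 30 instances, the targeted path does 1/30th of the notification work for the same correctness, because the cart only ever lives in one session on one instance.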
So you want to do things in memory. You want to manage your faulting and manage your caching so that you can explicitly update your stale data. You generally want to avoid situations where WebObjects is either deciding to populate relationships on its own or populate caches on its own, because it will choose a very general-purpose solution that is probably not optimized to your actual use patterns.
And you also want to use a shared read-only, well, I say read-only shared, it should just be really called a read-only editing context for reference data. It's a little bit tricky because, again, you get into the situation where you've got to be careful about how you make relationships to and from objects that were fetched into that editing context. But, you know, that way you can have that single shared editing context that's read-only so it never pays the penalty of doing updates or inserts or anything like that.
And then another one that's really something that's more of a modern optimization, this is something that's become much easier with recent releases of WebObjects, is to partition your functionality across multiple applications. And what that means is that if you've got, say, a site where it has an expensive search operation, plus shopping cart management, plus, say, a library of information, plus a couple of other different things, an administrative tool, then you partition those different features into different applications, and then you can control the number of application instances individually, and control the configuration of those applications individually to optimize those particular applications for the use they need.
And that's very important. And with direct actions, and with putting the session ID in cookies, and then being able to reconnect across different apps, you can achieve a lot of efficiency. And what you can also do is use optimized object models per application. So you can go into Enterprise Objects Modeler, and you can bring up your object model, and you can have your full object model for your administrative tool. It's got read, it's got write, it's got everything. But then for, say, the store application, you can create a very simple model that's just the fields you need for that particular application. This will reduce the memory footprint size, it'll reduce the amount of data going to and from the database. It just overall will make the app faster. It can be a little tricky because, of course, then you've got to keep the two things in sync. That's a pain. But if you're faced with this problem, it can really help a lot.
And of course, obviously, maximize reuse through frameworks. Again, got to point it out because some people forget about that. I've run into that a number of times. And you also want to partition between sessionful and sessionless and threaded and non-threaded, because threading is always a very complex issue. There will be certain apps where threading is an obvious optimization and certain other apps where it's not so obvious.
In particular, things like where you have multiple writers to the same database, you probably don't want to thread that because bad things can happen when you cross commits or do partial transactions. Sessionful versus sessionless, there may be a number of apps like search apps are often things that don't need to have per user state.
And so if you can get rid of the session in a search app and then make it such that it's only caching, has one big cache for all the search data, you can make things extremely easy.
Okay, so you've done all the right things in development time. You know, it's been the world's most perfect development schedule, and you even delivered early. And you got the app in production, and now it's too slow, or it's using too much memory and you're thrashing the disks, or, you know, the CPUs are just like big space heaters. Or it just occasionally crawls to its knees, really, really slow.
So now what do you do? Well, the first thing is don't be silly. And this comes from years of experience. I've been very silly myself and have seen many developers do really silly things. Turn on or off the obvious flags. WOCachingEnabled, yeah, that's a good one to have on. WODebuggingEnabled is a really good one to turn off.
It'll bring an app to its knees: go into production, all of a sudden 100,000 users hit it, and it's trying to log every SQL query. Yeah, not a good idea. There's the built-in NSLog facility, plus there's, of course, Log4J, which is wonderful because they can both be dynamically configured such that you can hit a production server and pay the penalty of logging on only the areas that you know are problematic. And that gets back to instrumenting and making sure that you have dynamic instrumentation so you can actually catch problems in production.
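A hedged sketch of that dynamically scoped logging — using `java.util.logging` here so the example is self-contained, though the talk mentions NSLog and Log4J — where the expensive SQL logger stays off in production and you flip just that one logger on when a problem is reported:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative: two named loggers, adjusted independently at runtime.
// Logger names and methods are hypothetical app conventions.
final class DynamicLogging {
    static final Logger SQL = Logger.getLogger("app.sql");
    static final Logger REQUESTS = Logger.getLogger("app.requests");

    static void quietProductionDefaults() {
        SQL.setLevel(Level.OFF);        // no per-query logging under load
        REQUESTS.setLevel(Level.INFO);  // keep coarse request logging
    }

    // Flip on only the area you know is problematic, pay the cost only there.
    static void debugSqlOnly() {
        SQL.setLevel(Level.FINE);
    }
}
```

The same shape works with Log4J categories; the point is that a production server can answer a support call without restarting or logging everything.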
Put indices on your database tables. Sounds obvious, but, you know, that's one that people often miss. And then they'll go into production, their data sizes grow to a couple orders of magnitude bigger than they are in development, and all of a sudden they're left scratching their heads as to why the heck it's so slow just to bring up a simple page.
And this is another one that's funny. Minimize the size of the generated content. We ran into a number of sites where the initial page load, which is a nice big beautiful page with a couple hundred images on it and a bunch of text and all this other stuff, it was about 160K of HTML, which is way too much, of which 40K of that was comments.
And another 30K of that was because the image URLs were all /images/clients/sites/codenames/foobar.jpeg or .gif. And by simply erasing all of that and making it /i/ for the images, we were able to reduce the page from 160K down to like 50, 60, 70K. Still too much. And of course, again, analyze, analyze, analyze.
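That URL fix is simple enough to sketch directly; the prefix below is a made-up stand-in for the real client path:

```java
// Sketch: rewrite a long image-path prefix to "/i/" and measure the savings.
// With a couple hundred images per page, the bytes add up fast.
final class UrlShortener {
    static final String LONG_PREFIX = "/images/clients/sites/codenames/";
    static final String SHORT_PREFIX = "/i/";

    static String shorten(String html) {
        return html.replace(LONG_PREFIX, SHORT_PREFIX);
    }

    static int bytesSaved(String html) {
        return html.length() - shorten(html).length();
    }
}
```

The server (or a rewrite rule in front of it) then maps `/i/` back to the real directory, so only the generated HTML gets smaller, not the filesystem layout.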
One of the challenges with building a WebObjects-based app or other dynamic applications is that as a user goes through the site, unless you specifically do the engineering work, you're not going to get a good record of what the heck they did.
[Transcript missing]
Thank you, Bill. So what I did here is built two very simple little applications. One is a Cocoa Web Services app that just makes a simple query to a WebObjects app I have also running on this machine. And it just brings back a bunch of SOAP objects and lists them here. And then you can click through them, and if they're read-write, you can choose to update them.
And that basically makes the SOAP callback. So it's just a very simple little Cocoa app. And on the server side, we have a very simple WebObjects app where I threw in a bit of Direct to Web. So you can basically see the current users. I mean, this is something that you can do out of the box very, very quickly.
So the question then becomes, if things are slow, what do you do? So the first thing you can look at is the stats page, WOStats, which will kind of just look at your overall statistics of your pages and say, you know, what am I doing wrong?
It also can give you a really good idea of just where the high-traffic pages are, the pages that are coming up. So you can see that this has had 66 pages rendered so far. And what you can see here is that this will give me the number that has been served, and kind of the averages and the outliers. So obviously, given this app, I should go ahead and optimize the display page, because this is what's getting hit the most.
But it also can give you a good idea that you can see that the first query is taking 1.2 seconds to come up, and the first inspect page relatively quickly. And the list page, you can see that it's rendered eight times, but the first time it took 3.66 seconds, whereas the average is 0.6 seconds. So that can sometimes tell you that you have something that's coming up that is rather slow.
And so to then drill down and figure out what exactly am I doing wrong with that, you can go over to something called the events. So the stats basically just gives you kind of an overall look. And with the Music Store, we'll log into various different apps and just kind of check out what the current apps are looking like in terms of what their averages are, what the outliers are, because sometimes we'll get some very long ones that we'll need to start looking into. So for the events — it's actually already on, but let's just go to the setup.
So this is all just stuff that comes right with WebObjects. Very, very simple stuff — as long as you specify the password. Nothing worse than trying a million passwords and realizing, oh, it's not quite right in the properties file. So we're turning all of the events on, we're setting it to everything, and then we're going back to our app and hitting the list all users. So that we can go right back to this and show the event log.
Now, this has many different options, and it can be somewhat non-intuitive exactly how these are organized. The one that I always like is to look at the events grouped by the page and by the component. As this shows, the main page is obviously the one that's hurting the most. And the query page is coming up slow as well, relative to everything else. So pretty much when you look at this, you can say, well, geez, the main page, that's the pig.
And so when you turn event logging on, it pretty much just covers all your application. And so anything that's going on is getting logged with events. So then you can say, well, geez, OK, it's not the new list page. It must be something else. So it just kind of gives you an ability to drill down. So you can see that this-- and voila, list users.
Well, what's going on here? Ah, objects with fetch specification. I think I've seen one of those before. So you can see it on list users. On the pull, that's the action. That's what I had bound up to the action link. So when you're clicking on the link, it says list my users. So you can see that what's hurting here is the fetch.
Right here is the fetch spec of pulling up the users. So... Later on we'll come back and basically show some more fine grained tools after we go through some of the database optimization to then try to discover what's going on. So, if we can go back to the slides.
So, as Bill mentioned, my name is Max Muller. I am one of the lead engineers on the iTunes Music Store, and I've been on this now since the very beginning. So we worked very hard and came across all the issues that Bill went over, and made some of the mistakes, even though pretty much most of us on the team have been doing this since the WebObjects 3 days.
So this is kind of just more stuff that we've stumbled across, that we've had to optimize. Being one of the lead engineers, I pretty much get to eat, breathe, and live optimizing these applications. Since we launched in Europe, we've been selling on average 5.86 songs per second — that's the average over the whole period since launch. So if you're doing many things per second on average, you know, obviously the peak's much higher.
The trough's lower. Small little problems can very quickly turn into very large problems. And that's one thing that we really found. You know, we had to spend a lot of time tuning the database, because WebObjects itself is very fast. If you've got just a pure WO app that's not doing any database work, and you've somehow made it slow, you've really done something wrong. Because out of the box, it's very fast. I mean, you're able to generate responses very quickly, and the request-response loop is very quick.
So in terms of getting down to the database work, a lot of the stuff that you can have happening in kind of your administration apps can also very quickly affect things that are taking place in your production applications. So we had a content management application that our content team is constantly in there working on building the new storefronts.
And we would notice that sometimes during the day, the store would get slow. And it turns out that they're in there just doing all these queries that are bringing back very large sets of results. And it's actually causing the database a lot of pain. And so the store itself is starting to get slow, because the database is having to service all the content requests rather than the requests of people wanting to buy music, which is not a good thing. So putting fetch limits in place, and requiring qualifiers on certain queries, can significantly reduce the amount of work that your database is doing which your users might not even be seeing.
There are also a number of tools from database vendors, and taking queries and handing them off to your DBAs can be very handy. One bit that we did was, when we open a new connection to our database, there's actually a stored procedure that you can call in Oracle to put in information about the connection. Because by default, for all the JDBC stuff, when it connects in, all that the DBA is going to be able to see is that it's a Java process.
Yeah, well, that doesn't really help you if you've got a whole heterogeneous mix of back-end processing apps, store apps, and a very large environment of Java applications. So you can put in information into the connection so that if they see some query that's running amok, they can actually look at the connection that's causing that query. So we put in the application name, the host it's on, even when it started up. Because sometimes we found that we'd forget to stop an instance and it would be off in la-la land when we would have rolled a new version.
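A hedged sketch of that connection-tagging idea: Oracle's `DBMS_APPLICATION_INFO` package lets a session label itself, and you can call it over JDBC right after the connection opens. The `ConnectionTagger` class and the label format are illustrative.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Illustrative: tag the Oracle session so the DBA sees which app, which
// host, and when it started, instead of just "a Java process".
final class ConnectionTagger {
    static String tagSql() {
        return "{call DBMS_APPLICATION_INFO.SET_MODULE(?, ?)}";
    }

    static void tag(Connection c, String moduleLabel) throws SQLException {
        try (CallableStatement cs = c.prepareCall(tagSql())) {
            cs.setString(1, moduleLabel);   // e.g. "StoreApp-03 host=store3 up=..."
            cs.setString(2, "startup");
            cs.execute();
        }
    }
}
```

The DBA can then join the label against `V$SESSION` to see exactly which instance is issuing a runaway query — including the forgotten one that's been running for a week.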
So we'd go back to the software and be like, "Wow, what's going on? I thought we fixed this problem." And lo and behold, it's like, "Well, that guy's actually been running for a week," kind of thing. So binary data in the database is definitely a no-no, especially if you accidentally check the locking box on that column, so that EOF thinks it needs to lock on that attribute and includes it in the WHERE clause.
If you do have binary data, move it down into a to-one relationship. Also, this goes back to what Bill was saying about having certain models or certain attributes only for your administration work, and different ones for maybe the consumers. So maybe you have a large CLOB field where people enter in a bunch of notes about an album — this came from this, blah, blah, blah.
Well, that CLOB, which could get very large — we don't want the store app pulling that thing up. So you can either take the approach of creating a new model, or at runtime, you can just turn off that attribute. Say, you know, really, EOF, you don't need to worry about that one. Leave that in the database when we're in this kind of read-only mode, because nobody's going to need the CLOB.
So we use the shared editing context for pretty much just reference data — lookup types, in the sense that it's kind of like a type-safe enum, you know, where it says key one is this and key two is that, instead of kind of having a lookup. That's pretty much what we use it for.
So when your app starts up, all the shared editing context information is loaded, and then it doesn't have to be refetched. And likewise, when you trip relationships to EOs in the shared editing context, it doesn't require a database trip.
So there is inter-app messaging. If you need to synchronize state between applications for critical snapshots, a lot of the times just telling the other app that a snapshot is no longer valid is good enough. It's not that you really want to move all of the snapshots over and say, here's the new snapshot. It's more just along the lines of saying, hey, you know, this snapshot for this guy got updated by this one. So the next time you need it, you better go get it from the database.
Raw rows is a useful technique for pulling back large content where you don't need all the snapshots, stuff that you're not going to be editing. Within the store, we have all these popularity caches that are getting rebuilt. It's like, if you like this, you like that. Well, as the number of items that you can buy expands, we'll pull that stuff in with a raw fetch actually in a separate thread.
So this thread can just kind of sit in the background and every so often determine if it needs to go out to the database and pull in a new set of recommendations. So when we're rendering some of the pages, it can sync it up. So it'll basically refresh this cache.
In the background thread, using the raw fetches. Caching in memory — it's good. A lot of the times, if you have kind of a read-only application, instead of using the one shared editing context, you can pull stuff into a regular editing context and then hold on to it, with the application holding the reference to it. It's not an EOSharedEditingContext, but it is an editing context that you share.
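A minimal sketch of that background-refresh pattern: a hypothetical `RecommendationCache` whose worker thread rebuilds a snapshot periodically (standing in for the raw-row fetch in a separate thread), while request threads only ever read the current copy:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Illustrative: the Supplier stands in for the raw-row query that rebuilds
// the "if you like this, you like that" data.
final class RecommendationCache {
    private final Supplier<List<String>> rawFetch;
    private volatile List<String> snapshot = new CopyOnWriteArrayList<>();

    RecommendationCache(Supplier<List<String>> rawFetch) { this.rawFetch = rawFetch; }

    // Called from the background thread; request threads never pay for it.
    void refresh() { snapshot = List.copyOf(rawFetch.get()); }

    // Called from request threads while rendering pages: no locking, no fetch.
    List<String> current() { return snapshot; }

    Thread startRefresher(long periodMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                refresh();
                try { Thread.sleep(periodMillis); }
                catch (InterruptedException e) { return; }
            }
        });
        t.setDaemon(true);   // don't keep the app alive for the refresher
        t.start();
        return t;
    }
}
```

The `volatile` snapshot swap is what makes this safe without locks: readers always see either the old complete list or the new complete list, never a half-built one.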
Adapter debugging enabled, obviously — that's where you can just turn it on and say, what SQL is going on here? This one, it's a godsend: java.lang.Throwable. Being able to generate backtraces anywhere is very handy. I'll show you how we can change the logging pattern at runtime to start throwing in backtraces anywhere we want, which can really help.
A lot of the times, if you're just looking at the SQL, you'll be like, whoa, hold on, where's that coming from? One of the very common mistakes that a lot of people will make is if they fetch an object and then they're like, oh, they pre-fetch everything, and then on the next request, they'll say, well, we need to go ahead and invalidate the editing context here so we make sure we have fresh data. But the problem is you've got the object graph there, and so you've got an object, and you've spent the time to pre-fetch out all the stuff.
You've got it in two or three or four fetches, but then you've invalidated all the data underneath you, and so all of a sudden, you start to trip over these things again, and it's like, oh, that's actually been turned back into a fault. I need to go out to the database again.
And you're like, well, geez, I pre-fetched everything, and now I've got SQL going out the yin-yang. So turning on adapter debugging can help you see it, but being able to actually see the backtrace of where the faults are firing. So I'll show you in the demo a trick that we use in the music store quite a bit where we can turn on.
There's actually a delegate hook, and we can throw backtraces whenever a fault is fired. So a lot of times we'll turn that on and then go to render a page that has somehow started to get slow for some reason. And a lot of the times it isn't because that page itself has gotten slow. It's because something else is triggering something that's causing the snapshots to get old or wiped out. So, yeah, excess faulting, that's a hot one. Also, another trick that you can do with java.lang.Throwable.
In a constructor of a Java object, if you have debugging turned on, some debugging flag, you can actually create a Throwable object in the constructor and stash that away in an ivar. At which point then, at any point later on, you can always ask the object, what's the stack trace where you were created? Which can be very handy in, say, WOSessions, where sometimes, you know, we have several apps that are completely sessionless.
And all of a sudden we'll start to see sessions popping up. And we'll be like, well, what's going on? And so we'll set this value and then we can, at a later point, get the WOSessionStore. Just basically we'll say, dump out all your sessions and give me the backtrace. Because there's somebody who's doing something bad. And, you know, more often than not, it's a WOActiveImage. Somebody put a WOActiveImage in. And if you don't bind it up correctly, it'll go ahead and create a session and create a component action for you.
Very handy, but, you know, not what you want when somebody just accidentally forgot to do a binding. And then all of a sudden you've got these pages that are generating lots of sessions. Because a lot of the times they won't be referenced. And so you'll get a lot of them created. So you can get one request and you can somehow get multiple sessions created. Which just causes all sorts of, you know, nightmares. Fetching it from pop-ups, yes.
Yeah, yeah. The localInstanceOfObject thing you have to also be aware of is if your fetch timestamp lag is set, the fetch timestamp lag is saying, you know, how new a snapshot do you care about? So oftentimes what people do is they'll create a new editing context, they'll set the fetch timestamp lag to right now. Say, I want everything fresh. And then they'll start doing localInstanceOfObject, and of course, then when they touch the object, it goes to the database.
So you could have fetched all this stuff, and then you're like, oh, I need to do a local instance now. Let me create a fresh editing context. And so with that usage pattern, all of a sudden you'd be like, well, I thought I was doing something good, but it turns out that localInstanceOfObject can actually cause a lot of trips to the database. Simplify your object model, if you can. We're at 300 or 400 entities right now at the music store, and it's growing more each week.
Single-table inheritance is the only efficient form of inheritance in EOF. I've used it for many years now. It works rather well. It allows you to have a user and then a person user and all these kind of things mapped onto the same table, and you can still have relationships to the top abstract entity.
So when you trip a relationship, you could be getting all the different sub-entities, but EOF handles that gracefully for you underneath the covers. The other form of inheritance spreads the hierarchy across multiple tables, and that one is rather inefficient because anytime you trip a fault, it's going to be like, is it in this table? Is it in this table? Is it in this table? So if you do have to use that type of inheritance, the best way is to always model the relationships down to all the sub-entities. Bill Bumgarner As for views of the database queries, if you can get an efficient one, a lot of the times if you don't need all the bind variables coming through in a view, it can be efficient.
The access back pointers is a really hot ticket one because a lot of the times you'll have a situation where you'll have a user, you know, from a music store example, so you have a user that's got many purchases. So when somebody clicks buy, you know, you're going to be creating a purchase for them.
And so the tendency might be just to say, you know, create a purchase, add object to both sides of the relationship, you know, to get the user on and save it. Well, if you recall Steve's keynote a while back, you know, the number one person in the music store had 27,115 songs at that point. That's a whole lot of purchases.
And so what happens is when you add object to both sides of the relationship, if that relationship hasn't been fulfilled, you're going to have a fault there. You're going to trip it, which means you're going to be pulling in all those things. So all of a sudden you're like, well, geez, why is my app getting slow at random intervals? This buy took, you know, three minutes in production.
What the heck happened there? You know, not only that, but the memory footprint went through the roof. Well, we just pulled in 27,000 things just because they're trying to purchase one more thing. So you don't have to trip the relationship; if you just create the purchase, set the user, save it to the database, that's fine. So watch the back relationships, or you can just not model it, just, you know, remove the relationship from the model completely. So, I mean, yeah, hope this isn't too advanced. It's just trying to cover a bunch of stuff that, you know, we've found.
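The cost Max describes can be illustrated outside of EOF. This sketch (all names hypothetical, not EOF API) just counts how many "rows" get pulled in when a lazy to-many is touched versus left unfired:

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration of why adding to both sides of a huge to-many
// relationship is expensive: merely touching the lazy list "fires the
// fault" and loads every existing row.
class LazyPurchases {
    int rowsFetched = 0;          // stand-in for trips to the database
    private List<String> contents;

    List<String> all() {          // firing the "fault": loads every row
        if (contents == null) {
            contents = new ArrayList<>();
            for (int i = 0; i < 27115; i++) contents.add("purchase-" + i);
            rowsFetched = contents.size();
        }
        return contents;
    }

    boolean isFault() { return contents == null; }
}

public class RelationshipDemo {
    public static void main(String[] args) {
        // Anti-pattern: maintaining both sides in memory fires the fault,
        // pulling in all 27,115 existing purchases just to add one.
        LazyPurchases purchases = new LazyPurchases();
        purchases.all().add("purchase-new");
        System.out.println("rows fetched: " + purchases.rowsFetched); // 27115

        // Cheaper pattern: set only the to-one side (purchase.setUser(user))
        // and let the database's foreign key maintain the to-many.
        LazyPurchases untouched = new LazyPurchases();
        System.out.println("fault still unfired: " + untouched.isFault()); // true
    }
}
```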
That's right. That's a little known technique. Little known database technique. Sure. So, you can, you know, databases are, you know, they're built for these kind of things. I mean, they're, you know, that's what you pay the big bucks for the big tools is that they, you know, they provide all the tools.
And, you know, if you've got a good DBA, you can get in there, or you can get in there yourself and look. There's, you know, if you can identify your top queries in your database, then you can start to go back and look to see where they're coming from in your applications.
About once a month, you know, our DBAs will send us a spreadsheet and be like, all right, here's the top 10, go for it, kind of thing. So, you know, then we start hunting around, like, okay, where's this one coming from? And, okay, who did this kind of thing? So, it's very useful. Yes. So, generated SQL is obviously one. We basically focus on optimizing the parts that are getting hit the most. We don't optimize the copyright page if the copyright page is slow.
And stored procedures, they're useful for some things, in some places where they're very useful, but in other places, if you're using a stored procedure to update rows that you're also modeling, you can really wind up in a state very quickly where you've just executed a stored procedure call, it's updated something underneath, and now your snapshot's out of date. And there are techniques that you can use to keep your snapshots up to date, but it's, you know, it's a pain. So if we can go back over to demo four.
So I'll just show a few bits here. We're doing OK on time. So coming back here, we can see that the list users is inefficient. So then it's, well, what do we do? So the first thing is you check your logs. OK, nothing in the logs. I'll bring up this, and I'll show you just a few tricks that we use. And all this stuff is in Project Wonder that we contributed back. So nothing I'm doing here is proprietary or something like that.
So the first thing we can start to look at is we can be like, well, geez, let's look at our... So this is without restarting the app, by the way. So let's start looking at our database traffic. Let's see what's going on for our little app.
So list users. All right. Whoa, a whole lot of SQL there, huh? So we can see that we're fetching stuff from the user table, but then all of a sudden we've got user infos all over here. A whole lot of user info columns. So you're like, what's going on here? So has user infos. There's six there. Well, there's more than six queries. Actually, there is six queries. Sorry. So this is... So let's do one more thing. SQL's not really that useful here.
So let's look at fault firing. Let's see when we're actually firing faults. So all that this thing, so let's go here. Yeehaw. So then we can look down here, and so here's main. Here's my list users method. So the set data source, so let me just show you the code.
Very, very simple. So I'm just creating a database data source of users, setting the data source on a list page, handing it off. That's all the code that's going on here. And we have one in services that the other app is using to talk to it. It's just a -- and this is just the plain vanilla.
Out of the box, WebObjects handling all the services. So I wrote a bit of code here myself. This is a user service, and so I exposed the find users method. If you saw Bob's talk on the introduction to WebObjects, there's this WSMakeStubs command-line app that you can run. So all I did was I wrote a find users and an update user. The update user takes the user ID, the first name, the last name, and the find user takes a first name and last name. And then in my application, I said, whoa, web service register, register this guy.
And I ran the make stubs on this. It generated for the Cocoa side all these stubs, the WSGeneratedObj. And then I just wrapped it in a little bit of user services. So I mean, it took all of a few hours to do. Nothing complex here at all.
Actually, not even a few hours, half an hour. So going back here now, we can see-- We can see basically the backtrace, and so that's happening on a set data source. This is our fetch specification. So we can see that we're fetching users, no qualifiers, no prefetching keys. Then, next we're fetching user info.
So here's main, here's set data source, here's fetch. Whoa, what's going on here? Awake from fetch. So you can see I'm just walking up the tree. User line 33. So I say, well, what's going on in user line 33? User line 33. Oh, if first name is Max and I've got this test user info, then I'm fetching it 10 times. Not so good. So let me set this to false. I left it on in production too. Oops. So I can clear out the console, go back to the web app now.
List Users. Go back to this guy. And lo and behold, a little bit better. So these are just a few techniques that you can use to quickly get your head around what's going on. I'll show you one more. So we use Log4j for pretty much everything we do. And one of the nice features that it has is, you can see in this current one right here, I'm saying the conversion pattern I want to use: I'm saying a date, I want to have my memory stats. Yeah, so this is used versus free memory.
What category is logging? The line number that it's calling at, the priority level, which, you know, this is all log4j stuff of, you know, priority of, you know, debug or fatal or I forget what X is, M is message, and then a new line. So when we go back and look at when one of these gets called, let's see.
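The standard Log4j 1.x conversion characters being described (date, category, line number, priority, message, newline) look roughly like this in a log4j.properties. The memory-stats and app-stats fields are Project Wonder extensions via a custom PatternLayout, not standard Log4j, so they're left out of this sketch:

```properties
# Console appender whose pattern shows: date, category, file:line,
# priority, then the message. All standard Log4j 1.x specifiers.
# Note that location info (%F/%L) is expensive to compute; it's fine
# while debugging but worth dropping in a hot production pattern.
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %c (%F:%L) %-5p - %m%n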
So, sorry, I deleted all my stuff. So we're back here. So let's go back and look at the first part of this line. Yeah, we have the date. So far this app is using 11 megabytes and it's got 22.95 free. This is being called from the method, from the class, ERXDatabaseContextDelegate, line 149. This is a debug.
And then this is the message. So it's printing the stack trace itself. Trying to think. So if I go back to this, I turn the fault firing off, because that guy was going ahead and putting its own stack trace in there. So save this, let's go back here.
One more time back to the home page. So list users. So here we have the exact same, all the same information coming in. This is the Log4j bridge that's just capturing NSLog events and routing to Log4j. And so again, we're getting messages coming through here. Now what happens is then, lo and behold, in production, now, for some reason, there's something going wrong.
You've got some random SQL coming out from this application. So you connect in, and you change the pattern to this pattern. So this gives you the WebObjects info. It'll give you the name. It'll give you the number of sessions. It'll give you the WOPort it's bound on. It'll give you the PID of the process.
Date format, give you your VM stats. Oh, Control-Z. Priority. But then I also put %at at the end, which says, you know, go ahead and dump that backtrace. I want to see where this is coming from. So now, when we turn the fire hose on... Get the firehose on. One more time. And lo and behold, lots and lots.
Oh, update log. Oh, okay, I didn't. I turned it off. So you can see that this does have all the information here. So it has the name of the application. That's the PID. That's the port. So far I've created 10 active sessions, date, memory used, all this kind of stuff, as well as stack trace of where that line message is coming.
So these are just a few of the techniques that we use to hunt down performance problems: looking at where the faults are firing, where the database traffic is. And then you can also look at the WOEvents if you want to get more fine-grained and look at where your components are potentially causing you problems. So let's see. Back to slides. Thank you. Thank you, Max. So as you can see.
[Transcript missing]
There's another area of optimization is if you generate crap HTML, it takes the browser longer to figure out what it should do with it. So if you generate well-structured HTML, the browsers render faster. And this is an interesting one because, of course, in the early days of HTML, it didn't matter if you closed your paragraph tags or your table tags or anything else. Because the browser would figure it out, thanks Microsoft.
And as things evolved, it not only affects both the HTML processing and parsing because now the browser has to look ahead and then go, oh, well, that tag over there probably means this one over here needs to be closed. But it also confuses things on the WebObjects side.
WebObjects components really want to generate a hierarchy of tags that are nested in a sane fashion. So look for things like overlap problems, using an HTML tool like weblint to check the generated content. And you really got to check that generated content because, you know, a WebObjects page is generated by many components all spewing forth HTML that then gets serialized into one big response. And you need to check the content in the context of that whole response. Because there can be overlap problems that are caused by component mis-nestings and things like that.
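Checking the serialized response as a whole can be sketched with a toy well-formedness pass. A real tool like weblint does far more; this sketch ignores void elements like `<br>` and attributes containing `>`:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy nesting check over a complete serialized HTML response:
// verify that open and close tags pair up in stack order, which
// catches the overlap/mis-nesting problems described above.
public class TagBalance {
    private static final Pattern TAG = Pattern.compile("</?([a-zA-Z0-9]+)[^>]*>");

    public static boolean isBalanced(String html) {
        Deque<String> stack = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = m.group(1).toLowerCase();
            if (tag.startsWith("</")) {
                // A close tag must match the most recent open tag.
                if (stack.isEmpty() || !stack.pop().equals(name)) return false;
            } else if (!tag.endsWith("/>")) {
                stack.push(name);   // self-closing tags are skipped
            }
        }
        return stack.isEmpty();
    }
}
```

For example, `<b><i>overlap</b></i>` fails the check, which is exactly the kind of component overlap a browser has to guess its way around.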
Simplifying table structures is another great way to reduce content size and moving to CSS or having a site-wide common CSS document, CSS being cascading style sheets, which fortunately browsers seem to support though inconsistently, is another great way to both reduce the content size, speed up the rendering time, and make your site more flexible.
So there's also optimizing Direct to Web. The Direct to Web stack is incredibly powerful. The whole notion of having rule-based content generation and data management and navigation management and user management and everything else. I don't know of any other tool out there that compares with WebObjects when it comes to this. But it is also overhead, and there's a different approach to optimizing it. And Max can certainly answer any questions in this regard.
So in the context of Direct to Web, the rules engine has this notion of significant keys and unbounded keys. Significant keys are the ones that are the focal points and then the ones that will be cached, etc. The unbounded keys are the ones that will require calculation and a lot of faulting through the rule system to figure out the values of those things. That's very expensive. So you want to avoid that. You also want to optimize the data being accessed by property keys to a given task or page.
So WebObjects, the Direct to Web stack, has this very strong notion that the user is doing something somewhere. And you can optimize all of your data access around that notion. It gives you a lot of hints about what the user is doing at any given time. There's a number of debugging hooks, both in EO and Direct to Web and also down at the lower layers, a lot of which you can find in Project Wonder. And there's warm-up techniques you can do to cause the rule caching system to warm up its state such that subsequent evaluations of those rules will be much more efficient. Like one of the most common complaints we see about Direct to Web or Direct to Java Client is the first hit always takes a long time, because the rule cache is empty and it has to go off and evaluate all these rules to fill the rule cache.
Well, the rule cache, most of it is actually going to be static results. So there's like entire huge sets of rules that just never need to be evaluated again because the results are going to be static. And the results never change. And so you don't want your first user to have to pay that penalty.
And when you're building custom components, and this is true of both Direct to Web as well as everything else, go for stateless. Stateless means that there's no session-specific data. It means the component can be shared across the app. It doesn't have to be archived and unarchived and reconstituted during request-response. It's just a lot more efficient.
Then, also, you've got to look beyond the WebObjects application itself. Make sure your web server is doing its share of the work. And that means tuning the configuration. Like, Apache has mod_status and one other module, which I can't remember the name of anyway, that out of the box can give you a lot of information about what your web server is doing.
Plus, look to your web server, especially as your site grows. You'll want to look to your web server to be able to farm out content across multiple web servers, multiple boxes, and even up to the level of farming out to, say, an Akamai or the other content aggregators. Because, of course, once you do that, then any hit that doesn't hit your web server is more CPU cycles for the primary content generation.
Offloading all serving of the content you can, like images, files, multimedia to other servers, is a great thing. One of the challenges is always if you have a site that's secure, as soon as you go into the HTTPS, then that means all the images that are on that page have to be encrypted as well, because web browsers don't like mixing encrypted and unencrypted content. This gets back to security being the antithesis of efficiency and convenience.
So that makes for quite the adventure, because now you're going to have to figure out how to pay the price of encrypting the content that's related to that page, including the static content. And of course, encrypted downloads are a really bad idea. Nothing like encrypting, say, a 45-megabyte download for one user, because everything has to be encrypted per individual user.
Caching proxy servers. This is a really neat technology. You can use something like Squid or the caching proxy server in Apache, and the first user that hits your site will pay the price of the dynamic generation, but then that HTML page gets stuck in a caching proxy server that's in between the web server front line and your WebObjects application.
Once that item is in there, you can then put timeouts on it, or you could have an external interface for invalidating it, or the easiest thing to do is to just simply have a dynamic page, which has the set of URLs that lead to what will be cached, and just change those URLs once it's invalidated.
And that way, since it's an URL that hasn't been cached yet, the caching proxy server will go, "Oh, I got to go get it. Go get it. Cache it. Next user will be really fast."
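The change-the-URL invalidation trick can be sketched in a few lines (the names here are hypothetical, not a WebObjects API):

```java
// Instead of purging the caching proxy, embed a version token in the
// cacheable URLs and bump it when the underlying data changes. The
// proxy then sees a brand-new URL and refetches from the app.
public class VersionedUrls {
    private long version = 1;   // bump whenever the cached content changes

    /** URL handed out on the dynamic page that links to cacheable content. */
    public String urlFor(String page) {
        return "/cached/" + page + "?v=" + version;
    }

    /** "Invalidate" by changing every URL we hand out from now on. */
    public void invalidate() {
        version++;
    }
}
```

The proxy may keep the stale entry around until its timeout, but no client is handed that URL anymore, so the first hit on the new URL repopulates the cache and every subsequent user is fast.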
[Transcript missing]
There's also tuning the adapter timeout values and making sure your WO worker threads settings are all set up correctly.
Because as is the case with most things, the generic out-of-the-box configuration is pretty much guaranteed to be wrong for your application. This is also why WebObjects doesn't do synchronization of data between instances. We could do a generic solution for that, but it would be guaranteed that it would be inefficient for your specific business problem.
And you also want to determine ahead of time how you're going to monitor the system for problems. I mean, every component in the system. As you add more machines, as you add more complexities, you add firewalls and everything, these things need to be monitored. And you need to plan ahead for a catastrophe.
And Max has got some great anecdotes on that, I'm sure. And I'm going to turn it over to Max now to talk about this particularly fun, fun issue. Max Muller Yeah, so we just wanted to finish up with the production-quality deadlocks, which any of you who have been in high-traffic sites, it's always one of those things that if you've got a recipe for one, it's definitely going to get baked in production, and you're going to find it in production. So just a few topics. One is kill -QUIT: within the Java world, that'll basically dump full stack traces for the running app, for all the different threads that are currently running.
Max Muller One of the most common places is, you know, having initialization things that are happening in your dispatch request, 'cause dispatch request is completely threaded; it will have multiple threads coming through there at any given point. So even if you have your app set in kind of single-threaded mode, it's not doing concurrent requests, your dispatch request has to be threaded.
Likewise, any of the code in there: if you have one method there that's like, oh, let me go out, fetch something from an editing context, cache that value, and that value will then be used for any request that comes in, you can guarantee that when you start, you're gonna have two threads that are immediately gonna get in there and start doing EOF stuff, which is, yeah.
Which you will run into serious problems with. Most of the deadlocks we have to track down are because of EOF, or we're not locking things correctly. Or the multiple EOF stacks in a single shared editing context. That one will kill you every single time. So you have to, if you want to use those full-blown EOs in multiple different EOF stacks, you definitely have to create new shared editing contexts for each one of the stacks that you want to use.
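A minimal sketch of guarding that kind of request-time initialization in plain Java, using double-checked locking on a volatile field. The editing-context fetch is stood in for by a counter here; the class and field names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

// dispatchRequest is always multi-threaded, so lazily computed shared
// state must be initialized under a lock (or eagerly, before the first
// request) so that two threads racing in at startup can't both run the
// expensive EOF work, or worse, deadlock inside it.
public class CachedValue {
    static final AtomicInteger computations = new AtomicInteger();
    private static volatile String cached;

    static String value() {
        String v = cached;
        if (v == null) {
            synchronized (CachedValue.class) {
                if (cached == null) {
                    // Imagine an editing-context fetch here; the lock
                    // guarantees it runs exactly once.
                    computations.incrementAndGet();
                    cached = "expensive-result";
                }
                v = cached;
            }
        }
        return v;
    }
}
```

Eager initialization at application startup (before the first request is accepted) is often simpler still, and avoids the first-request stampede entirely.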
By default, if you don't do anything, just a new EOObjectStoreCoordinator, a new EOEditingContext, fetch an EO, then you're gonna be dead in the water. So the monitoring to detect wedged instances, that's a big one if you are having this problem. And also, it's very important to start your load testing before you start your applications.
'Cause a lot of the times, we ran into several issues where we didn't detect a certain deadlock condition because we had all of our apps up and running, and then we're like, all right, now turn on the load test. Whereas if we would've had the load testing up, which is what you have in production if you bounce your apps, 'cause you constantly have users in there clicking on everything, and then the app has some initialization deadlock and it starts up and it's like, ugh, and it's wedged.
And the dead time interval in the adapters can be a killer. 'Cause if you, 'cause if that interval, that basically says how long should we wait for, if we try an instance and it doesn't respond back, how long should we wait until we try it again? And so you can get this nice ripple effect where the wave will crash down and all your apps will basically register themselves as dead if they're starting up. And so then it will basically wedge all your web servers.
And then your web servers will finally come back, your apps will be like, now we're ready. And then the web servers will go, well, here you go. And the wave will come sweeping over and the apps are like, no, no, no, no, no more, no more. And so they'll wedge and the dead timeouts will set. And so you get this nice seesaw effect where all of a sudden things will be really fast and things will be really slow. Really fast and really slow. So that's a, yeah, so it was just one last slide. of these things.
So, users are funny because they pay your bills, but they hate you. Because as soon as your app starts misbehaving, how do they respond? By clicking like spastic monkeys. Fun. So, you know, quick summary here. So start thinking fast from the beginning, but don't overdo it. I mean, you want to instrument, you want to analyze, you want to track your performance over time, but invariably you want to stay calm. And that's a point that just, again, stay calm.
Because when things start going wrong, the worst thing you can do is to start just, you know, throw your hands up in the air and start rebooting things at random without understanding the problem or getting your analysis tools up and running. Or gathering metrics or gathering evidence because simply, you know, doing the spastic monkey routine on the reboot button is not going to fix anything. It's just going to make it happen again sometime later.
There's a tremendous wealth of tools available. It's easy to forget exactly how much stuff is out there, but the industry as a whole has been doing web-based deployments now for more than a decade, and web application deployments now for a decade, too. So there's just a boatload of free and commercial products out there to do a lot of management and analysis, some of which are better than others.
Always be aware of that security implication. There's a lot of really obvious optimizations that one can perform on a site that will make it completely insecure. Like the direct actions thing is one to watch out for. You've got these direct actions in place. If you carry too much state in that URL, or the URLs, like there was one case where someone decided to separate their shopping cart out from their main application, and when they put the product in the shopping cart, they put the price in the URL, and they believed it. Yeah, that wasn't good.
Having done this for so long, all of us having done this for so long, there's a tremendous number of community resources. There's Google, there's the Apple and the Omni list, there's again Google, which searches a lot of those lists and indexes everything else. There's Project Wonder and other random community projects that are out there, including a wealth of various random free Java projects that you can leverage. And finally, again, Google. If you get an error message that's coming back from something, almost always you can put that error message into quote marks in Google, hit return, and find 10 other people that are experiencing the same thing, one of which might have the answer.
So with that, for more information, that should have just said Google. There's sources of documentation and sample code. The documentation has been updated. I was reminded of one other thing. As far as performance analysis is concerned, Shark and the CHUD tools now do Java as well. That works with WebObjects.