Enterprise • 1:13:46
View this session to learn about Oracle's latest grid-based relational database solution, Oracle 10g. Grid computing reduces the cost of IT by clustering servers together to function as a single large computer, shifting resources dynamically between applications. View this session if you are an in-house developer looking for a highly scalable and reliable database on Mac OS X Server.
Speaker: Thomas Kyte
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Good afternoon. So as he said, my name is Tom Kyte. I've been with Oracle going on about 11 years now, but have been using the database software for about 16. And I was just curious just to get a feel in the audience. How many of you have developed applications using Oracle, the database? Excellent.
So I don't have to explain what SQL and other things like that stand for. Perfect. One of the things I spend way too much of my time doing at Oracle is working on a website called asktom.oracle.com. That's one of the many resources out there for you to get answers to technical questions and things like that.
Another resource that I'd like to sort of point out, get out of there for a second, and go over into the web browser, otn.oracle.com. This is the Oracle Technology Network. And just to prove that this stuff actually exists, you can go there today, this afternoon if you like, to product centers, downloads for the database, and featured right down there, we have the Oracle 10g developers release for the Mac OS.
So you can go get it, the same software that we're running on these Xers over here, for the demonstration, is what you guys would be downloading and installing in your own environment. So what you see us doing are things you'd actually be able to do yourselves later this afternoon.
Back to the slides. So we have a really short agenda. I'm going to talk about two database things, the two main things about Oracle 10g. The g in 10g stands for the grid. So we're going to take a look at that briefly. Then we're going to look at the other half of 10g, the what's in it for all the rest of us, the manageability features that they've added to the database.
10g has over 350 new things, new functional areas added to it over 9i release 2, which had about 150 things over 9i release 1, which had about 190 things over 8i. Every year we've been coming out with more and more inside the database. 10g is our current production release. We started rolling it out on various platforms in the middle of this winter, the February timeframe. Available on the Macintosh now in a developer's release.
Consider that sort of a late beta. And we're looking at the fall time to go production. And then we're going to close up with HTMLDB, which Mark Peramarini is going to sort of walk you through and show you what this development tool is. So beginning very simply with the grid.
This is sort of a picture that describes the sort of systems I've worked on. I've been doing this since the mid-1980s, and I started as a PL1 programmer on an IBM mainframe using this thing that not many people heard about. Because I was the newest kid there, I got to play with the coolest new stuff, so they set me about programming against this thing called DB2 and SQL DS.
And that was really started, you know, where I got all my SQL background. But pretty much a lot of our systems back then were mainframe-based, very reliable, centralized. These things never went down. When I first came to Oracle in the early 90s, there was a lot of shift away from a mainframe sort of centralized environment into more of a departmental server. So Vax was what we all ran on.
I had a VT100 connected to Vax machines. Everything was character mode. We had literally hundreds of servers all over the world running things. In the mid-90s, we went through this sort of, I call it a faux pas, this client-server phase. I remember in 1995, we rolled out to 20,000 sales reps laptops.
With client-server software on it. It was called Salesforce Automation. It was going to make everybody so super productive, this would be great. The only thing we really got out of this was on Monday, we had 20,000 people with the same installation. And by Friday, we had 20,000 sort of unique installations out there.
Trying to keep the software in sync and keep the patches going was just impossible. Fortunately, in 1995, this internet thing started coming along as well. And we migrated all of our applications to a more... internet-centric approach. We sort of got back to a lot of the attributes of the mainframe. The browser became our 3278 sort of screen, if you will.
But we went back through a major reconsolidation phase. Where we took a lot of our systems and clumped them together into single systems. Instead of running 187 email servers, we went down to single email servers. One of the things that allowed us to do this, to... sort of consolidate these systems, is what I'm going to be talking about next. This grid architecture.
So there's a lot of people who study this and they look at systems and they say, you know what? The way we're doing it is wrong. I've got this system that for most of the year it sits there idle or underutilized. And then maybe one or two months out of the year the thing is really busy. I'll use one of my customers as an example, the IRS. I know they have a lot of systems that in...
[Transcript missing]
Other systems would pick up at the end of April to start processing the data. And what they've done is what most of us would do.
Put somebody in charge of system one. He went out, he got his budget, and he said, you know, I need to size for April 15th. They gave another system to somebody else, the data warehouse. I need to size my system for May. And so they go out and they buy these mega sort of machines. Everything's configured for their peak utilization.
The availability is fairly good, not as good as it could be, but it's pretty much a poor way to utilize this because at any point in time, if you added up all the free cycles, you probably have three empty computers sitting out there because one of the machines is fully utilized and the others are not doing much at all.
So we've come along, along with a lot of other people. I mean, Oracle-- it's not trying to say we invented the grid, or the grid is Oracle, or Oracle is the grid. There's many ways to approach grid computing. For us, this is our very simple definition-- coordinated use of many small servers acting together as one large computer, like a bunch of X-SERVs inside of a rack.
This is different from, say, IBM's programmatic approach to grid computing. They have a grid toolkit where someone can take their problem, basically decompose it, parallelize it, use the grid toolkit, and come up with a very scalable solution for their particular problem. We took it from a different perspective.
We wanted to take applications that already existed and be able to plug them into this grid environment without having to change them. So what we've done is sort of our grid toolkit is our database, our application servers. You have an application that you'd like to run in a grid sort of environment. You don't have to program specifically and build a grid application. You just take a database application and deploy it in a grid-capable database.
So one of the examples, this is sort of what I was talking about before. I've got two systems, order entry and financials. Well, things go very cyclical. This is May 31st at Oracle. May 31st is the end of our fiscal year. We book most of our business in the last couple of hours for the year on that particular day. Those order entry systems are totally maxed out. The very next day for the next month, the system that's really busy is our financials.
So what we've started doing is consolidating these. Instead of having separate stovepipe systems, we put our applications into a single database with multiple servers in front of it. So we have an order entry financial database, single database, single set of disks, but three or four, however many computers we need, able to access that information in a read-write fashion, and we can load balance across those.
So one month, two out of three nodes could be order entry. The next month, two out of three nodes could be order entry. The next month, two out of three nodes could be a financial system. So we're able to allocate and reallocate these resources very quickly, very dynamically.
The technology that lets us do this is something we call real application clusters. You're all familiar with RAID for disks. That just makes sense at this point. You understand how that works. Maybe you mirror your disks, and if you lose one, that's okay. You still have a mirror of it. Maybe you're using RAID 5, so you have five disks working together, and if I lose one, that's okay as well. RAC is RAID for your computer systems. In a clustered environment like this, I'm going to have one database.
Those four disks represent a single database, if you will. I'm going to have four instances of Oracle in this particular example. I have four computers that have full read-write access to this single database image. I can log into node 1 and run a query. I can log into node 4, run the same exact query.
It's going against the same data. In fact, I can update the same information in node 1 that I can update in nodes 2, 3, and 4. This bot is two things. First and foremost, it buys us much higher availability. I have to lose three out of four computers.
Before I'm in danger of having this system become totally unavailable. So it gives me the ability to reduce unplanned downtime if I lose a power supply, a CPU goes bad, something happens to a RAM chip, it's okay, one of those nodes can go down, the other nodes will pick up the workload.
The other thing it gave me was the ability to scale my application out. I don't know how big my application is going to ultimately be. All I know is initially I'm going to have 50 users, it might grow to 500, it might grow to 5,000. It's really hard to size systems for three or four years out sometimes because you don't know how popular some of them are going to be. Is the website you're building going to be the coolest thing since sliced bread or is it going to be a ho-hum event? Well, you don't want to have to buy a super large server to... You serve up something that becomes a ho-hum event eventually down the road.
It'd be much nicer to start small with the hardware and start growing it out over time as needed. We did this with our email. We consolidated all of our email in Oracle from 187 servers down to one. We started with a two-node rack cluster. We needed additional capacity. We're now running a three-node rack cluster on our email. Email for Oracle is a mission-critical sort of application.
If we don't have email up and running... We actually can't process orders. We can't do order fulfillment. We can't do our job basically. Our email system, being able to scale it out horizontally by adding additional capacity as well as having it highly available so that if a single node or two nodes even fail, we still have access to our email systems.
So what was new in 10g? RAC is something we introduced in Oracle 9i. In Oracle 8i and before, you might have known it as Oracle Parallel Server. In 10g, we made it infinitely easier to install, configure, and manage. So, for example, one of the things we did was add it, integrate it, clustering software.
In the past, if I was using Oracle 9i and I wanted to run RAC, what I'd have to do is get the Oracle database from Oracle. I'd have to get clustering software from some vendor. I'd have to get an operating system to run this on, and then I'd need some sort of volume management software to manage my disks and that infrastructure.
[Transcript missing]
We're going to have Oracle running just enough clustering software to make your database cluster just that, a database cluster. I've got two XSERVs. I want to pull them together, have two instances of Oracle accessing one set of disks. The only thing I need is the Mac OS and the Oracle database. Now, if I install it all and something doesn't work, I have one place to call.
I call Oracle and they say, oh, do you have this required OS patch? No, install it. If it still doesn't work, I still only have one place to call. You're not calling Veritas for support on their file system software. You're not calling whomever for support on their clustering software. It's just a database on top of an operating system. It's as easy to support and manage as a single database on a single server would have been. So the clustered environment now becomes pretty much synonymous with, a single machine environment.
[Transcript missing]
Then next week, what happens? Well, we spend next week figuring out how many mistakes we made, introducing a hot point over here or contention over there, and coming up with another plan so that we can spend the next weekend sort of coming back and rebalancing the I/O. And this is a continual process. DBAs spend a lot of time looking at disks, moving files around, trying to eke out the best performance they can by making sure that we achieve this nirvana of sort of even I/O distribution.
So in 10g, what we did was introduced a database file system. So much like you would install your operating system and say use a Unix file system driver or something like that to build a UFS, you're using automatic storage management or ASM to build a database file system. This is a file system that does one thing.
It holds files that Oracle ingests and Oracle processes. So it holds redo log files, it holds data files, it can hold database backups, anything that Oracle sort of produces or consumes, it can hold. What you're not putting here are your documents, your HTML files, your spreadsheets, unless of course you're actually putting them inside the database. This is purely for database files.
And then one thing that it does is it's achieving that goal of even I.O. across every single device possible. So here, using that same scenario, we started the year with a couple hundred gig of storage. We get an additional allocation to us. We would be adding these devices into what's known as a disk group. A disk group is what the DBA sees. Physically, at the operating system level, you still have all the devices. You can manage and monitor it at the OS level. But the DBA sees this disk group.
We just doubled the size of this disk group. What you would observe at the operating system level if you cared to watch as you added these disks would have been a whole lot of read activity on the top and write activity on the bottom. Oracle would be going out there and picking up extensive objects and moving them around, rebalancing the I.O.
Conversely, if you needed to take some storage away from the database, I doubled your size, but that was 50% too much. I need to take some of that back for this other application over here. You can go in and actually remove devices from these disk groups. What you'd see then is a whole lot of read activity on the devices you remove and a spreading out of the data across the remaining devices here. So instead of working with files and mount points and moving things around and trying to determine the optimal I/O layout yourself, the database is now doing it at the database level using the equivalent of a database file system.
So, in short, ASM is a portable high-performance database file system. It eliminates the need for conventional volume management. So, for example, if you get Oracle Standard Edition, this thing, RAC, comes as a feature of that. So you could get Oracle Standard Edition, put it on a two-node cluster with two CPUs in each box, and what you'd be running is our clustering software, ASM to manage the disks.
That's all just the Oracle database and the operating system itself. There's no other software to sort of buy or integrate into this mix, and in fact, with the Standard Edition, we don't even support it if you attempt to do that because it makes it too hard to support in that particular environment. Having it all come from a single place, the database, managing the database, made everything much easier.
So in short, that was a quick peek at some of our grid features. Grid is not just rack. Grid is many things. Part of it is ASM, being able to take the storage and allocate it and reallocate it. Part of the grid is being able to slide things into a rack cluster and slide things out of a rack cluster. Part of the grid is being able to just move data around in your network.
But what we looked at today briefly was predominantly storage-related, being able to very easily manage storage in an Oracle environment. Database clustering, being able to slide servers in and out of a rack cluster. We didn't really take a look at the application servers, but they have the same capability. I sort of used that in an example where I had 10 app servers and two databases. I can steal from the app servers, unplug those components, if you will, from that grid environment, and slide them over into the database.
And then one thing we will be taking a look at is the grid control. How do we manage four Oracle instances as if it were a single Oracle instance? Because that's really what you want to do at the end of the day. You don't want to be having to think, oh, I've got these four different instances. I need to remember to do this over here and that over there. I just want to manage it as though it were one.
Now onto manageability. Oracle 10g could have been called Oracle 10m, and sometimes I wish it had been, but when Larry Ellison sat down and looked at it, you know, there's manageability, it's hard to spell, it's big, it's long, and then there's grid. It's very short. So he went with 10g. But of the 350 new things added to the database, over one-third of them were manageability features.
If you just even look at, for example, ASM, seems like a grid feature, seems like something for big databases. It's equally appropriate for the small installation. I'm the Oracle DBA. I've been given six direct-attach SCSI disks, and I've been told to make it perform optimally. What do I do? Where do I start? Well, with six disks, I really don't have much of an opportunity. So give those to ASM, and let ASM implement this striping and mirroring of everything, achieving the best overall general I.O. throughput on this. I don't have to worry about those things. So ASM is actually a manageability feature in disguise. It's not just a grid feature.
So I stuck this slide in here introducing Oracle 10g and I colorized the last bullet points because I think many people gave us or conceded the first bullet points. Yeah, Oracle pretty much runs all your applications. It's fairly open that way. It stores any information you have from rows and columns to spatial to text to images.
It's scalable, available, reliable. It's pretty secure. But what I don't think people would have conceded in the past was it was really easy to manage. You know, it's as easy as firing up Microsoft Word and, you know, putting in ABC and I've got a document that I can print out. Oracle had a reputation of being able to scale up really large, but maybe not scale down as easily, if you will. So this is something that we really focused on in the release of this software.
It was the single biggest effort, not just for 10g in a microcosm, but in my years at Oracle, it was the single biggest effort software-wise that I've ever seen. They took this seriously. Over 50% of the architects and the engineers within the server technologies division were focused on manageability-related projects.
This was a widespread effort, not just in the server technology on the database, but throughout the entire application stack and the application server as well. So, for example... The overriding goal was reduce the cost of managing simple, small Oracle instances by at least 50%. How can we do that? Well, one thing we can do is reduce administration costs. We made many things automatic versus manual.
Mark was trying to get Oracle going on his laptop, which is somewhat memory-starved. And he was like, well, how big should I make the block buffer cache, and how big should I make the Java pool, and how big should I make the large pool? What about the shared pool and the redo log buffers? You know, I could go through all the pools.
All the DBAs would be glad to know we added five more in Oracle 10g, so you have like 13 of them that you could size. Well, we took a look at his system, and I said, you know, you got X amount of memory. Let's give half of it to Oracle.
You can set a single memory parameter, SGA underscore target. We set that to 200 meg or whatever it was, and Oracle come up with the best allocation of memory based on that. What it does is it sets up relatively small-sized pools for everything and throws the rest of the memory into the buffer cache. As the other pools become starved for memory, Oracle will come and steal from the buffer cache and give it to these other pools.
But it made it automatic. I didn't have to sit down and think about how to size these 13 caches. I just said, Oracle, take 200 meg of RAM and divvy it up the best way possible. Intelligence versus data. I call this one giving the database a long-term memory.
The DBAs out there know that the database is very good at telling you two things about performance. It can tell you... What's happening right now? I can log in. I can see what queries are running. I can see how many users are connected. I can see who's running a long-running query.
I can see how long the queries have been running for. I can see how much work the queries that are still running have done. The other thing it can tell me is, well, since you started me up two months ago, I did X number of transactions. I ran this query so many billions of times. I've done this over and over again.
What it can't tell you is, at noon, the performance problem that was being experienced by your end users was caused by George. That information just doesn't exist in the database anywhere. It had a very short-term memory, what's happening right now, and then it had this averaged long-term memory, but it had no real information. You couldn't log in and find out anything truly useful about the immediate past because it always happened that way. The DBA goes out to lunch. He comes back. The red light is blinking on their phone.
They pick it up. It's George saying, hey, at noon, the system is really, really slow. I need to know two things from you. Why and what are you going to do about it? In the past, unless you had accidentally been sitting there watching the system at noon, you'd really have no way of knowing. The database is now constantly benchmarking itself. When the DBA comes back at 1 o'clock, slide a little window back in the database console, drop it down, they can see what was happening at noon inside the database.
For the first time, they'll be able to call George back and say, dude, we've told you so many times, don't hit the stop and the reload button. Just let the page finish because the performance problem was you submitting the same monster query 15 times, and we fixed it by revoking your access to the system.
But the DBA never had that information before. It's always been guesswork. I've always been afraid of answering that phone call because it's like, I don't know what was going on. I don't know what was going on at 3 a.m., and I have no way of telling you what was going on at 3 a.m.
Reduce capital expenditure. This is what I was talking about by sort of right-sizing the system. Adaptive versus oversized. I'll add capacity as I need. I don't have to buy a machine that can support up to 128 CPUs, which is a quite expensive machine. I can start with smaller machines and add more of them over time. Integrative versus third-party.
I want to do this cluster thing. I need to buy two pieces of software, not four. So I've cut out sort of this volume management software that I don't need and this clustering software that I don't need. And the last part here is reducing the failure cost. So proactive versus corrective. That SGA target that I was talking about. One of the very important pools inside of an Oracle database is this thing called a shared pool.
Shared pools where we store parsed representations of SQL and PL/SQL and other pretty important stuff. A very common error is an Aura 4030, unable to allocate X number of bytes in the shared pool. Used to be that the end users would just get that message back. Not very useful, have to call the DBA, DBA's got to sort of figure out what went wrong and fix the problem.
If I'm using this automatic memory tuning before returning an Aura 4030 back to an end user, the database is going to look and say, can I sort of steal something from somebody else at this point in time? Maybe I can shrink this pool and take it and use it over here because we were using this memory over there, but we're not using it right now. So let's give it up from that purpose and move it over here. So it's proactively fixing itself rather than reactively waiting for DBA to correct itself or correct it.
And the other one is act and succeed versus trial and error. I know a lot of DBA's, I've been guilty of this myself, the system's running slow. Well in the past, you know, we did this and it sort of made things better. We rebuilt an index, so we'll go and try that. Is it better yet? No. Okay, re-organ tables, that'll make things faster, so we'll rebuild some tables. Is it faster yet? No. The database, because it's now benchmarking itself, can actually come back and give you advice.
Hey, I installed this database console on my 10G database. It was running great. One day it came back and said, your database is running five times slower today than it was in the past. We're running 500% slower. Things are taking five times as long. We've identified that there's these new queries that have been introduced into the system. They never existed before.
And if you push this button over here, we'd actually like to take these queries and tune them for you. We can suggest indexes that you could add to it to make them faster, materialize views, or even how to simply rewrite the query. So the database is sort of watching itself, waiting for its performance characteristics to change, trying to figure out why, and then notifying you. So we're not coming in and sort of guessing what can fix the system. The system's telling us, yeah, there is a problem, and here is the probable cause and solution for that problem.
So let's take a look at the life of a DBA, see where they spend all their time. About 18% of it is in install, create, and configure. So this came out of a survey from the International Oracle User Group. DBAs, what do you actually do? So it's 18% install, create, and configure. That sort of boggled my mind a little bit. I thought that was a lot. You install a database once and then it's over. It's forgotten, right? Well, no, because I install the database for this developer so that he can build his prototype.
Then I install it for this development team so they can take the prototype a little bit further. Then I need to move it over to this QA system and do it over here again. And then I need to put it into a production environment. So every database might actually have five, six, seven installs behind it. And people are actually creating and destroying these database instances relatively frequently. So one of the things we did in 10g was to make the database.
Fast and lightweight as far as installation. We got it down to a single disk, if you will. It used to be three to five CDs to actually install the database. It's now one CD. No, we didn't cheat. We didn't put it on a DVD. It's actually a CD. What we did was we went back and took a look and said, what is this file? It's been here since version five. What does it mean? Well, we don't need that one anymore. And we cleaned out a lot of the stuff. Went back.
And we have one single base image CD now. You stick this in. 15, 20 minutes later, you have a database installed up and running with the seed database, the starter database, and everything. An out-of-the-box database console. So it used to be I install the database. Now I want to manage it.
Oh, go get the CD back so we can install Enterprise Manager and set it up and configure it and everything else like that. We've taken the Enterprise Manager database. We've taken the database functionality and actually moved it inside the database itself. So now when I install the database and I start it up, it comes with a very small Java web server sitting outside of it.
The Java web server is just making stored procedure calls basically into the database. And the database has got the Enterprise Manager tool in it. So no longer am I installing the database and then hunting around for a CD to figure out how to install the tool to manage the database. I install the database. The last step is, hey, go to this URL and start managing me. This is how you're going to interact with me.
Six percent in loading data. So we took a tool. We've had this tool called export and import forever. And it's been virtually unchanged forever. And every DBA knows exactly what export-import is. They know the limits. They know how to kick it and make it do what they want. But it's not very flexible. Fast is something people might not have attributed to it. So for the first time in many years, we rewrote this.
They're now called import DP and export DP for data pump. The export is faster, but not as faster as the import is. See, export was already pretty fast. Using a direct path export, there was only so much we could do. We're now pretty much gated by the speed of the disk as far as getting data out of the database. But the import, we did a lot to. Made it use direct path operations. So it's bypassing the SQL engine. So it's able to forward and forward.
So if you've ever tried to import a 100-gigabyte database dump file, you know that was a long process because it would insert row after row after row. Then it would create an index, create an index, create an index. Now it's going to do all those things in parallel as much as possible. So it's blown as much data in as it can as fast as it can. Then creating indexes in parallel as fast as they can. Basically utilizing the index.
So it's a really good way to get a lot of data out of the entire machine. It's restartable. One of the bullet points I don't have on here that I think is the neatest one is I can actually start and export sort of a server process on one machine, stream it over the network to an import process running someplace else, and it's sort of piping it into my database.
So I can actually stream my information from, say, an HPUX box across the network and import it into my database running on a Mac or take it from the Mac and stream it over to Linux or wherever I need the information. So it's a little bit like using database links, but doing direct path operations in a massively parallel fashion.
Here's the big one, 55% of the time, ongoing system maintenance. So I've got a DBA, his name's Fred, he manages 14 Oracle instances, and his morning is pretty boring because it's always the same thing over and over again. It's a little bit like Groundhog Day. He comes in, logs into server one, hi, how you doing? Are you up? Are you running out of space? Did the backups work? Am I running out of space in the file system? Looking at everything, did anybody report any problems against you? Then he logs out and goes to the next server and sort of repeats the whole process, and hopefully it's a boring morning that he doesn't find out any problems actually went on. Now with the database sort of watching itself when he comes in, the databases tell him everything was okay, you don't even need to look at me. We're running fine. We're not out of space. Nothing bad happened. The backups succeeded.
Or there would be a red blinking light saying, help, I need help over here. Performance was going through the tubes at 3:00 in the morning. Here's what was wrong. Here's what I think you need to do about it. And then Fred, who's the professional DBA, looks at that advice and treats it as advice from a junior DBA and either does it or refines it or ignores it entirely.
So this diagram is supposed to represent what we have today. There's a database, and then each one of these red boxes is sort of a point solution outside the database. My developers have a tool that they use to do their application and sort of SQL management, and they're tuning SQL queries and rewrites and things like that.
My system administrators have system resource management tool. How's my system doing? How's my disk? The memory on the box, the CPUs. The DBAs have their backup and recovery tools. Then they'll get another tool to do some space management. Hey, go look for fragmentation in my database. Suggest objects that need to be fixed. Help me reorg and things like that.
And then lastly, there's storage management software like a Veritas or something like that. Our goal with 10g... is to actually move that all into the database. So if I want to tune SQL... The database is running the SQL. The database knows better than anybody else what could actually help the SQL go better because it's got this thing called an optimizer inside of it.
It could say, you know, what if we had an index on these columns? How would the SQL probably perform given that? That's the entire goal or job of an optimizer is to look at all the possible combinations of query plans given the tables and indexes it has. It can now hypothesize and say, you know, what if we had these extra two indexes over here? We estimate this query would consume one-fifth the resources it is right now.
Or if you created this materialized view, we would be able to execute this query with one-tenth the logical IOs instead of the millions that you're doing right now. Same with system resource management and space management. The database sort of knows as you're gathering statistics and things like that, this object is 50% white space. Well, once I've identified that an object is 50% white space because I sort of deluged.
I deleted every other row or something like that. The next step is, well, I need to reorg it. Well, in 10G, the database recognizes that the object is in need of a reorg, step one. And step two, it actually has a command to shrink segments now. So I could say, hey, take this table and compact it.
And it's going to do things like read the table from the bottom up and find the last row and sort of delete it and reinsert it at the top of the table so that the table will become 50% full at the top, 50% empty at the bottom. And it will redraw the high watermark for that segment and give us all that space back. But our goal was to take all these things that we always had outside the database and move them inside.
So for example, automatic tuning of SQL. We have bad SQL in a packaged application. This is sort of a list of steps. You're not expected to read these. Of what you would do in 9i versus 10g, in 9i, you'd have to first identify that there is a problem. This query is running for like five hours.
There's the problem. Then you'd have to try and figure out, well, could the query actually run in less than five hours? And I'd play around with it, and I'd tune it, and I'd find, hey, you know if I rewrote it like this, or I hinted it, or I did this, the query runs really fast. But it's in a third party application.
So you can't change the code. You can't change the query. So you call up their support, and you spend a couple hours convincing them that there is a problem. And then maybe a couple of days later or a couple of weeks later, they develop a patch. And then you install this patch, which is sort of downtime as you're patching your system. And one of three things will happen.
One thing that could happen is nothing changes, doesn't fix the problem on your particular box. The other one is it could actually fix the problem, but the more probable outcome is the installation of that patch destabilizes 50 other modules inside of this system, and you end up worse off than you were in the beginning. In 10g, what's going to happen or does happen is the database identifies bad SQL. It automatically pops to the top and it says, you've got this SQL. It is a problem.
We need to take a look at it. It's got many things it could do to it. It could advise you on indexes, materialized views. It could tell you how to rewrite it, but those aren't always achievable in this third-party software environment. The last thing it can do is actually analyze a query now, much like DBAs have been able to analyze tables. It's going to take that query. It's going to say, look, it's a 10-table join. The problem is we don't understand the relationship between table 1 and table 2. Let's understand the relationship between table 1 and table 2, given this predicate that you're using.
The database analyzes this query, stores extended statistics in the data dictionary. Then the next time you parse and optimize that query, it says, you know, when I join these two tables together, without this extended statistics, I thought I'd get 5,000 rows. But now I know because of these extra statistics, I'm going to get 5 million rows out of this join. I'm going to come up with a totally different query plan.
The DBA can take any query, analyze it, store that information in the data dictionary, so that the next time that query is parsed, it comes up with the right plan all by itself. It cuts out that loop of go back to the developers who wrote the query in the first place and try to figure out a better way to run this query, give the optimizer more information.
Out of memory errors. This is the one that I used before. Hey, I'm running a query and I get this Aura 4031 or an Aura 4030 out of memory. What do I do? Well, I call the DBA. The DBA has to look and say, okay, yeah, the buffer pool is a little bit bigger than we need it. Let's shrink the buffer pool and we'll grow the shared pool. And now the problem's gone away.
If I'm using the automatic memory management inside the database in 10g, well, that space is intentionally left blank because that's not going to happen in that particular installation. If the shared pool runs out of memory, we're automatically going to look and identify where we can get some from and slide it over there.
View this session if you are an in-house developer looking for a highly scalable database on Mac OS X Server. What do I do? Well, I call the DBA. The DBA has to look and say, okay, yeah, the buffer pool is a little bit bigger than we need it. Let's shrink the buffer pool and we'll grow the shared pool. And now the problem's gone away.
The last example, mistakenly dropped table. It's the problem of right table, wrong database. It's that late at night as the pinky leaves the inner key, you get that cold feeling and your stomach just falls through and you go, oh, I did a bad thing. Nine Eye, we'd prepare an auxiliary instance. We'd use RMAN to restore the table space.
We'd do a point-in-time recovery of that table space to right before the drop, and then we'd transport that data out and move it over. This is probably at least an hour because we're going to tapes. We're going to get backups. We're going through this process that we don't go through every day in order to recover this. 10g, we now treat tables as though they're like recycle bin objects. When I drag something into my trash, it doesn't actually delete it. It sort of hides it.
I can go back and unhide it later. When you drop a table, we don't actually drop it anymore. We sort of hide it from you, and we give you the ability to undrop this object at a later point in time. Instead of your system being down for an hour or so while you're recovering this table, it's back up immediately because as soon as you drop it, you undrop it and bring it back.
And then the last part we're going to look at here before getting into HTMLDB is the install, create, and configure again, but also the ongoing software maintenance. This is a quarter of their time is spent dealing with software-related issues, not even data-related issues, really. Enterprise Manager version 4. This is what we also call our grid control. Its main purpose in life is managing groups of systems and these grids, if you will. They took the database functionality out of Enterprise Manager.
Many DBAs looked at Enterprise Manager and said, that's the tool I use to create users and things like that, administer my database. We moved that all into the database console. The database has that functionality now. Enterprise Manager needed to find something else to do, a different purpose in life. So it came up with, I'm going to manage your systems. I'm going to manage your software. I'm going to fix the problem that we have of this create, configure, install, patch, maintain.
Two topics that I picked out of this, the two that intrigued me most. One is automated software cloning. That DBA has asked, hey, install this database for this prototype. Install this database for this development. Install this database for QA. Install this database for production. They're doing the same thing five, six, seven times over and over again.
They can now install the software. I'm going to install and create and configure an OLTP-ish database. Then I'm going to install and create and configure sort of a data warehousing database. It's going to have partitioning and other things on it. But I'm going to have these reference installs so that when somebody comes and provisions a database for me, I don't have to get the CDs out anymore. I don't have to go to the machine. I don't run the installation process. I point it at an existing Oracle install and I either just clone the software or I clone the software.
And the data along with it. So instead of being an hour-long process to say install, create, and configure a database, it becomes a 30-second operation because you're setting this up through Enterprise Manager and it's happening in the background. True, you have to come back in an hour and verify that everything worked, but you're not sitting there watching a blue bar go across, hitting a next prompt, filling in the defaults. It's an automated process.
[Transcript missing]
At this point, I'm going to turn it over to Mark. He's going to spend about 20 minutes talking about HTMLDB, and then we're going to have a quick Q&A session. So, Mark.
All right, so HTMLDB is something that's built out of the Reston Development Center. It's kind of started as a small little project to build a calendar and turned into a sizable component of... 10g. I've got a couple slides and I'm really just going to get into the demo, so I'll switch over there in a minute. What is HTMLDB? Okay, I'm going to watch my clock, watch this, try to look at this.
I'm really bad at reading like you all do. So HTMLDB is a declarative web-based development environment. What does that mean? Well, that means you build it with a browser. You don't have to install any client software. You run it from a browser. Where you build it is where you run it. You don't have to do that, but you can do that if you wish. It's also a feature of the Oracle 10g database. You can stick it in older 9IR2 databases with a loss of some functionality.
For instance, the recycle bin Tom talked about is exposed in HTMLDB. So what does it offer? I'm not going to go over that because Tom talked about all the ilities, the securability, the integrity, scalability. We know all that. It just exposes all that for you. And again, it's built for the web.
So if you're like a lot of organizations, you've got your spreadsheets out there. You've got your Microsoft Access databases, your file servers. You've got it all spread all over. And you can read all the studies that say, oh, it takes 13 man hours to find a file in the server. And I've heard some companies have one website for two different people and all that good stuff.
So this is about consolidation. You use HTMLDB. You take your spreadsheets, and we will. You take your desktop databases, which we won't do in this example for time, and your files. You can put them all on the 10g database. And then you can look at them with your standard browsers. It's not IE specific.
God knows how many times I've gone to a web page where it says you've got to run IE6. We don't do that. You can run all of those browsers right there. Works just fine. The development team is a small bunch, but they care about all the browsers out there. They don't want to isolate anybody.
So there's four major components to it. There's the application builder. There's the SQL workshop, the data workshop, and administration. I'll go over each one of those tools in a little more detail. Yeah, we're going. We're going well. Am I going too fast? Anyone? All right, great. So the application builder is best explained.
A SQL workshop is what you think it is. You can go create SQL, edit SQL, save SQL, create SQL from objects. Say you've created a database, tables, 10 tables, indexes, functions, triggers, and you misplace that file. You're like, oh, now I've got to go rebuild that script. You can point HTMLDB to the database.
You can point HTMLDB to the table and say, rebuild the DDL, and it'll spit it all back out. We'll be able to look at all the attributes of a database, a trigger, a function, look at the source code, do all that good stuff with the SQL workshop. The data workshop allows you to load data in via XML, via cut and paste with the spreadsheet upload, via export import. It's all right there.
And then administration. So this is a, if you buy your rack of XSERVs and you've got a lot of data in there, you can go ahead and do that. You've got 20 development environments in there. You don't want to give them their own database. You want to take one database spread across all those XSERVs and just slice them up.
So there's a provisioning aspect to it. So Tom gets his 20 megabytes because he's special. I get my two megabytes because I'm not as special. And then what I see is completely independent of what Tom sees. He won't be able to touch my stuff. I won't be able to touch his stuff. And that's part of using the virtual private database part of 10G. the good stuff, the good stuff. So I'm going to go over here. Oh, good. Cool. Get rid of that.
So I've already logged into HTMLDB. What I'm going to do is I'm going to take a spreadsheet. I'm going to put it into the database. I'm going to have it converted into a table. I'm going to make a couple lookup columns on them. Then I'm going to build an application based on that table. I've already logged in. So let's go first go to the data. Actually, where's my spreadsheet? Where's my spreadsheet? I'm so glad I didn't delete that. That would have not made a good demo.
So what this is is a spreadsheet of a bunch of hardware we've got. I'm going to just copy that. Go back to Safari. I'm going to import the spreadsheet data. I'm going to create a new table. And these tables are real tables like anybody else. They're not second-class citizens. They're not some special objects as metadata.
The whole environment runs off metadata. These are real tables. And we'll pop over to SQL and prove that because people ask, oh, you're putting this in some freaky repository, aren't you? No, we're not. It's just regular. Go over here. Paste the data. Don't worry if it looks funky. It'll come good.
So it took the columns and made a good guess as to what they are. Well, the first one is a text field, so is the second one. CPU speed, we're pretty good. We're pretty sure that that's a number. The purchase date, we're real sure that that's a date. And then so on and so forth.
If we wanted, we could go on and do some other stuff with it. And then it gives you a couple columns here. And if you don't want a column, you can say no. I'm not going to say this is going to be better than SQL Loader, if you're familiar with that tool. But it beats having to try to get a login to the server. It's all web-based. So we'll name this table hardware.
Thank you for watching. I'll see you next time. I said, what's a sequence? I said, it's an automatic number that's generated by the database for you. And what HTMLDB is doing is doing all that for you. It's automatically giving you a sequence number so you don't have to try to manage your own index numbers for collisions. I've done it myself. I've tried to do that. I've had a key violation myself trying to do it.
Not here, Tom. I learned that before. And then it'll create the primary key. So this is all the good stuff that it gives you. So we're going to go ahead and create the index. Let it do that. Create the sequence. Name is already used. I forgot to drop that before. Let me go back. Give it a different table name. Hardware 2. So, can we cut that out of the video? So there we go. 37 rows succeeded, zero failed.
Woo hoo. So we're going to go back over to the application builder. And you can see there's already a sample application here. When you install it, the way to deploy a sample application is you click a button and boom. Click on the button and it just goes and runs.
You don't have to deploy anything. It's just a bunch of SQL that it runs to create the database, create the application for you. And that's what they do when you provision a user. So we're going to go create a new application. We're going to have it based on existing tables.
This user has a schema associated with it called Apple. If I wanted to, I could have multiple schemas. This is the default schema that I told it when I provisioned this user. So it's going to look in this schema for the table.
[Transcript missing]
The nice thing is, as a developer, I don't have to manage state. If you're familiar with maintaining session state, it does it all for you.
You don't have to worry about, oh, when I go from page A to page B, collect up all the information in the form fields and save it off into some memory space. It does it all for you. So it really makes developing applications pretty quickly. Then I'm just going to say finish. It's going to chug.
Just go run it and see what we've got. We've got basic security on it. You as a workspace manager can pick what you want. Right now, this is using database users, and as a workspace manager, you can create those yourselves. If you want, you can use LDAP-based authentication.
You can use single sign-on authentication. I don't think most of any of you hopefully have Microsoft Active Directory, but you can use that as an authentication source so you don't have to convert all this data and move it over and keep it in sync. It'll do it. So let me log in.
So this is what it generated. It generated an application with a couple tabs and a couple components to it. So let's go look at the hardware. Let's go analyze it. It threw in some charting by default. It took all the columns in the table and said, okay, these are all the columns you have.
Let's go show a chart. We'll do that. It's an SVG chart, so you can do all that zooming and whatever. If you want, you can click on one of the components. It'll show you a SQL query of just that data. Literally, all this was is I just typed in, it typed in, I didn't do anything. You saw me. It typed in select star from hardware to where, what did I say, brand equals Dell.
Great. Then if I wanted to, I could go do it myself. I can go edit these. If you've done web-based development, you're always familiar with the create, update, delete functions. This takes care of all that. It's kind of the entity relationship builder that it has in there. It went out there and did it. It knew that this was a date, so it said, oh, I'm going to just go throw a pop-up date picker right there.
Then I can go apply the changes. I don't have to worry about commits or anything like that. It takes care of that. I can worry about it if I want, but I don't have to. This is just a sample application. You can go to town. I have a couple example applications of what some people have built with it. I mean, we're just using straight tables here, but you can use spatial data. There's an example of that down the road.
And then, so if we wanted to, we can, as a debugger, we can flip it in that mode and it'll show you everything that it's doing as it's rendering. But right here you can see all the session data. You're like, well, what is that value? I've got a bug. Maybe I don't.
But you can see the value right here and see what it is, what it thinks it is. And so it's a nice way to just keep track of it without having to do it if you're familiar with C. PrintFs all over the place or systems that out there, printlens.
You've got all that right there. The other thing that I wanted to do is go back and edit the application. And I told myself I was going to forget and I sure enough did. I forgot to create the... The lookup table on that. So let me go over to the SQL.
There's the recycle bin. I'm going to go browse the tables. Where's the tables? Find my hardware table. For someone like me who doesn't live in the SQL world, I love it. Tom lives in the SQL world. He's a SQL Plus jockey, among other things. So he loves that. I love this because I can just drill in here and start looking at stuff and have to worry about it.
This was the trigger that it put on there automatically. So I'm going to go create a lookup table for this. And I'm going to pick the brand looks good. I'm going to take that brand. I'm going to say next. It's going to make a new table out of it. It's going to soak up all that info and put it in there. and then index in table and then replace that.
So if you wanted to, you could see where was that? View the data model. And you can see that brand lookup is now related to hardware 2. This isn't a really fancy referential integrity, but you can get to the point when it gets larger. It can be pretty useful. So we'll just go back over here to the builder. Let's just do this one more time for reinforcement. Go back, pick hardware 2.
[Transcript missing]
So right there you see the brand ID now turns into a pick list and it's smart enough to know that, look at the referential integrity, see the relationship and create an appropriate type for that. That's pretty much HTMLDB in a nutshell from getting your spreadsheets out of everybody's desktops off everybody's desktops, consolidated into a database. I did a project for a boat racing group and they had spreadsheets all over the place.
Files here and there, spreadsheets everywhere. We told them to use this tool before it wasn't even produced and they were quite happy with it. They had literally sailors out there developing HTMLDB applications. How am I looking on time? Seven minutes? Okay. I'll let that run. It's not even showing anymore.
Can you flip back to that one real quickly? I apologize. So anyways, this was a spatial data thing. You can see that somebody-- we tracked a vehicle from, I guess, their home. They apparently went down to the airport. When somewhere else. This is all data running out of the database.
The map is generated by the spatial data system, and it's just making requests. You can pretty much drop these type of objects on there. If we wanted, we could zoom. You could see this person tends to speed a little bit. I think that's a 65-mile-an-hour toll road, though. So that's the type of stuff you can make. Anyways, it's another little bullet point of what gee whiz stuff there is available. So if we could flip back to the presentation. Thank you.
So we talked about how it works a little bit. You can see that it's centrally deployed and managed in your single database, be it running on one laptop like I have or a rack of XSERVs and X-RAID systems. I've got one of those in the lab. I thoroughly enjoy it.
Access anywhere via browser. We literally do development on the plane with our browsers, take the application, put it back up into the system, and it's just as easy as moving applications like that. And it's declarative. I don't have to go out there and say, if this condition is that, then do this. You can just say, you know, show me this text field only when I'm an administrator. Don't show it when Tom logs in.
Show an administrative tab only when the person who logged in is a member of this LDAP-specified search hierarchy. So it's a little simpler for some people, but you can also go to town and create normal PL SQL procedures, Java stored procedures, anything that's callable from the databases is a fair game.
This is what it looks like. That's pretty much it. As they say, that's all, folks, with that. You've got the little yellow box. That can be one machine. It doesn't have to be. But basically, you've got your database. It runs HTMLDB. You don't have a middle tier. You don't have edge servers and cache servers and all this kind of stuff.
It can be one machine. We typically do that. You can split it out if you want to have your HTTP server. If you have a HTTP server running somewhere else in a real world, you probably would because you want to have a firewall and all that good stuff with the load balancers. And then that's pretty much it. So the installation for the companion CD that's available on OTN has it.
It's a really simple install. You don't have to sit there and say, okay, now where does this go? How does that hook up? Invariably, somebody always says, well, are you trying to get rid of your portal product? How is this going to fit? And then you get to J2EE stuff.
And we're not trying to displace J2EE by any means, especially since we have our own product. But you can look at it and you can see that J2EE is standards based. This is HTMLDB. It runs PLSQL. That's what it's built in. It's taking advantage of all those wonderful 10G features that Tom talked about. The development approach is a little more ad hoc.
Somebody says, hey, I've got this spreadsheet. Can you do this? Build an application, track something, do whatever. It doesn't mean you've got a lot of silly little applications that are one-time use. You can, but it's more, it leverages itself for that as opposed to J2EE, where if you want to do a right J2EE application, you've got to know what you're doing. You have to design it well and scale it.
And then the development environment, J2EE, the Eclipse, the Xcode, the JDeveloper, those are all phenomenal IDEs. But if you've just got a web browser, you have to go somewhere and they don't allow you to bring your computer with you like the CIA. You've got a web browser, you use theirs and you're good to go.
And then it's for really the non-professional developers out there. Like I said, sailors have been coding applications, non-trivial applications I might add. They literally got the do SQL in 21 days type of book and they were ready to go. So that's a little flavor of what HTMLDB is and isn't and how it relates. relates to J2EE.
If you want to try it out, you can go download it, install it yourself, and go to town. If you want to just get started in 10 minutes, just go to this website, htmldb.oracle.com. It's free. You log in. You ask for a workspace. It'll get provisioned automatically within an hour or so, depending on if the guys are still there.
Then you can talk. There's a discussion on OTN forums. Oh, how do I do this? How do I do that? Then you can also go to the HTMLDB studio where there's examples of how to do this, how to do the... If you don't want to build that common look and feel from the application builder, how do you make your own master detail relationships? How do you put a multiple row edit? All these... They have viewlets out there. You can look at them. It shows you how to do it. And you can... Also go to htmldb.oracle.com/otn is where the official home page is for that product.