General • 52:52
Over the last year Apple has released many new hardware and software technologies targeted at the needs of technical, scientific, and high-performance computing. These technologies are spurring increased developer support and scientist adoption of Mac OS X. This session will review the technological advancements Apple has made over the last year with respect to the sciences, the market momentum Apple is experiencing in the scientific markets, and the variety of initiatives underway throughout Apple to support scientific computing on the Mac.
Speakers: Elizabeth Kerr, Bud Tribble, Fons Rademakers, Falko Kuester, Alexander Griekspoor
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper; it has known transcription errors. We are working on an improved version.
Good morning everybody. Welcome to Mac OS X in Scientific Computing. For those of you that I haven't met, I think there's quite a few of you, my name is Liz Kerr and I head up the Marketing Group for Science and Technology at Apple. This is my third developer conference and I have to say I'm thrilled to be here.
The momentum that we've seen over the past few years in this market is tremendous and it's really very much tied to the scientific apps that you all bring to the platform. So I'm going to spend a few minutes to kick off this morning's session to talk a little bit about that momentum and also to talk about how my team is working to keep that going both as a team and with our developers, with you guys.
So the first thing I want to talk about is just some interesting trends we've seen in the growth of the science markets in three main areas. And this is really being driven by new applications and better applications in all of these areas. In things like medical imaging with programs like OsiriX that keeps getting better, and new developers coming to the platform like a group called Calgary Scientific. And we even found out this year we have a medical image storage system that's FDA approved that has come to the platform. Really exciting stuff in medicine.
And engineering programs like LS-DYNA finally coming to Mac OS X. Huge, huge benefit to our customer base and a really good opportunity for us to expand into these different areas. And then in chemistry with programs like VIDA from OpenEye that brings a three-dimensional element and just beautiful graphics to the platform to expand what we have to offer for our chemistry customers.
One of the things that has been so exciting to see from an Apple perspective is just how fast our science developers switched to Universal Binary apps and provided those out to the community. Really faster than any other sub-market area that we saw. This is just a graphic to show (we're obviously not all the way to February yet), looking at the top 100 scientific applications, just how quickly those came over.
I mean it seemed like we had at least a handful even when we announced the Intel processors. It was fantastic. So this is something that we're thrilled about, you guys should be extremely proud of, and it's just a real testament to how dedicated you are to providing the science customers with the latest and greatest versions of your applications that take advantage of our technology.
This is a little bit of a digression, but one of the things we always get asked about, especially from our scientific computing customers and developers, is Fortran compilers. So I wanted to spend a minute and talk about the Intel Fortran compiler that's available for Mac OS X. This, of course, supports the Core Duo processors. It's got an auto-vectorization feature, which is new, not something you could use for PowerPC.
There is whole-application optimization with the interprocedural optimization feature. And then if you have an appropriate training workload, you can also get further optimization of your app with profile-guided optimization. Now this is called the Professional Edition, and there will be special pricing for the Fortran compilers through December, so through the end of the year. So something to speak to the Intel folks about if you're interested.
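To make those three features concrete, here is a rough sketch of what a build might look like with the Intel Fortran compiler of this era. The flag spellings (-ipo, -prof-gen, -prof-use) come from Intel's documentation; treat the exact invocations as an illustration, not a recipe from the session:

```
# Hypothetical ifort invocations; the file names and workload are assumptions.
ifort -O3 -ipo -o myapp main.f90            # auto-vectorization plus whole-application
                                            # (interprocedural) optimization
ifort -O3 -prof-gen -o myapp main.f90       # instrumented build for profile-guided optimization
./myapp < training_workload.dat             # run the training workload to collect profiles
ifort -O3 -ipo -prof-use -o myapp main.f90  # rebuild using the collected profiles
```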
So, a little bit about what my team has been doing, in partnership with our developers, this past year. And just to tell you a few ways that we've been working with the developer community. The first thing I want to talk about is our conferences and trade shows, because we really do mostly partner with developers to show complete solutions on the platform, because that's what our customers are very much interested in. So at national-level conferences, like the Society for Neuroscience, as an example. But we're also doing smaller things.
Regional shows: we did a show called BIOCOM, which is a small biotech show in the San Diego area, which was really fun and a great way for us to get the message out about the Apple platform and the solutions available on the platform. And then even to the level of user groups. A lot of you probably have user groups for your customers. And just to let you know, we're always interested in playing a role here if it's appropriate. This is just an example: we participated in the OpenEye user group this year.
So for those of you that visit apple.com/science, and I hope all of you do on a regular basis, this is really the heart and soul of the science marketing group. And we have a new web page that launched yesterday, a new look and feel to the home page and some new sections of apple.com/science.
And the message here really is just that we want it to be a bit livelier and a better place for your customers and our customers to learn more about what's available on the platform. But also a new focus, this is the home page, on science solutions. And this is really critical for us: to have the right applications on Mac OS X, and to really highlight the way the technology comes together with the applications to provide customers with a real workhorse in the lab or in their office.
So this is just a screenshot of the new solutions page. And you'll see this evolve over time, as we get really nice things coming together that we can highlight on this page. And each of these, of course, will have its own subpage with much more detail.
The other thing that we do fairly often with developers is we highlight new applications to the platform. And this is an example with Bitplane coming to the platform. And again, this is a rework of this page, so it pops out a little bit more if you're brand new and we can highlight you here. But it's a great way for us to help you get the word out and have some marketing weight behind a launch on Mac OS X or an improvement or whatever it is.
The other thing that we do are profiles. You've probably seen these. We do a lot of them focused on customers, but we also do them occasionally focused on developers. This is a group called Thrust Belt Imaging that we recently published on our website. Fantastic story about this group of developers.
I would encourage you, if you feel like you have a compelling story or if you have a customer that's using your application in a cool way that you think would be a good story, please let us know. We'd be happy to consider it, and we're always looking for great profiles and stories, both from a developer perspective and a customer perspective, or a combination of both. So please let us know.
For those of you that forgot, there's something called the Mac Products Guide. And this is just to remind you that it's still here. It's still a great resource for our end users; they go here all the time to learn what's on Mac OS X. If you don't have an updated profile, or a profile at all, on the Mac Products Guide: this is where end users look, and you put your own information into the Products Guide yourself. So I encourage you all to do this if you haven't, or if you haven't looked at your entry in a long time. Keep it fresh, because this is where people look.
So another thing that we do on a periodic basis is we send emails out to our customer base. And in many cases that's hundreds of thousands of people that could get this communication. And we've been doing some of these with a focus on, again, product launches from our developers.
This one is a product called ArrayAssist from Stratagene. And it's a great solution for our customers, so we wanted to highlight this. So this is just another example of how we try to really work with you to kind of boost the message of Apple and our solutions.
I wanted to mention some community resources, just for those of you that aren't aware of them. I think most of you probably are by now. But there are some great organizations here; the first two are independent of Apple, but we work very closely with them: MacResearch.org and MacEnterprise.org. These both have websites and communities and are great forums for discussion, product reviews, product discussions, lots of information that's completely independent of Apple, so there's no editorial from us.
So it's a nice community, these are both really nice community forums. There's also the Apple Developer Connection and this is probably a resource you're really familiar with but I encourage you to review that occasionally because there's lots of new information that gets posted there that could be quite useful for you. So with that, I'm going to turn it over to Bud Tribble, who's our VP of Software Technology, and we're excited to have him here. Thanks, Bud.
[Transcript missing]
What I'd like to do is talk about the Mac and the sciences, and start out by spotlighting some trends that we see going on, and I'm sure all of you see going on as well. Science to me is one of our more fun markets. You hear a lot about the Mac being in the creative markets.
To me I like to think of the science market as being the most creative market we're in. There's a lot of change going on right now in the scientific market. I'd like to go through what are the things that are driving that and illustrate that. We have a few customers here to illustrate some of these points.
The first one is data sensor proliferation. And this is leading to an explosion of data, which we'll get into in the next section. But really what's happening is that sensors are becoming cheap, small, ubiquitous, networked with mesh networks. There's a torrent of data coming in from these. Now this will be everything from large projects that generate, in and of themselves, torrents of data.
Like the synoptic sky survey generating petabytes per year. You're going to hear about the Large Hadron Collider at CERN that again is a petabytes per year generation of data. These are pouring into disk drives and pouring into databases. As that data gets analyzed, it explodes yet again to keep around the analyzed data. All the way to small meshed networks that are monitoring microclimates.
Seismic monitoring is one example. The data is coming in with all sorts of diversity. The industry is settling, in many cases, on standard formats for this data, open formats for this data. These sensors are out there. They're always on. They're pouring in data. They're supporting scientific research. But we have to put all the data somewhere. Just right here in California, under our noses, there are 200-plus seismic stations monitoring 24/7, pouring that data into the companies and the government agencies that are watching the earthquake activity, trying to predict the next earthquake. This is a tremendous amount of data to deal with.
Which brings us to the next issue, which is data storage. As I mentioned, some of these projects are generating petabytes per year. And as fast as we can build disk drives, it's outstripping our ability to store the data. The San Diego Supercomputer Center has a project called Data Central, which they're very proud of. It has a petabyte of online storage, with about 10 times that much offline. But as you can see with these large projects, and as you'll hear about with these large projects, even that kind of storage facility rapidly overflows.
There's a recent study in Nature that made a determination that the amount of scientific data is doubling every year, so you have an exponential growth curve. I actually think that's probably a little underrepresented or underreported. If it was only doubling every year, then actually our disk drives would be able to keep up with it, because the storage capacity on disk drives is actually doubling about every year for the same price.
An extremely important thing for these online repositories is that they're accessible by multiple scientists. So many times these are open repositories that can support multiple research projects going on around the world. That's not always true. For example, in oil exploration, these databases tend to be tightly held, for reasons that are sort of obvious. But in many cases, there are multiple research groups around the world working on these data sets.
There's an increasing use of storage area networks for these systems. And that is simply because really there's no other way to manage these terabyte- or petabyte-class amounts of data. Just managing them as a flat network file system is too much of a management burden, really, for anybody.
And what SAN really brings to the story, or products like Xsan, is the ability to manage these huge data repositories. To illustrate this, I'd like to invite Fons Rademakers from CERN. He works on the Large Hadron Collider and he'll give you just a feeling for the amount of data being generated and what they do with it. So Fons, thank you.
Okay, good morning. Most of you probably have heard of CERN as the place where the web was born. But actually, you know, our core business is particle physics, pure research in high-energy physics. So there's no direct commercial offshoot from this, or goal of it. So our business is bleeding-edge physics, and to be able to achieve our goal we need bleeding-edge computing.
First of all, the CERN Large Hadron Collider is a machine that has been under construction for the last 10 years, and hopefully it will start in September 2007, producing its first collisions from proton beams that collide with a center-of-mass energy of 14 TeV, which is by far the highest energy that any accelerator has ever achieved.
The LHC is a 27 km circumference machine; it's under the border between Switzerland and France, in a very nice location, so very convenient to work at. It consists of some 1,200 superconducting magnets, cryo-dipoles. The whole system is filled with liquid helium and cooled to almost absolute zero. It generates a very high magnetic field to keep the relatively heavy protons in a tight orbit.
Of course to detect all the particles that are being generated in the collisions we built humongous detectors, very large detectors. There are four detectors being built at the four different interaction regions. Two are called ATLAS and CMS. They are general purpose detectors that will measure basically everything that happens in the collision and they will be able to analyze and determine all kinds of new possible physics that can happen in this region of energy.
Then you have two other detectors, ALICE and LHCb. They are more custom detectors for measuring really specific physics events. And ALICE, the detector I work for, also works with detecting heavy-ion collisions. So one month per year the LHC will run heavy ions, and then we will generate huge data streams.
So to summarize, the four detectors together will generate about 10 petabytes of data per year, record 10^10 events in total, and there will be about 4,000 or 5,000 physicists worldwide trying to analyze these 10 petabytes of data day and night. In the LHC, of course, the beams circulate at almost the speed of light, so you have about 40 million collisions per second happening.
Of course nobody can record that kind of data stream. So after the experiments filter, we get about 100 interesting collisions per second. Depending on the detector and the amount of data it measures, you get from 1 to 12 megabytes per collision. And with that rate you can get up, in the case of ALICE, to 1.2 gigabytes per second. That data rate will be sustained 24/7 for one month, and that's a bit more than one petabyte of data that you have to record in this one month of time.
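As a quick sanity check on those figures, here is the arithmetic using the speaker's own numbers (the assumption of a 30-day calendar month is ours):

\[
100\ \tfrac{\text{collisions}}{\text{s}} \times 12\ \tfrac{\text{MB}}{\text{collision}} = 1.2\ \tfrac{\text{GB}}{\text{s}},
\qquad
1.2\ \tfrac{\text{GB}}{\text{s}} \times 30 \times 86{,}400\ \text{s} \approx 3.1\ \text{PB}.
\]

A truly continuous month at the full rate would come to roughly 3 PB, so the quoted figure of a bit more than one petabyte presumably reflects the accelerator's actual duty cycle during the heavy-ion run.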
Like I said, 10^10 collisions will be registered; that will be about 10 petabytes, and if you put it on conventional CDs you get a stack about 20 km high, generated in one year. And that for every year, and the machine is planned to operate for the coming 15 years.
So how are we going to analyze this data? Of course you cannot have a single data center that will be able to process all this data. I mean, there's no way that you can even put the machines in one room. And besides the fact that you cannot put them all in one room, you need worldwide collaboration. Nations don't want to invest the money to put all the machines at CERN, in one location, and analyze the data there.
So all the nations prefer to build large clusters in their universities or research centers, where the local scientists will analyze the data. Of course, the competences of the people, the system managers, are also naturally distributed. And like I said, just at CERN there's not enough real estate to house a computer center of that size.
So all these computer centers will be running, 99% of them, Linux for data processing, with different versions of Linux and different hardware. So it's a heterogeneous environment that the software has to run on in reproducible ways. You cannot have a physics analysis that runs at one site return different results when it runs at a different site, with different versions of the OS or compilers. So it has to be well understood and well maintained.
To provide for the physicists an environment where they can just submit analysis jobs from a central point and not care where they run, we have been working for a long time on the concept of the grid, which has become very popular. You know, people name everything "grid" these days, but the real idea behind the grid is that you can run worldwide in a way that looks like you're running in a local data center. And actually we will need grids of grids, because for political and other reasons you don't have a single grid. I mean, people make the European grid and the US grid and the Asian grid, and they all have to work together.
So for example we have the EGEE grid, which is a European project that was funded by the European Union and is now deployed worldwide. In this test setup, because we don't have the real data yet, we have 180 sites online, providing 15,000 CPUs at the moment and running 14,000 jobs per day. Consider that for the full data analysis you will need about 100,000 CPUs online, with 400,000 disks with the data ready on them.
So how is Apple software being used in this LHC computing environment? Of course, like I mentioned before, all scientific computing is UNIX based, and Mac OS X is a very natural and very good fit there. People coming from Linux know immediately how to use it. The big advantage, of course, is that on these kinds of machines we can run the presentation and office software at the same time as the scientific software, which has still been lacking a lot on systems like Linux. We have excellent development tools. You heard the talks about Xcode and things like that. GCC compilers that we know and trust, because we use them on Linux too, and we know the results and the quality of the code.
For the Mac clusters we have, we use of course the excellent cluster monitoring software, which is quite ahead compared to the open source software that we find in the Linux environment. So it's a very nice environment. We found that the Intel compiler, the C++ compiler, because all our code is purely C++ in the LHC computing environment, delivers up to 25% faster code than GCC. So this is very important, because it's almost a CPU generation that you gain by buying or getting this compiler.
We are waiting for Valgrind, which is an excellent memory leak detection tool. We are also having some problems with GDB in the C++ environment, because it is not up to par. We are very happy that a company called Etnus released a TotalView debugger for Mac OS X a few months ago, which is an excellent debugger, and it works perfectly.
So, the hardware that we can use in the LHC environment: of course, the move to Intel, you know, made Apple a candidate to be used in our field. Because before, we used some PowerPC-based machines, but they were just not fast enough for the money that you could spend on them. So now the MacBook Pros are by far the most popular laptops that we use at CERN.
You know, basically everybody gets one. We have powerful OpenGL-based graphics systems used for event display, so the very nice infrastructure and the pipelines in the Apples help a lot. In ALICE we have, for example, starting small, a small eight-node cluster that we have configured as a full member of the EGEE grid.
So it runs all the software that is delivered by the grid middleware from the central servers, and it's a full member of the grid environment. The interesting thing is, of course, that universities that see what we do at CERN can decide now to buy an Apple-based cluster, because they know that all the grid software, the middleware that is needed, runs on the Apple machines. And before, that was not the case. And now the Intel-based Xserves have been announced, and we plan to make quite an extensive upgrade to these machines.
So to summarize, the LHC will generate data on a scale you have never seen before, nobody has seen before. We will depend critically on how the grid will work, and it has to work, otherwise we are in big trouble. And Apple, with the move to Intel, you know, gives us a fantastic way to get into this business. And they can really approach universities to set up Xserve clusters for this kind of work. Okay, that is it.
Thank you very much. So 10 petabytes per year of data being generated. I did a quick calculation. I think we're probably shipping about 500 petabytes per year of iPod storage. So if everyone would just set aside 2% of their iPod space per year we could archive all your data.
I'm not sure people would be willing to give up 2% of their music, though. Data visualization. Data visualization is one of my favorite areas. It's very visual, it's very innovative right now. Of course the trend here is driven by these huge data sets that are being generated, which ultimately come down to something that a person wants to visualize, and with a lot of data you end up with very high resolution data sets. One of the latest things is display walls, and we're going to hear in detail about the HIPerWall at UC Irvine.
But these are very important in my mind, because you find that the human brain is actually very good at visual data processing. When you're just focusing on a desktop, you're missing out on what we evolved to do over millions of years, which is to take in our entire environment by moving our head or walking around. These large display walls really let you take advantage of that human processing power in a way that's not been possible before. Of course stereo 3D is important as well, and we support that on Mac OS X.
For the first time you can do really high-end visualization tasks on desktop machines. That is actually more than anything being driven by the game industry and its insatiable need for very high-end graphics processors. And with the new Pro line that you saw, which will be able to support extremely high-end graphics processors, coupled with an easy way to program them, I think that's going to lead to even more breakthroughs in scientific visualization.
I think that if you look closely you'll actually find that, because of this pressure from the gaming industry, the abilities and the programmability of the GPU have actually outstripped the scientific community's ability to take advantage of all that power. But I think with some of the programming tools that we're making available on Mac OS X, you're going to find some breakthroughs there.
Graphics processing power has actually been growing at a faster rate than CPU power. Maybe that's because they are willing to run it hotter and put more fans on the graphics cards. But we're up to around 10 gigapixels per second of processing rate, which, again, much of the legacy visualization code does not realize and does not take advantage of.
But some of the newer visualization applications coming along are jumping on this bandwagon of, hey, I've got a 10-gigapixel-per-second GPU sitting right here. What can I do with it to give a very interactive scientific visualization experience? With that, it's my great pleasure to introduce Dr. Falko Kuester, Department of Electrical Engineering and Computer Science at the University of California, Irvine, and he's going to talk about the HIPerWall. Thanks a lot.
So I'm going to talk about some of the research challenges that we really face today when it comes to dealing with massive scientific data sets. Now one of the things we just heard over and over again is ever bigger scientific data sets are being generated today through simulations, through acquisition, or a mix of the two.
Now that's exciting, it's a very good thing certainly, right? We get more detail and hopefully we get more insight into particular phenomena we are studying. The question is how do we get there, right? Just having the data doesn't help us, we somehow need to visualize it, conceptualize it.
So in the end, we end up with a high-resolution image that we somehow have to display, right? Regardless of which dimension the data set has, we need to display it in 2D on our screen. So this could be one data set, one particular parameter, or it could be a set of parameters, a set of different images co-located that are visually displayed. So the question is which one really works better for what we are doing, in particular if you have to compare different data sets, using techniques which work so well for the human visual system, for visual data correlation.
The question is then: do we use pattern recognition, being able to compare things side by side, or do we use our temporal memory, which we often do today? We look at one image slice or one parameter set at a time, then sequentially we go through all of these data sets and try to composite that in our brain.
Doctors do that a lot, right? It's not necessarily the best way to do this. Now if we look at one sample data set, that's a confocal microscopy scan that you see in the lower left corner, of a rat brain, very similar to the human brain. 18,000 pixels by 17,000 pixels resolution, 320 megapixels total for that image. Great thing, good, very good. Now we have to map that to state-of-the-art display devices. Let's say we pick a 30-inch Cinema Display. We have 2,560 by 1,600 pixels of resolution, for 4 megapixels total.
Now that becomes interesting, right? 320 megapixels to 4 megapixels. So we essentially look at 1/80 of the information at any given time. Which means if you do a 1:1 pixel mapping, we only see 1 out of 80 tiles at any given time. We don't get the bigger picture.
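Spelled out, a quick check of those rounded figures against the exact pixel counts:

\[
18{,}000 \times 17{,}000 = 3.06 \times 10^{8}\ \text{px} \approx 320\ \text{MP},\qquad
2{,}560 \times 1{,}600 \approx 4.1 \times 10^{6}\ \text{px} = 4\ \text{MP},
\]
\[
\frac{3.06 \times 10^{8}}{4.1 \times 10^{6}} \approx 75 \approx 80\ \text{tiles}.
\]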
Or if we sub-sample the information, we throw away a lot of information that is really available to us. That's somewhat bad. What gets really ugly, though, is if you deal with medical data and the information you're looking at is a brain scan or a scan of your body, and what you're sub-sampling away is the cancer seed cell which is going to kill you in a couple of years, right? And what you didn't see might actually kill you in the end. So that's not good. So that's really ugly. So we have all the three key components: the good, the bad, the ugly. The question is how do we handle the bad and how do we somewhat address the ugly, right? So that's the motivation: high-performance, network-centric.
That's why the "IP" is in the name HIPerWall. And the objective really is to support visual analytics, meaning analytical reasoning through interactive visualization. And to really do that for large data sets, with larger collaborative teams of researchers, allowing you to look at multiple view parameters at any given time, co-locate information, have a comparative view, but do that in a form where we really have video streaming, or streaming capabilities in general, from diverse sites across the globe.
So there are a couple of really interesting, exciting, enabling research areas, many not fully answered, so there's much for all of us to do, in regards to how do we handle images, video streams. How do we best move data around? How do we cache it? Where do we cache it? We can't replicate a petabyte locally just to visualize it.
If you want to look at one particular subset of the data we just heard about, how do we visualize it in real time? How do we support parallel rendering? And in the end, how do we put ourselves into that data? How do we interact with it? Just displaying it statically doesn't help us much. We need to be able to really manipulate it. So the hardware front end we built for this research as part of HIPerWall is a 50-tile display.
Downstairs you will see an 18-tile version that you can play with. So 10 tiles wide, 5 tiles high, 50 tiles total. Each tile essentially has 4 megapixels of resolution, which gives us 25,600 by 8,000 pixels overall to work with. Now let's put that against today's gold standard, right? HD, high-definition video, 1080.
We have roughly 2 megapixels of resolution, right? So one half of one HIPerWall tile is high-def video. Or in other words, HD is 1% of the resolution that HIPerWall gives us. Why do we do this? We're sort of trying to extrapolate ahead to where we're going to be a couple of years from now, when we have organic light-emitting displays that we can glue on walls or simply spray-paint, right? How do we drive this? Or, since there are tens of thousands of displays of this particular type shipped already, what happens if you connect them today to build these pervasive visualization spaces? How do we drive that information, possibly, right? How do we get people to collaborate? So in this case, the 50 tiles are powered through 25 dual-processor, dual-core display nodes. There's roughly 50 terabytes of cached disk space storage behind it. And everything is networked essentially through fiber locally for data transport and visualization, but then globally through the OptIPuter, a dedicated optical network that we can work with.
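For reference, the comparison works out as follows (using 1920 by 1080 for HD):

\[
25{,}600 \times 8{,}000 = 204.8\ \text{MP (the full wall)},\qquad
1{,}920 \times 1{,}080 \approx 2.1\ \text{MP} \approx \tfrac{1}{2}\ \text{tile} \approx 1\%\ \text{of the wall}.
\]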
Now, the obvious challenge, right? Massive numbers of pixels have to be filled. If you just think about 24-bit RGB per pixel, we have 4.8 gigabits per frame. Now we want 60 frames per second, right? So that gets quite interesting. How do we generate these pixels in the first place? How do we fill them in the end? And how do we do it at an interactive rate, so we can actually really work with it in a meaningful form? Now, scalability and failure tolerance suddenly become a really big issue, right? When one of these tiles fails, you're suddenly at a loss. It sort of blows your entire workspace to pieces. So we decided to go with a really reconfigurable IP-based display system, which allows us primarily to add components, new resources, as they become available, quite transparently, but it also allows us to replace or remove failed components.
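The fill-rate arithmetic behind that challenge, taking the speaker's 60 frames per second as the target:

\[
204.8 \times 10^{6}\ \text{px} \times 24\ \text{bit} \approx 4.9\ \tfrac{\text{Gbit}}{\text{frame}},\qquad
4.9\ \tfrac{\text{Gbit}}{\text{frame}} \times 60\ \text{fps} \approx 295\ \tfrac{\text{Gbit}}{\text{s}}.
\]

The quoted 4.8 gigabits per frame corresponds to rounding the wall down to 200 megapixels.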
Since it's network-centric, we have access to all of the existing data repositories out there. So being able to tie into existing data sources is of course important. And then to use the visualization assets you have, the data you have, to really drive interactive visualization, data fusion, leveraging the GPUs as much as we can.
So here are just a couple of examples. And if you want to hear more about the details, or the fun stuff underneath the hood, which is really the science and research contribution, come downstairs and we'll talk a little bit. Here's one example: natural disaster management. Rita and Katrina, all of us heard about them. Quite devastating, both in loss of life and in financial impact. The problem is you have pre- and post-event imagery, for example.
How do you identify what crisis hotspots are? How do you do it in team environments so that emergency management folks can deploy field responders more intuitively? So in this case, with image processing, large-scale visualization, we have a mechanism to get a better grasp of the bigger picture, the magnitude of what really has occurred, to address it.
Now, rather than one or a couple of big images, we might be able to deal with many smaller images. So we're working with folks in brain imaging to identify particular patterns that might exist in brain-related diseases, such as Alzheimer's, Parkinson's, schizophrenia, things that should be close to all of our hearts since they're quite pervasive.
So in this case, providing a lot of visual information concurrently for many patients, being able to scroll through particular slices over time and visually identify particular patterns which may occur. And that, of course, works in 3D as well, right? Surface representations, volume-rendered or texture-mapped information, in this case brain activation on top of everything else.
Earth system science, another huge area with gigantic computational resources, hundreds of thousands of processors running problems. One example being the Intergovernmental Panel on Climate Change, which is doing a 100-year simulation of how our climate will change. 20 different groups worldwide, independent from each other, that want to compare data and figure out if there's at least one parameter that everybody agrees on, which is difficult, right? So we can't just average all the data together and visualize it.
We really need to show it co-located so teams feel comfortable that their particular simulation is maintained. But in this case also, since grids, meshes vary, information is properly represented. And it was quite beneficial because individual simulation problems jumped out right away where somebody had a parameter wrong or the timing simply set up wrong. Now, if you go from image data, of course, to 3D, right? Digital elevation maps being a great example, terrain rendering, all the good stuff you've heard about. So that was Google Earth, for example.
And then of course the next step, volume rendering and so on. One very exciting area though is remote visualization or control of experimental infrastructure from very high-end imaging systems deployed in different areas. One example in our case is an electron microscope in our building which generates image data by far exceeding any display capability existing today.
So the idea is, why don't we use this wall-type interface to really steer and control the imaging system, which might exist somewhere else, right? All we really need is that big network pipe. So in this case, what you see across the entire display is actually 300 microns, right? The lower left corner gives you a scale of 10 microns.
So that's a tremendous amount of very high resolution image data that's suddenly being processed. Now there's an interesting set of impacts that we're aiming at. And by a long shot we're not there yet, right? So everybody has a challenge to contribute to, hopefully. So HIPerWall is intended to enable analytical reasoning through interactive visualization. There are three important areas: support interactive collaboration on a somewhat larger scale than just our research lab, to help with the detection and correlation of patterns.
Leverage really the power of the human visual brain to intuit data. We're all very well trained to work in our particular domain. If you bring people together, cross-fertilize, tremendous impact. But the key aspect really is we want to enable the discovery of the unexpected while exploring the expected.
So in visualization in many cases we look at data we're already expecting. But we visualize it the way we expect it to be. Now if we find the things we haven't been able to see before, that's a huge contribution. So in the end we have to provide a means to really analyze large scale data sets in 2D, 3D or higher dimension.
And maybe that's the one innovation, right, that saves our lives in the end, hopefully. So feel free to come by downstairs and ask more questions. Feel free to talk to us in person. There are many interesting research opportunities in our group. Bold plug here, but many open jobs in hands-on research, HIPerWall-related, grid-centric, and of course on plenty of good Apple hardware.
Thanks very much, and I really encourage you to go downstairs and take a look at the smaller version of the HIPerWall. It really is an insight as to what the human brain can actually process usefully. It's a lot more than just even a 30-inch display. This last area I want to talk about is, I think, absolutely critical to the sciences: open standards, and related to open standards, open source.
[Transcript missing]
So, I would like to start by thanking Apple for inviting me to talk to you about BioCocoa today. So, in biology, sequence data becomes more and more important and plays a tremendously important role. And that is not only in really large-scale bioinformatics projects, of the size of Fons's or Falko's projects. But even for wet-lab biologists like myself, biological data in the form of sequence data formats plays an important role. We work with it on a daily basis.
So this is one, for instance, and we usually retrieve them from these large public databases that Bud was already referring to. Here, for instance, you see a very simple format in which we get the sequences, which is basically a one-line description followed by the actual sequence. But we also have more complex data formats, like this one, where you have a lot of extra metadata about a sequence, like the date it was published and the species it came from, and what not, on this slide.
This particular sequence, for instance, encodes a protein which is the molecular target of Viagra. But the big problem really is that with the big rise in bioinformatics, we saw an enormous explosion of all these different sequence formats, and it was basically because everyone was just inventing their own formats. So as a developer you're... excuse me. As a developer... I will go back to that one.
So as a developer, it's your task to basically support all those different formats. And that is a tedious task, and not really fun to do, because you basically have to reinvent the wheel all the time. And everyone has to do that. And Peter Schols from Belgium also recognized that, as a Cocoa developer.
And he decided that it would be time for a lightweight Cocoa framework that would allow you to support all these formats with only a few lines of code. Well, you saw it already. As an example, this is what he came up with. Instead of having to get into all these open standards, you now just have to implement a few lines of code. We basically instantiate a reader object, and the framework does the rest for you if you hand it the path to the file. And it returns you a dictionary with all the sequence file's information.
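For readers following along, a minimal sketch of the reader pattern being described might look like this. The class name, method name, and dictionary keys here (BCSequenceReader, readSequencesFromFile:, and so on) are assumptions for illustration, not BioCocoa's confirmed API:

```objc
#import <Foundation/Foundation.h>
// #import <BioCocoa/BioCocoa.h>  // hypothetical umbrella header

int main(int argc, const char *argv[]) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // Instantiate a reader and hand it the path to a sequence file;
    // the framework works out the format (FASTA, GenBank, ...) itself.
    BCSequenceReader *reader = [[BCSequenceReader alloc] init];
    NSDictionary *info = [reader readSequencesFromFile:@"/tmp/example.fasta"];

    // The returned dictionary carries the sequence plus its metadata.
    NSLog(@"description: %@", [info objectForKey:@"description"]);
    NSLog(@"sequence:    %@", [info objectForKey:@"sequence"]);

    [reader release];
    [pool release];
    return 0;
}
```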
Now, when I was developing our DNA sequence editor in Cocoa, EnzymeX, this was basically all I needed to have EnzymeX support all these different formats. And I could basically focus on all the pretty stuff and all the nice features, which was very handy. So, quite soon after, I and a few other Cocoa developers got involved in the BioCocoa project, and we realized that there was actually a need for a much broader framework.
Much like, for instance, the Java language has the BioJava project and Perl has the BioPerl project, etc. So, together we decided that it was time for BioCocoa 2.0. And what we wanted to do in this framework is the following. We started from scratch and we basically wanted to provide Cocoa developers with a complete set of model objects to represent biological sequences.
And of course, we would then still support all the different popular sequence formats. And together this would form the core of the framework. But on top of that, we would like to add a number of tools to play with these model objects. Like, for instance, you can imagine you want to translate this DNA sequence into that protein which is the target of, well, etc. And we would like to add advanced tools for sequence analysis, like alignment tools and that kind of stuff.
So, and finally, ideally, a set of UI elements that would allow end users to consistently deal with those model objects. And together this would give Cocoa developers a complete, powerful framework for making bioinformatics applications. Now, if you look at the framework overview which we have in mind, which you see basically here, then you can obviously also see the analogy with Cocoa. And we really try to make the framework leverage this powerful Objective-C language and the Cocoa APIs.
And as an example, we use, for instance, NSData in the BCSequence model object as a way to give developers low-level access to the sequence data, but at the same time in a very user-friendly way. So, and I'm happy to announce that actually today we released a beta version of the core of BioCocoa. So, if you're interested in that, please visit our website.
Or we will do it later today, because it still has to be uploaded. But you can find it there. I will show you the URL at the end. But we're obviously not there yet. So, we still have to add tools, and that's what we're basically doing. But that's something we could really use a lot of help with. So, if you're interested and you're a Cocoa developer targeting the biosciences, then be sure to visit our website. It's actually here: bioinformatics.org/bioCocoa.
And even better, if you already have questions or comments, then yeah, feel free to visit us at the poster session tomorrow evening. Or, I think Thursday morning from 11:00 to 12:00, we have a session in the Science Connection. So, I hope to see you there. And I think BioCocoa is a nice example of how open source can maybe tackle the problem of, in this case, too many open standard formats. And with your help, we can maybe bring it even further than that. So, thank you.
Okay, so I'm just going to do a real quick wrap up and cover some things also going on at the show. First I'd just like to say I think we've heard a really nice discussion on some of the prevailing topics in scientific computing today. You know, the diversity of data sets, the really big visuals and the power of this, the need for really big storage and how all the data that's being generated is really driving this, and then open standards.
So how do we learn more about all of these things at the show and where can you connect with people and network with folks around scientific computing? I really want to call your attention to the scientific development poster session if you don't already know about this. So this is our first year doing a poster session and I have to say the idea came from our developers last year at the developer conference.
This was a request that was made and we were really excited to be able to deliver this. We have over 50 posters being presented. It's a fantastic idea. It's a fantastic way to find out what your colleagues are doing and network with folks and just read about some really cool development going on.
This is Wednesday from 7 to 10 p.m. There's also some demos of the code being presented in these posters in the Science Connection, which is here on the third floor, Wednesday and Thursday from 9 to 6. And there's a schedule, a revolving schedule at the entry to the Science Connection if you're interested in seeing what's going to be demoed.
In addition, we've been hearing, you know, go downstairs and see the HIPerWall. That's actually in an area called Apple in the Lab. And we've got a big focus on scientific visualization there. It's on the first floor. It's open through Thursday. So take a look down there. We have the smaller version of the HIPerWall. We've also got some fantastic stereo 3D visualization stations set up for medical imaging, as well as for some molecular imaging, molecular modeling, and some other really cool technologies on display.
So take a look at that. The Science Connection, which is here on the third floor, is just a place you can hook up to the network, sit down, talk to your colleagues, take a break, look at some demos, play with some hardware, whatever. It's just a nice place to hang out. And then we're having community discussions, either in the Science Connection or right across the way, today, tomorrow, and Thursday at lunch, between 12:30 and 1:30. And the topics are around Mac OS X technology.
There are some nice sessions here at the developer conference that really relate to the topics you heard today. I'm going to leave this up for those of you that want to take a few minutes to look at this.