Graphics • 1:07:44
Image IO is Mac OS X's unified architecture for opening and saving popular image file formats. View this session to learn how Image IO's Quartz-friendly API simplifies working with TIFF, PNG, JPEG, and JPEG 2000. Additionally, Image IO supports high dynamic range (HDR) formats, such as OpenEXR and floating point TIFF, that extend visual fidelity far beyond today's 32-bit images. View this session to learn about Image IO and HDR imaging. This is a must-see session for developers working in digital video, cinema, and photography.
Speakers: David Hayward, Luke Wallis, Gabriel Marcu
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper and has known transcription errors. We are working on an improved version.
Thank you, Travis, for the introduction, and thank you all for coming to today's session on high dynamic range imaging with Image IO. What I want to talk about today and what you'll learn is about the new, exciting, emerging field of high dynamic range imaging and how you can take advantage of it today in Tiger using a new facet of Quartz called Image IO.
But before I talk about those two fields and the people who'll be coming up and talking about it in more detail, I want to give a brief update on what's new in ColorSync for Tiger. Because ColorSync is one of the key pieces of technology that allows for the proper rendering of both standard and high dynamic range images. So let me give an update on ColorSync for Tiger. We'll be talking briefly about adding floating point support in ColorSync, use of core foundation types, some API changes we'll be making, some notes for developers of custom CMMs, and some changes to ColorSync Utility's user interface.
So first and foremost is floating point support. As Travis mentioned earlier, one of the things we're trying to do for Tiger is provide a new high fidelity cinematic graphic environment for Tiger. And in order to achieve that, we need full floating point support throughout the entire system. And one key piece of that is ColorSync.
So in order to achieve this, the first thing we needed to do was to have a new bitmap structure in ColorSync for supporting arbitrary bitmaps of floating point data that your application can pass to us. We wanted to make the structure as flexible as possible so that you wouldn't have to repack the data before you send it to us.
So this new structure supports both chunky and planar arrangements of data and allows the channels to be in any arbitrary order. The way we achieve this, and where this structure differs a little from other bitmap structures you may have seen, is that instead of having a single base address for all the pixel data, we actually have a separate base address for each channel.
This allows the channels to be in any order. We also allow both row bytes and column bytes to be specified, which lets your data be scanned in reverse order if needed, or lets us skip over unusual packing between channels.
So it's a fairly basic structure, but in most cases people will be passing in a buffer of chunky or interleaved data. So we provide a simple utility function called CMFloatBitmapMakeChunky, to which you supply a single base address, and it will fill in the structure appropriately for you.
In either case, whether you fill in the structure by hand or call this helper API, once you have a source and a destination float bitmap, you can call ColorSync to match data from one space to another. We have three functions to do this. The first is CMConvertXYZFloatBitmap, which allows you to convert between all the CIE-related color spaces: XYZ, Yxy, Lab, and Luv.
There's also another function, CMConvertRGBFloatBitmap, which allows you to convert between the RGB-derived spaces: RGB, HSV, and HLS. Both of these functions are based on textbook formulas, so there's no need to pass in a profile or color world to do the transform. It just does the math for you with floating point precision.
The last is probably the most interesting. This is the new API CMMatchFloatBitmap, which lets you pass in a color world reference to perform the actual transformation. You create the color world by concatenating one or more profiles. The data is then sent through to the CMM, and if the CMM supports floating point data, the match is done in full floating point precision.
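Here is a minimal sketch of the float-bitmap matching just described, assuming the declarations in ColorSync's CMFloatBitmap.h match the names used in this session; check the header for the exact signatures and color space constants, as they may differ slightly from what is spoken here.

```c
#include <ApplicationServices/ApplicationServices.h>

static void MatchFloatPixels(CMWorldRef world,         /* built by concatenating profiles */
                             float *srcPixels,         /* interleaved (chunky) source floats */
                             float *dstPixels,         /* interleaved destination floats */
                             size_t width, size_t height,
                             CMColorSpace srcSpace, CMColorSpace dstSpace)
{
    /* Wrap the chunky buffers. For planar or unusually packed data you would fill in
       the CMFloatBitmap fields (per-channel base addresses, row/column strides) by hand. */
    CMFloatBitmap src = CMFloatBitmapMakeChunky(srcPixels, height, width, srcSpace);
    CMFloatBitmap dst = CMFloatBitmapMakeChunky(dstPixels, height, width, dstSpace);

    /* Push the pixels through the color world; a CMM that implements the float
       entry point keeps the whole transform in floating point. */
    CMMatchFloatBitmap(world, &src, &dst);
}
```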
One of the other changes we've made to ColorSync is integrating it more closely with the Core Foundation types. The key way we've done this is that the two common ColorSync opaque data types, the CMProfileRef and the color world ref, are now CF types. This is quite convenient because it means you can now call the CF-based functions such as CFRetain and CFRelease, and it also means you can add profiles and color worlds to dictionaries or arrays.
This is handy if you're passing profiles in dictionaries around to other parts of your code. The other place we've adopted a Core Foundation type is in getting the data out of a profile. One of the questions I often hear from new users of ColorSync is, I've got this profile reference, how do I get the data out of it? In the past, that was done by calling either CMCopyProfile or CMFlattenProfile. Now it's much easier: you can just call CMProfileCopyICCData, and it will return all the data within the profile as one giant CFData.
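A minimal sketch of that call, assuming the Tiger-era ColorSync declarations for CMGetSystemProfile and CMProfileCopyICCData:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Returns the ICC data of the system profile as a CFData (caller releases), or NULL. */
static CFDataRef CopySystemProfileICCData(void)
{
    CMProfileRef prof = NULL;
    if (CMGetSystemProfile(&prof) != noErr)
        return NULL;

    /* Profiles are now CF types, so they can also go into CFArrays or CFDictionaries
       and be retained with CFRetain/CFRelease if needed. */
    CFDataRef icc = CMProfileCopyICCData(kCFAllocatorDefault, prof);

    CMCloseProfile(prof);
    return icc;
}
```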
The next thing I want to talk about are some API changes we're making for Tiger. Several years ago, one of the features we added to ColorSync, at both the API and the user interface level, was a set of preferences, so that applications would have one place to go for specifying default profiles based on usage or color space.
And at the time, we hoped that this would be a way of simplifying the user interface across a wide variety of applications. And so we presented both API and user interface to help with this. In practice, however, it's turned out that very few applications have used this API.
And what we're left with is user interface in ColorSync utility where people say, "I can't figure out what this does because nothing I change here seems to make a difference." So we're listening to the usage and we're actually beginning the process of deprecating the API and also the user interface.
That said, we still want the applications that were using this API to continue to function correctly. So what we're doing is changing the behavior of CMGetDefaultProfileBySpace, CMGetDefaultProfileByUse, and CMGetPreferredCMM.
Instead of storing their preferences as a setting that's global across the machine, it will now be stored in the current application, current host, current user domain. So the APIs will still function, but we're deprecating them. This has some ramifications also in the UI, which I'll talk about later.
I also want to take this time to talk a little bit about custom CMMs. One of the things we've been doing over the last few years is building even tighter and more powerful integration across the graphics system as a whole, notably Quartz, printing, and color management. In order to achieve this with high performance and high reliability, we have made it so that Quartz and printing will only use the Apple CMM.
That said, we have a long tradition of allowing developers to write their own CMMs and of allowing applications to call them if they wish. It's still possible for an application to have a custom CMM and to explicitly create a color world using that CMM; the recommended API for this is now NCWConcatColorWorld.
And this API has an easy, convenient way for you to specify which CMM to use. The other thing to mention for CMM developers is that there's a new entry point for CMMs for matching float bitmaps. If your CMM supports it, you get full floating point support throughout the rest of the Quartz system. If you don't support it, the data will be truncated to 16-bit integers and everything will still work.
Lastly, I want to mention some changes we're making to ColorSync Utility. As I mentioned earlier, we're deprecating the preferences APIs for default profiles, and one visible manifestation of this is that we're removing that user interface from ColorSync Utility. However, we're adding something in its place: a new pane in ColorSync Utility, which we call the calculator. So let me give a brief demonstration of this on Demo 2.
So over here in ColorSync Utility, everything looks similar except that the preferences pane is no longer the first item. Instead we have a new item, the calculator, which provides a very simple way to convert colors between all the various color spaces using floating point precision.
This is a convenience that also provides a good way to demonstrate our floating point data path. So obviously, we can specify our source color space and our destination color space. If we're just converting RGB to HSV, we can see the slider values. We can update the sliders on the left, and they update on the right.
One thing you'll notice is that because RGB and HSV are related color spaces, there are basic formulas converting between them. As a result, the color on the left will be the same as the color on the right. If we switch to CMYK, you'll see something slightly different: now it's going through a profile, and if I go to a saturated color, you'll notice that the color on the right is desaturated.
One of the other things we added is the ability for it to be fully symmetrical. So now, instead of just updating on the left, I can also update on the right, and it will show the converted values going the other way. This is also an interesting way to test out a CMYK profile.
We can specify that we want to input LAB values and output to CMYK. And as we scroll through all the possible LAB values, we can see what the resulting CMYK values will be. So that's the brief demo of the color calculator. We hope that's a useful function. So back to slides.
So the next thing I want to talk about is something that's all new for Tiger, which is this new facet of Quartz called Image IO. As Travis alluded to earlier, we wanted to provide a new API for reading and writing images in a variety of formats, and this is Image IO. We'll be talking today about its features, its goals, what formats it supports, the clients of this API, some of the core concepts you need to understand to use this API, and some advanced techniques as well.
So what are the features of Image IO? First, we want to be able to read and write a wide variety of file formats. We also want to support reading and writing metadata, as well as incremental loading for clients such as web browsers that receive data incrementally over a slow connection. We also want floating point support, because that's one of the key initiatives for graphics in Tiger.
We also want to have broad color space support and something called cacheable decompression. Let me say a little more about this. Typically, APIs for reading and writing image file formats have one of two behaviors when it comes to decompression. With the existing Core Graphics APIs, every time you draw the image, it's fully decompressed each time. This obviously has the advantage of very little memory overhead, but it's a performance hit if you draw the image more than once.
Other APIs have the behavior that the first time you draw the image, it's fully decompressed and kept around, which obviously requires more memory but has the advantage that subsequent draws perform quickly. There are merits to both approaches, and so with Image IO we try to allow for both. Not all file formats support both approaches, but wherever possible, we support both philosophies.
Here are some of the overarching goals for Image IO. First and foremost was to reduce code duplication. It turns out there were an embarrassing number of different variants of JPEG readers and writers and TIFF readers and writers within our system, and they all had different strengths and weaknesses. If you were trying to write an application that read and wrote images, you had to choose between those strengths and weaknesses. We wanted a single reference implementation within the system, used in as many places as possible, so that we have a single place to make changes in the future.
Another goal was to leverage open source so that the behavior of our APIs is consistent with other implementations. And to improve performance: this is one of the other key things. We've been spending a lot of time with the vectorization team at Apple to make sure that our key file formats decompress with optimum speed.
Another feature is lazy decompression, in the sense that if all you need is the height and width or the metadata of an image, you shouldn't have to fully decompress the data. So we want to support that as well. Lastly, we wanted a very modern, Core Graphics-friendly, easy-to-use API that you can all easily adopt in your applications.
So one of the first questions I always get when I'm talking about Image IO is, well, what formats do you support? We support all the standards for the internet: TIFF, JPEG, PNG, GIF, and JPEG 2000. These are already supported on the developer CD that you got this week.
We're also supporting some exciting new formats, such as the high dynamic range formats OpenEXR and Radiance, and some important variants on TIFF, such as LogLuv and some Pixar variants. There are also countless other formats we're going to be supporting: BMP, PSD, QTIF, SGI, ICNS files. And we're considering more, both for Tiger and beyond.
So, the clients for Image IO. Obviously, we hope that anyone who wishes to use this API is free to use it in their application. But there are also lots of places within the system that will be calling Image IO, so you may get the benefits of Image IO without having to change your code at all.
Probably the first and most important client for Image IO is the Preview application. It has been a great example of the power of the new Image IO and some of the advantages you can get from it; it's making strong use of this new API. AppKit will also be switching over. It hasn't yet in the current developer release, but AppKit will be switching over to the new Image IO API as well.
WebKit and its clients, such as Safari, Mail, and any of your applications that use WebKit, will be using Image IO. Core Image is using Image IO to load data in floating point format. Spotlight is using it for generating thumbnails and getting metadata. And some of our scripting technologies, such as sips and Image Events, are also using Image IO. So we're trying to use this everywhere in the system.
Next, I want to give an outline of the API in Image IO, but before I do that, I want to talk a little bit about how images are organized, so you can understand why we designed the API the way we did. In previous systems, the standard way of representing an image in Core Graphics was with a CGImageRef, and this was a great basic format for representing images.
It allows you to specify three things: the geometry of the image, such as its height, width, rowbytes and pixel size, the color space of the image, which can be a profile or other equivalent description of the color space, and the actual pixel data. This is the minimum information you need to describe an image.
However, it turns out that there are a lot of file formats out there, and they are actually quite elaborate in many cases. So one of the things we wanted to support in Image IO was a richer model for images. For one thing, we wanted to be able to support thumbnails and metadata for images.
Also, a lot of file formats, such as TIFF, support multiple images within the same file, so we want to make sure we support that as well. And there's a set of attributes that apply to the image file as a whole rather than to the individual images contained within it: the file format of the image, such as whether it's TIFF or JPEG, and also some properties that apply to the file as a whole. For example, TIFF files can be big-endian.
Here's an example of how this works in practice, using a TIFF file. The file type is public.tiff, which is a uniform type identifier that describes this image as being of type TIFF. We have some properties that apply to the file as a whole, such as the file size in bytes and the endianness of the TIFF. And then we have the standard information for each image, such as its height and width, its color space, its pixel data, its thumbnail if present, and its metadata, such as copyright and artist information, you name it.
So here's how this model is reflected in our API through data types. We use the existing CGImageRef to represent the geometry, color space, and pixel data. The thumbnail is also represented by a CGImageRef. The metadata and the file properties are represented as key-value pairs in a CFDictionaryRef. So it's all very simple.
So now I can talk a little bit about the API. What we've added is a new data type called CGImageSource, and this is the opaque type used for reading images from either memory or disk. You can create a CGImageSource from a CFURLRef, from CFData, or with a CGDataProvider.
Once you have a CGImageSource, you can query it for several attributes. You can ask for the properties of the file as a whole using CGImageSourceCopyProperties. You can ask for its file type by calling CGImageSourceGetType. You can get the count of images using CGImageSourceGetCount. Then, for each image, you can ask for the image itself, its thumbnail, and its metadata.
So it's pretty simple. Just to show you how this works, here's a little code sample that, given a URL, gets the first image out of the file. It also returns some simple metadata; in this case, just the DPI of the image in the horizontal and vertical directions. The first thing this code does is call CGImageSourceCreateWithURL, which creates our data type for subsequent access to the file.
Then we want to get the set of properties for the first image, so we call CGImageSourceCopyPropertiesAtIndex, and that returns a dictionary. We can then query that dictionary to see if it has the DPI width and height properties and return those to the client. Lastly, we need to actually return the image, so we call CGImageSourceCreateImageAtIndex, and that returns the image to the caller.
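A minimal sketch of the pattern just described, using the Image IO function names as they shipped in Tiger (the spoken names above differ slightly in places):

```c
#include <ApplicationServices/ApplicationServices.h>

/* Returns the first image in the file (caller releases) and fills in its DPI, if present. */
static CGImageRef CreateFirstImage(CFURLRef url, double *dpiWidth, double *dpiHeight)
{
    CGImageSourceRef src = CGImageSourceCreateWithURL(url, NULL);
    if (src == NULL)
        return NULL;

    /* Properties of the first image come back as a CFDictionary keyed by kCGImageProperty… constants. */
    CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(src, 0, NULL);
    if (props != NULL) {
        CFNumberRef w = (CFNumberRef)CFDictionaryGetValue(props, kCGImagePropertyDPIWidth);
        CFNumberRef h = (CFNumberRef)CFDictionaryGetValue(props, kCGImagePropertyDPIHeight);
        if (w) CFNumberGetValue(w, kCFNumberDoubleType, dpiWidth);
        if (h) CFNumberGetValue(h, kCFNumberDoubleType, dpiHeight);
        CFRelease(props);
    }

    CGImageRef image = CGImageSourceCreateImageAtIndex(src, 0, NULL);
    CFRelease(src);
    return image;
}
```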
Here's another example for getting a thumbnail out of an image. Image IO is very flexible for creating thumbnails. As it turns out, some file formats support thumbnails, some don't. Also, with some file formats, the thumbnails can be quite large. Your application may need to have control over how thumbnails are returned, and we provided that with the Image IO API via an options dictionary.
In this case, we're again creating a CGImageSource by specifying a URL. Then we create an options dictionary with two key-value pairs in it. The first key is kCGImageSourceCreateThumbnailFromImageIfAbsent. This tells Image IO that even if the file doesn't contain a thumbnail, it should create one from the actual image instead, so we'll always get an image back for the thumbnail.
The second key-value pair we specify is kCGImageSourceThumbnailMaxPixelSize. This lets us make sure the thumbnail is of a reasonable size, which is especially important if you've specified the previous option. So in this case, we're saying that we always want an image to be returned, and we want it to be no bigger than 160 by 160 pixels.
Once we've created that dictionary, all we do is call CGImageSourceCreateThumbnailAtIndex, specifying the image source, index 0, and the options dictionary, and the thumbnail is returned. This is, for example, the way that Spotlight creates thumbnails for images in the search results field.
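A minimal sketch of that thumbnail request, using the key names that shipped in the Image IO headers:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Returns a thumbnail no larger than 160 pixels on a side (caller releases), or NULL. */
static CGImageRef CreateThumbnail(CFURLRef url)
{
    CGImageSourceRef src = CGImageSourceCreateWithURL(url, NULL);
    if (src == NULL)
        return NULL;

    int maxSize = 160;
    CFNumberRef maxNum = CFNumberCreate(NULL, kCFNumberIntType, &maxSize);

    const void *keys[]   = { kCGImageSourceCreateThumbnailFromImageIfAbsent,
                             kCGImageSourceThumbnailMaxPixelSize };
    const void *values[] = { kCFBooleanTrue, maxNum };
    CFDictionaryRef options = CFDictionaryCreate(NULL, keys, values, 2,
                                                 &kCFTypeDictionaryKeyCallBacks,
                                                 &kCFTypeDictionaryValueCallBacks);

    /* Returns the embedded thumbnail if there is one, otherwise one made from the full image. */
    CGImageRef thumb = CGImageSourceCreateThumbnailAtIndex(src, 0, options);

    CFRelease(options);
    CFRelease(maxNum);
    CFRelease(src);
    return thumb;
}
```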
So those are the basics of reading with Image IO. Here's what we do for writing. We have another data type, CGImageDestination, which can be created with a CFURL, CFMutableData, or a CGDataConsumer. At creation time, you also specify the type of the file, whether it's a JPEG or TIFF for example, and the capacity, that is, the number of images the file will hold.
Once you have a CGImageDestination, you can specify the properties for the file as a whole using CGImageDestinationSetProperties. And then you can repeatedly add each image with various options and metadata at the same time using CGImageDestinationAddImage. Lastly, you can flush the file out to either the URL or to the data by calling CGImageDestinationFinalize. And that returns true if the image was successfully flushed.
Again, let me give a short example just to show how easy this is to add to your application. We have a function called WriteJPEGData, which takes a URL and an image to write and a DPI to specify in the metadata. First thing we do is we create an image destination with a URL, specifying that it's going to be of type JPEG and that it's got one image in it.
The next thing we do is specify a dictionary with three keys and values for options and metadata. One option we're specifying is the quality of the JPEG; in this example, a quality of 0.8. The other two key-value pairs are for metadata: the kCGImagePropertyDPIWidth and kCGImagePropertyDPIHeight keys.
In this case, we're just creating CFNumbers based on the value that was passed in. Once we have this dictionary, we call CGImageDestinationAddImage to add the image and its options and metadata to the CGImageDestination. And lastly, we call CGImageDestinationFinalize to write the file to disk. So it's pretty easy.
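A minimal sketch of that WriteJPEGData pattern; the quality key is assumed here to be the constant that shipped, kCGImageDestinationLossyCompressionQuality:

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdbool.h>

static bool WriteJPEG(CFURLRef url, CGImageRef image, double dpi)
{
    /* One image, of type public.jpeg. */
    CGImageDestinationRef dst =
        CGImageDestinationCreateWithURL(url, CFSTR("public.jpeg"), 1, NULL);
    if (dst == NULL)
        return false;

    double quality = 0.8;
    CFNumberRef qualityNum = CFNumberCreate(NULL, kCFNumberDoubleType, &quality);
    CFNumberRef dpiNum     = CFNumberCreate(NULL, kCFNumberDoubleType, &dpi);

    const void *keys[]   = { kCGImageDestinationLossyCompressionQuality,
                             kCGImagePropertyDPIWidth,
                             kCGImagePropertyDPIHeight };
    const void *values[] = { qualityNum, dpiNum, dpiNum };
    CFDictionaryRef options = CFDictionaryCreate(NULL, keys, values, 3,
                                                 &kCFTypeDictionaryKeyCallBacks,
                                                 &kCFTypeDictionaryValueCallBacks);

    CGImageDestinationAddImage(dst, image, options);  /* options carry both settings and metadata */
    bool ok = CGImageDestinationFinalize(dst);        /* actually writes the file */

    CFRelease(options);
    CFRelease(qualityNum);
    CFRelease(dpiNum);
    CFRelease(dst);
    return ok;
}
```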
So those are the basics of Image IO. I hope I've given the impression that this is a very simple and easy API to add to your application. And again, some of these benefits you'll be getting for free if you're using AppKit and other technologies. Let me talk for a minute about some of the more advanced techniques that come up when we talk about image reading and writing, such as extracting ARGB data, requesting the depth of an image, and loading an image incrementally.
So one of the common questions we have is, well, an image has been returned from ImageIO, but I don't know what color space it is. I don't know what depth it is. I don't know what pixel format it is. And I have an application that only works in RGB. That's a common scenario. And this is an interesting piece of code that makes it very easy to convert the data, no matter what format it came in, into ARGB.
Basically, the technique is to use a CGBitmapContext to render the original image into an offscreen buffer. One advantage of this is that it takes care of all the color management correctly. If the image happened to be a Lab or CMYK image with a profile, it will be correctly color managed into the RGB color space you're working in.
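A minimal sketch of that technique: render the CGImage into an 8-bit ARGB bitmap context so the bytes come out in a known, color-managed format.

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdlib.h>

/* Returns a malloc'd ARGB (premultiplied) pixel buffer for the image; caller frees. */
static void *CreateARGBPixels(CGImageRef image, size_t *outBytesPerRow)
{
    size_t width  = CGImageGetWidth(image);
    size_t height = CGImageGetHeight(image);
    size_t bytesPerRow = width * 4;

    void *pixels = calloc(height, bytesPerRow);
    CGColorSpaceRef rgb = CGColorSpaceCreateWithName(kCGColorSpaceGenericRGB);

    CGContextRef ctx = CGBitmapContextCreate(pixels, width, height,
                                             8, bytesPerRow, rgb,
                                             kCGImageAlphaPremultipliedFirst);

    /* Drawing converts from whatever the source color space was (Lab, CMYK, ...). */
    CGContextDrawImage(ctx, CGRectMake(0, 0, width, height), image);

    CGContextRelease(ctx);
    CGColorSpaceRelease(rgb);
    *outBytesPerRow = bytesPerRow;
    return pixels;
}
```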
Another interesting question is the depth of an image. Some formats only support one pixel depth; for example, JPEGs are always 8 bits per sample. Other formats can support arbitrary pixel depths; for example, TIFFs can be 1, 2, 4, 8, or 16 bits per sample. As a rule, the image returned by Image IO will be the same depth as that indicated by the file. So if you open a 16-bit TIFF file, you'll get a 16-bit CGImageRef.
However, in the case of high dynamic range file formats, it gets a little more complicated. The data in these file formats is typically stored in special encodings, which can then be decoded in a variety of ways. The values can be unpacked to floating point, in either 32- or 16-bit formats, or to integers with 16- or 8-bit precision. Also, in the decoding process, the values can either be left as extended-range values or be compressed to the logical 0-to-1 clipped range.
Both of these are reasonable types of values to return, and your application may want one versus the other. By default, Image IO will return an image ref that's compressed to 16-bit integers. This gives the best results with reasonable memory for the typical application. However, by request, an application can specify that it wants the floating point, unprocessed data returned.
Here's a brief example that shows how to do this. This is a code snippet that, given a URL, will request that the data be returned in floats. And if the data is actually returned as a float, a boolean will be returned to specify that it was actually floats. The way we've done this is, as you've seen from the previous examples, we create an image source, and we specify an options dictionary, which has as one of its key value pairs, CGImageSourceMaximumDepth with the value 32.
At this point, we can then ask Image IO to get the properties of the first image, given those options. And this will return a dictionary. We can then query that dictionary to see if it has floating point data or not. Then lastly, we can get the image and return that to the client.
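A hedged sketch of that request. The session names a maximum-depth options key; this sketch assumes the equivalent option that shipped in the Image IO headers, kCGImageSourceShouldAllowFloat, together with the kCGImagePropertyIsFloat property for the check.

```c
#include <ApplicationServices/ApplicationServices.h>

/* Asks for float data and reports whether the image really came back as floats. */
static CGImageRef CreateFloatImage(CFURLRef url, Boolean *isFloat)
{
    const void *keys[]   = { kCGImageSourceShouldAllowFloat };
    const void *values[] = { kCFBooleanTrue };
    CFDictionaryRef options = CFDictionaryCreate(NULL, keys, values, 1,
                                                 &kCFTypeDictionaryKeyCallBacks,
                                                 &kCFTypeDictionaryValueCallBacks);

    CGImageSourceRef src = CGImageSourceCreateWithURL(url, options);
    if (src == NULL) { CFRelease(options); return NULL; }

    *isFloat = false;
    CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(src, 0, options);
    if (props != NULL) {
        CFBooleanRef f = (CFBooleanRef)CFDictionaryGetValue(props, kCGImagePropertyIsFloat);
        if (f != NULL)
            *isFloat = CFBooleanGetValue(f);
        CFRelease(props);
    }

    CGImageRef image = CGImageSourceCreateImageAtIndex(src, 0, options);
    CFRelease(options);
    CFRelease(src);
    return image;
}
```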
Another advanced technique I wanted to make sure people knew we supported is incremental loading of images. I won't go into too much detail, but the basic idea is that you create an image source in an incremental fashion using CGImageSourceCreateIncremental, and then you repeatedly add updated data to the image source. Each time you add data, you can request a new image, and it will give you a partial image, or the complete image if it's fully loaded.
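A minimal sketch of the incremental pattern, assuming the shipped CGImageSourceCreateIncremental and CGImageSourceUpdateData calls:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Create the source once, up front:
       CGImageSourceRef src = CGImageSourceCreateIncremental(NULL);
   Then, each time more bytes arrive, call this with everything received so far. */
static CGImageRef CopyLatestImage(CGImageSourceRef src,
                                  CFDataRef accumulatedData,  /* all bytes so far, not just the new ones */
                                  Boolean finalChunk)
{
    CGImageSourceUpdateData(src, accumulatedData, finalChunk);

    /* May return NULL or a partial image until the data is complete.
       Release the previous partial image before asking for a new one. */
    return CGImageSourceCreateImageAtIndex(src, 0, NULL);
}
```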
Once you're done with the partial image, you can release it, and then once you've added more data, you can get a new updated image. It's important that you release it before you ask for a new image. So let me give a brief demonstration of Image IO in action. One of the things I want to show first is the new Preview, and I've got a bunch of images here.
One nice thing in Preview is that you can open all the images just by selecting a folder. I've got a variety of images in here. One of them is a Lab image, and we can verify that by going to Tools > Get Info. This shows the metadata that's been obtained using Image IO, and we can tell from the metadata that the color model is Lab.
We have a variety of other images. We can zoom in and zoom out. The thumbnails over here were obtained using Image IO as well. We have high dynamic range images here. We can zoom in and zoom out on that. Luke later will show how we can manipulate these images in real time. Here's another interesting example which I like to show people.
This is one of the things we use for testing. Oftentimes, people want to know, well, how do I know if the profile is being used? What I have here is a black and white CMYK document that has a profile in it that makes gray values disappear.
So if this image were rendered and the profile were ignored, you'd see the text "the embedded test profile is not used." Because the profile is being used, you can't see it here: there's actually a gray word "not" right here. So it provides an interesting test of whether your profile is being respected or not. In this gray version, you can kind of see a hint of what was once there, the word "not." This is a great way of testing images. We really should distribute these at some point.
One other example of using Image IO, I have a test application which shows some of the options. So let me go to open one of the images we just saw, look at desktop images, and open up this image here. We can see some information, the height and width, and how long it took to draw.
One thing we can do is we can specify that we'd like to see what this would look like if it was progressively loaded. If I open up another image, if I open the high dynamic range image, this is a big image, unfortunately, so it takes a couple seconds to open.
If we bring up the metadata on this, we go to Window, Metadata. We can see that it has height and width, and its depth is 16. This is because by default we return 16-bit integers. However, if we want to return it as 32, and again, it'll take a second or so.
This code still needs to be optimized someday soon. We need to bring up the metadata. And now we can see that there's a new property in here saying that the data is returned as floats. So that's the introduction to Image IO. I'm going to pass the microphone, the demonstration, and all the new stuff over to Luke Wallis, who will be talking about high dynamic range imaging. Thank you.
Thank you, David. Today I will be talking about Mac OS X support for high dynamic range imaging, which is a new and exciting feature we are adding in the Tiger release. As many of you know, high dynamic range imaging is generating a lot of interest and is still a subject of very active research. We could talk about high dynamic range imaging from many different points of view, but what I would like to do today is concentrate on answering three very simple questions.
What is it? Why use it? And how do we process it? Before we try to answer these questions, let's take a quick look at the current status quo in digital image processing. We can conclude that digital image processing is, in the majority of cases, dominated by what is called the output-referred approach.
What this means is that the requirements of image reproduction impose certain requirements on the way we acquire and create images. Because most of the devices we are dealing with, like displays and printers, can only handle 8-bit data per color channel, we impose the same requirement on digital cameras, which in fact could produce about an order of magnitude more data if they were not restricted in this way.
Obviously, there are some advantages; this is not done for no reason. The main one is that very minimal image manipulation is required before displaying or printing such an image. But there is the disadvantage that we are losing a lot of color and image information that could be used in further image processing and could result in much higher quality display or print.
Oops, sorry, wrong direction. Another requirement, which is sort of hidden in the output-referred approach, is that the data is exchanged in one predefined color space, and in the most typical case this is sRGB. So when you look at this slide, you see I drew the shape of a typical exchange color space; let it be sRGB.
That color space covers only a part of the visual gamut. Everything is fine as long as the camera is acquiring color data within that triangle. But if we are outside, then we are out of luck. We have to do something with this color, typically pushing it into the color space. This can be done with different methods, but because cameras are not very sophisticated in terms of processing power, we very often use simple gamut clipping.
And as we know from practice, gamut clipping can produce really bad results, like, for example, hue shifts. Here is a maybe slightly strong and exaggerated example of what could happen, but this is real clipping, in which the white color, because of clipping, became a mixture of completely unrelated colors.
So that is what we can conclude when we look at image processing from the point of view of a device's capability to reproduce the image. What I would like to do now is look at image processing from a slightly different perspective, the perspective of human vision. As we know from the very rich research in this area, color and visual acuity are two of the most important characteristics of the scene. And not only that, these two depend on luminance and the observer's visual adaptation.
We know that we can measure world luminance, and it covers a range of values between 10 to the power of minus 6 and 10 to the power of 8, when measured in candelas per square meter. But what is important for us is that different ranges of luminance create different illumination conditions.
That illumination can stretch all the way from a very dark environment, through starlight, all the way beyond sunlight. Now, I could spend a lot of time talking about the physiological and psychological mechanisms controlling our vision, but without going through those details, let me say that humans have three types of vision, which depend on the level of luminance. We have scotopic vision, which works when we are in a dark environment. We have mesopic vision, which works in dim environments. And finally, when we are in a high-illumination environment, we switch to photopic vision.
Why is this division important? Because our quality of vision is related to the type of vision in use. As we know, if we look at something in a very dark environment, we have no color vision and very poor acuity; everything in the darkness seems to be just a shade of gray. On the other hand, our best vision is in the photopic range, where we can see many colors and have good color and visual acuity.
This is not everything. What is very important is that humans have a limited simultaneous range, which also depends on the type of illumination. Here I'm showing the widest simultaneous range, which again exists in photopic vision and can cover on the order of 3 to 4 orders of magnitude. But if we try to estimate this simultaneous range for poorer vision, the value can drop to around two orders of magnitude.
So, we may ask ourselves why this is all important. Well, I think there is an answer. Because if we want to represent faithfully the scene that we want to process through image processing, we should have a mechanism to encode the data the same way, or at least as close as possible to the human vision fidelity.
So now, let's take a look at where in this picture we can fit the typical 8-bit display. As we know, the typical 8-bit display can cover a luminance range of about two orders of magnitude. That is a big discrepancy between the human simultaneous range and the dynamic range of a display. So this is the biggest challenge we are facing: we have to map the relatively wide human simultaneous range into the low dynamic range of our display device.
There is one solution which we already know about: output-referred digital photography. We are imposing
[Transcript missing]
This is a small color space, and the only thing we can do is choose between different options. This is a simplistic view in which we may say, well, if I want to expose the details in the highlights, I can use a short exposure.
But if I want to see the details in the shadows, I can sacrifice the details in the highlights and use a long exposure to capture what I wanted. The most important point is that this applied exposure is permanent. Once we burn it into the image, there is no way back.
So at this moment I will try to answer the question, what is high dynamic range? I think we can define high dynamic range as a special encoding of the image data that allows us to preserve the full fidelity of human vision. From the implementation point of view, high dynamic range imaging is based on color values that, first of all, extend over at least four orders of magnitude, can encompass the entire visible color gamut, and allow values outside of the typical 0-to-1 range.
In summary, what this means is that in high dynamic range imaging, we are no longer limited to a specific color space; we are trying to encompass, as I said, all visible colors. On the other hand, we need to remember that we no longer have a convenient, ready-to-display or ready-to-print image. High dynamic range data requires some kind of manipulation before it can be displayed. But the big advantage is that we can make this decision at the moment we need to reproduce the image, with our own preferences, instead of burning it into the image.
This is a simple explanation of how we can do that. We can go back and select the short exposure or the long exposure. But most importantly, we can implement something which was not needed before, which is tone-mapped rendering, and that allows us to achieve completely different results. For example, here I can try to combine in one image the details from the highlights with the details from the shadows.
So now I'll try to answer the question, why use high dynamic range images? The most important reason is to preserve the scene-referred information that can be useful in further image processing. This way, we avoid intermediate encoding with a restrictive color gamut, which is what happens in the output-referred approach, and we also avoid the irreversible modifications that happen during image acquisition.
How do we process high dynamic range images? The simplest answer is that we should not add any rounding or clipping errors. For that, we want to render and capture the data in floating point, store the entire range, and, if needed, process the color data in an extended color space, which again will not impose any clipping. At the end, we want to apply tone mapping for a specific image reproduction. For example, that specific reproduction could be the example I just showed you, where I want to see all the details in the image, from the highlights to the shadows.
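As an illustration only (this is not an Apple API), the kind of reproduction-time decision being described might look like the following sketch: a chosen exposure and display gamma applied to scene-referred floats, with clipping deferred to the very last step.

```c
#include <math.h>

/* Maps one scene-referred float value to a displayable 0..1 value.
   The exposure is chosen at reproduction time, not baked into the file. */
static float ToneMapPixel(float sceneValue, float exposureStops, float displayGamma)
{
    float scaled = sceneValue * powf(2.0f, exposureStops);  /* pick the exposure now */
    if (scaled < 0.0f) scaled = 0.0f;                       /* clip only at the very end */
    if (scaled > 1.0f) scaled = 1.0f;
    return powf(scaled, 1.0f / displayGamma);               /* encode for the display */
}
```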
Now, let's take a look at the file formats that we are supporting in Tiger. I think that the most important citizen here is OpenEXR that comes from ILM. First of all, it has the smallest quantization error. And most importantly, as you will see later, it comes with the recommended way of tone rendering, which solves a lot of problems in terms of presenting the image content. The other formats basically just define the way of encoding and decoding data with preserving the image fidelity.
So now I would like to show you, if I can get my demo machine, my little application in which I can open high dynamic range images. What I would like to show is that we have to do something with those values, which are so large, much bigger than what we can represent with the typical range of 0 to 1.
One would think a very simple approach would be to simply map the brightest point in the image to the brightest point of the display. But if I do that with my little demo application, you see that we don't see much in this image: there is way too much information beyond 1, and scaling didn't produce a usable image. Another very simple approach could be, okay, show me whatever is in this image, clipped to the typical 0-to-1 range.
Well, as you see, the image quality somewhat improved, but it's still very poor. Now, if I use OpenEXR, with its default exposure value of 0, I'm getting a reasonable result and I can see many more details. And not only that, I can do what I was talking about: I can impose my preference at the moment of reproducing the image. For example, someone may like this kind of image, or someone else may want to focus on this beautiful stained glass.
I want to show you a couple of classic examples, like the famous Memorial Church picture, which comes from the Debevec website. The same thing happens here: if we just scale the image, it's basically unreadable. Clipping will show something, but the quality is really poor. OpenEXR does a very good job here. Another example is the picture I was using in the previous slides of our garage at Apple.
This is how it looks when scaled, and this is how it looks when clipped; once again, a typical hue shift from clipping the data. And OpenEXR produces quite a reasonable result. This leads us to the conclusion that tone rendering is a very important issue when processing high dynamic range images, and there may be many different methods of doing it. I think this gives me a very good segue to introduce Gabriel Marcu, who will be talking about the high dynamic range tone mapping developed at Apple.
[Transcript missing]
How we create high dynamic range images is the next topic, and this is quite interesting. We can start with the file format of these images: we have to encode the radiance of the scene in RGB floats. So how do we capture this? We turn to a method published by Debevec and Malik, "Recovering High Dynamic Range Radiance Maps from Photographs." Essentially, this method requires taking multiple shots of the same scene at different exposures and then combining these exposures into a high dynamic range file.
We start with the block diagram of a digital camera. If you look closely, you can see that the scene radiance is transformed into the digital output in the digital file, which may be a JPEG or another format, by a set of transformations. First, the image passes through the lenses, then the shutter, then the image is captured by the CCD and converted by the A/D converter, and then some mapping happens in the camera, for example gamma correction or the raw-to-JPEG transformation, and you finally get the digital values in the file.
[Transcript missing]
I choose a number of shots at different exposures, and we get a thumbnail view on the left side. Here we can select any of these images and see their content. You'll notice immediately that no matter how we take these images, you can see either the details in the shadows or the details in the highlights, but not both.
The high dynamic range file will be able to capture all the information about the radiance of the scene and encapsulate it in a high dynamic range format. The first step is to calculate the transfer function of the camera, and we do this in a single step.
You can see several curves put together, and from them you recover the transfer function of the camera. The next step is to use this transfer function to compute the high dynamic range image. Now, even in the paper, it says that to do this processing you need to recover the transfer function of the camera over many, many images, until you get an average behavior of the camera, and then use that transfer function to create the high dynamic range image.
We actually added more robustness to the algorithm that computes the transfer function of the camera, such that we are able to recover the transfer function from just the same set of images that we use to create the high dynamic range image. So we recover the function from this set of images, apply it to these images, and create the high dynamic range image.
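For illustration, here is a sketch of the combination step from the published Debevec-Malik method (not Apple's implementation): with the recovered response g and the exposure time of each shot, the radiance at a pixel is a weighted average over all the shots.

```c
#include <math.h>

/* Radiance at one pixel, one channel.
   pixelValues[j]      : the 8-bit sample Z of shot j at this pixel
   logExposureTimes[j] : ln(delta_t) for shot j
   g[z]                : recovered camera response, indexed 0..255 */
static float RadianceAtPixel(const unsigned char *pixelValues,
                             const float *logExposureTimes,
                             const float *g,
                             int shotCount)
{
    float num = 0.0f, den = 0.0f;
    for (int j = 0; j < shotCount; j++) {
        int z = pixelValues[j];
        /* Hat weighting: trust mid-range samples more than near-black or near-white ones. */
        float w = (z <= 127) ? (float)z : (float)(255 - z);
        num += w * (g[z] - logExposureTimes[j]);
        den += w;
    }
    return (den > 0.0f) ? expf(num / den) : 0.0f;  /* average of ln E, then exponentiate */
}
```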
This brings us to an algorithm that can do all of this by just specifying the set of images. So we choose a set of images; then the algorithm computes the transfer function of the camera and computes the high dynamic range image in a single shot. This makes the application independent of any camera settings or setup that may have been required for the method published in the literature. So, let's say you are switching to another set of images, for example.
From a different camera: you don't have to specify the camera, and you immediately get the high dynamic range file directly from specifying only the set of images. The interesting thing about this is that we want the Apple user to have less intervention in this algorithm, less guesswork, and finally end up with an application that can provide high dynamic range images directly, without having to take care of anything. And this is an advantage for the user.
Finally, I would like to mention that, as you have seen here, we select the images from a folder, but we have worked with Image Capture, so I invite you on Friday afternoon at 5 p.m. to see an integration of this algorithm with the Image Capture modules. You will see an interactive demonstration of how these images are captured live with the camera and a high dynamic range file is created. And with this, I thank you very much, and I will turn things back to Luke.
Thank you, Gabriel. At the end of this presentation on high dynamic range images, I'd like to touch on a subject that is very close to the hearts of us ColorSync engineers: we are really interested in color managing high dynamic range images. You should know that this area is under very intensive investigation, both in academia and in industry.
At Apple, we are also developing our own method of color managing high dynamic range images, and we are trying to take a new approach based on human adaptation to the image viewing environment. We think that the image contains enough white point and adapting luminance information to be used by a color appearance model to predict human perception of color in different viewing environments, which basically means we can color manage our high dynamic range images.
Just to clarify what kind of color appearance modeling we are dealing with, let me say that we are looking at modeling based on two major concepts. The first is chromatic adaptation, which allows us to predict the influence of the adopted white point on color perception.
The second concept is the degree of adaptation, which allows us to predict simultaneous color contrast related to the luminance of the adopted white point. So, in summary, what we are trying to do is transform the colorimetry of the source, using the high dynamic range data, to our destination. Then, after color managing and bringing the data to the new environment, we apply tone mapping, which compresses the colors to the range of the destination device. That concludes our talk on high dynamic range images, and I'll turn the microphone back to David. Thank you.
Thank you, Luke and Gabriel, for your discussion. I just wanted to bring it back just to do a quick summary slide, and then we'll have a few minutes of Q&A at least. Just wanted to summarize once again what's new in Tiger. We've got a lot of great stuff here. First of all, in ColorSync, we have floating point support.
In Image IO, we have a brand new modern API for reading and writing, with optimized performance and support for metadata. And we're doing a lot with high dynamic range: supporting the OpenEXR file format, with access to compressed or unprocessed data. This is also an area of all sorts of ongoing and future research, so we'll have lots to show you. Again, there are a few other places you might want to go.
There's a Graphics and Media Lab session on Thursday where you can talk to us if you have more questions than we can get to today. And there will also be some great demonstrations in the last session on Friday, talking about Image Capture and how to use it with high dynamic range.