
WWDC04 • Session 207

High Dynamic Range Imaging with Image IO

Graphics • 1:07:44

Image IO is Mac OS X's unified architecture for opening and saving popular image file formats. View this session to learn how the Quartz-friendly Image IO API simplifies working with TIFF, PNG, JPEG, and JPEG 2000. Additionally, Image IO supports high dynamic range (HDR) formats, such as OpenEXR and floating point TIFF, that extend visual fidelity far beyond today's 32-bit images. View this session to learn about Image IO and HDR imaging. This is a must-see session for developers working in digital video, cinema, and photography.

Speakers: David Hayward, Luke Wallis, Gabriel Marcu

Unlisted on Apple Developer site

Transcript

This transcript was generated using Whisper and may contain transcription errors.

I'm Travis Brown, the graphics and imaging evangelist, and I helped to put together this graphics and media track, along with help from, obviously, the engineers who created the great technology and content. And I'm sort of generally your host for a lot of things going on with graphics and media here at WWDC. But one thing I wanted to take a little bit of time for is to talk about some big shifts that are happening in graphics, particularly in the Tiger time frame. You've noticed there have been several mentions of the fact that we're now leveraging floating point pipelines on, for example, the GPU. We also have floating point pixel support in Quartz 2D. And new technologies like Core Image are also based on floating point pixel operations. This is an important shift in graphics, because it used to be that 8 bits per component was plenty. It was enough. It was the ideal; millions of colors encapsulated it all.

But as the imaging market has matured, it's really become obvious that we need more bits per pixel, and we also need them encoded differently. And this raises new challenges that we need to solve inside the OS, not only to be able to pump those through our graphics subsystem, but additionally to be able to do things like read and write them to disk. So in this session we're going to be talking about two topics in particular. One is high dynamic range, which is going to cover several techniques in terms of how deep pixel data is encoded to put on disk, and the available file formats. Plus, once you read that deep pixel data back and need to actually make it viewable, what you need to do to it to basically create an image you can display on your monitor. And second, we're going to be talking about the reciprocal changes in the operating system: new technology such as Image IO, which is an imaging library that is going to deal with these new data formats, and also changes in areas such as ColorSync and color management. Because these changes all together sort of complete the picture: they're going to enable you to step beyond eight bits per component and start leveraging the fantastic new capabilities inside the GPUs and our imaging stack to really do new and interesting things with high resolution, high fidelity data. On that note, I'd like to invite David Hayward to the stage to take you through the session. Thank you. Thanks. Thank you, Travis, for the introduction. And thank you all for coming to today's session on high dynamic range imaging with Image IO.

What I want to talk about today, and what you'll learn about, is the new and exciting emerging field of high dynamic range imaging and how you can take advantage of it today in Tiger using a new facet of Quartz called ImageIO. But before I talk about those two fields and the people who will be coming up and talking about them in more detail, I want to give a brief update on what's new in ColorSync for Tiger, because ColorSync is one of the key pieces of technology that allows for the proper rendering of both standard and high dynamic range images. So let me give an update on ColorSync for Tiger. We'll be talking briefly about adding floating point support in ColorSync, use of Core Foundation types, some API changes we'll be making, some notes for developers of custom CMMs, and some changes to ColorSync Utility's user interface.

So first and foremost is floating point support. As Travis mentioned earlier, one of the things we're trying to do for Tiger is provide a new high fidelity cinematic graphic environment. And in order to achieve that, we need full floating point support throughout the entire system, and one key piece of that is ColorSync. So the first thing we needed to do was to have a new bitmap structure in ColorSync for supporting arbitrary bitmaps of floating point data that your application can pass to us. We wanted to make the structure as flexible as possible so that you wouldn't have to repack the data before you send it to us. So this new structure supports both chunky and planar arrangement of data and also allows for the channels to be in any arbitrary order. The way we achieved this is that the structure is a little different from other bitmap structures you may have seen: instead of having a single base address for all the pixel data, we actually have a different base address for each channel. This allows for the channels to be in any order. We also allow for both row bytes and column bytes to be specified, which allows for your data to be scanned in reverse order if needed, or to skip over unusual packing between channels.
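The per-channel addressing idea can be sketched in plain C. This is an illustrative model of our own, not the actual ColorSync structure, whose field names and types differ; it just shows why per-channel base pointers plus row and column strides cover both chunky and planar layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of a float bitmap with one base address per
 * channel, plus row and column strides measured in floats. With
 * per-channel bases, chunky (interleaved) and planar layouts are
 * just different stride choices. */
typedef struct {
    float *base[4];   /* one base pointer per channel   */
    size_t rowStride; /* floats to step one row down    */
    size_t colStride; /* floats to step one pixel right */
} FloatBitmap;

static float *pixel_ptr(const FloatBitmap *b, size_t ch, size_t x, size_t y) {
    return b->base[ch] + y * b->rowStride + x * b->colStride;
}

/* Fill in the structure for a chunky RGB buffer, in the spirit of
 * the "make chunky" helper mentioned in the session. */
static FloatBitmap make_chunky_rgb(float *buf, size_t width) {
    FloatBitmap b;
    for (size_t ch = 0; ch < 3; ch++)
        b.base[ch] = buf + ch; /* interleaved: R,G,B,R,G,B,... */
    b.colStride = 3;           /* 3 floats per pixel */
    b.rowStride = 3 * width;
    return b;
}
```

A planar layout would instead set each `base[ch]` to the start of its own plane with `colStride = 1`; the addressing function stays the same.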

So it's a fairly basic structure, but in most cases people will be passing in a buffer of chunky, or interleaved, data. So we provided a simple utility function called CMFloatBitmapMakeChunky, to which you supply a single base address, and it'll fill in the structure appropriately for you. In either case, whether you fill in the structure by hand or call this helper API, once you have a source and destination float bitmap, you can then call ColorSync to match data from one space to another. We have three functions to do this. The first one is CMConvertXYZFloatBitmap, which allows you to convert between all the CIE-related color spaces.

So XYZ, Yxy, Lab, and Luv. There's also another function, CMConvertRGBFloatBitmap, which allows you to convert between the RGB-derived spaces: RGB, HSV, and HLS. Both of these functions are based on textbook formulas, so as a result there's no need to pass in a profile or color world to do the transform. It just does the math for you with floating point precision.

The last is probably the most interesting. This is the new API CMMatchFloatBitmap, which allows you to pass in a color world reference to perform the actual transformation. You can create the color world by concatenating one or more profiles. At that point, the data will be sent through to the CMM, and if the CMM supports floating point data, the match will be done in full floating point precision.

One of the other changes we've made to ColorSync is to integrate it more closely with the Core Foundation types. The key way we've done this is that the two common ColorSync opaque data types, CMProfileRef and CMWorldRef, are now CF types. This is quite convenient because it means that you can now call the CF base functions such as CFRetain and CFRelease. It also means you can add profiles and color worlds to dictionaries or arrays.

This is kind of handy if you're passing profiles in dictionaries around to other parts of your code. The other way that we've supported Core Foundation types is in actually getting the data out of a profile. One of the questions I often hear from new users of ColorSync is: I've got this profile reference; how do I get the data out of it? In the past, that was done by calling either CMCopyProfile or CMFlattenProfile. Now it's much easier. You can just call CMProfileCopyICCData, and it will return all the data within the profile as one giant CFData object.

Next thing I want to talk about are some API changes that we're making for Tiger. Way back several years ago, one of the features we added to ColorSync, both at the API and the user interface level, was a set of preferences so that applications could have one place to go for specifying default profiles based on usage or color space. At the time, we hoped that this would be a way of simplifying the user interface across a wide variety of applications, and so we presented both API and user interface to help with this.

In practice, however, it's turned out that very few applications have used this API. And what we're left with is user interface in ColorSync Utility where people say, "I can't figure out what this does, because nothing I change here seems to make a difference." So we're listening to the usage, and we're beginning the process of deprecating the API and the user interface. We still want any applications that were using this API to function correctly. So what we're doing is changing the behavior of CMGetDefaultProfileBySpace, CMGetDefaultProfileByUse, and CMGetPreferredCMM. Instead of storing their preferences as a setting that's global across the whole machine, they will now be stored in the current application, current host, current user domain. So the APIs will still function, but we're deprecating them. This has some ramifications in the UI as well, which I'll talk about later.

I also want to take this time to talk a little bit about custom CMMs. One of the things that we've been doing over the last few years is making even tighter and more powerful integration between the graphics system as a whole, notably Quartz and printing, and color management. And in order to achieve this with high performance and high reliability, we have made it so that Quartz and printing will only use the Apple CMM. That said, we have a long tradition of allowing applications and other developers to develop their own CMMs and for applications to call those if they wish. It's still possible for applications to have a custom CMM and to explicitly create a color world using that CMM; the recommended API for this now is NCWConcatColorWorld, which has an easy, convenient way for you to specify which CMM to use. The other thing to mention for CMM developers is that there's a new entry point for CMMs, which is CMMMatchFloatBitmap. If your CMM supports this, then you can have full floating point support throughout the rest of the Quartz system. If you don't support it, then the data will be truncated to 16-bit integers, and everything will still work sufficiently.

Lastly, I want to mention some changes we're making to ColorSync Utility. As I mentioned earlier, we're deprecating the preferences APIs for default profiles, and one visual manifestation of this is that we are removing that user interface from ColorSync Utility. However, we're adding something in its place: a new pane in ColorSync Utility, which we call the calculator. So let me give a brief demonstration of this on demo two.

over here. So as we see in ColorSync Utility, everything looks similar, except there's no longer a Preferences pane as the first item. But we have a new item, the Calculator, which provides a very simple way to convert colors between all the various color spaces using floating point precision. This is a convenience that also provides a good way to demonstrate our floating point data path. Obviously, we can specify our source color space and our destination color space. If we're just converting RGB to HSV, we can see the slider values; we can update the sliders on the left, and they update on the right.

One thing you'll notice is that because RGB and HSV are related color spaces, there are basic formulas relating them. So as a result, the color on the left will be the same as the color on the right. If we switch to CMYK, you'll see something slightly different: now it's going through a profile. And if I go to a saturated color, you'll notice that the color on the right is desaturated.

One of the other things we added is that it's fully symmetrical. So instead of just updating on the left, I can also update on the right, and it'll show you the values going the other way. This is also an interesting way to test out a CMYK profile: we can specify that we want to input Lab values and output to CMYK, and as we scroll through all the possible Lab values, we can see what the resulting CMYK values will be. Thank you. So that's the brief demo of the color calculator. We hope it's a useful function. So, back to slides.

So the next thing I want to talk about is something that's all new for Tiger, which is this new facet of Quartz called Image I/O. And again, as Travis alluded to earlier, we wanted to provide a new API for image reading and writing across a variety of formats, and this is Image I/O. We'll be talking today about its features and goals, what formats it supports, the clients of this API, some of the core concepts you need to understand in order to use it, and some advanced techniques as well.

So what are the features of Image I/O? Well, first is we want to be able to read and write a wide variety of file formats. We also want to support reading and writing metadata, and incremental loading for clients, such as web browsers, that get data in an incremental fashion over a slow data connection. We also want floating point support, because that's one of the key initiatives for graphics in Tiger.

We also want to have broad color space support and something called cacheable decompression. Let me expand on this a little now. Typically, APIs for reading and writing image file formats have one of two behaviors with respect to decompression. In the case of the existing Core Graphics APIs, every time you draw the image, it's fully decompressed each time. This obviously has the advantage of very little memory overhead, but it's a performance hit if you draw the image more than once. Other APIs have the behavior that the first time you draw the image, it's fully decompressed, which obviously requires more memory, but has the advantage that subsequent draws perform quickly. There are merits to both approaches, and so one of the things we've done with ImageIO is to allow for both. Not all file formats support both approaches, but wherever possible, we support both philosophies.

Here are some of the overarching goals for Image I/O. First and foremost was to reduce code duplication. It turns out there was an embarrassing number of different variants of JPEG readers and writers and TIFF readers and writers within our system. They all had different strengths and weaknesses, and if you were trying to write an application that read and wrote images, you had to make a choice between which strengths and weaknesses you wanted. We wanted to have a single reference implementation within the system and use it in as many places as possible, so that we have a single place to make changes in the future.

One of our other goals was to leverage open source, so that the behavior of our APIs would be consistent with other implementations. And to improve performance; this is one of the other key things. We've been spending a lot of time with the vectorization team at Apple to make sure that our key file formats decompress with optimum speed.

Another feature was lazy decompression, in the sense that if all you need is the height and width or metadata of an image, you shouldn't have to fully decompress the data. So we support that as well. And lastly, we wanted to make sure we had a very modern, Core Graphics-friendly, easy-to-use API, so that you can all easily adopt it in your applications.

So one of the first questions I always get when I'm talking about ImageIO is: well, what formats do you support? And we support all the standards for the internet: TIFF, JPEG, PNG, GIF, and JPEG 2000. These are already supported on the developer CD that you got this week. We're also supporting some exciting new formats, such as the high dynamic range formats OpenEXR and Radiance, and some important variants of TIFF, such as LogLuv and some Pixar variants. There are also countless other formats we're going to be supporting: BMP, PSD, QTIF, SGI, and ICNS files. And we're considering more, both for Tiger and beyond.

So, the clients for Image I/O: obviously, anyone who wishes to use this API is free to use it in their application. But there are also lots of places within the system that are going to be calling Image I/O, so you may get its benefits without having to change your code at all.

Probably the first and most important client for Image I/O is the Preview application. It's been a great example of the power of the new Image I/O API and some of the advantages you can get from it; it's making strong use of the new API. AppKit will also be switching over. It's not yet switched over in the current developer release, but AppKit will be using the new Image I/O API as well. WebKit and its clients, such as Safari, Mail, and any of your applications that use WebKit, will be using Image I/O. Core Image is using it to load data in floating point format. Spotlight is using it for generating thumbnails and getting metadata. And some of our scripting technologies, such as sips and Image Events, are also using Image I/O. So we're trying to use this everywhere in the system.

Eventually, I want to give an outline of the API in Image I/O. But before I do that, I want to talk a little bit about how images are organized, so that you can understand why we designed the API the way we did. In previous systems, the standard way of representing an image in Core Graphics was with a CGImageRef, and this is a great basic format for representing images. It allows you to specify three things: the geometry of the image, such as its height, width, row bytes, and pixel size; the color space of the image, which can be a profile or other equivalent description of the color space; and the actual pixel data. This is the minimum information you need to describe an image. However, it turns out that there are a lot of file formats out there, and they are actually quite elaborate in many cases. So one of the things we wanted to support in Image I/O was a richer model for images. For one thing, we wanted to be able to support thumbnails and metadata for images. Also, a lot of file formats, such as TIFF, support multiple images within the same file, so we want to make sure we support that as well. And there's also a set of attributes that apply to the image file as a whole rather than to the individual images contained within it: the file format of the image, such as whether it's TIFF or JPEG, and some properties that apply to the file as a whole. For example, TIFF files can be big-endian or little-endian.

Here's an example of how this works in practice, using a TIFF file. The file type is public.tiff, which is a uniform type identifier that describes this image as being of type TIFF. We have some properties that apply to the file as a whole, such as the file size in bytes and the endianness of the TIFF. And then we have the standard information for each image: its height and width, its color space, its pixel data, its thumbnail, if present, and its metadata, such as copyright and artist information, you name it.

So here's how this model is reflected in our API through data types. We use the existing CGImageRef to represent the geometry, color space, and pixel data. The thumbnail is also represented by a CGImageRef. The metadata and the file properties are represented as key-value pairs in a CFDictionaryRef. So it's all very simple.

So now I can talk a little bit about the API. What we've added is a new data type called CGImageSource. This is the opaque type used for reading images from either memory or disk. You can create a CGImageSource from a CFURLRef, from CFData, or with a CGDataProvider.

Once you have a CGImageSource, you can query it for several attributes. You can ask for the properties of the file as a whole using CGImageSourceCopyProperties. You can ask for its file type by calling CGImageSourceGetType. You can get the count of images using CGImageSourceGetCount. Once you know the count of images, then for each image you can ask for the image itself, its thumbnail, and its metadata.

So it's pretty simple. Just to show you how this works, here's a little code sample that, given a URL, gets the first image out of the file. It also returns some simple metadata; in this case, just the DPI of the image in the horizontal and vertical directions. The first thing this code does is call CGImageSourceCreateWithURL, which creates our data type for subsequent access to the file.

Then we want to get the set of properties for the first image, so we call CGImageSourceCopyPropertiesAtIndex, which returns a dictionary. We can then query that dictionary to see if it has the DPI height and width properties and return those to the client. Lastly, we need to actually return the image, so we call CGImageSourceCreateImageAtIndex, and that returns the image to the caller.
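Assembled from the calls described here, the routine might look roughly like the sketch below. This is not the session's verbatim sample: the function names are the ones that shipped in ImageIO (CGImageSourceCopyPropertiesAtIndex and friends), and the 72 DPI fallback is our own choice, not something the API does.

```c
#include <ApplicationServices/ApplicationServices.h>

/* Open a file, return its first image, and report its DPI.
 * Returns NULL on failure; the caller owns the returned CGImageRef. */
static CGImageRef CreateImageAndDPI(CFURLRef url,
                                    double *dpiW, double *dpiH) {
    CGImageSourceRef src = CGImageSourceCreateWithURL(url, NULL);
    if (src == NULL)
        return NULL;

    *dpiW = *dpiH = 72.0; /* our own default when the file says nothing */
    CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(src, 0, NULL);
    if (props != NULL) {
        CFNumberRef n;
        if (CFDictionaryGetValueIfPresent(props, kCGImagePropertyDPIWidth,
                                          (const void **)&n))
            CFNumberGetValue(n, kCFNumberDoubleType, dpiW);
        if (CFDictionaryGetValueIfPresent(props, kCGImagePropertyDPIHeight,
                                          (const void **)&n))
            CFNumberGetValue(n, kCFNumberDoubleType, dpiH);
        CFRelease(props);
    }

    CGImageRef image = CGImageSourceCreateImageAtIndex(src, 0, NULL);
    CFRelease(src);
    return image;
}
```

Note that copying the properties does not force decompression of the pixel data; only the CGImageSourceCreateImageAtIndex call commits to that.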

Here's another example: getting a thumbnail out of an image. Image I/O is very flexible about creating thumbnails. As it turns out, some file formats support embedded thumbnails and some don't. Also, with some file formats, the thumbnails can be quite large. Your application may need control over how thumbnails are returned, and we provide that in the Image I/O API via an options dictionary.

In this case, we again create a CGImageSource by specifying a URL. Then we create an options dictionary with two key-value pairs. The first key is kCGImageSourceCreateThumbnailFromImageIfAbsent. This tells ImageIO that even if the file doesn't contain a thumbnail, it should create one from the actual image instead, so we'll always get an image back for the thumbnail. The second key-value pair is kCGImageSourceThumbnailMaxPixelSize, which lets us make sure the thumbnail is a reasonable size, which is especially important if you specified the previous option. So in this case, we're saying that we always want an image to be returned, and we want it to be no bigger than 160 by 160 pixels. Once we've created that dictionary, all we do is call CGImageSourceCreateThumbnailAtIndex, specifying the image source, the zeroth index, and the options dictionary, and the thumbnail is returned. This is, for example, the way the Spotlight technology creates thumbnails for images in the search results field.
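Put together, the thumbnail example might look like this sketch. The key names used are the ones that shipped in ImageIO (kCGImageSourceCreateThumbnailFromImageIfAbsent, rather than the pre-release name spoken in the session):

```c
#include <ApplicationServices/ApplicationServices.h>

/* Create a thumbnail no larger than 160 pixels on a side, falling
 * back to a scaled-down version of the full image when the file
 * carries no embedded thumbnail. Caller owns the result. */
static CGImageRef CreateThumbnail(CFURLRef url) {
    CGImageSourceRef src = CGImageSourceCreateWithURL(url, NULL);
    if (src == NULL)
        return NULL;

    int maxSize = 160;
    CFNumberRef maxNum = CFNumberCreate(NULL, kCFNumberIntType, &maxSize);
    const void *keys[] = {
        kCGImageSourceCreateThumbnailFromImageIfAbsent, /* synthesize if missing */
        kCGImageSourceThumbnailMaxPixelSize             /* cap the size */
    };
    const void *values[] = { kCFBooleanTrue, maxNum };
    CFDictionaryRef opts = CFDictionaryCreate(NULL, keys, values, 2,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    CGImageRef thumb = CGImageSourceCreateThumbnailAtIndex(src, 0, opts);

    CFRelease(opts);
    CFRelease(maxNum);
    CFRelease(src);
    return thumb;
}
```

There is also a kCGImageSourceCreateThumbnailFromImageAlways key for when an embedded thumbnail exists but is the wrong size for your purposes.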

So that's the basics of reading with Image I/O. Here's what we do for writing. We have another data type, CGImageDestination, which can be created with a CFURL, with CFMutableData, or with a CGDataConsumer. At creation time, you also specify the type of the file, whether it's JPEG or TIFF, for example, and the capacity: the number of images the file will hold. Once you have a CGImageDestination, you can specify the properties for the file as a whole using CGImageDestinationSetProperties. Then you can repeatedly add each image, with various options and metadata at the same time, using CGImageDestinationAddImage. Lastly, you flush the file out to the URL or the data by calling CGImageDestinationFinalize, which returns true if the image was successfully written.

Again, let me give a short example just to show how easy this is to add to your application. We have a function called writeJPEGData, which takes a URL, an image to write, and a DPI to record in the metadata. The first thing we do is create an image destination with the URL, specifying that it's going to be of type JPEG and that it will hold one image. Next, we specify a dictionary with three keys and values for options and metadata. One option is the quality of the JPEG, specified with the quality key; in this example, we use a quality of 0.8. The other two key-value pairs are for metadata: kCGImagePropertyDPIWidth and kCGImagePropertyDPIHeight. In this case, we're just creating CFNumbers based on the value that was passed in. Once we have this dictionary, we call CGImageDestinationAddImage to add the image, its options, and its metadata to the CGImageDestination. And lastly, we call CGImageDestinationFinalize to write the file to disk. So it's pretty easy.
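A sketch of that writing path follows. Two details differ from the session's spoken names: the quality key that shipped is kCGImageDestinationLossyCompressionQuality, and we spell out the JPEG type as its UTI string, "public.jpeg" (the constant kUTTypeJPEG), to keep the example self-contained.

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdbool.h>

/* Write `image` to `url` as a JPEG at 0.8 quality, recording DPI
 * metadata. Returns true on success. */
static bool WriteJPEG(CFURLRef url, CGImageRef image, double dpi) {
    CGImageDestinationRef dst = CGImageDestinationCreateWithURL(
        url, CFSTR("public.jpeg") /* kUTTypeJPEG */, 1, NULL);
    if (dst == NULL)
        return false;

    double quality = 0.8;
    CFNumberRef qNum   = CFNumberCreate(NULL, kCFNumberDoubleType, &quality);
    CFNumberRef dpiNum = CFNumberCreate(NULL, kCFNumberDoubleType, &dpi);
    const void *keys[] = {
        kCGImageDestinationLossyCompressionQuality, /* option */
        kCGImagePropertyDPIWidth,                   /* metadata */
        kCGImagePropertyDPIHeight                   /* metadata */
    };
    const void *values[] = { qNum, dpiNum, dpiNum };
    CFDictionaryRef opts = CFDictionaryCreate(NULL, keys, values, 3,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    CGImageDestinationAddImage(dst, image, opts);
    bool ok = CGImageDestinationFinalize(dst);

    CFRelease(opts);
    CFRelease(dpiNum);
    CFRelease(qNum);
    CFRelease(dst);
    return ok;
}
```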

So those are the basics of ImageIO. I hope I've given the impression that this is a very simple and easy API to add to your application. And again, you'll get some of these benefits for free if you're using AppKit and other technologies. Let me talk for a minute about some of the more advanced techniques that come up in image reading and writing: extracting ARGB data, requesting the depth of an image, and loading an image incrementally.

So one of the common questions we get is: an image has been returned from Image I/O, but I don't know what color space it is, I don't know what depth it is, I don't know what pixel format it is, and I have an application that only works in RGB. That's a common scenario, and there's an interesting piece of code that makes it very easy to convert the data, no matter what format it came in, into ARGB. Basically, the technique is to use a CGBitmapContext to render the original image into an offscreen buffer. One advantage of this is that it takes care of all the color management correctly: if the image happened to be a Lab or CMYK image with a profile, it'll be correctly color matched to the RGB color space you're working in.
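The offscreen-rendering technique might be sketched as follows. The premultiplied-ARGB layout and the DeviceRGB destination space are our own choices for the example; Quartz does the color matching when it draws.

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdlib.h>

/* Convert any CGImage -- Lab, CMYK, 16-bit, whatever -- to 8-bit
 * ARGB by drawing it into an offscreen bitmap context. Quartz
 * performs the color matching to the destination RGB space.
 * Caller frees the returned buffer. */
static unsigned char *CopyImageAsARGB8(CGImageRef image) {
    size_t w = CGImageGetWidth(image);
    size_t h = CGImageGetHeight(image);
    unsigned char *buf = calloc(w * h, 4);
    if (buf == NULL)
        return NULL;

    CGColorSpaceRef rgb = CGColorSpaceCreateDeviceRGB();
    CGContextRef ctx = CGBitmapContextCreate(buf, w, h,
        8,        /* bits per component */
        w * 4,    /* bytes per row */
        rgb, kCGImageAlphaPremultipliedFirst); /* alpha first => ARGB */
    CGColorSpaceRelease(rgb);
    if (ctx == NULL) {
        free(buf);
        return NULL;
    }

    CGContextDrawImage(ctx, CGRectMake(0, 0, (CGFloat)w, (CGFloat)h), image);
    CGContextRelease(ctx);
    return buf;
}
```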

Another interesting question is the depth of images. Some formats support only one pixel depth; JPEGs, for example, are always 8 bits per sample. Other formats can support arbitrary pixel depths; TIFFs, for example, can be 1, 2, 4, 8, or 16 bits per sample. As a rule, the image returned by ImageIO will be the same depth as indicated by the file. So if you open a 16-bit TIFF file, you'll get a 16-bit CGImageRef.

However, in the case of high dynamic range file formats, it gets a little more complicated. The data in these file formats is typically stored in special encodings, which can then be decoded in a variety of ways. They can be unpacked to floating point values, in either 32- or 16-bit formats, or to integers with 16- or 8-bit precision. Also, in the decoding process, they can either be left as extended-range values or be compressed to the logical 0-to-1 clipped range.
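As a concrete illustration of one of those decoding choices, here is a small sketch of our own (not ImageIO code) that compresses samples to the clipped 0-to-1 range and quantizes them to 16-bit integers:

```c
#include <assert.h>
#include <stdint.h>

/* One decoding choice: clamp extended-range float samples to [0,1]
 * and quantize to 16-bit unsigned integers. Values above 1.0 (the
 * "brighter than white" range that HDR formats preserve) clip to
 * 65535; negative values clip to 0. Rounds to nearest. */
static uint16_t clip_to_u16(float sample) {
    if (sample <= 0.0f) return 0;
    if (sample >= 1.0f) return 65535;
    return (uint16_t)(sample * 65535.0f + 0.5f);
}
```

The clipping is exactly where the extended dynamic range is lost, which is why an application doing further processing may prefer the unprocessed float path described next.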

Both of these are reasonable kinds of values to return, and your application may want one versus the other. By default, ImageIO will return an image compressed to 16-bit integers; this gives the best results with reasonable memory for the typical application. However, on request, an application can have the unprocessed floating point data returned instead.

Here's a brief example that shows how to do this. This is a code snippet that, given a URL, requests that the data be returned as floats; if the data actually is returned as floats, a boolean is set to say so. As you've seen in the previous examples, we create an image source, and we specify an options dictionary that has as one of its key-value pairs CGImageSourceMaximumDepth with the value 32.

At this point, we can then ask ImageIO to get the properties of the first image, given those options. And this will return a dictionary. We can then query that dictionary to see if it has floating point data or not. Then lastly, we can get the image and return that to the client. Thank you.
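A sketch of that float request follows. The session describes a pre-release maximum-depth option; the option that shipped is kCGImageSourceShouldAllowFloat, and the per-image property kCGImagePropertyIsFloat reports whether float data came back, so this sketch uses those.

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdbool.h>

/* Ask ImageIO for unprocessed floating point data when the file
 * contains it. *isFloat reports whether float data was available.
 * Caller owns the returned CGImageRef. */
static CGImageRef CreateFloatImage(CFURLRef url, bool *isFloat) {
    CGImageSourceRef src = CGImageSourceCreateWithURL(url, NULL);
    if (src == NULL)
        return NULL;

    const void *keys[]   = { kCGImageSourceShouldAllowFloat };
    const void *values[] = { kCFBooleanTrue };
    CFDictionaryRef opts = CFDictionaryCreate(NULL, keys, values, 1,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    /* The per-image properties report whether the data is float. */
    CFDictionaryRef props = CGImageSourceCopyPropertiesAtIndex(src, 0, opts);
    *isFloat = props != NULL &&
        CFDictionaryGetValue(props, kCGImagePropertyIsFloat) == kCFBooleanTrue;
    if (props != NULL)
        CFRelease(props);

    CGImageRef image = CGImageSourceCreateImageAtIndex(src, 0, opts);
    CFRelease(opts);
    CFRelease(src);
    return image;
}
```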

Another advanced technique I wanted to make sure people knew we supported is incremental loading of images. I won't go into too much detail, but the basic idea is that you create an image source in incremental fashion using CGImageSourceCreateIncremental, and then you repeatedly add updated data to the image source. Each time you add data, you can request a new image, and it will give you a partial image, or a complete one if the image is fully loaded.
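The incremental flow might be sketched like this; note that each CGImageSourceUpdateData call must be given all bytes accumulated so far, not just the newest chunk.

```c
#include <ApplicationServices/ApplicationServices.h>
#include <stdbool.h>

/* Incremental loading: feed bytes to the source as they arrive and
 * ask for the best image available so far. `accumulated` must hold
 * every byte received so far. Caller owns the returned image. */
static CGImageRef CopyPartialImage(CGImageSourceRef src,
                                   CFDataRef accumulated, bool isFinal) {
    /* Hand ImageIO the full byte range received so far. */
    CGImageSourceUpdateData(src, accumulated, isFinal);

    if (CGImageSourceGetCount(src) < 1)
        return NULL; /* not even the header has arrived yet */

    /* May be a partial image until isFinal is true. */
    return CGImageSourceCreateImageAtIndex(src, 0, NULL);
}

/* The source itself is created once, up front:
 *   CGImageSourceRef src = CGImageSourceCreateIncremental(NULL);
 * and released when loading is complete. */
```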

And then once you're done with that image, you can release it; once you've added more data, you can get a new updated image. It's important that you release the old image before you ask for a new one. So let me give a brief demonstration of ImageIO in action. One of the things I want to show first is the new Preview. I've got a bunch of images here. Open.

And one nice thing in Preview is you can open a whole set of images just by selecting a folder. I've got a variety of images in here. One of them is a Lab image, and we can verify that by going to Tools, Get Info. This shows the metadata that's been obtained using ImageIO, and we can tell from the metadata that the color model is Lab.

We have a variety of other images; we can zoom in and zoom out. The thumbnails over here were obtained using Image I/O as well. We have high dynamic range images here, and we can zoom in and out on those. Luke will show later how we can manipulate these images in real time. Here's another interesting example which I like to show people; it's one of the things we use for testing. Oftentimes people want to know: how do I know if the profile is being used? What I have here is a black and white CMYK document that has a profile in it that makes gray values disappear.

So if this image were rendered with the profile ignored, what you'd see is "the embedded test profile is not used." You can't see the word "not" here, because the profile is being used, but there's actually a gray word "not" right here. So it provides an interesting test of whether your profile is being respected or not. In this gray version, you can kind of see a little hint of what was once there: the word "not." This is a great way of testing images. We really should distribute these at some point.

One other example of using ImageIO: I have a test application which shows some of the options. So let me open one of the images we just saw; we can go to the desktop images and open up this one here. We can see some information: the height and width, and how long it took to draw. One thing we can do is specify that we'd like to see what this would look like if it were progressively loaded. If I open up another image, the high dynamic range image, this is a big image, unfortunately, so it takes a couple of seconds to open.

If we bring up the metadata on this, going to Window, Metadata, we can see that it has height and width, and its depth is 16. This is because by default we return 16-bit integers. However, if we want to return it as 32-bit float instead, and again, it'll take a second or so.

This code still needs to be AltiVec-optimized someday soon. We bring up the metadata again, and now we can see that there's a new property in here which is saying that the data is returned as floats. So that's the introduction to ImageIO. I'm going to pass the microphone, and the demonstration, and all the new stuff over to Luke Wallis, who will be talking about high dynamic range imaging. Thank you.
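To illustrate the distinction David is demoing, here is a rough sketch of my own, not the actual ImageIO calls (in the shipping API, a float decode is requested with an option such as kCGImageSourceShouldAllowFloat). It shows why the float path matters for HDR files: a normalized 16-bit integer buffer clamps scene values above 1.0, while a float buffer preserves them.

```python
# Sketch only: not the ImageIO API itself.

def decode_to_uint16(pixels):
    """Simulate a normalized 16-bit integer decode: clamp to [0, 1], then quantize."""
    return [round(min(max(v, 0.0), 1.0) * 65535) for v in pixels]

def decode_to_float(pixels):
    """Simulate a float decode: scene values pass through untouched."""
    return list(pixels)

hdr_scanline = [0.25, 1.0, 7.5]            # 7.5 is a bright highlight, well above 1.0
as_ints = decode_to_uint16(hdr_scanline)   # highlight clamps to 65535, i.e. 1.0
as_floats = decode_to_float(hdr_scanline)  # highlight survives for later tone mapping
```

Once the highlight has been clamped to integer white, no later processing can bring its detail back, which is exactly why the float property matters.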

Thank you, David. So today, I will be talking about Mac OS X support for high dynamic range imaging, which is a new and exciting feature that we are adding in the Tiger release. As many of you know, high dynamic range imaging is generating a lot of interest and is still a subject of very active research. We could talk about high dynamic range imaging from many different points of view, but what I would like to do today is concentrate on answering three very simple questions: what is it, why use it, and how to process it.

Before we try to answer these questions, let's take a quick look at the current status quo in digital image processing. We can conclude that the majority of digital image processing is dominated by what is called the output-referred approach. What this means is that the requirements of image reproduction impose certain requirements on the way we acquire and create images. And because most of the devices we are dealing with, like displays and printers, can only handle 8 bits of data per color channel, we impose the same requirement on digital cameras, which in fact could produce about an order of magnitude more data if they were not restricted by that requirement.

Obviously, there are some advantages; this is not done for no reason. The main one is that very minimal image manipulation is required before displaying or printing such an image. But there is also a disadvantage: we are losing a lot of color and image information that could be used in further image processing and could result in a much higher quality display or print.

Oops, sorry, wrong direction. Another requirement, which is sort of hidden in the output-referred approach, is that the data is exchanged in one predefined color space, and in the most typical case, this is sRGB. So when you look at this slide, you see I drew the shape of the typical exchange color space; let it be sRGB.

That color space covers only a part of the visual gamut. So everything is fine as long as the camera is acquiring color data within that triangle. But if we are outside, then we are out of luck. We have to do something with this color, and typically we have to push it into the color space. That can be done through different methods, but because cameras are not very sophisticated in terms of processing power, we very often use clipping. And as we know from practice, gamut clipping can produce really bad results, like, for example, hue shifts. And here is a maybe slightly strong and exaggerated example of what could happen. But this is real clipping, in which the white color, because of the clipping, became a mixture of completely unrelated colors.
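As a small illustration of the hue-shift problem Luke describes (my own sketch, not a camera's actual pipeline): clipping each channel independently changes the ratios between channels, and therefore the hue, while scaling all channels by a common factor keeps the hue at the cost of brightness.

```python
def clip_per_channel(rgb):
    """Clamp each channel independently to [0, 1]: cheap, but it changes hue."""
    return tuple(min(max(c, 0.0), 1.0) for c in rgb)

def scale_into_gamut(rgb):
    """Divide all channels by the largest one: dimmer, but channel ratios (hue) survive."""
    peak = max(rgb)
    return rgb if peak <= 1.0 else tuple(c / peak for c in rgb)

out_of_range = (1.6, 1.2, 0.8)             # an out-of-gamut, orange-ish color
clipped = clip_per_channel(out_of_range)   # (1.0, 1.0, 0.8): ratios destroyed, hue shifts
scaled = scale_into_gamut(out_of_range)    # (1.0, 0.75, 0.5): same ratios as the original
```

Scaling is more expensive and darkens the image, which is one reason in-camera pipelines of this era tended to clip instead.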

So that is what we can conclude when we look at image processing from the point of view of a device's capability to reproduce the image. What I would like to do now is look at image processing from a little bit different perspective, the perspective of human vision. As we know from the very rich research in this area, color and visual acuity are the most important characteristics of the scene, and these two depend on luminance and the observer's visual adaptation.

We know that we can measure the world's luminance, and it will cover the range of values between 10 to the power of minus 6 all the way to 10 to the power of 8 when measured in candelas per square meter. But what is important for us is that different ranges of luminance create different illuminations.

Now, that illumination can stretch all the way from a very dark environment, through starlight, all the way beyond sunlight. I could spend a lot of time talking about the psychological and physiological mechanisms controlling our vision, but what I would like to say, without going through those details, is that humans have three types of vision, which depend on the level of luminance. We have scotopic vision, which works when we are in a dark environment. We have mesopic vision, which works in a dim environment. And finally, when we are in a high-illumination environment, we switch to photopic vision.

Why is this division important? Because our quality of vision is related to the type of vision. As we know, if we look at something in a very dark environment, we have no color vision and very poor acuity; everything in the darkness seems to be just a shade of gray. On the other hand, our best vision is in the photopic range, where we can see many colors and have good color and visual acuity.

This is not everything. What is very important is that humans have a limited simultaneous range, which also depends on the type of illumination. Here I'm showing the widest simultaneous range, which again exists in photopic vision; it can cover a range on the order of magnitude of 3 to 4. If we try to estimate this simultaneous range in poorer vision, the values can drop by an order of two.

So we may ask ourselves why all this is important. Well, I think there is an answer. Because if we want to faithfully represent the scene that we want to process through image processing, we should have a mechanism to encode the data the same way, or at least as close as possible to, the fidelity of human vision.

So now let's take a look at where in this picture we can fit the typical 8-bit display. As we know, the typical 8-bit display can cover a range of luminance on the order of magnitude of 2. That is a big discrepancy between the human simultaneous range and the dynamic range of a display. So this is the biggest challenge that we are facing: we have to map the relatively wide human simultaneous range into the low dynamic range of our display device.

There is one solution which we already know about. This is output-referred digital photography. We are imposing the low resolution, low color, and low contrast of a small color space, and the only thing we can do is choose between different options. This is a simplistic view in which we may say, well, if I want to expose the details in the highlights, I can use a short exposure; but if I want to see the details in the shadows, I can sacrifice the details in the highlights and use a long exposure to capture what I wanted. The most important point is that this applied exposure is permanent. Once we burn it into the image, there is no way back.

So I think that at this moment, I'll try to answer the question: what is high dynamic range? I think that we can define high dynamic range as a special encoding of the image data which allows us to preserve the full fidelity of human vision. From the implementation point of view, high dynamic range imaging is based on color values that, first of all, extend over at least four orders of magnitude, that can encompass the entire visible color gamut, and that allow color values outside of the typical zero-to-one range.
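Luke's "four orders of magnitude" definition, and the "stops" metric Gabriel uses later in the session, can both be computed from the ratio of the largest to the smallest positive value in an image. A small sketch (the function name and sample values are mine):

```python
import math

def dynamic_range(values):
    """Return (orders_of_magnitude, stops) spanned by the positive values of an image."""
    positive = [v for v in values if v > 0]
    ratio = max(positive) / min(positive)
    return math.log10(ratio), math.log2(ratio)  # base 10 = orders, base 2 = stops

# A toy image spanning a 10,000:1 range: about 4 orders of magnitude, ~13.3 stops.
orders, stops = dynamic_range([0.001, 0.5, 10.0])
```

One order of magnitude is a factor of 10; one stop is a factor of 2, so roughly 3.3 stops per order of magnitude.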

In summary, what this means is that in high dynamic range imaging, we are no longer limited to a specific color space. We are trying to encompass, as I said, all visible colors. On the other hand, we need to remember that we no longer have a convenient ready-to-display or ready-to-print image. High dynamic range image data requires some kind of manipulation before it can be displayed. But the big advantage is that we can make this decision, with our preferences, at the moment when we need to reproduce the image, instead of burning it into the image.

This is a simple explanation of how we can do that. We can go back and select the short exposure or the long exposure, but most importantly, we can implement something which was not needed before, which is tone-map rendering. This allows us to achieve completely different results. For example, here, I can try to combine in one image the details from the highlights with the details from the shadows.

So now I'll try to answer the question: why use high dynamic range images? The most important reason is to preserve the scene-referred information that can be useful in further image processing. This way we avoid the intermediate encoding with a restrictive color gamut, which was happening in the previous approach, called output-referred, and we also avoid the irreversible modifications that happen during image acquisition.

How do we process high dynamic range images? The simplest answer is that we should not add any rounding or clipping errors. For that, we want to render and capture the data in floating point. We want to store the entire image, and, if needed, process the color data in an extended color space, which again will not impose any clipping. And at the end, we want to apply a tone mapping for a specific image reproduction. For example, that specific reproduction could be the example I just showed you, where I want to see all the details in the image, from the highlights and the shadows.

Now, let's take a look at the file formats that we are supporting in Tiger. I think that the most important citizen here is OpenEXR, which comes from ILM. First of all, it has the smallest quantization error. And most importantly, as you will see later, it comes with a recommended way of tone rendering, which solves a lot of problems in terms of presenting the image content. The other formats basically just define a way of encoding and decoding data while preserving the image fidelity.

So now I would like to show you, if I can get to my demo machine, my little application in which I can open high dynamic range images. What I would like to show you is that we have to do something with those values, which are so large, much bigger than what we can represent in the typical range of 0 to 1. One might think that a very simple approach would be to simply map the brightest point in the image to the brightest point of the display.

But if I do that with my little demo application, you see that we don't see much in this image. There is way too much information beyond one, and scaling didn't produce a visible image. Another very simple approach could be: okay, let's say I would like to see whatever is in this image; clip the values to the typical 0-to-1 range and show me that. Well, as you see, the image quality has somehow improved, but it's still very poor.

And now, if I use OpenEXR, and this is their default zero exposure value, I'm getting a reasonable result, and I can see many more details. And not only this, I can do what I was talking about: I can impose my preference at the moment of reproducing the image. For example, someone may like this kind of image, or someone else may still want to focus on this beautiful stained glass. I want to show you a couple of classic examples, like, for example, the famous Memorial Church picture, which comes from the Debevec website. The same thing happens here: if we just scale the image, the image is basically unreadable. Clipping will show something, but the quality is really poor. OpenEXR is doing a very good job here. Another example is the picture I was using in the previous slides, of our garage at Apple.
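The three mappings in this demo, scale-to-max, clip, and an adjustable exposure, can be sketched as follows. This is my own simplification; the real OpenEXR display pipeline also applies a knee and a gamma, which this sketch omits.

```python
def scale_to_max(pixels):
    """Map the brightest value to display white; most of the image ends up near black."""
    peak = max(pixels)
    return [v / peak for v in pixels]

def clip(pixels):
    """Keep values as-is but discard everything above display white."""
    return [min(v, 1.0) for v in pixels]

def expose(pixels, stops=0.0):
    """Scale by 2**stops, then clip: an adjustable exposure control like the viewer's."""
    gain = 2.0 ** stops
    return [min(v * gain, 1.0) for v in pixels]

scene = [0.02, 0.5, 64.0]           # deep shadow, midtone, very bright highlight
dark = scale_to_max(scene)          # everything but the highlight is crushed toward black
flat = clip(scene)                  # shadows readable, all highlight detail lost
lighter = expose(scene, stops=1.0)  # one stop brighter, then clipped
```

With an HDR source, choosing `stops` at display time is reversible; with an output-referred file, the equivalent choice was burned in at capture.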

And this is how it looks when scaled. This is how it looks when clipped; once again, a typical hue shift when clipping the data. And OpenEXR produces quite a reasonable result. This leads us to the conclusion that tone rendering is a very important issue when processing high dynamic range images, and there may be many different methods of doing it. I think this gives me a very good segue to introduce Gabriel Marcu, who will be talking about high dynamic range tone mapping developed at Apple.

Thank you very much for the introduction. As you have seen, the main problem of rendering high dynamic range images is how to reduce the high dynamic range to a low dynamic range device. And this problem is not trivial. So we have looked into what is available in the published literature.

And here I put a list with a few methods that I selected from what is available. Aside from OpenEXR, you can see the histogram adjustment proposed by Ward. The Retinex algorithm is a class of methods, and a good review of these methods was published by John McCann at the Electronic Imaging Conference in 2002. Another interesting approach is based on color appearance models; using one of these models, iCAM, Fairchild and Johnson tried to reduce the high dynamic range image to a displayable one. The last three in this list were proposed at SIGGRAPH 2002, and they are fast bilateral filtering, the dodge-and-burn method, and gradient compression. You can see in this list that you can group the methods into two classes. One of the classes is general algorithms that are applied to 8-bit images to get more information from them and reduce the global contrast to something that is visible on the display.

The second class of algorithms, the last three in the list, are algorithms that are specifically designed to handle high dynamic range images. While doing this research on the available methods, we came up with our own algorithm, and that's why I mentioned the Apple method, which I will demonstrate in a minute.

One of the things that you face when you try to design or even evaluate a method is what the intent of that method is, aside from just compressing the high dynamic range into the dynamic range of the display. And here is a flavor of two kinds of rendering intents: one that retains the look and feel of high dynamic range, which you can see in the image on the left, and another that shows as much content from the image as possible. Comparing these two images around the highlight region: in the image that retains the look and feel of a high dynamic range image, you see the glowing around the window, while in the other you see more detail, more color of the stained glass of that ceiling window. You can see the same thing happening on the window on the wall. And it depends on the subjective preference of the user which of these settings they will prefer to use. The good thing is that with the Apple method, we allow both intents to be shown. So I will do a demo of this.

And I will show you how this algorithm works. So we open the viewer, and with this, we can choose between different images. Let's start with the memorial image, which is, let's say, a classical one, picked up as an example by a lot of algorithms. And that is a good thing, because you can compare algorithms on the same image. This is the rendering with the glowing intent, which shows the glowing around the windows and around the highlights in the image. I also show here the high dynamic range size; a good metric for that is the logarithm of the maximum over the minimum values in the file, and this gives us 13 stops. And you can see at the bottom the actual range of floating point values in the file.

I said that we can support both kinds of intents, and right now I switch to the intent where you can see the color of the glass in the stained glass of the window. So we can open different kinds of files, not only from one source, and you can evaluate the power of this method. This is an image that is from OpenEXR, used by OpenEXR as a test image. And it's quite interesting: it has 18 stops, so it's quite a large high dynamic range. And you can see details in the shadows, you can see details in the highlights. Also, you can look at the text that is on the book and see the details under the table.

As you have noted, I haven't changed any of the default parameters of this method. Our intent was to design a method where the user has as little intervention as possible, and can take the default parameters and get decent results with those settings.

So I chose also another image from another source, the nave image, which is the image with the highest dynamic range that I was able to find publicly available. You can see that this image has 22 stops, one stop meaning a doubling of the range of the values in the file. The difficulty in rendering this image is around this window, where you are required to show the colors of the window and also not show visible artifacts around it, like ringing or darkening of the surrounding area.

The last image that I will show is the image that we created ourselves with a different algorithm, the garage image, and you can see details from outside and inside, details in the shadows and in the highlights. Eventually, with a high resolution image, you are able to see even the fluorescent tubes at the top. So this is a demonstration of the HDR viewer. In summary, you can see that with default parameters we are able to open images from different sources, and this method is quite robust in the sense of showing very good results on different high dynamic range images. This brings me to the next topic that I would like to touch on, which is the creation of high dynamic range images.

How do we create high dynamic range images? That is the next topic, and it is quite interesting. We can start with the file format of these images: we have to encode in RGB floats the radiance of the scene. So how do we capture this? We turn to a method that was published by Debevec and Malik, "Recovering High Dynamic Range Radiance Maps from Photographs." Essentially, this method requires taking multiple shots at different exposures of the same scene, and then combining these exposures into a high dynamic range file.

We start with the block diagram of a digital camera. If you look closely, you can see that the scene radiance is transformed to the digital output in the digital file, which may be a JPEG or another format, by a set of transformations. First, the image passes through the lenses, then the shutter. Then the image is captured by the CCD and converted by the ADC. Then some mapping happens in the camera, for example gamma correction, or the raw-image-to-JPEG transformation. And you finally get the digital values in the file.

Because the scene radiance is in direct correspondence with the sensor irradiance, the first block can be skipped, and we can group the last four blocks into a single entity, which can be described by a transfer function f. So we get the digital output Z in the file as a function of the sensor irradiance E and the exposure time dt: Z = f(E * dt). With a little math, applying the inverse transformation and a log transformation, we can recover the sensor irradiance from the digital output and the exposure time: ln E = g(Z) - ln dt. This is easily possible; the only thing that we need is the middle term, marked in blue, which is referred to as the camera exposure function. Once we are able to derive this function, we use this equation to immediately find the scene radiance. So we will concentrate now on deriving this function, which we call g in the next slides.

This function describes the exposure as a function of the output gray level. The idea is to take multiple exposures, pick a gray level in one of the exposures, which is that white ring in the first image, and then look at how the gray level changes from one image to another. So we keep the same position, the same x and y in the image, and we plot the variation of the gray level from one image to another. This gives us a curve like the one in the diagram. We can do this for other gray levels in the image and recover several curves, and finally we end up with a set of curves that describe this variation of the gray level. What we need in the end is to put these curves together and derive a single curve: the exposure function of the camera. Now, once we know this exposure function of the camera, the output gray level in the digital file, and the exposure time, we are able to recover the scene radiance
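Given a known camera exposure function g, the relation ln E = g(Z) - ln(dt) can be applied per exposure and averaged, which is the core of the recovery step described here. This sketch is mine and assumes g is already known; the published Debevec-Malik method also solves for g by least squares over many pixel sites, with a weighting function that de-emphasizes extreme gray levels, which this omits.

```python
import math

def recover_log_irradiance(gray_levels, exposure_times, g):
    """
    Average ln E = g(Z) - ln(dt) over the exposures of one pixel site.
    g maps a digital value Z back to log exposure; it is assumed known here.
    """
    estimates = [g(z) - math.log(dt) for z, dt in zip(gray_levels, exposure_times)]
    return sum(estimates) / len(estimates)

# Toy camera with a linear response (Z = E * dt), so g is simply the natural log.
# One pixel with true irradiance E = 2.0, shot at three exposure times:
times = [0.5, 1.0, 2.0]
zs = [2.0 * dt for dt in times]                      # [1.0, 2.0, 4.0]
log_e = recover_log_irradiance(zs, times, math.log)  # recovers ln(2.0)
```

Averaging over exposures is what makes the combined file more reliable than any single shot: each exposure contributes the gray levels it captured well.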

of the image that we capture. And with this, I would like to do a demo of the application that is able to create these high dynamic range images. So let's close this and open the Creator.

The Creator is an application that first allows us to pick several images. I choose a number of already exposed shots, and we get a thumbnail view on the left side, and here we can select any of these images and see their content. You notice immediately that no matter how we take these images, you can see either details in the shadows or details in the highlights, and the high dynamic range file will be able to capture all the information about the radiance of the scene and encapsulate it in a high dynamic range format.

The first thing is to calculate the transfer function of the camera, and we do this in a single step: you have seen several curves put together, and you recover the transfer function of the camera. The next step is to use this transfer function to compute the high dynamic range image. Now, even though the paper says that you need to do this processing of the camera's transfer function over many, many images until you get an average behavior of the camera, and then use that transfer function to create the high dynamic range, we actually added more robustness to the algorithm that computes the transfer function, such that we are able to recover the transfer function from only the same set of images that we use to create the high dynamic range image. So we recover the function from this set of images, we apply the function to these images, and we create the high dynamic range.

This brings us to an algorithm that is able to do this kind of thing by just specifying the set of images. So we choose a set of images, then the algorithm computes the transfer function of the camera and computes the high dynamic range in a single shot. And this makes the application independent of any camera settings or setup that you may have been required to do for the method that is published in the literature.

So let's say you switch to another set of images, for example from a different camera: you don't have to specify the camera, and you immediately get the high dynamic range file directly, from specifying only the set of images. The interesting thing about this is that we want the user, the Apple user, to have as little intervention in this algorithm as possible, less guesswork, and to finally end up with an application that provides high dynamic range images directly, without having to take care of anything.

And this is an advantage for the user. Finally, I would like to mention that, as you have seen here, we select the images from a folder, but we have also worked with Image Capture. So I invite you on Friday afternoon at 5:00 PM to see an integration of this algorithm with the Image Capture modules; you will see an interactive demonstration of how these images are captured live with the camera, and then a high dynamic range file is created. And with this, I thank you very much, and I will turn it back to Luke.

Thank you, Gabriel. So at the end of the presentation on high dynamic range images, I'd like to touch on a subject which is very close to the hearts of us color science engineers. We are really interested in color managing high dynamic range images, and you must know that this is an area which is under very intensive investigation, both in academia and in industry. At Apple, we are also developing our own method of color managing high dynamic range images, and we are trying to take a new approach, which is based on human adaptation to the image viewing environment. We think that the image contains enough white point and adapting luminance information that could be used by a color appearance model to predict human perception of color in different viewing environments, which basically means we can color manage our high dynamic range images.

Just to clarify what kind of color appearance modeling we are dealing with, let me say that we are looking at modeling which is based on two major concepts. The first is chromatic adaptation, which allows us to predict the influence of the adapted white point on color perception.

And the second concept is the degree of adaptation, which allows us to predict simultaneous color contrast related to the luminance of the adapted white point. So in summary, what we are trying to do is to transform the colorimetry of the source, using the high dynamic range data, to our destination. Then, after color managing and bringing the image to the new environment, we apply tone mapping, which compresses the colors into the range of the destination device. And this is what concludes our talk on high dynamic range images. I'll turn the microphone back to David. Thank you.
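The chromatic adaptation concept Luke mentions can be sketched, in its simplest von Kries form, as scaling each cone-like channel by the ratio of the destination white to the source white. This is my own illustrative reduction; real appearance models such as iCAM transform into a sharpened cone space first and include a degree-of-adaptation parameter, both of which this omits.

```python
def von_kries_adapt(cone_response, src_white, dst_white):
    """Scale each cone-like channel by the ratio of destination white to source white."""
    return tuple(c * d / s for c, d, s in zip(cone_response, dst_white, src_white))

# A surface that looks like 50% neutral gray under a warm source white should map
# to neutral gray under an equal-energy destination white:
warm_white = (0.8, 1.0, 1.2)
adapted = von_kries_adapt((0.4, 0.5, 0.6), warm_white, (1.0, 1.0, 1.0))
# adapted is (0.5, 0.5, 0.5): the cast of the source illuminant is discounted
```

In the pipeline Luke outlines, a step like this runs first, and tone mapping then compresses the adapted colors into the destination device's range.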

So thank you, Luke and Gabriel, for your discussion. I just wanted to bring it back to do a quick summary slide, and then we'll have a few minutes of Q&A at least. Just to summarize once again what's new in Tiger: we've got a lot of great stuff here. First of all, in ColorSync, we have floating point support. In ImageIO, we have a brand-new modern API for reading and writing that has optimized performance and support for metadata. And then we're doing a lot with high dynamic range: supporting the OpenEXR file format, access to compressed or unprocessed data, and all sorts of ongoing and future research. We'll have lots to show you.

So, again, we have a few other places you might want to go. There's a graphics and media lab session on Thursday where you can talk to us if you have more questions than we can get to today. And also, there are going to be some great demonstrations in the last session on Friday, talking about image capture with high dynamic range.