WWDC17 • Session 511

Working with HEIF and HEVC

Media • iOS, macOS, tvOS • 58:48

High Efficiency Image File Format (HEIF) and High Efficiency Video Coding (HEVC) are powerful new standards-based technologies for storing and delivering images and video. Gain insights about how to take advantage of these next generation formats and dive deeper into the APIs that allow you to fully harness them in your apps.

Speakers: Erik Turnquist, Brad Ford

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good morning, everyone, and welcome to Friday of WWDC. [applause] Thank you. My name is Erik Turnquist and today Brad and I are going to talk about working with HEIF and HEVC. So, first off, what is HEVC? HEVC stands for High Efficiency Video Coding and it is the industry standard next generation video encoding technology. It's the successor to H.264.

Now for the more important question: why? Why is Apple going through all of the effort to deliver a new codec? H.264 has been really good to us for over ten years. We thought about this a lot, and we really want to enable new features; unfortunately, H.264 has reached the limits of its capabilities.

We want to enable new features like 4K and larger frame sizes, high bit depths like 10-bit, and wider color spaces like Rec. 2020. Now, we want to do all of this while lowering the bit rate, not raising it. So, how do we do that? Well, we do that with HEVC. So, how much lower are the bit rates we're actually seeing? Well, for generally encoded content we're seeing up to a 40 percent bit rate reduction compared to H.264. So, this is a really big deal.

And for camera capture, we're seeing up to 2 times better compression compared to H.264 and JPEG. So, another really big deal here. And we're making all of these changes today. So, if you've installed the iOS 11 seeds, we've enabled HEVC movie and HEIF image capture by default. That means many of you have already captured HEIF images or HEVC movies without even knowing it. And it just works on our platforms.

Let's go over what we're going to talk about today. I'm going to cover the HEVC movie side of things, and Brad's going to cover the HEIF image side of things. We're going to cover accessing this content, playing it back and displaying it, capturing and creating HEIF images and HEVC movies, and then export and transcode. So, first let's cover access. Many of you are using PhotoKit, and PhotoKit will deliver HEVC assets for playback. So, if you're using requestPlayerItem or requestLivePhoto, they will give you automatic playback without adopting any new APIs, so this should just work.

PhotoKit can also deliver you HEVC assets. If you're calling requestExportSession, it will transcode using the existing preset you're already using; so if you're using one of the dimension presets that used to give you H.264, it will still do that, but we'll cover new presets we've added for HEVC. If you're calling requestAVAsset, it will give you access to the HEVC media file, and this will have an HEVC video track inside of it.

Now, if you're a backup application, you want access to the raw bits, so you're probably calling requestData. I want to make note that the movie file you receive will actually contain the HEVC video track, so you need to be able to handle this. Now that you have this content, let's talk about playback and display.

HEVC playback is supported in our modern media frameworks like AVKit, AVFoundation, and VideoToolbox. We support HTTP Live Streaming, play-while-downloading, and local file playback. And we support MPEG-4 and QuickTime file formats as the source, and there's no API opt-in required here. Things should just work.

We support Decode on macOS and iOS, and now let's go over where we have Hardware Decode support. So, we have 8- and 10-bit decoders on our A9 chip, so that's the iPhone 6s, and we have 8-bit Hardware Decode on our 6th generation Intel Core processors, that's Skylake, and that's the MacBook Pro with Touch Bar. We also have 10-bit Decode on the 7th generation Intel Core processors, that's Kaby Lake, and that's the brand-new MacBook Pro with Touch Bar. We also have 8- and 10-bit Software Decode fallbacks on macOS and iOS.

So, now let's go over some code you might have, and let's convert it to HEVC playback. Here we're playing "My Awesome Movie": we make a URL, then a player, and play it. So, this is the H.264 version. And now here's the HEVC version. There are no changes. So, to play an HEVC movie file, you don't need to change any of your code.
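
For reference, a minimal sketch of that playback code on iOS, using a hypothetical file URL; the point is that the exact same AVPlayer code handles H.264 and HEVC sources:

```swift
import AVFoundation
import AVKit

// Hypothetical local file; an HTTP Live Streaming URL works the same way.
let movieURL = URL(fileURLWithPath: "/path/to/MyAwesomeMovie.mov")
let player = AVPlayer(url: movieURL)

let playerViewController = AVPlayerViewController()
playerViewController.player = player
// Present playerViewController, then start playback.
player.play()
```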

We do want you to think about a couple of things, though. The first is decode capability. If you're asking the question "is there a decoder on the system that can handle this content?", this API is for you. This is useful for non-realtime operations, like sharing or image generation. And it can be limited by hardware support; not all of our hardware decoders support every frame size.

Now, the more important question is about playback capability. If you're asking "how do I give my customer the best playback experience?", this API is for you, and many of you are already using it. Not all content can be played back in realtime, and we have differing capabilities on different devices. So, if you want a one-stop shop for the best user experience for playback, whether that's 1x or 2x playback, rewind, scrubbing, or fast forward, this is the API for you. Now, let's go on to Hardware Decode availability.

If you want to get the best battery life during playback, you want to play back on systems that have Hardware Decode support. This will also get you the best decode performance. So, we have new VideoToolbox API that you can query: is there a hardware decoder supported for this codec? Here I'm showing you HEVC, but you can also use it for any other codec.
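
The VideoToolbox query being described here is VTIsHardwareDecodeSupported; a minimal sketch:

```swift
import VideoToolbox

// Returns true when the current system has a hardware decoder for the codec.
let hasHEVCHardwareDecoder = VTIsHardwareDecodeSupported(kCMVideoCodecType_HEVC)
if hasHEVCHardwareDecoder {
    // Prefer HEVC variants for the best battery life and decode performance.
} else {
    // Software decode still works, but consider serving H.264 instead.
}
```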

Now, for the final question for playback: which codec do I use? Do I choose H.264 or HEVC? Well, if you're concerned about delivering the most compatible content, or want to deliver one asset that just works everywhere, choose H.264. Our platforms have supported this format for over 10 years and there's broad adoption in the third-party ecosystem. However, if you want the smallest file size and the latest and greatest encoding technology, like 10-bit, choose HEVC. You'll have to decide what works in your application.

And with that, let's move on to capture. Capturing HEVC is supported with AVFoundation, and we support MPEG-4 and QuickTime file formats as the destination. We support HEVC capture on our A10 chip, so that's iPhone 7, and now let's go over the capture graph that many of you are already familiar with.

This starts with an AVCaptureSession, which needs to get data from somewhere, so you create an AVCaptureDevice and add it as the input. Then the data needs to go somewhere; in this case you're using the movie file output to compress and write the output file. These are all connected with an AVCaptureConnection, and this creates your movie file. So, let's convert this into code, which many of you probably have in your app. First, create an AVCaptureSession. Here we're making a 4K capture session. Then you create the AVCaptureDevice and add it as the input.

Create your MovieFileOutput, which does the compression and file writing, and add it as the output. Then startRunning and startRecording, and we're capturing. So, how do we opt in to HEVC? Well, with iOS 10 we added an API to check for the available video codecs during capture, and new with iOS 11, you can check whether it contains HEVC. On supported devices it will return true, and you can go ahead and use that in your output settings.

And if it doesn't support it, you can go ahead and fall back to another codec like H.264. Now, I want to make an important point here: order matters with availableVideoCodecTypes, and for this seed we made HEVC the first option. So, that means if you do nothing else, you'll be capturing HEVC content. We really want to get you used to handling this content.
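
A sketch of that opt-in, assuming the movie file output from the graph above is already attached to a running session; the output URL and delegate handling are placeholders:

```swift
import AVFoundation

final class MovieCapture: NSObject, AVCaptureFileOutputRecordingDelegate {
    // Assumed to already be added as an output on a running AVCaptureSession.
    let movieFileOutput = AVCaptureMovieFileOutput()

    func startRecording(to outputURL: URL) {
        guard let connection = movieFileOutput.connection(with: .video) else { return }
        if movieFileOutput.availableVideoCodecTypes.contains(.hevc) {
            // On supported devices HEVC is listed first in this seed, so it is also the default.
            movieFileOutput.setOutputSettings([AVVideoCodecKey: AVVideoCodecType.hevc], for: connection)
        } else {
            // Fall back to H.264 on devices without HEVC encode support.
            movieFileOutput.setOutputSettings([AVVideoCodecKey: AVVideoCodecType.h264], for: connection)
        }
        movieFileOutput.startRecording(to: outputURL, recordingDelegate: self)
    }

    func fileOutput(_ output: AVCaptureFileOutput, didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection], error: Error?) {
        // Handle the finished QuickTime movie; it now contains an HEVC video track.
    }
}
```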

Now, let's move on to Live Photos. We have the same capture graph here, but we use the AVCapturePhotoOutput, and that makes all the Live Photos we love and enjoy. First, let's go over a couple of new Live Photo enhancements we've made in the past year. We now support video stabilization, so no more shaky playback during Live Photos.

We also no longer pause music playback during Live Photo capture, and we support much smoother Live Photos, up to 30 frames per second. So, let's go over capturing HEVC with Live Photos. We have new API in iOS 11 where you can query availableLivePhotoVideoCodecTypes and see if it contains HEVC; it will return true on supported devices.

Then if it does, go ahead and use it; if it does not, you can fall back to another existing codec like H.264. I also want to note that the same consideration applies here: order matters with availableLivePhotoVideoCodecTypes, and for this seed we made HEVC the first option.
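
A sketch of that Live Photo opt-in; photoOutput, livePhotoMovieURL, and the capture delegate are assumed to come from your existing Live Photo setup:

```swift
import AVFoundation

// photoOutput: an AVCapturePhotoOutput with isLivePhotoCaptureEnabled = true,
// already attached to a running session. livePhotoMovieURL is a temporary URL you supply.
let settings = AVCapturePhotoSettings()
settings.livePhotoMovieFileURL = livePhotoMovieURL

if photoOutput.availableLivePhotoVideoCodecTypes.contains(.hevc) {
    settings.livePhotoVideoCodecType = .hevc   // supported devices list HEVC first
} else {
    settings.livePhotoVideoCodecType = .h264   // fall back on older hardware
}
photoOutput.capturePhoto(with: settings, delegate: photoCaptureDelegate)
```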

So, again, if you do nothing else, you will capture HEVC Live Photos. You might be sensing a pattern here. We really want to get you used to handling this kind of content. Now, let's go over the most customizable capture graph, and that's with AVCaptureVideoDataOutput, and AVAssetWriter. So, you use this if you want to modify the sample buffers in some way. So, you might be performing some cool filtering operation.

When configuring AssetWriter for HEVC, you have two options. You can either configure custom output settings where you explicitly specify HEVC, or the video data output can actually recommend those settings for you, and we recommend this API. In iOS 7 we added recommendedVideoSettingsForAssetWriter. Now, this always recommends H.264.

So, if you want to stick with that, that's fine. However, in iOS 11 we've added new API where you can pass in the codec type, and on supported devices we will give you recommended settings for that codec type.
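
A sketch of asking the video data output for codec-specific settings and falling back to the older H.264 recommendation, assuming videoDataOutput is the output feeding your asset writer:

```swift
import AVFoundation

// videoDataOutput: the AVCaptureVideoDataOutput that feeds your AVAssetWriter.
let hevcSettings = videoDataOutput.recommendedVideoSettings(forVideoCodecType: .hevc,
                                                            assetWriterOutputFileType: .mov)
// Fall back to the iOS 7 API, which recommends H.264, when HEVC isn't available.
let videoSettings = hevcSettings
    ?? videoDataOutput.recommendedVideoSettingsForAssetWriter(writingTo: .mov)

let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
writerInput.expectsMediaDataInRealTime = true
```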

And with that, let's move on to the export and transcode side of things. You can transcode to HEVC with AVFoundation and VideoToolbox, and we support MPEG-4 and QuickTime file formats as the destination. Here, API opt-in is required. We support HEVC Encode on macOS and iOS, and now let's go over where we support HEVC Hardware Encode. We have an 8-bit Hardware Encoder on our A10 Fusion chip, that's iPhone 7, and on macOS we support 8-bit Hardware Encode on 6th generation Intel Core processors, that's the Skylake family, and that's the MacBook Pro with Touch Bar.

And on macOS we have a special 10-bit non-realtime, high quality software encoder that you can use and we'll talk about that in a little bit. Now, let's start with the highest-level export APIs, and that's transcoding with AVAssetExportSession. So, with this, you give us an asset, then you pick a preset and we do all the operations for you including compression and we produce an output movie.

So, there's no change in behavior for existing presets. If you're using one of the existing dimension-based presets that used to give you H.264, it will still do that. We've added new presets here, and those will convert from H.264, or any other codec, to HEVC. These will produce smaller assets, up to 40 percent smaller in some cases, with the same quality.
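
A sketch of such a transcode with one of the new HEVC presets, falling back to an H.264 preset where HEVC encode isn't available; the source and destination URLs are placeholders:

```swift
import AVFoundation

let asset = AVAsset(url: sourceMovieURL)        // sourceMovieURL: the movie to transcode
let hevcPreset = AVAssetExportPresetHEVCHighestQuality
let preset = AVAssetExportSession.allExportPresets().contains(hevcPreset)
    ? hevcPreset
    : AVAssetExportPresetHighestQuality          // H.264 on devices without HEVC encode

if let export = AVAssetExportSession(asset: asset, presetName: preset) {
    export.outputFileType = .mov
    export.outputURL = destinationURL            // destinationURL: where the smaller movie goes
    export.exportAsynchronously {
        // Check export.status and export.error when finished.
    }
}
```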

Now, let's move one level down the stack, to compressing with AVAssetWriter. With AVAssetWriter, you're either generating the sample buffers yourself, or getting them from another one of our APIs like VideoDataOutput or AVAssetReader, and AVAssetWriter is responsible for compression and file writing. Again, like I discussed previously, there are two options for AVAssetWriter. You can explicitly set custom output settings; in this case we're specifying use HEVC.

You can also specify your bit rate and dimensions, or you can use one of our convenience APIs: in capture you can use the VideoDataOutput, and for general encode you can use the AVOutputSettingsAssistant. We've added two new presets there that, on supported devices, will return HEVC output settings.
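
A sketch of the assistant route with one of the new HEVC presets; on devices without HEVC encode the HEVC assistant comes back nil, so this falls back to an H.264 preset:

```swift
import AVFoundation

// Prefer an HEVC preset, fall back to the matching H.264 preset when unavailable.
let assistant = AVOutputSettingsAssistant(preset: .hevc1920x1080)
    ?? AVOutputSettingsAssistant(preset: .preset1920x1080)

if let videoSettings = assistant?.videoSettings {
    let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
    writerInput.expectsMediaDataInRealTime = false
    // Add writerInput to your AVAssetWriter as usual.
}
```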

Now, if you're in the business of creating your own custom output settings, it can be a little tricky, because not all encoders support all output settings. We've fixed that problem in iOS 11 and macOS High Sierra: you can now query the encoder for supported properties to use in your output settings. To do that, you pass in HEVC here, and it will return the encoder ID and a list of supported properties. The encoder ID is the unique identifier for that specific encoder, and with that, the properties and the encoder ID can be specified in the output settings and you can be sure they actually work for compression. Now, let's move to the lowest-level compression interface, and that's compressing samples with VTCompressionSession. Just like with AssetWriter, you might be generating the samples yourself or getting them from another one of our APIs; VTCompressionSession compresses them and produces our compressed media data.

So, creating a compression session with an HEVC encoder is very simple. In this case we're creating one that compresses to H.264; let's go ahead and convert it to HEVC. There we go, and now we're compressing with HEVC with VideoToolbox. That was pretty easy. Now, let's go over a couple of considerations on macOS. For optimal encoding performance on macOS you want to opt in to hardware. This will use hardware when available, and when it's not, fall back to software. To do that, set the EnableHardwareAcceleratedVideoEncoder property to true in your encoderSpecification and then pass it into VTCompressionSessionCreate.

Now, if you're doing realtime encode, you'll often want to require hardware and never fall back to software. To do that, set RequireHardwareAcceleratedVideoEncoder to true in your encoderSpecification and then pass it into VTCompressionSessionCreate. On systems where hardware is supported, this will succeed, but on systems where there's only software encode, this will fail.
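
A sketch of that encoderSpecification on macOS, written against the current Swift spelling of VTCompressionSessionCreate; the dimensions are placeholders, and the Require key is noted in a comment for the realtime case:

```swift
import VideoToolbox

let encoderSpecification: [CFString: Any] = [
    // Prefer the hardware encoder; fall back to software when it isn't available (macOS-only key).
    kVTVideoEncoderSpecification_EnableHardwareAcceleratedVideoEncoder: true
    // For realtime encode, use kVTVideoEncoderSpecification_RequireHardwareAcceleratedVideoEncoder
    // instead; session creation then fails on software-only systems.
]

var session: VTCompressionSession?
let status = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                        width: 1920,
                                        height: 1080,
                                        codecType: kCMVideoCodecType_HEVC,
                                        encoderSpecification: encoderSpecification as CFDictionary,
                                        imageBufferAttributes: nil,
                                        compressedDataAllocator: nil,
                                        outputCallback: nil,   // or supply a VTCompressionOutputCallback
                                        refcon: nil,
                                        compressionSessionOut: &session)
assert(status == noErr)
```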

All right, now let's go onto a couple advanced encoding topics. And the first is bit depth. So, if you've ever seen a nice gradient in a user interface or a nice sunrise or sunset, you notice what it looks like in real life versus what it looks like in a movie isn't exactly the same.

So, you might see these color banding effects in the video version of your movie. And that's because with 8 bits we don't have enough precision to represent the subtle differences between colors. Now, the great thing about 10-bit is we actually do. So, you get these really beautiful gradients.

Now, with our macOS software encoder, we actually support 10-bit encode. So, first check that the property is supported, and if it is, go ahead and use our HEVC Main10 profile with the software encoder. And we want to make sure your entire pipeline is 10-bit. We don't want you going from 8-bit to 10-bit and then back to 8-bit, because that loses precision.

So, we've added new CoreVideo pixel buffer formats to ensure that you can stay in 10-bit. One is listed here. So, now for the first time you can render in 10-bit, encode in 10-bit, decode in 10-bit, and for the first time ever on iOS and macOS our display pipeline also supports 10-bit, so we get it across everything.

[ Applause ]
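
A sketch of opting in to that 10-bit path on the macOS software encoder, assuming an HEVC VTCompressionSession like the one above:

```swift
import CoreVideo
import VideoToolbox

// session: an HEVC VTCompressionSession (see the earlier sketch).
// Ask for HEVC Main10; check the returned status in case the encoder
// doesn't support this property.
let status = VTSessionSetProperty(session,
                                  key: kVTCompressionPropertyKey_ProfileLevel,
                                  value: kVTProfileLevel_HEVC_Main10_AutoLevel)

// Keep the rest of the pipeline 10-bit too, for example by rendering into
// 10-bit biplanar 4:2:0 pixel buffers.
let pixelBufferAttributes: [CFString: Any] = [
    kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
]
```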

Now, let's go over our second advanced topic, and that's Hierarchical Encoding. To understand this, we need to go over a little bit of video encoding 101. There are three major frame types used to compress video, and the first is an I Frame. You can think of I Frames like an image file; they can be decoded independently.

Then we have a P Frame, and P Frames refer to previous frames, so think of them like a 1-way diff and they only contain information that isn't in the previous frame. Now we have their cousin, the B Frame. B Frames refer to previous and future frames and they're like a fancy multidirectional diff. So, they only contain information that isn't in either frame they're referencing.

Now let's pretend we have a decoder that can only handle 30 frames a second, and let's say we have content that is 240 frames a second. Well that means we need to drop some frames before we can decode, because it can't keep up. So, when can we drop frames? We can drop frames when another frame doesn't depend on it.

So, in this case we can drop the last P Frame, because it refers to another frame, but no frames refer to it. So, let's go ahead and drop it. We can also drop the B frame because it refers to other frames, but no frames refer to it. So, let's go ahead and drop it. Now, let's move to a real-world case of encoding 240 frames per second content.

So, this is a typical encoding scheme used when creating content compatible with low end devices. So, for example, when encoding 240 frames a second content, we'll have one non-droppable frame for every seven droppable. So, this gives us a lot of flexibility during playback. On devices that support 120 frames per second decode we can handle that, on devices that only support 30, we can also playback there.

Now, let's throw in our frame references. Because these frames are droppable, they can't refer to each other, and they all refer to the non-droppable frame. Now, those of you with compression experience are already seeing one problem: compression suffers because we can't refer to the nearby frames. They're all referring to the non-droppable frame, and a lot might have changed between the non-droppable and the droppable frame. All right, so that's problem number one that we're going to fix. Now, let's step through and decode down to 30 frames a second.

So, first let's say we can't handle 240 frames a second; let's go ahead and drop some frames. Here we're dropping down to 120 frames a second, and let's say we still can't keep up, so we need to go down to 60 frames a second. But our decoder can only handle 30 frames a second, so we can't even handle 60 frames a second, and we go ahead and drop this last frame.

Now, I was really guessing about which frames to drop. There's no indication at all about whether I should drop every other frame, or just the first half, or just the second half. So, let's fix this problem too. We can fix it with a concept known as temporal levels, which allows us to organize frames by which ones to drop first. So, let's go ahead and re-encode our content. And you can already see that this is way more organized: first we drop temporal level three, and then two, and then one, and there's no guessing involved. So, this really helps. Now, let's throw in our frame references.

And you can already see there's a big difference here, is that the reference frames are much closer together and they're often referring to frames that are just before, or just afterwards. So, this really improves compression. Now, let's go through and let's say we have our same decoder that can only handle 30 frames a second.

We need to drop some frames, and there's no guessing involved. We drop temporal level three; now we're down to 120 frames a second. We drop level two; now we're down to 60. And we drop level one; now we're at 30 frames a second, a rate our decoder can actually handle. So, this reduces guessing with frame dropping.

Let's go over what we've learned. So, with HEVC hierarchical encoding, we have improved temporal scalability. There's a much more obvious frame dropping pattern and it removes frame drop guessing during playback. We also have improved motion compensation, the reference frames are much closer to each other, so we can use more parts of other frames and it also improves compression.

We're also using file annotations. For those of you who like to read specs, check out MPEG-4 Part 15, section 8.4; basically, we're using sample groups, so no bitstream parsing is necessary to get at this information. So, that really helps. All right, how do we opt in to this? You want to opt in if you want to create compatible high frame rate content, and there are two properties you should set: the base layer frame rate and the capture (expected) frame rate. A rough sketch of setting them follows.
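
A rough sketch of setting those two properties on an existing VTCompressionSession, assuming the BaseLayerFrameRate and ExpectedFrameRate compression property keys named here; as described next, check that the encoder supports them first:

```swift
import Foundation
import VideoToolbox

// session: the HEVC VTCompressionSession you are encoding with.
// Temporal level 0 (the base layer) plays everywhere: 30 fps in the example above.
VTSessionSetProperty(session,
                     key: kVTCompressionPropertyKey_BaseLayerFrameRate,
                     value: NSNumber(value: 30))

// The full capture frame rate: 240 fps in the example above.
VTSessionSetProperty(session,
                     key: kVTCompressionPropertyKey_ExpectedFrameRate,
                     value: NSNumber(value: 240))
```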

First, check that they're supported on the encoder you're using. Then set the BaseLayerFrameRate, the temporal level 0 frame rate, which in our previous example was 30 frames a second, and set the ExpectedFrameRate, which in our previous example was 240 frames a second. The base layer must be decoded, and we can decode or drop the other levels. So, now that you're all experts in hierarchical encoding, let's move it over to Brad for the image side of things. Thank you.

[ Applause ]

Thanks, Erik. I'm Brad Ford from the camera software team, and I get to talk to you about the other four-letter acronym that begins with HE. Here's the agenda for the rest of the session. First, we're going to cover what HEIF is at a high level. We'll start at the very lowest level when we talk about reading and writing files with HEIF.

Then we'll go up to the top of the stack and talk about general use cases and common scenarios with HEIF, and we'll end with a topic that's near and dear to me, which is capturing HEIF. So, first off, what is HEIF? HEIF is the High Efficiency Image File Format. The second F is implied and silent. You don't need to call it HEIF [extra F sound].

You'll just embarrass yourself in front of your compressionist friends if you do that. It's a modern container format for still images and image sequences. It's part of the MPEG-H Part 12 specification, and by way of curiosity, it was proposed in 2013 and ratified in the summer of 2015, just 1.5 years later.

If any of you know anything about standards organizations, a year and a half is kind of like two days in real people time. So, you know it must be an awesome spec. The technical detail I'm sure you're most interested in, and the reason that you came today, is how to pronounce it. So--

[ Laughter and Applause ]

I used the scientific method: I polled all the engineers on my floor, and the voting was largely along party lines. The German speaker said "hife", the French said "eff", and the Russian said "heef". And "heef" was the runaway winner. That's "heef" as in I can't belief how big, or how small, the files are. Now, my Finnish office-mate was quick to point out that Nokia researchers were the ones that came up with the spec, so the Finnish pronunciation should win; that would be the 1 percent, "hafe".

Well, as for me and my floor, we're going to call it "heef". It can use HEVC intra-encoding, which unsurprisingly compresses much better than the 20-year-old JPEG, two times as well as a matter of fact. That's an average of two times smaller, not up to two times smaller. We used qualitative analysis on a large data set of images to arrive at this number, ensuring visually equal quality to JPEG. It supports chopping up an image and compressing it as individual tiles. This allows for more efficient decompression of large images in sections.

HEIF also has first class support for auxiliary images, such as alpha, disparity, or depth maps. Here's a gray scale visualization of the depth map that's embedded in this HEIF file. Having depth information opens up a world of possibilities for image editing, such as applying different effects to the background and foreground like this.

Here I've applied the Noir black and white filter to the background, and the fade filter to the foreground. So, notice that the little girl's tights are still pink, while everything behind is in Noir. Knowing the gradations of depth, I can even move the switch-over point of the filters like this, keep an eye on her flower. Now, just her hand and the flower are in color, while everything else is black and white. You can even control foreground and background exposure separately, like this.

Now, she looks like you Photoshopped her into her very own photo. I'm not saying you should do it, I'm saying you could do it. That was just a teaser for a two-part session that we had on depth, and that's sessions 507 and 508. I hope you'll make some time to look at those videos. When it comes to metadata, HEIF has a great compatibility story. It supports industry standard Exif and xmp as first-class citizens.

HEIF isn't just for single images, it also supports image sequences such as bursts, exposure brackets, focus stacks. It also has affordances for mixed media, such as audio and video tracks. Let's do a demo, shall we? Okay, this is a showcase that takes place in Apple's very own Photos app.

All right, I'm going to start with a pano and this is a nice looking pano, this one is from Pothole Dome in Yosemite. It looks great, it's sort of what you'd expect from a pano until you start zooming in. So, let's do that. Zoom in a bit. Looks nice, let's zoom in a little more. And then zoom in a little more. And zoom in a little more. And keep zooming. And keep zooming, oh my gosh I can see what the speed limit is, and wow.

[ Applause ]

There are cars there, and there are Porta Potties. I can even go and take a look at the peaks in the background. Notice how it snaps into clarity as I go. This is actually a 2.9 gigapixel pano; it's 91,000 pixels by about 32,000 pixels. The RGB TIFF file for this is well over 2 gigabytes, and I assure you it brings any fast Mac to its knees, whereas the HEIF file is 160 megabytes. You literally cannot do this with JPEG, since JPEG maxes out at 64K by 64K pixels. HEIF does not max out. It supports arbitrarily large files, and it keeps the memory in check by efficiently loading and unloading tiles.

So, while I have this enormous data sitting in front of me, I'm never using more than 70 megabytes of memory at a time in the Photos app. So, it's responsive and I can zoom in and zoom out. I could do this all day long, but I should probably go back to slides.

[ Applause ]

On all iOS 11 and macOS 10.13 supported hardware, we read and decode three different flavors of HEIF. The three different extensions you see here relate to how the main image in the file is encoded. The first is .heic, which also has the UTI public.heic; that refers to HEIF files in which the main image is compressed with HEVC. The second flavor is .avci, in which the main image is compressed with H.264, and then the .heif extension is reserved for anything else; it could be JPEG inside, or any of the supported codecs.

We only support one form of HEIF for encode and writing, and that's the HEIC format, in other words the one in which you use HEVC. We figure if you've gone far enough to adopt the new file container, you might as well adopt the greatest compression standard as well. Encode support is currently limited to iOS 11 devices with the A10 Fusion chip. All right, let's go over low-level access to HEIF.

The lowest-level interface on our platform for reading and writing images is ImageIO. It encapsulates reading from either a file or an in-memory data source using an object called CGImageSource. It also supports writing to files or to mutable data using CGImageDestination. These objects have been around for a long time. You've probably used them.

To open a JPEG image file on disk, this is how you would do it using ImageIO. First you create the URL, then you call CGImageSourceCreateWithURL to create your source. The last argument is an options dictionary where you can optionally pass the UTI of the input. It's not needed when you're opening a file on disk, because the UTI can be inferred from the file path extension.

Once you've got a CGImageSource, you can do several things with it, such as copy the properties at any index, that's getting metadata out of it such as Exif. You can also create a CGImage from any of the images in the file. For JPEG there's typically only one image in the file. CGImage is of course like a promise, a rendering promise. The JPEG data can be lazily decoded when necessary using CGImage such as when you're rendering it to a CG bitmap context. You can also get a thumbnail image using a variety of options.

For instance, the maximum size that you would like, and what to do if there's none available in the file; and when you call CGImageSourceCreateThumbnailAtIndex, it does decode right away. Now, here's the analogous code for opening a .heic file. Can anyone spot the differences? Here, I'll make it easy for you. That's it.

It's a comment and it's a file path, that's it. In other words, CGImageSource just works. The one difference you don't see is how the HEVC is being decoded. On recent iOS devices and Macs the decode is hardware accelerated, whereas on older devices it's done in software and will thus be slower.
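
A sketch of that read path with a hypothetical file path; the same calls work unchanged for JPEG:

```swift
import Foundation
import ImageIO

let heicURL = URL(fileURLWithPath: "/path/to/MyAwesomeImage.heic")
guard let source = CGImageSourceCreateWithURL(heicURL as CFURL, nil) else { fatalError("cannot open image") }

// Metadata (Exif, TIFF, GPS, ...) for the main image.
let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any]

// A lazily decoded full-size image.
let fullImage = CGImageSourceCreateImageAtIndex(source, 0, nil)

// An eagerly decoded thumbnail, capped at 512 pixels on the long side.
let thumbnailOptions: [CFString: Any] = [
    kCGImageSourceCreateThumbnailFromImageAlways: true,
    kCGImageSourceThumbnailMaxPixelSize: 512
]
let thumbnail = CGImageSourceCreateThumbnailAtIndex(source, 0, thumbnailOptions as CFDictionary)
```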

A quick word on the tiling support that we just saw in the demo. CGImageSource can provide a dictionary of properties about the image by calling CGImageSourceCopyPropertiesAtIndex, and the properties dictionary is a synonym for metadata: Exif, Apple Maker Note, et cetera. There's also a subdictionary, the TIFF subdictionary, in which you'll find the size of the encoded tiles as the tile length and tile width. By default they are encoded as 512 by 512 pixels.

CGImageSource provides you with CGImages, as we saw, and CGImage has a nifty method called cropping(to:) that takes advantage of the tiling. This call creates a new CGImage containing just a subsection of another image. This isn't a new API, but it works really well with HEIF, where the tiles are encoded individually. You don't need to worry about the underlying encoded tile size; you can simply ask for the subregion that you want to display or render, and know that under the hood you're getting all of the tile-y goodness. It's only decoding the tiles that are necessary for that subregion.
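
A sketch of pulling one region out of a large HEIF, reusing the CGImageSource from the previous sketch; the rectangle is arbitrary:

```swift
import CoreGraphics
import ImageIO

// source: the CGImageSource for the big pano from the previous sketch.
if let fullImage = CGImageSourceCreateImageAtIndex(source, 0, nil) {
    // Only the HEVC tiles overlapping this rect need to be decoded.
    let regionOfInterest = CGRect(x: 40_000, y: 12_000, width: 2_048, height: 2_048)
    let croppedImage = fullImage.cropping(to: regionOfInterest)
    // Draw croppedImage into your view or bitmap context as usual.
}
```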

Now, let's talk about the writing side. Here's how you write a JPEG with ImageIO. You create a CGImageDestination by calling CGImageDestinationCreateWithURL, where I should point out you do need to specify what the UTI is. Here I'm using AVFileType.jpg, which is the same as the UTI public.jpeg. I'm being careful with the result, using guard let just in case the destination is nil. Now, with JPEG, the only reason it would be nil is if you asked to write to a file that's outside your sandbox, but to be defensive you should really write code in this manner.

Next, you add your CG image or images, one at a time with accompanying metadata if you would like. And then when you're done, you call CGImageDestinationFinalize which closes the container for editing and then writes it to disc. Now, let's look at the HEIC writing. Again, differences are very small.

Just the file path extension, the UTI, the comment. One important difference here between JPEG and HEIF, though, is that creating a CGImageDestination will fail on devices with no HEVC hardware encoder, and when it fails, the destination is nil. So, the good defensive code that I wrote on the previous slide is even more important with HEVC, where there is now a new reason that the destination might be nil. Please always make sure that you check; this is the one and only way to know whether writing to HEIC is supported on your current platform.
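
A sketch of that defensive write path; the cgImage being written is assumed to come from your own code, and the destination path is a placeholder:

```swift
import AVFoundation
import ImageIO

let heicURL = URL(fileURLWithPath: "/path/to/output.heic")

// This returns nil on devices without an HEVC hardware encoder,
// so this guard doubles as the supported-platform check.
guard let destination = CGImageDestinationCreateWithURL(heicURL as CFURL,
                                                        AVFileType.heic.rawValue as CFString,
                                                        1, nil) else {
    fatalError("No HEVC encoder; fall back to writing JPEG instead")  // handle gracefully in real code
}

// cgImage: the CGImage you want to write; optionally pass a metadata dictionary instead of nil.
CGImageDestinationAddImage(destination, cgImage, nil)
guard CGImageDestinationFinalize(destination) else { fatalError("could not write the HEIC file") }
```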

Also worth noting is that ImageIO has added support for reading and writing depth maps, as I talked about earlier. We've done that for both HEIC and JPEG, which we manipulated in strange sorcery ways that we probably shouldn't talk about. I'm not going to delve deeply into that, though, because it's covered in the dedicated sessions 507 and 508 where we talk about depth, and I hope you'll go look at those sessions because there are many segues to the auxiliary image format in HEIF.

All right, it's time to move on to our next major topic which is high level access to HEIF. But before we do that, I feel that WWDC should be a cultural experience, culturally enriching, not just an educational one. And that's why I want you to rest your brains for a moment with some compression poetry. All right. Wait for it. JPEG is yay big, but HEIF is brief. [laughter] Thank you.

[ Applause ]

See it's compression poetry, so it's small. Did you like that? Do you want to hear some more? Okay, let's do another one. Here's a compression haiku. HEVC has twice as many syllables as JPEG progress. Thank you. All right let's move on. [applause] I'm sure they'll edit that out later. Okay, we're going to talk about HEIF and PhotoKit. PhotoKit is actually two frameworks, it's Photos framework and PhotosUI and it's very high level, it's even above UIKit.

We're going to cover just briefly the way that you work with HEIF in PhotoKit when applying adjustments, and we're going to talk about how you apply adjustments in three different scenarios: photos, videos, and Live Photos. Then we'll talk about common workflows that you would use with PHPhotoLibrary.

Let's briefly outline the steps involved in applying an edit or an adjustment to an asset using the photo library. You ask the PHPhotoLibrary to performChanges, and in that change request you start with a PHAsset that you want to edit, such as a photo, and you call requestContentEditingInput on the asset to get a PHContentEditingInput.

This is the guy that gives you access to all the media associated with your asset, such as a UIImage, a URL, an AVAsset, or a Live Photo. Next, you create a PHContentEditingOutput by initializing it with the content editing input. The editing output tells you where to place all of your rendered files on disk by providing you with a renderedContentURL. You then perform your edits to the media provided by the editing input and write them to the specified location. Finally, the PHPhotoLibrary validates your changes and accepts them as a whole or rejects the change.

So, the rules with respect to rendered output images are unchanged, but you may not have been aware that they were in force. In iOS 10, your output images must be rendered as JPEG with an Exif orientation of 1; that is, if there's any rotation to be done, it is baked into the image in the rendered output file.

You may have overlooked this detail since probably 99 percent of the content that you are editing was provided as JPEG and then you just outputted it to the same format. But now you will see a proliferation of input content that is HEIC, so you should be well aware that you must still render all of your output content to JPEG with Exif orientation 1.

Here's the code for it. First you make a CIImage; this would be one way of doing it. You could make a CIImage from the content editing input's file URL, and then apply your edits. Here I'm doing both an application of a filter and baking in the orientation. And then when I'm done, I call CIContext's handy-dandy writeJPEGRepresentation which, if you've used this boilerplate code in the past, still works correctly, because it's outputting to a JPEG regardless of what the input was.
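
A sketch of that photo-editing body, written as a hypothetical helper that takes the PHContentEditingInput; the filter and adjustment-data identifier are placeholders, and the output is still rendered as JPEG:

```swift
import CoreImage
import Photos

// Hypothetical helper: turn a PHContentEditingInput into a JPEG-rendered PHContentEditingOutput.
func applyNoirEdit(to input: PHContentEditingInput) throws -> PHContentEditingOutput {
    let output = PHContentEditingOutput(contentEditingInput: input)

    // Bake in the Exif orientation, then apply a filter.
    let edited = CIImage(contentsOf: input.fullSizeImageURL!)!
        .oriented(forExifOrientation: input.fullSizeImageOrientation)
        .applyingFilter("CIPhotoEffectNoir", parameters: [:])

    // Whether the input was HEIC or JPEG, the rendered output must be JPEG with orientation 1.
    try CIContext().writeJPEGRepresentation(of: edited,
                                            to: output.renderedContentURL,
                                            colorSpace: CGColorSpace(name: CGColorSpace.sRGB)!)

    // Describe the edit so Photos can validate it (the identifier here is hypothetical).
    output.adjustmentData = PHAdjustmentData(formatIdentifier: "com.example.noir",
                                             formatVersion: "1.0",
                                             data: Data())
    return output
}
```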

Our second applying-adjustments use case relates to videos, and the rule, again the same as iOS 10, is that no matter what the format of your input movie content is, you must produce a movie compressed with H.264 as your output. Yes, even if the source movie is HEVC, you still need to render to H.264 for output.

Here's some boilerplate code to edit video content. First you get an AVAsset from the PHContentEditingInput, then you create an AVVideoComposition in which you are handed each frame, one at a time, as a CIImage, via an object that has a mouthful of a name: AVAsynchronousCIImageFilteringRequest.

You get a CIImage, and then you produce a CIImage; when you're done rendering it you call request.finish, and then as a final step you export your AVAsset to a file on disk at the URL told to you by the PHContentEditingOutput. Now, here's the important part: the preset to use is AVAssetExportPresetHighestQuality, or any of the existing ones which, as Erik said, still compress to H.264. Don't use the similarly named new ones that have HEVC in the name, because your change request will fail with an error.
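
A sketch of that video path, assuming avAsset came from the content editing input's audiovisualAsset and output is the PHContentEditingOutput; note the H.264 preset:

```swift
import AVFoundation
import CoreImage

// avAsset: the input movie from the PHContentEditingInput's audiovisualAsset.
let composition = AVVideoComposition(asset: avAsset) { request in
    // Filter one frame: CIImage in, CIImage out.
    let filtered = request.sourceImage.applyingFilter("CIPhotoEffectNoir", parameters: [:])
    request.finish(with: filtered, context: nil)
}

// Export with an H.264 preset; the HEVC-named presets would make the change request fail.
guard let export = AVAssetExportSession(asset: avAsset,
                                        presetName: AVAssetExportPresetHighestQuality) else { fatalError() }
export.videoComposition = composition
export.outputFileType = .mov
export.outputURL = output.renderedContentURL   // output: the PHContentEditingOutput
export.exportAsynchronously {
    // Check export.status, then commit the change request with `output`.
}
```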

Finally, applying adjustments using Live Photos, meaning the video content of Live Photos. What I'm talking about here is the moving aspect of a picture, when you Force Touch on a Live Photo or swipe between pictures. This is the simplest use case, as you never get to deal directly with the input or output files. You're passed CIImages and you produce CIImages. The encoding is done on your behalf.

There's a lot of good code to look at here, but I'm not going to spend a lot of time on it. You can pause the video later and take a good long look at it. The one take-home point is that after you've filtered each frame in a Live Photo movie, you can tell the Live Photo editing context to save your Live Photo to a given URL, and that's it. The Live Photos will be saved out using H.264 on your behalf, just as the stills will be encoded as JPEG.

Okay, let's move over to the common workflows with PhotoKit. When displaying content from your photo library, you use an object called the PHImageManager, and this provides you with one of three things: a UIImage if it's an image, an AVPlayerItem if it's a video, or a PHLivePhoto if it's Live Photo content. Here you don't need to make any changes, because all of these are high-level abstractions in which you don't care where the sources came from; all you're doing is displaying them. No code changes needed here.

The next is backup. When using PhotoKit for backup purposes, you probably want to access the raw assets such as the HEIC files and the QuickTime movies. And you do that using PHAssetResourceManager. It will give them to you in the native format. The only thing to be aware of here is that you might get different file types coming than you're used to, so make sure that you're ready for it.

The third and most complicated case is sharing. Here you're sort of leaving Apple's nice walled garden. You have to think about your compatibility requirements. Are native assets okay? You might be doing your clients a favor or you might be doing them a disservice by giving them HEIC content depending on whether they're ready for it. So, here you must weigh compatibility versus the features that HEIC affords.

If you do choose compatibility over features, you can ensure format compatibility by specifying the output format explicitly. For images, you can just check the UTType that you get, and see that it conforms to say JPEG, and if it doesn't, explicitly convert it. With videos, you can always force compatibility by requesting an export session with a preset that you know will deliver H.264 such as PresetHighestQuality.

All right, onto our last topic of the day, capturing HEIF. Finally, one that I know what I'm talking about. But let's do compression haiku number two, please would you let me? It's fun for me. Here we go. HEIF a container, compresses four times better than HEVC. Think about that. Okay, so, why are we wasting our lives saying HEVC, it's supposed to be a good codec right? Why aren't we calling it "hevick".

All right. So, Erik mentioned that AVCapturePhotoOutput added support for Live Photo movies encoded with HEVC. This class was introduced last year as the successor to AVCaptureStillImageOutput. It excels at handling complex still image capture requests where you need multiple assets delivered over time. It is currently the only way on our platform to capture Live Photos, Bayer RAW images, Apple P3 Wide Color Images, and new in iOS 11 it is the only interface on our platform for capturing HEIF content.

HEIF capture is supported on the A10 chip devices, which are iPhone 7 Plus, iPhone 7, and the newly announced iPad Pros. We'll do a brief refresher on how to request and receive images with the photo output. First, you fill out an object called AVCapturePhotoSettings; this is sort of like a request object where you specify the features that you want in your photo capture. Here it's the orange box. I've indicated that I want auto flash, meaning the photo output should only use the flash if it's necessary, only if the light is low enough to warrant it.

I've also asked for a preview sized image to accompany the full-sized image so that I can have a quick preview to put on screen. I don't know exactly what the final aspect ratio of it will be so I just ask for a box that's 1440 by 1440. I then pass this settings object with a delegate that I provide to the photo output to start or kick off a capture request.

Now, the arrow on top shows when the request was made, and now I'm sort of tracking this package delivery; the PhotoOutput calls my delegate back with one method call at a time. Very soon after I make the request, the PhotoOutput calls the delegate's first callback, which is willBeginCaptureFor resolvedSettings, and it passes you this blue box, which is a ResolvedPhotoSettings. This is sort of like the courtesy email that you get saying we've received your order, here's what we'll be sending you.

And this ResolvedPhotoSettings sort of clears up any ambiguity that you had in the settings that you provided at the beginning. In this case, we can now see that flash is not auto, it's true or false. It's become true, so we know that the flash is going to fire. Also, we now know what the final preview image resolution is going to be.

After we get willBeginCaptureFor, the second callback that we receive is willCapturePhotoFor resolvedSettings. This is delivered coincident with the shutter sound being played. And then, shortly thereafter, comes didCapturePhotoFor resolvedSettings, just after the image has been fully exposed and read out. Then some time typically passes while the image or images are processed, applying all the features that you asked for.

When the photo is ready, you receive the didFinishProcessingPhoto sample buffer callback and the image or images are delivered to you. Here I got the main image and the preview image; they're delivered together in the same callback. Finally, you always, always, always get the didFinishCaptureFor resolvedSettings callback, and that is guaranteed to be delivered last. It's the PhotoOutput's way of saying we're done with this transaction, pleasure doing business with you, you can clean up your delegate now.

This programming model has proved to be very flexible. We've had a lot of success with it because we've been able to add new methods to the delegate as needed when we add new features. For instance, we added support for RAW images. There's a call back for that. We added support for Live Photos, there's a separate call back for that, for getting the movie.

So, it would seem like HEIF would be an easy addition to this very flexible programming paradigm. Unfortunately, it's not. The incompatibility lies in the CoreMedia sample buffer, CMSampleBuffer, which is and has been the coin of the realm in AVFoundation for many, many years. We have used it for still images since iOS 4. It's a thin container for media data such as video samples, audio samples, text, and closed captions.

HEIF, on the other hand, is a file format, not a media format. It can hold many media types. Also, CMSampleBuffers can of course carry HEVC compressed video, but that HEVC compressed video doesn't look like the HEIF containerized HEVC. Remember, HEIF likes to chop things up into individual tiles for quick decode. You can't store that kind of HEVC compression in a frame in a QuickTime movie; it would just confuse the decoder.

So, at this point, you might be asking yourself: if we have this fundamental tension between file container and media container, how have we been able to use CMSampleBuffer for so many years with the photo output and still image output? Well, the answer is JPEG. We got away with it because of the happy coincidence that JPEG, the image codec, and JFIF, the file format, are virtually indistinguishable from one another. Both are acceptable as images in another container such as a QuickTime movie.

So, the answer to our quandary is to come up with a new purpose-built in-memory wrapper for image results, and we call that the AVCapturePhoto. It's our drop-in replacement for CMSampleBuffer. It is in fact faster than CMSampleBuffer, because we are able to optimize delivery of it across the process boundary from the media server, so you get even better performance than you did in iOS 10. It's 100 percent immutable, unlike the CMSampleBuffer, so it's easier to share between code modules. It's also backed by containerized data; I'm going to talk more about that in a minute.

Let's talk about some of its attributes. It has access to critical information about the photo, such as the time at which it was captured and whether or not it's a Bayer RAW photo, and for uncompressed or RAW photos you get access to the pixel buffer data. Also, sideband information travels with the AVCapturePhoto too, such as the second, smaller preview image that you can ask for. You can also now request a third image that's even smaller, to be embedded as a thumbnail in the container.

An ImageIO property style metadata dictionary is provided that can contain Exif, or other metadata that you've come to expect. And with the iPhone 7 Plus dual camera, you can request that a depth data map be delivered with the AVCapturePhoto results as well. AVCapturePhoto also provides a number of convenience accessors such as a reference to the resolvedSettings object that we saw in previous slides.

Also, it gives you easy access to bookkeeping about the photos. For instance, if you've fired off a request for a RAW plus HEIC, you would expect to get two photos, so the photo count accessor will tell you whether this is photo one or photo two. If this photo is part of a bracketed capture, such as an auto exposure bracket of three or four different EV values, it can tell you which bracket settings were applied to this particular result, as well as its sequence number and whether lens stabilization was engaged.

AVCapturePhoto also supports conversions to different formats, so it's friendly and able to move to other frameworks that you would use for image processing. First and foremost, it supports conversions to data representations if you just want to write to a file. And it can produce a CGImage of either the full-size photo or the preview photo.

Now, the mechanism for opting in to get an AVCapturePhoto instead of a CMSampleBuffer is just that you need to implement one new delegate method in your AVCapturePhotoCaptureDelegate, and that's this one here. It's very simple; it just has three parameters. It gives you the AVCapturePhoto and optionally an error. Now, error or not, you always get an AVCapturePhoto with as much information about it as possible, even if there's no backing pixel data.

The following two really lengthy delegate methods have been deprecated to help steer you towards the new and better one. We used to have separate callbacks for the processed and the RAW results: didFinishProcessingPhoto, which would give you a CMSampleBuffer, and didFinishProcessingRawPhoto, which would also give you a sample buffer. You needn't use these anymore; you can just use the new single callback, which subsumes both of them into one.

All right, in iOS 10 we supported the following formats. For compression, all you could get was JPEG. For uncompressed you had your choice of two flavors of 4:2:0 or BGRA, and of course we supported Bayer RAW. Now, in iOS 11, in addition to adding HEVC support, we're adding a new dimension to this as well. Every image format that you request is also backed by a file container format. In other words, implicitly, every image that you capture is being containerized.

For HEVC the implicit container is HEIC, for JPEG it's JFIF, for the uncompressed formats it's TIFF, and for RAW formats as before it's DNG. Now, why would file containerization be a good thing? The answer is performance. Let me explain using a case study. So, here's the old way you would get a JPEG and write it to disc.

PhotoOutput would deliver you a SampleBuffer with a full-sized image and a preview image, and it would attach some metadata to it, such as Exif. If you wanted to mutate that in any way, you would have to wait until it delivered the callback, and then you would get the attachment that had the Exif, manipulate it, and re-add it to the SampleBuffer. Then, when it came time to write it to disk, you would call the PhotoOutput's jpegPhotoDataRepresentation and pass it the two buffers.

Out comes JPEG data, ready to write to disk. While in code it looks simple, a lot is happening under the hood. Because we conflated the preview image with the embedded thumbnail image, we had to take something that was sized for the screen and scale it down, compress it to JPEG, incorporate all of your Exif changes, and rewrite the full-size image. So, a lot of scaling and compression was done just because you wanted to include a thumbnail with your image and manipulate a little bit of metadata. Not efficient at all.

Now, in the new way, AVCapturePhoto lets you specify up front what you want in the container. If it has enough information to prepare the file container right the first time, then it's done before you ever get the first callback. The way you do this is you fill out some extra features in the AVCapturePhotoSettings.

This time you can specify in advance the codec that you want, and optionally the file type. You can specify metadata that you would like to add, such as GPS location; you can now do this before you've even issued the request. You can also tell it, I would like an embedded thumbnail, and I would like it using these dimensions.

You then submit your request to the AVCapturePhotoOutput and eventually it gives your delegate an AVCapturePhoto as its result. This AVCapturePhoto is backed by something that's already in a HEIC container. It's already been compressed in tiles. It's already embedded that thumbnail image that you asked it to. It's already put the metadata in the correct place.

So, the final call that you make to write it to disk, photo.fileDataRepresentation, is much simpler than in the previous example. All it's doing is a simple byte copy of the backing store to NSData. No additional compression, or scaling, or anything; it's all done in advance. This is much more efficient, and especially when we're dealing with HEIF, it's necessary to get all of the performance of that great tiling format that I talked about earlier.
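
Putting those capture pieces together, a sketch of requesting a HEIC photo with an embedded thumbnail and writing the containerized bytes out; the photo output is assumed to be attached to a running session, and the destination path is a placeholder:

```swift
import AVFoundation

final class HEIFPhotoCapture: NSObject, AVCapturePhotoCaptureDelegate {
    // Assumed to already be added as an output on a running AVCaptureSession.
    let photoOutput = AVCapturePhotoOutput()
    let destinationURL = URL(fileURLWithPath: "/path/to/photo.heic")  // placeholder

    func captureHEIFPhoto() {
        var settings = AVCapturePhotoSettings()
        if photoOutput.availablePhotoCodecTypes.contains(.hevc) {
            // HEVC in the settings implies a HEIC file container.
            settings = AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
        }
        // Ask for an embedded thumbnail so it is baked into the container up front.
        if let thumbnailCodec = settings.availableEmbeddedThumbnailPhotoCodecTypes.first {
            settings.embeddedThumbnailPhotoFormat = [AVVideoCodecKey: thumbnailCodec,
                                                     AVVideoWidthKey: 320,
                                                     AVVideoHeightKey: 320]
        }
        photoOutput.capturePhoto(with: settings, delegate: self)
    }

    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard error == nil, let heicData = photo.fileDataRepresentation() else { return }
        // A simple byte copy of the already-containerized HEIC data.
        try? heicData.write(to: destinationURL)
    }
}
```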

Now, let's switch over to a few performance considerations with HEVC and HEIF. The first is what to do about photos that are taken during movie capture. When you take a HEIC photo while capturing a movie, you should be aware that the same hardware block that's compressing video, the one that does H.264 or HEVC compression, is also being asked to do double duty if you want to encode a HEIC file where HEVC is the compression format. That hardware block may be very busy indeed if you are capturing high-bandwidth video such as 4K 30 or 1080p 60.

Video is on a real-time deadline, so it gets priority over stills. This means that it may take longer to get your still results back, and it also may mean that they are up to 20 percent larger than they would be otherwise, because the encoder is too busy to use all of the features it would if it didn't have to meet that real-time deadline for 30 or 60 frames a second. So, our recommendation is that if you're capturing video and taking stills at the same time, you should use JPEG for the photos, to leave the encoder as available as possible for the HEVC video.

Another concern is HEVC and HEIF bursts. This is where you mash on the button and you're trying to get a constant frame rate, maybe 10 frames a second, of captured images. HEVC encode obviously is doing a lot more work than JPEG did; it's delivering a file that's less than half the size of JPEG. Therefore, HEVC encode does take longer. Now, we've benchmarked it and we're comfortable that HEVC HEIF can meet the 10 fps minimum requirement for bursts, but if you need to capture at a higher frame rate than that, our recommendation is to go back to JPEG for bursts.

And we've heard a lot about compression today, and I feel I would be remiss if I didn't give you my thoughts on WWDC. It is, after all, a compression talk, so I can't just leave this dangling there. Worldwide Developers Conference: nine syllables. W-W-D-C: eight syllables. That is like the worst compression format ever. It's lossy, with like a 1.1 to 1 compression ratio, which is even worse than lossless JPEG. So, please, as a service to me, for the rest of the conference, would you please only refer to the conference as WWDC or Wuh-Duck. [laughter] All right, let's summarize what we learned today.

HEVC movies are up to 40 percent smaller for general content than H.264 and for camera content on iOS they are 2x smaller. Also, HEVC playback is supported everywhere on iOS 11 and High Sierra, sometimes with software sometimes with hardware. And to create HEVC content you need to opt in to new capture APIs or new export APIs.

Also, we learned about HEIC files: they are twice as small as JPEGs, decode is supported everywhere on iOS 11 and macOS High Sierra, and capture is supported on iOS only, on devices with an A10 chip, using the new AVCapturePhoto interface. For more information, here is the URL for today's session.

I also wanted to point you to some sister sessions to this one. The first one in the list, High Efficiency Image File Format, is one that went straight to video. This is where we really delve deeply into the bits in the HEIF file. It's a great, great presentation. You should definitely listen to it.

It's performed by Davide, so you get the nice Italian accent going at the same time. Also, Introducing HEIF and HEVC, which was on Tuesday, gave a higher-level introduction to what we talked about today. And finally, the depth sessions that I've made several references to have several segues to the auxiliary image format that we use to store depth in HEIF. Thank you, and enjoy the rest of the show.

[ Applause ]