Using Digital Imaging Devices in Your Application - WWDC 2004

Graphics • 50:35

The Image Capture Framework is evolving to support a wide range of imaging devices and communication protocols. View this session to hear about the latest Image Capture-related developments and to learn how to harness the power of digital imaging devices in your application.

Speaker: Werner Neubrand

Unlisted on Apple Developer site

Check out Bezel, our iPhone mirroring app →

Transcript

This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.

Hello everyone and welcome to this year's Image Capture Show. Well, actually by just looking at this slide I said, well, I guess we have a problem in showing you a pretty cool demo later on. Because this slide says we have to refrain from taking pictures and I guess we have set up everything here, but I guess we just file it through. So, let's talk about Image Capture. We will start with a short overview of what Image Capture does. And the easiest way to describe Image Capture is really, it's a framework that allows you to work with still image capturing devices. A camera, a scanner.

And the nice thing really is we allow a wide range of devices and we support multiple communication protocols. It's funny, like a couple of years ago when we did the first Image Capture session, we were forcing you, or at least asking you, as device manufacturer, "Well, wouldn't it be great if we had a few, maybe just a handful of protocols that we can use?" And really by just using a handful of protocols, just using a handful of driver or device modules, and be able to talk to as many devices as possible.

The industry trend has really shown that most of the new devices are really mass storage or PTP, Picture Transfer Protocol based, which is really, really great. And so later on in the session, we will show you A couple of new APIs that we are going to add for Tiger to make the usage of new devices, new device classes, even easier.

Let's start with a short overview of the Image Capture Framework and its pieces. At the bottom level, we have the hardware, the device that you connect. And this has actually a device module that is responsible for talking, doing the communication to that device. This device module is a user-client background application, so there's no need to write any kernel drivers or so. And it talks to the Image Capture extension, the Image Capture Background App. The Image Capture Extension can handle multiple devices at the same time.

and can actually handle multiple clients at the same time. So in theory, you could use iPhoto, you could use the Image Capture application, FileMaker, or whatever your application, for example, application and then at the same time access a single device. This is a bit different from, if you're familiar with the Twain model, where all have a layering stack that's very similar to what we have, but they all run in the same process. We have different processes and really the Image Capture background application as a bottleneck will serialize the device requests and really make it easy and straightforward for a client application to talk to one or multiple devices at the same time.

Now if we look at this structure here, actually everything inside Image Capture is made up of very small building blocks. First building block is the notion of an ICA object. An ICA object just has a type and subtype and may contain a reference to another ICA object. And it may contain properties or actually a collection of properties and metadata in a dictionary. Which leads us to the second building block, that's the properties, ICA properties. Here again we have type and subtype and these properties contain actually the real data. With real data we mean the kind of image data and the available thumbnails.

[Transcript missing]

When we came up with the initial design of Image Capture, we were thinking, well, we could put all the metadata, all the properties, into separate objects, I say properties. It turned out it's kind of cumbersome to get all the metadata collected. So it was much easier, and you will see that in a little bit, much easier to use CFDictionaryRefs or NSDictionary's that you can pass around and they will contain all the metadata, for example, for a given image. So all the excess data will be in one of these dictionaries.

If you look at these three building blocks, they form kind of a unit. And so once again, the object has basically type and subtype. The properties has the entire image and thumbnail. And the dictionary contains the, in this case, EXIF metadata. That allows you to work with images in a very convenient way.

Out of these, you can build up your tree and the top level object that you see here, the device list, is actually always available even if you do not have any device connected. What was new with Panther was the addition of a dictionary to that device list object.

That dictionary, of course, does not contain any metadata because there's no image for the device list object. But what it contains is a list, a reference to all devices that are connected. So, two calls: "Give me the device list" and then "Give me the dictionary for that device list." If you do these two calls, you will have the information of what kind of devices are connected.

And again, by just using these building blocks, that's what we do inside the Image Capture Framework. So whenever, for example, iPhoto wants to talk to a device, first thing, as all other clients have to do, first thing is really give me the device list and then query the device list object.

For example, get the dictionary. Get out of that dictionary the number of devices for each device. You can get the dictionary, and that dictionary actually will contain... "Either flattened out or as a tree, both variants are in there, all the images, all the files that are on that device."

Now you can use that on a single machine, but what if you have two machines? Well, of course, you can have the same thing running on two machines. But with TIGER, we were introducing device sharing for image capture devices. By just setting up the devices, share on one side and listen on the other side, you can take a standard camera or scanner and just hook it up to one device but use it on a second device.

The detection is fully automatic using Rendezvous and you just pick and choose. But an even more cooler thing on Panther was the Image Capture Web Server. I guess this was a best-kept secret for Panther because a lot of people were later on in reviews, we were seeing, well, I didn't know about that feature. That's a really cool feature.

The Image Capture Web Server is a faceless background application, a regular client for the Image Capture Framework. And what it does is it takes the attached camera and publishes that over the net. So what you could do, you can connect at home, you can connect a camera, and from work, you could actually look at the images on that camera.

Or, if you have a camera that supports PTP and allows you to take pictures, you can switch to a mode where it actually allows you to take pictures over the internet. And it has a monitoring mode so that every so and so many seconds, it will automatically take a picture and display it. And that's a really convenient way to get to your data.

Now, that was really the kind of quick, what is Image Capture and what are the Image Capture components. One thing that we are really handling in the Image Capture Framework is devices that are connected via USB and FireWire. We handle that pretty well because, like, out of the box, just fresh installed Mac OS X, you connect a camera and whoops, the camera shows up. AI Photo launches or Image Capture application launches and allows you to download the images.

So, this hard plugging is done through an IC notification framework, completely invisible to the users. But what is happening really, the device gets connected at I/O Kit level. We learn, oh, there's a new device. We do a matching based on information that we have inside the device modules. And we then launch actually the appropriate device module.

So that's all great if you have USB and FireWire devices. But what about devices that are non-HOT pluggable? Well, up to now that was a problem. And for TIGER we want to have a solution for that. So, We really see that there are a lot of new devices, for example cell phones, coming out that have Bluetooth and a built-in camera.

What about these devices? Or PDAs? Some of the PDAs have a built-in camera. Others have just attachments where you can plug in a camera and a fairly decent resolution camera nowadays. And one thing that has probably a very bright future is the area of wireless cameras. So by the end of the year or early next year, you will see a lot of new devices showing up that have wireless support directly built in. So these devices, well, they may use Rendezvous or may use, like, worst case, just an edit field where you type in an IP address. But there will be a way, and there needs to be a way, to use these devices as well.

And of course the same thing for SCSI-based devices. Good old SCSI, non-hard pluggable, I mean you have to really make the connection up front before you turn on the machine. And those devices, it would be nice if we had a simple way for the user to use those as well.

So, in order to support these non-hot-pluggable devices, we have to introduce new APIs. And it's enough to just introduce, like, an ICA Load Device Module and Unload Device Module. For the Load and Unload, You can imagine then we probably don't want everyone to write code to do that. So the idea is to add a device browser to the Image Capture application. So you would go to the device menu in Image Capture, and then you could bring up a browser.

to select the non-HUD-pluggable device that you want to use. And ideally, and that's really dependent on requests, so if you would want to have something like a common panel, common UI for that, we were thinking about implementing that as well. That would just allow you, again, to do the device browsing and handle the connection.

So if we look at the ICA Load Device Module, you see we have the same structure of calls. It's just a parameter block and an optional callback proc. If that callback proc is nil, it's a synchronous call, otherwise asynchronous call. And then the ICA Load Device Module's parameter block has, beside the header, that's in all parameter blocks we have, it has a field that allows you to specify a device module. via URL. And we have an OS type that specifies the transport type. So that could be Bluetooth. It could be USB, FireWire, whatever. And then a set of parameters that are specific to the transport type.

Ideally, you wouldn't have to worry about the device module URL. So you could just pass in NULL, specify the transport type, and then specify a Bluetooth address that you get via any I/O Kit call. The unload device module is a lot simpler because all you have to do is really just pass in the ICA object that you want to unload. So let me show you very quickly how we do that in Image Capture. Can we please switch to Demo 5?

So I will launch again a small helper app that shows you really what's going on on the image capture side. It has entries for all image capture components, and we'll put up a check mark right next to the name whenever one of these components is running. So let me launch Image Capture.

You see Image Capture was launched. It's a checkbox, but also the very first entry, the Image Capture extension. Image Capture extension is running now, but we do not have any device module. Well, there's no device connected. So let me go to the device menu and we see we have a connect item in here.

And what will happen, and this is really not the final UI, but that's just a temporary UI to show you what direction we are going, is it will list, in this case, the Bluetooth devices that I have right next to this machine. And that's a PDA and a phone.

Down here, it shows you similar information that you will get from the regular Bluetooth control panel. So let me select the phone and let's do a connect. See what's happening? The phone was recognized and yeah, we do see the items on the phone and we can download those.

And by the way, a nice thing we added for... Actually, wrong one. For Tiger, whenever you want to specify a download folder, you can actually drag it up here and that should be a shortcut to the pop-up, so very easy to get to the download folder. And yeah, we could specify a preview to open it and then just try to download. And you see, here's the image that we have from the phone.

Okay, so that was the manual way of loading the device module. Let me quit Image Capture and show you another... A small application that will actually hopefully make it to the SDK. That's an application that allows you to just test all the Image Capture APIs. And you see on the left, you see all the Image Capture APIs we have. And you see down here, the new one, the ICA Unload Device Module. And up here, you see what it takes as parameter. There's only one input parameter.

That's the ICA object. And if we open the tree, so basically we see on the left side here, on the right side here, we see all the objects that we have on the device. We have up front the device list. We have the device itself. And then we have three images.

We could now select the device object, copy that, and just paste that in here, and then execute the command either synchronous or asynchronous way, and then... You see the device is gone and the device module as well. Image Capture Extension will quit in just a bit. Can we switch back to the slides, please?

So that was just a quick demo loading/unloading device modules and really just a very first preliminary UI for doing that. So if you look at the cell phone, if you want to really use it as a digital wallet, it's a really nice way to say, hey, that's my cat, that's my kid, that's my dog. And the cell phone, as I said, will probably have a way to take pictures or download pictures, but there's something missing.

So what about uploading? Well, again, wouldn't it be nice if you could upload to a device? And if you look at the API set that we have for Image Capture, we used to have a set property data. Well, the set property data, you would think that this is the right way to do the upload. Unfortunately, that's not just enough.

Because we just don't want to set the actual image data, we probably need a little bit more information. So we will have a new API to handle the uploading, device uploading. And it's, I say, upload file. And it's really just more than just a file copy. Well, it's a file copy for non-image files. But for image files, it also allows you to scale the image.

And the parameter block for this looks like that. We specify a parent object and a file, a .fsref, and a couple of flags. There's one thing about the parent object that's just a kind of suggested parent object. So really depending on the device, you may not be able to really upload to the specified folder.

So it's always good, I mean you could try it, it's always good to just upload to the device itself and then normally the device takes care of the positioning within the device file structure. So let me just give you a quick demo of the uploading. Switch back to demo five.

[Transcript missing]

And now you have two ways of doing the upload. One way is just drag it to the icon. Or whenever you are in the download mode, just-- Drag it in here. And then you get a sheet popping up that allows you to specify the option. Do you want to scale it? Yes or no?

Yeah, in this case we want to scale it. And you see, actually it suggests the size. It knows that this device has the screen size 120 by 160. But actually the device also supports as native size also 288 by 352. So you could choose either one. Let's stick with the first one and do the upload.

Now really depending on the speed of the device, you see the screen size is 120 by 160. You see we got the new entry in here. And it's not that big of a difference. It was scaled to 107 by 160, the image. The original image was larger. It was about this, and you see it was 429 by 640.

And again, show that using the ICA API test. The upload file, well, we just specify again a parent object. And in this case, all we want to do is even upload a text file. And if that works, then we should see--it takes a while. Theoretically, we should see it on the right side. It should pop up. We're waiting for that file. Too long. Okay, can we switch back to the slides, please?

Another area that was introduced, a new feature was introduced with Panther was the fast user switching. For Image Capture, there were really no problems if you had a scanner or a mass storage device connected. Mass storage device was very simple. Like all your FireWire hard drives, you would just expect that it would work if you use a A and switch to user B and then both users should have access to the device.

For a scanner, it was also not a problem because scanner, we from the beginning had an ICA scanner open session and closed session and there was only one client allowed for a session. So that simply means if you are user A, you are working with a scanner, you have a session open, you switch to user B, then there was no way that user B could get the scanner. If you are User A and you are done with your session, then User B will have immediately access to the device.

For cameras that are non-mass storage, so PDP cameras, We do not have the concept of a session. So what's happening is really User A uses the camera. You switch to User B. Now the question is, what should happen? User A or B have the camera. Well, instead of just unplugging the device and reconnecting the device, which actually would give User B access, we want to have a better solution. And what we're going to do is introduce a session handling for cameras.

So we will have an open and closed session. These calls are really similar to the ICA scanner open and close session. But they're different because they're optional. Well, we do not want to break existing apps. So the current iPhoto should just work. The current Image Capture app, FileMaker, and your app, they should just work.

And the other thing which is different is really the number of clients. Well, for cameras, there's no need to limit that down to a single client. So multiple clients can still use the device. So these optional calls are just to make fast user switching a lot easier, a lot safer.

Because then, for example, whenever the future iPhoto is done with importing, or you're not at the Import tab, Then there's really no need to pluck the device from a second user. So ICA Open Session just takes a device object and returns a session ID. ICA closed session just takes the session ID that you got on the open call.

These are the main changes we have API-wise for TIGER. So let's quickly look at the scanner support, the scanner side, before we go to some real camera-related demos. So on the scanner support, since the Jaguar, we were supporting native scanner modules and also Twain data sources. For Panther, we added support for transparency units and also added a very simple to use UI that allows you to do some image enhancement, either automatically or with a manual correction. And for Tiger, we are going to support something that was also requested by a lot of users, the support of Document Feeder. And let me just show you that on Demo 6.

So this is the standard Image Capture UI. If you have a scanner connected, you see in this case, we are using the Twain bridge, which means we are using a Twain data source. Using, in this case, the Epson data source that is just rerouted through Twain Bridge to Image Capture.

And since this device here supports as a twain capability, A document feeder. We will modify the UI and add this extra checkbox here. So you can select the document feeder, then the main view will change because when you are in the document feeder mode, you are probably not in this workflow where you do a pre-scan first, then select your scan rectangle, and then do the final scan. What you want to do in the document feeder mode is really scan the entire page and do it quickly and do it often.

So, again, this is probably not the final UI because the one thing that you could not select right now is, let's say, do you have an A4 or U.S. letter? U.S. letter or U.S. legal? So we need some way to allow the user to choose between different paper formats. Right now, in this demo, we will just scan the entire document. I hope that the scanner is working. Well, it was probably in a sleep. Hopefully, it wakes up.

Light is flashing and yeah, it is scanning the first page. So this is not a professional 24 pages or 60 pages per minute scanner. This is just a low-cost, easy to use, kind of home type scanner that you would use in order for scanning in multiple documents. See what's happening? The first page is scanned in the background. Preview is launched. And as soon as a page comes in now, it will be added.

to preview and yeah, each page will be displayed. Actually one nice thing you could do in preview, since these pages come in one at a time, what you can do in preview, you could set the preferences for images to say, well, create only one. If you don't want to create multiple windows, just only create a single window, and then all the pages that come in will be added to the same window. And then for example, in preview, you could do a print and print to PDF, so you have a PDF document of whatever you have.

Okay, so I guess it was done with the four pages. Also new, just one minor remark. In the past, we were just generating like a cryptical name based on scan and then date and time. In this case, you can really specify a prefix, and we will just add a page number to it. So I think it's a little bit user-friendlier than before. Okay, back to the slides, please.

[Transcript missing]

What do we have new for them? Well, first of all, thumbnail handling. We were introducing an API, ICA Copy Object Thumbnail, in Panther. And the initial support was for ICA thumbnail format. With TIGER, we are going to extend that and really allow you to download thumbnails directly in JPEG or TIFF format.

And for the developer of a device module, this means, well, not that much of a change, because you can directly give us the native thumbnail that you have. Or fall back to the old way, just return an ICA thumbnail. So if you are not going to change, that's fine, because we will do the conversion between ICA thumbnail to JPEG or TIFF. So the user will get whatever thumbnail format he wants.

One limitation we had in the past was with, I say, thumbnails, was really the 128 by 128 pixel limitation. And we want to loosen that a little bit and allow, really as described in this FEMA 15740, allow larger thumbnails and up to a maximum of 160 by 120.

One thing that we got a lot of questions about, because I guess the documentation was lacking a little bit on that, was what's up with the InfoP-List, Device InfoP-List? Well, for a device module that we have on Image Capture, we need, since it is a regular bundle, we need an Info.plist that has all the things that are required for an Info.plist, so bundle executable, icon name, and these things.

But we needed in the past also information that would allow us to do the matching of the hot plug event with this device module. And this information, this matching information, was kept in the Info.plist. We, in addition, needed a device Info.plist, which was in the resource path of the device module. And that device Info.plist was containing information about a cut-async profile or device icon, for example.

For Tiger, we are going to change that. We will have all that information just combined in a single plist. That's the device Info.plist. You still need the Infor.plist, but no Image Capture-related entry in there. And it will be the same location as the current device Info.plist and really combines the entries of two, plus some new entries.

We will, for example, specify in that Info.plist, we will specify the physical transport or media information. For example, does this device support JPEG, TIFF, movie, AVI, whatever format. And it sizes. And you already saw why we were going to do that. You saw on the upload, we had this pop-up that was giving us the size information.

So Image Capture uses really this for the upload panel and allows the user to choose the kind of native size. It has information about the screen size of devices or sizes that this device produces. And since we are also extending that to add the device capabilities, we will have the information, for example, is this device able to handle uploads of JPEGs? Is this device able to handle an erase or a take picture? Let me just show you the current format on number five.

So if we dig into the device modules, so they are in System, Library. Image Capture devices, and here we have a whole set of devices. So if we show the package content, go into the resources, here we have the device Info.plist. And now, this has some USB-related information.

and some device-related information. And now you can go in and look at all these entries, or instead of using the Property List Editor, we can also use a small tool that we are also going to add to the SDK, the device for the PList Viewer, and this allows you to just, like, double-click on, for example, the PTP module.

And what is happening is just collecting all that information. Actually, it looks up the icon and displays that as well, and you can pick whatever device you have. And go to the device and you see, okay, this one supports JPEG and these resolutions. This one supports download and erase. It does not support capture, for example.

And we see the protocol level. They are all PDP cameras. These are all the vendors. So see all main vendors from Canon, Kodak, Nikon have, and Sony, they have PDP cameras that we do support. And this is just a convenient tool to get to that information. And as I said, we are using these sizes heavily in the future. Okay, back to the slides, please.

And finally, I guess come to the main demo part of this session. And the idea was to just give you kind of ideas of new applications that you could do with Image Capture. First thing, the high dynamic range images. For those of you who attended session 207 on Wednesday, they learned everything about high dynamic images you probably will ever need to know. Luke and Gabriel were doing just a great job in introducing that.

And to just keep it very short, one of the problems you will have, and Gabriel was showing that, if you look at the Apple's Garage and take a picture of that scene, you see with the short exposure, you get pretty much good detail in the back. Everything in the front, you have no idea what's there. If you have too long exposure, you see all the cars. These are not BMWs, no.

[Transcript missing]

You can take multiple exposures and do a tone compression and you really get a result that's like that. So it's more or less the best of all words. Up to this point, it was kind of complicated to take these high dynamic range images. Because you had to go out, put your camera on a tripod, and then Take an image, change the exposure time, take an image, change exposure time, and do that a couple of times, and then run that through an algorithm.

Well, if you have a camera that allows you to do that automatically, You could just use image capture calls to really do a one-click high dynamic image, high dynamic range image capture. I really wish I could say that you could go out and take any camera and you could do it. Right now we have one.

I really hope that next year, more vendors will open the protocol, will really support more of these kind of things. So, the Canon D70, that's the one we are going to use now. And... Oh, sorry, Nikon. Ah, look at that. Okay, what I want to do is... Create a scene here for a camera that's kind of difficult to capture. We have a bright light, a box, there's something in the box, and the camera just pointing to that. So, let me turn the camera on.

[Transcript missing]

What we're going to do now is we take the first image. This is automatic exposure. And we see underneath the image is the exposure time. So what we do is we take out of the metadata, we display the exposure time. So basically verify that whatever we set the exposure time to before we do the take picture, these times are really in the image. We still have it here.

And now we take a couple of exposures and then... Take Gabriel's secret sauce and combine that. It's already done because the progress bar has stopped moving. And we will be able to open this. That's the current time. And we captured this scene. Now, if you look at this, It's actually pretty amazing because what we will see up here, we can actually see the inside of this desk lamp. But we can also see inside the box. We can actually make it a little bit brighter.

And yeah, we can really see what's inside the box and also see what's inside the lamp. Nice thing, these are really floating point images. So the information is really not frozen to this, whatever you see now, but we always have the slider to go either into the dark or in the lighter side.

So, to just give you an idea how these images look without this processing. Well, that's what you normally see. As soon as you have the tigers visible-- see, really, tiger is coming heavily-- Yeah, you can see the tigers, but you have no detail in the desk lamp. Or if you have the detail in the desk lamp, the tiger is gone. So this is a pretty nice way to capture high dynamic range images and really it's simple to do. The only thing you have to do is click one button and it happens automatically.

So the next demo is TimeLapse. And here, I guess I have to thank all of you, because you were all really contributing to that demo, without knowing, though. So let me just show you what we did yesterday.

[Transcript missing]

Okay, so that was the... So how did we do that?

Well, a small application sitting on top of Image Capture that just has two sliders, very easy to use. You can specify the time that you want to capture, in this case, whatever, 12 hours, and how far do you want to compress it. So if you want to play 12 hours within 60 seconds, you take this setting.

Otherwise, yeah, you can change it, like, take a whole day, 24 hours, and you want to play it back in, like, two minutes. And then just hit start and it automatically takes the images, creates the movie, and two hours later you're done. Or 24 hours later you're done.

Okay, if we have a little bit, we still have time? Yeah. So, last thing I want to show you is... Just for fun, a small application that actually allows you to take 3D images. And the idea is very simple. You have a camera that flips. And so basically the distance of the eye and we flip the camera.

And just doing that, we could... We would mount it on a tripod. I just don't want to do that because we already have those images captured. Connect the device. And then... We're going to bring up a UI that will just look at the images on the device. And the nice thing, since there's a sensor in the device, really showing whether you rotate it this way or that way, we can pre-sort the images and actually pre-sort them in a way that whenever I change one image here, you see the one on the right side changes as well because they are kind of grouped by date. So we know it's a left and right side.

And then we can very easily group it. And yeah, just select whatever you want. And then you could say process. And then what should happen is... What should happen is it will display it like that and you can do some fine adjustment and using one of the red/green classes you can look at it and you have your 3D image. Also very simple to do and without any extra effort. Okay, go back to the slides. We see we did the demo and we are now up to Q&A.