Fundamentals of Digital Video - WWDC 2007

Content and Media • 1:03:19

As the world of digital video changes, the key to staying ahead of the curve is making your content look its best regardless of where it is viewed. Gain a comprehensive overview of the current digital media landscape including tape formats from SD to HD, color space, aspect ratios, audio and video codecs, and much more. This is a must-attend session for anyone creating rich media.

Speakers: Ben Durbin, Kenji Kato, Craig Syverson

Unlisted on Apple Developer site

Transcript

This transcript has potential transcription errors. We are working on an improved version.

Good morning, Ladies and Gentlemen. And welcome to Session 600. This is the fundamentals of digital video session. And today, we're going to be joined by three very special guests. The first presenter is going to be Craig Syverson from gruntmedia. And the next two presenters are from Pixel Corps.

We have Ben Durbin and Kenji Kato. And these guys are going to give you a pretty good overview of the entire video production process from the very beginning all the way through, you know, actually getting your video on the web and then all those, you know, camera formats and everything you need to understand about making great video that looks, you know, even better when compressed. So with that said, I'm going to hand it over to Craig.

Thank you very much.
Steven Tuttle, Ladies and Gentlemen, from Apple Computer.
He can't introduce himself, so I'll introduce him for him.

Thanks a lot for coming. We're going to do a survey of digital video. Very kind of a big broad overview. So we're not going to be really technical here. This is not the technical session. This is the 9:00 warm-up, oh my gosh, we've got to start this whole video thing. So I'm going to go through a presentation that I've made that addresses people just getting started in podcasting, wanting to get a sense of what does it take to create a video podcast. And I'm using my own video podcast as the model for that.

So that would be us. And the show that I'm-- my company is gruntmedia. And my first show that I've done is called videogrunt. And this is a shot of my logo, which was up on the QuickTime content guide last summer. I don't mean to brag. Okay, I'm going to brag. I was the first independent podcaster put on the content guide. So thanks to the content guides for that. That was really great. My show is about the basics of digital video. It's a digicational show. I teach people, you know, what's an aspect ratio, what's a frame rate.

It's not a how-to. It's more of a what-is. But moving on, the thing I'm going to talk about right now is I want to give people a 30,000-foot overview of what it takes to get a video podcast from 0 to 60. But I'm only going to go three inches deep. So we're just going to give you a big picture.

And that big picture is kind of broken up into these four different groups of these twelve different points. Starting in the green area, you know, it's getting the show together, it's thinking of what the show is going to be like all the way up to the top part about promo and monetization, which none of us know anything about because of all the millions of pico dollars that I've been making off of podcasts. It's going to happen soon.

So let me just start right ahead here talking about the first section. This green section is about really, you know, when you want to start a show, it's really important, I think, and I've found in preparing my shows, is to really think about what you want to do. Obviously you'll want to figure out what you want to have done at the end, and what's the feeling?

That's kind of the main point of this thing is what's the vibe? You know, do you want to do a comedy, do you want to do a round table discussion, do you want to do a news show? And it's really important, I think, at the beginning to kind of feel what it is you're trying to say.

And you'll want to get a sense of what people are going to feel like when they see it. If you can nail that at the beginning, it will really help you down the line. A lot of people start off with one idea and they kind of, you know, go to comedy, and then they realize they're not funny, so it goes somewhere else. And it can be a real mess.

And another thing to think about is how it's going to be experienced. This is where the whole podcasting and online media is slightly different in that the actual conditions by which your audience are viewing your media are different than the old days, different than, and by the old days, I mean those of us who have been in video production.

We're sort of used to having people watch our stuff on screens in a reasonably controlled room or a projector. And with online and podcasting, they're in a totally different space now. They're on the road, they have a lot more distractions. So in thinking of the show, it's really important to really think about that user experience down the line, right?

Indeed.
Just say yes, you're great.
Yes, you're great.
You're great.
Thank you. And so next in this green section then is think strategically. And this is part of what I talk about with corporate people: you know, how does this podcast fit in overall with your company strategy, realizing that this media edition is sometimes just an element. Don't try to make-- if it's part of a corporate thing, don't try to make this show the entire conveyance of information. Think of it as a supplement to a lot of other things.

And use media in its proper way, sort of in the proper McLuhanistic way of, you know, what can it accomplish best, what can moving media accomplish best, which is more about feeling and experience than hard data. And think about how to scale it. This is a problem a lot of Web 2.0 companies have, of course, is how to scale.

But even for, you know, small individuals like me who are starting off, you've got to be prepared for success. You've got to really understand that if it does take off, you've got to be able to grow your bandwidth, you've got to, perhaps, grow your production line or your production flow, enhance your website. What's that?

Enhancing your website, things like that.
So one observation I've made in looking at just symbolically representing radio or audio on the top and video on the bottom or traditional media, just in being involved in the podcasting world for the past couple of years, just the observations that I've made. And I also am involved in audio podcasts as well. I do a weekly show with Kenji and two others called This Week in Media. And I do a fortnightly show on venture capital called Venture Cast.

So I have my hand in both audio and video production. And things I've observed with online media and podcasting, in particular with audio, is that compared to radio and other traditional sort of audio media formats, with podcasting, we can go longer. We don't have these time constraints, we don't have these commercial marks that we have to hit.

And, you know, people download stuff, they tend to be more, you know, focused and specialized, so they're willing to deal with our hour and a half, two hour jawing on of things. And I actually love listening to it. I actually like things to be longer. Especially for me in listening to audio podcasts in the car with my iPod, I don't want to be switching programs a lot while driving. So I like these one-hour programs. I just hit the button once and go.

And the other thing about the audio, at least in podcasting in general, is culturally, it's more candid, it's more casual, certainly in a corporate setting with the corporate podcast, you tend to get a more humanistic vibe out of the CEO rather than just the straight-up, you know, we're doing great kind of comments.

On video, what I've noticed is that for online, it's shorter. There's a lot of good reasons for it to be shorter. You know, bandwidth and all that notwithstanding. Bandwidth and that will be a problem that will go away very soon. So it's not really-- that's really not the reason.

The reason is that, again, this context of viewing, shorter attention spans, and it's really, I think, given rebirth to this idea of the short-form video. It's the last time-- the only time we've really had that historically has been on cartoons, things that are, you know, contained programs that are five minutes or so.

But I'm finding that with video, there's a really great spot for short length programs in the non fiction space. It's really great because it's really easy to do that. And I'm finding that as a barrier, not as a barrier, but as an entry point, non fiction for me is a lot easier.

Now, I think those people who are doing fiction, who are doing comedy, they're very brave. I think they're awesome like Ask a Ninja and those guys because they're really talented. So you have to kind of, again, this goes back to, you know, what's your vibe, what are you trying to accomplish? And really think about how you can optimize your talents with what you've got available to you.

And video programs are simpler. We've got smaller screens. The whole composition aspect of designing a screen for a video podcast versus a presentation is a big, big difference for me. It's a completely different way of looking at it. So I'm finding that these-- the whole layout and the whole way that you're presenting information can be a lot simpler, right?

Indeed.
Yeah. Format.

Basically, this is just, once you're designing your show, you're thinking about what you want to do. Then build your format. And the format is sort of the professional edge. You know, it's the professional touch, it's the consistency. And, you know, can you have your format tie in with your concept?

It's a subtle point, but it really somehow-- sometimes the way that your program flows actually enhances the concept that you're trying to give off. And it really helps you tighten up your script writing or tighten up the concept of your show, if you give it constraints because constraints give you a lot of power.

And so this is just a very quick sort of survey of the shows that I've done in the grunt series. I took the average length of a show and sort of figured out how much time I was devoting to each section. So I just did this little graph for someone who's helping me with script writing. And I was saying, you know, the sections of the show, this is roughly how long they are.

And this whole circle represents about five minutes. And starting at the top, I have a-- in that orange section in the top, I have a fake commercial on videogrunt. It's a satirical non-commercial-- I won't go into that. Then I have an intro bit, and then I'm on camera for a little bit introducing the subject. Then that blue section down there, I do a lot of motion graphics or I'm, you know, into the meat of the subject.

I come back on camera with an exception or as a reminder that I'm there. Again, more motion graphics. I come back at a close, closing tag, and we're out. And so that's kind of the format. To me, this was the culmination of really the six or seven months I took in just thinking about this green section. It really took me the most time to get that ready.

So now we're moving up into production. And this is sort of the, you know, this is the heart of where we come in here with Apple and actually making the podcast. Kenji and Ben are going to go into a little more detail on the guts. So I'm going to go through this rather quickly. But for me, for the kind of show that I do, my most important thing is having a good writer.

That it's all about the script, in my case. Now, obviously certain shows aren't scripted. But in mine, it is. So that's my most important asset, is to have a good writer and a good host. It happens to be me, but that's just because they can't find anyone else. And you really have to pay attention to copyrights and permissions. It's a whole other, you know, week-long session about that.

And there's this notion that people who are beginning don't really understand. And that is that pictures are easy and sound is hard. And, you know, video cameras are really sexy and they're really, you know, new technology about imaging, but really the hard part is the sound because if you're not really paying attention to sound, you get these crappy mics on these cameras. So my whole emphasis for people who are getting started is buy the camera last. Worry about your script, worry about your sound, buy your mics and practice on the sound first. Then go with the video part.

So very, very quickly, one way that I'm, you know, suggesting people go is to start with at least HDV for cameras. There's no reason to not do HDV or high definition if you're starting out. It's very important, I think, that you do, that, you know, people are always saying, well it's always so small at the end. But those of us involved, you know, you always want it as high as possible. And then at the last minute, you can cramp it down to whatever size you need. And these guys are going to go into a little bit more about video formats and the like.

For audio gear, I think it's super important to have a lavaliere mic. These are two that I've used in the past. I currently use the Sennheiser. These are under $1,000. Don't spend less than $500 on a wireless mic because it just won't work. Or it will work for a week, and then it won't work at all.

And it's just really, really critical. You could spend up to 10 grand for wireless stuff, but you can start here around $500. And then I suggest if you actually do want to go further, these are, I call them Kung Fu mics, but Countryman doesn't call them that.

But this is a great microphone company. The lavaliere, this little B6, the diameter of that mic is smaller than the diameter of the wire of my current wireless. I mean, incredibly small, incredibly well made, and they sound great. And so obviously the smallness means that they're hidden on the video camera. And that's a good thing.

Other mic types to use, there's a shotgun mic, this guy here. This is a Heil PR40. This is more for voice podcasting. This is a handheld mic, which is good for field work. And I put this one up here, the camera mounted mic, only because if you have to do a camera mounted mic, which is like the last thing you want to do, but if you do it, Rode makes a pretty nice one, from what I hear. I don't use it, but it's got a shock mount, which is this guy here, which absorbs some of the sound.

And they really thought about, if you have to have a camera mounted mic, how it would be a pretty good one for a reasonable price. I think they're like 150 bucks or something. Guys, do you know?

Sounds about right.
Yeah, so, you know, if you can help it, don't use the camera mounted mic, is my point.

Okay, so in my production process, our most important tool is a word processor. Like going back to the whole writing thing. And so, you know, I write the script, and what I'll do is I'll actually record what I feel is the first draft, I'll actually record into my digital recorder so I can hear it. It's all about the auditory things. It's not about, you know, looking and hearing are two totally different things. Lots of rewrites with that.

Lots of changes and, you know, weird onomatopoeias or things like that you don't notice when you're writing. I'll throw them into Soundtrack and actually maybe even do some sound editing to help me edit the script. And so that creates the cycle of, you know, until I get the thing that I want, which is the script, that's sort of the key.

And using the script, and I will record myself using a little HDV camera by Sony, and like I've been talking about, I don't use the onboard mic, I use a shotgun mic. And then I tie that, in the old days, when I first did the podcast, I tied it straight into my iMac G5 and recorded straight to there.

And the whole first series of videogrunt was done with an iMac G5, which I think is a huge testament, again, to Apple that I could produce a high definition podcast with a quote consumer computer. And it was truly amazing that I did that. It worked really, really well. That footage goes into Final Cut, and there I've captured my on-camera stuff and I've captured the voice. I've basically captured a script into Final Cut.

So now I've recorded. Let's go on to post production. What you need is a good editor. And I don't mean the actual software, I mean the person, someone who understands cadence, understands timing, and, you know, it's that whole je ne sais quoi of how you hook up things together. We've got to do sound sweetening because, again, harping on sound.

Your motion graphics, thinking of the screen, thinking of the small composition. And then the part that is really not sexy, but it's about workflow and backup, and that's the part I usually talk about because no one else likes to talk about that. A little bit on the workflow, so if I consider my scripts get translated in the Final Cut, then in terms of the pictures, everything I do goes into Motion and comes back in Final Cut. So Final Cut is my entry point and my assembly point. I actually don't do a whole lot of cutting or special effects or anything in Final Cut. It's all pretty much done in Motion.

So a very quick example of this is my on-camera stuff: the shot goes into Final Cut, I throw it in Motion, and then I do that in Motion too, make it black and white, add the tag, move me a little bit. And then that will go right back into Final Cut onto the timeline.

And then for other elements of the show, be they from older versions of Photoshop, Illustrator, or iShowU, which is that blue icon there for the screen capture, everything that I get goes into Motion. And in Motion, I set the right screen size and do all I want to do and then send those back to Final Cut.

So this is my Final Cut timeline. Here's a close-up. Again, everything is driven by the script and the sound, then the voice recording is the script. So, you know, that is my timing, that's my metronome, that's where all the important cuts are based on. And so on this particular timeline, how I do it is on these lower rows, that's all the stuff that moves. This is all the things that move that are from Motion. And then the parts that don't move I can create still-frames in Final Cut.

So I'm only doing the rendering of the parts that are actually moving. And then Final Cut does a great job with still frames. So then I can just drag those along to meet the timing of the voiceover. So that's how I can kind of get a pretty efficient workflow out of this.

So then when we're done with the pictures, then we go into sound and do a little sound sweetening. This is a quick little illustration using the early version of Soundtrack of my voice as recorded. And then I have one that's just slightly modified. If the sound is working, we'll hear it right now.

This is program five, videogrunt. I'm Craig Syverson. Okay, we've all seen what happens to a wide screen film when it gets modified for television using pan and scan. And as many of you, the days of pan and scan--

This is program five of videogrunt. I'm Craig Syverson. Okay, we've all seen what happens to a wide screen film when it gets modified for television using pan and scan. And as many of you, the days of pan and scan--

So the differences there are pretty subtle, pretty subtle, pretty minor, but important. I use, you know, sound compression to sort of bring the presence up, some noise reduction for the noise in the room, very simple sound sweetening, but that's just that little bit really helps the whole understanding of the show.

Then this is kind of the workflow idea. And if this green box sort of represents the amount of data that I'm creating in production, an arbitrary block of data, be it from tape or, you know, from hard drive recording, when I get into post production, that amount of data really gets big.

And then when I'm actually getting down to my final edit, the amount of that data that that final edit is, it representatively is smaller. And then, you know, my actual final output, of course, is a very, very small little file. But the point that I'm making here is that for people getting started, especially beginners, really understand that your storage needs are going to go through the roof. And it really gets big, it really swells really fast. So be prepared with, you know, hard drives.

Buy a lot of them. Be prepared. And then the other thing, of course, is that you might have a lot of hard drives, but you've got to back everything up because, you know, this is what Time Machine is telling us. But as I've always said, you know, one copy of data is just charming. But it's not-- to me, I don't have data unless I have at least two copies because if, you know, we all know it can go away in a flash. So in production, be really prepared to get really smart about large scale storage.
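
To put rough numbers on how fast storage swells, here is a minimal sketch. The two bitrates are approximate video data rates used purely as illustrative assumptions (audio and scratch files are ignored), and the function name and the two-copy default are hypothetical choices that follow the "at least two copies" rule above.

```python
# Rough storage estimate for a shoot. The bitrates below are approximate
# video data rates, included only as illustrative assumptions.
APPROX_MBIT_PER_SEC = {
    "HDV 1080i": 25,         # approximate HDV video data rate
    "ProRes 422 1080": 147,  # approximate rate at 1920x1080, 29.97 fps
}


def gigabytes(fmt: str, minutes: float, copies: int = 2) -> float:
    """Estimate storage in GB (10^9 bytes) for footage kept in `copies` copies."""
    bits = APPROX_MBIT_PER_SEC[fmt] * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000 * copies


if __name__ == "__main__":
    for fmt in APPROX_MBIT_PER_SEC:
        print(f"{fmt}: {gigabytes(fmt, minutes=60):.1f} GB for one hour, two copies")
```

Under these assumptions an hour of footage already runs to tens of gigabytes once it is backed up, before any renders or intermediate files are counted.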

You guys deal with a lot of storage, lots. Yeah, you guys-- this is like amateur hour compared to what you guys are up to. Moving onto the whole file compression thing, the best I can say is that it's a dark art, that it's something that really requires practice. I can't give you a formula. Every show is different.

And it's just hard, it's just hard. And the thing with video is that there's different flavors as well out there on the web. You can go with Flash or Windows Media Player, sorry, or QuickTime. But you kind of have to make a decision, I feel. I'm telling people, you know, you either offer everything or one, is kind of my approach. It's like you can go down that path.

And I thought well, I'm doing a show about digital video, I should offer them all. And then I realize that's crazy. I'm just going to do H.264 because it's the best. And even if you find settings that work for one show, it doesn't mean your next episode of a show or program is going to work with those same settings. To optimize your video, indeed.

So, again, here's this idea of all this data. In my particular flow, I shot everything HDV. I keep it HDV all the way to the end. This is going to probably change. I'm working now with Final Cut 6 and strongly considering that I'm going to convert everything to ProRes 422, even though in my particular case, that will mean a larger amount of data. But there's so many advantages to having this central point of the central hub of a standard and ProRes 422 really looks promising that it can be that.

But the point I'm making here is keep the resolution high all the way to the end. And then, you know, what I have is the final output file off of the timeline. You know, I export out a QuickTime movie of that complete program. So that's the stem cell of this program. That's the thing that's going to, from there, then I will throw it into a compression software from which I can create these other formats.

Of course I have no interest in the other two, so, you know, for me, it's all about how do I make a really good H.264 out of it? And I used Compressor to do that. And in the old days, like before Compressor 3, before the new Compressor, they didn't have really good presets or settings for the iPod, oddly enough. So I actually used this little piece of shareware, great piece of shareware with a somewhat unfortunate name called ViddyUp.

But it's awesome. Splasm Software makes it, $10. And it does really, really great encoding. It has custom settings where you can go in and monkey with it as well as having presets. You can do batch processing, which I discovered completely by accident that it could do that. And for a lot of people, too, it didn't require QuickTime Pro to do a lot of QuickTime Pro like things.

So I recommended this a lot. I do recommend this a lot, especially to, you know, start-up podcasters who might not have Compressor, but I've always, always pushed them to get the Final Cut Suite. And then these are the particular settings, which you can't see. It doesn't matter because they're not right.

These are the settings from my show, but it won't work for you. In my particular show, the way I've shot it, the way I've-- the elements that I have are very efficient for online. I go on black and white, but that's not really the point. The point is that I'm not on camera much.

I use a lot of motion graphics. And motion graphics, by definition, are very efficient in being compressed because they're very predictable and they don't, you know, it's not like a tree with moving leaves where you have to recalculate all those frames. So I could do a really low data rate. And my files are really small. And they look really good. But that's because I planned it that way from the beginning. So, again, it's the dark art.
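
The point about predictable pictures compressing well can be shown with a toy experiment. This is only a sketch under loud assumptions: it is not H.264 or any real video codec, it just stores per-pixel differences between consecutive frames and runs a general-purpose compressor over them, and the frame generators and all names are hypothetical.

```python
import random
import zlib

WIDTH, HEIGHT, FRAMES = 64, 48, 30


def graphic_frame(t: int) -> bytes:
    """Flat background with a small bright square that slides one pixel per frame."""
    frame = bytearray(WIDTH * HEIGHT)
    for y in range(10, 18):
        for x in range(t, t + 8):
            frame[y * WIDTH + x] = 255
    return bytes(frame)


def leafy_frame(rng: random.Random) -> bytes:
    """Every pixel changes unpredictably, standing in for a tree with moving leaves."""
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def compressed_delta_size(frames: list) -> int:
    """Keep frame 0, then per-pixel differences between neighbours, then zlib it all."""
    payload = bytearray(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        payload += bytes((c - p) % 256 for p, c in zip(prev, cur))
    return len(zlib.compress(bytes(payload)))


rng = random.Random(0)  # fixed seed so the run is repeatable
graphics_size = compressed_delta_size([graphic_frame(t) for t in range(FRAMES)])
leaves_size = compressed_delta_size([leafy_frame(rng) for _ in range(FRAMES)])
print(f"graphics: {graphics_size} bytes, leaves: {leaves_size} bytes")
```

Most of the difference data for the sliding square is zeros, so it shrinks to a small fraction of the noisy sequence. A real encoder is far more sophisticated, but the same intuition explains why a graphics-heavy show tolerates a low data rate.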

So now we've done our show, we've compressed it, we have this great file. Now we want to get it out in the world. What's the next step? This is the hard part of the blue part of getting your feed ready. And then this section identity is basically just a rant on my part about naming. But it's really important. And I don't know if Pete Alcorn is here. He's the iTunes directory.

He rants about this too, which makes me feel good, in that you have all this opportunity with all this metadata with the file itself, the ID3 tags, the information in your RSS feed itself to really optimize the O in SEO, having people find your show. And so-- and to me, it's just aesthetically better to be consistent in your naming because these things show up in a list over time. So you really have to consider that, sorry, I clicked on that accidentally, each show has its-- each file has its own name, each RSS feed might actually change the name of the show.

You know, be consistent with the spacing. I mean, I don't have to tell a lot of you this, but this is, again, this is my personal rant. Just be aware that, you know, we all know how things get alphabetized in computers. And just, you know, be consistent. And I rant about this, and then just last week, I misspelled David's name on a show. So there you go. What can you do? I'm not going to go through the entire section. I'm going to go up to feed.

Basically, this is, after you've got your thing ready, how do you get your RSS feed out there? And you have to build the feed file and find a good host, maintain the integrity of the feed and perhaps add some extra information to the feed. And I use a really great little program called Podcast Maker, $30, it does a very wonderful job of creating the whole RSS feed for my show. And these are sort of the two screens that I did just to set it up. And then the rubber hits the road when I actually go up and post the file to my server.
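
For a sense of what a tool like that produces, here is a minimal sketch of a feed file built with Python's standard library. It is not how Podcast Maker works internally; the function name, the episode dictionary keys, and every URL are hypothetical, and a real podcast feed would normally carry additional tags for directories.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_feed(show_title, site_url, episodes):
    """Return an RSS 2.0 document (as text) with one <item> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = show_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Episodes of {show_title}"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # A guid that never changes lets clients recognise an episode
        # even if the media file later moves to another host.
        ET.SubElement(item, "guid", isPermaLink="false").text = ep["guid"]
        ET.SubElement(item, "pubDate").text = formatdate(ep["published"], usegmt=True)
        ET.SubElement(
            item,
            "enclosure",
            url=ep["media_url"],
            length=str(ep["size_bytes"]),
            type="video/mp4",
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed("videogrunt", "https://example.com/", [{
        "title": "Program 5",
        "guid": "videogrunt-005",                      # hypothetical identifier
        "media_url": "https://media.example.com/videogrunt_episode005.mp4",
        "size_bytes": 12345678,
        "published": 1181548800,                       # seconds since the epoch
    }]))
```

The enclosure element is what points a podcast client at the media file, which is why the feed and the file can live on different servers.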

So in this diagram, this is me, and I've just created a show. That's the QuickTime movie. And I used Podcast Maker. What I do is I send that up to my host, which is CacheFly. CacheFly is a content delivery network that's very, very fast. And Podcast Maker creates the RSS feed itself and the media file sits on CacheFly. The next thing I do, or I did it initially, was get an account with FeedBurner. And I told FeedBurner point to this RSS file at CacheFly.

And why? Because what that does is this gives me this really nice address of, you know, where people can find an RSS feed. And I'll talk a little bit about why I like that in a second. The next thing I did is I contacted the iTunes directory, and I said here's my new show, it's at FeedBurner. So, you know, go there to get my feed.

Then the audience eventually hears about my show, they go to the podcast directory, which then bounces them to FeedBurner, bounces them to CacheFly. CacheFly then sends my audience the file directly. So that's the whole feedback loop of the RSS feed in kind of diagrammatic form. After this initial subscription, my audience now has a direct relationship with FeedBurner, as you probably know. The iTunes directory is out of the loop. They're just there to set up the initial thing. They don't continue that monitoring of the flow, you know, the programs don't ever go through Apple itself. They just pointed my audience to my feed.

The reason I like FeedBurner is that if, for whatever reason, and I can't imagine one, I would ever leave CacheFly and change to another service, this means my feed changes. And feeds are very, very fussy little irritating things. They get really upset if things change. And so you could blow a feed. You can have a feed break or weird things happen. And it happens all the time.

And then you've got everybody redownloading all of your shows again or weird things happen. It's really easy to make things go wrong on an RSS feed. So what I like about it is that if I-- should I ever change where I'm actually hosting my files themselves, as long as I tell FeedBurner that I'm at a new place, my audience both new and existing, are still pointing to the FeedBurner address, which stays consistent. So I have a less likely chance of things going south from there.

I think I'll-- that's kind of the main part. And then we'll go on to your bits.

Sounds great.
We're going to take questions at the end. We're kind of blowing through this now. In fact, I need to blow through this now to get to your slides.

Basically, I build my website in iWeb, it goes up on Media Temple, and then people get-- the website, the webpage is from Media Temple. They get the actual media file from my webpage from CacheFly. It's the same piece of data. Do you guys want to intro this while I flip through, what you're going to talk about?

Yeah.
My name is Ben Durbin. I work with Kenji at the Pixel Corps. And the Pixel Corps is a guild for people doing media development and post production work, whether that be visual effects or video podcasting. We're doing research and training in a bunch of different areas.

And one of the things that we do is podcasts. MacBreak is one of our podcasts. We have a couple in the stable, some audio podcasts that were mentioned, This Week in Media, MacBreak Weekly and some others along those lines.

Our new one, MacBreak Tech, that we're on.
Right, MacBreak Tech.

So one of the things that we're going to talk about is just build on Craig's presentation and talk a little bit about the things you'll need to consider once you're going to scale a given project. A lot of projects do start out single person, sort of a one-man band sort of thing.

And there's all sorts of considerations you'll need to keep in mind when you start to grow that to something larger than that. And so welcome this morning to Session 600 on file naming and organization. I'm going to follow up on Craig's rant. It's something that bears repeating. As developers, you're probably already sort of in this sort of structured mindset with how you organize your files and how your naming conventions are taken care of. But especially when you start to work with media folks who maybe aren't thinking in that way, and just in general, when you're trying to scale a project where you have multiple people coming in to work on it, you're going to need to keep these things in mind.

So one way to sell naming conventions and file organization to people who may be somewhat resistant to it is that it's sort of the precursor for growth. It becomes very hard to scale beyond just a few people. And I would argue that six months from now, you might forget exactly what you were doing naming something, whatever you were naming six months ago.

That's never happened to me.
Right. Take advantage of the fact that-- or bear in mind that your memory is bad and it's going to get worse. So taking care of these things early is going to save you in the long run.

Now, what some people end up doing, or one of the things that I've found is something that people punt on, is: I can't organize this stuff until I've got a clear idea of what everything is going to be, what all the assets are and what everything is-- where everything is going to live in this space at the given project. And what I would argue is that even if your names are somewhat arbitrary to begin with, as long as they're consistent, as developers, you know that you can easily transform that to whatever scheme you end up using in the long term.

So as long as you're encouraging the users to come up with something that is readable, I would argue against short codes and numeric-based systems. You can use those as you get to a larger system or if you've got a metadata system that is sort of reading those codes on behalf of the user and just presenting them with user friendly information, but start out with something that's just readable and consistent. Use full words.

Use a consistent delimiter to take care of fields in your file names. And know that file names are a sort of medium quality solution at best. They provide you a single view. You're sort of baking in an idea of what the fields are going to be and what their hierarchical relationship is and which view you want of that hierarchy at a given point. But know that that's going to fail you at some point and know that you're probably going to need to move to some sort of metadata solution.
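As a minimal sketch of the naming advice above, in Python: full words, one consistent delimiter, and a name that can be transformed to another scheme later. The field order, the underscore delimiter, and the sample file name are all assumptions for illustration, not a convention from the session.

```python
import os
from dataclasses import dataclass

DELIMITER = "_"  # assumed delimiter; any character works if it is used consistently
FIELDS = ("show", "episode", "asset", "version")  # hypothetical field order


@dataclass
class AssetName:
    show: str
    episode: str
    asset: str
    version: str


def parse_asset_name(filename):
    """Split a file name into named fields, rejecting names that break the convention."""
    stem, _extension = os.path.splitext(filename)
    parts = stem.split(DELIMITER)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields in {filename!r}, got {len(parts)}")
    return AssetName(*parts)


def to_scheme(name, order, delimiter):
    """Re-emit the same fields in a different order with a different delimiter."""
    return delimiter.join(getattr(name, field) for field in order)


parsed = parse_asset_name("macbreak_ep042_lowerthird_v03.mov")
renamed = to_scheme(parsed, ("episode", "show", "asset", "version"), "-")
```

Because every field is recoverable from the name, moving to a different scheme or loading the fields into a metadata system becomes a mechanical transformation.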

And consistency is the big thing now.
Consistency is a big thing.
Make sure you stay consistent.
Exactly.

So as you move on to something a little more robust, you can move to metadata formats that are either text description files or XML data that's read by some sort of user system. It could be something as simple as structured Spotlight comments. But you will definitely find at some point you're going to need to move on to something that is beyond just the file name. So as a general trend, just start simple, readable, consistent, knowing that you can transform that later and you can enhance that by adding metadata through some sort of metadata system.

So one other thing that you'll run into, and this is actually a boon to those of us who are trying to scale this to more than a couple of folks working on it, or working in a distributed fashion. We have folks working on our podcast that are all over the world. And one of the things that we're starting to work on now is leveraging the XML file formats, both in Motion and Final Cut, to start some automation.

Now, the last slide was really just a way of getting everyone on board with file naming, because that's one of the things that enables this sort of automation down the road. We do a lot of glue code in Python. It's one of our preferred languages. And the nice thing about Final Cut is that you are able to do programmatic transformations to the file.

There's an actual specification, an XML specification for Final Cut. And the way this works is that you're able to export an XML file from Final Cut. The new version has even added an enhancement where instead of having the user have to manually export the XML file, you can set an option where the XML file gets baked out at every save.

So when you're ready to go in with your scripts and make transformations to the file, you can do that without having user interaction there. A couple of things that you can do with that: you can validate projects. So if there are any conventions or structures within the file itself, the way that you're organizing your bins, the way footage should be named, even the lengths of different sections, the time between lower thirds if it's a lower third heavy or an information heavy podcast, you can validate those things programmatically rather than having someone have to open up the file and sort of do a manual, "Oh, you can't do this right here. You can't do this right here." That can push change requests back to the editor.
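A rough sketch of that kind of validation, using Python's standard XML parser. The sample document is a heavily simplified fragment modeled on a Final Cut XML export, and the naming rule (four lowercase fields joined by underscores) is a hypothetical convention. A real export carries far more structure than this.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for an exported project file (assumed structure).
SAMPLE_XML = """<xmeml version="4">
  <sequence>
    <name>macbreak_ep042_edit_v01</name>
    <media><video><track>
      <clipitem><name>macbreak_ep042_interview_v01</name></clipitem>
      <clipitem><name>Untitled Clip 3</name></clipitem>
    </track></video></media>
  </sequence>
</xmeml>"""

# Hypothetical rule: exactly four lowercase alphanumeric fields joined by underscores.
NAME_RULE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+){3}$")


def find_badly_named_clips(xml_text):
    """Return the names of timeline clips that do not follow NAME_RULE."""
    root = ET.fromstring(xml_text)
    bad_names = []
    for clip in root.iter("clipitem"):
        name = clip.findtext("name") or ""
        if not NAME_RULE.match(name):
            bad_names.append(name)
    return bad_names


problems = find_badly_named_clips(SAMPLE_XML)
```

The resulting list is the sort of thing that could be sent back to the editor as a change request.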

You can change the project state. We do a lot of offlining for our podcasts, especially some of these commercial ones that we're starting to work on, like Cocktails on the Fly.

And the reason offlining is a big thing is that it's just too heavy, even with fast bandwidth, to move large data sets around, especially when you're working with HD.

So we do a lot of offlining and sending out edits and so forth because a lot of our editors are not here in the Bay Area, where we're actually based. So we're sending out edits all around the world and basically having editors work on low res versions and sending it back. And being able to manage that workflow can be tricky sometimes. But as you maintain some good practices, it helps.

Right. So changing the project state: you get an edit back, an offline edit that someone has done, and you need to reconnect that to all of the large format media so you can do your final render and start the compression process locally. And you can do those sorts of transformations, changing sequence settings, reconnecting to media, programmatically using the XML interface.
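As a sketch of the reconnect step only: the idea is to rewrite each media path in the exported XML so it points at the full resolution files instead of the offline copies. The volume names and the sample document below are hypothetical, and a real relink would also need to update sequence settings, which this does not attempt.

```python
import xml.etree.ElementTree as ET

OFFLINE_ROOT = "file://localhost/Volumes/Offline/"  # hypothetical low res media location
ONLINE_ROOT = "file://localhost/Volumes/Masters/"   # hypothetical full res media location

# Simplified stand-in for an exported project file (assumed structure).
SAMPLE_XML = (
    '<xmeml version="4"><clip><file>'
    "<pathurl>file://localhost/Volumes/Offline/ep042/interview.mov</pathurl>"
    "</file></clip></xmeml>"
)


def reconnect_media(xml_text, old_root=OFFLINE_ROOT, new_root=ONLINE_ROOT):
    """Point every media path under old_root at the same relative path under new_root."""
    root = ET.fromstring(xml_text)
    changed = 0
    for pathurl in root.iter("pathurl"):
        if pathurl.text and pathurl.text.startswith(old_root):
            pathurl.text = new_root + pathurl.text[len(old_root):]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


online_xml, relinked = reconnect_media(SAMPLE_XML)
```

This only works if the offline and full resolution files share the same relative names, which is one more reason the naming conventions matter.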

And then assembly of per episode content. And we're not doing this quite as much as we used to with MacBreak, but in the beginning we were doing a fairly lower third heavy sort of format, where, beyond just introducing the guests, we would add a bunch of information and clickable links to the podcast. And that was done in a distributed fashion as well, where you would have people reviewing the edit, and they were sort of tooling around looking for good pages to link to or good data to enhance what was being discussed in the podcast.

And then they were making time code notes, so we would have a fairly simple database where you have a bunch of lower thirds with what the title is going to be, what the display URL is going to be, and then where people would actually go when they click on it.

You can add that programmatically using XML as well, so you could have a track where lower thirds that were being baked out of Motion are being added to the timeline at the time stamps that were designated by the folks who were creating them. So it's another third party that's working remotely on this, and then that data flows back into your project.
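A small sketch of the time code side of that workflow: turning reviewers' notes into frame positions on a timeline. It assumes non-drop-frame time code at a whole number frame rate (29.97 drop-frame is not handled), a five second on-screen duration, and made-up note records.

```python
LOWER_THIRD_SECONDS = 5  # assumed on-screen duration for each lower third

# Hypothetical reviewer notes, in the order they were entered.
NOTES = [
    {"timecode": "00:03:10:15", "title": "Guest site", "url": "https://example.com/guest"},
    {"timecode": "00:01:15:00", "title": "Show notes", "url": "https://example.com/notes"},
]


def timecode_to_frames(timecode, fps=30):
    """Convert non-drop-frame HH:MM:SS:FF time code to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field out of range in {timecode!r}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def schedule_lower_thirds(notes, fps=30):
    """Sort notes by time and attach start and end frames for timeline placement."""
    schedule = []
    for note in sorted(notes, key=lambda n: timecode_to_frames(n["timecode"], fps)):
        start = timecode_to_frames(note["timecode"], fps)
        schedule.append({
            "title": note["title"],
            "url": note["url"],
            "start": start,
            "end": start + LOWER_THIRD_SECONDS * fps,
        })
    return schedule


schedule = schedule_lower_thirds(NOTES)
```

Each entry in the schedule could then be written into the project XML as a clip on a dedicated lower thirds track.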

And in general, these examples just show how you can start to work in a distributed fashion, which is becoming increasingly important. Not everyone is local to your project. And this is one of the technologies that enables that. Do you have any data on this photo? So what we're going to do for the rest of the presentation is talk about some workflow considerations and technical considerations that you'll run into. This being an intro, we're going to sort of just tell you about all of the issues. These are categories of things that you will run into when you start to work with video, whether you're a consumer of someone else's video or you're creating it yourself.

And then this is sort of going to build for you a foundation that you can take with you to the other more detailed sessions that dive into all of these topics in depth.

And this is truly just a primer we're going to go through. We're going to go through this fast, just because we want to get to questions you guys might have. But there's a lot of stuff up here. We could spend days on all the information up here. And we have. We had a conference not too long ago called the Gear Media Tech Conference where we spent basically three days doing nothing but what you see up here pretty much and talking about podcast production. So there's a lot of stuff involved here. And we're going to really just glance over this really quick.

So moving on, the things that we're going to talk about, we're talking about the acquisition formats, what's coming out of the cameras, what you should expect to be handed to you at some time for inclusion in a project. Frame rates, what are the common frame rates that you'll end up dealing with.

Dealing with pulldown, where 24p material, 24 frames per second material, is embedded in a stream that is not 24 frames a second, and having to get that data out. We'll talk about that. Resolutions, some common resolutions that you'll deal with, and the aspect ratios. And then the peculiar monster, pixel aspect ratio, which is one of the things you'll also run into.

A little discussion on color space, the different color spaces and how video space deals, or excuse me, is differentiated from the traditional RGB color space that many of us are used to. And then a little bit about codecs. Codecs you'll expect to be using or finding useful when you go to compress your content, whether that be intermediate stuff like we were talking about for the offline edits or final renders.

And then delivery formats, the sort of packages, the containers like QuickTime that you'll expect to run into.

And a real quick word on this. If you don't think this stuff is important or you think, oh, well, I'm just worried about the final delivery format, things like that, even if you're only in a small portion of this, you have to worry about all of these steps, in essence. As maybe a web developer, if I'm putting my webpage up and the video looks crappy, why does it look crappy up there when it looked great at the acquisition? Some of these things come into play.

So if you understand the basis of where all of this is coming together, how it fits together, you can understand why your video that you're putting up on the web might look totally bad compared to the original source material that you acquired early on using maybe a really high-end camera.

Because every one of these things affects the end.
Yeah.
Every one of them.
So let's talk a little bit about acquisition. Kenji, do you want to talk a little bit?
Yeah. So up here, basically, is just a quick rundown of different acquisition formats. So you have DV NTSC, you've got DV PAL, you've got DV to OfflineRT. This is just a sampling of some of the ones you're going to find inside Final Cut. It's a small sampling, but these are some of the ones that you're going to encounter more often than others.

And what we mean by acquisition formats is what is the camera acquiring in the first place? Is it a DV camera, standard, you know, I go down over to the Sony store, for example, and buy a standard DV camera. Or is it an HD camera? Well, if it's an HD camera, what format is it recording into? For example, HD has several formats. There's DVC Pro HD, there's HDV, there's these different codecs that you have to worry about.

Up until just recently, if you were just doing DV video, standard 720 by 480 or 640 by 480 video, depending on how you want to look at it, you pretty much had to worry about one format, and that was DV, either NTSC or PAL, whether it was the North American or European standard, basically. Beyond that, you were basically dealing in the pro realm. Now that's changed, especially with the consumer level HD cameras. And even if you get it in HDV format, HDV is not a true format in the sense that everyone plays the same game with that format.

Every camera manufacturer has implemented HDV differently. So you can't just apply one rule set to HDV for every camera format. So this is where it starts getting really tricky. Now, one of the advantages to the new version of Final Cut that just came out is it starts to handle these formats a lot easier than the old versions did.

In fact, easier than any other application did. Up until just recently, it's been a major headache. I literally would spend a week at a time, if we had three cameras on the shoot, conforming the formats, getting the footage into a standard format that I could use for editing purposes.

One thing about HDV that we came across on a shoot that I did: I brought my Sony camera one day for a shoot, and I left, and then they took my tape and they couldn't play it back on the--
Yeah, a different camera.
HDV, same, quote, "format." But it's--
Yeah, it gets interesting.

So beyond the acquisition formats, you want to move on into basically frame rates. Now, what I mean by frame rates is how you're recording it. Most of you are probably familiar with the frame rates that are up here. But, again, there can be some differences here. And some of you might not understand the differences between progressive and interlaced and the way systems are capturing it.

I'm here to tell you right now that unless you're using a very special camera, everything you're capturing on any consumer level or even prosumer level camera is actually interlaced footage. Even if it says it's progressive scan, you have to play some tricks to get it to an actual true progressive frame format.

So even though we say 23.98p up there or we say 24p, these are probably going to be interlaced formats inside the camera as you're acquiring it. And basically it's just a series of different frame rates you're going to see out there. A lot of video professionals are really excited about 24p because of different reasons. One of the biggest reasons for delivery is that actually the nice thing is that it makes a smaller file in the end.

So it might be a little more headache up front to deal with 24p or 24p acquisition, but in the end, you have a smaller data file and it actually works very nicely on Apple TV, in a way. 24p footage is going to work a lot better because you can get more data into the Apple TV than a true 30p or a 60i program, basically.

One other reason you'll find folks excited about 24p is that they're doing narrative or dramatic content. Getting the film look has been one of those holy grails that everyone has wanted for video, and there are various tricks and successes and failures along those lines. But getting that 24 frame look, the kinetic look of it anyway, is something that people are after. And you'll notice that even some of the prosumer cameras, I guess you'd call them, like the new Canon HV20 that just came out, are actually using a progressive CMOS chip.

And it will do a 24 frame mode. Now, like Kenji said, what's coming out of your camera, since it's HDV, is still an interlaced-- it's an interlaced format, 60 fields a second, and we'll get to this in just a second, but there's a method by which you extract that 24 frame material. But you can get, even now you're starting to get progressive material from some of the lower end cameras. Until recently, it's been the domain of the higher end, the Sony CineAltas and these more-money-than-we-have cameras.

So pulldown, which Ben was just alluding to. So getting from 30 to 24. Now, even if the camera says it's acquiring at 24p or some weird 24, it's not. Again, it's acquiring in something else. So this is just a quick rundown, a little geek bit talking about this.

And the reason I want to bring this up is that a lot of these new cameras that people are shooting with, HV20, which is probably one of the most popular new entry level HD cameras, it's a Canon camera, it's a great camera. You can actually hook up. There's adapter kits to hook 35 millimeter lenses to it. You can get all these great attachments for it.

On HV20? That's cool.
But the problem is going from the 24p into the final mode. So I want to walk through this. And it's basically-- it has changed a little bit because the new version of Final Cut handles this better. But 24p source, and I just want to walk you through this real quick, if we look at it as a series of frames, A, B, C and D, that information actually gets recorded in 60i. So that actually gets recorded into field information here.

And this is how it actually gets broken up in a standard, what's called 2:3 pulldown, when you break that up. So you actually have the A frame getting interlaced, in essence, twice together. The B frame gets interlaced twice together as a whole frame. Then it gets combined with the B and C frame. The C frame gets a whole frame. And then the D frame gets interlaced again on the disk, interlaced again.
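For reference, the textbook 2:3 cadence can be sketched in a few lines of Python. This is the standard pattern as commonly described, not a reconstruction of the slide: each film frame is held for two fields, then three, alternating, and consecutive fields are paired into interlaced video frames. Field order (upper versus lower) is ignored here.

```python
def pulldown_2_3(frames):
    """Spread 24p frames across interlaced fields using the repeating 2, 3 cadence.

    Returns one two-character string per video frame, naming the source
    frame of each of its two fields.
    """
    cadence = (2, 3)
    fields = []
    for index, frame in enumerate(frames):
        fields.extend([frame] * cadence[index % len(cadence)])
    # Pair consecutive fields into interlaced video frames.
    return [fields[i] + fields[i + 1] for i in range(0, len(fields) - 1, 2)]


video_frames = pulldown_2_3("ABCD")  # four film frames become five video frames
```

Under this cadence, four film frames become the five video frames AA, BB, BC, CD, DD, so 24 film frames fill 30 interlaced frames (60 fields) each second.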

So when you reverse that, you're, again, pulling frames out. And this is what the reverse telecine, or reverse pulldown, means: you're recombining frames to get a 24p source out. So the top is the original 24 frame acquisition. The intermediate is how it gets recorded to tape, and then how it gets pulled back down into the 24p. Now, a lot of you are probably familiar with the DVX if you've done any video. It was really popular. It was the first consumer level digital video camera that did 24p. It used a different pattern, which is called the 2:3:3:2 pulldown. And I'm just going to run through this really quick just so we can get to some questions.
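A sketch of the 2:3:3:2 cadence and its removal, again using the textbook pattern rather than anything specific to one camera. Frames are labeled with unique letters so a mixed frame can be spotted by comparing its two fields. Real footage has no such labels, so real tools rely on cadence flags or detection instead.

```python
def pulldown_2_3_3_2(frames):
    """Spread 24p frames across fields using the repeating 2, 3, 3, 2 cadence.

    Returns (first_field, second_field) pairs, one per interlaced video frame.
    """
    cadence = (2, 3, 3, 2)
    fields = []
    for index, frame in enumerate(frames):
        fields.extend([frame] * cadence[index % len(cadence)])
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def remove_pulldown(video_frames):
    """Recover the 24p sequence by dropping frames whose fields come from different sources."""
    return [first for first, second in video_frames if first == second]


recorded = pulldown_2_3_3_2("ABCD")
recovered = remove_pulldown(recorded)
```

With this cadence only one video frame in five mixes two source frames, so removal is a matter of discarding that frame. With plain 2:3 pulldown the C frame exists only as halves of two mixed frames, so its fields have to be recombined instead.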

But this is basically, again, how that gets broken apart back into 24p. So you can start to see this is getting kind of complex. You're like A B B B C. This is stuff that really you don't have to worry about as much anymore because of the new version of Final Cut. There are some things that are taken care of for you a little bit more, where in the past you had to jump through all these hoops.

So it's a little easier.

What happens to those Bs and Cs?
They just get left out. All to make it compatible with the current video technology so that video engineers didn't have to reengineer all of their equipment and change the standards and things like that. Now, the Sony V1U is an interesting example. It's a higher end camera. It's prosumer. It's about $4,500. Actually, you can get it probably for less than that now.

It's a great camera. It has one of the best looks we've found on a consumer level camera hands down. It's got some great features in it. But again, it works a little differently. When we first started using it doing editing, we were getting all these weird patterns and no one on the internet could figure out how to use it. You shot in 24p, but no one knew how to get it out into a delivery format. And it takes some tricks.

When you look at it as a pure QuickTime movie, so if I went into Final Cut and I was to capture it, this is actually how it would look as a QuickTime movie is you would see three whole frames. And actually, it's not quite-- you would see three whole frames, I should say. And you would see two interlaced frames in the way it gets broken up.

And you have to play some tricks going through and cutting it up. Now, the new version of Final Cut, actually, I haven't had a chance to test it yet, but supposedly it fully supports the V1U now so you don't have to worry about this if you're using the new version of Final Cut. But if you have someone who captured footage using a V1U before the new version of Final Cut, this doesn't quite work. You'll get these weird frame rates, you'll get these weird interlacing problems. Just be aware that a lot of these new HD cameras, you'll see these problems.

Okay? So resolution. Really quick, there are different resolutions most of you are probably familiar with. 640 by 480 and 720 by 480. The 720 by 480 is really what most digital cameras are acquiring. Your television set is 640 by 480. And actually, it's not even that. It's actually half that when you really get into how it all works and how the signal is coming down to your TV set.

That's the great thing about things like Apple TV and new digital TV formats. They're using these true HD formats. Apple TV is using the 1280 by 720 format. And then on the high end, you have 1920 by 1080. So when you hear 1080i or 1080p, it's actually a 1920 by 1080 image.

Now, the i or the p is the big thing, whether it's interlaced or progressive. So if you're looking at a nice high end display, you want to get the best picture quality. You want the best overall delivery. You want a 1080p screen. But those are still relatively expensive. And really, most people are going to be looking at the 720p format.

In fact, most of you who get HD cable or get HD over satellite, you're actually only getting 720p usually. There are very few people actually doing 1080i. And there's no one really doing 1080p broadcast right now.

It's worth mentioning that even if your output resolution right now is not one of those higher end HD formats, the demand for that, of course, is growing.

So be thinking in terms of future proofing your content by rendering out to the full HD spec, if you can, if you're capturing at a resolution that will hold up to that. And knowing that six months down the line, a year down the line, you'll probably be starting to put that content down. MacBreak is a good example. We put out a full 1920 by 1080 version of the podcast. And folks with a larger monitor and a CPU to handle H.264 at that resolution love it.

And that's a good point. We master all of our footage now at full 1080 quality. So this frame actually gives you an idea. The full image here is that full 1920 by 1080 pixels. The blue outline box is at 1280 by 720. That would be a 720p image.

The yellow box is showing you what standard DV cameras would acquire, a 720 by 480 image. And then the smaller one is the good old standard NTSC 640 by 480 image there, to give you an idea. And most of you have seen pictures like this. But the thing is that even if we're delivering down to a 320 by 240 small web image, we're still mastering in this full image quality to maintain the maximum resolution.

One of the things that happens when you take large images and you scale them down is you get that nice image quality still. You have more data to work with there. It might take longer to process your footage or it might take longer to crunch through it, but your end result is going to be much better than if you start out with an image that's 640 by 480 and you're going down to 320 by 240. So whenever possible, you want to get your footage in a maximum quality possible.
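To put rough numbers on that, here is a minimal sketch comparing pixel counts for the frame sizes mentioned in this section. The dictionary labels are informal names chosen for this example.

```python
# Frame sizes quoted in this section, as (width, height).
RESOLUTIONS = {
    "1080 HD": (1920, 1080),
    "720p HD": (1280, 720),
    "DV NTSC": (720, 480),
    "NTSC square": (640, 480),
    "small web": (320, 240),
}


def pixel_count(name):
    """Total pixels in one frame of the named size."""
    width, height = RESOLUTIONS[name]
    return width * height


def source_pixels_per_output_pixel(source, target):
    """How many source pixels are available to average into each output pixel."""
    return pixel_count(source) / pixel_count(target)


from_hd = source_pixels_per_output_pixel("1080 HD", "small web")
from_sd = source_pixels_per_output_pixel("NTSC square", "small web")
```

Mastering at 1920 by 1080 gives the scaler 27 source pixels for every pixel of a 320 by 240 web frame, against 4 when starting from 640 by 480.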

So pixel aspect ratio, really quick. Square versus non square. This is really a difference between acquisition and final delivery. Most people are not going to worry about this on their computer because you're just dealing with square pixels. So a one-to-one ratio. And that image would represent the square pixel ratio. Then you get all these non square ratios.

So you get DV, which is a .9 pixel aspect ratio. So a lot of you might be familiar, in the old days when you put graphics up on screen, circles would look like ovals, things like that. It's because of your pixel aspect ratio.

But then there's all these other ones. There's a 1.2 pixel aspect ratio, which is a wide screen DV. This is when you take a standard consumer level camera, you shoot wide screen, and it's still a standard NTSC signal, but they're playing this pixel trick to make the image wide screen basically into your standard DV camera.

On the higher end, on the 720p, where HDV uses a 1.33 pixel aspect ratio. And this is, again, because of the way they're using it. Now, when these camera manufacturers say they're shooting true HD, there is no one shooting true HD except on the high end. They're all shooting some smaller resolution, and they up res it to a 1920 by 1080 resolution.

So, for example, HDV actually shoots at 1440 by 1080. And DVC Pro HD shoots at 1280 by 1080. So it's a 1.5 pixel aspect ratio. Now, cameras like the HV20 actually has-- is one of the first cameras that has a true 1920 by 1080 sensor, meaning that the actual acquisition chip is a true HD chip. But again, because this is getting recorded into the camera, it's getting crunched down, it's getting reprocessed, things like that.
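The arithmetic behind those numbers is just stored width times pixel aspect ratio. The sketch below uses the rounded ratios quoted in the talk (0.9, 1.2, 1.33, 1.5) expressed as exact fractions. The format labels are informal names for this example.

```python
from fractions import Fraction

# (stored width, stored height, pixel aspect ratio) using the talk's nominal ratios.
FORMATS = {
    "DV NTSC": (720, 480, Fraction(9, 10)),
    "DV NTSC widescreen": (720, 480, Fraction(6, 5)),
    "HDV 1080": (1440, 1080, Fraction(4, 3)),
    "DVCPRO HD 1080": (1280, 1080, Fraction(3, 2)),
}


def square_pixel_size(name):
    """Frame size after stretching the stored pixels out to square pixels."""
    width, height, pixel_aspect = FORMATS[name]
    return round(width * pixel_aspect), height


hdv_display = square_pixel_size("HDV 1080")
dvcpro_display = square_pixel_size("DVCPRO HD 1080")
```

Both HD formats stretch out to 1920 by 1080 on display, which is why the stored width can be smaller than the advertised resolution. With the rounded 0.9 ratio, DV comes out at 648 by 480, close to the familiar 640 by 480.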

One of the first cameras that really is going to be usable at a less than quarter million dollar price, and that's coming out soon, is the RED camera, which many of you might have heard about, coming from the founder of Oakley. That camera could revolutionize things. And its resolution is four times the highest HD right now.

It's 4096 by 2048. You could fit four 1920 by 1080 HD resolution images in the same area as a red camera's acquisition.
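A quick check of that comparison, using the figures exactly as quoted in the talk. The helper names are made up for this example.

```python
RED_QUOTED = (4096, 2048)  # resolution as quoted in the talk
FULL_HD = (1920, 1080)


def area_ratio(larger, smaller):
    """Ratio of total pixel counts between two frame sizes."""
    return (larger[0] * larger[1]) / (smaller[0] * smaller[1])


def whole_tiles(outer, inner):
    """How many unscaled copies of inner fit side by side inside outer."""
    return (outer[0] // inner[0]) * (outer[1] // inner[1])


ratio = area_ratio(RED_QUOTED, FULL_HD)
tiles = whole_tiles(RED_QUOTED, FULL_HD)
```

By pixel count the quoted frame is a little over four times full HD. Because 2048 lines is less than twice 1080, only two unscaled 1920 by 1080 frames tile inside it, so "four images" is best read as a statement about area.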

Is it square pixel?
And it's square pixel. So it will change the game. If you haven't heard about red, you're going to start hearing a lot about it. They're getting ready to ship those cameras. And it will probably revolutionize the video industry single handedly.

Color space. Yes, color space is interesting.

One other thing to talk about on the last slide, too, is not so much to scare you about this bunch of different pixel aspect ratios, and your stuff is going to be stretched or squeezed without you asking for it. One thing to keep in mind is that you may end up with source footage that's that way and knowing that you're probably, unless you're going back out for broadcast, you're going to be going out as square pixels.

So one thing to keep in mind is that even if you have one of these strange sort of non square resolutions that you're working with coming in, you just choose the appropriate square pixel resolution coming out, whether that be the 720p or the 1080p, that sort of thing.

So color space. There's always a need for compression, as we know. And one of the ways that they've done this with video signals in the past is to split up the signal into its luminance, its light and dark values, and then the colors separately. So RGB, we're used to the sort of additive color model where you've got three channels, you add a certain amount of each and you get your resulting color. Going back to days of black and white television, that was just a single luminance channel, light and dark.

The two others that you'll run into, and these acronyms are just there for demonstration purposes: when you split the color out, you're able to do something where you're scaling the color channels to reduce the bandwidth of the signal. And the reason that you're able to do this is that we are more sensitive to changes in light than we are in color. We have more rods than cones. So the trick that they used is to split-- there's a way to transform an RGB signal into the luminance and two color channels. And the two color channels are differences between the luminance and the red and the blue, if memory serves.
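One common form of that transform can be sketched as follows. It uses the standard definition (ITU-R BT.601) luma weights with components in the 0 to 1 range. HD video uses different weights, and real pipelines add offsets and quantization that are left out here.

```python
def rgb_to_ycbcr(red, green, blue):
    """Convert RGB (each 0.0 to 1.0) to luma plus two color difference channels.

    Uses BT.601 weights. Cb and Cr come out in the range -0.5 to 0.5.
    """
    luma = 0.299 * red + 0.587 * green + 0.114 * blue
    blue_difference = (blue - luma) / 1.772  # scaled so pure blue gives 0.5
    red_difference = (red - luma) / 1.402    # scaled so pure red gives 0.5
    return luma, blue_difference, red_difference


white = rgb_to_ycbcr(1.0, 1.0, 1.0)
pure_red = rgb_to_ycbcr(1.0, 0.0, 0.0)
```

White and grays carry all of their information in the luma channel, with both difference channels at zero, which is why the color channels can be stored at lower resolution without touching the black and white picture.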

But in any case, you'll end up with these different color spaces for your video signals. So when you're getting to seeing things like, I'm going to skip one here, we'll get back to that in just a sec, so you'll see these sorts of designations when you're talking about the color depth of a given camera.

You'll see 4:4:4, 4:2:2, 4:1:1, 4:2:0. What those represent is that scaling of those color channels to reduce the bandwidth for a given video stream. So 4:4:4, there's some historical stuff behind why 4 is 100 percent. But for our purposes, 4 is unscaled. It's 100 percent resolution. And the first of those in that triad is your luminance. And then the second one is your first color channel. The third one is your second color channel.

So the high end codecs and high end cameras, like the CineAltas, can capture to codecs that have no degradation of the color channels. And this is especially important if you're doing things like color difference keying for visual effects. Since that is relying on the color information to pull your key, you can get edge artifacting with these lower color depths.

That's why DV has historically been hard to key, for example. So there are custom codecs from some of these high end video card manufacturers that will capture the 4:4:4:4 color data. Now you're going to end up needing RAIDs that can capture at speed. CineAlta, for example, captures at 190 megabytes per second.

So you need some hardware to take care of these. But know that there are codecs out there, capture codecs that will preserve all of your color information.
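To give a sense of scale, here is the storage arithmetic for the 190 megabytes per second figure quoted above, using decimal units (1 GB = 1000 MB). The function names are made up for this example.

```python
def storage_gb(rate_mb_per_second, duration_seconds):
    """Storage consumed, in decimal gigabytes, at a constant data rate."""
    return rate_mb_per_second * duration_seconds / 1000


def minutes_per_terabyte(rate_mb_per_second):
    """Minutes of footage that fit in one decimal terabyte at a constant data rate."""
    return 1_000_000 / rate_mb_per_second / 60


per_minute = storage_gb(190, 60)
per_hour = storage_gb(190, 3600)
runtime = minutes_per_terabyte(190)
```

At that rate one minute of footage is about 11.4 GB, an hour is 684 GB, and a terabyte holds just under an hour and a half.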

And you can capture to a codec like that. But if your camera doesn't acquire the video at that color subsampling, then it doesn't matter. So there's only really a couple cameras out there that are actually going to capture 4:4:4 video.

There's the F950, which is a quarter million dollar set-up. The new RED that's coming out. A couple cameras along those lines are going to do 4:4:4. Most cameras in the high end are only doing 4:2:2. And pretty much everything below that is doing 4:2:0 or 4:1:1.

So the 4:2:2 designation means that the color channels are being scaled, I believe it's horizontally, so you are losing some color data there.

Now, perceptually, if you're not doing-- if it's not a green screen shoot and you're not doing a key or doing some sort of alpha channel generation off of that, you will most likely not see the difference. If it's just out in the field shooting, there's nothing wrong with 4:2:2. In fact, the 4:2:0, which is just a sort of funky designation to say that they scale, instead of just scaling on one direction, they're scaling on both directions, X and Y on the color channels.

Even HDV footage, you probably will not notice a big difference because of the color scaling. It's just a way of crunching down the signal to get it to tape. And then lastly, 4:1:1, just because it's something that's been used in the past. If you're using standard definition and DV footage, you're going to be on 4:1:1. So the color channels are actually scaled to one quarter of the resolution of the luminance image.
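The effect of each scheme on the amount of data can be sketched by counting samples per frame. The divisors below follow the description above: 4:2:2 halves the color channels horizontally, 4:2:0 halves them in both directions, and 4:1:1 quarters them horizontally. The frame size in the example is standard DV.

```python
# (horizontal divisor, vertical divisor) applied to each of the two color channels.
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def chroma_plane_size(width, height, scheme):
    """Resolution of each color channel for a given luma resolution."""
    horizontal, vertical = SUBSAMPLING[scheme]
    return width // horizontal, height // vertical


def samples_per_frame(width, height, scheme):
    """Total samples in one frame: one luma plane plus two color planes."""
    chroma_width, chroma_height = chroma_plane_size(width, height, scheme)
    return width * height + 2 * chroma_width * chroma_height


full = samples_per_frame(720, 480, "4:4:4")
dv = samples_per_frame(720, 480, "4:1:1")
```

4:1:1 and 4:2:0 both carry half the samples of 4:4:4, arranged differently, while 4:2:2 carries two thirds. The luma plane is untouched in every case.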

Just really quickly, codecs, your delivery format, the codec magic.
So there are a couple different ways to think about the codecs. You want to think about the codecs that are going to be used when you're capturing the footage or when it's delivered to you. Photo jpeg is one that's worth mentioning because you can use it for offlining.

There's a spec called OfflineRT, which is pretty much a Photo JPEG compressed video stream that is at a small resolution. We use 384 by 216, I believe. And this is what allows us to get a couple hundred megabyte files that we can throw to our editors, have them do a remote edit and send us back the Final Cut project file, and then we reconnect to something, whether it be the CineAlta or HDV or something like that. We connect to the large footage that's way too big to try and move out to everybody, and possibly even, with the CineAlta, too big to edit on.

It just won't play back well.

And Final Cut has this built in as an offline format. It's actually what the basis of their offline format is.
Right. And then DV and HDV, which we've covered. Now, AIC is interesting, and it's Apple Intermediate Codec. And what that is, is it's an all full-frame codec where it's like Photo JPEG, where each frame is compressed, but there's no interframe compression like you'd find with MPEG-2, MPEG-4, H.264.

And what that allows for is on lower end machines, you can get better playback than you would if it's trying to do the HDV stream where it's got to take the change information and compute that on the fly when you're scrubbing around. So it's worth mentioning AIC is another one that's in there.

DVCPRO HD is a higher end acquisition format. The electronic news gathering cameras are shooting this. And it's something that you can use. In our project, we use it as the codec to conform all of the other codecs to. This is before the sort of open timeline stuff in the new version of Final Cut.

Yeah, yeah.
And then the new and exciting one, of course, Apple ProRes 422. And what's interesting about this is it's a low bandwidth codec that has very high visual quality. You can use it as something to push all of your footage to, to use as an intermediate codec and get everything conformed.

And it's also light enough to be edited on a laptop. So you can get very high visual quality. The thing about the photo jpeg and the offline RT, I'm not sure if anyone's seen footage like this before, but it is very highly degraded.

And it looks crunchy.
Yeah, very crunchy. It's hard even to see numbers on the smart slate or something like that, if you're trying to do syncing.

So the nice thing about the ProRes is for a very low, relatively low bandwidth rate, you're getting very high quality footage that can be edited on a laptop with a FireWire 800 drive or something like that.

Yeah, and we'll jump on to some questions. Real quick, I won't talk too long on it, is the output format. H.264, which must be our-- we're familiar with QuickTime, MPEG-4, MPEG-2, and many of you might not know, or maybe you do, that Flash video actually uses the On2 codec. So it's their own codec, or they bought the codec. It's a nice codec, but we actually don't prefer it as much as H.264, personally.
H.264, for size versus quality, is still sort of king of the hill in our opinion.
And real quick, what I want to mention on this before we jump to some questions is that delivery format is different than codec. Your codec is the way it's encoded. The container, which actually holds that codec, is your delivery format. It's a .mov, it's an .mp4, it's a DVD, it's a .swf file. So don't confuse your codec with the delivery format. There's a container format, and then the encoding within that container. The worst of these, the one that drives us all crazy, is AVI, because there is no standard for it. You have no idea what's inside this container.

It is just a pain in the butt. So we personally hate AVI most of the time.

QuickTime is also an architecture, so it will also play back the other formats.
Yeah.
But adds to the confusion a little bit.
So we're going to go ahead and jump past this. We want to see if you guys have some questions in the last couple minutes here.