Networking and Security • 34:56
URL Access provides high-level APIs to allow applications to easily take advantage of various Internet protocols, including HTTP and FTP, as well as support file verify. Learn how to add URL Access to your product.
Speaker: Sari Harrison
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Hi, good morning. My name is Sari, and my topic for today is a set of APIs called URL Access. Now, URL Access has actually been a topic at WWDC for the past couple few years now, but this year we got our very own session, which is pretty exciting. We're going to have time to go through pretty much everything you've ever wanted to know about URL Access and then some.
Let's get started. Here's what I plan on covering today. We're going to go through what URL Access does and why you might want to use it. We'll spend some time, quite a bit of time, on going over the APIs in depth. We'll talk about what URL Access is like on 9 versus 10 versus your DP4 CDs that you got at the conference.
We'll talk a little bit about what we plan to do with URL Access in the future. We'll have a very quick demo, which you guys are going to need to pay attention to because we have very little UI in this product. So we'll have a quick demo, and then we should have some time left over to do Q&A.
My goals for this session, what I hope you'll learn today, is what you can do with URL Access, how to use it to get your Carbon applications going. And finally, since this API is a mature API now, it's been around for a few years, some developers have run into some limitations with it and some issues, and I want to make sure that any first-time users don't hit those as well.
So what is URL Access? If it's not totally obvious from the name, it's an API to allow you to access data on the Internet. And the way you do this is you give it a URL. We have support for the following schemes or URL prefixes: HTTP, HTTPS, FTP and file. It supports both downloading of data and uploading of data. And it's a Carbon API, so it's available on 9 and 10.
So this diagram gives an overview of the Apple security APIs, of which URL Access is one, partly because it supports SSL for HTTPS and also for organizational reasons. So as you can see, URL Access is part of the classic environment and part of the Carbon environment. And it's callable from legacy apps, Carbon apps, and it's also actually callable from Mac OS X apps, although the diagram does not make that clear.
The URL Access architecture: it's part of Carbon. As I said before, it implements HTTP, HTTPS, FTP, and file. And some of the toolboxes it uses are Open Transport, Internet Config, the Keychain, and SSL. We use Open Transport, of course, to do our networking. We use Internet Config to find out about proxies and firewalls that the user has configured. We use the Keychain to automatically access sites that require authentication. And we use SSL to do HTTPS.
So now you know what it does overall. What does it do specifically? This is a set of features that are available to you independent of what scheme your URL is. We have an authentication dialog, which will come up automatically if the site you're going to requires authentication, and that is an optional feature. We have a progress dialog if you'd like for the user to be able to view the progress of the download or upload and stop the download or upload. And that, once again, is optional.
We support encoding and decoding of files. We encode BinHex before you upload a file, if you like, and we also attempt to decode the file after a download. We support BinHex again, and we support using the StuffIt Engine if the StuffIt Engine is installed. It is a scriptable API, and we do have proxy and firewall support. Specifically, we support HTTP proxies, and we support the SOCKS firewall. And as I mentioned before, we support the Keychain for automatic authentication.
HTTP feature set. We support download, no upload yet. We support posting, but that does require a little additional work, and I'll go through a code sample of that. We support auto-redirection, so if the URL you're going to is relocated to another site, we will follow that redirection until it terminates in some actual data and just hand you back the data. That is an optional feature. You do have full access to the server response if you need it. And we do support proxy authentication, so some HTTP proxies require you to authenticate, and we will bring up a dialog for that as well.
With FTP, you can not only download files, but you can upload files. And you can not only download and upload files, but you can download and upload directories, entire directories as well. Or if you prefer, your URL points to a directory. You can also get a directory listing, but you're going to have to parse it yourself because we don't provide any parsing support. HTTPS, right now there's a little bit of a disparity between Mac OS X and Mac OS 9. On 10, we already support SSL 3 with 128-bit encryption. The current status of OS 9 is that we support SSL 2.0 and 40-bit encryption.
Now I want to apologize in advance for these next few slides, because the whole point of URL Access is that you don't need to know the technical details behind the protocols. However, that being said, let's talk about some functional limitations in URL Access. First of all, we don't parse HTML.
Well, of course not, but what does that mean to you? What that means is when a webmaster designs a website, they often have two options for implementing a given feature. They can do it in the HTTP or they can do it in the HTML. If they opt for the former, then we most likely support it.
If they opt to implement it in the HTML, there's nothing we can do. The best example of this is login. When you use your web browser and you get a dialog saying, you know, please log into this site, that's HTTP authentication. Now when you use your web browser and you get a form asking you for a username and password, that's HTML.
That's HTML authentication and we don't support that. Some other examples: for file not found, there could be a web page saying file not found, and they don't tell us in the HTTP that the file isn't there. And redirection can also be implemented in the HTML, but it's usually implemented in HTTP.
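To make the HTTP-versus-HTML distinction concrete, here is a minimal sketch in C of what a client that only reads the HTTP layer can tell from a response. The function and type names (`http_status_code`, `classify_response`, `ResponseKind`) are made up for this illustration and are not part of URL Access. The point is that a login form or a "file not found" page served with a 200 status looks like ordinary data at this level.

```c
#include <stdio.h>

/* What a client that reads only the HTTP layer can learn from a response. */
typedef enum {
    RESPONSE_OK,            /* 2xx: data follows */
    RESPONSE_REDIRECT,      /* 3xx class, which is where HTTP redirects live */
    RESPONSE_AUTH_REQUIRED, /* 401 or 407: HTTP (or proxy) authentication */
    RESPONSE_NOT_FOUND,     /* 404: "file not found" said in the HTTP itself */
    RESPONSE_OTHER          /* anything else, including malformed lines */
} ResponseKind;

/* Parse the status code out of a line such as "HTTP/1.1 404 Not Found".
   Returns -1 if the text does not look like an HTTP status line. */
static int http_status_code(const char *status_line)
{
    int major = 0, minor = 0, code = 0;

    if (status_line == NULL)
        return -1;
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

static ResponseKind classify_response(const char *status_line)
{
    int code = http_status_code(status_line);

    if (code == 401 || code == 407)
        return RESPONSE_AUTH_REQUIRED;
    if (code == 404)
        return RESPONSE_NOT_FOUND;
    if (code >= 300 && code < 400)
        return RESPONSE_REDIRECT;
    if (code >= 200 && code < 300)
        return RESPONSE_OK;
    return RESPONSE_OTHER;
}
```

A page that asks for a username and password in an HTML form typically arrives with a 2xx status, so `classify_response` reports `RESPONSE_OK` for it. Nothing at the HTTP level marks it as a login, which is why that case falls outside what a non-parsing client can handle.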
Now because of this fact that we don't parse HTML, there's some FTP limitations. If you go through a proxy, FTP is basically transformed into HTML. And because we don't parse the HTML, we can't upload files and we can't download or upload directories. And the last one has been hit by a number of people. URL Access is pretty popular as an option for getting HTTPS support because it's some fairly...
[Transcript missing]
So the first thing that happens when you make an SSL connection is the client asks the server for the server certificate. And the server obliges.
And then the client has to do due diligence to make sure that this certificate is from a server that they trust. How do they do this? They need to find out who issued the certificate, who signed the certificate. And that's the certificate in red, which we also call the root certificate.
URL Access goes through its list of known root certificates to see if that root is in its list. If it isn't, you don't get to connect. If it is, you do. And this set of root certs is a static set. The only way to update it is to get a new version of URL Access, which is unfortunate, but we hope to rectify that with the Secure Transport library, which some of you may have heard about in the security overview.
So now let's talk about what you can do with URL Access. I thought I'd give a few sample uses, and hopefully that will get you thinking about what you can do with it, too. Internally at Apple, we have several high-profile clients of URL Access. Sherlock uses it to do its Internet searching. Software Updater uses it to upgrade files over the Internet.
Help Viewer uses it to display Internet content. When the content isn't local, it uses URL Access to go get the content. Network Browser uses it to access FTP sites, which is a pretty cool feature if you haven't played with it. You go to Network Browser, enter the Connect To dialog, your FTP site, and you can browse through the FTP sites.
Some third-party uses, iCab. Some of you may be familiar with this web browser on the Macintosh. It uses URL Access only for HTTPS sites. Adobe uses it in Acrobat to implement a feature called Web Capture. This is a feature that allows them to take HTML files and transform them into PDF. And finally, numerous companies are using the scripting component internally to do various and sundry things.
So now you know what you're going to do with it, I hope, and let's talk about how to do it. Let's go through the API. We'll talk about the synchronous API, which is the easiest to use. We'll go through flags which control aspects of the transfer. We'll talk about properties, and then we'll move into the asynchronous API. It's a little more flexible, but consequently a little more difficult to use. And we'll talk about states.
The very easiest way to download a URL on any platform (OK, I don't know that for sure, but it's kind of fun to say) is URL Simple Download. It's a single call to download a URL. All you do is provide the source URL. You provide the destination FS spec if you want to download to a file, or you provide the destination handle if you'd like to download to memory.
And you also pass in any flags, you know, if you want it to do something in particular. Now, I'm going to make a request that you all, if you use this API, please use it in a thread. It's a synchronous API. It's going to take some time to complete, so please put it in a separate thread.
So let's see how URL Simple Download works. Here we have a routine, getURL, which I'm going to implement in several different ways throughout this presentation. It takes a source URL, it takes a destination FS spec, and in this case, we're going to use URL Simple Download to implement it.
As you can see, it's pretty simple. We declare some flags. We declare status. And we call URL Simple Download. We pass in the URL. We pass in the destination FS spec. We say we don't want to download to memory, which is the following nil, and we say, here's some flags saying what I want you to do with my download. When the URL simple download completes, the download is complete, and we're done.
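The slide itself is not in the transcript, so what follows is only a sketch of the getURL routine as described. The URL Access headers are not assumed to be available, so every type, constant value, and the `URLSimpleDownload` body below is a local stand-in written for this illustration. Only the parameter order (URL, destination file, destination handle, flags, system event procedure, user context) comes from the talk. The stub records its arguments instead of touching the network.

```c
#include <stddef.h>
#include <string.h>

/* ---- Local stand-ins (assumptions, not the real Carbon headers) ---- */
typedef long OSStatus;
typedef unsigned long URLOpenFlags;
typedef struct { char name[64]; } FSSpec;   /* placeholder for the File Manager type */
typedef char **Handle;

enum { noErr = 0, paramErr = -50 };

/* Placeholder bit values; only the idea of OR-able flags matters here. */
enum {
    kURLReplaceExistingFlag = 1 << 0,
    kURLExpandFileFlag      = 1 << 2
};

/* What the stub was last asked to do, kept so the example can be checked. */
static URLOpenFlags gLastFlags;
static char gLastURL[256];

/* Stub: validates and records its arguments. A real call would block
   until the transfer had finished. */
static OSStatus URLSimpleDownload(const char *url, FSSpec *destination,
                                  Handle destinationHandle, URLOpenFlags flags,
                                  void *eventProc, void *userContext)
{
    (void)eventProc;
    (void)userContext;
    if (url == NULL || (destination == NULL && destinationHandle == NULL))
        return paramErr;
    strncpy(gLastURL, url, sizeof gLastURL - 1);
    gLastURL[sizeof gLastURL - 1] = '\0';
    gLastFlags = flags;
    return noErr;
}

/* getURL as described: declare flags and status, then make one call. */
static OSStatus getURL(const char *sourceURL, FSSpec *destSpec)
{
    URLOpenFlags flags = kURLReplaceExistingFlag | kURLExpandFileFlag;
    OSStatus status;

    status = URLSimpleDownload(sourceURL, destSpec,
                               NULL,    /* not downloading to memory */
                               flags,
                               NULL,    /* no system event procedure */
                               NULL);   /* no user context */
    return status;
}
```

Because the call is synchronous, a caller would normally run `getURL` on a separate thread, as requested above.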
So I kind of breezed over the two last parameters to URL Simple Download, and those parameters involve the system event procedure. This is a parameter to all the synchronous calls. And if you're going to display a dialog, or you think a dialog is going to come up, you really need to implement one of these. It allows the dialogs to be displayed as movable modal. If you don't implement one, the dialogs will be just your regular modal dialogs.
And what the system event procedure does is basically receive update events so your app has time to update itself while we're moving our dialog around. The second synchronous API I want to talk about is slightly more complex. There's two more steps, but it's significantly more powerful, and it's called URL Download. The two extra steps are you have to create and dispose of a URL reference.
And once again, it should be used in a thread because it's going to take some time. Now, why would you use URL Download instead of URL Simple Download? Well, because you have this URL reference, you can make any of the other API calls to do various and sundry things. You can call get and set property before and after the download. Or if you're threaded from another thread, you could monitor the state of the download or even abort it.
So here is our get URL routine again, implemented in terms of the URL download call. As you can see, there's just two extra steps. Once again, we declare our flags, we declare our status, we declare URL reference now, called URL ref, and we create it by calling URL new reference.
Then we call URL Download. This time we pass in the URL Ref instead of the source URL directly. We pass in the destination FS spec. Don't download to memory. Pass in our flags. And when that call returns, the download is done and we are free to dispose our URL reference. So now we've talked about synchronous downloading. Let's talk just briefly about synchronous uploading.
There's two ways, once again, URL Simple Upload and URL Upload to upload synchronously, and they're very similar to their downloading counterparts, with the following exceptions. You provide a destination URL instead of a source URL, of course. You provide a source FS spec instead of a destination FS spec. In addition, you cannot upload from memory. You can only upload from a file. And, of course, different flags apply to uploading as opposed to downloading. Finally, in its current state, uploading is FTP only.
So we've talked about flags a lot. Let's get into some detail on those. They do control optional aspects of the transfer, and they are used in both synchronous and asynchronous calls. Here are some sample flags. Replace existing flag basically says if the destination file, be it an upload or a download, exists, then go ahead and replace it. Now if you don't set this flag, what we do is we create a unique name. We never return a destination exists error.
So that's something to keep in mind. The next flag, display progress flag, is how you indicate that you want a progress dialog. Display authentication flag is how you indicate you want an authentication dialog. Expand file flag says, after the file is downloaded, please do everything you can to decode it for me. BinHex file flag says before the file is uploaded, if the file has a resource fork and it's not of type text, please BinHex it for me.
And finally, the no auto-redirect flag. If the HTTP site does redirect you and you don't want that redirect followed, then you can set this flag and it won't be. Now let's talk about properties. Properties fall into one of four different categories. They are either aspects of the URL itself, for example, the host name or the entire URL. Or they are aspects of the resource that the URL points to. For example, the size of the file, the name of the file, or the MIME type of the data.
The third category is aspects of the transfer itself. A good example here is the URL status string, which changes as the download progresses. It goes from initiating to connecting to downloading to aborting, et cetera. URL Total Items is a property that's available if you're downloading or uploading an entire directory. It tells you how many items there are total. Those first two are both useful if you're displaying your own progress dialog.
HTTP redirected URL. If there is a redirect, you can find out about it in this property. The last category of properties is they're another way to control the transfer. If something is more complicated than just a yes or no question and we couldn't make it a flag, we made it a property instead. For example, the HTTP request method is something you can set. Or the HTTP user agent. This is really useful. A lot of websites are quirky in that they want you to be Mozilla compliant, so you can set your user agent to have a string Mozilla in it.
So properties can be retrieved with URL Get Property. They're set with URL Set Property. And they may change during the transfer. Two examples of this: one is the status string, which changes frequently. Another example is some properties may move from an unknown to a known state. In HTTP, if the HTTP server tells us the size of the file, the resource size property goes from unknown to a known state, and that is a property change.
Here's a list of all the properties you can set. This list is exhaustive. No other properties are settable. That first line should read URL HTTP request header, which is how you set the entire header of the HTTP request. Or you can just set the method or the body. You can set the user agent and the last two, username and password. This is how you would implement your own authentication dialog.
Display the dialog, have the user enter the username and password, call set property on both of those items, and then do the download. So now I want to show you how to do a post. Doing a post is a pretty popular thing in URL Access. A lot of people use it for posting.
In this case, we're going to use URL download once again, but we're going to set some properties first. We're going to create a new URL reference. Then we're going to set the HTTP request method property to be post, and the HTTP request body property to be our post data.
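The POST recipe (new reference, set the request method, set the request body, download, dispose) can be sketched as follows. As before, the real headers are not assumed: the types, the property key spellings, and all four function bodies are stand-ins written for this illustration, with parameter orders following the talk's description. The `postURL` helper name is also made up. The stubbed download only records what it would have sent.

```c
#include <stdlib.h>
#include <string.h>

/* ---- Local stand-ins (assumptions, not the real Carbon headers) ---- */
typedef long OSStatus;
typedef long Size;
enum { noErr = 0, paramErr = -50, memFullErr = -108 };

/* Stand-in reference: simply remembers the properties set on it. */
typedef struct DemoURLRecord {
    char url[256];
    char method[16];
    char body[256];
} DemoURLRecord, *URLReference;

/* Property keys; these spellings are assumptions. */
static const char kURLHTTPRequestMethod[] = "URLHTTPRequestMethod";
static const char kURLHTTPRequestBody[]   = "URLHTTPRequestBody";

/* What the stubbed download "sent", kept so the example can be checked. */
static char gSentMethod[16];
static char gSentBody[256];

static OSStatus URLNewReference(const char *url, URLReference *urlRef)
{
    DemoURLRecord *rec;

    if (url == NULL || urlRef == NULL || strlen(url) >= sizeof rec->url)
        return paramErr;
    rec = calloc(1, sizeof *rec);
    if (rec == NULL)
        return memFullErr;
    strcpy(rec->url, url);
    strcpy(rec->method, "GET");
    *urlRef = rec;
    return noErr;
}

static OSStatus URLSetProperty(URLReference urlRef, const char *property,
                               const void *value, Size size)
{
    char *dest;
    size_t capacity;

    if (urlRef == NULL || property == NULL || value == NULL || size < 0)
        return paramErr;
    if (strcmp(property, kURLHTTPRequestMethod) == 0) {
        dest = urlRef->method;
        capacity = sizeof urlRef->method;
    } else if (strcmp(property, kURLHTTPRequestBody) == 0) {
        dest = urlRef->body;
        capacity = sizeof urlRef->body;
    } else {
        return paramErr;            /* not settable in this stub */
    }
    if ((size_t)size >= capacity)
        return paramErr;
    memcpy(dest, value, (size_t)size);
    dest[size] = '\0';
    return noErr;
}

static OSStatus URLDownload(URLReference urlRef, void *destination,
                            void *destinationHandle, unsigned long flags,
                            void *eventProc, void *userContext)
{
    (void)destinationHandle; (void)flags; (void)eventProc; (void)userContext;
    if (urlRef == NULL || destination == NULL)
        return paramErr;
    strcpy(gSentMethod, urlRef->method);
    strcpy(gSentBody, urlRef->body);
    return noErr;                   /* a real call would block until done */
}

static OSStatus URLDisposeReference(URLReference urlRef)
{
    free(urlRef);
    return noErr;
}

/* The POST recipe: new reference, two properties, download, dispose. */
static OSStatus postURL(const char *url, const char *postData, void *destination)
{
    URLReference urlRef = NULL;
    OSStatus status;

    if (postData == NULL)
        return paramErr;
    status = URLNewReference(url, &urlRef);
    if (status != noErr)
        return status;

    status = URLSetProperty(urlRef, kURLHTTPRequestMethod, "POST", 4);
    if (status == noErr)
        status = URLSetProperty(urlRef, kURLHTTPRequestBody,
                                postData, (Size)strlen(postData));
    if (status == noErr)
        status = URLDownload(urlRef, destination, NULL, 0, NULL, NULL);

    URLDisposeReference(urlRef);    /* one reference per transfer */
    return status;
}
```

In this sketch the reply would land in whatever `destination` points to. The stub ignores its contents and only insists that a destination was supplied.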
And then we simply call URL Download. That does the post, and the answer to the post will be put into whatever your destination is. So that's it for the synchronous APIs, flags, and properties. Let's move on to the asynchronous APIs. Once again, to use the asynchronous APIs, you need a URL reference. But instead of calling URL Download or URL Upload, you're going to call URL Open.
URL Open is an asynchronous call. What it does is return immediately and starts the download or upload in the background. You can use it to download to a file, or you can choose to get the data yourself by calling get buffer and release buffer. Now, there's two ways to monitor an asynchronous download. You can either use a notification routine, which is the preferred method, or if you insist, you can poll with URL Get Current State.
Now, why would anyone want to use the asynchronous APIs? If you can't think of a reason, here's a couple. If you're a filter, there's a very good reason. For example, if you're a web crawler and all you do is parse the HTML that you receive for more URLs and then go and download those, you can consume the data before the data is completely downloaded.
This is a very good reason to use the asynchronous APIs. Another good reason is if you want to display your own progress, the async APIs are the only way to get enough information to do that. And finally, if you have some moral opposition to threading, then you should probably use the async APIs.
So this one's a little longer, as you can see. And please don't go back and copy this code, because it's not how we want you to do it. It does poll, but I want it to be consistent with my examples. And so we're going to implement getURL in terms of the URL Open call.
So the beginning is pretty much the same. We set some variables up. We create a new reference. The first highlighted line, notify UPP, creates a new URL notify UPP. That is a Carbon-only step. If you're implementing on 9 only, then you don't need to do this, but hopefully you all are going to carbonize your applications, so you need to create a notify UPP.
You pass in the notification routine that you want to use, and then you simply call URL Open. Pass in the URL reference. You say, "Nil, I don't want to download to a file." You pass in any flags you want to control the download. Your notify UPP. And then you pass in a mask for what events you want your notification routine to receive,
and any user context. In this case, I'm passing in a pointer to a Boolean, done. That's the part I don't want you to emulate. And then I'm going to sit and spin: while I'm not done, URL Idle. And URL Idle is a call that allows URL Access time to do its work. It's very important, especially for polling. And finally, once I'm done, I can dispose my UPP, I dispose my reference, and that's it.
So here's our notification routine. My notifier. The parameters passed in are your user context, of course, the event for which you were called, and the rest of the information is in the URL callback info struct. This is mainly so if we want to pass you in more information in the future, we can do so by modifying the struct. There is a version number in the structure.
So the first thing we're going to do in our notification routine is set up our user context, our done variable. And then we're going to switch on the event that we received. If we get a data available event, we need to go process some data. If we get a completed or error occurred event, that means we're done. So we set our done variable to true.
Process data. We're passing the URL ref so we can actually get the data. And then we're going to sit in a loop, getting data and releasing data. Now, URL get buffer says I'm borrowing a buffer from URL Access. You notice that there is no memory allocation in this routine. That's not a bug. It's not going to crash. We're basically getting the buffer from URL Access. We consume it in any way we so desire, and then we give it back.
Once again, we're going to call URL idle because we could be in this loop for a while. And then we get the current state of the download to see if there's still data available. When you get the data available event, you get it one time per set of data available. So you have to consume all the data before you're going to get another event. Let's talk a little bit more about events. They are, of course, reported to the notification routine.
You can register for a subset of events. You can say, I only want events that relate to data being downloaded. That's the all buffer events mask. Or you can just, for example, register for just the completed and error occurred event, if all you care about is whether or not the download is done.
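The notification routine, the borrow-and-release buffer loop, and the event mask can be sketched together. None of this is the slide code: the types, event and state values, and the bodies of `URLGetBuffer`, `URLReleaseBuffer`, `URLGetCurrentState`, and `URLIdle` are stand-ins invented for the illustration, and `runDemo` is a hypothetical driver that plays the role of URL Access by delivering only the events the mask asks for. The fake transfer lends out three fixed chunks so the loop has something to drain.

```c
#include <stddef.h>
#include <string.h>

/* ---- Local stand-ins (assumptions, not the real Carbon headers) ---- */
typedef long OSStatus;
typedef long Size;
enum { noErr = 0, paramErr = -50 };

/* Placeholder event numbers, masks, and states. */
typedef enum { kURLDataAvailableEvent = 1, kURLCompletedEvent,
               kURLErrorOccurredEvent } URLEvent;
enum {
    kURLDataAvailableEventMask = 1 << kURLDataAvailableEvent,
    kURLCompletedEventMask     = 1 << kURLCompletedEvent,
    kURLErrorOccurredEventMask = 1 << kURLErrorOccurredEvent
};
typedef enum { kURLDataAvailableState, kURLCompletedState } URLState;

/* A fake transfer that lends out three fixed chunks, one at a time. */
typedef struct DemoTransfer {
    const char *chunks[3];
    int next;       /* index of the next chunk to lend out */
    int lent;       /* nonzero while the caller holds a buffer */
} DemoTransfer, *URLReference;

typedef struct { unsigned long version; URLReference urlRef; } URLCallbackInfo;

static int gIdleCalls;
static OSStatus URLIdle(void) { gIdleCalls++; return noErr; }

static OSStatus URLGetCurrentState(URLReference ref, URLState *state)
{
    *state = (ref->next < 3) ? kURLDataAvailableState : kURLCompletedState;
    return noErr;
}

/* Lend the caller a buffer owned by the transfer; nothing is allocated. */
static OSStatus URLGetBuffer(URLReference ref, void **buffer, Size *size)
{
    if (ref->lent || ref->next >= 3)
        return paramErr;
    *buffer = (void *)ref->chunks[ref->next];
    *size = (Size)strlen(ref->chunks[ref->next]);
    ref->lent = 1;
    return noErr;
}

static OSStatus URLReleaseBuffer(URLReference ref, void *buffer)
{
    if (!ref->lent || buffer != (void *)ref->chunks[ref->next])
        return paramErr;
    ref->lent = 0;
    ref->next++;
    return noErr;
}

typedef struct { int done; size_t bytes; } MyContext;

/* Drain every buffer that is ready; the event arrives once per batch. */
static void processData(URLReference ref, MyContext *ctx)
{
    URLState state = kURLDataAvailableState;

    while (state == kURLDataAvailableState) {
        void *buffer = NULL;
        Size size = 0;

        if (URLGetBuffer(ref, &buffer, &size) != noErr)
            break;
        ctx->bytes += (size_t)size;     /* "consume" the data */
        URLReleaseBuffer(ref, buffer);  /* give the buffer back */
        URLIdle();
        URLGetCurrentState(ref, &state);
    }
}

static OSStatus myNotifier(void *userContext, URLEvent event, URLCallbackInfo *info)
{
    MyContext *ctx = (MyContext *)userContext;

    switch (event) {
    case kURLDataAvailableEvent:
        processData(info->urlRef, ctx);
        break;
    case kURLCompletedEvent:
    case kURLErrorOccurredEvent:
        ctx->done = 1;
        break;
    }
    return noErr;
}

/* Hypothetical driver standing in for URL Access: delivers only the
   events whose mask bit was registered. */
static MyContext runDemo(unsigned long mask)
{
    DemoTransfer transfer = { { "<html>", "<a href=\"next\">", "</html>" }, 0, 0 };
    URLCallbackInfo info = { 1, &transfer };
    MyContext ctx = { 0, 0 };

    if (mask & kURLDataAvailableEventMask)
        myNotifier(&ctx, kURLDataAvailableEvent, &info);
    if (mask & kURLCompletedEventMask)
        myNotifier(&ctx, kURLCompletedEvent, &info);
    return ctx;
}
```

Registering for only the completed event means the notifier never sees the data, which is fine when the transfer goes straight to a file and all the caller wants to know is whether it finished.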
Let's talk about a few of the events that we have. The resource found event means URL Access has gone out and looked for the resource and it actually has successfully found the resource. The resource is there. Everything is great. The property changed event happens anytime any property changes, either from an unknown state to a known state or from one value to another value.
The error occurred event reports non-recoverable errors. It does not report any interim errors that we've worked around. If you get an error occurred event, the download is done. The data available event, as I talked about earlier, you're going to get one per set of data that's available. Periodic event just gives you time to do whatever you want to do. I didn't mention earlier that the notification routine is called at a memory-safe time. You can do whatever you want, allocate memory, take a whole lot of time, draw to the screen, do anything you want, as long as you call URL Idle periodically.
Abort initiated event says, "Somebody hit the abort button," or "Somebody called URL abort." And we have this event mainly because aborts are non-instantaneous, so you may need to know that an abort is beginning to happen, and that will be followed quickly by the completed event. States. Let's talk about states anyway, because states are useful even if you don't want to poll.
URL Get Current State reports the state of the transfer. And some states have event counterparts. For example, there's a data available event and a data available state. There's an error occurred event and an error occurred state. But some of them don't. The one thing that you cannot find out with state that you have to have events for is property changes.
And they are useful with both the async and sync APIs. All you need is a URL reference, and you can find out the state of your download. Here's a list of sample states. It should be all pretty obvious. Connecting, we're downloading, there's data available, an error occurred, and we've finished.
So let's go through some issues that other developers have hit in the past when they're programming to URL Access. URL Access is a highly-threaded API. So you absolutely have to call URL Idle, and I hope I drilled that home in my examples. Set property expects C and Pascal strings in some cases. Now this is a bug, but it is a released bug, so we have to continue to support it. I do want to try and rectify this issue on Mac OS X, possibly with the addition of a new API.
So for example, when you set the user agent, it's got to be a C string. When you set the user name, it's got to be a P string. And this is documented in the latest rev of the documentation. Another issue people have had is that you can't call URL Dispose Reference from notification routine. You will crash, I promise.
A few more. The HTTP request body cannot contain null bytes. That's a direct result of the fact that we expect it to be a C string. This is fixed in post-9.0.4 releases, of which there are none, I realize, but there will be some day, I expect. URL reference is not reusable.
You can't set properties on it, call URL download, and then set some more properties, call URL download again. You have to create and dispose it, one download per. Finally, the authentication dialog and the progress dialog, for that matter, are not supported in the async APIs. So if you are using the async APIs, you have to do your own authentication dialog and your own progress dialog.
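Two of the issues above come down to string conventions, and a small self-contained sketch shows both. A Pascal string stores its length in the first byte and has no terminator, while a C string ends at the first NUL byte, which is why a body treated as a C string gets cut off at an embedded null. The helper names `c_to_pascal` and `bytes_seen_as_c_string` are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Build a Pascal string (length byte, then the characters, no terminator)
   from a C string. Returns 0 on success, -1 if the text exceeds 255 bytes. */
static int c_to_pascal(const char *c_str, unsigned char p_str[256])
{
    size_t len = strlen(c_str);

    if (len > 255)
        return -1;
    p_str[0] = (unsigned char)len;
    memcpy(p_str + 1, c_str, len);
    return 0;
}

/* Number of bytes an API would see if it treated `data` as a C string:
   everything up to, but not including, the first NUL byte. */
static size_t bytes_seen_as_c_string(const void *data, size_t actual_len)
{
    const void *nul = memchr(data, '\0', actual_len);

    if (nul == NULL)
        return actual_len;
    return (size_t)((const char *)nul - (const char *)data);
}
```

For a five-byte body of `a`, `b`, NUL, `c`, `d`, `bytes_seen_as_c_string` reports 2, so everything after the null would be silently dropped by a C-string interface.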
So that's it for the API. Let's just quickly go over what we've talked about. We talked about the synchronous API, URL simple download, URL simple upload, URL download, URL upload, four calls, very easy to use. We talked about the asynchronous API, URL open, URL get buffer, URL release buffer. We talked about flags that control aspects of the transfer and properties which do all sorts of things. We talked about events which are reported to your notification routine and states which are available to you any time you have a URL reference.
So Mac OS 9 and Mac OS X URL Access are different slightly, and here's how they differ. For those of you who have been to this talk in the past couple of years, we talked about the ability to extend the capabilities of URL Access using plug-ins. Now, this is not supported on Mac OS X.
We may add it back at some point, but it will not be in the first release. Scripting may not make the first release of Mac OS X. Actually, now that Steve has announced that summer is our beta and January is our GM, I think we'll be able to get scripting done.
Mac OS X, as I mentioned before, has 128-bit SSL 3. And the current Mac OS 9 has SSL 2, 40-bit. On Mac OS X, properties will work correctly, possibly with the addition of a new API. And this last one is just kind of an FYI. URL Access on Mac OS X is a carbonized version of the Mac OS 9 source base, which means that it will be bug-for-bug compatible. Hopefully that's good news.
Now, DP4 works as much as it needed to work for Sherlock, basically. We locked down quite a while ago, so things are much better on my machine. But HTTPS isn't working. The authentication and progress dialogs are not yet working. Proxy support is disabled because of some issues we had with Internet Config. Keychain support is disabled for the same reason. And we're not yet part of Carbon. It's a separate framework right now. The path to that is /System/Library/Frameworks/URLAccess.framework, so you have to link that in.
So hopefully you're all raring to go, you have ideas, you want to get started. If you want to develop on Mac OS 9, you need to download the security SDK, because we are a member of the security APIs. That's where the URL Access SDK is. Or of course you could just download CarbonLib.
If you want to develop on Mac OS X, URL Access is on DP4 with its limitations. And you need to link with /System/Library/Frameworks/URLAccess.framework. You are going to want to get the latest documentation. It has been revved recently. It's a big improvement. And that is up on the TechPubs website.
So what are we going to do in the future? We are going to support 128-bit SSL 3, and we're going to have new root certs for Mac OS 9. We're going to do some bug fixing, too. We're going to fix a memory leak that a number of you have told us about. Thank you very much. We knew about it.
But it's fixed now. We're going to support nulls in the HTTP request body. That has been requested by several developers. And this last one, support for SHA-1 with RSA signed certificates. I'm going to briefly get SSL technical again. VeriSign has recently changed the way that they sign their certificates. So as servers on the web update and get new certificates, these certificates are not going to be readable by URL Access. It's a very bad thing. We are going to fix that.
On 10, we have more grandiose plans. We do intend to implement HTTP upload. We want to implement connection reuse. What this means is that if you want to do a series of uploads or downloads, you don't have to bring up and tear down a connection every time because you're going to have a session.
We're going to move to the Secure Transport library, which will give us our dynamic root cert database. We'd like to improve our built-in encoding and decoding. Certainly, we want to support MacBinary and whatever else you all think is important. And we may re-enable plug-ins and/or add additional scheme support built in.
Okay, so this is where you all have to pay attention. It's a very quick demo. We can move over to the Demo 1 machine. Now what I have here is an application URL Access demo. What I did was I went to the Carbon Sample Apps folder and I basically used it without modification, except I added two menu items, FTP Download, HTTP Download.
Both of which simply call URL Simple Download with a hard-coded URL. Let's start with the HTTP download. So here's our progress bar. As you can see, the user can see the progress of the download. They could hit the stop button if they so desired. They have an indication of the time remaining less than a minute.
If they want more information, they can find out what they're copying, where they're copying it from, where they're copying it to. In this case, we're downloading to memory, so the to field is empty. And they get a running total of how many bytes have already been copied and how many bytes are total. And of course, we can abort it at any time.
FTP download: this is connecting to ftp.apple.com, as you can see from the initial line. You can go ahead and type in your username and type in a password. You could add it to the Keychain if you wanted to. Or if you know that the site that you are connecting to supports anonymous login, you could just go ahead and click the anonymous button. I did try this, actually, earlier this morning, and it did not work. I think ftp.apple.com is just really busy today. It's just refusing my connection, but I'm going to go ahead and give it a try anyway.
And it refused my connection. So what you would have seen is another progress bar. So we've already seen that. And that's it for the demo. Can we move back to the slides, please? So to summarize, URL Access is the API to use if you want to get data off the Internet. It's a Carbon API. It's available to you on 9 and 10. It's very easy to use: synchronous APIs, a single call to do a download or upload. And we also have flexible async APIs if you need them.
The one session I want to point you to is the Security Feedback Forum. If you don't have time to get your feedback to me today, then you can go to this. We are including URL Access in the Security Feedback Forum, and that's in room J1 today at 5 p.m. And if you want to talk to somebody about this after the show, you want to contact Tom Weyer, who's our Network and Communications Technology Manager in Apple Worldwide Developer Relations.