Networking and Security • 34:30
This session presents the Mac OS X APIs available to easily access the Internet from within your application. Various internet protocols, including ftp, http, and https are discussed.
Speakers: Richard Murphy, Becky Willrich
Unlisted on Apple Developer site
Transcript
This transcript was generated using Whisper, it has known transcription errors. We are working on an improved version.
Hello. My name is Rich Murphy, and I'm the manager of the platform security group. This is session 311, Web Enabling Your Application. During this session we're going to be talking about URL access, giving an overview of the functionality in the library, discussing the APIs, talking about URL access on Mac OS X, and also talking about the NSURL APIs and Cocoa Web Programming. During this session, you're going to learn what you can do with URL access, basically how it applies to your problems, how you can incorporate URL access into your Carbon applications, the limitations of URL access, and future plans for the APIs.
What is URL access? URL access is an API that allows you to easily access data on the Internet by the use of URLs. supports http, https, ftp, and file type URLs, has downloading and uploading capabilities, and it's a Carbon API that's available on both Mac OS 9 and Mac OS X.
This slide indicates how we see URL access actually being used. For the most part, if you're building a Carbon application, you need to use URL access. URL access is available on both platforms and will provide all the basic URL functionality or URL download and upload functionality that you need.
Richard Murphy, Becky Willrich It's based on the Mac OS itself, which provides all the networking functionality underneath, actual socket access or file access and that sort of thing. Richard Murphy, Becky Willrich On the Cocoa side, we have the CF APIs. Core Foundation, or CF, provides an additional URL accessing type APIs. Richard Murphy, Becky Willrich And they're also based on the Mac OS X kernel and the networking functionality. capabilities of that.
Here's some standard URL access features. We have an authentication dialog that pops up whenever you need to actually authenticate or log in to a remote server, whether you're doing ftp type access or whether you're going to an authenticated http or https site. There's a progress dialog built into the library that will come up and show you how much time is left.
It includes file decoding capability. It links into doing dbin hexing and also working with Apple single decoding. Additionally, in Mac OS 9, it currently will, expand .sit files using the Aladdin stuff and expander capabilities. It's scriptable using Apple Script, has proxy and firewall support, And it also supports the keychain for storing your authentication information.
For http, we have download capability. We have the capability of doing post and put, although it requires you to do a little bit of additional work with the APIs. It auto redirects. You have full access to the server response by taking a look at properties. It will also do proxy authentication.
For ftp, URL access will also provide upload of files and directories, or download of files and directories, and it gives you the capability of getting directory listings from the ftp server. For https, we also include SSL version 3.0 capability with 128-bit encryption. There are some functional gotchas when you're using URL access. Basically, it doesn't bother to parse the response of HTML.
This can be a problem when you're doing redirection, if you have a file not found on the remote side, or when you have login failures and that sort of thing. There are proxy limitations for ftp uploading, when you're trying to send a file up to an ftp server, or when you're doing ftp directory downloading.
There's a couple third party applications that are actually using URL access, both in 9 and 10. iCab is currently using it in order to download https sites, and Adobe uses it in Acrobat to implement the web capture facility. There's also a fair number of companies that are using the scripting capabilities to help manipulate files, get them up to servers, and back down from servers as well.
So the next thing we're going to do is go over the APIs. There are synchronous APIs. They include a system event procedure that allows you to trap system events and react to those. There's various flags that you can specify to URL access, which will help modify its functionality.
There's properties that can be set and that sort of thing and interrogated in order to perform various operations like http puts and posts or to get back information from an http server. There's the asynchronous API, which allows you to start URL access off doing a transfer. And in those cases, you want to be checking for events. Then there's various states that URL access goes through that you can interrogate to see how it's working.
The first of the synchronous APIs is URL simple download. This is really the easiest way to download a URL. If you're writing an application that basically just needs to grab something off of a server, this is really the function to use. All you have to do is provide a source URL, a destination file spec or handle, and then flags indicating how you want URL access to actually behave. If you're using this in an application that needs to be interactive, you need to use this in a thread because it is a synchronous API. It will block until the transfer actually occurs.
Here's an example of using URL simple download. Notice here we've created a little function called getURL, which takes a source URL as an argument and a file spec in order to download the actual file to a file on the Mac OS file system. We set some URL open flags for passing into the actual function call, telling it to replace an existing file if it's there, display the progress as you're doing the download, and expand the file as it comes down.
We call urlsimpledownload, passing in the URL, passing in the destination file spec, passing in nil to the memory argument, because we're not going to download it to memory, passing in the flags, and in this particular case, we aren't providing an event proc, and so therefore we also don't need to provide a context. It makes the call, and then it simply returns, along with its status. It will have downloaded all the data into the file and that sort of thing, or return an error status.
System event procedure can be used as a parameter to the synchronous calls. This will allow any of the dialogues that are displayed to become movable modal. Any update events will be passed through to the procedure that you've passed in. If you don't have one of these, the dialogs will become non-movable modal.
The flags for URL access control optional aspects of the transfer. As we saw a little bit earlier, it had the replace existing flag or the display progress flag and that sort of thing. These flags can be used in both the synchronous and the asynchronous calls. and various calls will accept different flags. Check the documentation online for that.
Here are some sample flags that we have. The Replace Existing flag, as I said earlier, will replace a file on the download or the upload if it's already there. Displaying Progress. will happen if you pass in the KURL display progress flag that will put up a little dialogue.
DisplayAuth will actually invoke the URL access authentication window to get your login and password. Expand File will invoke Aladdin Expander, Stuff Expander. Bin Hex on the way up will actually bin hex the file. and then there's the no auto redirect flag. In other words, come back with an error if you're being told to redirect.
There's also, on the synchronous side, there's a little bit more complex call if you want to, called URL download. If you call URL access through URL download, it takes a few more steps in order to actually invoke the call, but it allows you a lot more creativity or a lot more flexibility in what you can actually get out of the calls.
You have to create a URL reference using URL new reference and URL dispose reference. Once again, this should be used in a thread since it's a synchronous call, but once you have that reference, you can set various properties and do additional manipulation of how URL access will actually be used.
So as I said before, if you use URL download, you can get and set properties before and after the download. You can set things like the user password if you want to or things like that. And you can also, after the call is made, interrogate the URL reference in order to find out what the status was coming back from the server a little bit better or in a little bit more detail.
If you've actually threaded it, you can monitor the state of the download, or you can abort the download as it's progressing if it's taking too long. Here's an example of using URL download. In this particular case, we've made another function, getURL, same as before, taking a URL and passing in the file spec destination. You'll notice the call to URL new reference, which actually allocates a URL ref, which can later then be used in URL download. You'll notice the call structure doesn't take a URL as the first thing. It takes a URL ref. and has the arguments shown here.
After the call to URL download, we do a URL dispose reference. On the synchronous upload side, we have URL simple upload and then also URL upload. And they pair up the same way the URL download calls pair up. URL simple upload is easy to use. URL simple upload is very easy to use.
URL upload is a little bit more difficult, gives you a little bit more flexibility. In this particular case, you give the destination URL of where you want to go, and a source file spec is provided. One of the limitations of this is you can't upload from memory. Different flags apply to the upload side. And right now it's ftp only.
For properties, you can get the URL host or the URL actually passed in. or you can get the resources it points to, meaning on the way back you can check to see what the size was of the resource that was downloaded or the resource name on the way down, which might have been specified by use of the URL itself or you could have specified it through the file spec. Also the MIME type that it ended up mapping to.
In the download, you can get the status string, checking to see how things went, or you can get the total number of items. For instance, if you've downloaded your directory, this would tell you how many files actually got brought down in the download. For http specific properties, you can get the request method, the response header body, or a redirected URL.
Properties are received using URL get property. You specify the property that you're asking for and give it a variable that you expect to have it come back in. In order to set properties, you call URL set property. Any of these properties may change as a result of the transfer itself. It may be that the username and password might change as a result of actually redoing the authentication from the auth dialog coming up, that sort of thing.
Or it may be that other properties are set as a result of the transfer from the http server returning certain information. These are all the properties that you can set before making the call. Once you have the URL reference, you can set any one of these properties in the URL ref.
Here's an example of using the properties in order to do an http post operation. Notice before we said that you could do a put or a post, but it requires a little bit more work. In this particular case, we're going to end up doing a URL download in order to actually post data up to the server.
We make the new reference, we fill out the request method property with a post request, We also set the property of the http request body with the actual data we want to send up to the server. Then we just call urldownload, which should have the full complement of arguments. Sorry about that. After the call we dispose the reference.
The asynchronous APIs need a URL reference. In order to do an asynchronous call, you need to create a URL reference. The one call that you can make for this is URL open. URL open will either do an upload or a download, depending on what you've requested. You retrieve the data with URL get buffer and URL release buffer. You monitor the state of the transaction using URL get current state.
Various reasons why you might want to use the asynchronous APIs. If you're basically just filtering something, you want to send it on through. That's a good reason. If you want to display your own progress rather than having URL access provide the progress, or if you don't want to include multiple threads in your application.
Here's an example of an asynchronous download using URL Open. So in this particular case, we created a URL reference. We also set up a notify proc. with the notifyupp equals new URL notifyupp with the function mynotifier, which we'll just show you in the next slide. The URL open call includes the URL ref, the flags, the notify proc pointer, and then the events mask that we want to take a look at and also a pointer to the user context.
In this particular case, we're just setting up a little done flag. We have all of that in a do while loop here and checking to see if we're done or not. and in order to give URL access a little bit of time to work, it calls URL idle.
Here's the notification routine that we'd actually registered, my notifier. You notice that it's switching based on the events coming in for a data available event, in which case we just process the data, or in the particular cases of a completed event or an error occurred event, we simply set done to true and we break out of the routine.
Here's the process data call that gets called from the notify proc. Gets the current state, gets the current buffer, does something with the buffer, and then releases it. Events are always reported to the notification routine. You can register for a subset or you can actually handle all events coming in. If you need all the events, use K URL All Buffer Events Mask.
Otherwise, you actually OR together the appropriate flags that you want to look at. In this particular case, the completed event mask and the URL error occurred event mask. Here's a sample of all the events that you can possibly be waiting for. Resource found, property changed, and so on.
Here's a case of an asynchronous download where we're actually going to poll. So we set up to do the download. Call URL Open, and then we go into a Do While loop. Basically what we do is we just check the state by calling URL Get Current State, saving that into Current State, and then check it against the various states that would be completed, the URL Completed State and the Error Occurred event.
The URL get current state function itself reports the state of the transfer. Most states have some sort of an event counterpart. You can't find out about property change events by using states. So you can't tell about, for instance, the http properties changing or something like that. There is no way to actually signify that in a state. Checking states is useful in both the asynchronous and the synchronous APIs, so that if you've used one of the synchronous calls and gone off into another thread, you can get the current state of a transfer.
Here are some sample states that are available. The connecting state, download state, data available state, error occurred, and completed state. So in review, there is a synchronous API using URL simple download, URL download, URL simple upload, and URL upload. And then the asynchronous API where you'd use URL open. various flags and properties and events as we've seen help you manipulate how the transfer actually works.
Some more programming gotchas. URL access currently is highly threaded. You have to call URL idle in order to give URL access a chance to work. The set property function sometimes expects C strings and sometimes Pascal strings. One of the things we're working on in URL Access is updating some of the documentation.
As we've started to use it, we've had a transition period over the last couple of months. We have some new people working on it. We're finding that we need to actually get a little bit more documentation. So expect that to be updated really soon, I hope, as far as what properties are expected as C or Pascal strings. You also can't call URL Dispose Reference from a notification routine.
If you're using Mac OS less than 9.04, the request body can't contain null bytes. The URL reference itself isn't reusable. And if you're using URL open, the authentication dialog isn't currently supported. For Mac OS 9 versus Mac OS X, there was some capability of having plug-ins for URL access in Mac OS 9. We no longer support that for Mac OS X. We've fixed some of the ways some of the properties work in Mac OS X so they work a little bit more correctly. And currently the file protocol is not available on Mac OS X.
If you want to develop applications on Mac OS 9, here's the URLs. If you want to work with CarbonLib 1.02 or later, there's a URL for that. I think these are all available at the end on one master URL at the very end, so you'll be able to get all those. If you want to develop on Mac OS X, install Mac OS X. And currently, link it with the system library framework's Carbon framework. We're all part of Carbon. And the latest documentation is at the URL provided here.
For future plans for Mac OS X, we're working currently on a new URL access that will be a little bit more targeted directly towards Mac OS X. We've had some problems where some things aren't working quite properly using some of the old ways of working with 9. And this is giving us some performance problems and some stability problems. So that's really been kind of an area of focus over the last month or two months, actually. There will be a lot less threads in the library. It'll hopefully be a lot more stable, it appears to be so far. And it'll include streaming upload and download capability.
will actually be supporting connection reuse in 10, so that if you have multiple instances of going to an HTTP server, they'll actually be able to pick up the same connection. Secure Transport will start to use the security framework root certificates. This has been a problem for people with their own self-signed certs and that sort of stuff, trying to access secure servers. We've had problems where people need to update for new root certs that are out in the environment.
So in summary, URL access allows you to access internet data by URLs. It is a Carbon API. It has fairly easy to use synchronous APIs and pretty flexible asynchronous APIs. So, now next, in order to talk about web access from Cocoa, Becky Willrich will be out here to talk about the core frameworks.
Thank you. Thanks very much, Richard. So I want to talk a little bit about how you can do the same kinds of things as URL access provides to Carbon from the Cocoa world. And I'm going to start by talking about CFURL. CFURL is just a simple CF type from Core Foundation that's used to represent URLs. I need to correct one thing that Richard said.
Core Foundation is not part of the Cocoa APIs. It's part of the Carbon APIs, and it's at a low level in the system where it can be leveraged from both. So CFURL is not a Cocoa-specific type. Now for those of you that are working in Cocoa, you'll find that CFURL is bridged to NSURL. So a CFURL is an NSURL, and an NSURL is a CFURL.
What CFURL does for you is it parses the URL string and gives you access to all the different component pieces. It also handles your internationalization issues, things like the percent encodings. Right, things like the percent encodings in particular. And when you're working with the CFURL APIs, you should be aware that unless the encoding is explicitly specified in an argument, we're assuming UTF-8. However, CFURL is a pure data class. In other words, it does nothing but represent the string. There's no support in there for doing the actual network I/O necessary to get the data.
So just one more note on using CFURL. You'll find that a lot of the core foundation APIs, and increasingly the Carbon APIs as well, are taking CFURLs to represent files. This is trickier than it sounds because in part of the encoding issues involved with high-bit characters. So whenever you create a file URL, we recommend that you use the functions that are named with the word file, and I've given some of those up there.
And then for further compatibility with the Carbon world, there are conversion routines which will take a CFURL and convert it to the matching FSRef. One catch there, of course, a CFURL can represent files that don't exist. FSRefs can't, so in that case the conversion will fail and will return an error.
Okay, so from that I'm going to go on to talk about Cocoa. These are the topics we're going to cover. We're going to start out by showing how you can use URL access as Richard described it from Cocoa. Then we're going to look at the specifics of using NSURL. From there we're going to look at NSURL Handle, which is the class that actually does the download work inside of Cocoa. And then finally we're going to talk about how you can extend NSURL Handle to do your own downloads to support different schemes and so on.
So the first thing is you can in fact use URL access from Cocoa, more or less as you would from a Carbon application. There are two things that you have to be very much aware of though. The first one is that you cannot use any of the UI switches in URL access. That's because URL access brings up Carbon panels, and Carbon panels and Cocoa panels don't coexist very well.
The second thing you have to be aware of is that URL access uses thread manager threads. So when Richard talked about URL access being highly threaded, those were all created via the thread manager APIs. Well, the thread manager uses cooperative threads, whereas people programming Cocoa are more used to pthreads, which are not cooperative. When you use URL access, you have to be prepared to work in the cooperative threading world of the thread manager.
So what does that mean? The first thing it means is that you must call URL access from a thread manager thread. URL access is not prepared to be called from a preemptive thread. The good news is the main thread counts. The bad news is if you call it from another cooperative thread, the main thread has to be playing nice. That means the main thread must be prepared to yield to the other cooperative threads.
The other thing that means is if you are using the asynchronous APIs, URL open, you must remember to call URL idle for the same reasons that Richard described. If you don't call URL idle, you won't be giving time to the other cooperative threads. The easiest way to do this from Cocoa is on the main thread, post a notification to yourself with the posting style post when idle. When you receive the notification, you know that there are no events in the queue. So take the opportunity to call URL idle and give up time to the other threads.
Okay, so now we're going to do the same thing that Richard did with the URL Access APIs. We're going to talk about the synchronous ways you use NSURL and then the asynchronous ones. Synchronous one is very simple. You just send the URL the message "Resource data using cache." Pass a flag yes or no depending on whether you want NSURL itself to hold that data for future reference.
The asynchronous APIs are a little more complex. You send the message, load resource data, notifying client, and then you pass whatever object you wish to use as the client, using cache, yes or no. What that does is it starts an asynchronous download in the background, and as events come in on that download, your client is notified with the methods listed below.
Those methods are provided in an informal protocol, so you implement only the ones you're interested in and the others get ignored. You can see the events there. You get told when resource data comes in, when the download finishes, if the download was cancelled, and if the download fails.
[Transcript missing]
So, one of the side effects, well, one of the reasons for this is so that NSURL handle itself can provide a more complex API, and when you need to do more sophisticated URL work, you work directly with the NSURL handle. You can obtain a handle by passing the message URL handle using cache, yes or no.
Once you've got the handle, you can manipulate it for more direct control. You can also use it to flush the cache, so if you find, once you've used the data as much times as you need to, you can then flush the cache in empty memory. And for the full details, I'm going to point you towards NSURL handle.h.
So here's the good, the bad, and the ugly. NSURL currently, well, NSURL handle currently supports only http and file. Unlike URL access, you're not going to find proxy or firewall support there, nor are you going to find https. However, the good news is we intend to do all of those, and the architecture itself is designed to be extensible so that you can easily add new subclasses to NSURL handle and thereby provide new functionality.
So how do I do that? Well, what you do is you simply subclass NSURL handle. Once you have written the subclass, you register that subclass with NSURL, and once it's registered, NSURL will automatically pick it up, and as it's messaged for its data, it will start using your subclass.
So, start by creating a subclass, implement all the methods inside of nsurl_handle.h that are marked to be overridden, and then call nsurl_handle register_url_handle class to register your subclass. Here's a list of the methods to override. I'm not going to go through them in great detail right now, but it gives you an idea of the scope of the problem. There may be ten methods up there.
From your subclass, you should message the superclass occasionally to inform it, basically, of the state of the download. The two methods you need to use are the ones here. Background load did fail with reason, so when your engine detects a failure, you need to message your superclass to pass that information on. And did_load_bytes_load_completed, which you use to tell the superclass that new bytes have arrived and whether or not the download is now finished.
That's all I have for you today about NSURL and Cocoa. I'm going to ask Richard back up here on the stage. and Thomas as well. Tom Weier is our DTS contact, excuse me, Worldwide Developer Relations contact. So if you have questions about either URL access or the Cocoa side, the NSURL and NSURL handle, he's the guy to contact.