Graphics and Games • iOS • 27:37
ARKit 4 enables you to build the next generation of augmented reality apps to transform how people connect with the world around them. We’ll walk you through the latest improvements to Apple’s augmented reality platform, including how to use Location Anchors to connect virtual objects with a real-world longitude, latitude, and altitude. Discover how to harness the LiDAR Scanner on iPad Pro and obtain a depth map of your environment. And learn how to track faces in AR on more devices, including the iPad Air (3rd generation), iPad mini (5th generation), and all devices with the A12 Bionic chip or later that have a front-facing camera. To get the most out of this session, you should be familiar with how your apps can take advantage of LiDAR Scanner on iPad Pro. Watch “Advanced Scene Understanding in AR” for more information. Once you’ve learned how to leverage ARKit 4 in your iOS and iPadOS apps, explore realistic rendering improvements in “What’s New in RealityKit” and other ARKit features like People Occlusion and Motion Capture with “Introducing ARKit 3”.
Speakers: Quinton Petty, Praveen Gowda
Transcript
Hello and welcome to WWDC. Hi, my name's Quinton, and I'm an engineer on the ARKit team. Today, both Praveen and I get to show you some of the new features in ARKit with iOS 14. So let's jump right in and explore ARKit 4. This release adds many advancements to ARKit, which already powers the world's largest AR platform-- iOS. ARKit gives you the tools to create AR experiences that change the way your users see the world.
Some of these tools include device motion tracking, camera scene capture and advanced scene processing, which all help to simplify the task of building a realistic and immersive AR experience. Let's see what's next with ARKit. So first, we're gonna take a look at the Location Anchor API. Location anchors bring your AR experience onto the global scale by allowing you to position virtual content in relation to the globe. Then we'll see what the new LiDAR sensor brings to ARKit with Scene Geometry. Scene Geometry provides apps with a mesh of the surrounding environment that can be used for everything from occlusion to lighting.
Next, we'll look at the technology that enables Scene Geometry-- the Depth API. We're opening up this API to give apps access to a dense depth map to enable new possibilities using the LiDAR sensor. And additionally, the LiDAR sensor improves object placement. We'll go over some best practices to make sure your apps take full advantage of the newest object placement techniques. And we'll wrap up with some improvements to Face Tracking.
Let's start with location anchors. Before we get too far, let's look at how we got to this point. ARKit started on iOS with the best tracking. No QR codes, no external equipment needed-- just start an AR experience by placing content around you. Then we added multi-user experiences. Your AR content could then be shared with a friend using a separate device to make experiences social. And last year, we brought people into ARKit. AR experiences are now aware of the people in the scene. Motion Capture is possible with just a single iOS device, and People Occlusion makes AR content even more immersive, as people can walk right in front of a virtual object.
All these features combine to make some amazing experiences, but what's next? So now we're bringing AR into the outdoors with location anchors. Location anchors enable you to place AR content in relation to the globe. This means you can now place virtual objects and create AR experiences by specifying a latitude, longitude and altitude.
ARKit will take your geographic coordinates, as well as high-resolution map data from Apple Maps, to place your AR experiences at the specific world location. This whole process is called visual localization, and it will precisely locate your device in relation to the surrounding environment more accurately than could be done before with just GPS.
All this is possible due to advanced machine-learning techniques running right on your device. There's no processing in the cloud and no images sent back to Apple. ARKit also takes care of merging the local coordinate system with the geographic coordinate system, so you can work in one unified system regardless of how you want to create your AR experiences. To access these features, we've added a new configuration, ARGeoTrackingConfiguration, and ARGeoAnchors are what you'll use to place content, the same way as other ARKit anchors.
Let's see some location anchors in action. We've got a video here in front of the Ferry Building in San Francisco. You can see a large virtual sculpture that's actually the Companion sculpture created by KAWS and viewed in the Acute Art app. Since it was placed with location anchors, everyone who uses the app at the Ferry Building can enjoy the virtual art in the same place and the same way. Let's see what's under the hood in ARKit to make this all work.
So when using geo-tracking, we download all the detailed map data from Apple Maps around your current location. Part of this data is a localization map that contains feature points of the surrounding area that can be seen from the street. Then, with the localization map, your current location and images from your device, we can use advanced machine learning to visually localize and determine your device's position. All this is happening under the hood in ARKit to give you a precise, globally aware pose without worrying about any of this complexity.
The Location Anchor API can be broken down into three main parts. ARGeoTrackingConfiguration is the configuration that you'll use to take advantage of all the new location anchor features. This configuration contains a subset of the world-tracking features that are compatible with geo-tracking. Then, once you've started an AR session with a geo-tracking configuration, you'll be able to create ARGeoAnchors just like any other ARKit anchor.
And also, while using geo-tracking, there's a new tracking status that's important to monitor. This is contained in ARGeoTrackingStatus and provides valuable feedback to improve the geo-tracking experience. So building an app with location anchors can be broken down into a few steps. The first is checking availability of geo-tracking. ARGeoTrackingConfiguration has a few methods that let us check the preconditions to using the rest of the Location Anchor API.
Then, location anchors can be added once we know there's full geo-tracking support, and after anchors are added, we can use a rendering engine to place virtual content. We'll then need to take care of geo-tracking transitions. Once started, geo-tracking will move through a few states that may need some user intervention to ensure the best geo-tracking experience. Let's build a simple point-of-interest app to see what these steps look like in practice.
In our app, we're gonna start with helping our users find the iconic Ferry Building in San Francisco, California. As you can see, we've placed a sign to make the building easy to spot. To begin the app, let's first start with checking availability. As with many ARKit features, we need to make sure the current device is supported before attempting to start an experience. Location anchors are available on devices with an A12 Bionic chip and newer as well as GPS. ARGeoTrackingConfiguration's isSupported class method should be used to check for this support.
For geo-tracking, we also need to check if the current location is supported. We need to be in a location that has all the required Maps data to localize. The geo-tracking configuration has a method to check your current location's support as well as an arbitrary latitude and longitude. Additionally, once a geo-tracking session is started, ARKit will ask the user for permission for both camera and location. ARKit has always asked for camera permission, but location permission is new to geo-tracking. Let's see what this looks like in code.
ARGeoTrackingConfiguration has all the class methods that we need to check before starting our AR session. We'll first check if the current device is supported with isSupported. Then we'll check if our current location is available for geo-tracking with checkAvailability. If this check fails, we'll get an error with more info to display to the user-- for example, if the user hasn't given the app location permissions.
Then once we know our current device and location are supported, we can go ahead and start the session. Since we're using RealityKit, we'll need our ARView and then update the configuration. By default, ARView uses a world-tracking configuration, and so we need to pass in a geo-tracking configuration when running the session.
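Here's a minimal sketch of those checks in Swift; the function name startGeoTrackingSession and the fallback behavior are illustrative, and a RealityKit ARView is assumed.

```swift
import ARKit
import RealityKit

func startGeoTrackingSession(in arView: ARView) {
    // Check device support (A12 Bionic or newer, plus GPS) before anything else.
    guard ARGeoTrackingConfiguration.isSupported else {
        // Fall back to a non-geo AR experience on unsupported devices.
        return
    }

    // Check whether the current location has the Maps data needed to localize.
    ARGeoTrackingConfiguration.checkAvailability { available, error in
        guard available else {
            // Surface `error` to the user, e.g. missing location permission
            // or an unsupported area.
            return
        }
        // ARView defaults to world tracking, so run a geo-tracking configuration instead.
        let configuration = ARGeoTrackingConfiguration()
        arView.session.run(configuration)
    }
}
```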
The next step is adding a location anchor. To do this, we'll use the new ARAnchor subclass, ARGeoAnchor. Geo anchors are similar to existing ARKit anchors in many ways. However, because geo anchors operate on global coordinates, we can't create them with just transforms. We need to specify their geographic coordinates with latitude, longitude and altitude. The most common way to create geo anchors will be through specifying just latitude and longitude, which allows ARKit to fill in the altitude at the correct ground level based on Maps data. Let's now add a location anchor to our point-of-interest app.
So for our app, we need to start by finding the Ferry Building's location. One way we can get the latitude and longitude is through the Maps app. When we place a marker in the Maps app, we now get up to six digits of precision after the decimal. It's important to use six or more digits so that we get a precise location to place our content. Once we have a latitude and longitude, we can make a geo anchor. We don't need to specify an altitude because we'll let ARKit use Maps data to determine the elevation of the ground level.
Then, we'll add the geo anchor to our session. And since we're using RealityKit to render our virtual content and we've already created our geo anchor, we can go ahead and attach the anchor to an entity to mark the Ferry Building.
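Here's a rough sketch of that setup in Swift. The coordinates are approximate values for the Ferry Building, and arView and signEntity are assumed to exist already.

```swift
import ARKit
import RealityKit
import CoreLocation

// Approximate coordinates for the Ferry Building (illustrative values,
// with six digits of precision after the decimal).
let coordinate = CLLocationCoordinate2D(latitude: 37.795313, longitude: -122.393653)

// Omitting the altitude lets ARKit derive the ground level from Maps data.
let geoAnchor = ARGeoAnchor(coordinate: coordinate)
arView.session.add(anchor: geoAnchor)

// Bridge the ARKit anchor into RealityKit and attach the sign entity to it.
let geoAnchorEntity = AnchorEntity(anchor: geoAnchor)
geoAnchorEntity.addChild(signEntity)
arView.scene.addAnchor(geoAnchorEntity)
```

Let's run our app and see what it looks like.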
We'll start near the Ferry Building in San Francisco looking towards Market Street. And as we pan around, we can see some of the palm trees that line the city. And soon, the Ferry Building will come into view. Our sign looks to be on the ground, which is expected, but the text is rotated.
Since we'd like to find the Ferry Building easily from a distance, we'd really like to have the sign floating a few meters in the air and facing towards the city. So, how do we do this? To position this content, we need to first look at the coordinate system of a geo anchor.
Geo anchors are fixed to the cardinal directions. Their axes are set when you create the anchor, and this orientation will remain unchanged for the rest of the session. A geo anchor's x-axis is always pointed east, and the z-axis is always pointed south for any geographic coordinate. Since we're using a right-handed coordinate system, this leaves positive Y pointing up, away from the ground. Geo anchors, like all other ARKit anchors, are immutable. This means we'll need to use our rendering engine to rotate or translate our virtual objects from the geo anchor's origin. Let's clean up our sign that we placed in front of the Ferry Building.
Here's some RealityKit code to start updating our sign. After getting the signEntity and adding it to the geoAnchorEntity, we want to rotate the sign towards the city. To do this, we'll rotate it by a little less than 90 degrees clockwise, and we'll elevate the sign's position by 35 meters. Both of these operations are in relation to the geoAnchorEntity that we had previously created.
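As a sketch, those two adjustments might look something like this; the exact angle and height are simply the values chosen for this scene.

```swift
import RealityKit

// `signEntity` is assumed to already be a child of `geoAnchorEntity` (see above).
func orientSign(_ signEntity: Entity) {
    // Rotate a bit less than 90 degrees clockwise around the anchor's Y (up) axis
    // so the text faces the city.
    signEntity.orientation = simd_quatf(angle: -0.45 * .pi, axis: [0, 1, 0])

    // Raise the sign 35 meters above the geo anchor's origin.
    signEntity.position.y += 35
}
```

Let's see what this looks like in the app.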
Now when we pan around and we get to our Ferry Building, our sign is high in the air and we can see it from a distance. The text is much easier to read in this orientation. This looks great, but we're missing some crucial information here about the geo-tracking state that we can use to guide the user to the best geo-tracking experience.
When using a geo-tracking configuration, there's a new geo-tracking status object that's available on ARFrame and ARSessionObserver. ARGeoTrackingStatus encapsulates all the current state information of geo-tracking, similar to the world-tracking information that's available on ARCamera. Within GeoTrackingStatus is a state. This state indicates how far along geo-tracking is during localization. There's also a property that provides more information about the current localization state called GeoTrackingStateReason.
And there's an accuracy provided once geo-tracking localizes. Let's take a closer look at the GeoTrackingState. When an AR session begins, GeoTrackingState starts at Initializing. At this point, geo-tracking is waiting for world tracking to initialize. From Initializing, the tracking state can immediately go to Not Available if geo-tracking isn't supported in the current location. If you're using the checkAvailability class method on GeoTrackingConfiguration, you should rarely get into this state.
Once geo-tracking moves to Localizing, ARKit is receiving images as well as Maps data and is trying to compute a pose. However, during both the Initializing and Localizing states, issues may be detected that prevent localization. These issues are communicated through GeoTrackingStateReason. This reason should be used to inform the user how to help geo-tracking localize. For example, if the device is pointed too low, we'd inform the user to raise the device; or for geoDataNotLoaded, we'd inform the user that a network connection is required. For all possible reasons, have a look at ARGeoTrackingTypes.h.
In general, we want to encourage users to point their devices at buildings and other stationary structures that are visible from the street. Parking lots, open fields, and other environments that dynamically change have a lower chance of localizing. After addressing any GeoTrackingStateReasons, geo-tracking should become Localized. It's at this point that you should start your AR experience.
If you place objects before localization, the objects could jump to unintended locations. Additionally, once localized, ARGeoTrackingAccuracy is provided to help you gate what experiences should be enabled. It's also important to always monitor GeoTrackingState, as it's possible for geo-tracking to move back to Localizing or even Initializing, such as when tracking's lost or map data isn't available.
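Here's a sketch of what monitoring that state might look like through the ARSessionObserver callback; showHint and startExperienceIfNeeded are hypothetical helpers for coaching the user and kicking off the experience.

```swift
import ARKit

// ARSessionObserver callback reporting changes to the geo-tracking status.
func session(_ session: ARSession, didChange geoTrackingStatus: ARGeoTrackingStatus) {
    switch geoTrackingStatus.state {
    case .initializing, .localizing:
        // Use the state reason to coach the user toward localization.
        switch geoTrackingStatus.stateReason {
        case .devicePointedTooLow:
            showHint("Raise the device and point it at nearby buildings.")
        case .geoDataNotLoaded:
            showHint("A network connection is needed to download map data.")
        default:
            break
        }
    case .localized:
        // Safe to place content now; gate features on geoTrackingStatus.accuracy.
        startExperienceIfNeeded()
    case .notAvailable:
        showHint("Geo-tracking isn't available at this location.")
    @unknown default:
        break
    }
}
```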
Let's take a look at how we can add this tracking state to improve our sample app. Now we can see this whole time we were actually localizing when looking at Market Street and the surrounding buildings. As we pan around, we can see from the tracking state that we localize and then the accuracy increases to high.
I think we've got our app just about ready, at least for the Ferry Building. We've added a more expansive location anchor sample project on developer.apple.com that I encourage you to check out after this talk. For more information on the RealityKit features used, check out last year's talk, "Introducing RealityKit and Reality Composer." In our sample app, we saw how to create location anchors by directly specifying coordinates. We already knew the geographic coordinates for the Ferry Building. However, these coordinates could have come from any source, such as our App Bundle, our web back end or, really, any database.
Another way to create a location anchor is via user interaction. We could expand on our app in the future by allowing users to tap the screen to save their own point of interest. getGeoLocation(forPoint:) on ARSession allows us to get geographic coordinates from any world point in ARKit's coordinate space. For example, this point could come from a raycast or a location on a plane.
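As a sketch, that flow could look like this, where the raycast result comes from the user's tap:

```swift
import ARKit
import CoreLocation

func saveGeoAnchor(for raycastResult: ARRaycastResult, in session: ARSession) {
    // Convert the raycast hit into a world-space position.
    let transform = raycastResult.worldTransform
    let worldPosition = SIMD3<Float>(transform.columns.3.x,
                                     transform.columns.3.y,
                                     transform.columns.3.z)

    // Ask ARKit for the geographic coordinates of that world point,
    // then save the spot as a geo anchor.
    session.getGeoLocation(forPoint: worldPosition) { coordinate, altitude, error in
        guard error == nil else { return }
        let geoAnchor = ARGeoAnchor(coordinate: coordinate, altitude: altitude)
        session.add(anchor: geoAnchor)
    }
}
```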
Location anchors are available for you today with iOS 14, and we're starting with support in the San Francisco Bay Area, New York, Los Angeles, Chicago and Miami, with more cities coming through this summer. All iPhones and iPads with an A12 Bionic chip and newer, as well as GPS, are supported.
Also, for any apps that require location anchors exclusively, you can use device capability keys to limit your app in the App Store to only compatible hardware. In addition to the GPS key, you'll need to use a new key for devices with an A12 Bionic chip or newer that's available on iOS 14.
With location anchors, you can now bring your AR experiences onto the global scale. We went over how ARGeoTrackingConfiguration is the entry point to adding location anchors to your app. We saw how to add ARGeoAnchors to your AR scene and how to position content in relation to those anchors. We also saw how ARGeoTrackingStatus can be used to help guide the user to the best geo-tracking experience. And now here's Praveen to tell you more about Scene Geometry.
Hi, everyone. I am Praveen Gowda. I am an engineer on the ARKit team. Today, I am going to take you through some of the APIs available in iOS 14 that help bring the power of the LiDAR Scanner to your applications. In ARKit 3.5, we introduced the Scene Geometry API powered by the LiDAR Scanner on the new iPad Pro.
Before we go into Scene Geometry, let's take a look at how the LiDAR Scanner works. The LiDAR shoots light onto the surroundings and then collects the light reflected off the surfaces in the scene. The depth is estimated by measuring the time it took for the light to go from the LiDAR to the environment and reflect back to the scanner. And this entire process runs millions of times every second.
The LiDAR Scanner is used by the Scene Geometry API to provide a topological map of the environment. This can be optionally fused with semantic classification, which enables apps to recognize and classify physical objects. This provides an opportunity for creating richer AR experiences where apps can now occlude virtual objects with the real world or use physics to enable realistic interactions between virtual and physical objects... or to use virtual lighting on real-world surfaces, and in many other use cases that we've yet to imagine.
Let's take a quick look at Scene Geometry in action. Here's a living room, and once the Scene Geometry API is turned on, the entire visible room is meshed. Triangles vary in size to show the optimum detail for each surface. The colored mesh appears when semantic classification is enabled. Each color represents a different classification, such as blue for the seats and green for the floor.
As we saw, the Scene Geometry feature is built by leveraging the depth data gathered from the LiDAR Scanner. In iOS 14, we have a new ARKit Depth API that provides access to the same depth data. The API provides a dense depth image where a pixel in the image corresponds to depth in meters from the camera. What we see here is a debug visualization of this depth where there is a gradient from blue to red where blue represents regions closer to the camera and red represents those away.
The depth data is available at 60 Hz, associated with each ARFrame. The Scene Geometry feature is built on top of this API: depth data across multiple frames is aggregated and processed to construct a 3D mesh. This API is powered by the LiDAR Scanner and hence is available on devices which have LiDAR.
Here is an illustration of how the depth map is generated. The colored RGB image from the wide-angle camera and the depth readings from the LiDAR Scanner are fused together using advanced machine-learning algorithms to create a dense depth map that is exposed through the API. This operation runs at 60 times per second with the depth map available on every ARFrame.
To access the depth data, each ARFrame will have a new property called sceneDepth. This provides an object of type ARDepthData. ARDepthData is a container for two buffers-- one is a depth map, and the other is a confidence map. The depth map is a CVPixelBuffer where each pixel represents depth and is in meters. And this depth corresponds to the distance from the plane of the camera to a point in the world. One thing to note is that the depth map is smaller in resolution compared to the capturedImage on ARFrame but still preserves the same aspect ratio.
The other buffer on the ARDepthData object is the confidence map. Since the measurement of depth using LiDAR is based on the light which reflects from objects, the accuracy of the depth map can be impacted by the nature of the surrounding environment. Challenging surfaces, such as those which are highly reflective or those with high absorption, can lower the accuracy of the depth.
This accuracy is expressed through a value we call "confidence." For each depth pixel, there is a corresponding confidence value of type ARConfidenceLevel, and this value can either be low, medium or high and will help to filter depth based on the requirements of your application. Let's see how we can use the Depth API. I begin with creating an AR session and a world-tracking configuration.
There is a new frame semantic called sceneDepth which allows you to turn on the Depth API. As always, I check if the frame semantic is supported on the device using the supportsFrameSemantics method on the configuration class. Then, we can set the frame semantic to sceneDepth and run the configuration. After this, I can access the depth data from the sceneDepth property on ARFrame using the didUpdateFrame delegate method.
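Putting those steps together might look roughly like this, assuming the session's delegate is set to the object implementing the callback:

```swift
import ARKit

// Enable scene depth if the device supports it (i.e. it has a LiDAR Scanner),
// then run the session.
func runDepthConfiguration(on session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        configuration.frameSemantics.insert(.sceneDepth)
    }
    session.run(configuration)
}

// ARSessionDelegate callback delivering depth with every frame, up to 60 Hz.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    guard let sceneDepth = frame.sceneDepth else { return }
    let depthMap = sceneDepth.depthMap           // CVPixelBuffer, depth in meters
    let confidenceMap = sceneDepth.confidenceMap // optional CVPixelBuffer of ARConfidenceLevel values
    // Filter and process depthMap and confidenceMap based on the app's accuracy needs.
}
```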
Additionally, if you have an AR app that uses the People Occlusion feature and hence sets the personSegmentationWithDepth frame semantic, then you will automatically get scene depth on devices that support the sceneDepth frame semantic, with no additional power cost to your application. Here is a demo of an app that we built using the Depth API.
The depth from the depth map is un-projected to 3D to form a point cloud. The point cloud is colored using the capturedImage on the ARFrame. By accumulating depth data across multiple ARFrames, we get a dense 3D point cloud like the one we see here. I can also filter the point clouds based on the confidence level. This is the point cloud formed by all the depth pixels, including those with low confidence.
And here is the point cloud we get by filtering for depth whose confidence is medium or high. And this is the point cloud we get by using only the depth which has high confidence. This gives us a clear picture of how the physical properties of surfaces impact the confidence level of their depth.
Your application and its tolerance to inaccuracies in depth will determine how you will filter the depth based on its confidence level. Let's take a closer look at how we built this app. For each ARFrame, we access the sceneDepth property with the ARDepthData object, providing us with the depth and the confidence map.
The key part of the app is a Metal vertex shader called "unproject." As the name suggests, it un-projects the depth data from the depth map to the 3D space using parameters on the ARCamera such as the camera's transform, its intrinsics and the projection matrix. The shader also uses capturedImage to sample color for each depth pixel. What we get as an output of this is a 3D point cloud which is then rendered using Metal.
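The shader runs on the GPU for every depth pixel, but the same un-projection can be illustrated on the CPU in Swift. This is only a sketch of the math, assuming ARKit's convention that the camera looks down its negative Z axis and that the pixel coordinate is given in the captured image's coordinate space (which is what camera.intrinsics describes).

```swift
import ARKit
import simd

// Un-project a single depth sample: take a pixel location plus its depth in
// meters and return the corresponding point in world space.
func unproject(pixel: simd_float2, depthInMeters: Float, camera: ARCamera) -> simd_float3 {
    let intrinsics = camera.intrinsics
    let fx = intrinsics[0][0], fy = intrinsics[1][1]   // focal lengths
    let cx = intrinsics[2][0], cy = intrinsics[2][1]   // principal point

    // Back-project from image space into camera space. Image Y increases
    // downward while the camera's Y axis points up, and the camera looks down -Z.
    let x = (pixel.x - cx) * depthInMeters / fx
    let y = (pixel.y - cy) * depthInMeters / fy
    let localPoint = simd_float4(x, -y, -depthInMeters, 1)

    // Move from camera space into world space using the camera's transform.
    let worldPoint = camera.transform * localPoint
    return simd_float3(worldPoint.x, worldPoint.y, worldPoint.z)
}
```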
To summarize, we have a new Depth API in ARKit 4 which gives a highly accurate representation of the world. There is a frame semantic called sceneDepth which allows you to enable the feature. Once enabled, the depth data will be available at 60 Hz on each ARFrame. The depth data will have a depth map and a confidence map. And the API is supported on devices with a LiDAR Scanner.
One of the fundamental tasks in many AR apps is placing objects. And in ARKit 3, we introduced the raycasting API to make object placement easier. In ARKit 4, the LiDAR Scanner brings some great improvements to raycasting. Raycasting is highly optimized for object placement and makes it easy to precisely place virtual objects in your AR app. Placing objects in ARKit 4 is more precise and quicker thanks to the LiDAR Scanner. Your apps that already use raycasting will automatically benefit on a LiDAR-enabled device.
Raycasting also leverages Scene Depth or Scene Geometry when available to instantly place objects in AR. This works great, even on featureless surfaces such as white walls. In iOS 14, the raycast API is recommended over hit-testing for object placement. Before you can start raycasting, you will need to create a raycast query. A raycast query describes the direction and the behavior of the ray used for raycasting.
It is composed of a raycast target which describes the type of surface that a ray can intersect with. Existing planes correspond to planes detected by ARKit while considering the shape and size of the plane. Infinite planes are the same planes, but with their shape and size ignored. And estimated planes are planes of arbitrary orientation formed from the feature points around a surface.
The raycast target alignment specifies the alignment of surfaces that a ray can intersect with. This can be horizontal, vertical or any. There are two types of raycasts. There are single-shot raycasts which return a one-time result, and then there are tracked raycasts which continuously update the results as ARKit's understanding of the world evolves. In order to get the latest features for object placement, we're recommending migrating to the raycasting API as we deprecate hit-testing.
The code we see on the top is extracted from a sample app which uses hit-testing to place objects. It performs a hit-test with three different kinds of hit-test options, and it is usually followed by some custom heuristics to filter those results and figure out where to place the object. All of that can be replaced with a few lines of raycasting code, like the one we see below, and ARKit will do the heavy lifting under the hood to make sure that your virtual objects always stay at the right place.
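For reference, the raycasting version might look something like this sketch, assuming a RealityKit ARView, a tap location in screen points, and a modelEntity to place:

```swift
import ARKit
import RealityKit

func placeObject(_ modelEntity: ModelEntity, at screenPoint: CGPoint, in arView: ARView) {
    // Build a raycast query targeting estimated planes of any alignment,
    // then perform a single-shot raycast.
    guard let query = arView.makeRaycastQuery(from: screenPoint,
                                              allowing: .estimatedPlane,
                                              alignment: .any),
          let result = arView.session.raycast(query).first else { return }

    // Anchor the virtual object at the raycast hit.
    let anchor = AnchorEntity(world: result.worldTransform)
    anchor.addChild(modelEntity)
    arView.scene.addAnchor(anchor)
}
```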
Raycasting makes it easier than ever before to precisely place virtual objects in your ARKit applications. Let's move over to Face Tracking. Face Tracking allows you to detect faces in your front-camera AR experience, overlay virtual content on them and animate facial expressions in real time. This is supported on all devices with a TrueDepth camera. Now, with ARKit 4, Face Tracking support is extended to devices without a TrueDepth camera as long as they have an Apple A12 Bionic processor or later.
This includes devices without the TrueDepth camera, such as the new iPhone SE. Elements of Face Tracking, such as face anchors, face geometry and blend shapes, will be available on all supported devices, but captured depth data will be limited to devices with a TrueDepth camera.
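Checking for support stays the same as before. Here's a brief sketch, where isSupported now covers both TrueDepth devices and A12-or-later devices with a front-facing camera:

```swift
import ARKit

func runFaceTracking(on session: ARSession) {
    // isSupported is true on devices with a TrueDepth camera and, with ARKit 4,
    // on devices with an A12 Bionic chip or later and a front-facing camera.
    guard ARFaceTrackingConfiguration.isSupported else { return }
    let configuration = ARFaceTrackingConfiguration()
    session.run(configuration)
}
```

And that's ARKit 4.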
With location anchors, you can now bring your AR experiences onto the global scale. And we looked at how you can use the LiDAR to build rich AR apps using the Scene Geometry and the Depth API. There are exciting improvements in raycasting to make object placement in AR easier than ever before. And finally, Face Tracking is now supported on a wider range of devices. Thank you, and we can't wait to check out all the great apps that you will build using ARKit 4.