iOS

Note

If you haven’t set up the SDK yet, make sure to go through the setup directions first. You’ll need to add the Core library to the app before using a specific feature API or a custom model. Follow the iOS setup or Android setup directions.

3D Pose Estimation on iOS combines neural networks, traditional computer vision techniques, and ARKit to estimate an object’s position in the real world.

1. Create a FritzVisionCustomPoseModel

In order to estimate the real-world coordinates of the object, you’ll need to create a custom Pose Model. This will detect the specific keypoints on the object you’d like to track.

For more information on training a model to track a specific object, please contact sales@fritz.ai.
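Step 2 below passes a poseModel into the lifting model. As a hypothetical sketch (the actual class name and initializer come from the custom model you train with Fritz; CustomObjectPoseModel is a stand-in):

import Fritz

// Hypothetical: `CustomObjectPoseModel` stands in for the model class
// generated for your trained custom pose model. Use the real class name
// from your model's integration instructions.
let poseModel = CustomObjectPoseModel()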

2. Create a FritzVisionRigidBodyPoseLifting model

// Local 3D Object keypoints (specific to the object to track)
let object3DPoints: [SCNVector3] = [
  SCNVector3(-10, 5, 5) / 100.0,
  SCNVector3(-10, -5, -5) / 100.0,
  SCNVector3(10, 5, 5) / 100.0,
  SCNVector3(10, -5, -5) / 100.0,
  SCNVector3(0.0, 0.0, 0.0)
]

let liftingModel = FritzVisionRigidBodyPoseLifting(model: poseModel, modelPoints: object3DPoints)
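Note that SceneKit does not define a division operator for SCNVector3, so the keypoint snippet above assumes a small extension along these lines (a sketch, not SDK code):

import SceneKit

// SceneKit has no built-in `/` for SCNVector3; this extension lets the
// keypoint literals above be scaled (here, from centimeters to meters).
extension SCNVector3 {
  static func / (vector: SCNVector3, scalar: Float) -> SCNVector3 {
    return SCNVector3(vector.x / scalar, vector.y / scalar, vector.z / scalar)
  }
}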

Note

Model initialization

It’s important to initialize a single instance of the model so you are not loading the entire model into memory on each model execution. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won’t load it until the first prediction. This has the benefit of not prematurely loading the model, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
  lazy var model = FritzVisionPoseModel()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you’ll ensure that you’re not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
  var model: FritzVisionPoseModel!

  override func viewDidLoad() {
    super.viewDidLoad()
    model = FritzVisionPoseModel()
  }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, its properties will be initialized before the AppDelegate runs. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.
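To avoid that crash, call FritzCore.configure() as early as possible in the app lifecycle. A minimal sketch of an AppDelegate doing so (assuming the standard Fritz import):

import UIKit
import Fritz

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

  func application(
    _ application: UIApplication,
    didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
  ) -> Bool {
    // Configure Fritz before any view controller loads a model.
    FritzCore.configure()
    return true
  }
}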

3. Create FritzVisionImage

FritzVisionImage supports different image formats.

  • Using a CMSampleBuffer

    If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

    // Swift
    let image = FritzVisionImage(buffer: sampleBuffer)

    // Objective-C
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
    // or
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];


    The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

    // Swift
    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .left

    // Objective-C: add metadata
    visionImage.metadata = [FritzVisionImageMetadata new];
    visionImage.metadata.orientation = FritzImageOrientationLeft;
    

    Note

    Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage, the orientation will change depending on which camera and device orientation you are using.

    When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

    You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)
        ...
    }
    
  • Using a UIImage

    If you are using a UIImage, create the FritzVisionImage instance:

    let image = FritzVisionImage(image: uiImage)
    

    The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .right
    

    Note

    UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image’s image orientation:

    image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)
    

4. Run 2D predictions to get the location of the object’s keypoints

Estimate the object’s keypoints on the image passed in.

/// Define pose lifting options (see Advanced Options below)
lazy var liftingOptions: FritzVisionRigidBodyPoseLiftingOptions = {
  let options = FritzVisionRigidBodyPoseLiftingOptions()
  options.excludedKeypointIndices = [4]
  options.requiredKeypointsMeetingThreshold = 3
  options.keypointThreshold = 0.7
  options.smoothingOptions = PoseSmoothingOptions(minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220)
  options.orientationFlipAngleThreshold = 100
  return options
}()

guard let (pose, result) = liftingModel.run2DPrediction(image, options: liftingOptions)
  else { return }

5. Infer the 3D Pose from the 2D Keypoints

Next, use the detected 2D keypoints in order to infer the 3D Pose of the object. This pose can be applied to any 3D-model that you wish to place in AR, matching the location, rotation, and orientation of the detected rigid body.

var sceneView: ARSCNView!

/// ...

guard let frame = self.sceneView.session.currentFrame else { return }

let poseResult = liftingModel.infer3DPose(pose, image: image, frame: frame, options: liftingOptions)

With the poseResult, you have the object’s estimated 3D pose, including a camera-space transform you can use to place AR content in the scene.

6. Place an AR object using ARKit

Place an AR object on top of the detected object keypoints.

// A defined AR Object to place
let arObjectNode: SCNNode = ...

guard let pov = self.sceneView.pointOfView else { return }

SCNTransaction.begin()
let rotated = SCNMatrix4Rotate(poseResult.scenekitCameraTransform, -.pi / 2, 0, 0, 1)
arObjectNode.transform = pov.convertTransform(rotated, to: nil)
SCNTransaction.commit()

Advanced Options

Configure the FritzVisionRigidBodyPoseLifting model

You can configure the lifting model with FritzVisionRigidBodyPoseLiftingOptions to return specific results.

lazy var liftingOptions: FritzVisionRigidBodyPoseLiftingOptions = {
  let options = FritzVisionRigidBodyPoseLiftingOptions()
  options.excludedKeypointIndices = [4]
  options.requiredKeypointsMeetingThreshold = 3
  options.keypointThreshold = 0.7
  options.smoothingOptions = PoseSmoothingOptions(minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220)
  options.orientationFlipAngleThreshold = 100
  return options
}()
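Conceptually, excludedKeypointIndices, keypointThreshold, and requiredKeypointsMeetingThreshold interact roughly as in this pure-Swift sketch (an illustration, not the SDK’s implementation):

```swift
// Illustration only (not SDK code): a 2D pose is lifted to 3D only when
// enough non-excluded keypoints clear the confidence threshold.
func hasEnoughConfidentKeypoints(
  scores: [Double],          // per-keypoint confidence scores
  excluded: Set<Int>,        // cf. excludedKeypointIndices
  threshold: Double,         // cf. keypointThreshold
  requiredCount: Int         // cf. requiredKeypointsMeetingThreshold
) -> Bool {
  let confident = scores.enumerated().filter {
    !excluded.contains($0.offset) && $0.element >= threshold
  }
  return confident.count >= requiredCount
}
```

With the options above (threshold 0.7, 3 required, index 4 excluded), a frame where only two keypoints are confident would be skipped rather than producing an unstable 3D pose.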

Pose and Keypoint Smoothing

To help improve stability of predictions between frames, use the PoseSmoothingOptions class which uses 1-Euro filters.

let options = FritzVisionRigidBodyPoseLiftingOptions()
options.smoothingOptions = PoseSmoothingOptions(minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220)

“The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

1-Euro filter parameters

Parameter                    Description
minCutoff (default: 1)       Minimum frequency cutoff. Lower values will decrease jitter but increase lag.
beta (default: 0)            Higher values of beta will help reduce lag, but may increase jitter.
derivateCutoff (default: 1)  Max derivative value allowed. Increasing it will allow more sudden movements.
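For intuition, here is a minimal pure-Swift sketch of a 1-Euro filter for a single scalar signal (an illustration of the algorithm, not the SDK’s PoseSmoothingOptions implementation; parameter names mirror the table above):

```swift
import Foundation

// Minimal single-value 1-Euro filter (illustration only).
final class OneEuroFilter {
  private let minCutoff: Double
  private let beta: Double
  private let derivateCutoff: Double
  private var previousValue: Double?
  private var previousDerivative = 0.0

  init(minCutoff: Double = 1.0, beta: Double = 0.0, derivateCutoff: Double = 1.0) {
    self.minCutoff = minCutoff
    self.beta = beta
    self.derivateCutoff = derivateCutoff
  }

  // Smoothing factor for a first-order low-pass filter at a given cutoff.
  private func alpha(cutoff: Double, dt: Double) -> Double {
    let tau = 1.0 / (2.0 * Double.pi * cutoff)
    return 1.0 / (1.0 + tau / dt)
  }

  /// Filter one sample taken `dt` seconds after the previous one.
  func filter(_ value: Double, dt: Double) -> Double {
    guard let previous = previousValue else {
      previousValue = value
      return value
    }
    // Estimate and smooth the signal's speed, then adapt the cutoff:
    // slow changes get a low cutoff (less jitter), fast changes a
    // higher cutoff (less lag) -- scaled by beta.
    let rawDerivative = (value - previous) / dt
    let dAlpha = alpha(cutoff: derivateCutoff, dt: dt)
    let derivative = dAlpha * rawDerivative + (1.0 - dAlpha) * previousDerivative
    let cutoff = minCutoff + beta * abs(derivative)
    let a = alpha(cutoff: cutoff, dt: dt)
    let filtered = a * value + (1.0 - a) * previous
    previousValue = filtered
    previousDerivative = derivative
    return filtered
  }
}
```

Feeding a sudden jump through the filter returns a value pulled back toward the previous sample, which is exactly the jitter-damping behavior applied to keypoints between frames.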