iOS Pose Estimation

Note

If you haven’t set up the SDK yet, make sure to go through the setup directions first. You’ll need to add the Core library to the app before using a specific feature API or custom model. Follow the iOS setup or Android setup directions.

You can use the FritzVisionPoseModel to detect human figures in images and video. The model detects where key body joints are located in an image.

1. Build the FritzVisionPoseModel

To create the pose estimation model, you can either include the model in your bundle or download it over the air once the user installs your app.

Include the model in your application bundle

Add the model to your Podfile

Include Fritz/VisionPoseModel in your Podfile. This will include the model file in your app bundle.

pod 'Fritz/VisionPoseModel'

Make sure to run a pod install with the latest changes.

pod install

Define FritzVisionPoseModel

Define an instance of FritzVisionPoseModel in your code. There should be only one instance, reused for each prediction.

import Fritz
let poseModel = FritzVisionPoseModel()
@import Fritz;

FritzVisionPoseModel *poseModel = [[FritzVisionPoseModel alloc] initWithOptionalModel:nil];

Note

Model initialization

It’s important to initialize one instance of the model so you are not loading the entire model into memory on each model execution. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won’t load the model until the first prediction. This has the benefit of not loading the model prematurely, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
  lazy var model = FritzVisionPoseModel()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you’ll ensure that you’re not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
  var model: FritzVisionPoseModel!

  override func viewDidLoad() {
    super.viewDidLoad()
    model = FritzVisionPoseModel()
  }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, its properties will be initialized before the AppDelegate’s launch method runs. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.
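For reference, here is a minimal AppDelegate sketch showing where FritzCore.configure() belongs; this is the standard UIKit launch sequence, and the full configuration steps live in the iOS setup directions:

import UIKit
import Fritz

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
  var window: UIWindow?

  func application(_ application: UIApplication,
                   didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // Configure Fritz before any view controller creates a model.
    FritzCore.configure()
    return true
  }
}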

Download the model over the air

Add FritzVision to your Podfile

Include Fritz/Vision in your Podfile.

pod 'Fritz/Vision'

Make sure to run a pod install with the latest changes.

pod install

Download Model

import Fritz

var poseModel: FritzVisionPoseModel?

FritzVisionPoseModel.fetchModel { model, error in
   guard let downloadedModel = model, error == nil else { return }

   poseModel = downloadedModel
}
@import Fritz;

[FritzVisionPoseModel fetchModelWithCompletionHandler:^(FritzVisionPoseModel * _Nullable model, NSError * _Nullable error) {
    // Use downloaded pose model
}];
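As a sketch, the download can live in a view controller that defers predictions until the model arrives; the failure handling below is an assumption to adapt to your app:

import Fritz

class MyViewController: UIViewController {
  var poseModel: FritzVisionPoseModel?

  override func viewDidLoad() {
    super.viewDidLoad()
    FritzVisionPoseModel.fetchModel { [weak self] model, error in
      guard let downloadedModel = model, error == nil else {
        // Assumption: retry or fall back to a bundled model as your app requires.
        return
      }
      self?.poseModel = downloadedModel
    }
  }
}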

2. Create FritzVisionImage

FritzVisionImage supports different image formats.

  • Using a CMSampleBuffer

    If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

    let image = FritzVisionImage(buffer: sampleBuffer)
    
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
    // or
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];
    

    The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .left
    
    // Add metadata
    visionImage.metadata = [FritzVisionImageMetadata new];
    visionImage.metadata.orientation = FritzImageOrientationLeft;
    

    Note

    Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage, the orientation will change depending on which camera and device orientation you are using.

    When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

    You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        ...
        image.metadata = FritzVisionImageMetadata()
        image.metadata?.orientation = FritzImageOrientation(connection)
        ...
    }
    
  • Using a UIImage

    If you are using a UIImage, create the FritzVisionImage instance:

    let image = FritzVisionImage(image: uiImage)
    

    The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .right
    

    Note

    UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image’s image orientation:

    image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)
    

3. Run pose predictions

Configure Pose Prediction

Before running pose estimation, you can configure the prediction with a FritzVisionPoseModelOptions object.

Settings
imageCropAndScaleOption

.scaleFit (default)

Controls how the image is resized and cropped to fit the model input.

minPartThreshold

0.50 (default)

Minimum confidence score a part must have to be included in a pose.

minPoseThreshold

0.50 (default)

Minimum confidence score a pose must have to be included in the result.

nmsRadius

20 (default)

Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels away.

For example, to build a more lenient FritzVisionPoseModelOptions object:

let options = FritzVisionPoseModelOptions()
options.minPartThreshold = 0.3
options.minPoseThreshold = 0.3
FritzVisionPoseModelOptions *options = [FritzVisionPoseModelOptions new];
options.minPartThreshold = 0.3;
options.minPoseThreshold = 0.3;

Run Pose Estimation Model

Use the poseModel instance you created earlier to run predictions:

guard let poseResult = try? poseModel.predict(image) else { return }
// Overlays pose on input image.
let imageWithPose = poseResult.drawPose()
FritzVisionPoseModelOptions *options = [FritzVisionPoseModelOptions new];
[poseModel predictWithImage:visionImage options:options completion:^(FritzVisionPoseResult * _Nullable result, NSError *error) {

  UIImage* resultImage = [result drawPose];
}];
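The Swift snippet above uses the default options. To apply custom options, pass them to the prediction call; this sketch assumes the Swift predict accepts an options argument mirroring the Objective-C predictWithImage:options:completion: shown above:

let options = FritzVisionPoseModelOptions()
options.minPartThreshold = 0.3
options.minPoseThreshold = 0.3

// Assumption: Swift `predict` takes an `options:` parameter, as the
// Objective-C predictWithImage:options:completion: does.
guard let poseResult = try? poseModel.predict(image, options: options) else { return }
let imageWithPose = poseResult.drawPose()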

4. Get information about poses

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

There are several body keypoints for each pose:

  • nose
  • left eye
  • right eye
  • left ear
  • right ear
  • left shoulder
  • right shoulder
  • left elbow
  • right elbow
  • left wrist
  • right wrist
  • left hip
  • right hip
  • left knee
  • right knee
  • left ankle
  • right ankle

You can access the results and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let pose = poseResult.decodePose()
FritzVisionPoseResult* poseResult;

FritzPose* pose = [poseResult decodePose];

Each Pose has a [Keypoint] and a score. Here is an example using the keypoints to detect arms from a Pose.

guard let pose = poseResult.decodePose() else { return }

let leftArmParts: [PosePart] = [.leftWrist, .leftElbow, .leftShoulder]
let rightArmParts: [PosePart] = [.rightWrist, .rightElbow, .rightShoulder]

var foundLeftArm: [Keypoint] = []
var foundRightArm: [Keypoint] = []

for keypoint in pose.keypoints {
    if leftArmParts.contains(keypoint.part) {
        foundLeftArm.append(keypoint)
    } else if rightArmParts.contains(keypoint.part) {
        foundRightArm.append(keypoint)
    }
}
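Building on that, a sketch that computes the angle at the left elbow from the matched keypoints. It works on plain CGPoints so it stays independent of the exact Keypoint API; extracting each keypoint’s position is left as an assumption about the Keypoint type:

import CoreGraphics

// Angle at the elbow, in degrees, formed by the shoulder-elbow-wrist chain.
func elbowAngle(shoulder: CGPoint, elbow: CGPoint, wrist: CGPoint) -> CGFloat {
    let upper = CGVector(dx: shoulder.x - elbow.x, dy: shoulder.y - elbow.y)
    let lower = CGVector(dx: wrist.x - elbow.x, dy: wrist.y - elbow.y)
    let dot = upper.dx * lower.dx + upper.dy * lower.dy
    let magnitudes = hypot(upper.dx, upper.dy) * hypot(lower.dx, lower.dy)
    guard magnitudes > 0 else { return 0 }
    // Clamp to [-1, 1] to guard against floating-point drift before acos.
    return acos(max(-1, min(1, dot / magnitudes))) * 180 / .pi
}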

Overlay pose on input image

You can overlay the pose on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let imageWithPose = poseResult.drawPose()
FritzVisionPoseResult* poseResult;

UIImage* resultImage = [poseResult drawPose];
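If predictions run on a background camera queue, hop back to the main queue before touching UIKit; imageView here is a hypothetical UIImageView in your view hierarchy:

DispatchQueue.main.async {
    // UIKit updates must happen on the main thread; imageView is a hypothetical outlet.
    self.imageView.image = imageWithPose
}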

Multi-pose Estimation

Note

Multi-pose estimation allows developers to track multiple people in the same image. This feature is available as part of the Fritz Premium Plan. To use this model, Upgrade Now.

Install Multi Pose Estimation

Add FritzVisionMultiPoseModel to your Podfile. Note that to access this model, you must be registered for the premium plan:

# Include the pose model file.
pod 'Fritz/VisionPoseModel'

# Add private pod to enable multi-pose estimation.
pod 'FritzVisionMultiPoseModel'

Make sure to run a pod install with the latest changes.

pod install

Detect multiple people

Follow the steps above in Run Pose Estimation Model to get a FritzVisionPoseResult.

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

You can access multiple poses and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let poses = poseResult.decodeMultiplePoses(numPoses: 10)
FritzVisionPoseResult* poseResult;

NSArray<FritzPose*> * poses = [poseResult decodePoses:10];
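From there you can iterate the decoded poses, for example keeping only confident ones; this sketch assumes the pose-level score described earlier:

let poses = poseResult.decodeMultiplePoses(numPoses: 10)

// Assumption: `score` is the pose-level confidence mentioned above.
let confidentPoses = poses.filter { $0.score > 0.5 }
for pose in confidentPoses {
    print("Detected pose with \(pose.keypoints.count) keypoints")
}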

Overlay poses on input image

You can overlay the poses on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let imageWithPose = poseResult.drawPoses(numPoses: 10)
FritzVisionPoseResult* poseResult;

UIImage* resultImage = [poseResult drawNumPoses:10];

Pose Smoothing

To help improve the stability of predictions between frames, use the PoseSmoother class constrained to either the OneEuroPointFilter<Point> or SavitzkyGolayPointFilter<Point> filter classes.

1-Euro Filter

“The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

- 1-Euro point filter Paper

The 1-Euro filter runs in real time, with parameters minCutoff and beta that control the trade-off between lag and jitter.

Parameters
minCutoff

1.0 (default)

Minimum frequency cutoff. Lower values will decrease jitter but increase lag.

beta

0.0 (default)

Higher values of beta will help reduce lag, but may increase jitter.

derivateCutoff

1.0 (default)

Max derivative value allowed. Increasing will allow more sudden movements.

To get a better understanding of how different parameter values affect the results, we recommend trying out the 1-Euro Filter Demo.

let poseSmoother = PoseSmoother<OneEuroPointFilter<Point>>(minCutoff: 1.0, beta: 0.0)

func smooth(pose: Pose) -> Pose {
    return poseSmoother.smooth(pose)
}
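In a video pipeline, the smoother sits between decoding and whatever consumes the keypoints; reuse a single smoother across frames so its internal state carries over. A sketch using the calls from earlier sections (poseModel is the instance built in step 1):

func process(frame: FritzVisionImage) -> Pose? {
    // Predict and decode as in the earlier sections.
    guard let poseResult = try? poseModel.predict(frame),
          let pose = poseResult.decodePose() else { return nil }
    // Smooth before rendering so jitter doesn't reach the UI.
    return poseSmoother.smooth(pose)
}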

Savitzky-Golay Filter

“A Savitzky–Golay filter is a digital filter that can be applied to a set of digital data points for the purpose of smoothing the data, that is, to increase the precision of the data without distorting the signal tendency. This is achieved, in a process known as convolution, by fitting successive sub-sets of adjacent data points with a low-degree polynomial by the method of linear least squares.”

- Savitzky-Golay wiki

The Savitzky-Golay filter essentially fits a polynomial to a window of data points and uses it to smooth them. The size of the window determines the lag introduced by the filter. If you want to minimize lag, we recommend using the 1-Euro filter.

Parameters
leftScan

2 (default)

Number of data points in the window to look back when fitting the polynomial.

rightScan

2 (default)

Number of data points in the window to look forward when fitting the polynomial.

polynomialOrder

2 (default)

Order of polynomial to approximate.

let poseSmoother = PoseSmoother<SavitzkyGolayPointFilter<Point>>()

func smooth(pose: Pose) -> Pose {
    return poseSmoother.smooth(pose)
}