iOS Pose Estimation

You can use the FritzVisionPoseModel to detect human figures in images and video. The model detects where key body joints are located in an image.

1. Build the FritzVisionPoseModel

To create the pose estimation model, you can either include the model in your bundle or download it over the air once the user installs your app.

Include the model in your application bundle

Add the model to your Podfile

Include Fritz/VisionPoseModel in your Podfile. This bundles the model file with your app.

pod 'Fritz/VisionPoseModel'

Make sure to run a pod install with the latest changes.

pod install

Define FritzVisionPoseModel

Define an instance of FritzVisionPoseModel in your code. There should be only one instance, reused for every prediction.

import Fritz
let poseModel = FritzVisionPoseModel()
@import Fritz;

FritzVisionPoseModel *poseModel = [[FritzVisionPoseModel alloc] initWithOptionalModel:nil];
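
For example, one way to keep a single reusable instance around is to store the model as a property on whatever object drives predictions. A minimal sketch; the class and property names here are illustrative:

import UIKit
import Fritz

class PoseViewController: UIViewController {

    // Create the model once and reuse it for every prediction.
    let poseModel = FritzVisionPoseModel()
}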

Download the model over the air

Add FritzVision to your Podfile

Include Fritz/Vision in your Podfile.

pod 'Fritz/Vision'

Make sure to run a pod install with the latest changes.

pod install

Download Model

import Fritz

var poseModel: FritzVisionPoseModel?

FritzVisionPoseModel.fetchModel { model, error in
   guard let downloadedModel = model, error == nil else { return }

   poseModel = downloadedModel
}
@import Fritz;

[FritzVisionPoseModel fetchModelWithCompletionHandler:^(FritzVisionPoseModel * _Nullable model, NSError * _Nullable error) {
    // Use downloaded pose model
}];
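
Because the download completes asynchronously, the model may still be nil when your first frames arrive. Guard against that before predicting. A minimal sketch; the helper name is illustrative, and predict is the call shown in step 3:

func runPrediction(on image: FritzVisionImage) {
    // Skip prediction until the model has finished downloading.
    guard let poseModel = poseModel else { return }

    poseModel.predict(image) { result, error in
        guard error == nil, let poseResult = result else { return }
        // Work with poseResult as shown in step 4.
    }
}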

2. Create FritzVisionImage

FritzVisionImage supports different image formats.

  • Using a CMSampleBuffer

    If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

    let image = FritzVisionImage(buffer: sampleBuffer)
    
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
    // or
    FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];
    

    The image orientation data needs to be set properly for predictions to work. Use FritzVisionImageMetadata to customize the orientation of an image. By default, if you attach a FritzVisionImageMetadata instance, the orientation will be .right:

    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .left
    
    // Add metadata
    visionImage.metadata = [FritzVisionImageMetadata new];
    visionImage.metadata.orientation = FritzImageOrientationLeft;
    

    Note

    Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage, the orientation will change depending on which camera and device orientation you are using.

    When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

    You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let image = FritzVisionImage(buffer: sampleBuffer)
        image.metadata = FritzVisionImageMetadata()
        image.metadata?.orientation = FritzImageOrientation(connection)
        // Run predictions on the image (see step 3).
    }
    
  • Using a UIImage

    If you are using a UIImage, create the FritzVisionImage instance:

    let image = FritzVisionImage(image: uiImage)
    

    The image orientation data needs to be set properly for predictions to work. Use FritzVisionImageMetadata to customize the orientation of an image:

    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = .right
    

    Note

    UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image’s image orientation:

    image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)
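
    Putting these steps together for a photo you already have as a UIImage (a short sketch; uiImage stands in for any UIImage in your app):

    let image = FritzVisionImage(image: uiImage)
    image.metadata = FritzVisionImageMetadata()
    image.metadata?.orientation = FritzImageOrientation(uiImage.imageOrientation)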
    

3. Run pose predictions

Configure Pose Prediction

Before running pose estimation, you can configure the prediction with a FritzVisionPoseModelOptions object.

Settings

  • imageCropAndScaleOption: .scaleFit (default). Crop and scale option controlling how the image is resized and cropped for the model.
  • minPartThreshold: 0.50 (default). Minimum confidence score a part must have to be included in a pose.
  • minPoseThreshold: 0.50 (default). Minimum confidence score a pose must have to be included in the result.
  • nmsRadius: 20 (default). Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels apart.

For example, to build a more lenient FritzVisionPoseModelOptions object:

let options = FritzVisionPoseModelOptions()
options.minPartThreshold = 0.3
options.minPoseThreshold = 0.3
FritzVisionPoseModelOptions *options = [FritzVisionPoseModelOptions new];
options.minPartThreshold = 0.3;
options.minPoseThreshold = 0.3;
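
You then pass the options into the prediction call. A sketch; the options parameter on the Swift predict method is an assumption here, mirroring the Objective-C call shown in the next section:

poseModel.predict(image, options: options) { result, error in
  guard error == nil, let poseResult = result else { return }
  // Results now reflect the more lenient thresholds.
}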

Run Pose Estimation Model

Use the poseModel instance you created earlier to run predictions:

poseModel.predict(image) { result, error in
  guard error == nil, let poseResult = result else { return }

  // Overlays pose on input image.
  let imageWithPose = poseResult.drawPose()
}
FritzVisionPoseModelOptions *options = [FritzVisionPoseModelOptions new];
[poseModel predictWithImage:visionImage options:options completion:^(FritzVisionPoseResult * _Nullable result, NSError *error) {

  UIImage* resultImage = [result drawPose];
}];
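
If you display the rendered image in your UI, dispatch the update to the main thread; it is safest to assume the completion handler may not be invoked there. A sketch, where imageView is an illustrative outlet:

poseModel.predict(image) { result, error in
  guard error == nil, let poseResult = result else { return }
  let imageWithPose = poseResult.drawPose()

  // UIKit must only be touched on the main thread.
  DispatchQueue.main.async {
    imageView.image = imageWithPose
  }
}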

4. Get information about poses

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

There are several body keypoints for each pose:

  • nose
  • left eye
  • right eye
  • left ear
  • right ear
  • left shoulder
  • right shoulder
  • left elbow
  • right elbow
  • left wrist
  • right wrist
  • left hip
  • right hip
  • left knee
  • right knee
  • left ankle
  • right ankle

You can access the results and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let pose = poseResult.decodePose()
FritzVisionPoseResult* poseResult;

FritzPose* pose = [poseResult decodePose];

Each Pose has a [Keypoint] and a score. Here is an example using the keypoints to detect arms from a Pose.

let pose = poseResult.decodePose()

let leftArmParts: [PosePart] = [.leftWrist, .leftElbow, .leftShoulder]
let rightArmParts: [PosePart] = [.rightWrist, .rightElbow, .rightShoulder]

var foundLeftArm: [Keypoint] = []
var foundRightArm: [Keypoint] = []

for keypoint in pose.keypoints {
    if leftArmParts.contains(keypoint.part) {
        foundLeftArm.append(keypoint)
    } else if rightArmParts.contains(keypoint.part) {
        foundRightArm.append(keypoint)
    }
}
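
You can then check whether an arm was fully detected before using it; this builds only on the keypoints collected above:

// An arm is only usable if all three of its parts were detected.
if foundLeftArm.count == leftArmParts.count {
    // e.g. compute joint angles or draw the left arm.
}
if foundRightArm.count == rightArmParts.count {
    // e.g. compute joint angles or draw the right arm.
}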

Overlay pose on input image

You can overlay the pose on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let imageWithPose = poseResult.drawPose()
FritzVisionPoseResult* poseResult;

UIImage* resultImage = [poseResult drawPose];

Multi-pose Estimation

Note

Multi-pose estimation allows developers to track multiple people in the same image. This feature is available as part of the Fritz Premium Plan. To use this model, Upgrade Now.

Install Multi Pose Estimation

Add FritzVisionMultiPoseModel to your Podfile. Note that to access this model, you must be registered for the premium plan:

# Include the pose model file.
pod 'Fritz/VisionPoseModel'

# Add private pod to enable multi-pose estimation.
pod 'FritzVisionMultiPoseModel'

Make sure to run a pod install with the latest changes.

pod install

Detect multiple people

Follow the steps above in Run Pose Estimation Model to get a FritzVisionPoseResult.

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

You can access multiple poses and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let poses = poseResult.decodeMultiplePoses(numPoses: 10)
FritzVisionPoseResult* poseResult;

NSArray<FritzPose*> * poses = [poseResult decodePoses:10];

Overlay poses on input image

You can overlay the poses on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult

let imageWithPose = poseResult.drawPoses(numPoses: 10)
FritzVisionPoseResult* poseResult;

UIImage* resultImage = [poseResult drawNumPoses:10];
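
To work with each detected person individually, iterate over the poses decoded above (a short sketch reusing the keypoint access shown earlier):

for pose in poses {
    // Each pose holds the keypoints for one detected person.
    for keypoint in pose.keypoints {
        print(keypoint.part)
    }
}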