Pose Estimation

Pose estimation is a computer vision technique that predicts and tracks the location of important features on a person or object. App developers can use Pose Estimation to build AI-powered coaches for sports and fitness, immersive AR experiences, and more.

Pose Estimation In Action

(Left to right) Human Pose Estimation, Custom Pose Estimation, Rigid Pose Estimation

Concepts

Before diving into the API, take a few minutes to familiarize yourself with the following high-level concepts.

Keypoints

Keypoints are the specific features of an object we want to locate in an image. For example, in the case of human pose estimation, keypoints are joints like shoulders, elbows, and knees.

On a car, keypoints might be each tire or a front headlight. 2D pose estimation models predict the (X, Y) coordinates of keypoints relative to the input image, while 3D pose estimation predicts the (X, Y, Z) coordinates of keypoints, providing depth as well.
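As a purely illustrative sketch (these types are not part of the Fritz SDK), a 2D keypoint is just a named (x, y) location in the image, and a 3D keypoint adds a z component for depth:

```swift
// Illustrative only; these are not Fritz SDK types.
struct Keypoint2D {
    let name: String  // e.g. "leftShoulder" or "frontLeftTire"
    let x: Double     // horizontal position in the image
    let y: Double     // vertical position in the image
}

struct Keypoint3D {
    let name: String
    let x: Double
    let y: Double
    let z: Double     // depth relative to the camera
}
```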

warning

Unless otherwise specified, the Pose Estimation APIs perform 2D pose estimation.

Skeleton

A Skeleton is defined by a group of keypoints and their connections. Skeletons help decode raw model outputs to make them useful for application logic, data organization, and visualization.

For example, the skeleton for the human pose estimation model contains the 17 keypoints and connections required to draw a stick figure on people in an image. Each pose estimation model must have an associated skeleton. If you have trained a custom pose estimation model, you'll need to define this skeleton in your application code.
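On iOS, a custom skeleton definition might look like the following (this mirrors the HandSkeleton example in the Custom Pose Estimation section below; the keypoint names here are illustrative):

```swift
import Fritz

// Each case maps, in order, to a keypoint index in the model output.
public enum HandSkeleton: Int, SkeletonType {
    public static let objectName = "hand"

    case thumb
    case index
    case middle
    case ring
    case pinky
}
```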

note

When using the Fritz Data Collection API to gather model predictions from real world use, the skeleton associated with a model is automatically used to define annotation configurations for viewing and labeling images.

Human Pose Estimation

Human pose estimation is a specific use case of pose estimation that uses a model to predict the location of people in images.

Our pre-trained Human Pose Estimation model locates 17 body parts and joints for each person detected in an image.

Custom Pose Estimation

Custom pose estimation refers to pose estimation models trained using templates and notebooks provided by Fritz AI. Custom pose models are compatible with the Pose Estimation SDK, which handles pre- and post-processing as well as output visualization, data collection, and model management.

Rigid Pose Estimation

Rigid pose estimation is a method of performing 3D pose estimation by predicting an object's position in two dimensions and then using information about the camera's position and the object's size to lift that 2D pose into 3D space.

This technique requires that the object be "rigid", meaning it cannot bend or deform in any way. Rigid pose estimation is useful for augmented and virtual reality applications where physical objects can interact with virtual ones. Because people bend and deform, rigid pose estimation cannot be used for 3D human pose estimation.


Human Pose Estimation

Use Human Pose Estimation to track the position of people in images and video. Build an AI-powered fitness coach, immersive AR experiences, and more.

Pre-trained Model

| Name | Example | Description | Keypoints |
| --- | --- | --- | --- |
| Human |  | A model that tracks one or more poses in the scene. Identifies 17 body keypoints. | Face: nose, eyes, ears. Torso: shoulders, elbows, wrists. Legs: hips, knees, ankles. View all COCO keypoints |

Technical Specifications

| Architecture | Format(s) | Model Size | Input | Output | Benchmarks |
| --- | --- | --- | --- | --- | --- |
| MobileNet backbone | Core ML (iOS), TensorFlow Lite (Android) | ~5 MB | 353x257-pixel image | Position of each person and body part detected; number of people detected; the confidence associated with each detection | 20 FPS on iPhone X, 10 FPS on Pixel 2 |

iOS

Use the FritzVisionHumanPoseModel to detect human figures in images and video. The model estimates the location of body parts and joints relative to a 2D image.

1. Build the FritzVisionHumanPoseModel

To create the pose estimation model, either include the model in your bundle or download it over the air once the user installs your app.

Include the model in your application bundle

Include a pose model variant in your Podfile. This will include the model file in your app bundle.

Choose between fast, accurate, or small variants. Model variants make sure you get the right model for your use case. For more information on the tradeoffs of each variant, see Choosing a Model Variant.

```bash
pod 'Fritz/VisionPoseModel/Human/Fast'
```
Define FritzVisionHumanPoseModel

Define the instance of the FritzVisionHumanPoseModel in your code. There should only be one instance that is reused for each prediction.

import Fritz
let poseModel = FritzVisionHumanPoseModelFast()
Model initialization

It's important to initialize a single instance of the model so you are not loading the entire model into memory on each prediction. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won't load the model until the first prediction. This has the benefit of not prematurely loading the model, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
  lazy var model = FritzVisionHumanPoseModelFast()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you'll ensure that you're not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
  var model: FritzVisionHumanPoseModelFast!

  override func viewDidLoad() {
    super.viewDidLoad()
    model = FritzVisionHumanPoseModelFast()
  }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, its properties will be initialized before the application delegate's launch code runs. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.
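A minimal sketch of that ordering, assuming the SDK is configured in the application delegate's launch method (FritzCore.configure() is named above; the exact placement is an assumption, so adjust it to your project):

```swift
import UIKit
import Fritz

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    func application(
        _ application: UIApplication,
        didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?
    ) -> Bool {
        // Configure Fritz before any view controller creates a model instance.
        FritzCore.configure()
        return true
    }
}
```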

Download the model over the air
Only available on Growth plans

For more information on plans and pricing, visit our website.

Include Fritz/Vision in your Podfile.

pod 'Fritz/Vision'
pod install

import Fritz

var poseModel: FritzVisionHumanPoseModel?

FritzVisionHumanPoseModel.fetchModel { model, error in
  guard let downloadedModel = model, error == nil else { return }
  poseModel = downloadedModel
}

2. Create a FritzVisionImage

FritzVisionImage supports different image formats.

Using a CMSampleBuffer

If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

let image = FritzVisionImage(buffer: sampleBuffer)
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
// or
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .left
// Add metadata
visionImage.metadata = [FritzVisionImageMetadata new];
visionImage.metadata.orientation = FritzImageOrientationLeft;
Setting the Orientation from the Camera

Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage the orientation will change depending on which camera and device orientation you are using.

When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
  let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)
  ...
}

Using a UIImage

If you are using a UIImage, create the FritzVisionImage instance:

let image = FritzVisionImage(image: uiImage)

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .right
Set the image orientation

UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image's image orientation:

image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)

3. Run pose predictions

Configure the predictor

Before running pose estimation, you can configure the prediction with a FritzVisionPoseModelOptions object.

FritzVisionPoseModelOptions

| Option | Default | Description |
| --- | --- | --- |
| imageCropAndScaleOption | .scaleFit | Crop and scale option for how to resize and crop the image for the model. |
| minPartThreshold | 0.50 | Minimum confidence score a part must have to be included in a pose. |
| minPoseThreshold | 0.50 | Minimum confidence score a pose must have to be included in the result. |
| smoothingOptions | OneEuroPointFilter.low | Pose smoothing options for predictions. By default applies light smoothing. Setting this to nil will disable pose smoothing. |
| nmsRadius | 20 | Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels away. |

For example, to build a more lenient FritzVisionPoseModelOptions object:

let options = FritzVisionPoseModelOptions()
options.minPartThreshold = 0.3
options.minPoseThreshold = 0.3

Next, use the poseModel instance you created earlier to run predictions:

guard let poseResult = try? poseModel.predict(image),
      let pose = poseResult.pose()
else { return }

// Overlays pose on input image.
let imageWithPose = image.draw(pose: pose)

4. Get information about poses

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

There are several body keypoints for each pose:

  • nose
  • left eye
  • right eye
  • left ear
  • right ear
  • left shoulder
  • right shoulder
  • left elbow
  • right elbow
  • left wrist
  • right wrist
  • left hip
  • right hip
  • left knee
  • right knee
  • left ankle
  • right ankle

You can access the results and all detected keypoints from the FritzVisionPoseResult object. All results are normalized from 0 to 1 by default, with (0, 0) at the top left of an upright-oriented image.

// Created from model prediction.
let poseResult: FritzVisionPoseResult
let pose = poseResult.pose()

Each Pose has a [Keypoint] and a score. Here is an example using the keypoints to detect arms from a Pose.

guard let pose = poseResult.pose() else { return }

let leftArm: [Keypoint] = [
  pose.getKeypoint(for: .leftWrist),
  pose.getKeypoint(for: .leftElbow),
  pose.getKeypoint(for: .leftShoulder)
].compactMap { $0 }

let rightArm: [Keypoint] = [
  pose.getKeypoint(for: .rightWrist),
  pose.getKeypoint(for: .rightElbow),
  pose.getKeypoint(for: .rightShoulder)
].compactMap { $0 }
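If you need pixel coordinates rather than normalized ones, scale by the image dimensions. A minimal sketch, assuming each Keypoint exposes its normalized location as a CGPoint-like position property (the property name is an assumption; check the SDK for the exact API):

```swift
import CoreGraphics

/// Converts a point normalized to [0, 1] into pixel coordinates for an image of the given size.
func denormalize(_ point: CGPoint, in imageSize: CGSize) -> CGPoint {
    return CGPoint(x: point.x * imageSize.width, y: point.y * imageSize.height)
}

// Example: map the left wrist onto a 375x667-point view.
// `position` is an assumed property name on Keypoint.
if let wrist = pose.getKeypoint(for: .leftWrist) {
    let wristInPixels = denormalize(wrist.position, in: CGSize(width: 375, height: 667))
    print("Left wrist at \(wristInPixels)")
}
```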
Overlay pose on input image

You can overlay the pose on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult
let pose = poseResult.pose()
let imageWithPose = image.draw(pose: pose)

Multi-pose estimation

note

Multi-pose estimation allows developers to track multiple people in the same image.

Detect multiple people

You can access multiple poses and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult<HumanSkeleton>
let poses = poseResult.poses(limit: 10)

Overlay poses on input image

You can overlay the poses on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult<HumanSkeleton>
let poses = poseResult.poses()
let imageWithPose = image.draw(poses: poses)

Pose smoothing

To help improve the stability of predictions between frames, use the PoseSmoother class, parameterized by either the OneEuroPointFilter or SavitzkyGolayPointFilter filter class.

1-Euro Filter

"The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag." - 1-Euro point filter

The 1-Euro filter runs in real-time with parameters minCutoff and beta which control the amount of lag and jitter.

Parameters

| Option | Default | Description |
| --- | --- | --- |
| minCutoff | 1.0 | Minimum frequency cutoff. Lower values will decrease jitter but increase lag. |
| beta | 0.0 | Higher values of beta will help reduce lag, but may increase jitter. |
| derivateCutoff | 1.0 | Max derivative value allowed. Increasing will allow more sudden movements. |

To get a better understanding of how different parameter values affect the results, we recommend trying out the 1-Euro Filter Demo.

let poseSmoother = PoseSmoother<OneEuroPointFilter, HumanSkeleton>(
  options: .init(minCutoff: 1.0, beta: 0.0)
)

func smooth(pose: Pose) -> Pose {
  let smoothedPose = poseSmoother.smooth(pose)
  return smoothedPose
}
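Putting the pieces together, one way to apply the smoother in a camera callback might look like this (a sketch only; it assumes the poseModel and poseSmoother properties defined in the snippets above and the FritzVisionImage setup from step 2):

```swift
import AVFoundation
import Fritz

// Sketch: predict on each camera frame, smooth the pose, then draw it.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)

    guard let poseResult = try? poseModel.predict(image),
          let pose = poseResult.pose()
    else { return }

    // Reduce jitter between frames before rendering.
    let smoothedPose = poseSmoother.smooth(pose)
    let imageWithPose = image.draw(pose: smoothedPose)
    // ... display imageWithPose
}
```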
Savitzky-Golay Filter

" A Savitzky–Golay filter is a digital filter that can be applied to a set of digital data points for the purpose of smoothing the data, that is, to increase the precision of the data without distorting the signal tendency. This is achieved, in a process known as convolution, by fitting successive sub-sets of adjacent data points with a low-degree polynomial by the method of linear least squares." - Savitzky-Golay wiki

The Savitzky-Golay filter essentially fits a polynomial to a window of data points and uses it to smooth them. The size of the window (buffer) determines the lag introduced by the filter. If you want to minimize lag, we recommend using the 1-Euro filter.

Parameters

| Option | Default | Description |
| --- | --- | --- |
| leftScan | 2 | Number of data points in the window to look back when approximating the polynomial. |
| rightScan | 2 | Number of data points in the window to look forward when approximating the polynomial. |
| polynomialOrder | 2 | Order of the polynomial to approximate. |

let poseSmoother = PoseSmoother<SavitzkyGolayPointFilter, HumanSkeleton>(
  options: .init()
)

func smooth(pose: Pose) -> Pose {
  let smoothedPose = poseSmoother.smooth(pose)
  return smoothedPose
}

Android

1. Add the dependencies via Gradle

(Optional: include the model in your app) To include the Human Pose Estimation model with your build, add the dependency as shown below. Note: this includes the model with your app when you publish it to the Play Store and will increase your app size.

note

Behind the scenes, Pose Estimation uses a TensorFlow Lite model. In order to include this with your app, you'll need to make sure that the model is not compressed in the APK by setting aaptOptions (see the aaptOptions block in the Custom Pose Estimation Android setup below).

  • Resolution: 353x257
  • Model Size: 2.3MB
dependencies {
implementation 'ai.fritz:vision-pose-estimation-model-fast:+'
}

Now you're ready to detect poses with the Pose Estimation API.

2. Get a Pose predictor

In order to use the predictor, the on-device model must first be loaded. If you followed the Optional step above and included the model dependency, you can get a predictor to use immediately:

// For fast
PoseOnDeviceModel onDeviceModel =
    FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.FAST);

// For accurate
PoseOnDeviceModel onDeviceModel =
    FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.ACCURATE);

// For small
PoseOnDeviceModel onDeviceModel =
    FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.SMALL);

FritzVisionPosePredictor predictor =
    FritzVision.PoseEstimation.getPredictor(onDeviceModel);

If you did not include the on-device model with your app, you'll have to load the model before you can get a predictor. To do that, you'll use PoseManagedModel and call FritzVision.PoseEstimation.loadPredictor to start the model download.

FritzVisionPosePredictor predictor;

// For fast
PoseManagedModel managedModel =
    FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.FAST);

// For accurate
PoseManagedModel managedModel =
    FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.ACCURATE);

// For small
PoseManagedModel managedModel =
    FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.SMALL);

FritzVision.PoseEstimation.loadPredictor(managedModel, new PredictorStatusListener<FritzVisionPosePredictor>() {
    @Override
    public void onPredictorReady(FritzVisionPosePredictor posePredictor) {
        Log.d(TAG, "Pose estimation predictor is ready");
        predictor = posePredictor;
    }
});

3. Create FritzVisionImage

To create a FritzVisionImage from a Bitmap:

FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);

To create a FritzVisionImage from a media.Image object when capturing the result from a camera, first determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
// Gets the first camera id
String cameraId = manager.getCameraIdList().get(0);
// Determine the rotation on the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc)
ImageRotation imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);

Finally, create the FritzVisionImage object with the rotation

FritzVisionImage visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera);

4. Run prediction

To detect body poses in FritzVisionImage, run the following:

FritzVisionPoseResult poseResult = predictor.predict(visionImage);

The predict method returns a FritzVisionPoseResult object that contains the following methods:

FritzVisionPoseResult methods

| Method | Description |
| --- | --- |
| List<Pose> getPoses() | Gets a list of Pose objects. |
| List<Pose> getPosesByThreshold(float minConfidence) | Gets a list of poses above a given threshold. |

5. Access the Pose result

FritzVisionPoseResult contains several convenience methods to help draw the keypoints and body position.

Get a bitmap of the pose on the original image

List<Pose> poses = poseResult.getPoses();
Bitmap posesOnImage = visionImage.overlaySkeletons(poses);

Draw the poses onto a canvas

// Draw the pose to the canvas.
List<Pose> poses = poseResult.getPoses();
for (Pose pose : poses) {
    pose.draw(canvas);
}
Access the position of specific body keypoints

There are several body keypoints for each pose:

  • nose
  • left eye
  • right eye
  • left ear
  • right ear
  • left shoulder
  • right shoulder
  • left elbow
  • right elbow
  • left wrist
  • right wrist
  • left hip
  • right hip
  • left knee
  • right knee
  • left ankle
  • right ankle

To access each body keypoint separately:

// Get the first pose
Pose pose = poseResult.getPoses().get(0);
// Get the body keypoints
Keypoint[] keypoints = pose.getKeypoints();
// Get the name of the keypoint
String partName = keypoints[0].getPartName();
PointF keypointPosition = keypoints[0].getPosition();

Configure the predictor

You can configure the predictor with FritzVisionPosePredictorOptions to return specific results that match the options given:

FritzVisionPosePredictorOptions

| Option | Default | Description |
| --- | --- | --- |
| minPartThreshold | 0.50 | Minimum confidence score a keypoint must have to be included in a pose. |
| minPoseThreshold | 0.20 | Minimum confidence score a pose must have to be included in the result. |
| maxPosesToDetect | 1 | Maximum number of poses to detect in the image. |
| nmsRadius | 20 | Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels away. |
| smoothingOptions (PoseSmoothingMethod) | null | Pose smoothing method to run between predictions. |

FritzVisionPosePredictorOptions options = new FritzVisionPosePredictorOptions();
options.minPoseThreshold = .6f;
predictor = FritzVision.PoseEstimation.getPredictor(onDeviceModel, options);

Improve stability of predictions between frames.

To help improve stability of predictions between frames, set the PoseSmoothingMethod.

FritzVisionPosePredictorOptions posePredictorOptions = new FritzVisionPosePredictorOptions();
posePredictorOptions.smoothingOptions = new OneEuroFilterMethod();

The 1-Euro filter runs in real-time with parameters minCutoff and beta which control the amount of lag and jitter.

Parameters

| Option | Default | Description |
| --- | --- | --- |
| minCutoff | .2 | Minimum frequency cutoff. Lower values will decrease jitter but increase lag. |
| beta | .01 | Higher values of beta will help reduce lag, but may increase jitter. |
| derivateCutoff | .3 | Max derivative value allowed. Increasing will allow more sudden movements. |

To get a better understanding of how different parameter values affect the results, try out the 1-Euro Filter Demo.

note

Pose smoothing is only applied to single pose estimation (maxPosesToDetect = 1)

Use the record method on the predictor to collect data

The FritzVisionPosePredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

FritzVisionPoseResult predictedResults = visionPredictor.predict(visionImage);
// Implement your own custom UX for users to annotate an image and store
// that as a FritzVisionPoseResult.
visionPredictor.record(visionImage, predictedResults, modifiedResults);

Custom Pose Estimation

Custom 2D pose estimation models predict the locations of keypoints on an object of your choosing.

You can train a custom model that is compatible with the Pose Estimation API by using Studio.

If you want to track the position of people in an image or video, see documentation for Human Pose Estimation

Technical Specifications

| Architecture | Format(s) | Model Size | Input | Output | Benchmarks |
| --- | --- | --- | --- | --- | --- |
| MobileNet backbone | Core ML (iOS), TensorFlow Lite (Android) | <5 MB | 260x200-pixel image | Position of each keypoint detected; number of objects detected; the confidence associated with each detection | 20 FPS on iPhone X, 10 FPS on Pixel 2 |

iOS

1. Build the FritzVisionPoseModel

First, include Fritz/Vision in your Podfile.

pod 'Fritz/Vision'
pod install

Next, add the model to your Xcode project. Download your trained pose estimation Core ML model from the Fritz webapp and drag the file into your Xcode project.

This will trigger Xcode to generate the Swift files necessary to interact with your model.

Define a Skeleton

Define a new enum that inherits from the Skeleton class. Each value of the enum will be the name and index of a keypoint matching the output of your model.

Additionally, specify the objectName. This should be the name of the object.

For example, if your pose estimation model predicts the location of each finger tip on a hand, you might define a skeleton as follows:

public enum HandSkeleton: Int, SkeletonType {
  public static let objectName = "hand"

  case thumb
  case index
  case middle
  case ring
  case pinky
}
Model initialization

It's important to initialize a single instance of the model so you are not loading the entire model into memory on each prediction. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won't load the model until the first prediction. This has the benefit of not prematurely loading the model, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
  lazy var model = FritzVisionHumanPoseModelFast()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you'll ensure that you're not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
  var model: FritzVisionHumanPoseModelFast!

  override func viewDidLoad() {
    super.viewDidLoad()
    model = FritzVisionHumanPoseModelFast()
  }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, its properties will be initialized before the application delegate's launch code runs. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.

Create a custom FritzVisionPosePredictor<T> class

To leverage the built-in pre- and post-processing provided by Fritz AI, create a new subclass for your model that inherits from the FritzVisionPosePredictor<T> base class.

The base class takes the skeleton defined in the previous step as a type.

import Fritz

// Register your model with the Fritz SDK
extension hand_pose_model: SwiftIdentifiedModel {
  static let modelIdentifier = "your-model-id"
  static let packagedModelVersion = 1
}

// Create the predictor
let handPoseModel = FritzVisionPosePredictor<HandSkeleton>(
  model: hand_pose_model()
)

2. Create FritzVisionImage

FritzVisionImage supports different image formats.

Using a CMSampleBuffer

If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

let image = FritzVisionImage(buffer: sampleBuffer)
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
// or
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .left
// Add metadata
visionImage.metadata = [FritzVisionImageMetadata new];
visionImage.metadata.orientation = FritzImageOrientationLeft;
Setting the Orientation from the Camera

Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage the orientation will change depending on which camera and device orientation you are using.

When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
  let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)
  ...
}

Using a UIImage

If you are using a UIImage, create the FritzVisionImage instance:

let image = FritzVisionImage(image: uiImage)

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .right
Set the image orientation

UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image's image orientation:

image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)

3. Run pose predictions

Before running pose estimation, you can configure the prediction with a FritzVisionPoseModelOptions object.

Settings

| Option | Default | Description |
| --- | --- | --- |
| imageCropAndScaleOption | .scaleFit | Crop and scale option for how to resize and crop the image for the model. |
| minPartThreshold | 0.50 | Minimum confidence score a part must have to be included in a pose. |
| minPoseThreshold | 0.50 | Minimum confidence score a pose must have to be included in the result. |
| smoothingOptions | OneEuroPointFilter.low | Pose smoothing options for predictions. By default applies light smoothing. Setting this to nil will disable pose smoothing. |
| nmsRadius | 20 | Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels away. |

For example, to build a more lenient FritzVisionPoseModelOptions object:

let options = FritzVisionPoseModelOptions()
options.minPartThreshold = 0.3
options.minPoseThreshold = 0.3

Use the poseModel instance you created earlier to run predictions:

guard let poseResult = try? poseModel.predict(image),
      let pose = poseResult.pose()
else { return }

// Overlays pose on input image.
let imageWithPose = image.draw(pose: pose)

4. Get information about poses

Once you have a FritzVisionPoseResult object you can either access the pose result directly or overlay poses on the input image.

Use the pose result directly

You can access the results and all detected keypoints from the FritzVisionPoseResult object. All results are normalized from 0 to 1 by default, with (0, 0) at the top left of an upright-oriented image.

// Created from model prediction.
let poseResult: FritzVisionPoseResult
let pose = poseResult.pose()

Each Pose has a [Keypoint] and a score. Here is an example using the keypoints to detect whether the thumb and index fingers are visible in a Pose predicted via the hand pose model described above.

guard let pose = poseResult.decodePose() else { return }

let fingers: [Keypoint<HandSkeleton>] = [
  pose.getKeypoint(for: .thumb),
  pose.getKeypoint(for: .index)
].compactMap { $0 }
Overlay pose on input image

You can overlay the pose on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult
let pose = poseResult.pose()
let imageWithPose = image.draw(pose: pose)
Multi-pose estimation
note

Multi-pose estimation allows developers to track multiple objects in the same image.

Detect multiple objects

You can access multiple poses and all detected keypoints from the FritzVisionPoseResult object.

// Created from model prediction.
let poseResult: FritzVisionPoseResult<HandSkeleton>
let poses = poseResult.poses(limit: 10)
Overlay poses on input image

You can overlay the poses on the input image to get an idea of how the pose detection performs:

// Created from model prediction.
let poseResult: FritzVisionPoseResult<HandSkeleton>
let poses = poseResult.poses()
let imageWithPose = image.draw(poses: poses)

5. Use the record method on the predictor to collect data

The FritzVisionPosePredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

guard let result = try? poseModel.predict(image, options: options),
      let pose = result.pose()
else { return }

// Implement your own custom UX for users to label an image and create a Pose
// object called modifiedPose
poseModel.record(image, predicted: pose, modified: modifiedPose)

Android

Custom 2D pose estimation models predict the locations of keypoints on an object of your choosing.

You can train a custom model that is compatible with the Pose Estimation API by using Studio.

If you want to track the position of people in an image or video, see documentation for Human Pose Estimation

1. Add the dependencies via Gradle

Add our repository in order to download the Vision API:

repositories {
maven { url "https://fritz.mycloudrepo.io/public/repositories/android" }
}

Add renderscript support and include the vision dependency in app/build.gradle. Renderscript is used in order to improve image processing performance. You'll also need to specify aaptOptions in order to prevent compressing TensorFlow Lite models.

android {
  defaultConfig {
    renderscriptTargetApi 21
    renderscriptSupportModeEnabled true
  }

  // Don't compress included TensorFlow Lite models on build.
  aaptOptions {
    noCompress "tflite"
  }
}

dependencies {
  implementation 'ai.fritz:vision:+'
}

2. Add the model to your app as an asset

Add your TFLite model file to your app as an asset. In Android Studio, you can drag tflite files directly into the file navigator.

3. Define a Skeleton

Extend the Skeleton class to reflect keypoint outputs of your model. For example, if your pose estimation model predicts the location of each finger tip on a hand, you might define a skeleton as follows:

public class HandSkeleton extends Skeleton {
  public static String OBJECT_NAME = "hand";
  public static String[] FINGER_NAMES = {
      "thumb",
      "index",
      "middle",
      "ring",
      "pinky"
  };

  public HandSkeleton() {
    super(OBJECT_NAME, FINGER_NAMES);
  }
}

4. Define a PoseOnDeviceModel

Register your model with the Fritz SDK by creating a new PoseOnDeviceModel. In addition to your model file, Fritz Model ID, and skeleton, you'll need to know the output stride of your model as well.

PoseOnDeviceModel onDeviceModel = new PoseOnDeviceModel(
    "file:///android_asset/hand_pose_model.tflite",
    "<your model id>",
    modelVersion,
    new HandSkeleton(),
    outputStride);

5. Get a Pose Predictor

In order to use the predictor, the on-device model must first be loaded.

FritzVisionPosePredictor predictor = FritzVision.PoseEstimation.getPredictor(
onDeviceModel
);

6. Create a FritzVisionImage

To create a FritzVisionImage from a Bitmap:

FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);

To create a FritzVisionImage from a media.Image object when capturing the result from a camera, first determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
// Gets the first camera id
String cameraId = manager.getCameraIdList().get(0);
// Determine the rotation on the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc)
ImageRotation imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);

Finally, create the FritzVisionImage object with the rotation

FritzVisionImage visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera);

7. Run prediction

To detect object poses in FritzVisionImage, run the following:

FritzVisionPoseResult poseResult = predictor.predict(visionImage);

The predict method returns a FritzVisionPoseResult object that contains the following methods:

FritzVisionPoseResult methods

| Method | Description |
| --- | --- |
| List<Pose> getPoses() | Gets a list of Pose objects. |
| List<Pose> getPosesByThreshold(float minConfidence) | Gets a list of poses above a given threshold. |

8. Access the Pose result

FritzVisionPoseResult contains several convenience methods to help draw the keypoints and position.

Get a bitmap of the pose on the original image

List<Pose> poses = poseResult.getPoses();
Bitmap posesOnImage = visionImage.overlaySkeletons(poses);

Draw the poses onto a canvas

// Draw the pose to the canvas.
List<Pose> poses = poseResult.getPoses();
for (Pose pose : poses) {
    pose.draw(canvas);
}
Access the position of specific keypoints

To access each keypoint separately:

// Get the first pose
Pose pose = poseResult.getPoses().get(0);
// Get the body keypoints
Keypoint[] keypoints = pose.getKeypoints();
// Get the name of the keypoint
String partName = keypoints[0].getPartName();
PointF keypointPosition = keypoints[0].getPosition();

Use the record method on the predictor to collect data

The FritzVisionPosePredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

FritzVisionPoseResult predictedResults = visionPredictor.predict(visionImage);
// Implement your own custom UX for users to annotate an image and store
// that as a FritzVisionPoseResult.
visionPredictor.record(visionImage, predictedResults.toAnnotations(), modifiedResults.toAnnotations());

Rigid Pose Estimation

Create AR experiences using Rigid Pose Estimation to track the position and pose of any object in real-world coordinates.

Rigid Pose Estimation combines neural networks, traditional computer vision techniques, and ARKit/ARCore to estimate an object's position in the real world.


iOS

1. Create a FritzVisionPoseModel

In order to estimate the real-world coordinates of the object, you'll need to create a Custom Pose Model. This will detect the specific keypoints on the object you'd like to track.

2. Create a FritzVisionRigidBodyPoseLifting model

// Local 3D object keypoints (specific to the object to track)
let object3DPoints: [SCNVector3] = {
  let points: [SCNVector3] = [
    SCNVector3(-10, 5, 5) / 100.0,
    SCNVector3(-10, -5, -5) / 100.0,
    SCNVector3(10, 5, 5) / 100.0,
    SCNVector3(10, -5, -5) / 100.0,
    SCNVector3(0.0, 0.0, 0.0)
  ]
  return points
}()

let liftingModel = FritzVisionRigidBodyPoseLifting(
  model: poseModel, modelPoints: object3DPoints
)
Model initialization

It's important to initialize a single instance of the model so you are not loading the entire model into memory on each prediction. Usually this is a property on a ViewController. When loading the model in a ViewController, the following approaches are recommended:

Lazy-load the model

By lazy-loading the model, you won't load the model until the first prediction. This has the benefit of not prematurely loading the model, but it may make the first prediction take slightly longer.

class MyViewController: UIViewController {
  lazy var model = FritzVisionHumanPoseModelFast()
}

Load model in viewDidLoad

By loading the model in viewDidLoad, you'll ensure that you're not loading the model before the view controller is loaded. The model will be ready to go for the first prediction.

class MyViewController: UIViewController {
  var model: FritzVisionHumanPoseModelFast!

  override func viewDidLoad() {
    super.viewDidLoad()
    model = FritzVisionHumanPoseModelFast()
  }
}

Alternatively, you can initialize the model property directly. However, if the ViewController is instantiated by a Storyboard and is the Initial View Controller, its properties will be initialized before the application delegate's launch code runs. This can cause the app to crash if the model is loaded before FritzCore.configure() is called.

3. Create a FritzVisionImage

FritzVisionImage supports different image formats.

Using a CMSampleBuffer

If you are using a CMSampleBuffer from the built-in camera, first create the FritzVisionImage instance:

let image = FritzVisionImage(buffer: sampleBuffer)
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithBuffer: sampleBuffer];
// or
FritzVisionImage *visionImage = [[FritzVisionImage alloc] initWithImage: uiImage];

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image. By default, if you specify FritzVisionImageMetadata, the orientation will be .right:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .left
// Add metadata
visionImage.metadata = [FritzVisionImageMetadata new];
visionImage.metadata.orientation = FritzImageOrientationLeft;
Setting the Orientation from the Camera

Data passed in from the camera will generally need the orientation set. When using a CMSampleBuffer to create a FritzVisionImage the orientation will change depending on which camera and device orientation you are using.

When using the back camera in the portrait Device Orientation, the orientation should be .right (the default if you specify FritzVisionImageMetadata on the image). When using the front facing camera in portrait Device Orientation, the orientation should be .left.

You can initialize the FritzImageOrientation with the AVCaptureConnection to infer orientation (if the Device Orientation is portrait):

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
  let image = FritzVisionImage(sampleBuffer: sampleBuffer, connection: connection)
  ...
}

Using a UIImage

If you are using a UIImage, create the FritzVisionImage instance:

let image = FritzVisionImage(image: uiImage)

The image orientation data needs to be properly set for predictions to work. Use FritzVisionImageMetadata to customize the orientation for an image:

image.metadata = FritzVisionImageMetadata()
image.metadata?.orientation = .right
Set the image orientation

UIImage can have associated UIImageOrientation data (for example when capturing a photo from the camera). To make sure the model is correctly handling the orientation data, initialize the FritzImageOrientation with the image's image orientation:

image.metadata?.orientation = FritzImageOrientation(image.imageOrientation)

4. Run 2D predictions to get the location of the object's keypoints

Estimate the object's keypoints on the image passed in.

/// Define pose lifting options (see Advanced Options below)
lazy var liftingOptions: FritzVisionRigidBodyPoseLiftingOptions = {
  let options = FritzVisionRigidBodyPoseLiftingOptions()
  options.excludedKeypointIndices = [4]
  options.requiredKeypointsMeetingThreshold = 3
  options.keypointThreshold = 0.7
  options.smoothingOptions = PoseSmoothingOptions(
    minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220
  )
  options.orientationFlipAngleThreshold = 100
  return options
}()

guard let (pose, result) = liftingModel.run2DPrediction(
  image, options: liftingOptions
)
else { return }

5. Infer the 3D Pose from the 2D Keypoints

Next, use the detected 2D keypoints to infer the 3D pose of the object. This pose can be applied to any 3D model that you wish to place in AR, matching the location, rotation, and orientation of the detected rigid body.

var sceneView: ARSCNView!
// ...
let frame = self.sceneView.session.currentFrame
let poseResult = liftingModel.infer3DPose(
  pose, image: image, frame: frame, options: liftingOptions
)

6. Place an AR object using ARKit

Place an AR object on top of the detected object keypoints.

// A defined AR object to place
let arObjectNode: SCNNode = ...
guard let pov = self.sceneView.pointOfView else { return }

SCNTransaction.begin()
let rotated = SCNMatrix4Rotate(
  poseResult.scenekitCameraTransform, -.pi / 2, 0, 0, 1
)
arObjectNode.transform = pov.convertTransform(rotated, to: nil)
SCNTransaction.commit()

Advanced Options

Configure the FritzVisionRigidPosePredictor

You can configure the predictor with FritzVisionRigidBodyPoseLiftingOptions to return specific results.

FritzVisionRigidBodyPoseLiftingOptions

| Option | Default | Description |
| --- | --- | --- |
| requiredKeypointsMeetingThreshold: Int | 3 | Required number of keypoints meeting the threshold for a valid 2D pose result. |
| keypointThreshold: Double | .6 | Minimum keypoint confidence score needed for a keypoint to count towards requiredKeypointsMeetingThreshold. |
| excludedKeypointIndices: [Int] | [] | Indices of keypoints to exclude from the 3D pose. |
| smoothingOptions: PoseSmoothingOptions | None | Pose smoothing options. |
| orientationFlipAngleThreshold: Double | None | Angle at which keypoints are reversed. Helps prevent accidental rotations. |

lazy var liftingOptions: FritzVisionRigidBodyPoseLiftingOptions = {
  let options = FritzVisionRigidBodyPoseLiftingOptions()
  options.excludedKeypointIndices = [4]
  options.requiredKeypointsMeetingThreshold = 3
  options.keypointThreshold = 0.7
  options.smoothingOptions = PoseSmoothingOptions(minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220)
  options.orientationFlipAngleThreshold = 100
  return options
}()
Pose and Keypoint Smoothing

To help improve stability of predictions between frames, use the PoseSmoothingOptions class which uses 1-Euro filters.

let options = FritzVisionRigidBodyPoseLiftingOptions()
options.smoothingOptions = PoseSmoothingOptions(minCutoff: 0.035, beta: 0.004, derivateCutoff: 0.220)

“The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

1-Euro filter parameters

| Option | Default | Description |
| --- | --- | --- |
| minCutoff | 1.0 | Minimum frequency cutoff. Lower values will decrease jitter but increase lag. |
| beta | 0.0 | Higher values of beta will help reduce lag, but may increase jitter. |
| derivateCutoff | 1.0 | Max derivative value allowed. Increasing will allow more sudden movements. |

Android

note

If you haven't set up the SDK yet, make sure to go through those directions first. You'll need to add the Core library to the app before using the specific feature API or custom model. Follow iOS setup or Android setup directions.

For a full list of compatible devices, see here.

1. Add the dependencies via Gradle

dependencies {
implementation 'ai.fritz:vision-opencv:1.0.0'
}

2. Create a RigidPoseOnDeviceModel

In order to estimate the real-world coordinates of the object, you'll need to create a custom Rigid Pose Model.

/**
 * A 3D Pose TensorFlow Lite model included in the assets folder of your app.
 *
 * @param String modelPath: the path to your model file.
 * @param String modelId: the model id specified by Fritz AI for the included model.
 * @param int version: the version number specified by Fritz AI for the included model.
 * @param int inputHeight: the expected input height for the model
 * @param int inputWidth: the expected input width for the model
 * @param int outputHeight: the expected output height for the model
 * @param int outputWidth: the expected output width for the model
 * @param int numKeypoints: the number of output 2D keypoints for the model.
 * @param List<Point3> object3DPoints: the local, 3D coordinates of the rigid body. This is used to infer the 3D coordinates from the 2D keypoints.
 */
RigidPoseOnDeviceModel onDeviceModel = new RigidPoseOnDeviceModel(
    modelPath, modelId, version,
    inputHeight, inputWidth,
    outputHeight, outputWidth, numKeypoints,
    object3DPoints);

3. Get a FritzVisionRigidPosePredictor

In order to use the predictor, the on-device model must first be loaded:

import ai.fritz.visionCV.rigidpose.FritzVisionRigidPosePredictor;
FritzVisionRigidPosePredictor posePredictor = FritzVisionCV.RigidPose.getPredictor(onDeviceModel);

If you did not include the on-device model, you'll have to load the model before you can get a predictor. To do that, you'll use a RigidPoseManagedModel object and call FritzVisionCV.RigidPose.loadPredictor to start the model download.

FritzVisionRigidPosePredictor predictor;

RigidPoseManagedModel managedModel = new RigidPoseManagedModel(
    modelPath, modelId, version,
    inputHeight, inputWidth,
    outputHeight, outputWidth, numKeypoints,
    object3DPoints);

FritzVisionCV.RigidPose.loadPredictor(managedModel, new PredictorStatusListener<FritzVisionRigidPosePredictor>() {
    @Override
    public void onPredictorReady(FritzVisionRigidPosePredictor posePredictor) {
        predictor = posePredictor;
    }
});

4. Create a FritzCVImage from an image or a video stream

To create a FritzCVImage from a Bitmap:

FritzCVImage visionImage = FritzCVImage.fromBitmap(bitmap);

To create a FritzCVImage from a media.Image object

First determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
// Gets the first camera id
String cameraId = manager.getCameraIdList().get(0);
// Determine the rotation on the FritzCVImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc)
int imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);

Finally, create the FritzCVImage object with the rotation

FritzCVImage visionImage = FritzCVImage.fromMediaImage(image, imageRotationFromCamera);
To create a FritzCVImage from an OpenCV Mat object

FritzCVImage visionImage = FritzCVImage.fromMatrix(image, imageRotationFromCamera);

Run prediction to get 2D Keypoints.

Next, pass the FritzCVImage into the predictor in order to get information on the 2D keypoints:

RigidPoseResult poseResult = posePredictor.predict(visionImage);

RigidPoseResult methods

| Method | Description |
| --- | --- |
| Point[] getKeypoints() | Gets a list of 2D keypoints detected in the image. Coordinates are relative to the size of the input for the model (224x224). |
| float[] getScores() | Gets a list of confidence scores corresponding to each keypoint. Values range from 0 to 1, with 1 being 100% confidence. |
| void drawKeypoints(Mat canvas, Scalar color) | Draws keypoints on a provided Mat canvas with the keypoint index. Useful for debugging. |

5. Infer 3D Pose from RigidPoseResult

Use pose lifting to infer the 3D pose (com.google.ar.core.Pose) from the 2D keypoint coordinates.

// Use the ARCore camera to get the intrinsic matrix.
Camera camera = frame.getCamera();
Mat cameraMatrix = FritzVisionRigidPoseLifting.getCameraIntrinsicMatrix(camera);
MatOfDouble distortionMatrix = FritzVisionRigidPoseLifting.getDistortionMatrix();

// Create a pose lifting object to infer the 3D pose.
FritzVisionRigidPoseLifting poseLifting = new FritzVisionRigidPoseLifting(onDeviceModel, poseResult);
Pose objectPose = poseLifting.infer3DPose(cameraMatrix, distortionMatrix);

FritzVisionRigidPoseLifting methods

| Method | Description |
| --- | --- |
| Mat getTvec() | Gets the OpenCV translation vector output. |
| Mat getRvec() | Gets the OpenCV rotation vector output. |
| Pose infer3DPose(Mat cameraMatrix, MatOfDouble distortionMatrix) | Gets an ARCore Pose object to apply to a 3D model. |

6. Place an AR Object / 3D model in the real world using the inferred Pose

By composing the camera pose and the object's pose, we get the real world coordinates for placing an AR object.

Pose finalPose = camera.getPose().compose(objectPose);
placeARObject(finalPose);

Advanced Options

Configure the FritzVisionRigidPosePredictor

You can configure the predictor with FritzVisionRigidPosePredictorOptions to return specific results.

| Option | Default | Description |
| --- | --- | --- |
| confidenceThreshold | .3 | Minimum confidence threshold for each keypoint. |
| numKeypointsAboveThreshold | 3 | Minimum number of keypoints that need to be higher than the confidence threshold. |
| interpreterOptions | new Interpreter.Options() | Interpreter.Options for the TensorFlow Lite model. |

// Create a predictor with specified options.
FritzVisionRigidPosePredictorOptions options = new FritzVisionRigidPosePredictorOptions();
options.confidenceThreshold = .6f;
options.numKeypointsAboveThreshold = 4;
posePredictor = FritzVisionCV.RigidPose.getPredictor(onDeviceModel, options);
Pose and Keypoint Smoothing

To help improve stability of predictions between frames, use the RigidPoseSmoother class which uses 1-Euro filters.

// Create a pose smoother with custom filter parameters.
RigidPoseSmoother poseSmoother =
    new RigidPoseSmoother(onDeviceModel.getNumKeypoints(), minCutoff, beta, derivativeCutoff);

// Or create a pose smoother with the default parameters.
RigidPoseSmoother poseSmoother =
    new RigidPoseSmoother(onDeviceModel.getNumKeypoints());

// Smooth 2D keypoints.
RigidPoseResult smoothedResult = poseSmoother.smooth2DKeypoints(poseResult);

// Smooth the 3D pose.
Pose smoothedObjectPose = poseSmoother.smoothPose(objectPose);

Pose Smoother Methods

| Method | Description |
| --- | --- |
| RigidPoseResult smooth2DKeypoints(RigidPoseResult poseResult) | Smooths the result for all keypoints. |
| Pose smoothPose(Pose objectPose) | Smooths the object pose (rotation and translation). |

“The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

1-Euro filter parameters

| Option | Default | Description |
| --- | --- | --- |
| minCutoff | 1.0 | Minimum frequency cutoff. Lower values will decrease jitter but increase lag. |
| beta | 0.0 | Higher values of beta will help reduce lag, but may increase jitter. |
| derivateCutoff | 1.0 | Max derivative value allowed. Increasing will allow more sudden movements. |